How to Pull Chicago Crime Data With Sodapy

Getting Started — Sign Up

Michael Shoemaker
Sep 2, 2022

The Chicago Data Portal has a lot of great, publicly available data sets for anyone to use. While you can simply go to the website and download them, there is a more precise, programmatic way to retrieve them through the Socrata Open Data API (SODA) with the sodapy Python library, which is especially useful when building a data pipeline. Let's jump right in.

First, go to the Chicago Data Portal (https://data.cityofchicago.org/) and create an account. An account is not necessary if you simply want to download the data manually.

Click Sign In in the upper right-hand corner. Then click Sign Up on the next page.

You will need to fill out some simple information to create your account.

Creating a SODA API Token

Once you are able to log in, head over to https://data.cityofchicago.org/profile/edit/developer_settings

For sodapy, we need to create an App Token, which our application will send with each request. Click Create App Token.

Note: You can use sodapy without a token. However, requests without a token are throttled more strictly, which slows down the rate at which you can download data.

Only the first two fields need to be filled out to create the token. If the Application Name is already in use, though, you will receive an error like the one shown below.

Once your token is created, you should see it listed on the page. You only need the App Token; you do not need to worry about the secret.
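Rather than pasting the token directly into your script, you may prefer to keep it out of source control. The sketch below assumes you have exported the token in an environment variable named SODA_APP_TOKEN (a hypothetical name chosen for this example, not something the portal or sodapy requires) and falls back to None, which sodapy accepts for unauthenticated, more heavily throttled requests.

```python
import os
from typing import Optional

# Hypothetical variable name; use whatever name fits your setup.
TOKEN_ENV_VAR = "SODA_APP_TOKEN"


def get_app_token(env_var: str = TOKEN_ENV_VAR) -> Optional[str]:
    """Return the SODA App Token from the environment, or None if unset or blank.

    Passing None to sodapy's Socrata client still works, just with
    stricter throttling than a request that carries a token.
    """
    token = os.environ.get(env_var, "").strip()
    return token or None
```

You could then create the client with Socrata("data.cityofchicago.org", get_app_token()) and set the variable once in your shell or pipeline configuration.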

Sodapy Python Code

First, make sure you have sodapy installed. You can check by running:

pip freeze | grep sodapy

or

pip3 freeze | grep sodapy

depending on how pip is installed on your system.

In my case, the output shows:

If you do not see it listed, install it with pip install sodapy or pip3 install sodapy. You will also need pandas; repeat the same check to make sure it is installed.
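If you would rather check from Python than from the shell, the standard library's importlib.metadata (Python 3.8+) can report installed package versions. A minimal sketch; the installed_versions helper name is just for illustration:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["sodapy", "pandas"]).items():
        status = ver if ver else f"not installed (try: pip install {name})"
        print(f"{name}: {status}")
```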

The Chicago Data Portal provides good example code for downloading each of its data sets: open any data set, click the API button, and go to API Docs.

Scroll down to Code Snippets and choose the Python Pandas tab.

Now copy and paste this code into a .py file. Below, I removed the comments to make it a bit more readable.

import pandas as pd
from sodapy import Socrata

client = Socrata("data.cityofchicago.org", None)
results = client.get("ijzp-q8t2", limit=2000)
result_df = pd.DataFrame.from_records(results)

You are free to use it as is, but since you have an App Token, you can speed up your requests by replacing the None argument with your App Token. Also note that "ijzp-q8t2" is simply the identifier of the crimes dataset at data.cityofchicago.org. If you navigate to a different dataset, you simply need to change that part of the code above.

For example, if you wanted to retrieve building permits, you would just update it to "ydr8-5enu".
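Because only the dataset identifier changes between data sets, it can be convenient to wrap the snippet in a small function. This is a sketch, not the portal's own code: the DATASETS dictionary and the fetch_dataset name are hypothetical, the two IDs are the ones mentioned above, and sodapy is imported inside the function so the rest of the module loads even if sodapy is missing.

```python
from typing import Optional

import pandas as pd

DOMAIN = "data.cityofchicago.org"

# Hypothetical lookup table; the IDs come from each dataset's API page.
DATASETS = {
    "crimes": "ijzp-q8t2",
    "building_permits": "ydr8-5enu",
}


def fetch_dataset(name: str, limit: int = 2000,
                  app_token: Optional[str] = None) -> pd.DataFrame:
    """Download up to `limit` rows of a named dataset into a DataFrame."""
    dataset_id = DATASETS[name]  # raises KeyError for an unknown name
    from sodapy import Socrata  # imported lazily; requires `pip install sodapy`

    client = Socrata(DOMAIN, app_token)  # None means no token (throttled)
    try:
        results = client.get(dataset_id, limit=limit)
    finally:
        client.close()  # release the underlying HTTP session
    return pd.DataFrame.from_records(results)
```

For example, fetch_dataset("building_permits", app_token="<your token here>") would pull the first 2,000 building permits instead of crimes.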

Since we are focusing on the crime dataset in this article, we can be a bit more granular in the results we request. I haven't found much documentation on querying with SODA, but this page should be enough to get you started: https://dev.socrata.com/docs/queries/query.html

From what I've found by experimenting, you can pass a query to the API call in one of two ways: break it up into parts such as select, where, etc., or write a complete query and pass it to the "query" parameter. The language is pretty much SQL, but it is called SoQL (Socrata Query Language). The only difference I see is that you do not need a FROM clause, because the API already knows which data set you mean from the endpoint you send the call to.
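To illustrate the "break it up into parts" option: sodapy's get accepts SoQL clauses such as select, where, order and limit as keyword arguments and sends them as $select, $where, and so on. The sketch below builds those keyword arguments for the same kind of community area filter used next; build_soql_kwargs is a hypothetical helper, and the column names (date, primary_type, latitude, longitude, community_area) are those of the crimes data set.

```python
from typing import Dict, Union


def build_soql_kwargs(community_area: int, start_date: str,
                      limit: int = 50000) -> Dict[str, Union[str, int]]:
    """Build select/where/order/limit keyword arguments for client.get()."""
    return {
        "select": "date, primary_type, latitude, longitude",
        # community_area is quoted the same way as in this article's query
        "where": f"community_area = '{community_area}' AND date >= '{start_date}'",
        "order": "date",
        "limit": limit,
    }


# Usage (requires sodapy and network access):
# from sodapy import Socrata
# client = Socrata("data.cityofchicago.org", "<your token here>")
# results = client.get("ijzp-q8t2", **build_soql_kwargs(13, "2021-01-01"))
```

Unlike the query parameter, this form lets you keep using limit (and offset) directly.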

Here is an example of a more granular SoQL query using sodapy:

import pandas as pd
from sodapy import Socrata

client = Socrata("data.cityofchicago.org", "<your token here>")
query = """SELECT date, primary_type, latitude, longitude
WHERE community_area = '13' AND date >= '2021-01-01'
LIMIT 20000000"""
results = client.get("ijzp-q8t2", query=query)
result_df = pd.DataFrame.from_records(results)

Breaking down the above:

  1. Define the client using your token.
  2. Create a SoQL query string. (I added LIMIT 20000000 because I noticed the number of results returned was limited, and if you pass a query to the get function, you cannot also use the limit parameter.)
  3. Make the call to the API, passing the query as a parameter, and store the results in a variable called "results".
  4. The results are returned as a list of dictionaries, which can be converted to a dataframe with pandas' DataFrame.from_records method.
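An alternative to a very large LIMIT is to page through the results with limit and offset, ordering by the system :id field so that pages don't overlap. The sketch below is a hypothetical helper, not part of sodapy; the page-fetching function is passed in so the paging logic can be tried without a network connection, and the commented usage assumes the keyword-argument form of client.get (where, order, limit, offset) rather than a full query string.

```python
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]


def iter_pages(fetch_page: Callable[[int, int], List[Record]],
               page_size: int = 50000) -> Iterator[Record]:
    """Yield records page by page until a short (or empty) page comes back.

    fetch_page(limit, offset) must return at most `limit` records.
    """
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            break  # last page reached
        offset += page_size


# Usage with sodapy (requires network access and `pip install sodapy pandas`):
# import pandas as pd
# from sodapy import Socrata
# client = Socrata("data.cityofchicago.org", "<your token here>")
# def fetch_page(limit, offset):
#     return client.get("ijzp-q8t2",
#                       where="community_area = '13' AND date >= '2021-01-01'",
#                       order=":id", limit=limit, offset=offset)
# result_df = pd.DataFrame.from_records(list(iter_pages(fetch_page)))
```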

That's it for now. I hope you enjoyed it. As always, if I got anything wrong or if you have any questions, feel free to let me know in the comments.

Bonus

Now that we've come this far, let's have a little bit of fun. Let's visualize crime in community area 13 between June 10th and June 12th, 2022.

We’ll modify our query like so:

import pandas as pd
from sodapy import Socrata

token = "<your token here>"
client = Socrata("data.cityofchicago.org", token)

query = """SELECT latitude, longitude
WHERE community_area = '13'
AND date > '2022-06-10'
AND date < '2022-06-12'
LIMIT 20000000"""
results = client.get("ijzp-q8t2", query=query)
result_df = pd.DataFrame.from_records(results)
# Make sure there aren't any blank coordinates
result_df.dropna(axis=0, how="any", subset=["latitude", "longitude"], inplace=True)
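One thing worth knowing here: SODA's JSON responses typically return values as strings, so latitude and longitude arrive as text. A sketch of a slightly stricter cleanup, assuming you want numeric coordinates for plotting: it converts both columns with pd.to_numeric, turning blanks or malformed values into NaN, and then drops those rows. clean_coordinates is a hypothetical helper name.

```python
import pandas as pd


def clean_coordinates(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric latitude/longitude and no missing coordinates."""
    cleaned = df.copy()
    for col in ("latitude", "longitude"):
        # errors="coerce" turns blanks or malformed values into NaN
        cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")
    return cleaned.dropna(subset=["latitude", "longitude"]).reset_index(drop=True)
```

Calling result_df = clean_coordinates(result_df) would then replace the dropna line above.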

And then steal... I mean borrow the code below from this Stack Overflow post: https://stackoverflow.com/questions/39401729/plot-latitude-longitude-points-from-dataframe-on-folium-map-ipython

import folium

# Create a map
this_map = folium.Map(prefer_canvas=True)

def plotDot(point):
    """Add a small circle marker for one row's latitude/longitude."""
    folium.CircleMarker(location=[float(point.latitude), float(point.longitude)],
                        radius=2,
                        weight=5).add_to(this_map)

# Use df.apply(..., axis=1) to "iterate" through every row in the dataframe
result_df[["latitude", "longitude"]].apply(plotDot, axis=1)
# Zoom the map to fit all plotted points
this_map.fit_bounds(this_map.get_bounds())
# Save the map to an HTML file
this_map.save("simple_dot_plot.html")
# In a Jupyter notebook, this last line renders the map inline
this_map

You should see something like this:

You can play around with adding different colors depending on the crime (primary type), or maybe automate screenshots with Selenium and ChromeDriver to make a GIF. Hopefully this got your creative juices flowing or helped you get past a hiccup in your code.
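As a starting point for the color idea, here is a sketch of a lookup from primary type to marker color. It assumes you add primary_type back to the SELECT list and to the columns passed to apply; the color choices and the CRIME_COLORS and color_for names are arbitrary and hypothetical. The result can be passed to CircleMarker's color argument.

```python
from typing import Dict

# Arbitrary, hypothetical palette keyed by upper-case primary_type values.
CRIME_COLORS: Dict[str, str] = {
    "THEFT": "blue",
    "BATTERY": "red",
    "CRIMINAL DAMAGE": "orange",
    "ASSAULT": "purple",
}
DEFAULT_COLOR = "gray"


def color_for(primary_type: str) -> str:
    """Return a marker color for a crime's primary_type, with a fallback."""
    return CRIME_COLORS.get(str(primary_type).strip().upper(), DEFAULT_COLOR)


# Inside plotDot you could then pass, for example:
# folium.CircleMarker(location=[...], radius=2, weight=5,
#                     color=color_for(point.primary_type)).add_to(this_map)
```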

Thanks for reading. :-)
