Dataism: Assigning a 0-100 Percentile Score to Police Depts.

In this exercise, we walk through how to rank police departments on a scale from 0-100 based on the fraction of the population in their jurisdiction that is arrested for misdemeanors.

The best department, the one with the lowest frequency of misdemeanor arrests, gets a score of 100.

This can be interpreted as a measure of the extent to which the department engages in 'broken windows' policing i.e. targeting minor crimes like vandalism and public drinking. The theory is that this creates an atmosphere of order and lawfulness, thereby preventing more serious crimes. However, evidence shows that this kind of policing reinforces systemic racial bias in the criminal justice system. (see 'Weapons of Math Destruction' for a breakdown of this topic)

Assignment: Read through the following steps slowly. The code is all written, so you just have to run it and see what it does. If you wanna code something yourself, try the challenge exercise as the end.

Important: You can use the same methods in this exercise to create 0-100 scores for all sorts of things. For example, an activist group might want to

  • Grade countries on their ecological footprints, or

  • Score companies on gender/racial bias in hiring practices...

At least while society is easily convinced by such metrics, these could be effective ways to draw attention ;)

1. Import Python Libraries

First we import the Python libraries we need. The 'as pd' and 'as np' allow us to write 'pd' and 'np' in our code instead of having to write 'pandas' and 'numpy' all the time. (Remember, you can run a cell by highlighting it and pressing 'shift' + 'enter')

pip install xlrd
2.7s
Bash in Python
import pandas as pd
import numpy as np
import xlrd
0.6s
Python

2. Import the Police Scorecord Data

Now we import the data from its URL location on the course github.

Note: You can't just use the URL that you get when you click on the filename in Github. You have to right-click on 'View Raw' and select 'Copy Link Location.'

Police Scorecard Source Data.xlsx
## Here we define a variable named 'data_url' (we can call it whatever we want)
## and set it equal to the URL of the data file.
data_url = 
Police Scorecard Source Data.xlsx
## Now we use the pandas read_excel() function to import the data using the URL.
## You can also pass it the directly location of a data file, but this is easiest for the exercise.
police_scorecard_data = pd.read_excel(data_url)
## We can also substitute 'police_scorecard_data' for whatever name we want.
## Formally, the police_score_card variable now refers to a 'dataframe' object.
## Print the first five rows so you can see how it looks.
police_scorecard_data.head(5)
3.0s
Python

3. Extracting the data about misdemeanor arrests and population.

The above data table has many more columns then we need. We are only interested in

  • Agency Name

  • Total Population of Jurisdiction

  • Total Misdemeanor Arrests (MACR 2016)

To clean things up, we create a new dataframe and name it misdemeanor_data, containing only what we're interested in.

## Use the filter function to select only the columns we care about and group
## them into a new dataframe.
misdemeanor_data = police_scorecard_data.filter(['Agency Name', 'Total Population of Jurisdiction', 'Total Misdemeanor Arrests (MACR 2016)'], axis =1)
misdemeanor_data.head(5)
0.2s
Python
Agency NameTotal Population of JurisdictionTotal Misdemeanor Arrests (MACR 2016)
0Alameda Police Department79654894.0
1Alhambra Police Department86475642.0
2Anaheim Police Department3548915803.0
3Antioch Police Department1126302235.0
4Bakersfield Police Department37974117643.0
5 items

4. Reindex the rows by column name

Currently the rows in our data table are indexed by the numbers 0, 1, 2 etc. These numbers aren't really connected to the data, and it's more convenient for us to index the rows by Agency Name.

misdemeanor_data = misdemeanor_data.set_index('Agency Name')
misdemeanor_data.head(5)
0.3s
Python
Agency NameTotal Population of JurisdictionTotal Misdemeanor Arrests (MACR 2016)
Alameda Police Department79654894.0
Alhambra Police Department86475642.0
Anaheim Police Department3548915803.0
Antioch Police Department1126302235.0
Bakersfield Police Department37974117643.0
5 items

5. Compute the frequency of misdemeanor arrests

Here we compute the percentage of the population in a jurisdiction that is arrested for misdemeanor. This is given by

formula not implemented

For example, if Exampleton Police Department arrested 100 people for misdemeanors and has a population of 10,000, then % Misdemeanor Arrests = 100/10,000 = 1%.

## First we extract the column contain populations.
population = misdemeanor_data['Total Population of Jurisdiction']
population.head(5)
0.5s
Python
Alameda Police Department79654
Alhambra Police Department86475
Anaheim Police Department354891
Antioch Police Department112630
Bakersfield Police Department379741
4 items
## Then the column containing
misdemeanor_arrests = misdemeanor_data['Total Misdemeanor Arrests (MACR 2016)']
misdemeanor_arrests.head(5)
0.2s
Python
Alameda Police Department894.0
Alhambra Police Department642.0
Anaheim Police Department5803.0
Antioch Police Department2235.0
Bakersfield Police Department17643.0
4 items
## Now we add a new column to misdemeanor_data and set it equal to
## misdemeanor_arrests/population as in the equation on the start of this
## section. We multiply this by 100, to convert the result into a percentage.
misdemeanor_data['% Misdemeanor Arrests'] =  100*misdemeanor_arrests / population
misdemeanor_data.head(5)
0.3s
Python
Agency NameTotal Population of JurisdictionTotal Misdemeanor Arrests (MACR 2016)% Misdemeanor Arrests
Alameda Police Department79654894.01.1223541818364426
Alhambra Police Department86475642.00.7424111014744146
Anaheim Police Department3548915803.01.6351499474486533
Antioch Police Department1126302235.01.9843736127141969
Bakersfield Police Department37974117643.04.646061394476762
5 items

The % Misdemeanor Arrests column shows how many out of every 1000 people were arrested for misdemeanors in that district.

6. Computing Percentiles

Now we want to rank the police departments based on the percentage of misdemeanor arrests using percentiles. If a department gets a score of 95, that means it's better then 95% of all CA police departments.

To compute this, you just have to order the departments from best (smallest % misdemeanor arrests) to worst (largest % misdemeanor arrests).

Suppose there are 50 departments in total. The first department on the list is the best and gets a percentile score of 100. The second on the list is the same or better than 49/50 = 98% other departments and gets a score of 98. The next on the list gets 48/50 = 96 and so on...

NOTE: You have to modify this if you have departments with the same % misdemeanor arrests (they won't be in the same position on the list, but they should get the same score). This is usually unlikely, but the idea is the same in any case.

## First reorder the departments from best to worst
misdemeanor_data = misdemeanor_data.sort_values('% Misdemeanor Arrests')
misdemeanor_data.head(5)
0.2s
Python
Agency NameTotal Population of JurisdictionTotal Misdemeanor Arrests (MACR 2016)% Misdemeanor Arrests
Monterey County Sheriff's Department99533439.00.4410597490279606
San Jose Police Department10557785741.00.5437696182341364
Richmond Police Department111493661.00.5928623321643511
Elk Grove Police Department1691531031.00.60950736906824
Marin County Sheriff's Department101302637.00.6288128566069772
5 items
## Now we create an empty column that will later be filled with the percentile
## scores.
## the numpy nan function fills the column with NaN = 'not a number'
misdemeanor_data['Misdemeanor Percentile'] = np.nan 
misdemeanor_data.head(5)
0.3s
Python
Agency NameTotal Population of JurisdictionTotal Misdemeanor Arrests (MACR 2016)% Misdemeanor ArrestsMisdemeanor Percentile
Monterey County Sheriff's Department99533439.00.4410597490279606
San Jose Police Department10557785741.00.5437696182341364
Richmond Police Department111493661.00.5928623321643511
Elk Grove Police Department1691531031.00.60950736906824
Marin County Sheriff's Department101302637.00.6288128566069772
5 items
## We define a new variable which gives the total number of police department
number_of_departments = len(misdemeanor_data.index)
## print the value
print('Total number of police departments:', number_of_departments)
## The following for-loop is a simple way to do the procedure explained in the
## beginning of the section. For each department it computes the percentage
## of departments that are lower (including itself) on the list.
row_index = 0 # This is the starting position on the list. We increase it by 1
# each time the loop runs to move to keep track of where we are on the list.
for agency_name in misdemeanor_data.index: ## misdemeanor_data.index is the list
# of Agency Names ordered from best to worst. The for-loop goes over every item 
# in the list. We called these items agency_name, but you could name them whatever
# you want.
    # the code belonging to the for-loop is indented
    agency_percentile = (1 - row_index/number_of_departments)*100 # This calculates
    # the percentile. Work out the first few on paper to convince yourself.
    misdemeanor_data.at[agency_name, 'Misdemeanor Percentile'] = agency_percentile
    # This inserts the value into our table
    row_index = row_index + 1
    # We add 1 to the row_index to go to the next position on the list.
    # row_index = 0 on the first iteration, then 1,2,3, etc. all the way up to
    # number_of_departments - 1.
# Once the for-loop is over, we unindent and show the results
misdemeanor_data
0.9s
Python
Total number of police departments: 158

NOTE: The data for San Franscisco Sheriff's department in incomplete, which led to it being sorted at the bottom of the list. It's score doesn't mean anything.

NOTE: There are fancier and faster ways than using a for-loop, but this is the most basic.

CHALLENGE:

Do the same procedure Total Deadly Force Incidents 2016-2018 (All Firearms Discharges + Other Force Resulting in Death or Serious Injury) (URSUS Dataset) and Total Arrests, 2016-2018 columns of the police_scorecard_data that we imported at the beginning of the notebook.

Here we want to compute the percentage of deadly force incidents per arrest for each department and rank them accordingly.

NOTE: You might have to reimport the data by rerunning the cells at the beginning of the notebook.

Filter and index:

  • Agency Name

  • Total Arrests, 2016-2018

  • Total Deadly Force Incidents 2016-2018 (All Firearms Discharges + Other Force Resulting in Death or Serious Injury) (URSUS Dataset)

deadly_force = police_scorecard_data.filter(['Agency Name', 'Total Arrests, 2016-2018', 'Total Deadly Force Incidents 2016-2018 (All Firearms Discharges + Other Force Resulting in Death or Serious Injury) (URSUS Dataset)'], axis=1)
deadly_force.set_index('Agency Name', inplace=True)
deadly_force.rename(columns={'Total Arrests, 2016-2018': 'Total Arrests', 'Total Deadly Force Incidents 2016-2018 (All Firearms Discharges + Other Force Resulting in Death or Serious Injury) (URSUS Dataset)': 'Total Deadly Force'}, inplace=True)
deadly_force.head(5)
0.2s
Python
Agency NameTotal ArrestsTotal Deadly Force
Alameda Police Department35811
Alhambra Police Department39113
Anaheim Police Department2820017
Antioch Police Department1011510
Bakersfield Police Department5133767
5 items

Calculate what percentage of arrests involved deadly force, sort by mortality incidence rate, and drop San Francisco County Sheriff's Department because the data is incomplete.

deadly_force['Percent Deadly'] = deadly_force['Total Deadly Force']/deadly_force['Total Arrests'] * 100 
deadly_force.sort_values('Percent Deadly', inplace=True)
deadly_force.drop(index='San Francisco County Sheriff\'s Department', inplace=True)
deadly_force
0.7s
Python
deadly_force['Percent Deadly Percentile'] = deadly_force['Percent Deadly'].rank(pct=True,method='min')
deadly_force.tail(5)
#deadly_force
0.3s
Python
Agency NameTotal ArrestsTotal Deadly ForcePercent DeadlyPercent Deadly Percentile
Riverside County Sheriff's Department12084360.297914597815292950.9745222929936306
Modoc County Sheriff's Department63620.31446540880503150.9808917197452229
Orange County Sheriff's Department10833420.38770423705344780.9872611464968153
Tehama County Sheriff's Department229390.392498909725250760.9936305732484076
Richmond Police Department4864230.472861842105263161.0
5 items

SUPER CHALLENGE:

Do it with a dataset of your choice.

Runtimes (1)