Dataism: Assigning a 0-100 Percentile Score to Police Depts.
In this exercise, we walk through how to rank police departments on a scale from 0-100 based on the fraction of the population in their jurisdiction that is arrested for misdemeanors.
The best department, the one with the lowest frequency of misdemeanor arrests, gets a score of 100.
This can be interpreted as a measure of the extent to which a department engages in 'broken windows' policing, i.e. targeting minor crimes like vandalism and public drinking. The theory is that this creates an atmosphere of order and lawfulness, thereby preventing more serious crimes. However, evidence shows that this kind of policing reinforces systemic racial bias in the criminal justice system. (See 'Weapons of Math Destruction' for a breakdown of this topic.)
Assignment: Read through the following steps slowly. The code is all written, so you just have to run it and see what it does. If you want to code something yourself, try the challenge exercise at the end.
Important: You can use the same methods in this exercise to create 0-100 scores for all sorts of things. For example, an activist group might want to
Grade countries on their ecological footprints, or
Score companies on gender/racial bias in hiring practices...
At least while society is easily convinced by such metrics, these could be effective ways to draw attention ;)
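The percentile idea used in this exercise carries over directly to metrics like these. Here is a minimal sketch with invented ecological-footprint numbers (all country names and values are made up), where lower is better and the best entry scores 100:

```python
# Made-up footprint numbers: lower footprint = better, best gets a score of 100.
footprints = {'Countria': 2.1, 'Exampleland': 8.4, 'Testonia': 4.7}

# Order from best (smallest) to worst (largest).
ranked = sorted(footprints, key=footprints.get)

n = len(ranked)
scores = {country: (1 - i / n) * 100 for i, country in enumerate(ranked)}
print(scores)
```

With three entries, the best scores 100, the middle one (1 - 1/3) × 100 ≈ 66.7, and the worst (1 - 2/3) × 100 ≈ 33.3.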
1. Import Python Libraries
First we import the Python libraries we need. The 'as pd' and 'as np' allow us to write 'pd' and 'np' in our code instead of having to write 'pandas' and 'numpy' all the time. (Remember, you can run a cell by highlighting it and pressing 'shift' + 'enter')
!pip install xlrd
import pandas as pd
import numpy as np
import xlrd
2. Import the Police Scorecard Data
Now we import the data from its URL location on the course github.
Note: You can't just use the URL that you get when you click on the filename in Github. You have to right-click on 'View Raw' and select 'Copy Link Location.'
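If you only have the regular 'blob' URL from your browser's address bar, you can usually convert it to the raw form with two string replacements. A sketch (the URL below is a made-up placeholder, not the actual course file):

```python
# Hypothetical GitHub file URL, not the real course data file.
blob_url = 'https://github.com/example-org/example-course/blob/main/data.xlsx'

# Raw URLs live on raw.githubusercontent.com and drop the '/blob/' segment.
raw_url = blob_url.replace('github.com', 'raw.githubusercontent.com').replace('/blob/', '/')
print(raw_url)
```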
## Here we define a variable named 'data_url' (we can call it whatever we want)
## and set it equal to the URL of the data file.
data_url = 'Police Scorecard Source Data.xlsx'  # paste the raw-link URL you copied here
## Now we use the pandas read_excel() function to import the data using the URL.
## You can also pass it the location of a data file directly, but this is easiest for the exercise.
police_scorecard_data = pd.read_excel(data_url)
## We could also have called 'police_scorecard_data' whatever we want.
## Formally, the police_scorecard_data variable now refers to a 'dataframe' object.
## Print the first five rows so you can see how it looks.
police_scorecard_data.head(5)
3. Extracting the data about misdemeanor arrests and population.
The above data table has many more columns than we need. We are only interested in
Agency Name
Total Population of Jurisdiction
Total Misdemeanor Arrests (MACR 2016)
To clean things up, we create a new dataframe and name it misdemeanor_data
, containing only what we're interested in.
## Use the filter function to select only the columns we care about and group
## them into a new dataframe.
misdemeanor_data = police_scorecard_data.filter(['Agency Name', 'Total Population of Jurisdiction', 'Total Misdemeanor Arrests (MACR 2016)'], axis =1)
misdemeanor_data.head(5)
| Agency Name | Total Population of Jurisdiction | Total Misdemeanor Arrests (MACR 2016) |
---|---|---|---|
0 | Alameda Police Department | 79654 | 894.0 |
1 | Alhambra Police Department | 86475 | 642.0 |
2 | Anaheim Police Department | 354891 | 5803.0 |
3 | Antioch Police Department | 112630 | 2235.0 |
4 | Bakersfield Police Department | 379741 | 17643.0 |
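`filter()` is just one way to select columns; plain bracket indexing with a list of labels does the same thing. A sketch on a toy dataframe (the data here is made up):

```python
import pandas as pd

# Toy dataframe with one column we don't want.
toy = pd.DataFrame({'Agency Name': ['A PD', 'B PD'],
                    'Total Population of Jurisdiction': [1000, 2000],
                    'Extra Column': [1, 2]})

# Bracket indexing with a list of labels keeps only those columns.
subset = toy[['Agency Name', 'Total Population of Jurisdiction']]
print(list(subset.columns))
```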
4. Reindex the rows by column name
Currently the rows in our data table are indexed by the numbers 0, 1, 2 etc. These numbers aren't really connected to the data, and it's more convenient for us to index the rows by Agency Name.
misdemeanor_data = misdemeanor_data.set_index('Agency Name')
misdemeanor_data.head(5)
Agency Name | Total Population of Jurisdiction | Total Misdemeanor Arrests (MACR 2016) |
---|---|---|
Alameda Police Department | 79654 | 894.0 |
Alhambra Police Department | 86475 | 642.0 |
Anaheim Police Department | 354891 | 5803.0 |
Antioch Police Department | 112630 | 2235.0 |
Bakersfield Police Department | 379741 | 17643.0 |
5. Compute the frequency of misdemeanor arrests
Here we compute the percentage of the population in a jurisdiction that is arrested for a misdemeanor. This is given by

% Misdemeanor Arrests = 100 × (Total Misdemeanor Arrests / Total Population of Jurisdiction)

For example, if Exampleton Police Department arrested 100 people for misdemeanors and has a population of 10,000, then % Misdemeanor Arrests = 100/10,000 = 1%.
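The Exampleton arithmetic as a tiny helper function (just a sketch to check the formula, not part of the scorecard code):

```python
def percent_misdemeanor_arrests(arrests, population):
    """Percentage of the jurisdiction's population arrested for misdemeanors."""
    return 100 * arrests / population

# Exampleton: 100 arrests out of a population of 10,000 is 1.0 (%).
print(percent_misdemeanor_arrests(100, 10_000))
```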
## First we extract the column containing the populations.
population = misdemeanor_data['Total Population of Jurisdiction']
population.head(5)
Alameda Police Department | 79654 |
---|---|
Alhambra Police Department | 86475 |
Anaheim Police Department | 354891 |
Antioch Police Department | 112630 |
Bakersfield Police Department | 379741 |
## Then the column containing the misdemeanor arrests.
misdemeanor_arrests = misdemeanor_data['Total Misdemeanor Arrests (MACR 2016)']
misdemeanor_arrests.head(5)
Alameda Police Department | 894.0 |
---|---|
Alhambra Police Department | 642.0 |
Anaheim Police Department | 5803.0 |
Antioch Police Department | 2235.0 |
Bakersfield Police Department | 17643.0 |
## Now we add a new column to misdemeanor_data and set it equal to
## misdemeanor_arrests/population as in the equation on the start of this
## section. We multiply this by 100, to convert the result into a percentage.
misdemeanor_data['% Misdemeanor Arrests'] = 100*misdemeanor_arrests / population
misdemeanor_data.head(5)
Agency Name | Total Population of Jurisdiction | Total Misdemeanor Arrests (MACR 2016) | % Misdemeanor Arrests |
---|---|---|---|
Alameda Police Department | 79654 | 894.0 | 1.1223541818364426 |
Alhambra Police Department | 86475 | 642.0 | 0.7424111014744146 |
Anaheim Police Department | 354891 | 5803.0 | 1.6351499474486533 |
Antioch Police Department | 112630 | 2235.0 | 1.9843736127141969 |
Bakersfield Police Department | 379741 | 17643.0 | 4.646061394476762 |
The % Misdemeanor Arrests column shows how many out of every 100 people were arrested for misdemeanors in that jurisdiction.
6. Computing Percentiles
Now we want to rank the police departments based on the percentage of misdemeanor arrests using percentiles. If a department gets a score of 95, that means it's better than 95% of all CA police departments.
To compute this, you just have to order the departments from best (smallest % misdemeanor arrests) to worst (largest % misdemeanor arrests).
Suppose there are 50 departments in total. The first department on the list is the best and gets a percentile score of 100. The second on the list is the same or better than 49/50 = 98% of the other departments and gets a score of 98. The next on the list gets 48/50 = 96% and so on...
NOTE: You have to modify this if you have departments with the same % misdemeanor arrests (they can't occupy the same position on the list, but they should get the same score). Exact ties are unlikely here, but the idea is the same in any case.
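One tie-aware variant (a sketch using pandas' `rank(method='min')`, not the loop used below) gives tied departments the same, best-available score:

```python
import pandas as pd

# Made-up arrest percentages; the two middle departments are tied.
pct = pd.Series({'A PD': 0.5, 'B PD': 1.0, 'C PD': 1.0, 'D PD': 2.0})

n = len(pct)
# method='min' gives tied values the same (best) rank: 1, 2, 2, 4.
ranks = pct.rank(method='min')

# Same arithmetic as the loop below: rank 1 scores 100, rank 2 scores (1 - 1/n)*100, ...
percentiles = (1 - (ranks - 1) / n) * 100
print(percentiles)
```

Without ties this reproduces the loop's scores exactly; with ties, both tied departments get the score of the better position.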
## First reorder the departments from best to worst
misdemeanor_data = misdemeanor_data.sort_values('% Misdemeanor Arrests')
misdemeanor_data.head(5)
Agency Name | Total Population of Jurisdiction | Total Misdemeanor Arrests (MACR 2016) | % Misdemeanor Arrests |
---|---|---|---|
Monterey County Sheriff's Department | 99533 | 439.0 | 0.4410597490279606 |
San Jose Police Department | 1055778 | 5741.0 | 0.5437696182341364 |
Richmond Police Department | 111493 | 661.0 | 0.5928623321643511 |
Elk Grove Police Department | 169153 | 1031.0 | 0.60950736906824 |
Marin County Sheriff's Department | 101302 | 637.0 | 0.6288128566069772 |
## Now we create an empty column that will later be filled with the percentile
## scores.
## the numpy nan function fills the column with NaN = 'not a number'
misdemeanor_data['Misdemeanor Percentile'] = np.nan
misdemeanor_data.head(5)
Agency Name | Total Population of Jurisdiction | Total Misdemeanor Arrests (MACR 2016) | % Misdemeanor Arrests | Misdemeanor Percentile |
---|---|---|---|---|
Monterey County Sheriff's Department | 99533 | 439.0 | 0.4410597490279606 | |
San Jose Police Department | 1055778 | 5741.0 | 0.5437696182341364 | |
Richmond Police Department | 111493 | 661.0 | 0.5928623321643511 | |
Elk Grove Police Department | 169153 | 1031.0 | 0.60950736906824 | |
Marin County Sheriff's Department | 101302 | 637.0 | 0.6288128566069772 | |
## We define a new variable which gives the total number of police departments.
number_of_departments = len(misdemeanor_data.index)
## print the value
print('Total number of police departments:', number_of_departments)
## The following for-loop is a simple way to do the procedure explained at the
## beginning of the section. For each department it computes the percentage
## of departments that are lower (including itself) on the list.
row_index = 0 # This is the starting position on the list. We increase it by 1
# each time the loop runs to keep track of where we are on the list.
for agency_name in misdemeanor_data.index: ## misdemeanor_data.index is the list
# of Agency Names ordered from best to worst. The for-loop goes over every item
# in the list. We called these items agency_name, but you could name them whatever
# you want.
# the code belonging to the for-loop is indented
agency_percentile = (1 - row_index/number_of_departments)*100 # This calculates
# the percentile. Work out the first few on paper to convince yourself.
misdemeanor_data.at[agency_name, 'Misdemeanor Percentile'] = agency_percentile
# This inserts the value into our table
row_index = row_index + 1
# We add 1 to the row_index to go to the next position on the list.
# row_index = 0 on the first iteration, then 1,2,3, etc. all the way up to
# number_of_departments - 1.
# Once the for-loop is over, we unindent and show the results
misdemeanor_data
NOTE: The data for the San Francisco Sheriff's Department is incomplete, which led to it being sorted at the bottom of the list. Its score doesn't mean anything.
NOTE: There are fancier and faster ways than using a for-loop, but this is the most basic.
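One such faster alternative replaces the loop with a single numpy expression, assuming the dataframe is already sorted from best to worst. A sketch on toy data (the values are made up):

```python
import numpy as np
import pandas as pd

# Toy stand-in for misdemeanor_data, already sorted best to worst.
toy = pd.DataFrame({'% Misdemeanor Arrests': [0.4, 0.6, 1.1, 2.0]},
                   index=['A PD', 'B PD', 'C PD', 'D PD'])

n = len(toy)
# Same arithmetic as the loop: row 0 scores 100, row 1 scores (1 - 1/n)*100, ...
toy['Misdemeanor Percentile'] = (1 - np.arange(n) / n) * 100
print(toy['Misdemeanor Percentile'].tolist())
```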
CHALLENGE:
Do the same procedure with the Total Deadly Force Incidents 2016-2018 (All Firearms Discharges + Other Force Resulting in Death or Serious Injury) (URSUS Dataset) and Total Arrests, 2016-2018 columns of the police_scorecard_data that we imported at the beginning of the notebook.
Here we want to compute the percentage of deadly force incidents per arrest for each department and rank them accordingly.
NOTE: You might have to reimport the data by rerunning the cells at the beginning of the notebook.
Filter and index:
Agency Name
Total Arrests, 2016-2018
Total Deadly Force Incidents 2016-2018 (All Firearms Discharges + Other Force Resulting in Death or Serious Injury) (URSUS Dataset)
deadly_force = police_scorecard_data.filter(['Agency Name', 'Total Arrests, 2016-2018', 'Total Deadly Force Incidents 2016-2018 (All Firearms Discharges + Other Force Resulting in Death or Serious Injury) (URSUS Dataset)'], axis=1)
deadly_force.set_index('Agency Name', inplace=True)
deadly_force.rename(columns={'Total Arrests, 2016-2018': 'Total Arrests', 'Total Deadly Force Incidents 2016-2018 (All Firearms Discharges + Other Force Resulting in Death or Serious Injury) (URSUS Dataset)': 'Total Deadly Force'}, inplace=True)
deadly_force.head(5)
Agency Name | Total Arrests | Total Deadly Force |
---|---|---|
Alameda Police Department | 3581 | 1 |
Alhambra Police Department | 3911 | 3 |
Anaheim Police Department | 28200 | 17 |
Antioch Police Department | 10115 | 10 |
Bakersfield Police Department | 51337 | 67 |
Calculate what percentage of arrests involved deadly force, sort by this rate, and drop the San Francisco County Sheriff's Department because its data is incomplete.
deadly_force['Percent Deadly'] = deadly_force['Total Deadly Force']/deadly_force['Total Arrests'] * 100
deadly_force.sort_values('Percent Deadly', inplace=True)
deadly_force.drop(index='San Francisco County Sheriff\'s Department', inplace=True)
deadly_force
## rank(pct=True) gives a 0-1 score where the *worst* department gets 1.0,
## the reverse of the 0-100 convention from section 6.
deadly_force['Percent Deadly Percentile'] = deadly_force['Percent Deadly'].rank(pct=True, method='min')
deadly_force.tail(5)
Agency Name | Total Arrests | Total Deadly Force | Percent Deadly | Percent Deadly Percentile |
---|---|---|---|---|
Riverside County Sheriff's Department | 12084 | 36 | 0.29791459781529295 | 0.9745222929936306 |
Modoc County Sheriff's Department | 636 | 2 | 0.3144654088050315 | 0.9808917197452229 |
Orange County Sheriff's Department | 10833 | 42 | 0.3877042370534478 | 0.9872611464968153 |
Tehama County Sheriff's Department | 2293 | 9 | 0.39249890972525076 | 0.9936305732484076 |
Richmond Police Department | 4864 | 23 | 0.47286184210526316 | 1.0 |
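A small sketch of how `rank(pct=True, method='min')` behaves on toy values (unlike section 6's scores, the worst entry gets 1.0, not 0):

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.2, 0.9])
# method='min' gives ranks 1, 2, 2, 4; pct=True divides them by the number of rows.
print(s.rank(pct=True, method='min').tolist())
```

To match the earlier 0-100 best-is-100 convention, you would have to flip and rescale these values.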
SUPER CHALLENGE:
Do it with a dataset of your choice.