Dataism: CA Police Scorecard

This notebook was authored by Alexandre Puttick and modified by me (see section 8). The original is on GitHub.

Set up a few requirements before beginning.

from IPython.display import HTML 
!pip install xlrd

1. Police Brutality

As I write this (May 30th), fires are burning in the streets across the United States. Protesters are smashing windows and clashing with the police. They chant "Hands up! Don't shoot" and "I can't breathe."

Protests against police brutality erupted after a video went viral in which an officer is seen using his knee to pin a black man, George Floyd, to the ground by the neck. Floyd says he can't breathe, says "please" and "mama." Then he is quiet.

The officer, Derek Chauvin, was fired and has been charged with third-degree murder. According to an affidavit released by the prosecutors, Floyd had already been arrested and placed in a squad car when Chauvin arrived at the scene. He pulled George Floyd out of the car, placed his knee onto his neck and held him down on the ground. After five minutes Floyd became unresponsive. Another officer couldn't find a pulse. Chauvin continued to hold his knee down on George Floyd's neck for another three minutes.

Meanwhile the governor of Minnesota has not ruled out the possibility of bringing in the U.S. military to bring protests under control.

The act of violence committed against an unarmed black civilian by a police officer in this case was particularly egregious and the protests have escalated commensurately, but a version of this story has played out over and over again, and will continue to if nothing changes.

Background

There were 1,099 killings by police in 2019. This is close to the average over the last 10 years. For comparison, the total number of on-record killings by law enforcement officers in Germany since 1952 is about half of that.

Excessive police brutality affects members of all racial groups, but victims are disproportionately black. The ratio is skewed even more amongst victims who were unarmed when killed.

Similar racial bias exists in cases of non-lethal excessive force and arrests for drug possession and misdemeanors.

So far the adoption of body cams and attempts to diversify the police force have led to very little change in officer behavior and the degree to which police are held accountable.

2. The Project

See https://policescorecard.org/.

Campaign Zero created a metric for scoring California police departments based on three categories:

  • Police violence

  • Police accountability

  • Approach to policing

A separate score was created for each category, with the overall grade given by the average of the three scores.

Project goal

“The scorecard is designed to help communities, researchers, police leaders and policy-makers take informed action to reduce police use of force and improve accountability and public safety in their jurisdictions.”

-CA Police Scorecard Website

Here's a video of Sam Sinyangwe, the head data scientist on the project, talking about using technology to help end police violence:

Methods

An explanation of their methods can be found here.

A Jupyter notebook of their methods for measuring racial bias in arrests and deadly force can be found here: https://github.com/campaignzero/ca-police-scorecard. We explain a part of their methods for measuring racial bias in a later section of this notebook.

Getting involved

Of course you can always donate, but you can go deeper by signing up here, where you can request to join a workgroup in

  • Policy research & advocacy

  • Data collection/ Analysis

  • Design/Develop Platforms

  • Elections/Political campaigns

Or suggest your own idea.

3. The Data

All of the data used to compute scores (along with extra data for further study) is contained in this spreadsheet.

Where did the data come from?

  • Deadly force, civilian complaints and arrests, from official databases:

    • CA DOJ's Openjustice

    • FBI's uniform crime reporting statistics (UCR)

    • CA monthly arrests and citation register

  • Police use of force, use of force complaints, police policy manuals:

    • directly from police agencies via public records requests

Racial bias data

For this course we will focus on the part of the score coming from racial bias exhibited by police in the use of deadly force.

You can find the data here.

The bias data also includes data relating to racial bias in arrests for drug possession.

4. Metrics

Let's take a moment to explore the concept of a metric, which, in this course, has the following meaning:

  • metric: A numerical quantity to measure the magnitude of a (qualitative) concept.

The police scorecard is a metric that evaluates police departments based on the aforementioned categories. Some other prominent examples:

  • GDP has become the overarching measure of economic and social progress.

  • University rankings purportedly measure the "quality of the education" that students receive.

  • Credit scores measure the "creditworthiness" of individuals.

CAUTION !!!

Even when defined with the best intentions and care, essentially every metric (including the Police Scorecard) is subject to several pitfalls:

Oversimplification

It is impossible for a metric to capture the full complexity of a situation. At best, the choice of which factors to include and exclude is extensively studied and justified. At worst, some old white dudes in a room decided on a whim.

It is an immense problem that, as a society, we tend to view numerical measurements as precise and objective and don't question the choice of which factors are included or excluded in a metric. Here's economist Simon Kuznets on GDP:

"The valuable capacity of the human mind to simplify a complex situation in a compact characterization becomes dangerous when not controlled in terms of definitely stated criteria. With quantitative measurements especially, the definiteness of the result suggests, often misleadingly, a precision and simplicity in the outlines of the object measured. Measurements of national income are subject to this type of illusion and resulting abuse, especially since they deal with matters that are the center of conflict of opposing social groups where the effectiveness of an argument is often contingent upon oversimplification. [...]"

-Simon Kuznets

It's less pretty, but it often makes much more sense not to rank entities along a single one-dimensional line; a lot of information gets lost in the process.

The Police Scorecard does a good job presenting all of the parts that contribute to their metric.

Optimizing a number

Post-WWII, GDP growth became the standard for measuring a country's development and the health of its economy. Now the leading prerogative of most countries is to sustainably maximize it. This can have strange side effects:

  • The world's "strongest" economy has crumbling roads, undrinkable water and generally poor infrastructure.

  • Many value GDP growth over saving lives, preserving nature, increasing access to basic human rights...

  • Volunteer work, household labor, contributions to (creative) commons, etc. apparently add no value to the economy.

Once we've replaced a complex goal with optimizing a number, we might cease to ask ourselves if the metric is in fact in line with our goals.

Instead we are likely to justify our actions by pointing to improvements in an "objective" (not really) number.

Metrics can be gamed

Once a concept has been reduced to a metric which is then widely adopted, the model can be exploited to "win."

Take the college ranking example:

  • Some universities have confessed to paying the fees for students to retake their SATs, and to falsifying SAT scores, acceptance rates, graduation rates, etc.

  • Universities spend huge amounts of money on marketing and change their application procedures so that their acceptance rates go down.

Such tactics have paid off and significantly raised the rank of the university in question. But how much do selection rates and SAT scores have to do with the quality of the education and experience students receive at the institution?

What's excluded often matters!

What if GDP factored in things like environmental health, quality of infrastructure, wealth disparity, household labor, public commons etc.?

Moral judgements are often encoded in what we include/exclude from a model.

Can a single metric be useful?

  • It's easy to compare different values

  • It's easier for humans to understand and tells a good story

  • If well-designed, can be a good way to motivate people/groups and to easily track the effects of new policies

5. Numerical Proxies

In building a metric to measure the combination of police violence, accountability and approach to policing, we need numerical proxies for each category. For us, this means

  • numerical proxy: quantitative data that we can use as a stand-in for the quality we want to factor into the metric.

The proxies determine the data we wish to use. On the other hand, the proxies we choose might depend on what data is obtainable.

Let's cover each of the numerical proxies in the Police Scorecard:

Police violence

The proxies for police violence were:

  • percentage of violent non-lethal force used per arrest

  • percentage of deadly force used per arrest

  • number of unarmed civilians killed or seriously injured

  • racial bias in arrests and deadly force

    • Note: the project leaders invented a separate metric for measuring racial bias.

Police accountability

  • percentage of civilian complaints sustained

  • percentage of discrimination and excessive force complaints sustained

  • percentage of complaints alleging police committed a criminal offense sustained

Approach to policing

  • percentage of misdemeanor arrests per population (as a proxy for "broken windows" or "zero tolerance" policing.)

  • percentage of homicides cleared (as a proxy for effectiveness at solving serious crimes)

Notes/Questions

  • Note that some of the proxies overlap with each other.

  • Why is each proxy a reasonable one to use for each category? (See https://policescorecard.org/about for an explanation of their choices)

  • What potential pitfalls are there in using these numbers?

6. Scoring racial bias in police use of deadly force

Studies show that, relative to population, black men in the U.S. have twice the risk of being killed by a police officer as white men. (See the link for commentary on how to interpret this statistic.)

Campaign Zero's model for racial bias in use of deadly force is based on the following question:

What would deaths by police look like in a given city if the victims had been white?

  • The main idea is that they consider the chances a white person has of dying in an encounter with the police as 'normal' or 'baseline.'

  • Campaign Zero's method attempts to measure racial bias in deadly force by looking at how far from 'normal' the number of black victims is, where 'normal' is the number who would have died had they been white.

NOTE: Their model examines bias against both black and latinx populations. We focus on the black population for simplicity.

NOTE: You could do something very similar to measure other sorts of bias! For example,

  • Gender bias in job hiring (see the sketch after this list)

  • Race bias in dating

You could also do something like defining a baseline level of carbon emissions (say, for companies) and comparing how far different companies' emissions are from that baseline.
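As a toy illustration (not part of Campaign Zero's code, and jumping ahead to the standard-deviation comparison developed later in this section), here is a minimal sketch of the hiring example; the baseline rate, applicant counts and hire counts are all hypothetical.

import numpy as np

def bias_z_score(baseline_rate, n_group, observed):
    # Expected count if the group experienced the baseline rate
    expected = n_group * baseline_rate
    # Spread we would expect from chance alone (normal approximation)
    stdev = np.sqrt(n_group * baseline_rate * (1 - baseline_rate))
    # Distance from the baseline expectation, in standard deviations
    return (observed - expected) / stdev

# Hypothetical numbers: 40% of male applicants were hired (the baseline),
# 200 women applied and 60 were hired.
print(bias_z_score(baseline_rate=0.40, n_group=200, observed=60))  # about -2.9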

General population vs. Arrested individuals

There is a choice when comparing use of deadly force across different racial groups.

  • We can look at the number of victims of deadly force compared to the entire population, or

  • we can look at the number of victims of deadly force compared to the number of arrests.

Campaign Zero chooses the latter and gives a good explanation in their Jupyter notebook:

"Imagine that every interaction with police is a single round of Russian roulette and white people are given two levels of advantage: first, they simply don't have to play the game as often, and when they do, they're given a different revolver with fewer chambers loaded. Both lead to disproportionate use of deadly force against communities of color. The first advantage (not having to interact with the police as much in the first place) is accounted for in the arrest disparity scores, and here we want to examine what happens given that an arrest did occur, and how that changes when a black person is arrested."

Probability of being killed for white residents

The probability of a white person being killed by the police while being arrested is estimated as follows:

$$p = \frac{\textrm{\# white residents killed by department}}{\textrm{\# white residents arrested by department}}$$

We want to calculate this probability separately for each police department. This establishes what is 'normal' for that city, and we then compare that baseline to the number of black victims in the same city. This $p$ might be relatively high in an urban area with lots of poor white people, but low in a rich, white community.

NOTE: The actual scorecard doesn't assume that $p = 0$ for a department that has killed zero white residents. The assumption is that in such cases, too few white residents were arrested to get a good estimate of the probability. It's never impossible that a white resident will be killed by the police. For such departments, the model takes the statewide probability as a baseline instead. Is this a good solution?

Assumption on Distribution of Deaths

Imagine that the above probability $p$ of a white resident being killed by the police in a town called Exampleton is 1%, i.e., one person killed for every 100 arrests.

Suppose that 10,000 black people are arrested in Exampleton. If there is no racial bias, you would expect approximately $0.01 \times 10{,}000 = 100$ black victims of deadly violence. Maybe not exactly 100, though. There's a high chance that the number would be close to 100, and a very low chance that it would be far from 100, say 900 deaths, for example.

If there are indeed 900 black victims of deadly violence in Exampleton (this is a very evil town), our metric for bias should say that the police in Exampleton are very biased.
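To build intuition, here is a minimal simulation (not part of the original notebook) of many hypothetical, unbiased Exampletons, assuming the numbers above: the simulated death counts cluster tightly around 100, and a value like 900 essentially never occurs.

import numpy as np

rng = np.random.default_rng(0)
p = 0.01      # Exampleton's baseline probability of deadly force per arrest
n = 10_000    # number of black arrests in Exampleton

# Each simulated year: n independent arrests, each with a 1% chance of deadly force.
simulated_deaths = rng.binomial(n, p, size=100_000)

print(simulated_deaths.mean())          # close to 100
print((simulated_deaths > 150).mean())  # virtually zero: large deviations are extremely unlikely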

Normal distributions

The Campaign Zero model makes an important assumption:

Assumption: The number of deadly force incidents is normally distributed.

What does this mean?

Normal distributions are best understood visually:

This is also called a Gaussian distribution or a bell curve. It appears all the time in nature when a quantity clusters around some average $\mu$. For example, the heights of individual humans are normally distributed around an average of $\mu \approx 170\,\textrm{cm}$.

The bump in the middle indicates that most people's heights are close to average. Really high or low values have a very low probability of occurring. These are the so-called outliers.

Standard deviations

The width of the curve is determined by the standard deviation $\sigma$, which roughly says how much the data varies around the average. In the above picture you can see that if data follows a normal distribution, then most values (95%) fall within two standard deviations of the average.

Returning to the height example, you might expect $\sigma \approx 10\,\textrm{cm}$, since most people are between $170 - 20 = 150\,\textrm{cm}$ and $170 + 20 = 190\,\textrm{cm}$ tall. (Here we computed $\mu - 2\sigma$ and $\mu + 2\sigma$.)
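As a quick numerical check (a sketch, not from the original notebook), we can sample heights from a normal distribution with the assumed $\mu = 170\,\textrm{cm}$ and $\sigma = 10\,\textrm{cm}$ and confirm that roughly 95% of values land within two standard deviations of the average:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 170, 10                            # assumed average and spread, in cm
heights = rng.normal(mu, sigma, size=100_000)  # 100,000 simulated heights

within_two_sigma = (np.abs(heights - mu) <= 2 * sigma).mean()
print(within_two_sigma)                        # roughly 0.95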

Relation to racial bias score

The bias score in Campaign Zero's model is based on how many standard deviations away from average/normal the number of black victims of deadly police violence is.

  • One standard deviation larger than expected indicates slight bias.

  • Three standard deviations larger indicates extreme bias.

Here is the general description (for an arbitrary police department) of the Exampleton example above:

Consider a department for which a white person has a probability $p$ of being killed by a police officer during an arrest. If $n = \textrm{\# of black arrests by department}$, then we expect approximately $\mu = n \cdot p$ cases of deadly violence against black residents if there is no bias.

Under the assumption that the distribution is normal, the standard deviation is given by

$$\sigma = \sqrt{n p (1 - p)}.$$

If $x = \textrm{\# black victims of deadly violence}$, then the distance $z$ (in standard deviations) between $x$ and $\mu$ is given by

$$z = \frac{x - \mu}{\sigma}.$$

  • This number, called the Z-score of $x$, is the basis for the racial bias score.

NOTE: Here we have a problem! For departments where no white residents were killed, we have $p = 0$. Then $\sigma = 0$ and $z$ is undefined, since you can't divide by zero.

To address this problem, Campaign Zero replaces the probability $p$ for the department with the statewide probability

$$p_{\textrm{state}} = \frac{\textrm{\# white residents killed statewide}}{\textrm{\# white residents arrested statewide}}.$$

Is this reasonable? The problem is that not enough white people were arrested by the department in question to determine the "true probability" that a white person would be killed during arrest by the police. Using the Russian roulette analogy, if the game is only played a few times, it's difficult to determine how many chambers are loaded. Just because no one died doesn't mean there are no loaded chambers. However, using the statewide probability is likely an overestimate: many places with zero white victims of deadly force are probably safer than the state average.

Return to Exampleton

If we go back to Exampleton, we had $p = 1\%$ and $n = 10{,}000$. Now suppose that the Exampleton police are responsible for the deaths of $x = 120$ black victims. This is somewhat higher than the expected number of $\mu = 100$ deaths if the victims were white. Here we have

$$\sigma = \sqrt{n p (1-p)} = \sqrt{10{,}000 \times 0.01 \times 0.99} \approx 10.$$

Then the Z-score is

$$z = \frac{x - \mu}{\sigma} = \frac{120 - 100}{10} = 2.$$

This indicates quite high bias.
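The same arithmetic in a few lines of Python (a sketch using the Exampleton numbers, not Campaign Zero's code):

import numpy as np

p, n, x = 0.01, 10_000, 120       # baseline probability, black arrests, black victims

mu = n * p                        # expected number of victims if unbiased: 100
sigma = np.sqrt(n * p * (1 - p))  # standard deviation: about 9.95
z = (x - mu) / sigma              # about 2.0
print(mu, sigma, z)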

From Z-Score to Percentile

After computing the Z-score for each police department, the departments are ranked from best (lowest Z-score) to worst and assigned a percentile score from 0 to 100. If Exampleton gets a score of 75, then it had a better Z-score than 75% of police departments.
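A minimal sketch of that ranking step, using made-up Z-scores (the notebook's own version appears in section 7 and stays on a 0-1 scale rather than 0-100):

import pandas as pd

# Hypothetical Z-scores for five departments (lower = less biased).
z_scores = pd.Series([2.0, -0.5, 0.1, 1.3, -1.2],
                     index=['Exampleton', 'Dept A', 'Dept B', 'Dept C', 'Dept D'])

# Fraction of departments each one beats (has a lower Z-score than), scaled to 0-100.
percentile = (1.0 - z_scores.rank(pct=True)) * 100
print(percentile.sort_values(ascending=False))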

7. Code for the Racial Bias Score

In this final section, we walk through the code for carrying out the process described in section 6.

### import the python libraries we need
import pandas as pd
import numpy as np
# Allow pandas to display many rows and columns from our data tables
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500) 
# NOTE: It is customary to 'import pandas as pd' and 'import numpy as np,'
# meaning you add 'as pd' and 'as np' to the above lines of code.
# This just means that in the code you can write pd and np instead
# of pandas and numpy.
# %matplotlib inline

a) Loading the racial bias data into the Jupyter notebook:

NOTE: YOU WILL GET ERRORS IF YOU DON'T RUN THE CODE CELLS IN ORDER.

### Point data_url at the spreadsheet. Here it is assumed to be a local copy of
### 'disparity analysis 2016-2018.xlsx'; a URL to the file also works.
data_url = 'disparity analysis 2016-2018.xlsx'
### Then we create a 'bias_data' variable (you can name it what you want)
### and use the pandas library to read the excel file from the given URL. 
### pd.read_excel can also read excel tables saved to a drive.
bias_data = pd.read_excel(data_url) ## The 'bias_data' variable now contains our data table
bias_data.head(5) ## Displays the first five lines of the excel table.

b) Extracting the columns of interest

To analyze racial bias in the use of deadly force against black residents, we only need the following columns from the above data table:

  • White Victims of Deadly Force

  • Black Victims of Deadly Force

  • Black Arrests

  • White Arrests

The following code overwrites 'bias_data' with a new table that isolates the columns we are interested in:

### Filter for the columns we want to analyze
bias_data = bias_data.filter(['Agency Name', 'White Victims of Deadly Force', 'Black Victims of Deadly Force', 'White Arrests', 'Black Arrests'] ,axis=1)
### Display the first five lines of the new data table
bias_data.head(5)
   Agency Name                    White Victims of Deadly Force  Black Victims of Deadly Force  White Arrests       Black Arrests
0  Alameda Police Department      1                              0                              1322.062456627342   989.061762664816
1  Alhambra Police Department     0                              0                              345.54136184580756  279.5143500281373
2  Anaheim Police Department      4                              1                              7464.139344262295   2075.5122950819673
3  Antioch Police Department      3                              5                              3398.303534303534   4260.4968814968815
4  Bakersfield Police Department  21                             10                             20459.632146273267  10191.098962548745

c) Compute the probability $p$ of being killed if white.

If $p = 0$, replace it with the statewide probability

$$p_{\textrm{state}} = \frac{\textrm{\# white residents killed statewide}}{\textrm{\# white residents arrested statewide}}$$

### Compute the baseline probability of being killed by the police if white.
## Extract the column from the table containing the number of white victims of
## deadly force.
white_deadly_force = bias_data['White Victims of Deadly Force']
# Show first five rows
white_deadly_force.head(5)
0     1
1     0
2     4
3     3
4    21
### Do the same for white_arrests
white_arrests = bias_data['White Arrests']
### We divide white_deadly_force by white_arrests to obtain probability,
### as in the above equation.
white_deadly_force_by_arrest = white_deadly_force / white_arrests # This is p.
white_deadly_force_by_arrest.head(5)
0    0.0007563939169341953
1    0.0
2    0.0005358956760466712
3    0.0008827934202219035
4    0.0010264114158975806
## compute p_state
state_deadly_force_per_white_arrest = float(white_deadly_force.sum()) / white_arrests.sum()
## For departments with p = 0, replace p with p_state.
## Do this with np.where(white_deadly_force > 0, p, p_state).
## This checks each department:
## if p > 0, don't replace;
## if p = 0, replace with p_state.
white_deadly_force_by_arrest = np.where(white_deadly_force>0,white_deadly_force_by_arrest,state_deadly_force_per_white_arrest)

d) Compute the expected number of black victims for each department

Expected number of black victims: $\mu = n \cdot p$

## Extract the column from the table containing the number of black arrests
black_arrests = bias_data['Black Arrests'] # This is n
## Compute mu = n*p
black_mean_by_arrest = black_arrests * white_deadly_force_by_arrest # This is mu
black_mean_by_arrest.head(5)
0     0.7481203007518796
1     0.1247282756180061
2     1.112258064516129
3     3.761138613861386
4    10.460260315702023

e) Compute the standard deviation for each department.

# First compute n*p*(1-p)
var = black_arrests * white_deadly_force_by_arrest * (1 - white_deadly_force_by_arrest)
# Then take the square root
black_stdev_by_arrest = np.sqrt(var) # This is the standard deviation sigma

f) Compute the Z-score for each department.

# Get x = number of black victims
black_deadly_force = bias_data['Black Victims of Deadly Force']
## Now that we don't have zeros in the denominators, we can compute
## z = (x - mu) / sigma
z_scores = (black_deadly_force - black_mean_by_arrest) / black_stdev_by_arrest
## Add Z-scores to the data table
bias_data['Z_Scores'] = z_scores
bias_data.head(5)
   Agency Name                    White Victims of Deadly Force  Black Victims of Deadly Force  White Arrests       Black Arrests       Z_Scores
0  Alameda Police Department      1                              0                              1322.062456627342   989.061762664816    -0.8652667812552873
1  Alhambra Police Department     0                              0                              345.54136184580756  279.5143500281373   -0.3532477292578565
2  Anaheim Police Department      4                              1                              7464.139344262295   2075.5122950819673  -0.10647095950785526
3  Antioch Police Department      3                              5                              3398.303534303534   4260.4968814968815   0.6390794133727277
4  Bakersfield Police Department  21                             10                             20459.632146273267  10191.098962548745  -0.1423820631239293

g) Plot the data

import matplotlib.pyplot as plt
# Use the matplotlib library to view the Z-scores
plt.hist(bias_data['Z_Scores'], bins=5)
print('Number of depts.', len(bias_data))
plt.gcf()
[Bias Plot #1: histogram of Z-scores across departments]

Interpretation

Interestingly, about 50 of the 100 departments (the height of the middle bar) have a bias score near zero. About 30 have a negative bias score (which indicates that they are actually biased against whites). About 20 departments seem to exhibit clear bias.

Is this right? Probably not entirely. The reason is likely that we used the statewide probability for departments with no white victims (which was most of them). This likely overestimates the chance that a white person is killed, and therefore overestimates the expected number of black deaths. This makes the real number of black deaths appear less biased.

In their work, Campaign Zero also estimated bias against latinx residents and took the maximum between black bias and latinx bias. This might help correct the problem.

h) Convert to percentile.

To convert Z-scores to 0-100 percentile scores, do the same procedure as in the Week 2 assignment.

A fancier way is this:

## Compute the percentile score (this is on a 0-1 scale; multiply by 100 for 0-100)
bias_data['bias_percentile'] = 1.0 - bias_data.Z_Scores.rank(pct=True)
## Sort from least to most biased
bias_data.sort_values('Z_Scores').head(10)

8. Recompute Without Incomplete Data

NOTE: The actual scorecard doesn't assume that $p = 0$ for a department that has killed zero white residents. The assumption is that in such cases, too few white residents were arrested to get a good estimate of the probability. It's never impossible that a white resident will be killed by the police. Rather than take the statewide average for white victims, I drop the agencies where there are no white victims or no black victims of deadly force.

Drop those agencies, and note that the following table will be missing many of the state's agencies.

nonzero_data = bias_data.filter(['Agency Name', 'White Victims of Deadly Force', 'Black Victims of Deadly Force', 'White Arrests', 'Black Arrests'], axis=1)
nonzero_data = nonzero_data[nonzero_data['White Victims of Deadly Force'] != 0]
nonzero_data = nonzero_data[nonzero_data['Black Victims of Deadly Force'] != 0]
nonzero_data.head(5)
    Agency Name                    White Victims of Deadly Force  Black Victims of Deadly Force  White Arrests       Black Arrests
2   Anaheim Police Department      4                              1                              7464.139344262295   2075.5122950819673
3   Antioch Police Department      3                              5                              3398.303534303534   4260.4968814968815
4   Bakersfield Police Department  21                             10                             20459.632146273267  10191.098962548745
10  Chico Police Department        3                              1                              8795.789628385472   833.337392399748
11  Chino Police Department        1                              1                              2182.0377358490564  623.4393530997305

Previously we were looking at 100 agencies in California. Now we are only looking at 31.

print('Number of depts in total:', len(bias_data), '\nNumber of depts with > 0 black & > 0 white instances of deadly force:', len(nonzero_data))

Calculate the probability: $p = \frac{\textrm{\# white residents killed by department}}{\textrm{\# white residents arrested by department}}$

### We divide white_deadly_force by white_arrests to obtain probability,
### as in the above equation.
probability_white_deadly_force = nonzero_data['White Victims of Deadly Force'] / nonzero_data['White Arrests'] # This is p.
probability_white_deadly_force.head(5)
2     0.0005358956760466712
3     0.0008827934202219035
4     0.0010264114158975806
10    0.0003410722773903667
11    0.0004582872163807416

Calculate the expected number of black victims, $\mu = n \cdot p$, where $n$ is the number of black arrests.

expected_black_victim_rate = nonzero_data['Black Arrests'] * probability_white_deadly_force
expected_black_victim_rate.head(5)
2      1.112258064516129
3      3.761138613861386
4     10.460260315702023
10     0.2842282822603317
11     0.28571428571428575

Compute the standard deviation, $\sigma = \sqrt{n p (1 - p)}$.

# First compute n*p*(1-p)
var = nonzero_data['Black Arrests'] * probability_white_deadly_force * (1- probability_white_deadly_force)
# Then take the square root
black_stdev_by_arrest = np.sqrt(var) # This is the standard deviation sigma
black_stdev_by_arrest.head(3)
2    1.0543538354028532
3    1.9385092998075935
4    3.2325723170720755

Compute the Z-score, $z = \frac{x - \mu}{\sigma}$, where $x$ is the number of black victims of deadly force.

z = (nonzero_data['Black Victims of Deadly Force'] - expected_black_victim_rate) / black_stdev_by_arrest
nonzero_data['Z Score'] = z
nonzero_data.head(5)
    Agency Name                    White Victims of Deadly Force  Black Victims of Deadly Force  White Arrests       Black Arrests       Z Score
2   Anaheim Police Department      4                              1                              7464.139344262295   2075.5122950819673  -0.10647095950785526
3   Antioch Police Department      3                              5                              3398.303534303534   4260.4968814968815   0.6390794133727277
4   Bakersfield Police Department  21                             10                             20459.632146273267  10191.098962548745  -0.1423820631239293
10  Chico Police Department        3                              1                              8795.789628385472   833.337392399748     1.3428112295302388
11  Chino Police Department        1                              1                              2182.0377358490564  623.4393530997305    1.3366125208765693

The resulting plot is not scientific, but it does show bias in a way that we would expect:

# Use the matplotlib library to view the Z-scores
plt.clf()
plt.hist(nonzero_data['Z Score'], bins =5)
print('Number of depts.', len(nonzero_data))
plt.gcf()
[Bias Plot #2: histogram of Z-scores for the remaining departments]