
Fatal Police Shootings EDA

08 Jul 2018 - Tags: eda and visualization

The Washington Post maintains a database of fatal police shootings in the US since 2015. We’ll take a quick look at the data, and then analyze whether shootings are biased by state or race.

Let’s load the packages we’ll need, load the dataset, and take a look at it.

# Packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import poisson 
import urllib.request, json 

# Plot settings
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
sns.set()
PK = pd.read_csv('fatal-police-shootings-data.csv', 
                 encoding='latin1')
PK.sample(n=10)
|      | id   | name                 | date       | manner_of_death  | armed          | age  | gender | race | city        | state | signs_of_mental_illness | threat_level | flee        | body_camera |
|------|------|----------------------|------------|------------------|----------------|------|--------|------|-------------|-------|-------------------------|--------------|-------------|-------------|
| 2687 | 2984 | Jeffory Hughes       | 2017-09-26 | shot             | gun            | 50.0 | M      | W    | Fairview    | TN    | False                   | attack       | Not fleeing | False       |
| 426  | 542  | Raymond Phillips     | 2015-06-11 | shot             | gun            | 86.0 | M      | NaN  | Columbia    | TN    | True                    | attack       | Not fleeing | False       |
| 3155 | 3496 | Justin Lee Dietrich  | 2018-03-12 | shot             | vehicle        | 32.0 | M      | W    | West Fargo  | ND    | False                   | attack       | Car         | False       |
| 1589 | 1776 | Jesse James Romero   | 2016-08-09 | shot             | gun            | 14.0 | M      | H    | Los Angeles | CA    | False                   | attack       | Foot        | True        |
| 343  | 439  | Shaun Johnson        | 2015-05-08 | shot             | sword          | 35.0 | M      | W    | Kearny      | AZ    | False                   | attack       | Not fleeing | False       |
| 3015 | 3346 | Humberto Vera-Munoz  | 2018-01-24 | shot             | unknown weapon | 27.0 | M      | H    | Sparks      | NV    | False                   | other        | Foot        | False       |
| 1822 | 2028 | Kenton Kobza         | 2016-11-08 | shot and Tasered | unarmed        | 19.0 | M      | W    | Hays        | KS    | False                   | attack       | Car         | False       |
| 3150 | 3499 | Juan Garcia Alvarez  | 2018-03-11 | shot             | gun            | 57.0 | M      | H    | Midland     | TX    | False                   | attack       | Not fleeing | False       |
| 1160 | 1323 | Earl Hicks           | 2016-02-28 | shot             | gun            | 44.0 | M      | W    | Rapid City  | SD    | True                    | attack       | Not fleeing | False       |
| 964  | 1108 | Guadalupe Quiroz     | 2015-12-21 | shot             | gun            | 34.0 | M      | H    | Hemet       | CA    | False                   | attack       | Not fleeing | False       |

Let’s look at the distribution of these killings across the US. We’ll use Bokeh to plot the number of killings in each city as a circle, where the area of the circle corresponds to the number of people shot by police in that city. We’ll use OpenStreetMap Nominatim to geocode the locations of each city.

# Import bokeh components
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, Circle, HoverTool
from bokeh.tile_providers import CARTODBPOSITRON_RETINA
output_notebook()

# Functions to convert lat and lon to Mercator coords
def lat_to_mercY(lat):
    """Convert Latitude to Mercator Y"""
    return np.log(np.tan(np.pi/4+np.radians(lat)/2))*6378137.0

def lon_to_mercX(lon):
    """Convert Longitude to Mercator X"""
    return np.radians(lon) * 6378137.0
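
# Sanity check: invert the Y projection and make sure a latitude
# round-trips (mercY_to_lat is an illustrative helper, not used below)
def mercY_to_lat(y):
    """Convert Mercator Y back to Latitude"""
    return np.degrees(2*np.arctan(np.exp(y/6378137.0)) - np.pi/2)

assert np.isclose(mercY_to_lat(lat_to_mercY(40.7)), 40.7)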

# Count shootings per city
PK['citystate'] = PK.city.astype(str)+' '+PK.state.astype(str)
PKc = PK.citystate.value_counts().to_frame()

# Look up latitude + longitude of cities w/ OSM geocoding
cities = [c.replace(' ', '%20') for c in PKc.index.tolist()]
Surl = 'http://nominatim.openstreetmap.org/search/'
Furl = '%20USA?format=json'
X = []
Y = []
for c in cities:
    try:
        with urllib.request.urlopen(Surl+c+Furl) as url:
            data = json.loads(url.read().decode())
        X.append(float(data[0]['lon']))
        Y.append(float(data[0]['lat']))
    except (IndexError, UnicodeEncodeError): #couldn't geocode city
        X.append(np.nan)
        Y.append(np.nan)

# Data source table for Bokeh
source = ColumnDataSource(data=dict(
    X = lon_to_mercX(X),
    Y = lat_to_mercY(Y),
    size = 7*np.sqrt(PKc.citystate),
    num_shot = PKc.citystate,
    city = PKc.index.tolist()
))

# Tooltips (on hover)
tooltips=[('People shot', "@num_shot"), 
          ('City', "@city")]

# Plot the points
p = figure(plot_width=800, plot_height=500, 
           x_range=(-14500000, -7300000), 
           y_range=(2500000, 7000000),
           title='Police Shootings by City', 
           active_scroll="wheel_zoom", tooltips=tooltips)
p.add_tile(CARTODBPOSITRON_RETINA) #set background map
p.circle('X', 'Y', source=source, size='size', #plot each city
         fill_alpha=0.5, line_color=None, color='crimson')
p.axis.visible = False
show(p)

Full screen map

The Post only records killings in which the individual was shot by police. Therefore, all the people killed by police in the dataset were shot, but around 6% were also tasered.

# Plot the manner of death
ax = sns.countplot(x='manner_of_death', data=PK)
plt.title('Manner of death')
for p in ax.patches:
    x = p.get_bbox().get_points()[:,0]
    y = p.get_bbox().get_points()[1,1]
    ax.annotate('{:.2g}%'.format(100.*y/len(PK)), 
                (x.mean(), y), ha='center', va='bottom')
plt.show()

[Figure: Manner of death]
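
As a quick check, we can compute those proportions directly from the manner_of_death column:

# Fraction of killings by manner of death
PK.manner_of_death.value_counts(normalize=True)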

The dataset also contains information about what type of weapon (if any) the individual killed was using. The most common weapon by far was a gun, followed by a knife or a vehicle. Still, an alarmingly high proportion of individuals shot by police were unarmed or “armed” with a toy weapon.

# Plot the 20 most common weapons used by individuals shot
plt.figure(figsize=(4.7, 5))
ax = sns.countplot(y='armed', data=PK,
                   order = PK.armed.value_counts().iloc[:20].index)
plt.title('Weapon used by person shot')
plt.show()

[Figure: Weapon used by person shot]

Most of the individuals shot and killed by police were between the ages of 20 and 40, with very few older than 80 or younger than 16. There are, however, 17 children under the age of 16 in the dataset.

sns.distplot(PK.age.dropna())
plt.title('Age of individuals shot by police')
plt.show()

[Figure: Age of individuals shot by police]
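
That count of children can be verified with a one-liner (NaN ages compare as False, so they’re excluded):

# Number of individuals younger than 16
(PK.age < 16).sum()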

PK[PK.age<16]
|      | id   | name                 | date       | manner_of_death  | armed      | age  | gender | race | city          | state | signs_of_mental_illness | threat_level | flee        | body_camera | citystate        |
|------|------|----------------------|------------|------------------|------------|------|--------|------|---------------|-------|-------------------------|--------------|-------------|-------------|------------------|
| 435  | 551  | Zane Terryn          | 2015-06-14 | shot             | gun        | 15.0 | M      | W    | Cocoa         | FL    | True                    | attack       | Not fleeing | False       | Cocoa FL         |
| 596  | 715  | Andre Green          | 2015-08-09 | shot             | vehicle    | 15.0 | M      | B    | Indianapolis  | IN    | False                   | attack       | Not fleeing | False       | Indianapolis IN  |
| 776  | 916  | Jorge Santiago Tapia | 2015-10-15 | shot             | gun        | 15.0 | M      | H    | Miami         | FL    | False                   | attack       | Car         | False       | Miami FL         |
| 833  | 980  | Jeremy Mardis        | 2015-11-03 | shot             | unarmed    | 6.0  | M      | W    | Marksville    | LA    | False                   | other        | Car         | True        | Marksville LA    |
| 1018 | 1165 | Ciara Meyer          | 2016-01-11 | shot             | unarmed    | 12.0 | F      | W    | Penn Township | PA    | False                   | other        | Not fleeing | False       | Penn Township PA |
| 1298 | 1477 | Jorevis Scruggs      | 2016-04-19 | shot             | gun        | 15.0 | M      | B    | St. Louis     | MO    | False                   | attack       | Other       | False       | St. Louis MO     |
| 1589 | 1776 | Jesse James Romero   | 2016-08-09 | shot             | gun        | 14.0 | M      | H    | Los Angeles   | CA    | False                   | attack       | Foot        | True        | Los Angeles CA   |
| 1682 | 1883 | Tyre King            | 2016-09-14 | shot             | toy weapon | 13.0 | M      | B    | Columbus      | OH    | False                   | other        | Foot        | False       | Columbus OH      |
| 1849 | 2054 | Luke Smith           | 2016-11-19 | shot and Tasered | knife      | 15.0 | M      | A    | Watsonville   | CA    | False                   | other        | Foot        | True        | Watsonville CA   |
| 2271 | 2714 | Daezion Turner       | 2017-04-24 | shot             | gun        | 15.0 | M      | B    | Killeen       | TX    | False                   | attack       | Not fleeing | False       | Killeen TX       |
| 2286 | 2538 | Jordan Edwards       | 2017-04-29 | shot             | unarmed    | 15.0 | M      | B    | Balch Springs | TX    | False                   | undetermined | Not fleeing | True        | Balch Springs TX |
| 2302 | 2557 | Jacob Peterson       | 2017-05-06 | shot             | toy weapon | 15.0 | M      | W    | San Diego     | CA    | True                    | attack       | Not fleeing | True        | San Diego CA     |
| 2555 | 3060 | Kemonte Cobbs        | 2017-08-03 | shot             | gun        | 15.0 | M      | B    | Gary          | IN    | False                   | attack       | Foot        | False       | Gary IN          |
| 2657 | 2950 | Ruben Urbina         | 2017-09-15 | shot             | crowbar    | 15.0 | M      | H    | Haymarket     | VA    | False                   | other        | Not fleeing | False       | Haymarket VA     |
| 2807 | 3117 | Jason Ike Pero       | 2017-11-08 | shot             | knife      | 14.0 | M      | N    | Odanah        | WI    | False                   | other        | Not fleeing | False       | Odanah WI        |
| 2845 | 3154 | Seth William Johnson | 2017-11-20 | shot             | knife      | 15.0 | M      | W    | Gulfport      | MS    | False                   | undetermined | Not fleeing | False       | Gulfport MS      |
| 2910 | 3229 | Kameron Prescott     | 2017-12-21 | shot             | unarmed    | 6.0  | M      | NaN  | Schertz       | TX    | False                   | other        | Not fleeing | False       | Schertz TX       |

Most of the individuals shot by police were male.

# Plot police killings by gender
ax = sns.countplot(x = "gender", data = PK,
                   order = PK.gender.value_counts().index)
for p in ax.patches:
    x = p.get_bbox().get_points()[:,0]
    y = p.get_bbox().get_points()[1,1]
    ax.annotate('{:.2g}%'.format(100.*y/len(PK)), 
                (x.mean(), y), ha='center', va='bottom')
plt.title('Police Killings by Gender')
plt.show()

[Figure: Police Killings by Gender]

According to the US census, the population of the United States is 50.8% female and 49.2% male. How does this match up to the distribution of police killings? That is, if police were killing people completely at random, what would the distribution look like?

First let’s define a function which will return a dataframe with the actual percentages and also the population percentages for any per-group count.

def ActualVsPopulation(df, pop, group):
    """Get actual per-group % vs population group % """
    d = {group: [], 'type': [], 'percent': []}
    tot_pop = float(sum(pop.values()))
    for g in df[group].dropna().unique(): #for each group

        # Actual percentages
        d[group].append(g)
        d['type'].append('Killings')
        d['percent'].append(100
                            * df[df[group]==g].id.count()
                            / df.id.count())

        # Percentages if statistic followed pop distribution
        d[group].append(g)
        d['type'].append('Population') #based on pop %age
        d['percent'].append(100*pop[g]/tot_pop)
        
    return pd.DataFrame(data=d)

Now we can see the actual killings distribution versus the population distribution:

# Plot percent police killings by gender vs population percentages
# Data from: https://www.census.gov/quickfacts/fact/table/US
pop_g = {'M': 49.2, 'F': 50.8} #percent population by gender 
df = ActualVsPopulation(PK, pop_g, 'gender')
sns.barplot(x="gender", y="percent", hue="type", 
            data=df, palette=["r", "C0"])
plt.title('Actual Police Killings vs Population Distribution' +
          ' (by Gender)')
plt.show()

[Figure: Actual Police Killings vs Population Distribution (by Gender)]

Next we’ll plot the number of police killings by race.

# Plot police killings by race
ax = sns.countplot(x="race", data=PK,
                   order = PK.race.value_counts().index)
for p in ax.patches:
    x = p.get_bbox().get_points()[:,0]
    y = p.get_bbox().get_points()[1,1]
    ax.annotate('{:.2g}%'.format(100.*y/len(PK)), 
                (x.mean(), y), ha='center', va='bottom')
plt.title('Police Killings by Race')
plt.show()

[Figure: Police Killings by Race]

How does this match up to the population distribution?

# Plot percent police killings by race vs population percentages
# Data from: https://www.census.gov/quickfacts/fact/table/US
pop_r = {'W': 60.7, # white - % population by race 
         'B': 13.4, # black or african american
         'H': 18.1, # hispanic or latino
         'A': 5.8,  # asian
         'N': 1.5,  # american indian, alaska native, native hawaiian, and other pacific islander
         'O': 0.5}  # other
df = ActualVsPopulation(PK, pop_r, 'race')
sns.barplot(x="race", y="percent", hue="type", data=df,
            order=PK.race.value_counts().index,
            palette=["r", "C0"])
plt.title('Actual Police Killings vs Population Distribution' +
          ' (by Race)')
plt.show()

[Figure: Actual Police Killings vs Population Distribution (by Race)]

As you can see above, Black and Other are the only two racial categories with more police killings than would be expected if police killings were random. And that effect isn’t small: there are nearly twice as many police killings of Black individuals as would be expected if these killings were random.
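
One way to put a single number on that over- or under-representation is the ratio of each race’s share of killings to its share of the population, where 1 means exactly proportional. A minimal sketch, reusing the pop_r percentages above (note that value_counts excludes records with missing race):

# Killing share / population share, per race (1.0 = proportional)
kill_share = 100 * PK.race.value_counts(normalize=True)
ratio = (kill_share / pd.Series(pop_r)).sort_values(ascending=False)
print(ratio)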

Let’s plot the number of police killings by state:

plt.figure(figsize=(6, 13))
sns.countplot(y="state", 
              data=PK,
              order=PK.state.value_counts().index)
plt.title('Police Killings By State')
plt.show()

[Figure: Police Killings By State]

And how does this match up to the population distribution of the states?

# Population for each state 
# https://www.census.gov/data/datasets/2017/demo/popest/state-total.html
pop_s ={'AL': 4874747,
        'AK': 739795,
        'AZ': 7016270,
        'AR': 3004279,
        'CA': 39536653,
        'CO': 5607154,
        'CT': 3588184,
        'DE': 961939,
        'DC': 693972,
        'FL': 20984400,
        'GA': 10429379,
        'HI': 1427538,
        'ID': 1716943,
        'IL': 12802023,
        'IN': 6666818,
        'IA': 3145711,
        'KS': 2913123,
        'KY': 4454189,
        'LA': 4684333,
        'ME': 1335907,
        'MD': 6052177,
        'MA': 6859819,
        'MI': 9962311,
        'MN': 5576606,
        'MS': 2984100,
        'MO': 6113532,
        'MT': 1050493,
        'NE': 1920076,
        'NV': 2998039,
        'NH': 1342795,
        'NJ': 9005644,
        'NM': 2088070,
        'NY': 19849399,
        'NC': 10273419,
        'ND': 755393,
        'OH': 11658609,
        'OK': 3930864,
        'OR': 4142776,
        'PA': 12805537,
        'RI': 1059639,
        'SC': 5024369,
        'SD': 869666,
        'TN': 6715984,
        'TX': 28304596,
        'UT': 3101833,
        'VT': 623657,
        'VA': 8470020,
        'WA': 7405743,
        'WV': 1815857,
        'WI': 5795483,
        'WY': 579315}

# Plot percent police killings by state vs population percentages
df = ActualVsPopulation(PK, pop_s, 'state')
plt.figure(figsize=(6, 13))
sns.barplot(y="state", x="percent", hue="type", data=df,
            order=PK.state.value_counts().index,
            palette=["r", "C0"])
plt.title('Actual Police Killings vs Population Distribution' +
          ' (by State)')
plt.show()

[Figure: Actual Police Killings vs Population Distribution (by State)]

Instead of plotting both the actual killings against the population distributions, we can compute a single number - a score which tells us how likely it is to see a number of killings (for a state or race) if killings were random, given the population distribution. We’ll use a Poisson distribution to model the likelihood of observing the number of killings given the number of people in a certain group (state or race).

A Poisson distribution models the likelihood of observing some number of events (\( k \) events) within a set time window, given the constant (or expected) rate at which they are thought to occur (\( \lambda \) events per time window). We want to know the likelihood of the number of police killings of people from a given state or of a given race, so we’ll set the \( \lambda \) parameter to the number of individuals in a certain population (state or race) we would expect to be killed by police given the national average, and evaluate the probability at \( k = \) the number of people actually killed by police in that population. If police killings don’t depend on state or race, then the number of people killed by police in that population (\( k \)) should be close to \( \lambda \). The further \( k \) is from \( \lambda \), the less likely it is that police killings are unbiased in that population.

But we don’t just want to find the likelihood that police killings are biased in different populations; we also want to know the direction of the bias. That is, are there more or fewer police killings of people in a certain population than would be expected if killings were random? Therefore, we’ll use the cumulative probability of the Poisson distribution, which tells us the likelihood of observing a given number of killings or fewer. That way, if the value of the cumulative Poisson distribution is close to 0, there were fewer killings of a certain population than would be expected if killings were random. If the value is close to 1, there were more killings than would be expected. And if the value is around 0.5, police shootings of people in that population were relatively unbiased.

The cumulative probability of the Poisson distribution can be computed by:

\[p(X \leq k) = e^{-\lambda} \sum_{i=0}^{\left \lfloor{k}\right \rfloor} \frac{\lambda^i}{i !}\]

where again \( k \) is the number of observed killings and \( \lambda \) is the expected number of killings if killings were random (given the population distribution).
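
As a sanity check, evaluating that sum by hand matches scipy’s poisson.cdf (here with the toy values \( \lambda = 10 \) and \( k = 5 \)):

from math import exp, factorial

lam, k = 10, 5
manual = exp(-lam) * sum(lam**i / factorial(i) for i in range(k + 1))
print(manual, poisson.cdf(k, lam))  # both ~0.067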

If there are \( N_p \) individuals in a population, \( N_k \) people shot and killed by police in the USA, and \( N_{USA} \) people in the USA, we would expect to see, on average,

\[\lambda = N_p \frac{N_k}{N_{USA}}\]

people in that population shot by police if shootings were random. Then the Poisson and cumulative Poisson distributions tell us how likely it is, if killings were random, to observe the actual number of people killed in that population:

lam = 10 #lambda, the "expected" number
k1 = 5   #k, the actual number, when less than expected
k2 = 15  #k, the actual number, when more than expected

xx = np.arange(0,30)
ppd = poisson.pmf(xx, lam) #poisson probability distribution
cppd = poisson.cdf(xx, lam) #cumulative poisson distribution

p_k1 = poisson.pmf(k1, lam) #probability of observing k1
p_k2 = poisson.pmf(k2, lam) #and k2
c_k1 = poisson.cdf(k1, lam) #cumulative probability of observing k1
c_k2 = poisson.cdf(k2, lam) #and k2

plt.figure()
plt.plot(xx,ppd, 'o')
plt.plot([k1, k1], [0, p_k1], 'g--')
plt.plot([0, k1], [p_k1, p_k1], 'g--')
plt.plot([k2, k2], [0, p_k2], 'r--')
plt.plot([0, k2], [p_k2, p_k2], 'r--')
plt.title(r'Poisson Distribution, $\lambda=10$')
plt.xlabel('k')
plt.ylabel('Probability')
plt.xlim([0, 30])
plt.ylim([0, 0.14])
plt.show()

plt.figure()
plt.plot(xx,cppd, 'o')
plt.plot([k1, k1], [0, c_k1], 'g--')
plt.plot([0, k1], [c_k1, c_k1], 'g--')
plt.plot([k2, k2], [0, c_k2], 'r--')
plt.plot([0, k2], [c_k2, c_k2], 'r--')
plt.title(r'Poisson Cumulative Distribution, $\lambda=10$')
plt.xlabel('k')
plt.ylabel('Cumulative Probability')
plt.xlim([0, 30])
plt.ylim([0, 1])
plt.show()

[Figure: Poisson Distribution, λ=10]

[Figure: Poisson Cumulative Distribution, λ=10]

We can see that when there are fewer killings in a population than would be expected if killings were random, the cumulative probability approaches 0, while when there are more than expected, it approaches 1.

Let’s plot that metric for each state, to see which states have fewer police killings than the national average (\( p(X \leq k) \lessapprox 0.5 \)) and which states have more (\( p(X \leq k) \gtrapprox 0.5 \)). I’ve anchored the bars at 0.5 to make this distinction clearer.

# National average rate of police shootings per person
natl_rate = PK.shape[0]/sum(pop_s.values()) 

# Compute the cumulative Poisson distribution value for each state
cum_poisson_vals = [] #to store cumulative poisson values
for state in pop_s.keys():
    lam = natl_rate*pop_s[state] #shootings if matched avg (lambda)
    k = PK[PK.state==state].shape[0] #actual number of shootings
    cum_poisson_vals.append(poisson.cdf(k, lam))

# Plot cumulative Poisson distribution values
plt.figure(figsize=(6, 13))
EK = pd.DataFrame(data={'state': list(pop_s.keys()), 
                        'cum_poisson': cum_poisson_vals})
EK.sort_values('cum_poisson', inplace=True)
y = np.arange(len(EK.state))
plt.barh(y, EK.cum_poisson-0.5, left=0.5)
ax = plt.gca()
ax.set_yticks(y)
ax.set_yticklabels(EK.state)
ax.invert_yaxis()
plt.plot([0.5, 0.5], [y[0]-1, y[-1]+1], 'k', linewidth=2)
plt.title('Police Shootings: Cumulative Probability\n' +
          'given national average, by State')
plt.ylabel('State')
plt.xlabel('Cumulative Poisson Probability')
plt.text(0.505, y[0]-1.5, 
         (r'Fewer shootings $\leftarrow$ ' + 
          r'$\rightarrow$ More shootings'),
         horizontalalignment='center', color='gray')
plt.show()

[Figure: Police Shootings: Cumulative Probability given national average, by State]

So, Arizona has the highest frequency of police shootings given its population, while New York has the lowest.
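
We can read those extremes off numerically from the sorted EK frame (assuming it’s still in scope from the cell above):

# States with the fewest and most shootings relative to population
print(EK.head(3))  # lowest cumulative Poisson probability
print(EK.tail(3))  # highest cumulative Poisson probability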

We can plot the same statistic per race to see how racially biased police shootings are.

# Compute the cumulative Poisson distribution value for each race
cum_poisson_vals = []
for race in pop_r.keys():
    lam = PK.shape[0]*pop_r[race]/100 #shootings if matched avg
    k = PK[PK.race==race].shape[0] #actual number of shootings
    cum_poisson_vals.append(poisson.cdf(k, lam))

# Plot cumulative Poisson distribution values
plt.figure()
EK = pd.DataFrame(data={'race': list(pop_r.keys()), 
                        'cum_poisson': cum_poisson_vals})
EK.sort_values('cum_poisson', inplace=True)
y = np.arange(len(EK.race))
plt.barh(y, EK.cum_poisson-0.5, left=0.5)
plt.plot([0.5, 0.5], [y[0]-1, y[-1]+1], 'k', linewidth=2)
ax = plt.gca()
ax.set_yticks(y)
ax.set_yticklabels(EK.race)
ax.invert_yaxis()
plt.title('Police Shootings: Cumulative Probability\n' +
          'given national average, by Race')
plt.ylabel('Race')
plt.xlabel('Cumulative Poisson Probability')
plt.text(0.506, y[0]-0.75, 
         (r'Fewer shootings $\leftarrow$ ' + 
          r'$\rightarrow$ More shootings'),
         horizontalalignment='center', color='gray')
plt.show()

[Figure: Police Shootings: Cumulative Probability given national average, by Race]

This shows that police shootings are extremely racially biased: there is a near-zero probability of observing so few shootings of White and Asian individuals if shootings were random, and a near-zero probability of observing so many shootings of Black and Other individuals if shootings were random.
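
The raw values behind that plot make the same point (again assuming the per-race EK frame is still in scope):

# Cumulative Poisson probability per race
print(EK.round(6))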

To examine the dates of the shootings, we’ll have to convert the dates, which were loaded as generic objects, into datetimes.

# Convert date from object to datetime
PK.date = pd.to_datetime(PK.date)

And now we can see how the shooting frequency has changed over time:

# Plot shootings by month
plt.figure(figsize=(5,8))
order = sorted(PK.date.dt.strftime('%Y %m %B').unique())
sns.countplot(y = PK.date.dt.strftime('%Y %m %B'), 
              order = order)
plt.title('Police Killings By Month since start of dataset')
plt.show()

[Figure: Police Killings By Month since start of dataset]

Unfortunately, it doesn’t look like there has been a decrease in the frequency of shootings. (Note that the data used for this analysis is current as of early July 2018, so the final month shown is incomplete.)

We can also look at the number of killings as a function of the day of the week.

# Plot shootings by day of week
dow_map={0:'M', 1:'T', 2:'W', 3:'Th', 4:'F', 5:'Sa', 6:'Su'}
sns.countplot(x=PK.date.dt.dayofweek.map(dow_map), 
              order=dow_map.values())
plt.title('Police Killings By Day of Week')
plt.show()

[Figure: Police Killings By Day of Week]

Unfortunately the dataset does not contain the times during the day of the shootings, so we can’t analyze that.

Around a quarter of individuals shot and killed by police were experiencing a mental-health crisis, according to the Washington Post (the source of the database).

# Plot how many individuals showed signs of mental illness
ax = sns.countplot(x='signs_of_mental_illness', data=PK)
for p in ax.patches:
    x = p.get_bbox().get_points()[:,0]
    y = p.get_bbox().get_points()[1,1]
    ax.annotate('{:.2g}%'.format(100.*y/len(PK)), 
                (x.mean(), y), ha='center', va='bottom')
plt.title('Individuals showing signs of mental illness\n'+
          'when shot by police')
plt.show()

[Figure: Individuals showing signs of mental illness when shot by police]

Most individuals were not fleeing when they were shot by police.

# Plot how many individuals were fleeing when shot
ax = sns.countplot(x='flee', data=PK)
for p in ax.patches:
    x = p.get_bbox().get_points()[:,0]
    y = p.get_bbox().get_points()[1,1]
    ax.annotate('{:.2g}%'.format(100.*y/len(PK)), 
                (x.mean(), y), ha='center', va='bottom')
plt.title('Method of fleeing\nwhen shot by police')
plt.show()

[Figure: Method of fleeing when shot by police]

According to the Post, “news reports have indicated an officer was wearing a body camera and it may have recorded some portion of the incident” during around 10% of the shootings.

# Plot how many shootings have body camera footage
ax = sns.countplot(x='body_camera', data=PK)
for p in ax.patches:
    x = p.get_bbox().get_points()[:,0]
    y = p.get_bbox().get_points()[1,1]
    ax.annotate('{:.2g}%'.format(100.*y/len(PK)), 
                (x.mean(), y), ha='center', va='bottom')
plt.title('Whether an officer was wearing a body camera'+
          '\nat the time of the shooting')
plt.show()

[Figure: Whether an officer was wearing a body camera at the time of the shooting]

The percentage of police shootings in which the officer was wearing a body camera varied widely between states: in some states, officers were not wearing body cameras during any of the recorded shootings, while others had much higher rates of body-camera usage. Keep in mind, however, that for states with fewer shootings we have a noisier estimate of actual body-camera usage.

# Count the proportion of shootings w/ body camera by state
pcPK = PK.groupby('state').agg({'body_camera': 'mean'})

# Plot percent shootings w/ body camera by state
plt.figure(figsize=(6, 13))
order = pcPK.body_camera.sort_values(ascending=False).index
sns.barplot(y = pcPK.index, 
            x = 100.*pcPK.values.flatten(),
            order = order)
plt.title('Percent of Police Killings\n' + 
          'with body camera, by State')
plt.xlabel('Percent of Shootings with body camera')
plt.xlim([0, 100])
plt.show()

[Figure: Percent of Police Killings with body camera, by State]
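
Since those percentages come from very different sample sizes, it helps to look at the raw shooting counts alongside the body-camera proportions; a minimal sketch using the same groupby:

# Body-camera rate and number of shootings per state
bc = PK.groupby('state').body_camera.agg(['mean', 'count'])
bc['percent'] = 100 * bc['mean']
print(bc.sort_values('percent', ascending=False).head(10))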