Washington Post Homicide Clearance Rate Data

These are my study notes on "Friday follow-up: Washington Post Homicides Database".

Preparing the data:

library(tidyverse)
library(ggrepel)
library(awtools)

# Case-level homicide data published by The Washington Post
homicides <- read_csv('https://raw.githubusercontent.com/washingtonpost/data-homicides/master/homicide-data.csv')

# Recode victims as white / minority (every value other than 'White' counts
# as minority) and flag whether the case was closed by arrest
homicides <- homicides %>%
  mutate(race = ifelse(victim_race == 'White', 'white', 'minority'),
         arrested = ifelse(disposition == 'Closed by arrest', 1, 0),
         open = ifelse(disposition != 'Closed by arrest', 1, 0))

# Citywide arrest rate, in percent
cities <- homicides %>%
  group_by(city) %>%
  summarise(city.rate = sum(arrested) / sum(arrested, open) * 100)

# Arrest rate per city and race, with the citywide rate joined on
city.homicides <- homicides %>%
  group_by(city, race) %>%
  summarise(rate = sum(arrested) / sum(arrested, open) * 100) %>%
  left_join(., cities)

# The ten cities with the largest absolute white-minority gap
library(reshape2)
hs <- city.homicides[, 1:3] %>%
  dcast(city ~ race, value.var = 'rate') %>%
  mutate(diff = abs(white - minority)) %>%
  left_join(cities) %>%
  arrange(-diff) %>%
  slice(1:10) %>%
  left_join(city.homicides) %>%
  select(c('city', 'diff', 'city.rate', 'rate'))
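
To make the arithmetic behind these summaries concrete, here is a minimal sketch on a six-row toy data frame. The rows and the city names `A` and `B` are invented for illustration and are not taken from the Washington Post data; the names `toy`, `clearance` and `wide` are likewise hypothetical. It uses base R only, and it relies on the fact that, when no disposition is missing, `sum(arrested) / sum(arrested, open)` equals the mean of `arrested`.

```r
# Toy data: invented rows, NOT from the Washington Post database.
toy <- data.frame(
  city        = c('A', 'A', 'A', 'A', 'B', 'B'),
  victim_race = c('White', 'White', 'Black', 'Hispanic', 'White', 'Black'),
  disposition = c('Closed by arrest', 'Closed by arrest',
                  'Open/No arrest', 'Closed by arrest',
                  'Closed by arrest', 'Closed by arrest'),
  stringsAsFactors = FALSE
)

# Same recoding as in the post: white vs. minority, arrest vs. no arrest.
toy$race     <- ifelse(toy$victim_race == 'White', 'white', 'minority')
toy$arrested <- ifelse(toy$disposition == 'Closed by arrest', 1, 0)

# Arrest rate (%) per city and race: share of cases closed by arrest.
clearance <- aggregate(arrested ~ city + race, data = toy,
                       FUN = function(x) mean(x) * 100)
names(clearance)[3] <- 'rate'

# One row per city, then the absolute white-minority gap in percentage points.
wide <- reshape(clearance, idvar = 'city', timevar = 'race',
                direction = 'wide')
wide$diff <- abs(wide$rate.white - wide$rate.minority)
print(wide)
```

In this toy example city `A` has a white rate of 100% and a minority rate of 50%, so its gap is 50 points, while city `B` has no gap. The `diff` column plays the same role as `diff` in `hs` above.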

Plotting:

# One line per city connecting its white and minority rates;
# the ten cities in hs are drawn darker, larger and labeled
ggplot(city.homicides,
       aes(x = rate,
           y = city.rate,
           color = race,
           group = city)) +
  geom_line(color = ifelse(
    city.homicides$city %in% hs$city,
    '#444444', '#dedede')) +
  geom_point(alpha = ifelse(
               city.homicides$city %in% hs$city,
               0.75, 0.25),
             size = ifelse(
               city.homicides$city %in% hs$city,
               2, 1)) +
  geom_label_repel(
    data = hs,
    aes(x = rate,
        y = city.rate,
        label = paste0(city, ' ', round(abs(diff)), '%')),
    nudge_x = -1.5,
    color = '#444444',
    family = 'STSongti-SC-Bold',
    size = 3) +
  hrbrthemes::theme_ipsum(
    base_family = 'STSongti-SC-Bold'
  ) +
  a_main_color("Race") +
  theme(legend.position = 'top') +
  labs(x = 'Clearance rate by race',
       y = 'Citywide clearance rate',
       title = 'Clearance rate and racial gap',
       subtitle = 'The ten labeled cities are those with the largest racial gap')
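
The plot repeats the same `%in%` test three times to decide which cities to emphasize. As a possible simplification, the test could be computed once and stored as columns. The sketch below shows the idea on hypothetical stand-ins: `city_rates` plays the role of `city.homicides` and `top_cities` the role of `hs$city`, and all values are made up.

```r
# Hypothetical stand-ins for city.homicides and hs$city; values are invented.
city_rates <- data.frame(
  city = c('A', 'A', 'B', 'B', 'C', 'C'),
  race = rep(c('minority', 'white'), times = 3),
  rate = c(50, 100, 60, 62, 40, 70),
  stringsAsFactors = FALSE
)
top_cities <- c('A', 'C')

# Compute the highlight flag once, then derive each visual setting from it.
city_rates$highlight   <- city_rates$city %in% top_cities
city_rates$line_color  <- ifelse(city_rates$highlight, '#444444', '#dedede')
city_rates$point_alpha <- ifelse(city_rates$highlight, 0.75, 0.25)
city_rates$point_size  <- ifelse(city_rates$highlight, 2, 1)

print(city_rates)
```

Each derived column could then be passed where the plot currently calls `ifelse()`, for example `geom_point(alpha = city_rates$point_alpha, size = city_rates$point_size)`, which keeps the highlight rule in one place.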

# R
