美国传染病数据可视化

美国传染病数据可视化

本文是A look into U.S. infectious diseases的学习笔记,对美国的几种疾病的死亡率进行了可视化。

按照作者的介绍,他的想法来源于More Americans Are Dying From Suicide, Drug Use And Diarrhea,例如该文中使用的一张图:

美国下呼吸道感染死亡率分布。

首先作者使用自己整理的数据绘制了一幅直方图,这幅图展示了六种疾病死亡率变化最大(1980——2014)的几个县的数据:

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
library(tidyverse)
# devtools::install_git("https://github.com/hrbrmstr/albersusa.git")
# albersusa是创建出色地图投影的绝佳方案。
library(albersusa)
library(RColorBrewer)
# devtools::install_git("https://github.com/awhstin/awtools.git")
# awtools包是作者自己的包,里面有一些很有趣的主题和函数。
library(awtools)

# download.file('https://raw.githubusercontent.com/awhstin/Dataset-List/master/diseases.csv', 'diseases.csv')
diseases <- read.csv('diseases.csv', stringsAsFactors = FALSE)

# 这个数据库是每个县的每种疾病每年有一个观测值。下面我们想要找到哪些州在这段时间内的变化最大。

state_diseases <- diseases %>%
select(c(2, 4:6)) %>%
mutate(state = gsub(".*, ", "", Location))

top_states <- state_diseases %>%
group_by(Disease, state) %>%
summarise(mean.change = mean(Percent.Change)) %>%
arrange(desc(mean.change)) %>%
group_by(Disease) %>%
slice(1:5) %>%
ungroup()

# 绘图
top_states %>%
ggplot(aes(x = state,
y = mean.change,
fill = Disease)) +
geom_bar(stat = 'identity',
show.legend = F) +
facet_wrap(~Disease, ncol = 2,
scales = 'free') +
hrbrthemes::theme_ipsum(base_family = 'STSongti-SC-Bold') +
theme(
axis.text.x = element_text(angle = 90,
hjust = 1)
) +
labs(caption = "数据来源:IHME全球健康数据\nhttp://ghdx.healthdata.org/us-data")

下面我们想绘制每个县每种疾病的感染率走势图,然后高亮出变化最大的那个县:

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
top_diseases <- diseases %>%
arrange(desc(abs(Percent.Change))) %>%
group_by(Disease) %>%
slice(1:8)

top_diseases %>%
ggplot(aes(x = year,
y = rate,
color = Disease)) +
geom_line(data = diseases,
aes(group = Location),
colour = "grey",
alpha = 0.2) +
geom_line(show.legend = F) +
geom_label(data = subset(top_diseases, year == 1995),
aes(
label = gsub(',', '\n', Location),
x = year,
y = rate * 1.5,
group = Disease),
show.legend = F,
family = 'STSongti-SC-Light',
size = 3,
nudge_y = 1) +
facet_wrap(~Disease,
ncol = 2,
scales = 'free_y') +
a_primary_color() +
labs(title = '美国感染病',
subtitle = '十万人死亡率,1980——2014按县汇总,高亮的线为变化最大的县。',
caption = '数据来源:IHME全球健康数据\nhttp://ghdx.healthdata.org/us-data') +
hrbrthemes::theme_ipsum(base_family = 'STSongti-SC-Bold')

绘制地图:

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
change_disease <- diseases %>%
select(c(2, 3, 4, 5)) %>%
distinct()

cty_sf <- counties_sf('aeqd')
cty_sf$Location <- paste0(cty_sf$name,
' ',
cty_sf$lsad,
', ',
cty_sf$state)
cty_disease <- left_join(cty_sf, change_disease,
by = c('Location'))

us <- usa_sf('aeqd')
us_map <- fortify(us, region = "name")

# 下呼吸道感染分布
subset(cty_disease, Disease == 'Lower respiratory infections') %>%
ggplot(aes(
fill = Percent.Change,
color = Percent.Change
)) +
geom_sf() +
scale_fill_distiller(palette = 'Spectral',
na.value = 'white') +
scale_color_distiller(palette = 'Spectral',
na.value = 'white') +
geom_sf(data = us_map, color = 'white',
size = 0.25, fill = NA) +
hrbrthemes::theme_ipsum(base_family = 'STSongti-SC-Bold') +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank()) +
labs(title = '下呼吸道感染死亡率分布',
subtitle = '1980——2014十万人中的死亡率变化',
caption = '数据来源:IHME全球健康数据\nhttp://ghdx.healthdata.org/us-data')

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
# 结核病分布
subset(cty_disease, Disease == 'Tuberculosis') %>%
ggplot(aes(
fill = Percent.Change,
color = Percent.Change
)) +
geom_sf() +
scale_fill_distiller(palette = 'Spectral',
na.value = 'white') +
scale_color_distiller(palette = 'Spectral',
na.value = 'white') +
geom_sf(data = us_map, color = 'white',
size = 0.25, fill = NA) +
hrbrthemes::theme_ipsum(base_family = 'STSongti-SC-Bold') +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank()) +
labs(title = '结核病感染死亡率分布',
subtitle = '1980——2014十万人中的死亡率变化',
caption = '数据来源:IHME全球健康数据\nhttp://ghdx.healthdata.org/us-data')

# R

评论

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×