How Dangerous Are Zombies?

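These are my notes from working through zonination/zombies. One improvement over the original: the original author's code for grouped summaries is far too cumbersome, so I compute them directly with an SQL query instead.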

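Background

According to the original author, the analysis was inspired by the chart below:

It really is an interesting chart. To read the original author's post, follow this link: Which Medium Has the Deadliest Zombies: TV, Movies, or Video Games?
The author also paired it with a suitably thrilling picture: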

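Let's get started! The author collected survey ratings for the zombies of 13 works (films, TV series and video games). The data file is:
zombies.csv

The 13 works are:

  1. The Walking Dead
  2. World War Z
  3. Shaun of the Dead
  4. Dawn of the Dead
  5. Game of Thrones
  6. Zombieland
  7. Firefly
  8. Left 4 Dead
  9. Army of Darkness
  10. Resident Evil
  11. Call of Duty
  12. I Am Legend
  13. 28 Days Later

As a fan of zombie films and shows, I have seen most of these. I watched the latest episode of The Walking Dead only yesterday.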

Tidying the data

First, tidy up the data:

```r
# The data holds people's ratings of the intelligence, speed and strength
# of the zombies in several works, each on a 1-7 scale.
library(ggplot2)
library(gridExtra)
library(RColorBrewer)
library(cowplot)

# stringsAsFactors = FALSE keeps the answers as text; with factors (the
# default before R 4.0) the recoding and as.numeric() below would break.
zombies <- read.csv("zombies.csv", stringsAsFactors = FALSE)

# The two ends of the scale are stored as labels: recode them to numbers
zombies[zombies == "1: Least"] <- 1
zombies[zombies == "7: Greatest"] <- 7
# Convert every column to numeric (column 1 is not used below;
# blank answers become NA)
zombies <- as.matrix(sapply(zombies, as.numeric))

# One row per work, in the same order as the rating columns in the CSV:
# title, medium, and the cause of the zombies
works <- data.frame(
  what  = c("The Walking Dead", "World War Z", "Shaun of the Dead",
            "Dawn of the Dead", "Game of Thrones", "Zombieland", "Firefly",
            "Left 4 Dead", "Army of Darkness", "Resident Evil",
            "Call of Duty", "I Am Legend", "28 Days Later"),
  media = c("TV", "Movie", "Movie", "Movie", "TV", "Movie", "TV",
            "Game", "Movie", "Game", "Game", "Movie", "Movie"),
  kind  = c("Infection", "Infection", "Unknown", "Unknown", "Possession",
            "Infection", "Psychosis", "Infection", "Infection", "Infection",
            "Unknown", "Infection", "Infection"),
  stringsAsFactors = FALSE
)

# Each work occupies three consecutive columns (intel, speed, strength),
# starting at column 2, so work i uses columns 3*i - 1 to 3*i + 1
pieces <- lapply(seq_len(nrow(works)), function(i) {
  df <- data.frame(zombies[, (3 * i - 1):(3 * i + 1)])
  names(df) <- c("intel", "speed", "strength")
  df$what  <- works$what[i]
  df$media <- works$media[i]
  df$kind  <- works$kind[i]
  df
})

# Stack the 13 per-work data frames into one data set
alldata <- do.call(rbind, pieces)
```
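The tidying step does two jobs: it turns the text answers into numbers, and it reshapes the wide table (one row per respondent, three rating columns per work) into a long one with one row per respondent and work. Below is a minimal sketch of both steps on a made-up two-respondent, two-work table. The column names (`a_intel`, `b_speed`, ...) and the helper `to_long()` are hypothetical, and, as in the code above, column 1 is assumed to hold something other than ratings, so work i occupies columns 3*i - 1 to 3*i + 1.

```r
# Toy wide table (made-up values, NOT the survey data): a non-rating
# first column, then two works with three rating columns each
wide <- data.frame(
  id         = c("r1", "r2"),
  a_intel    = c("1: Least", "3"),
  a_speed    = c("2", "7: Greatest"),
  a_strength = c("4", "5"),
  b_intel    = c("6", ""),
  b_speed    = c("1: Least", "2"),
  b_strength = c("3", "7: Greatest"),
  stringsAsFactors = FALSE
)

# Recode the two labelled answers, then convert every column to numeric.
# Blank answers and the id column become NA; suppressWarnings() hides
# the coercion warnings this produces.
wide[wide == "1: Least"] <- 1
wide[wide == "7: Greatest"] <- 7
m <- suppressWarnings(sapply(wide, as.numeric))

# Hypothetical helper: work i sits in columns 3*i - 1 to 3*i + 1
to_long <- function(m, what) {
  do.call(rbind, lapply(seq_along(what), function(i) {
    df <- data.frame(m[, (3 * i - 1):(3 * i + 1), drop = FALSE])
    names(df) <- c("intel", "speed", "strength")
    df$what <- what[i]
    df
  }))
}

long <- to_long(m, c("A", "B"))
print(long)
```

Building the pieces with `lapply()` and stacking them once with `do.call(rbind, ...)` avoids writing a separate block for every work.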

Plotting all the data

```r
# Every rating (jittered to reduce overplotting) with a linear trend line
p1 <- ggplot(alldata, aes(strength + speed, intel)) +
  geom_jitter(size = 4, alpha = 0.7, colour = "steelblue") +
  stat_smooth(method = "lm", formula = y ~ x, colour = "darkorange", se = FALSE) +
  labs(title = "Zombie speed, strength and intelligence\n",
       x = "Speed + strength",
       y = "Intelligence") +
  scale_x_continuous(limits = c(2, 14)) +
  scale_y_continuous(limits = c(1, 7)) +
  # STSong / STSongti-SC-Bold are fonts available on macOS; swap in your own
  theme_bw(base_family = "STSong", base_size = 18) +
  theme(
    plot.title = element_text(hjust = 0.1, family = "STSongti-SC-Bold"),
    plot.margin = grid::unit(c(1, 0.8, 1.8, 0.8), "cm")
  )

# Add the source note and the blog logo below the plot
# (draw_image() needs the magick package)
ggdraw(p1) +
  draw_label("Source: https://github.com/zonination/zombies",
             x = 0.8, y = 0.05, fontfamily = "STSong", size = 14) +
  draw_image("https://www.czxa.top/images/default28.png",
             x = 0.55, y = 0.02, width = 0.06, height = 0.06)
```

It seems most viewers do not find the zombies of TV series, films and games all that formidable.

Grouping by source work

The second chart colours each rating by the work its zombies come from:

```r
# Same scatter, coloured by the work each rating belongs to
p2 <- ggplot(alldata, aes(strength + speed, intel)) +
  geom_jitter(aes(colour = what), size = 4, alpha = 0.7) +
  stat_smooth(method = "lm", formula = y ~ x, colour = "darkorange", se = FALSE) +
  labs(title = "Zombie speed, strength and intelligence\n",
       x = "Speed + strength",
       y = "Intelligence",
       colour = "Source") +
  scale_x_continuous(limits = c(2, 14)) +
  scale_y_continuous(limits = c(1, 7)) +
  # 13 works need 13 colours: 10 from "Paired" plus 3 from "Set3"
  scale_colour_manual(values = c(brewer.pal(10, "Paired"), brewer.pal(3, "Set3"))) +
  theme_bw(base_family = "STSong", base_size = 18) +
  theme(
    plot.title = element_text(hjust = 0.1, family = "STSongti-SC-Bold"),
    plot.margin = grid::unit(c(1, 0.8, 1.8, 0.8), "cm")
  )

ggdraw(p2) +
  draw_label("Source: https://github.com/zonination/zombies",
             x = 0.8, y = 0.05, fontfamily = "STSong", size = 14) +
  draw_image("https://www.czxa.top/images/default28.png",
             x = 0.55, y = 0.02, width = 0.06, height = 0.06)
```

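Grouping by cause

There are several theories about how zombies come about. The most popular today is infection by some pathogen, as in The Walking Dead and Resident Evil.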

```r
# One panel per zombie cause, with 2D density contours over the points
p3 <- ggplot(alldata, aes(strength + speed, intel)) +
  geom_jitter(size = 4, alpha = 0.7, colour = "steelblue") +
  geom_density2d(alpha = 0.5) +
  facet_wrap(~kind, ncol = 2) +
  labs(title = "How people rate each type of zombie\n",
       x = "Speed + strength",
       y = "Intelligence") +
  scale_x_continuous(limits = c(2, 14)) +
  scale_y_continuous(limits = c(1, 7)) +
  theme_bw(base_family = "STSong", base_size = 18) +
  theme(
    plot.title = element_text(hjust = 0.1, family = "STSongti-SC-Bold"),
    plot.margin = grid::unit(c(1, 0.8, 1.8, 0.8), "cm")
  )

ggdraw(p3) +
  draw_label("Source: https://github.com/zonination/zombies",
             x = 0.8, y = 0.05, fontfamily = "STSong", size = 14) +
  draw_image("https://www.czxa.top/images/default28.png",
             x = 0.55, y = 0.02, width = 0.06, height = 0.06)
```

The contour lines are a two-dimensional kernel density estimate, which shows where respondents' ratings are concentrated.
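To see what the contours measure, the sketch below computes a 2D kernel density estimate directly with `MASS::kde2d()`; the ggplot2 documentation describes `geom_density_2d()` as running `MASS::kde2d()` and drawing contours of the result. The ten points are made-up toy ratings, not the survey data, and the grid limits simply mirror the plots' axis limits.

```r
library(MASS)  # provides kde2d(); MASS ships with standard R installations

# Toy ratings (made up, NOT the survey): x = speed + strength, y = intelligence
x <- c(4, 5, 5, 6, 6, 6, 7, 8, 10, 12)
y <- c(1, 2, 1, 2, 3, 2, 2, 3, 4, 5)

# Gaussian kernel density estimate on a 50 x 50 grid covering the plots'
# ranges (x in [2, 14], y in [1, 7]); bandwidths default to the normal
# reference rule, MASS::bandwidth.nrd(), in each direction
dens <- kde2d(x, y, n = 50, lims = c(2, 14, 1, 7))

# Rows of dens$z follow the x grid, columns the y grid.
# The grid cell with the highest density is where opinions cluster most.
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
peak_xy <- c(x = dens$x[peak[1]], y = dens$y[peak[2]])
print(peak_xy)
```

Passing `dens$x`, `dens$y` and `dens$z` to base R's `contour()` draws the same kind of lines that `geom_density2d()` overlays on each panel.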

How viewers rate zombies in each medium

```r
# One panel per medium (TV, movie, game), with 2D density contours
p4 <- ggplot(alldata, aes(strength + speed, intel)) +
  geom_jitter(size = 4, alpha = 0.7, colour = "steelblue") +
  geom_density2d(alpha = 0.5) +
  facet_wrap(~media, ncol = 2) +
  labs(title = "How people rate zombies in each medium\n",
       x = "Speed + strength",
       y = "Intelligence") +
  scale_x_continuous(limits = c(2, 14)) +
  scale_y_continuous(limits = c(1, 7)) +
  theme_bw(base_family = "STSong", base_size = 18) +
  theme(
    plot.title = element_text(hjust = 0.1, family = "STSongti-SC-Bold"),
    plot.margin = grid::unit(c(1, 0.8, 1.8, 0.8), "cm")
  )

# Three media leave the fourth facet slot empty; a local image
# (zombie-circle.jpg) is placed there
ggdraw(p4) +
  draw_label("Source: https://github.com/zonination/zombies",
             x = 0.8, y = 0.05, fontfamily = "STSong", size = 14) +
  draw_image("https://www.czxa.top/images/default28.png",
             x = 0.55, y = 0.02, width = 0.06, height = 0.06) +
  draw_image("zombie-circle.jpg",
             x = 0.54, y = 0.115, width = 0.42, height = 0.42)
```

Computing the means

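Compared with the original author's code, the SQL query is far more concise. It does not drop missing values explicitly; note, though, that SQL's avg() skips NULL values, and sqldf should pass R's NA values to SQLite as NULL.

```r
library(sqldf)
library(ggrepel)
# One row per work: mean speed, strength and intelligence.
# media and kind are constant within a work, so selecting them is safe.
avgdf <- sqldf("
  select avg(speed) as speed, avg(strength) as strength, avg(intel) as intel,
         what, media as Medium, kind as Type
  from alldata
  group by what
")
```

My averages come out somewhat lower than the original author's. I put this down to the missing values, but since avg() already skips NULLs, the difference may have another cause.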
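How a mean treats missing answers is easy to check in base R. The sketch below uses a made-up toy table (not the survey data): `aggregate()` with `na.rm = TRUE` skips NAs column by column, much as SQL's `avg()` skips NULLs, while counting missing answers as 0 drags the means down. Running the first `aggregate()` call on `alldata` (columns `speed`, `strength`, `intel`, grouped by `what`) would be one way to cross-check the `sqldf` result.

```r
# Toy data (made up for illustration): work "A" has one missing speed
# rating and one missing intel rating
toy <- data.frame(
  what  = c("A", "A", "A", "B", "B"),
  speed = c(2, NA, 4, 6, 8),
  intel = c(1, 3, NA, 5, 7),
  stringsAsFactors = FALSE
)

# Grouped means that skip NAs separately in each column
skip_na <- aggregate(toy[c("speed", "intel")], by = toy["what"],
                     FUN = mean, na.rm = TRUE)

# For contrast: treating every missing answer as 0 lowers the means
zeroed <- toy
zeroed[is.na(zeroed)] <- 0
as_zero <- aggregate(zeroed[c("speed", "intel")], by = zeroed["what"],
                     FUN = mean)

print(skip_na)
print(as_zero)
```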

```r
# One point per work: x = mean speed + mean strength, y = mean intelligence
power <- avgdf$speed + avgdf$strength

p <- ggplot(avgdf, aes(x = strength + speed, y = intel)) +
  # size is a constant, so it belongs outside aes()
  geom_point(aes(colour = what), size = 7) +
  geom_text_repel(aes(label = what), family = "STSong",
                  box.padding = grid::unit(0.75, "lines")) +
  # Three axis labels, at the minimum, midpoint and maximum of each axis
  scale_x_continuous(
    limits = range(power),
    breaks = c(min(power), mean(range(power)), max(power)),
    labels = c("Pathetic", "Strong / fast", "Extremely aggressive")
  ) +
  scale_y_continuous(
    limits = range(avgdf$intel),
    breaks = c(min(avgdf$intel), mean(range(avgdf$intel)), max(avgdf$intel)),
    labels = c("Mindless / brain-dead", "Somewhat intelligent", "Highly intelligent")
  ) +
  scale_colour_manual(values = c(brewer.pal(10, "Paired"), brewer.pal(3, "Set3"))) +
  labs(title = "How dangerous are zombies?\n") +
  theme_bw(base_family = "STSong", base_size = 18) +
  theme(
    plot.title = element_text(hjust = 0.1, family = "STSongti-SC-Bold"),
    plot.margin = grid::unit(c(1, 0.8, 1.8, 0.8), "cm"),
    legend.position = "none",
    axis.title = element_blank()
  )

ggdraw(p) +
  draw_label("Source: https://github.com/zonination/zombies",
             x = 0.8, y = 0.05, fontfamily = "STSong", size = 14) +
  draw_image("https://www.czxa.top/images/default28.png",
             x = 0.55, y = 0.02, width = 0.06, height = 0.06)
```

程振兴 @czxa.top