Raincloud plots


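These are my notes on learning raincloud plots (云雨图). To show how data are distributed, we usually reach for a histogram or a kernel density plot, but both convey fairly little: they show only the overall shape and hide the individual observations. A raincloud plot shows the distribution vividly and looks good as well. At the end, I draw a raincloud plot of my course grades over the first three years of my undergraduate studies.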

First, a few common ways of showing a distribution:

violin + boxplot + raw data

```r
library(ggplot2)

# Global theme. STSong and STSongti-SC-Bold are macOS Chinese fonts;
# replace them with fonts installed on your system.
theme_set(theme_bw(base_size = 18,
                   base_family = 'STSong') +
            theme(plot.margin =
                    grid::unit(c(0.8, 0.8, 0.8, 0.8), "cm")) +
            theme(plot.title = element_text(family = 'STSongti-SC-Bold',
                                            hjust = 0.1)))

ggplot(iris,
       aes(x = Species,
           y = Petal.Length,
           fill = Species)) +
  geom_violin(alpha = 0.5) +     # density shape
  geom_boxplot(width = 0.1) +    # median and quartiles
  geom_jitter(height = 0) +      # raw data; no vertical jitter, so values stay exact
  theme(legend.position = "none")
```

violin + mean ± sd + raw data

```r
library(dplyr)

# Mean and standard deviation of Petal.Length for each species
d <- group_by(iris, Species) %>%
  summarise(mean = mean(Petal.Length),
            sd = sd(Petal.Length))

ggplot(iris,
       aes(Species, Petal.Length,
           fill = Species)) +
  geom_violin(alpha = 0.5) +
  geom_jitter(height = 0) +
  # point = mean, line = mean ± 1 SD, taken from the summary table d
  geom_pointrange(data = d,
                  aes(y = mean,
                      ymin = mean - sd,
                      ymax = mean + sd,
                      colour = Species),
                  size = 2) +
  theme(legend.position = "none")
```

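Violin plots look nice, but in my view they have one problem: they are symmetric, so each half merely mirrors the other and half of the plotting space is wasted. It is better to replace one half with another layer.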
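To see why nothing is lost by keeping only half a violin, here is a minimal sketch that builds a half-violin outline by hand: it runs base R's `density()` on each group and draws the curve on one side of the group's baseline only. The helper `half_violin_outline()` and its `width` parameter are hypothetical names for this illustration; this is an independent sketch, not gglayer's code, and it scales every group to the same maximum width (similar to `geom_violin(scale = "width")`) rather than by area. The plotting step runs only if ggplot2 is installed.

```r
# Hypothetical helper: one closed polygon per group. One edge follows a kernel
# density estimate, the opposite edge is the group's baseline at x = i.
half_violin_outline <- function(values, groups, width = 0.4) {
  groups <- as.factor(groups)
  pieces <- lapply(seq_along(levels(groups)), function(i) {
    v <- values[groups == levels(groups)[i]]
    dens <- density(v)                        # Gaussian KDE, 512 grid points by default
    offset <- dens$y / max(dens$y) * width    # widest point sits `width` from the baseline
    data.frame(group = levels(groups)[i],
               # out along the density curve, then back along the baseline
               y = c(dens$x, rev(dens$x)),
               x = c(i + offset, rep(i, length(dens$x))),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

outline <- half_violin_outline(iris$Petal.Length, iris$Species)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(
    ggplot(outline, aes(x, y, group = group, fill = group)) +
      geom_polygon(alpha = 0.5) +
      coord_flip() +
      theme(legend.position = "none")
  )
}
```

The freed-up side of each baseline is exactly where the raindrops (jittered points, a dot plot, or a boxplot) go in the raincloud plots below.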

Raincloud plot 1: adding the mean and standard deviation

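Dr. Guangchuang Yu (余光创), the author of the original post, wrote the R package gglayer specifically to make raincloud plots easy to draw; it provides a layer function, `geom_flat_violin()`, that draws half a violin.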

```r
# gglayer is installed from GitHub:
# devtools::install_github("GuangchuangYu/gglayer")
library(gglayer)

# d is the per-species mean/sd table computed in the previous example
ggplot(iris,
       aes(Species, Petal.Length,
           fill = Species)) +
  geom_flat_violin(position = position_nudge(x = 0.2)) +          # the "cloud"
  geom_jitter(aes(colour = Species), width = 0.15, height = 0) +  # the "rain"
  geom_pointrange(aes(y = mean,
                      ymin = mean - sd,
                      ymax = mean + sd),
                  data = d, size = 1,
                  position = position_nudge(x = 0.25)) +
  coord_flip() +
  scale_fill_brewer(palette = 'Set2') +
  scale_colour_brewer(palette = 'Set2') +
  theme(legend.position = "none")
```
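The bars above span ±1 standard deviation, which describes the spread of the observations. The standard error of the mean, SE = SD / √n, instead describes the uncertainty of the mean itself and is much smaller. If SE bars are what you want, a minimal base-R sketch (the table name `d_se` is hypothetical) computes both:

```r
# Per-species mean, standard deviation, sample size and standard error of Petal.Length
by_species <- split(iris$Petal.Length, iris$Species)
d_se <- data.frame(
  Species = factor(names(by_species), levels = levels(iris$Species)),
  mean = sapply(by_species, mean),
  sd = sapply(by_species, sd),
  n = sapply(by_species, length),
  row.names = NULL
)
d_se$se <- d_se$sd / sqrt(d_se$n)   # standard error of the mean
print(d_se)

# To draw SE bars instead of SD bars, pass d_se to the pointrange layer, e.g.
# geom_pointrange(aes(y = mean, ymin = mean - se, ymax = mean + se), data = d_se, ...)
```

With 50 flowers per species, each SE is the SD divided by √50, roughly a seventh of its size.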

Raincloud plot 2: adding a boxplot

```r
ggplot(iris,
       aes(Species, Petal.Length,
           fill = Species)) +
  geom_flat_violin(position = position_nudge(x = 0.3)) +
  geom_jitter(aes(colour = Species), width = 0.15, height = 0) +
  geom_boxplot(width = 0.1,
               position = position_nudge(x = 0.22)) +
  scale_fill_brewer(palette = 'Set2') +
  scale_colour_brewer(palette = 'Set2') +
  coord_flip() +
  theme(legend.position = 'none')
```

Raincloud plot 3: a stacked dot plot as the raindrops

This is the one I find the most beautiful!

```r
ggplot(iris,
       aes(Species,
           Petal.Length,
           fill = Species)) +
  geom_flat_violin(position = position_nudge(x = 0.2)) +
  # dots stacked below each baseline act as the raindrops
  geom_dotplot(binaxis = 'y',
               stackdir = 'down',
               dotsize = 0.35) +
  geom_boxplot(width = 0.1, position = position_nudge(x = 0.1)) +
  scale_fill_brewer(palette = 'Set2') +
  coord_flip() +
  theme(legend.position = "none")
```

I should use raincloud plots more often!

A raincloud plot of my university grades

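The data come from the university's Academic Affairs Office (教务处) website.

Since only a single page is needed, there is no point in spending time constructing request headers to fetch it programmatically; I simply saved the page as an HTML file, 我的成绩单.html ("my transcript"), and then processed it with the XML package: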

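```r
library(XML)

html <- htmlParse("我的成绩单.html", encoding = 'UTF-8')
df <- readHTMLTable(html)            # list of the tables on the page
scores <- as.data.frame(df$Tbl_h)    # Tbl_h is the transcript table
scores <- scores[-1:-2,]             # drop the first two rows of the table
# Columns: academic year, semester, course name, credits, grade, grade point,
# exam date, exam type, course category, enrollment type, cancelled
names(scores) <- c("学年", "学期", "课程名称", "学分", "成绩", "绩点", "考试日期", "考试性质", "课程类别", "修学类别", "取消否")

# factor -> character -> numeric; non-numeric grades become NA
scores$成绩 <- as.numeric(as.character(scores$成绩))
# reversed levels so the first year ends up on top after coord_flip()
scores$学年 <- factor(as.character(scores$学年), levels = rev(c("2015-2016", "2016-2017", "2017-2018")))
```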
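The `as.character()` step in the grade conversion matters whenever the grade column is a factor (older versions of R convert strings to factors by default): `as.numeric()` applied directly to a factor returns the internal level codes, not the printed values. A minimal base-R illustration, using a hypothetical vector `grades` with made-up values:

```r
# Hypothetical grades stored as a factor, as an HTML table column may be
grades <- factor(c("90", "85", "100"))

levels(grades)                      # "100" "85" "90"  (sorted as text)
as.numeric(grades)                  # 3 2 1: level codes, not grades
as.numeric(as.character(grades))    # 90 85 100: the actual values

# Non-numeric entries (e.g. a pass/fail grade) become NA with a warning
suppressWarnings(as.numeric(c("88", "pass")))   # 88 NA
```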

Now we can plot:

```r
library(cowplot)  # ggdraw(), draw_label(), draw_image(); draw_image() also needs the magick package

(p <- ggplot(scores,
             aes(x = 学年,
                 y = 成绩,
                 fill = 学年)) +
    geom_jitter(aes(colour = 学年), width = 0.15, height = 0, size = 3) +  # raw grades
    geom_boxplot(width = 0.1, position = position_nudge(x = 0.2)) +
    geom_flat_violin(position = position_nudge(x = 0.3)) +               # half violin (gglayer)
    scale_fill_brewer(palette = 'Set2') +
    scale_colour_brewer(palette = 'Set2') +
    coord_flip() +
    # title: "Distribution of grades, Year 1 to Year 3"
    labs(title = "大一~大三成绩的分布\n",
         y = '成绩') +
    # labels follow the reversed level order: 2017-2018, 2016-2017, 2015-2016
    scale_x_discrete(labels = c("大三", "大二", "大一")) +
    scale_y_continuous(breaks = seq(70, 100, 5),
                       labels = paste0(seq(70, 100, 5), "分")) +
    theme_bw(base_size = 18,
             base_family = 'STSong') +
    theme(plot.title = element_text(family = 'STSongti-SC-Bold',
                                    hjust = 0.1)) +
    theme(axis.title.y = element_blank()) +
    theme(plot.margin = grid::unit(c(1.5, 1, 2, 1), "cm")) +
    theme(legend.position = 'none'))

# Overlay a data-source note and a logo; x and y are fractions (0 to 1) of the canvas
ggdraw(p) +
  draw_label("数据来源:暨南大学教务处", x = 0.84, y = 0.05, fontfamily = 'STSong', size = 14) +
  draw_image("https://www.czxa.top/images/default28.png",
             x = 0.66, y = 0.02, width = 0.06, height = 0.06)
```
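`scale_x_discrete(labels = c("大三", "大二", "大一"))` pairs labels with factor levels by position, so it silently mislabels the axis if the level order ever changes. One alternative, sketched here in base R, is to store the display labels in the factor itself via a named lookup vector. The names `years`, `year_label`, `academic_year` and `year_f` are hypothetical, `academic_year` stands in for `scores$学年`, and the English labels ("Year 1" and so on) stand in for 大一/大二/大三:

```r
years <- c("2015-2016", "2016-2017", "2017-2018")
# Named lookup: academic year -> display label
year_label <- c("2015-2016" = "Year 1",
                "2016-2017" = "Year 2",
                "2017-2018" = "Year 3")

# Hypothetical stand-in for scores$学年
academic_year <- c("2016-2017", "2015-2016", "2017-2018", "2015-2016")

# Reversed level order keeps the first year on top after coord_flip();
# each label is looked up by name, so it cannot drift out of step with its level
year_f <- factor(academic_year,
                 levels = rev(years),
                 labels = unname(year_label[rev(years)]))

levels(year_f)           # "Year 3" "Year 2" "Year 1"
as.character(year_f)     # "Year 2" "Year 1" "Year 3" "Year 1"
```

With the labels stored in the factor, the plot can map `x` to this column and drop the `labels` argument from `scale_x_discrete()`.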

The same plot with a stacked dot plot as the raindrops:

```r
(p1 <- ggplot(scores,
              aes(x = 学年,
                  y = 成绩,
                  fill = 学年)) +
    # stacked dots below each baseline act as the raindrops
    geom_dotplot(binaxis = 'y',
                 stackdir = 'down',
                 dotsize = 0.35) +
    geom_boxplot(width = 0.1, position = position_nudge(x = 0.2)) +
    geom_flat_violin(position = position_nudge(x = 0.3)) +
    scale_fill_brewer(palette = 'Set2') +
    scale_colour_brewer(palette = 'Set2') +
    coord_flip() +
    labs(title = "大一~大三成绩的分布\n",
         y = '成绩') +
    scale_x_discrete(labels = c("大三", "大二", "大一")) +
    scale_y_continuous(breaks = seq(70, 100, 5),
                       labels = paste0(seq(70, 100, 5), "分")) +
    theme_bw(base_size = 18,
             base_family = 'STSong') +
    theme(plot.title = element_text(family = 'STSongti-SC-Bold',
                                    hjust = 0.1)) +
    theme(axis.title.y = element_blank()) +
    theme(plot.margin = grid::unit(c(1.5, 1, 2, 1), "cm")) +
    theme(legend.position = 'none'))

ggdraw(p1) +
  draw_label("数据来源:暨南大学教务处", x = 0.83, y = 0.05, fontfamily = 'STSong', size = 14) +
  draw_image("https://www.czxa.top/images/default28.png",
             x = 0.66, y = 0.02, width = 0.06, height = 0.06)
```

Stata and violin plots

Stata has two commands for drawing violin plots:

vioplot

This is a user-written (external) command. Install it with:

```stata
ssc install vioplot, replace
```

Basic usage

```stata
sysuse auto, clear
vioplot mpg, over(rep78)
```

Splitting by two grouping variables:

```stata
vioplot mpg, over(rep78) over(foreign)
```

Including missing values

```stata
vioplot mpg, over(rep78, missing) over(foreign)
```

Without filling the violins

```stata
vioplot mpg, over(rep78) over(foreign) nofill
```

Adding titles and other graph elements

```stata
vioplot mpg, over(rep78) horizontal name(myplot, replace) title("Violin Plot of Mileage") subtitle("By repair record") ytitle(Repair Record) ylab(, angle(horiz))
```

Two variables

```stata
vioplot gear head, over(foreign)
```

```stata
vioplot gear head, over(rep78) legend( ring(0) pos(2) cols(1)) xtitle("Categories of Repair Record")
```

Including the overall distribution

```stata
vioplot gear head, over(rep78, missing) over(foreign, total) legend(ring(0) pos(2) cols(1))
```

violin

This is also a user-written (external) command. Install it with:

```stata
ssc install violin
```

Plotting:

```stata
sysuse auto, clear
violin length, t1title(Auto data) l1title(length of car)
```

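This black-background style comes from Stata's old graphics syntax. The exported image has a black background, but when the graph is copied and pasted into a Word document it appears on a white background.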

```stata
violin length weight, n(100) width(20) round(.01)
```

```stata
violin weight, by(foreign) parzen
```

# R, Stata

程振兴 @czxa.top