R语言中的数据长宽变换

R语言中的数据长宽变换

旧笔记整理出来的,记不清是哪篇推文的学习笔记了。本文介绍了数据表的长宽转换(注意不是数据框)。

先生成一些模拟数据

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
library(data.table)
set.seed(123)
market <- data.table(stkcd = 1:100,
type = sample(c(0,1), 100, replace = TRUE),
industry = sample(letters, 100, replace = TRUE),
roa2013 = runif(100, min = 0, max = 1),
roa2014 = runif(100, min = 0, max = 1),
roa2015 = runif(100, min = 0, max = 1),
lev2013 = runif(100, min = 0, max = 1),
lev2014 = runif(100, min = 0, max = 1),
lev2015 = runif(100, min = 0, max = 1)
)
head(market)

stkcd type industry roa2013 roa2014 roa2015 lev2013 lev2014 lev2015
1: 1 0 p 0.2387260 0.784575267 0.9860543 0.35360608 0.2372297 0.84493354
2: 2 1 i 0.9623589 0.009429905 0.1370675 0.36644144 0.6864904 0.26013247
3: 3 0 m 0.6013657 0.779065883 0.9053096 0.28710013 0.2258184 0.02314449
4: 4 1 y 0.5150297 0.729390652 0.5763018 0.07997291 0.3184946 0.86239954
5: 5 1 m 0.4025733 0.630131853 0.3954489 0.36545427 0.1739838 0.33458796
6: 6 0 x 0.8802465 0.480910830 0.4498025 0.17801381 0.8014296 0.63178887

长变宽

R
1
2
3
4
5
6
7
8
9
10
11
12
long <- melt(market, measure = patterns("^roa", "^lev"), value.name = c("roa", "lev"))
long[, "year":=as.integer(variable)+2012][, "variable":=NULL]
setorder(long, stkcd, year)
head(long)

stkcd type industry roa lev year
1: 1 0 p 0.238726027 0.3536061 2013
2: 1 0 p 0.784575267 0.2372297 2014
3: 1 0 p 0.986054297 0.8449335 2015
4: 2 1 i 0.962358936 0.3664414 2013
5: 2 1 i 0.009429905 0.6864904 2014
6: 2 1 i 0.137067471 0.2601325 2015

上面的方法采用正则表达式来匹配变量,此外上述把变换后的因子变量variable改成来数值型变量year。否侧转化的结果就是下面的样子了:

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
dta <- melt(market, measure = patterns("^roa", "^lev"), value.name = c("roa", "lev"))
dta
stkcd type industry variable roa lev
1: 1 0 p 1 0.23872603 0.35360608
2: 2 1 i 1 0.96235894 0.36644144
3: 3 0 m 1 0.60136573 0.28710013
4: 4 1 y 1 0.51502973 0.07997291
5: 5 1 m 1 0.40257334 0.36545427
---
296: 96 0 m 3 0.99294402 0.40763066
297: 97 1 k 3 0.25309899 0.10371157
298: 98 0 r 3 0.04953844 0.28056517
299: 99 0 d 3 0.68632495 0.36146622
300: 100 1 o 3 0.78692735 0.25924484
> class(dta[, variable])
[1] "factor"

另外一种稍微麻烦点的方法

R
1
2
3
4
5
6
7
8
9
10
11
12
13
A = paste("roa", 2013:2015, sep = "")
B = paste("lev", 2013:2015, sep = "")
long = melt(market, measure = list(A, B), value.name = c("roa", "lev"))
long[, "year":= as.integer(variable)+2012][, "variable":=NULL]
setorder(long, stkcd, year)
head(long)
stkcd type industry roa lev year
1: 1 0 p 0.238726027 0.3536061 2013
2: 1 0 p 0.784575267 0.2372297 2014
3: 1 0 p 0.986054297 0.8449335 2015
4: 2 1 i 0.962358936 0.3664414 2013
5: 2 1 i 0.009429905 0.6864904 2014
6: 2 1 i 0.137067471 0.2601325 2015

宽变长

R
1
2
3
4
5
6
7
8
9
10
short <- dcast(long, stkcd+type+industry~year, value.var = c("roa", "lev"))
head(short)

stkcd type industry roa_2013 roa_2014 roa_2015 lev_2013 lev_2014 lev_2015
1: 1 0 p 0.2387260 0.784575267 0.9860543 0.35360608 0.2372297 0.84493354
2: 2 1 i 0.9623589 0.009429905 0.1370675 0.36644144 0.6864904 0.26013247
3: 3 0 m 0.6013657 0.779065883 0.9053096 0.28710013 0.2258184 0.02314449
4: 4 1 y 0.5150297 0.729390652 0.5763018 0.07997291 0.3184946 0.86239954
5: 5 1 m 0.4025733 0.630131853 0.3954489 0.36545427 0.1739838 0.33458796
6: 6 0 x 0.8802465 0.480910830 0.4498025 0.17801381 0.8014296 0.63178887
# R

评论

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×