xsv——CSV命令行工具包

xsv——CSV命令行工具包

xsv是使用Rust编写的命令行工具,可以直接在终端对csv文件进行统计分析。感觉非常好用,可以用于索引,切片,分析,拆分和连接CSV文件

安装

1
brew install xsv

使用

下面的使用以平安银行的股票交易数据集为例讲解这个命令的使用,这个数据集的准备方法如下:

1
2
3
clear all
cntrade2 1
export delimited 000001.csv, replace

查看csv文件的表头

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$ cd ~/Desktop
$ xsv headers 000001.csv
1 code
2 date
3 name
4 close
5 high
6 low
7 open
8 rit
9 turnover
10 volume
11 transaction
12 tcap
13 mcap

对数据集进行描述性统计

第一种方式是使用逗号分隔,看起来比较乱,可以使用重定向符号将结果输出:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
$ xsv stats 000001.csv

field,type,sum,min,max,min_length,max_length,mean,stddev
code,Integer,6565,1,1,1,1,1,0
date,Unicode,,1991-01-02,2018-10-19,10,10,,
name,Unicode,,S深发展A,深发展A,11,12,,
close,Float,113443.96998150015,5.0999999,67.410004,1,9,17.280117285833956,10.324483924101813
high,Float,115307.91995459993,5.23,67.410004,1,9,17.564039597044886,10.470507895748508
low,Float,111639.69995759985,5.0500002,67.410004,1,9,17.005285599025115,10.174592161632214
open,Float,113388.45999869998,5.0900002,67.410004,1,9,17.271661842909374,10.322549317079572
rit,Float,2.965034000000005,-0.503676,0.254019,0,8,0.00045171145642900704,0.028901188093691323
turnover,Float,6706.049099906857,0.00030000001,19.5016,1,12,1.021485011410031,1.2220806018624286
volume,Integer,203864545200,100,508605027,3,9,31053243.747143954,51094793.28653432
transaction,Float,2969128294862.2,2924,8596941824,4,10,452266305.38647354,735084569.6811728
tcap,Float,365218983767351.5,1209299000,259273211627,10,17,55631223726.93853,57859828698.68667
mcap,Float,294691433815609.8,660751000,255461909294,9,17,44888261053.40573,52783458158.10037

$ xsv stats 000001.csv > stats.csv

另外使用xsv table命令可以实现将表格对齐:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$ xsv stats 000001.csv | xsv table
field type sum min max min_length max_length mean stddev
code Integer 6565 1 1 1 1 1 0
date Unicode 1991-01-02 2018-10-19 10 10
name Unicode S深发展A 深发展A 11 12
close Float 113443.96998150015 5.0999999 67.410004 1 9 17.280117285833956 10.324483924101813
high Float 115307.91995459993 5.23 67.410004 1 9 17.564039597044886 10.470507895748508
low Float 111639.69995759985 5.0500002 67.410004 1 9 17.005285599025115 10.174592161632214
open Float 113388.45999869998 5.0900002 67.410004 1 9 17.271661842909374 10.322549317079572
rit Float 2.965034000000005 -0.503676 0.254019 0 8 0.00045171145642900704 0.028901188093691323
turnover Float 6706.049099906857 0.00030000001 19.5016 1 12 1.021485011410031 1.2220806018624286
volume Integer 203864545200 100 508605027 3 9 31053243.747143954 51094793.28653432
transaction Float 2969128294862.2 2924 8596941824 4 10 452266305.38647354 735084569.6811728
tcap Float 365218983767351.5 1209299000 259273211627 10 17 55631223726.93853 57859828698.68667
mcap Float 294691433815609.8 660751000 255461909294 9 17 44888261053.40573 52783458158.10037

还可以通过创建索引并重新运行命令来加快速度:

1
2
xsv index 000001.csv
xsv stats 000001.csv --everything | xsv table

计数

1
2
$ xsv count 000001.csv
6565

展示数据集

例如从第6555索引初列示数据集(也就是展示最后十个):

1
2
3
4
5
6
7
8
9
10
11
12
$ xsv slice 000001.csv -s 6555 | xsv table
code date name close high low open rit turnover volume transaction tcap mcap
1 2018-10-08 平安银行 10.45 10.79 10.45 10.7 -.054299 .98210001 168635852 1793455232 179430798775 179429167059
1 2018-10-09 平安银行 10.56 10.7 10.39 10.46 .010526 .61970001 106408426 1117946496 181319544025 181317895134
1 2018-10-10 平安银行 10.45 10.66 10.38 10.54 -.010417 .57959998 99520008 1045666176 179430798775 179429167059
1 2018-10-11 平安银行 9.8599997 10.16 9.6999998 10.05 -.056459 1.1619999 199514383 1994186624 169300256069 169298716479
1 2018-10-12 平安银行 10.3 10.34 9.8699999 9.9700003 .044625 .88340002 151681028 1532651392 176855237070 176853628776
1 2018-10-15 平安银行 10.11 10.47 10.09 10.39 -.018447 .81690001 140268530 1443043072 173592858910 173591280284
1 2018-10-16 平安银行 10.37 10.54 10.12 10.14 .025717 .8136 139692105 1450020352 178057165865 178055546642
1 2018-10-17 平安银行 10.33 10.55 10.14 10.5 -.003857 .78680003 135088983 1400306304 177370349411 177368736433
1 2018-10-18 平安银行 10.09 10.29 10.06 10.29 -.023233 .58310002 100115605 1015654848 173249450683 173247875180
1 2018-10-19 平安银行 10.76 10.78 9.9200001 9.9499998 .066402 1.2135 208366732 2173571328 184753626298 184751946178

列示随机行

例如随机展示10行日期和收盘价数据,注意date和close之间只有逗号,不能有空格。

1
2
3
4
5
6
7
8
9
10
11
12
$ xsv select date,close 000001.csv | xsv sample 10 | xsv table
date close
2011-12-15 15.05
2003-04-24 11.96
1997-07-29 34.200001
1993-11-04 21.85
1999-07-27 21.92
2016-08-03 9.1800003
1992-06-11 38.799999
2009-10-14 21.23
2011-11-07 16.719999
1996-05-22 18.209999

频数表

1
2
3
4
5
6
7
$ xsv select code,name 000001.csv | xsv frequency --limit 5 | xsv table

field value count
code 1 6565
name 深发展A 4914
name 平安银行 1509
name S深发展A 142

排除空值

例如,只选择close不为空的行,随机选择10行:

1
$ xsv search -s close '[0-9]' 000001.csv | xsv sample 10 | xsv table

文件交叉连接

再准备一份万科的数据集:

1
2
3
4
5
cntrade2 2
foreach i of varlist _all{
ren `i' `i'2
}
export delimited 000002.csv, replace

进行连接:

1
2
3
4
5
6
7
8
9
10
$ xsv join date 000001.csv date2 000002.csv \
| xsv select name,date,close,name2,date2,close2 \
| xsv sample 10 \
| xsv table
name date close name2 date2 close2深发展A 2004-07-22 8.2299995 万 科A 2004-07-22 5.1399999
深发展A 1995-12-05 7 深万科A 1995-12-05 3.73深发展A 1997-04-15 28.1 深万科A 1997-04-15 15.23
深发展A 2003-08-25 10.52 万 科A 2003-08-25 6.4899998平安银行 2014-07-25 10.22 万 科A 2014-07-25 9.8100004
深发展A 1999-11-09 22.25 深万科A 1999-11-09 11.16深发展A 2003-12-30 8.6999998 万 科A 2003-12-30 6.4499998
S深发展A 2006-12-14 13.38 万 科A 2006-12-14 12.86深发展A 2003-06-26 11.37 万 科A 2003-06-26 6
深发展A 2000-06-05 18.1 深万科A 2000-06-05 12.4

如果横向拼接的识别变量在两个数据集里面只有大小写有区别,那么可以在join的后面加上--no-case表示忽略大小写区别的横向拼接。

变量重排

不显示变量date2:

1
2
3
4
$ xsv join date 000001.csv date2 000002.csv \
| xsv select '!date2' \
| xsv sample 10 \
| xsv table

重名变量的选择

再准备一份深金田A的数据集,与刚刚不同,这次不再重命名了:

1
2
cntrade2 3
export delimited 000003.csv, replace

横向拼接:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
$ xsv join date 000001.csv date 000003.csv \
| xsv select name[0],date[0],close[0],name[1],date[1],close[1] \
| xsv sample 10 \
| xsv table

name date close name date close
深发展A 1996-03-25 7.9499998 深金田A 1996-03-25 3.53
深发展A 1994-09-15 16.1 深金田A 1994-09-15 9.9499998
深发展A 1996-10-16 16.959999 深金田A 1996-10-16 7.5
深发展A 1995-07-27 9.1599998 深金田A 1995-07-27 4.6300001
深发展A 1999-09-17 24.27 深金田A 1999-09-17 6.0900002
深发展A 1995-04-11 10.68 深金田A 1995-04-11 5.5599999
深发展A 1998-12-23 15.05 深金田A 1998-12-23 5.0999999
深发展A 1991-07-31 29.540001 深金田A 1991-07-31 6.0500002
深发展A 1994-06-15 13.28 深金田A 1994-06-15 6
深发展A 1995-04-27 9.4899998 深金田A 1995-04-27 4.8600001

纵向连接

1
2
3
4
5
6
7
8
9
10
11
12
$ cat 000001.csv 000003.csv | xsv sample 10 |xsv table
code date name close high low open rit turnover volume transaction tcap mcap
1 2016-06-14 平安银行 10.4 10.41 10.32 10.34 .006776 .2244 27363726 283578304 148810231846 126803715782
1 1996-02-07 深发展A 6.1700001 6.1799998 6.1300001 6.1700001 0 .067500003 241100 1485726.1 3191632185.88 2203994819.26
1 2006-12-28 S深发展A 14.02 14.5 13.69 13.75 .024105 3.7943001 53476036 754610496 27280426529 19759254749.3
3 1993-03-19 深金田A 45.549999 45.75 44.849998 45.099998 .009978 .5675 343500 15515450 4550979802.55 2757079415.35
1 2018-07-20 平安银行 9.1099997 9.1999998 8.6099997 8.7299995 .043528 1.0217 175427740 1576843904 156422447544 156421025063
1 2006-02-24 深发展A 7.0300002 7.04 6.9000001 7.0100002 .002853 1.8837 26548048 185345904 13679129707.5 9907814613.950001
1 2016-06-01 平安银行 10.48 10.55 10.44 10.51 -.006635 .45660001 55675091 584161152 149954925937 127779128980
1 1995-06-29 深发展A 9.25 9.2799997 9.1800003 9.2799997 .001082 .080300003 238700 2203315 3987384901.5 2750687614.5
1 2002-03-26 深发展A 11.3 11.52 11.2 11.52 -.019097 .29800001 4199422 47472544 21987790283.7 15925790204.5
1 2006-11-23 S深发展A 12.65 13.18 12.51 12.88 -.017857 3.4340999 48399487 623603328 24614650184.8 17828428857.2

展平视图

例如显示000001.csv的第一组观测值:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
$ xsv slice -i 0 000001.csv | xsv flatten
code 1
date 1991-01-02
name 深发展A
close 67.410004
high 67.410004
low 67.410004
open 67.410004
rit
turnover .15880001
volume 58900
transaction 3976000
tcap 4577139000
mcap 2500911000

其它的很多用法就知道怎么用的了。

# shell

评论

程振兴

程振兴 @czxa.top
截止今天,我已经在本博客上写了659.4k个字了!

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×