Variable Names and Labels in Stata

This post consolidates my old notes on how to use labels in Stata.

Data labels

Dataset label

Stata
clear all
sysuse auto, clear
label data "1978 US automobile data"
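
To check that the dataset label was stored, it can be read back. The following is a minimal sketch, not part of the original notes; the macro name dlabel is only illustrative.

```stata
sysuse auto, clear
label data "1978 US automobile data"

* read the dataset label back into a local macro
local dlabel : data label
display "`dlabel'"

* describe, short prints the dataset label in its header
describe, short
```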

Variable labels

Stata
label var make "Car model"
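
A variable label can also be read back into a macro, which is handy inside loops. This is a small sketch under the assumption that the auto dataset is in use; the macro name vlabel is illustrative.

```stata
sysuse auto, clear
label variable make "Car model"

* fetch the variable label with an extended macro function
local vlabel : variable label make
display "`vlabel'"

* describe shows the label next to the storage type and format
describe make
```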

Value labels

Stata
. sum foreign

Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
foreign | 74 .2972973 .4601885 0 1

. label list
origin:
0 Domestic
1 Foreign

. list foreign in 1/5

+----------+
| foreign |
|----------|
1. | Domestic |
2. | Domestic |
3. | Domestic |
4. | Domestic |
5. | Domestic |
+----------+

. list foreign in 1/5, nolabel

+---------+
| foreign |
|---------|
1. | 0 |
2. | 0 |
3. | 0 |
4. | 0 |
5. | 0 |
+---------+

. label drop origin

. label list

. label define origin 0 "Domestic car" 1 "Imported car"

. label values foreign origin

. label list
origin:
0 Domestic car
1 Imported car
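
Dropping and redefining a value label is one route. As an alternative sketch (not from the original notes), the modify option of label define changes the text in place, so the label stays attached to the variable.

```stata
sysuse auto, clear

* overwrite the text of existing codes without dropping the label
label define origin 0 "Domestic car" 1 "Imported car", modify

label list origin
tabulate foreign
```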

Creating a value label for a variable

Stata
clear all
set obs 100
gen major = mod(_n, 4) + 1
label define ml 1 "Finance"
label define ml 2 "Econometrics", add
label define ml 3 "Insurance", add
label define ml 4 "Statistics", add
label values major ml
list in 1/5
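
The add option is only needed when a label is built up step by step. A sketch of the same result in a single label define call, using the same label name ml:

```stata
clear all
set obs 100
generate major = mod(_n, 4) + 1

* all four mappings in one command
label define ml 1 "Finance" 2 "Econometrics" 3 "Insurance" 4 "Statistics"
label values major ml

tabulate major
```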

labelrename: renaming value labels

This is a user-written command, so it has to be installed first:

Stata
net install dm0012.pkg, from("http://www.stata-journal.com/software/sj5-2/")

Stata
. sysuse auto, clear
(1978 Automobile Data)

. * define a value label to rename

. label def rep1 1 "1" 2 "2" 3 "3" 4 "4" 5 "5"

. labelbook

------------------------------------------------------------------------------------
value label origin
------------------------------------------------------------------------------------

values labels
range: [0,1] string length: [7,8]
N: 2 unique at full length: yes
gaps: no unique at length 12: yes
missing .*: 0 null string: no
leading/trailing blanks: no
numeric -> numeric: no
definition
0 Domestic
1 Foreign

variables: foreign


------------------------------------------------------------------------------------
value label rep1
------------------------------------------------------------------------------------

values labels
range: [1,5] string length: [1,1]
N: 5 unique at full length: yes
gaps: no unique at length 12: yes
missing .*: 0 null string: no
leading/trailing blanks: no
numeric -> numeric: yes
definition
1 1
2 2
3 3
4 4
5 5

variables:


.
. * rename the value label

. labelrename rep1 rep3

Value label rep1 renamed to rep3
Note: value label rep1 was not attached to any variable

. labelbook

------------------------------------------------------------------------------------
value label origin
------------------------------------------------------------------------------------

values labels
range: [0,1] string length: [7,8]
N: 2 unique at full length: yes
gaps: no unique at length 12: yes
missing .*: 0 null string: no
leading/trailing blanks: no
numeric -> numeric: no
definition
0 Domestic
1 Foreign

variables: foreign


------------------------------------------------------------------------------------
value label rep3
------------------------------------------------------------------------------------

values labels
range: [1,5] string length: [1,1]
N: 5 unique at full length: yes
gaps: no unique at length 12: yes
missing .*: 0 null string: no
leading/trailing blanks: no
numeric -> numeric: yes
definition
1 1
2 2
3 3
4 4
5 5

variables:

As the output shows, the value label rep1 has been renamed to rep3.
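
Without the user-written package, a similar effect can be obtained with built-in commands. The sketch below copies the label under the new name, reattaches it, and drops the old one; attaching it to rep78 is an assumption made only for illustration.

```stata
sysuse auto, clear
label define rep1 1 "1" 2 "2" 3 "3" 4 "4" 5 "5"
label values rep78 rep1

* copy the definition under the new name
label copy rep1 rep3
* built-in commands do not reattach automatically, so do it by hand
label values rep78 rep3
* remove the old name
label drop rep1

label list
```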

labellist: listing value labels

This is also a user-written command. Install it with:

Stata
ssc install labellist

Usage:

Stata
. sysuse auto, clear
(1978 Automobile Data)

. labellist
origin:
0 Domestic
1 Foreign

. ret list

scalars:
r(origin_min) = 0
r(origin_max) = 1
r(origin_nemiss) = 0
r(origin_k) = 2

macros:
r(foreign_labels) : ""Domestic" "Foreign""
r(foreign_values) : "0 1"
r(foreign_lblname) : "origin"
r(foreign_varlabel) : "Car type"
r(varlist) : "make price mpg rep78 headroom trunk weight length turn.."

Note that the returned results include r(varlist), which holds all the variable names.
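
Part of this information is also available from built-in commands. As a rough sketch: the value label extended macro function gives the label name attached to a variable, and label list leaves r(k), r(min), and r(max) behind.

```stata
sysuse auto, clear

* name of the value label attached to foreign
local lblname : value label foreign

* label list stores r(k), r(min), r(max) and r(hasemiss)
label list `lblname'
display "`lblname' has " r(k) " mappings, from " r(min) " to " r(max)
```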

labelsof: showing the value label of a variable

This is also a user-written command, so install it first:

Stata
ssc install labelsof

Usage:

Stata
. sysuse auto, clear
(1978 Automobile Data)

. labelsof foreign

foreign (origin):

0 Domestic
1 Foreign

Inspect the returned results:

Stata
. ret list

macros:
r(name) : "origin"
r(values) : "0 1"
r(labels) : "`"Domestic"' `"Foreign"'"

Define a value label and list it by label name:

Stata
. label define yesno 1 "yes" 2 "no" .a "no answer"

. labelsof yesno, label

yesno:
1 yes
2 no
.a no answer
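
The value-to-text pairs can also be collected with built-in tools. A sketch using levelsof and the label extended macro function follows; note that, unlike labelsof, it only visits values that actually occur in the data.

```stata
sysuse auto, clear
local lblname : value label foreign

* loop over the distinct values of foreign and look up each label text
levelsof foreign, local(vals)
foreach v of local vals {
    local text : label `lblname' `v'
    display "`v' -> `text'"
}
```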

lablist: returning the value labels of variables

This is a user-written command, so install it first:

Stata
ssc install lablist

Usage:

Stata
sysuse auto, clear

Listing the value labels of all variables

Stata
. lablist

Variable: make
No value label present

Variable: price
No value label present

Variable: mpg
No value label present

Variable: rep78
No value label present

Variable: headroom
No value label present

Variable: trunk
No value label present

Variable: weight
No value label present

Variable: length
No value label present

Variable: turn
No value label present

Variable: displacement
No value label present

Variable: gear_ratio
No value label present

Variable: foreign
Value label: origin
origin:
0 Domestic
1 Foreign

. ret list

scalars:
r(k) = 2
r(hasemiss) = 0
r(max) = 1
r(min) = 0

macros:
r(names) : "origin"

Showing the value label of a single variable

Stata
. lablist mpg

Variable: mpg
No value label present

. ret list

. lablist foreign

Variable: foreign
Value label: origin
origin:
0 Domestic
1 Foreign

. ret list

scalars:
r(k) = 2
r(hasemiss) = 0
r(max) = 1
r(min) = 0

macros:
r(names) : "origin"

Listing variable labels together with value labels

Stata
. lablist, varlabel

Variable: make
Variable label: Make and Model
No value label present

Variable: price
Variable label: Price
No value label present

Variable: mpg
Variable label: Mileage (mpg)
No value label present

Variable: rep78
Variable label: Repair Record 1978
No value label present

Variable: headroom
Variable label: Headroom (in.)
No value label present

Variable: trunk
Variable label: Trunk space (cu. ft.)
No value label present

Variable: weight
Variable label: Weight (lbs.)
No value label present

Variable: length
Variable label: Length (in.)
No value label present

Variable: turn
Variable label: Turn Circle (ft.)
No value label present

Variable: displacement
Variable label: Displacement (cu. in.)
No value label present

Variable: gear_ratio
Variable label: Gear Ratio
No value label present

Variable: foreign
Variable label: Car type
Value label: origin
origin:
0 Domestic
1 Foreign

Showing the value label and variable label of a single variable

Stata
. lablist foreign, var

Variable: foreign
Variable label: Car type
Value label: origin
origin:
0 Domestic
1 Foreign

Listing only the variables that have value labels

Stata
. lablist, nounlabelled

Variable: foreign
Value label: origin
origin:
0 Domestic
1 Foreign
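
A comparable listing can be sketched with built-in commands only: ds, has(vallabel) selects the labelled variables and label list prints each definition. The macro names labelled and lblname are illustrative.

```stata
sysuse auto, clear

* variables that have a value label attached
ds, has(vallabel)
local labelled `r(varlist)'

foreach v of local labelled {
    local lblname : value label `v'
    display as text "Variable: `v' (value label: `lblname')"
    label list `lblname'
}
```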

labeldup: reporting and optionally removing duplicate value labels

This command ships in the same package as labelrename above, so no further installation is needed.

Stata
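* report and optionally remove duplicate value labels
sysuse auto, clear
* create two identical value labels
label def rep1 1 "1" 2 "2" 3 "3" 4 "4" 5 "5"
label def rep2 1 "1" 2 "2" 3 "3" 4 "4" 5 "5"
* report all duplicate value labels
labeldup
labeldup rep1, select
* review an overview of all value labels
labelbook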

findname: a very powerful command for selecting variables

Install it first:

Stata
ssc install findname

Usage:

Stata
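sysuse auto, clear

* list all variable names and store them in r(varlist)
findname
ret list

* list all string variables and store them in r(varlist)
findname, type(string)
ret list

* edit the selected variables
edit `r(varlist)'
* browse the selected variables
browse `r(varlist)'

* list all variables of type str1, str2, ..., str20
findname, type(1/20)

* list all numeric variables
findname, type(numeric)
* move the numeric variables to the front
order `r(varlist)'
* descriptive statistics for all numeric variables
summarize `r(varlist)'

* byte / int
findname, type(byte int)

* float
findname, type(float)

* not float
findname, type(float) not

* all date variables
findname, format(%t* %-t*)

* all variables that hold only integer values
findname, all(@ == int(@))

* all variables with at least one negative value
findname, any(@ < 0)

* all left-aligned string variables
findname, format(%-*s)

* all variables whose format uses a thousands separator
findname, format(*c)

* all variables with a value label attached
findname, vall

* all variables whose value label is named origin
findname, vallabelname(origin)

* all variables with characteristics (notes are stored as characteristics)
findname, char
* add a note to mpg
notes mpg: hidden treasure
findname, charname(note*)
* show the notes attached to mpg
notes mpg

* variables whose notes contain treasure
findname, chartext(*treasure*)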

ds: listing the variables that meet a condition

This command overlaps with findname in functionality; findname can be regarded as an enhanced version of ds.

Stata
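sysuse auto, clear
* list all variables
ds
ret list

* list all string variables and open them in the Data Editor
ds, has(type string)
ed `r(varlist)'

* list variables of type str1, str2, str3, str4
ds, has(type 1/4)

* list all numeric variables
ds, has(type numeric)
order `r(varlist)'

* list all variables with a value label attached
ds, has(vall)

* list all date variables
ds, has(format %t* %-t*)

* list all left-aligned string variables
ds, has(format %-*s)

* list all variables whose format uses a thousands separator
ds, has(format *c)

* list all variables with characteristics
ds, has(char)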
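
Because ds leaves its selection in r(varlist), the result can be captured in a local macro before another command overwrites r(). A short sketch; the macro names strvars and numvars are illustrative.

```stata
sysuse auto, clear

* keep the string variables in one macro
ds, has(type string)
local strvars `r(varlist)'

* not() inverts the selection, giving every non-string variable
ds, not(type string)
local numvars `r(varlist)'

display "string variables: `strvars'"
display "number of numeric variables: `: word count `numvars''"
```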

niceloglabels: generating tick labels for logarithmic axes

This is a user-written command, so install it first:

Stata
ssc install niceloglabels

Usage:

Stata
sysuse census, clear

niceloglabels pop, local(yla) style(125)
quantile pop, yscale(log) ylabel(`yla', angle(horizontal)) rlopts(lcolor(none))

Stata
* show the tick labels as powers of 10
niceloglabels pop, local(yla) style(125) powers
quantile pop, yscale(log) ylabel(`yla', angle(horizontal)) rlopts(lcolor(none))

Stata
generate pop2 = pop/1e6
label var pop2 "Population (m)"
niceloglabels pop2, local(yla) style(125)
quantile pop2, yscale(log) ylabel(`yla', angle(horizontal)) rlopts(lcolor(none))

Stata
. * generate a log-scale sequence directly from a given range

. niceloglabels 2e2 2e4, local(yla) style(125)
200 500 1000 2000 5000 10000 20000

. niceloglabels 1e1 1e9, local(yla) style(1) powers
10 "10{sup:1}" 100 "10{sup:2}" 1000 "10{sup:3}" 10000 "10{sup:4}" 100000
> "10{sup:5}" 1000000 "10{sup:6}" 10000000 "10{sup:7}" 100000000 "10{sup:8
> }" 1000000000 "10{sup:9}"

retainlbl: saving and restoring variable labels

This is a user-written command, so it has to be installed first:

Stata
net install retainlbl.pkg, from("http://digital.cgdev.org/doc/stata/MO/Misc/")

Usage:

Stata
webuse college, clear
* store the variable labels
retainlbl gpa hour year, store
* weighted means of gpa and hour by year
collapse (mean) gpa hour [fw = number], by(year)
* map the stored labels onto the collapsed variables
retainlbl gpa hour, restore addprefix("mean: ")
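
If installing retainlbl is not an option, the same idea can be sketched with local macros: save each label before collapse and write it back afterwards. The macro names lgpa and lhour are built in the loop and are illustrative only.

```stata
webuse college, clear

* save each variable label in a local macro named l<varname>
foreach v of varlist gpa hour {
    local l`v' : variable label `v'
}

collapse (mean) gpa hour [fw = number], by(year)

* write the saved labels back with a prefix
foreach v of varlist gpa hour {
    label variable `v' "mean: `l`v''"
}
describe
```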

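Automatic wrapping of long labels

The code below wraps labels automatically at a given length. In practice, wrapping by hand is usually enough (that is, enclosing each line in its own pair of double quotes):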

Stata
sysuse auto, clear

* give both categories deliberately long labels
label define origin 0 `"group0 group0 group0 group0 group0 group0 group0 group0 group0 group0"', modify

label define origin 1 "group1 group1 group1 group1 group1 group1 group1 group1 group1 group1 ", modify

* relabels collects the pieces of one label, relabels1 the whole relabel() argument
local relabels
local relabels1

levelsof foreign, local(groups)

* maximum number of characters per line
local s_len = 20
foreach g of local groups {
    local label : label origin `g'
    local len : length local label
    if `len' > `s_len' {
        * cut the label into pieces of at most s_len characters without breaking words
        forval i = 1/`=`len'/`s_len'+1' {
            local p1 : piece `i' `s_len' of `"`label'"', nobreak
            local relabels `"`relabels' `=char(34) + "`p1'" + char(34)' "'
        }
        * relabel() numbers the categories from 1, hence g + 1
        local relabels1 `relabels1' `=`g'+1' `"`relabels'"'
        local relabels
    }
}
di `relabels1'
gr hbar mpg, over(foreign, relabel(`relabels1'))

The contents of relabels1 are:

Stata
. di `relabels1'
1 "group0 group0 group0" "group0 group0 group0" "group0 group0 group0" "gr
> oup0" 2 "group1 group1 group1" "group1 group1 group1" "group1 group1 grou
> p1" "group1"

Bar chart including a total

Stata
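sysuse auto, clear
* keep a copy of the original data to append below
save auto1, replace
* turn every car into a member of an extra "Total" category
replace foreign = 3
* extend the existing label origin (0 Domestic, 1 Foreign) instead of redefining it
label define origin 3 "Total", add
append using auto1

gr bar (mean) price, over(rep78) over(foreign) showyvars ascategory asyvar leg(off)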

Bar chart with differently colored bars

Stata
clear
input ///
id w pos mark
1 1 1 69.55
2 1 2 65.16
3 1 3 64.91
4 1 4 64.53
5 1 5 63.70
6 0 6 84.58
7 0 7 84.51
8 0 8 84.12
9 0 9 83.34
10 0 10 82.8
end

label define kk ///
1 "Barbora - CZE " ///
2 "Christina - GER " ///
3 "Linda - GER " ///
4 "Sunette - RSA" ///
5 "Huihui - CHN" ///
6 "Keshorn - TRI " ///
7 "Oleksandr - UKR " ///
8 "Antti - FIN " ///
9 "Vitezslav - CZE" ///
10 "Tero - FIN"

label value pos kk

label define w 1 "Women" 0 "Men"
label value w w

gr bar (asis) mark, over(id) ///
over(pos, label(ang(45))) ///
over(w) yla(0(10)90, ang(45)) ///
blabel(bar, pos(inside) ///
format(%9.1f) color(black)) ///
leg(off) ///
bargap(5) ///
title("London Olympics 2012" "Javelin") ///
nofill yti("Metres") ///
bar(1, color(gold*0.8)) ///
bar(2, color(gold)) ///
bar(3, color(green*0.8)) ///
bar(4, color(green)) ///
bar(5, color(sienna*0.8)) ///
bar(6, color(sienna)) ///
bar(7, color(cyan*0.8)) ///
bar(8, color(cyan)) ///
bar(9, color(pink*0.8)) ///
bar(10, color(pink))
gr export btysdzt.png, replace

Stacked area chart

Stata
clear all
sysuse nlsw88, clear
gen ind_gr = industry
recode ind_gr (1/5 = 1) (6 = 2) (7 = 3) (8/10 = 4) (11 = 5) (12 = 6)
label define ind_gr 1 "Manufacturing" ///
2 "Trade" ///
3 "Finance" ///
4 "Other services" ///
5 "Professional services" ///
6 "Public sector"
label val ind_gr ind_gr
* build the percentile rank of wage
egen n = count(wage)
* rank(wage) runs from 1 for the smallest wage up to n for the largest
egen i = rank(wage)
keep n i wage ind_gr
gsort i
gen hazen = (i - 0.5)/n * 100
label var hazen "Percentile rank of wage"
mkspline s_w = hazen, cubic nknots(5)
mlogit ind_gr s_w*
predict pr*

* cumulative predicted shares in percent, used as area boundaries
gen zero = 0
gen one = 100
gen l1 = (pr1)*100
gen l2 = (pr1+pr2)*100
gen l3 = (pr1+pr2+pr3)*100
gen l4 = (pr1+pr2+pr3+pr4)*100
gen l5 = (pr1+pr2+pr3+pr4+pr5)*100

gsort hazen

* collect the labels for the second y axis, one at the midpoint of each band
local mid = l1[_N]/2
local yaxis `"`mid' "Manufacturing""'
local mid = (l2[_N]-l1[_N])/2 + l1[_N]
local yaxis `"`yaxis' `mid' "Trade""'
local mid = (l3[_N]-l2[_N])/2 + l2[_N]
local yaxis `"`yaxis' `mid' "Finance""'
local mid = (l4[_N]-l3[_N])/2 + l3[_N]
local yaxis `"`yaxis' `mid' "Other services""'
local mid = (l5[_N]-l4[_N])/2 + l4[_N]
local yaxis `"`yaxis' `mid' "Professional services""'
local mid = (100-l5[_N])/2 + l5[_N]
local yaxis `"`yaxis' `mid' "Public sector""'

* draw the graph
tw ///
rarea zero l1 hazen, yaxis(1) fcolor(navy) || ///
rarea l1 l2 hazen, yaxis(2) fcolor(maroon) || ///
rarea l2 l3 hazen, fcolor(green*0.4) || ///
rarea l3 l4 hazen, fcolor(orange*0.6) || ///
rarea l4 l5 hazen, fcolor(green*0.8) || ///
rarea l5 one hazen, fcolor(red*1.2) ///
yti("Percent") ylabel(`yaxis', axis(2) ang(-45) nogrid) ///
ysc(r(0 100) axis(1)) ysc(r(0 100) axis(2)) ///
yti("", axis(2)) plotr(margin(zero)) ///
aspect(1) leg(off) scheme(s1mono)
gr export 断层图.png, replace

Box plot by group with an overall category

Stata
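sysuse auto, clear
gen order = _n
* make three copies of every observation
expand 3
bysort order: gen which = _n
* copy 1 keeps cars up to $5000, copy 2 up to $10000, copy 3 keeps all cars
drop if which == 1 & price > 5000
drop if which == 2 & price > 10000
* \$ keeps the dollar sign from being read as a global macro
label def which 1 "<= \$5000" 2 "<= \$10000" 3 "all"
label val which which
gr box mpg, over(which) over(foreign)
gr export 分类汇总箱线图.png, replace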

Group-colored bars

Here the group coloring is implemented with twoway:

Stata
clear all

input ///
id w pos mark
1 0 6 84.58
2 0 7 84.51
3 0 8 84.12
4 0 9 83.34
5 0 10 82.8
6 1 1 69.55
7 1 2 65.16
8 1 3 64.91
9 1 4 64.53
10 1 5 63.70
end

label define w 1 "Women" 0 "Men"
label value w w

label define kk ///
1 "Keshorn - TRI " ///
2 "Oleksandr - UKR " ///
3 "Antti - FIN " ///
4 "Vitezslav - CZE" ///
5 "Tero - FIN" ///
6 " " ///
7 "Barbora - CZE " ///
8 "Christina - GER " ///
9 "Linda - GER " ///
10 "Sunette - RSA" ///
11 "Huihui - CHN"

label value id kk

replace id = id + 1 if id > 5

set obs `=_N+1'
replace id = 6 in l

tw bar mark id if id == 1, base(0) yla(0(10)90) ///
xti("Men Women") ///
xla(1(1)11, valuelabel ang(45)) ///
barw(0.4) color(cyan) || ///
bar mark id if id == 2, barw(0.4) color(cyan) || ///
bar mark id if id == 3, barw(0.4) color(cyan) || ///
bar mark id if id == 4, barw(0.4) color(cyan) || ///
bar mark id if id == 5, barw(0.4) color(cyan) || ///
bar mark id if id == 6, barw(0.4) color(cyan) || ///
bar mark id if id == 7, barw(0.4) color(red) || ///
bar mark id if id == 8, barw(0.4) color(red) || ///
bar mark id if id == 9, barw(0.4) color(red) || ///
bar mark id if id == 10, barw(0.4) color(red) || ///
bar mark id if id == 11, barw(0.4) color(red) || ///
sc mark id, mla(mark) ms(none) mlabsize(small) ///
mlabpos(12) ||, ///
leg(off) title("London Olympics 2012" "Javelin") ///
plotr(m(l = 5 b = 0))

gr export fzzszt2.png, replace

Treatment effects in three groups

Stata
* generate example data
clear
set obs 76
gen treatment = mod(_n,3)+1

gen w_before = rnormal(82, 4.8)
replace w_before = rnormal(81, 5.7) if treatment == 2
replace w_before = rnormal(83, 5) if treatment == 3

gen w_after = rnormal(85.6, 8.3)
replace w_after = rnormal(81.1, 4.7) if treatment == 2
replace w_after = rnormal(90, 5.4) if treatment == 3

label var w_before " Weight before treatment, lb"
label var w_after " Weight after treatment, lb"
label define treat ///
1 "Cognitive behavioural" ///
2 "Control" ///
3 "Family therapy"

label values treatment treat

bysort treatment (w_before w_after): gen order1 = _n - _N/2
tw ///
pcarrow w_before order1 w_after order1 || ///
sc w_before order1, ms(o) msize(small) ///
xla(none) xti("") yla(, ang(0)) yti("Weight lb") ///
by(treatment, row(1) note("") legend(off))
gr export 三组处理效果图2.png, replace

The label, type, and dir functions of the local command

local + label: getting a variable's label

Stata
/* 1: renaming with the local command */
clear all
set more off

global PATH "D:\数据库\中国市长市委书记数据库"
cd "$PATH"

import excel using ".\市委书记.xlsx",clear firstrow /* firstrow: the first row holds the column headers */
des

Stata
/* We want to replace the variable names with English ones */
foreach var of varlist *{
local label_name : var label `var'
if `"`label_name'"'=="省级政区代码"{
rename `var' p_id
}
if `"`label_name'"'=="省级政区名称"{
rename `var' p_name
}
if `"`label_name'"'=="地市级政区代码"{
rename `var' c_id
}
if `"`label_name'"'=="地市级政区名称"{
rename `var' c_name
}
if `"`label_name'"'=="年份"{
rename `var' year
}
if `"`label_name'"'=="市委书记姓名"{
rename `var' l_name
}
if `"`label_name'"'=="出生年份"{
rename `var' b_year
}
if `"`label_name'"'=="出生月份"{
rename `var' b_month
}
if `"`label_name'"'=="籍贯省份代码"{
rename `var' bp_id
}
if `"`label_name'"'=="籍贯省份名称"{
rename `var' bp_name
}
if `"`label_name'"'=="籍贯地市代码"{
rename `var' bc_id
}
if `"`label_name'"'=="籍贯地市名称"{
rename `var' bc_name
}
if `"`label_name'"'=="性别"{
rename `var' sex
}
if `"`label_name'"'=="民族"{
rename `var' nation
}
if `"`label_name'"'=="教育"{
rename `var' edc
}
if `"`label_name'"'=="是否是党校教育(是=1,否=0)"{
rename `var' dedc
}
if `"`label_name'"'=="专业:人文"{
rename `var' wk
}
if `"`label_name'"'=="专业:社科"{
rename `var' sk
}
if `"`label_name'"'=="专业:理工"{
rename `var' lg
}
if `"`label_name'"'=="专业:农科"{
rename `var' nk
}
if `"`label_name'"'=="专业:医科"{
rename `var' yk
}
if `"`label_name'"'=="是否是经济师(是=1,否=0)"{
rename `var' jjs
}
if `"`label_name'"'=="是否是工程师(是=1,否=0)"{
rename `var' gcs
}
if `"`label_name'"'=="入党年份"{
rename `var' rd_year
}
if `"`label_name'"'=="工作年份"{
rename `var' work_year
}
if `"`label_name'"'=="是否入伍(是=1,否=0)"{
rename `var' jm
}
if `"`label_name'"'=="工作经历:团委"{
rename `var' gztw
}
if `"`label_name'"'=="工作经历:秘书长副秘书长、办公室主任、**助理"{
rename `var' gzzl
}
}
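The outermost layer is a foreach loop over every variable in the dataset. In varlist *, the asterisk is a wildcard that matches any name, so the local macro var takes each variable of the dataset in turn. Inside the loop, the local command first fetches the label of the current variable and stores it in the local macro label_name. The if statements then compare that label with each expected text and, on a match, rename the variable accordingly.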
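
The long chain of if blocks can be shortened with two parallel lists, one of label texts and one of new names. The sketch below uses the auto dataset so that it runs anywhere; the new names cost and mileage are made up for illustration.

```stata
sysuse auto, clear

* label texts to look for, and the new name for each (same order)
local labels `" "Price" "Mileage (mpg)" "'
local names  cost mileage

foreach var of varlist * {
    local lbl : variable label `var'
    local i = 0
    foreach candidate of local labels {
        local ++i
        if `"`lbl'"' == `"`candidate'"' {
            * pick the new name at the same position as the matched label
            local newname : word `i' of `names'
            rename `var' `newname'
        }
    }
}
describe
```

Adding another variable then only takes one more entry in each list.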

local + type: separating variables by storage type

Stata
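/* Using type to separate variables by storage type */
/* First, winsorize a randomly generated variable x (winsor2 is a user-written command: ssc install winsor2) */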
clear all
set more off
set obs 100
gen x = uniform()
winsor2 x, label cuts(1,99)
sum x*, d

/* Which variables should be winsorized? Continuous ones. Stata has several storage types, so let us generate a few variables at random */
clear all
set more off
set obs 10
forvalues i = 1(2)5 {
gen x_`i' = uniform()
}
forvalues i=2(2)6{
gen x_`i'=uniform()
tostring x_`i',replace force
}
gen int x_7 = int(x_1*100)
gen double x_8 = x_3
gen byte x_9 = x_5>0.5
gen float x_10 = x_1
d x*

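/* String variables clearly cannot be winsorized. Byte variables are mostly dummy variables and should not be winsorized either.
So the plan is:
1: loop over all variables;
2: if the storage type is byte or str#, do not winsorize;
3: for any other type, winsorize. */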
clear all
set more off

global PATH "D:\数据库\中国市长市委书记数据库"
cd "$PATH"

import excel using ".\市委书记.xlsx",clear firstrow /* firstrow: the first row holds the column headers */

foreach var of varlist *{
local my_type: type `var'
disp "type of `var' is: `my_type'"
if strpos("`my_type'","str"){
continue
}
else if "`my_type'"=="byte"{
continue
}
else{
winsor2 `var', cuts(1,99) replace label
}
}
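
The same type test can be tried without the Excel file or winsor2. As a sketch on the auto dataset, the loop below only collects the candidate variables in a macro (continuous is an illustrative name) and summarizes them.

```stata
sysuse auto, clear

local continuous
foreach var of varlist * {
    local vtype : type `var'
    * skip string types (str1 ... str2045, strL) and byte dummies
    if substr("`vtype'", 1, 3) == "str" | "`vtype'" == "byte" {
        continue
    }
    local continuous `continuous' `var'
}

display "`continuous'"
summarize `continuous'
```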

local + dir: deleting specific files

Stata
/* erase deletes files from disk; the dir function of local collects file names */
clear all
set more off
global PATH "D:\Desktop\Stata笔记\示例文件夹"
cd "$PATH"

set obs 10
gen x=uniform()
outsheet using ".\1.txt",replace
outsheet using ".\1 (1).txt",replace
outsheet using ".\2.txt",replace
outsheet using ".\2 (1).txt",replace
outsheet using ".\3.txt",replace
outsheet using ".\3 (1).txt",replace

local filelist: dir . files "* (1)*.txt"
/* The line above stores in the local filelist every file name in the current directory that contains " (1)" and ends in ".txt" */
foreach file of local filelist{
disp "erase `file'"
erase ".\\`file'"
}
/* Next, create subdirectories and put conflicting backup copies into them as well */
clear all
set more off
global PATH "D:\Desktop\Stata笔记\示例文件夹"
cd "$PATH"

set obs 10
gen x=uniform()
outsheet using ".\1.txt",replace
outsheet using ".\1 - 副本.txt",replace
outsheet using ".\2.txt",replace
outsheet using ".\2 - 副本.txt",replace
outsheet using ".\3.txt",replace
outsheet using ".\3 - 副本.txt",replace

cap mkdir ".\abc"
cap mkdir ".\def"

cd "${PATH}\abc"
outsheet using ".\1.txt",replace
outsheet using ".\1 - 副本.txt",replace
outsheet using ".\2.txt",replace
outsheet using ".\2 - 副本.txt",replace
outsheet using ".\3.txt",replace
outsheet using ".\3 - 副本.txt",replace

cd "${PATH}\def"
outsheet using ".\1.txt",replace
outsheet using ".\2.txt",replace
outsheet using ".\3.txt",replace

cd "${PATH}"
cap program drop erasefile
program define erasefile
syntax, fromdir(string)
/* The objects in the current folder (given by the fromdir argument) are either files or folders.
Files can be stored in a local macro with local localname: dir dirname files "*",
and folders with local localname: dir dirname dirs "*" */

// 1: files -> display and erase
local flist: dir "`fromdir'" files "* - 副本*.txt"
foreach file of local flist {
disp "erase `fromdir'/`file'"
erase "`fromdir'/`file'"
}
// 2: subdirectories -> call the program recursively
local dlist: dir "`fromdir'" dirs "*"
foreach dir of local dlist{
erasefile , fromdir("`fromdir'/`dir'")
}
end

local cdir = "`c(pwd)'"
erasefile, fromdir("`cdir'")

Summary

Variable label: local label_name : variable label `var'
Storage type: local my_type : type `var'
Files in a directory: local filelist : dir "directory" files "* - 副本*.txt"
Subdirectories: local dlist : dir "directory" dirs "*"

lookfor: searching all variable names and labels for a string

Stata
webuse nlswork, clear
lookfor code
lookfor married
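
lookfor also leaves the matching variables in r(varlist), so the hits can be passed on to other commands. A minimal sketch, with hits as an illustrative macro name:

```stata
webuse nlswork, clear

* search names and labels, then keep the matches
lookfor code
local hits `r(varlist)'

* work with the matching variables only
describe `hits'
```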

Group-colored bars

Here the groups are colored using options of gr bar:

Stata
clear

input ///
id w pos mark
1 1 1 69.55
2 1 2 65.16
3 1 3 64.91
4 1 4 64.53
5 1 5 63.70
6 0 6 84.58
7 0 7 84.51
8 0 8 84.12
9 0 9 83.34
10 0 10 82.8
end

label define kk ///
1 "Barbora - CZE " ///
2 "Christina - GER " ///
3 "Linda - GER " ///
4 "Sunette - RSA" ///
5 "Huihui - CHN" ///
6 "Keshorn - TRI " ///
7 "Oleksandr - UKR " ///
8 "Antti - FIN " ///
9 "Vitezslav - CZE" ///
10 "Tero - FIN"

label value pos kk

label define w 1 "Women" 0 "Men"
label value w w

gen g1 = mark if _n < 6
gen g2 = mark if _n > 5

gr bar (asis) g1 g2, ascat ///
over(mark, sort(mark) descending ///
relabel( ///
5 "Barbora - CZE " ///
4 "Christina - GER " ///
3 "Linda - GER " ///
2 "Sunette - RSA" ///
1 "Huihui - CHN" ///
10 "Keshorn - TRI " ///
9 "Oleksandr - UKR " ///
8 "Antti - FIN " ///
7 "Vitezslav - CZE" ///
6 "Tero - FIN" ///
) label(ang(45))) ///
over(w) yla(0(10)90) ///
yti("Metres") blabel(bar, ///
pos(inside) format(%9.1f) ///
color(white)) ///
leg(off) bargap(5) title("London Olympics 2012" "Javelin") ///
nofill bar(1, color(dknavy)) bar(2, color(red))
gr export fzzszt1.png, replace


程振兴

程振兴 @czxa.top