Stata Old Notes Roundup (Part 13)

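My old website had many notes that were never properly organized. I have tidied up some of them before, but more than two hundred remain, so this post simply collects them in one place to make them easier to search.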

strtoname(): turning variable labels into variable names

. * First, create an example dataset

. clear all

. set obs 1
number of observations (_N) was 0, now 1

. gen v1 = 1

. gen v2 = 2

. gen v3 = 3

. label var v1 "法人代码"

. label var v2 "企业 名称"

. label var v3 "省地 代码"

. * Rename each variable after its label

. foreach v of varlist _all{
2. local label_v : var label `v'
3. local new_v = strtoname("`label_v'", 1)
4. rename `v' `new_v'
5. }

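. * strtoname(s,p) converts the string s into a valid Stata name, where p is 0 or 1. Any character that is not allowed in a Stata name is replaced with _, and the result is truncated to at most 32 characters.

. * For example, strtoname("a name",1) = "a_name"

. * When p = 1 and the first character of s is a digit, an underscore is added in front, e.g. strtoname("5",1) = "_5".

. * Note: for labels containing Unicode characters, such as the Chinese labels above, ustrtoname() is the Unicode-aware counterpart.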
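The effect of the p argument is easiest to see with plain ASCII labels. The following is a minimal sketch on made-up data (the labels "firm id" and "2019 sales" are hypothetical and not part of the original notes); it repeats the renaming loop above and then shows what p = 0 would return for a label that starts with a digit.

```stata
* Hypothetical example data with ASCII labels (not from the original notes)
clear
set obs 1
generate v1 = 1
generate v2 = 2
label variable v1 "firm id"
label variable v2 "2019 sales"

* Rename each variable after its label; p = 1 guards against a leading digit
foreach v of varlist _all {
    local lbl : variable label `v'
    local newname = strtoname("`lbl'", 1)
    rename `v' `newname'
}
describe, simple

* With p = 0 no underscore is prepended, so the result is not a usable name
display strtoname("2019 sales", 0)
```

After the loop the variables should be called firm_id and _2019_sales. The leading underscore is what makes the second name legal.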

suest: seemingly unrelated estimation

/*
Syntax: suest namelist [, options]
where namelist is a list of one or more names under which estimation
results were stored via estimates store. Wildcards may be used. *
and _all refer to all stored results. A period (.) may be used to
refer to the last estimation results, even if they have not (yet)
been stored.
*/
clear all
webuse sysdsn4, clear
mlogit insure age male
est store m1
mlogit insure age male if insure != "Uninsure":insure
est store m2
mlogit insure age male if insure != "Prepaid":insure
est store m3
hausman m2 m1, alleqs constant
hausman m3 m1, alleqs constant
* The hausman output suggests running a further test with suest
webuse income, clear
reg inc edu exp if male
est store m1
reg inc edu exp if !male
est store m2
suest m1 m2, vce(cluster famid)
test [m1_mean = m2_mean]

synth: the synthetic control method

. sysuse smoking, clear
(Tobacco Sales in 39 US States)

. tsset state year
panel variable: state (strongly balanced)
time variable: year, 1970 to 2000
delta: 1 unit
. synth cigsale beer lnincome retprice age15to24  cigsale(1988) cigsale(1980) cigsale(1975), trunit(3)  tr
> period(1989) xperiod(1980(1)1988) nested fig
----------------------------------------------------------------------------------------------------------
Synthetic Control Method for Comparative Case Studies
----------------------------------------------------------------------------------------------------------

First Step: Data Setup
----------------------------------------------------------------------------------------------------------
control units: for 38 of out 38 units missing obs for predictor beer in period 1980 -ignored for averaging
control units: for 38 of out 38 units missing obs for predictor beer in period 1981 -ignored for averaging
control units: for 38 of out 38 units missing obs for predictor beer in period 1982 -ignored for averaging
control units: for 38 of out 38 units missing obs for predictor beer in period 1983 -ignored for averaging
treated unit: for 1 of out 1 units missing obs for predictor beer in period 1980 -ignored for averaging
treated unit: for 1 of out 1 units missing obs for predictor beer in period 1981 -ignored for averaging
treated unit: for 1 of out 1 units missing obs for predictor beer in period 1982 -ignored for averaging
treated unit: for 1 of out 1 units missing obs for predictor beer in period 1983 -ignored for averaging
----------------------------------------------------------------------------------------------------------
Data Setup successful
----------------------------------------------------------------------------------------------------------
Treated Unit: California
Control Units: Alabama, Arkansas, Colorado, Connecticut, Delaware, Georgia, Idaho,
Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Minnesota,
Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Mexico,
North Carolina, North Dakota, Ohio, Oklahoma, Pennsylvania, Rhode Island,
South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia,
West Virginia, Wisconsin, Wyoming
----------------------------------------------------------------------------------------------------------
Dependent Variable: cigsale
MSPE minimized for periods: 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984
1985 1986 1987 1988
Results obtained for periods: 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984
1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
2000
----------------------------------------------------------------------------------------------------------
Predictors: beer lnincome retprice age15to24 cigsale(1988) cigsale(1980) cigsale(1975)
----------------------------------------------------------------------------------------------------------
Unless period is specified
predictors are averaged over: 1980 1981 1982 1983 1984 1985 1986 1987 1988
----------------------------------------------------------------------------------------------------------

Second Step: Run Optimization
----------------------------------------------------------------------------------------------------------
Nested optimization requested
Starting nested optimization module
Optimization done
----------------------------------------------------------------------------------------------------------
Optimization done
----------------------------------------------------------------------------------------------------------

Third Step: Obtain Results
----------------------------------------------------------------------------------------------------------
Loss: Root Mean Squared Prediction Error

---------------------
RMSPE | 1.757183
---------------------
----------------------------------------------------------------------------------------------------------
Unit Weights:

----------------------------
Co_No | Unit_Weight
---------------+------------
Alabama | 0
Arkansas | 0
Colorado | .161
Connecticut | .068
Delaware | 0
Georgia | 0
Idaho | 0
Illinois | 0
Indiana | 0
Iowa | 0
Kansas | 0
Kentucky | 0
Louisiana | 0
Maine | 0
Minnesota | 0
Mississippi | 0
Missouri | 0
Montana | .201
Nebraska | 0
Nevada | .235
New Hampshire | 0
New Mexico | 0
North Carolina | 0
North Dakota | 0
Ohio | 0
Oklahoma | 0
Pennsylvania | 0
Rhode Island | 0
South Carolina | 0
South Dakota | 0
Tennessee | 0
Texas | 0
Utah | .335
Vermont | 0
Virginia | 0
West Virginia | 0
Wisconsin | 0
Wyoming | 0
----------------------------
----------------------------------------------------------------------------------------------------------
Predictor Balance:

------------------------------------------------------
| Treated Synthetic
-------------------------------+----------------------
beer | 24.28 24.21326
lnincome | 10.07656 9.858694
retprice | 89.42222 89.41464
age15to24 | .1735324 .1735444
cigsale(1988) | 90.1 91.6356
cigsale(1980) | 120.2 120.4545
cigsale(1975) | 127.1 127.0633
------------------------------------------------------
----------------------------------------------------------------------------------------------------------

. gr export synth1.png, replace
(file synth1.png written in PNG format)

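. * synth: the synthetic control estimation command

. * Outcome (dependent) variable:

. * cigsale: per capita cigarette sales

. * Predictors (explanatory variables):

. * beer: per capita beer consumption

. * lnincome: natural log of per capita GDP

. * retprice: retail price of cigarettes

. * age15to24: share of the population aged 15-24

. * cigsale(1988) cigsale(1980) cigsale(1975): per capita cigarette sales in 1988, 1980, and 1975

. * trunit(3): the treated unit; California has id 3

. * trperiod(1989): the period in which the policy takes effect

. * xperiod(1980(1)1988): the predictors beer, lnincome, retprice, and age15to24 are averaged over 1980-1988 (beer is missing for 1980-1983, so, as the log notes, those years are ignored in its average)

. * nested: nested optimization, which searches for the best-fitting synthetic control at the cost of a longer run time

. * fig: plot the outcome paths of the treated unit and its synthetic control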

tabplot

* Stata's auto data:
sysuse auto, clear

tabplot rep78

tabplot rep78, showval

tabplot rep78, showval horizontal

tabplot for rep78

tabplot for rep78, showval

tabplot for rep78, percent(foreign) showval(offset(0.05) format(%2.1f))

tabplot for rep78, percent(foreign) sep(foreign) bar1(bcolor(red*0.5)) bar2(bcolor(blue*0.5)) showval(offset(0.05) format(%2.1f)) subtitle(% by origin)

tabplot rep78 mpg, xasis barw(1) bstyle(histogram)

egen mean = mean(mpg), by(rep78)

gen rep78_2 = 6 - rep78 - 0.05

bysort rep78 : gen byte tag = _n == 1
tabplot rep78 mpg, xasis barw(1) bstyle(histogram) addplot(scatter rep78_2 mean if tag)

* The name tag is already in use above, so this indicator gets its own name
egen mean2 = mean(mpg), by(foreign rep78)
egen tag2 = tag(foreign rep78)
tabplot foreign rep78 if tag2 [iw=mean2], showval(format(%2.1f)) subtitle(mean miles per gallon)

  • Stata’s radiologist assessment data:
    webuse rate2, clear
    tabplot rad?, percent showval

* 85 is the total number of observations reported by count
count
bysort rada radb : gen show = string(_N) + " " + string(_N * 100/85, "%2.1f") + "%"
tabplot rad?, showval(show) subtitle("frequency and %")

tabplot rad?, showval(show) xsc(alt) subtitle("frequency and %", pos(7))

  • Doran and Hodson (1975, p.259) gave these archaeological data:
    clear
    input levels freqcores freqblanks freqtools
    25 21 32 70
    24 36 52 115
    23 126 650 549
    22 159 2342 1633
    21 75 487 511
    20 176 1090 912
    19 132 713 578
    18 46 374 266
    17 550 6182 1541
    16 76 846 349
    15 17 182 51
    14 4 51 14
    13 29 228 130
    12 135 2227 729
    end
    reshape long freq, i(levels) j(type) string
    tabplot levels type [w=freq], bfcolor(none) horizontal barw(1) percent(levels) subtitle(% at each level) showval(offset(0.45)) xsc(r(0.8 .))

  • Greenacre (2007, p.42) gave these data from the Encuesta Nacional de la Salud (Spanish National Health Survey), 1997:
clear
input byte(agegroup health) long freq
1 1 243
1 2 789
1 3 167
1 4 18
1 5 6
2 1 220
2 2 809
2 3 164
2 4 35
2 5 6
3 1 147
3 2 658
3 3 181
3 4 41
3 5 8
4 1 90
4 2 469
4 3 236
4 4 50
4 5 16
5 1 53
5 2 414
5 3 306
5 4 106
5 5 30
6 1 44
6 2 267
6 3 284
6 4 98
6 5 20
7 1 20
7 2 136
7 3 157
7 4 66
7 5 17
end
label values agegroup agegroup
label def agegroup 1 "16-24", modify
label def agegroup 2 "25-34", modify
label def agegroup 3 "35-44", modify
label def agegroup 4 "45-54", modify
label def agegroup 5 "55-64", modify
label def agegroup 6 "65-74", modify
label def agegroup 7 "75+", modify
label values health health
label def health 1 "very good", modify
label def health 2 "good", modify
label def health 3 "regular", modify
label def health 4 "bad", modify
label def health 5 "very bad", modify
tabplot health agegroup [w=freq] , percent(agegroup) showval subtitle(% of age group) xtitle("") bfcolor(none)

clear
input str6 sex str8 year str1 policy int freq
"male" "1" "A" 175
"male" "1" "B" 116
"male" "1" "C" 131
"male" "1" "D" 17
"male" "2" "A" 160
"male" "2" "B" 126
"male" "2" "C" 135
"male" "2" "D" 21
"male" "3" "A" 132
"male" "3" "B" 120
"male" "3" "C" 154
"male" "3" "D" 29
"male" "4" "A" 145
"male" "4" "B" 95
"male" "4" "C" 185
"male" "4" "D" 44
"male" "Graduate" "A" 118
"male" "Graduate" "B" 176
"male" "Graduate" "C" 345
"male" "Graduate" "D" 141
"female" "1" "A" 13
"female" "1" "B" 19
"female" "1" "C" 40
"female" "1" "D" 5
"female" "2" "A" 5
"female" "2" "B" 9
"female" "2" "C" 33
"female" "2" "D" 3
"female" "3" "A" 22
"female" "3" "B" 29
"female" "3" "C" 110
"female" "3" "D" 6
"female" "4" "A" 12
"female" "4" "B" 21
"female" "4" "C" 58
"female" "4" "D" 10
"female" "Graduate" "A" 19
"female" "Graduate" "B" 27
"female" "Graduate" "C" 128
"female" "Graduate" "D" 13
end
tabplot policy year [w=freq], by(sex, subtitle(% by sex and year, place(w)) note("")) percent(sex year) showval

tabstat: tables of summary statistics

* tabstat: compact table of summary statistics
sysuse auto, clear
* Means are shown by default
tabstat price weight mpg rep78
* Means by category of rep78:
tabstat price weight mpg, by(rep78)
* Means by origin (foreign):
tabstat price weight mpg rep78, by(foreign)
* Choose the statistics:
tabstat price weight mpg rep78, by(foreign) stat(mean sd min max)
* Suppress the overall total:
tabstat price weight mpg rep78, by(foreign) stat(mean sd min max) nototal
* Show the statistic names in the row stub:
tabstat price weight mpg rep78, by(foreign) stat(mean sd min max) nototal long
* Display the statistics in each variable's own display format:
tabstat price weight mpg rep78, by(foreign) stat(mean sd min max) nototal long format
* Specify a format:
tabstat price weight mpg rep78, by(foreign) stat(mean sd min max) nototal long format(%6.2f)
* Put variables in rows and statistics in columns:
tabstat price weight mpg rep78, by(foreign) stat(mean sd min max) nototal format(%6.2f) long col(stat)
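The calls above only print tables. If the numbers are needed later in a do-file, tabstat's save option leaves them in r(). The sketch below goes slightly beyond the original notes: it copies the overall statistics into a matrix named T (an arbitrary name chosen here) and reads one cell back.

```stata
sysuse auto, clear

* save stores the results in r(): r(StatTotal) for the overall table,
* r(Stat1), r(Stat2), ... and r(name1), r(name2), ... for each by() group
tabstat price weight mpg, by(foreign) stat(mean sd) save

* Copy the overall table right away, before another command overwrites r()
matrix T = r(StatTotal)
matrix list T

* Rows are statistics and columns are variables, so T[1,1] is the mean of price
display "mean price = " T[1,1]
```

Note that r(StatTotal) is not available when nototal is specified, which is why that option is left out here.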

tddens

* ssc install tddens
sysuse auto, clear
tddens price mpg, s b
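gr export tddens.png, replace  // the original used "gre tddens", apparently a personal graph-export shortcut; the file name is assumed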
* h tddens
* A "heat map" is produced by default;
* option s:graph shows a surface plot as points;
* option b:graph shows a bar graph that offers an alternative view of the surface.

tokenize

  • tokenize – Divide strings into tokens

. * tokenize -- Divide strings into tokens

. clear all

. tokenize some words

. di "1 = |`1'|, 2 = |`2'|, 3 = |`3'|"
1 = |some|, 2 = |words|, 3 = ||

. tokenize "some more words"

. di "1 = |`1'|, 2 = |`2'|, 3 = |`3'|, 4 = |`4'|"
1 = |some|, 2 = |more|, 3 = |words|, 4 = ||

. tokenize `""Marcello Pagan""Rino Bellocco""'

. di "1 = |`1'|, 2 = |`2'|, 3 = |`3'|"
1 = |Marcello Pagan|, 2 = |Rino Bellocco|, 3 = ||

.
. local str "A strange++string"

. tokenize `str'

. di "1 = |`1'|, 2 = |`2'|, 3 = |`3'|"
1 = |A|, 2 = |strange++string|, 3 = ||

.
. tokenize `str', parse(" +")

. di "1 = |`1'|, 2 = |`2'|, 3 = |`3'|, 4 = |`4'|, 5 = |`5'|, 6 = |`6'|"
1 = |A|, 2 = |strange|, 3 = |+|, 4 = |+|, 5 = |string|, 6 = ||

.
. tokenize `str', parse("")

. di "1 = |`1'|, 2 = |`2'|, 3 = |`3'|, 4 = |`4'|, 5 = |`5'|, 6 = |`6'|"
1 = |A|, 2 = |strange++string|, 3 = ||, 4 = ||, 5 = ||, 6 = ||

.
. tokenize

. di "1 = |`1'|, 2 = |`2'|, 3 = |`3'|"
1 = ||, 2 = ||, 3 = ||
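  • tokenize is often used in ado programming, for example in my cupdate command (its download step ran into problems, so I do not plan to release it).
  • The command has two subcommands: check and install.
cap prog drop _all
prog define cupdate
    syntax anything, [Path(string) Itool(string) Curlpath(string) Wgetpath(string) Axelpath(string)]

    * After tokenize, `1' is the subcommand and `2' is its argument
    tokenize `anything'
    local anything "`2'"

    * cupdatecheck and cupdateinstall are the two helper programs (not shown here)
    if "`1'" == "check" {
        cupdatecheck `anything'
        exit
    }
    else {
        cupdateinstall `anything'
        exit
    }
end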
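Because cupdate depends on helper programs that are not shown, here is a self-contained sketch of the same subcommand dispatch pattern. The program name mytool and its messages are hypothetical; only syntax, tokenize, and the r-class return mechanism are standard Stata.

```stata
capture program drop mytool
program define mytool, rclass
    * Hypothetical dispatcher: first token = subcommand, second token = target
    syntax anything
    tokenize `anything'
    local sub "`1'"
    local target "`2'"

    if "`sub'" == "check" {
        display as text "would check: `target'"
    }
    else if "`sub'" == "install" {
        display as text "would install: `target'"
    }
    else {
        display as error "unknown subcommand: `sub'"
        exit 198
    }

    * Return the parsed pieces so the caller can inspect them
    return local sub "`sub'"
    return local target "`target'"
end

mytool check somepkg
return list
```

Rejecting unknown subcommands explicitly, rather than letting everything that is not check fall through to the install branch, keeps a typo from triggering an install.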

A closer look at trace

Study notes on a post from 爬虫俱乐部

  1. tracedepth: set tracedepth controls how many levels of nested programs are traced. As an example, define a small program, regprice:
program define regprice
sysuse auto, clear
reg price mpg length
end

regprice // shows only the regression output
  1. With set trace on and set tracedepth 1, only one level is traced:
set tracedepth 1
set trace on
regprice
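  • Besides the regression output, the trace shows that regprice first ran sysuse auto and then the regression command. Setting tracedepth to 2 descends one level further: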
set tracedepth 2
set trace on
regprice
  1. tracenumber
  • set tracenumber on prefixes each traced line with its nesting level. tracenumber is off by default.
set tracedepth 2
set tracenumber on
set trace on
regprice
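  • regprice itself is the first level; because tracedepth is set to 2, the trace also descends into the programs nested below it.
  1. tracesep: set tracesep controls whether a horizontal separator line is drawn between nesting levels. The separators are shown by default; set tracesep off hides them.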
set tracedepth 2
set tracenumber on
set tracesep off
set trace on
regprice
  1. traceindent: controls whether traced lines are indented according to their nesting level (on by default)
set tracedepth 2
set traceindent off // turn off indentation
set tracenumber on
set tracesep off
set trace on
regprice
  1. tracehilite: highlight a keyword in a large volume of trace output
set tracedepth 2
set tracenumber on
set tracehilite "reg" // highlight reg
set tracesep off
set trace on
regprice
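  1. traceexpand: set traceexpand controls whether the trace also shows each line with its macros expanded. The default is on.
set trace on
* With traceexpand on, each execution of disp `i' is followed by the same line with `i' expanded
forvalues i = 1/2 {
    disp `i'
}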

set trace on
set traceexpand off
forvalues i = 1/2 {
disp `i'
}
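None of the snippets above switch tracing back off, so every later command in the session would keep producing trace output. A minimal sketch of a tidy tracing session follows; it redefines regprice so that it runs on its own, and the reset values are what I understand to be Stata's defaults (tracedepth 32000, tracesep, traceindent and traceexpand on, tracenumber off).

```stata
capture program drop regprice
program define regprice
    sysuse auto, clear
    regress price mpg length
end

* Trace a single call, one level deep
set tracedepth 1
set trace on
regprice
set trace off

* Restore the trace settings changed in this section to their defaults
set tracedepth 32000
set tracesep on
set traceindent on
set traceexpand on
set tracenumber off
```

The current values can be checked at any time with creturn list, under the trace settings heading.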

trellis

* Trellis plot
* ssc install trellis
webuse nhanes2f, clear
trellis, by(health region) f(graph box copper zinc iron) fopt(leg(off) ylab(50 175 300) ysc(r(50 310))) sr(2) sc(2) ///
    singleopt(leg(on ring(0) pos(1) col(1) bm(tiny) ///
    symx(*0.2) keyg(*0.2) region(m(zero) lw(none))) ysc(r(50 310)))
gr export trellis.png, width(3600) height(2400)
* Remove the intermediate .gph files (fs is a community-contributed command: ssc install fs)
fs *.gph
foreach i in `r(files)' {
    erase `i'
}

triplot

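  • For demonstration only; the data carry no real meaning here
sysuse auto, clear
* triplot expects three variables with a constant sum (such as 1), so rescale them to shares
egen x = rowtotal(gear_ratio headroom rep78)
replace gear_ratio = gear_ratio/x
replace headroom = headroom/x
replace rep78 = rep78/x
triplot gear_ratio headroom rep78
gr export triplot.png, replace  // the original used "gre triplot", apparently a personal graph-export shortcut; the file name is assumed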

ttable2: mean-comparison tests for several variables

* ttable2: mean-comparison t tests for several variables at once
ssc install ttable2
clear all
sysuse auto, clear
ttable2 price mpg rep78 headroom trunk weight length turn, by(foreign)
* Export the results (logout is a community-contributed command: ssc install logout)
logout, save(tt2) word replace: ttable2 price mpg rep78 headroom trunk weight length turn, by(foreign)

Plotting with scatter

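Controlling the markers

* Load the data first (assumed to be allstates, which the later examples load with vguse)
vguse allstates, clear
tw sc ownhome propval100 ///
[aw = rent700], ///
ms(Sh) ///
mcolor(maroon) ///
msize(small)
gr export sc1.png, replace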

Using marker labels

tw sc ownhome propval100, ///
mla(stateab) ///
mlabsize(large) ///
mlabpos(12) ///
ms(i)
gr export sc2.png, replace

egen + mlabvpos(): vary the label positions to reduce overlap:

* mlabvpos() is an egen function from egenmore (ssc install egenmore)
egen clock = mlabvpos(ownhome propval100)
tw sc ownhome propval100, ///
mla(stateab) ///
mlabsize(small) ///
mlabvpos(clock)
gr export sc3.png, replace

The connect() option

set scheme vg_rose
tw sc fv ownhome propval100, ///
connect(l i) ///
sort ///
ms(i .)
gr export sc4.png, replace
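  • connect(l i): join the points of the first y variable with a line, but not those of the second
  • ms(i .): draw no markers for the first y variable and the default marker for the second; i stands for "invisible symbol"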

Legend

tw sc fv ownhome propval100, ///
connect(l i) ///
sort ///
ms(i .) ///
leg(label(1 "Pred. Perc. Own") ///
order(2 1) c(1))
gr export sc5.png, replace

Axis titles and labels

tw sc fv ownhome propval100, ///
connect(l i) ///
sort ///
ms(i .) ///
leg(label(1 "Pred. Perc. Own") ///
order(2 1) c(1)) ///
yti("Percent who own home", size(small)) ///
xla(#10) ///
yla(40(5)80, alt nogrid)
gr export sc6.png, replace

Adding lines

tw sc fv ownhome propval100, ///
connect(l i) ///
sort ///
ms(i .) ///
leg(label(1 "Pred. Perc. Own") ///
order(2 1) c(1)) ///
yti("Percent who own home", size(small)) ///
xla(#10) ///
yla(40(5)80, alt nogrid) ///
yline(55 75)
gr export sc7.png, replace

Changing the axis position

tw sc fv ownhome propval100, ///
connect(l i) ///
sort ///
ms(i .) ///
leg(label(1 "Pred. Perc. Own") ///
order(2 1) c(1)) ///
yti("Percent who own home", size(small)) ///
xla(#10) ///
yla(40(5)80, alt nogrid) ///
xsc(alt)
gr export sc8.png, replace

Faceting with by()

tw sc fv ownhome propval100, ///
by(nsw, total)
gr export sc9.png, replace

Plotting with tw spike/dropline/dot/pcspike/pccapsym/pcarrow/pci/pcarrowi

These commands are used in much the same way as scatter.

spike

vguse allstates, clear
tw spike r yhat, ///
lcolor(red) ///
lw(thick) ///
base(10) ///
yline(10)
gr export spikery.png, replace

tw spike r yhat, ///
lcolor(red) ///
lw(thick) ///
base(10) ///
horiz ///
xti(Title for x axis) ///
yti(Title for y axis)
gr export spikery1.png, replace

dropline

tw dropline r yhat, ///
ms(D) ///
msize(large) ///
mcolor(purple) ///
mlwidth(thick) ///
lcolor(red)
gr export droplinery.png, replace

dot

vguse spjanfeb2001, clear
tw dot close tradeday, ///
msize(large) ms(O) ///
mfcolor(eltgreen) ///
mlcolor(emerald) ///
mlwidth(thick)
gr export dotry.png, replace

pcspike

vguse nlswide1, clear
egen clock = mlabvpos(wage88 hours88)
tw ///
pcspike wage68 hours68 wage88 hours88 || ///
sc wage88 hours88, ///
ms(i) ///
mlabel(occ) ///
mlabsize(small) ///
mlabvpos(clock) ///
sch(vg_blue)
gr export pcspikery.png, replace

pccapsym

tw ///
pccapsym wage68 hours68 wage88 hours88, ///
mlabel(occ) mlabsize(small) headlabel
gr export pccapsymry.png, replace

  • headlabel: place the labels at the head (the second point) of each line rather than at its tail

pcscatter: no lines

tw ///
pcscatter wage68 hours68 wage88 hours88, ///
mlabel(occ) mlabsize(small)
gr export pcscatterry.png, replace

pcarrow

tw ///
pcarrow wage68 hours68 wage88 hours88, ///
mlabel(occ) mlabsize(small) headlabel
gr export pcarrow.png, replace

pci/pcarrowi

vguse allstates, clear
tw ///
sc ownhome propval100 || ///
pci 42.5 26 42.5 61.3, ///
lwidth(medthick) ///
lcolor(red)
gr export pci1.png, replace

tw ///
sc ownhome propval100 || ///
pcarrowi 42.5 26 42.5 61.3, ///
lwidth(medthick) ///
lcolor(red) ///
msize(10) ///
barbsize(6) ///
mcolor(maroon)
gr export pcarrowi2.png, replace

venndiag: Venn diagrams

clear
input hayfever eczema asthma freq
1 0 0 31088
1 1 0 9863
0 1 0 43522
0 1 1 9258
0 0 1 35299
1 0 1 11024
1 1 1 6200
0 0 0 345262
end
list
expand freq
venndiag asthma eczema hayfever

vtokenize: splitting a string variable into tokens

. * By default, split on spaces

. sysuse auto, clear
(1978 Automobile Data)

. keep make

. vtokenize make

. list if make_1 != "" & make_2 != "" & make_3 != "", sep(0)

     +-----------------------------------------------+
     | make                 make_1   make_2   make_3 |
     |-----------------------------------------------|
 17. | Chev. Monte Carlo     Chev.    Monte    Carlo |
 23. | Dodge St. Regis       Dodge      St.    Regis |
 27. | Linc. Mark V          Linc.     Mark        V |
 36. | Olds Cutl Supr         Olds     Cutl     Supr |
 38. | Olds Delta 88          Olds    Delta       88 |
 49. | Pont. Grand Prix      Pont.    Grand     Prix |
 50. | Pont. Le Mans         Pont.       Le     Mans |
 65. | Renault Le Car      Renault       Le      Car |
     +-----------------------------------------------+
. * Use the period as an additional delimiter while still splitting on spaces, and use bar as the stub for the new variables

. sysuse auto, clear
(1978 Automobile Data)

. keep make

. vtokenize make, stub(bar) parse(".")

. list if bar_1 != "" & bar_2 != "" & bar_3 != "" & bar_4 != "", sep
> (0)

     +---------------------------------------------------+
     | make                bar_1   bar_2   bar_3   bar_4 |
     |---------------------------------------------------|
 17. | Chev. Monte Carlo    Chev       .   Monte   Carlo |
 23. | Dodge St. Regis     Dodge      St       .   Regis |
 27. | Linc. Mark V         Linc       .    Mark       V |
 49. | Pont. Grand Prix     Pont       .   Grand    Prix |
 50. | Pont. Le Mans        Pont       .      Le    Mans |
     +---------------------------------------------------+

.
. * Stop splitting on spaces

. sysuse auto, clear
(1978 Automobile Data)

. keep make

. vtokenize make, stub(bar) parse(".") nospace

. list if bar_1 != "" & bar_2 != "", sep(0)

     +----------------------------------------------------+
     | make                   bar_1   bar_2         bar_3 |
     |----------------------------------------------------|
 11. | Cad. Deville             Cad       .       Deville |
 12. | Cad. Eldorado            Cad       .      Eldorado |
 13. | Cad. Seville             Cad       .       Seville |
 14. | Chev. Chevette          Chev       .      Chevette |
 15. | Chev. Impala            Chev       .        Impala |
 16. | Chev. Malibu            Chev       .        Malibu |
 17. | Chev. Monte Carlo       Chev       .   Monte Carlo |
 18. | Chev. Monza             Chev       .         Monza |
 19. | Chev. Nova              Chev       .          Nova |
 23. | Dodge St. Regis     Dodge St       .         Regis |
 26. | Linc. Continental       Linc       .   Continental |
 27. | Linc. Mark V            Linc       .        Mark V |
 28. | Linc. Versailles        Linc       .    Versailles |
 29. | Merc. Bobcat            Merc       .        Bobcat |
 30. | Merc. Cougar            Merc       .        Cougar |
 31. | Merc. Marquis           Merc       .       Marquis |
 32. | Merc. Monarch           Merc       .       Monarch |
 33. | Merc. XR-7              Merc       .          XR-7 |
 34. | Merc. Zephyr            Merc       .        Zephyr |
 42. | Plym. Arrow             Plym       .         Arrow |
 43. | Plym. Champ             Plym       .         Champ |
 44. | Plym. Horizon           Plym       .       Horizon |
 45. | Plym. Sapporo           Plym       .       Sapporo |
 46. | Plym. Volare            Plym       .        Volare |
 47. | Pont. Catalina          Pont       .      Catalina |
 48. | Pont. Firebird          Pont       .      Firebird |
 49. | Pont. Grand Prix        Pont       .    Grand Prix |
 50. | Pont. Le Mans           Pont       .       Le Mans |
 51. | Pont. Phoenix           Pont       .       Phoenix |
 52. | Pont. Sunbird           Pont       .       Sunbird |
     +----------------------------------------------------+

.
. * Do not keep the delimiters as separate variables

. sysuse auto, clear
(1978 Automobile Data)

. keep make

. vtokenize make, stub(bar) parse(".") nospace nodelimiters

. list if bar_1 != "" & bar_2 != "", sep(0)

     +--------------------------------------------+
     | make                   bar_1         bar_2 |
     |--------------------------------------------|
 11. | Cad. Deville             Cad       Deville |
 12. | Cad. Eldorado            Cad      Eldorado |
 13. | Cad. Seville             Cad       Seville |
 14. | Chev. Chevette          Chev      Chevette |
 15. | Chev. Impala            Chev        Impala |
 16. | Chev. Malibu            Chev        Malibu |
 17. | Chev. Monte Carlo       Chev   Monte Carlo |
 18. | Chev. Monza             Chev         Monza |
 19. | Chev. Nova              Chev          Nova |
 23. | Dodge St. Regis     Dodge St         Regis |
 26. | Linc. Continental       Linc   Continental |
 27. | Linc. Mark V            Linc        Mark V |
 28. | Linc. Versailles        Linc    Versailles |
 29. | Merc. Bobcat            Merc        Bobcat |
 30. | Merc. Cougar            Merc        Cougar |
 31. | Merc. Marquis           Merc       Marquis |
 32. | Merc. Monarch           Merc       Monarch |
 33. | Merc. XR-7              Merc          XR-7 |
 34. | Merc. Zephyr            Merc        Zephyr |
 42. | Plym. Arrow             Plym         Arrow |
 43. | Plym. Champ             Plym         Champ |
 44. | Plym. Horizon           Plym       Horizon |
 45. | Plym. Sapporo           Plym       Sapporo |
 46. | Plym. Volare            Plym        Volare |
 47. | Pont. Catalina          Pont      Catalina |
 48. | Pont. Firebird          Pont      Firebird |
 49. | Pont. Grand Prix        Pont    Grand Prix |
 50. | Pont. Le Mans           Pont       Le Mans |
 51. | Pont. Phoenix           Pont       Phoenix |
 52. | Pont. Sunbird           Pont       Sunbird |
     +--------------------------------------------+
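For the simplest case, splitting on spaces, Stata's built-in split command gives a comparable result without installing anything. This is offered only as a point of comparison, not as a description of how vtokenize works; the stub make_ is chosen here so the new variable names match the first vtokenize example.

```stata
sysuse auto, clear
keep make

* Built-in split: parses on spaces by default and creates make_1, make_2, ...
split make, generate(make_)

* Same listing as in the first vtokenize example
list make make_1 make_2 make_3 if make_3 != "", sep(0)
```

Unlike vtokenize with its default settings for parse(), split discards the parse strings rather than keeping them as separate variables.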

winsor2: winsorizing or trimming

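Handling outliers: winsorize or trim?

  • Winsorizing
  • For example, winsorizing a variable at 1% means that any observation above the variable's 99th percentile is set to the 99th percentile, and any observation below the 1st percentile is set to the 1st percentile. The older winsor command (community-contributed, from SSC, rather than built into Stata) handles one variable per call with a single tail fraction. winsor2 goes further: it winsorizes or trims several variables in a single call and accepts a separate cutoff for each tail:
winsor2 varlist [if] [in], [suffix(string) replace trim cuts(# #) by(groupvar) label]
  • suffix(): suffix for the new variables; the default is _w (winsorized) or _tr (trimmed);
  • replace: overwrite the original variables;
  • trim: set values beyond the cutoff percentiles to missing instead of winsorizing them;
  • cuts(# #): the lower and upper percentiles at which to winsorize or trim; the default is cuts(1 99);
  • by(): winsorize or trim within groups;
  • label: label the new variables.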
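sysuse nlsw88, clear
keep wage ttl_exp
* With no options, winsorize at the 1st and 99th percentiles and create new variables with the suffix _w
winsor2 wage ttl_exp
* Winsorize at the 0.5th and 99.5th percentiles, overwriting the originals
winsor2 wage ttl_exp, cuts(0.5 99.5) replace
* Trim instead: values outside the 1st-99th percentiles become missing in new _tr variables
winsor2 wage ttl_exp, cuts(1 99) trim
* Winsorize the upper tail only (at the 99th percentile), overwriting the originals
winsor2 wage ttl_exp, cuts(0 99) replace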
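To make the difference between winsorizing and trimming concrete, the sketch below does both by hand with built-in commands only. It illustrates the idea and is not winsor2's actual implementation; the names p_lo, p_hi, wage_manual, and wage_trim are made up for this example, and winsor2 may compute its percentiles slightly differently.

```stata
sysuse nlsw88, clear

* 1st and 99th percentiles of wage
quietly summarize wage, detail
scalar p_lo = r(p1)
scalar p_hi = r(p99)

* Winsorize: pull extreme values in to the cutoffs
generate double wage_manual = wage
replace wage_manual = p_lo if wage < p_lo
replace wage_manual = p_hi if wage > p_hi & !missing(wage)

* Trim: keep only values inside the cutoffs, everything else becomes missing
generate double wage_trim = wage if inrange(wage, p_lo, p_hi)

summarize wage wage_manual wage_trim
```

Winsorizing keeps the sample size and only compresses the tails, whereas trimming loses the extreme observations, which shows up as a smaller N for wage_trim in the summary.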

xi: interaction expansion

. * Interaction expansion

. sysuse auto, clear
(1978 Automobile Data)

. xi: regress mpg i.rep78
i.rep78 _Irep78_1-5 (naturally coded; _Irep78_1 omitted)

Source | SS df MS Number of obs = 69
-------------+---------------------------------- F(4, 64) = 4.91
Model | 549.415777 4 137.353944 Prob > F = 0.0016
Residual | 1790.78712 64 27.9810488 R-squared = 0.2348
-------------+---------------------------------- Adj R-squared = 0.1869
Total | 2340.2029 68 34.4147485 Root MSE = 5.2897

------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Irep78_2 | -1.875 4.181884 -0.45 0.655 -10.22927 6.479274
_Irep78_3 | -1.566667 3.863059 -0.41 0.686 -9.284014 6.150681
_Irep78_4 | .6666667 3.942718 0.17 0.866 -7.209818 8.543152
_Irep78_5 | 6.363636 4.066234 1.56 0.123 -1.759599 14.48687
_cons | 21 3.740391 5.61 0.000 13.52771 28.47229
------------------------------------------------------------------------------

. * Interpretation: i.rep78 expanded to the dummies _Irep78_1, _Irep78_2, ...,
_Irep78_5. The numbers on the end are "naturally" coded in the sense that
_Irep78_1 corresponds to rep78==1, _Irep78_2 to rep78==2, etc. Finally, the
dummy for rep78==1 was omitted.

. xi: regress mpg i.make
i.make _Imake_1-74 (_Imake_1 for make==AMC Concord omitted)

Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(73, 0) = .
Model | 2443.45946 73 33.4720474 Prob > F = .
Residual | 0 0 . R-squared = 1.0000
-------------+---------------------------------- Adj R-squared = .
Total | 2443.45946 73 33.4720474 Root MSE = 0

------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Imake_2 | -5 . . . . .
_Imake_3 | 1.73e-13 . . . . .
_Imake_4 | -5 . . . . .
_Imake_5 | 1 . . . . .
_Imake_6 | 3 . . . . .
_Imake_7 | -2 . . . . .
_Imake_8 | -7 . . . . .
_Imake_9 | -4 . . . . .
_Imake_10 | 4 . . . . .
_Imake_11 | -2 . . . . .
_Imake_12 | -6 . . . . .
_Imake_13 | -3 . . . . .
_Imake_14 | -8 . . . . .
_Imake_15 | -8 . . . . .
_Imake_16 | -1 . . . . .
_Imake_17 | 7 . . . . .
_Imake_18 | -6 . . . . .
_Imake_19 | 1.72e-13 . . . . .
_Imake_20 | 1.72e-13 . . . . .
_Imake_21 | 2 . . . . .
_Imake_22 | -3 . . . . .
_Imake_23 | 1 . . . . .
_Imake_24 | 13 . . . . .
_Imake_25 | 2 . . . . .
_Imake_26 | -1 . . . . .
_Imake_27 | 8 . . . . .
_Imake_28 | -4 . . . . .
_Imake_29 | -6 . . . . .
_Imake_30 | -5 . . . . .
_Imake_31 | -1 . . . . .
_Imake_32 | 6 . . . . .
_Imake_33 | -1 . . . . .
_Imake_34 | 3 . . . . .
_Imake_35 | 6 . . . . .
_Imake_36 | -10 . . . . .
_Imake_37 | -10 . . . . .
_Imake_38 | -8 . . . . .
_Imake_39 | 8 . . . . .
_Imake_40 | 1.71e-13 . . . . .
_Imake_41 | -8 . . . . .
_Imake_42 | -7 . . . . .
_Imake_43 | -4 . . . . .
_Imake_44 | -8 . . . . .
_Imake_45 | -2 . . . . .
_Imake_46 | -1 . . . . .
_Imake_47 | -3 . . . . .
_Imake_48 | -3 . . . . .
_Imake_49 | -4 . . . . .
_Imake_50 | -3 . . . . .
_Imake_51 | 2 . . . . .
_Imake_52 | -6 . . . . .
_Imake_53 | -8 . . . . .
_Imake_54 | 6 . . . . .
_Imake_55 | 12 . . . . .
_Imake_56 | 3 . . . . .
_Imake_57 | 4 . . . . .
_Imake_58 | -4 . . . . .
_Imake_59 | -4 . . . . .
_Imake_60 | -4 . . . . .
_Imake_61 | -3 . . . . .
_Imake_62 | -3 . . . . .
_Imake_63 | -3 . . . . .
_Imake_64 | 2 . . . . .
_Imake_65 | 4 . . . . .
_Imake_66 | 13 . . . . .
_Imake_67 | -4 . . . . .
_Imake_68 | 9 . . . . .
_Imake_69 | -4 . . . . .
_Imake_70 | 1 . . . . .
_Imake_71 | 19 . . . . .
_Imake_72 | 3 . . . . .
_Imake_73 | 3 . . . . .
_Imake_74 | -5 . . . . .
_cons | 22 . . . . .
------------------------------------------------------------------------------

.
. * xi can also be used as a standalone command, not only as a prefix

. use bpress, clear
(fictional blood-pressure data)

. tab agegrp

Age Group | Freq. Percent Cum.
------------+-----------------------------------
30-45 | 80 33.33 33.33
46-59 | 80 33.33 66.67
60+ | 80 33.33 100.00
------------+-----------------------------------
Total | 240 100.00

. xi i.agegrp
i.agegrp _Iagegrp_1-3 (naturally coded; _Iagegrp_1 omitted)

. xi: reg bp i.agegrp
i.agegrp _Iagegrp_1-3 (naturally coded; _Iagegrp_1 omitted)

Source | SS df MS Number of obs = 240
-------------+---------------------------------- F(2, 237) = 22.96
Model | 6640.15833 2 3320.07917 Prob > F = 0.0000
Residual | 34272.6375 237 144.610285 R-squared = 0.1623
-------------+---------------------------------- Adj R-squared = 0.1552
Total | 40912.7958 239 171.183246 Root MSE = 12.025

------------------------------------------------------------------------------
bp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iagegrp_2 | 4.9375 1.901383 2.60 0.010 1.19173 8.68327
_Iagegrp_3 | 12.775 1.901383 6.72 0.000 9.02923 16.52077
_cons | 148 1.344481 110.08 0.000 145.3513 150.6487
------------------------------------------------------------------------------

. xi: reg patient i.agegrp i.sex
i.agegrp _Iagegrp_1-3 (naturally coded; _Iagegrp_1 omitted)
i.sex _Isex_0-1 (naturally coded; _Isex_0 omitted)

Source | SS df MS Number of obs = 240
-------------+---------------------------------- F(3, 236) = 2760.23
Model | 280000 3 93333.3333 Prob > F = 0.0000
Residual | 7980 236 33.8135593 R-squared = 0.9723
-------------+---------------------------------- Adj R-squared = 0.9719
Total | 287980 239 1204.93724 Root MSE = 5.8149

------------------------------------------------------------------------------
patient | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iagegrp_2 | 20 .9194232 21.75 0.000 18.18867 21.81133
_Iagegrp_3 | 40 .9194232 43.51 0.000 38.18867 41.81133
_Isex_1 | 60 .7507059 79.92 0.000 58.52106 61.47894
_cons | 10.5 .7507059 13.99 0.000 9.021059 11.97894
------------------------------------------------------------------------------
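Since Stata 11, factor-variable notation covers most of what the xi: prefix does, without creating _I* variables in the dataset. The sketch below refits the first regression above that way. The second model, with a foreign-by-weight interaction, is an extra illustration that is not in the original notes.

```stata
sysuse auto, clear

* Same model as "xi: regress mpg i.rep78", with no _Irep78_* variables created
regress mpg i.rep78

* Coefficients are addressed by level: 2.rep78 corresponds to _Irep78_2
display _b[2.rep78]

* Main effects plus the interaction of a categorical and a continuous variable
regress mpg i.foreign##c.weight
```

xi is still needed for the few commands that do not accept factor variables, or when the dummy variables themselves are wanted in the data.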

zdemo and zdemo2

* Install
// search zdemo
// search zdemo2

* zdemo: plot the standard normal distribution with vertical lines at the given z values
zdemo 1.25

zdemo -1 0 +1 2 -2

* zdemo2: plot two normal distributions from two (mean, sd) pairs
zdemo2 50 5 60 10

zipfile and unzipfile

  • These two commands create and extract zip archives from within Stata
clear all
* Create some data files first
sysuse auto, clear
forval i = 1/100 {
    outsheet using auto`i'.txt, replace
    save auto`i', replace
}
zipfile *.dta, saving(zipdta, replace)
zipfile *.txt, saving(ziptxt, replace)
zipfile auto*.*, saving(zipauto, replace)
* Delete the data files
forval i = 1/100 {
    erase auto`i'.txt
    erase auto`i'.dta
}
* Extract the archives
unzipfile zipauto, replace
unzipfile zipdta, replace
unzipfile ziptxt, replace

程振兴 @czxa.top