Stata旧笔记整理(二)

Stata旧笔记整理(二)

之前老网站上有很多没有很好整理的笔记。之前也整理过一些,但是还有两百多篇,所以就简单汇总一下,便于检索。

批量提取统计检验的结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
sysuse auto, clear
mat a = J(6,2,0)
local i = 1
foreach v of varlist price mpg rep78 headroom trunk weight{
qui ttest `v' == 0
mat a[`i',1] = r(t)
mat a[`i',2] = r(p_u)
local i = `i' + 1
}
mat list a
svmat a, names(col)
// 这个命令可以将矩阵转化成dta文件里的变量,同时使用矩阵中的列名命名新变量
rename c1 t
rename c2 p_value
list t p_value in 1/6

利用蒲松投针试验计算圆周率

1
2
3
4
5
6
7
8
9
10
11
12
13
clear
set more off
set obs 1000000
gen y = uniform()*100
gen theta = uniform()*c(pi)
gen d = min(mod(y,1), 1-mod(y,1))
gen sin = 0.5*sin(theta)
gen hit = (d <= sin)
sum hit
local temp = r(mean)
di `temp'
local pi = 2/`temp'
di "Pi = `pi'"

上面的模拟程序存在的问题是使用了三角函数,因此有循环论证的嫌疑,下面我们来进行改进:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
clear all
set more off
set seed 2345
set obs 1000000
gen y0 = uniform()*100
gen x0 = uniform()*100
gen y2 = uniform()*100
gen x2 = uniform()*100
gen x1 = x0 + 1/sqrt((x2-x0)^2+(y2-y0)^2)*(x2-x0)
gen y1 = y0 + 1/sqrt((x2-x0)^2+(y2-y0)^2)*(y2-y0)
gen hit = (int(y0)~=int(y1))
sum hit
local temp = r(mean)
di `temp'
local pi = 2/`temp'
disp "Pi = `pi'"

如果上面的程序正确,100万次模拟应该可以给出比较准确的pi值,但是结果3.14535显然和pi的真实值有一定的差距。因此我们有理由怀疑以上程序的正确性。是事实上,这一误差的根源在于我们设定的方式不能给出theta的一个均匀分布。笛卡尔坐标系上的均匀分布对应于极坐标系却并非均匀分布。我们再次改进程序:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
clear all
set more off
set seed 6789
set obs 20000000
gen y0 = uniform()
gen x0 = uniform()
keep if y0^2 + x0^2 <= 1
keep if _n <= 10000000
gen x1 = uniform()*100
gen y1 = uniform()*100
gen x2 = x1 + x0/sqrt(x0^2 + y0^2)
gen y2 = y1 + y0/sqrt(x0^2 + y0^2)
gen hit = (int(y1)~=int(y2))
sum hit
local temp = r(mean)
local pi = 2/`temp'
di "Pi = `pi'"

上面程序一千万次的模拟结果为3.1427,精确度有一定提升,下面我们再重复进行多次模拟:
simulate命令的格式为:
simulate [exp_list], reps(#) [options]: command

  • exp_list: 执行完冒号后面的命令后,Stata要计算的返回值是什么,如果没有指定,则使用缺省的返回值,也就是计算所有的数值型返回值;
  • reps(#):设定simulate执行冒号后面命令的次数
1
2
3
4
5
clear all
set more off
sysuse auto, clear
simulate, reps(100): sum mpg
sum

多次蒲松投针实验:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
clear all
set more off
cap program drop buffon //如果内存中已经定义了buffon,则将这一子程序删除
program define buffon, rclass
version 14.2
syntax[, obs(integer 1000), grade(int 100)]
drop _all
local obs2 = int(`obs'*1.5)
set obs `obs2'
tempvar x0 y0 x1 y1 x2 y2
gen `x0' = uniform()
gen `y0' = uniform()
keep if `x0'^2 + `y0'^2 <= 1
keep if _n <= `obs'
gen `x1' = uniform()*100
gen `y1' = uniform()*100
gen `x2' = `x1' + `x0'/sqrt(`x0'^2 + `y0'^2)
gen `y2' = `y1' + `y0'/sqrt(`x0'^2 + `y0'^2)
gen hit = (int(`x1') ~= int(`x2'))
sum hit
local temp = r(mean)
local pi = 2/`temp'
return scalar pi = `pi'
end
simulate pi = r(pi), reps(1000): buffon, obs(1000000)
di "Pi = `pi'"

方圆鱼缸实验:

1
2
3
4
5
6
7
8
9
10
11
clear all
set more off
set seed 34567
set obs 1000000
gen x = uniform()*2-1
gen y = uniform()*2-1
gen inside = (x^2+y^2<=1)
sum inside
local temp = r(mean)
local pi = 4*`temp'
di "Pi = `pi'"

多次方圆鱼缸实验:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
clear all
set more off
cap program drop circle_square
program define circle_square, rclass
version 14.2
syntax[, obs(integer 1000)]
drop _all
set obs `obs'
tempvar x
gen `x' = uniform()*2 - 1
tempvar y
gen `y' = uniform()*2 - 1
tempvar inside
gen `inside' = (`x'^2 + `y'^2 <= 1)
summarize `inside'
local pi = 4*r(mean)
local Var = 16*r(Var)
return scalar pi = `pi'
return scalar Var = `Var'
end
simulate pi = r(pi) var = r(Var), rep(1000): circle_square, obs(1000)
di "Pi = `pi'"

生成低频时间序列

Stata
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
. use air2, clear
(TIMESLAB: Airline passengers)

. list in 1/5

+--------------------+
| air time t |
|--------------------|
1. | 112 1949 1 |
2. | 118 1949.083 2 |
3. | 132 1949.167 3 |
4. | 129 1949.25 4 |
5. | 121 1949.333 5 |
+--------------------+

. tsmktim ym, start(1949m1)
time variable: ym, 1949m1 to 1960m12
delta: 1 month

. list in 1/5

+-----------------------------+
| air time t ym |
|-----------------------------|
1. | 112 1949 1 1949m1 |
2. | 118 1949.083 2 1949m2 |
3. | 132 1949.167 3 1949m3 |
4. | 129 1949.25 4 1949m4 |
5. | 121 1949.333 5 1949m5 |
+-----------------------------+

. gen mnr = month(dofm(ym))

. list in 1/5

+-----------------------------------+
| air time t ym mnr |
|-----------------------------------|
1. | 112 1949 1 1949m1 1 |
2. | 118 1949.083 2 1949m2 2 |
3. | 132 1949.167 3 1949m3 3 |
4. | 129 1949.25 4 1949m4 4 |
5. | 121 1949.333 5 1949m5 5 |
+-----------------------------------+

. gen eoq = (mod(mnr, 3) == 0)

. list in 1/5

+-----------------------------------------+
| air time t ym mnr eoq |
|-----------------------------------------|
1. | 112 1949 1 1949m1 1 0 |
2. | 118 1949.083 2 1949m2 2 0 |
3. | 132 1949.167 3 1949m3 3 1 |
4. | 129 1949.25 4 1949m4 4 0 |
5. | 121 1949.333 5 1949m5 5 0 |
+-----------------------------------------+

. preserve

. keep if eoq
(96 observations deleted)

. tsmktim yq, start(1949q1)
time variable: yq, 1949q1 to 1960q4
delta: 1 quarter

. list in 1/5

+----------------------------------------------------+
| air time t ym mnr eoq yq |
|----------------------------------------------------|
1. | 132 1949.167 3 1949m3 3 1 1949q1 |
2. | 135 1949.417 6 1949m6 6 1 1949q2 |
3. | 136 1949.667 9 1949m9 9 1 1949q3 |
4. | 118 1949.917 12 1949m12 12 1 1949q4 |
5. | 141 1950.167 15 1950m3 3 1 1950q1 |
+----------------------------------------------------+

. restore, preserve

. gen qtr = qofd(dofm(ym))

. format qtr %tq

. l in 1/5

+--------------------------------------------------+
| air time t ym mnr eoq qtr |
|--------------------------------------------------|
1. | 112 1949 1 1949m1 1 0 1949q1 |
2. | 118 1949.083 2 1949m2 2 0 1949q1 |
3. | 132 1949.167 3 1949m3 3 1 1949q1 |
4. | 129 1949.25 4 1949m4 4 0 1949q2 |
5. | 121 1949.333 5 1949m5 5 0 1949q2 |
+--------------------------------------------------+

. collapse (sum) airsum = air, by(qtr)

. l in 1/8

+-----------------+
| qtr airsum |
|-----------------|
1. | 1949q1 362 |
2. | 1949q2 385 |
3. | 1949q3 432 |
4. | 1949q4 341 |
5. | 1950q1 382 |
|-----------------|
6. | 1950q2 409 |
7. | 1950q3 498 |
8. | 1950q4 387 |
+-----------------+

. restore, preserve

. gen qtr = qofd(dofm(ym))

. format qtr %tq

. l in 1/8

+--------------------------------------------------+
| air time t ym mnr eoq qtr |
|--------------------------------------------------|
1. | 112 1949 1 1949m1 1 0 1949q1 |
2. | 118 1949.083 2 1949m2 2 0 1949q1 |
3. | 132 1949.167 3 1949m3 3 1 1949q1 |
4. | 129 1949.25 4 1949m4 4 0 1949q2 |
5. | 121 1949.333 5 1949m5 5 0 1949q2 |
|--------------------------------------------------|
6. | 135 1949.417 6 1949m6 6 1 1949q2 |
7. | 148 1949.5 7 1949m7 7 0 1949q3 |
8. | 148 1949.583 8 1949m8 8 0 1949q3 |
+--------------------------------------------------+

. collapse airavg = air, by(qtr)

. l in 1/8

+------------------+
| qtr airavg |
|------------------|
1. | 1949q1 120.667 |
2. | 1949q2 128.333 |
3. | 1949q3 144 |
4. | 1949q4 113.667 |
5. | 1950q1 127.333 |
|------------------|
6. | 1950q2 136.333 |
7. | 1950q3 166 |
8. | 1950q4 129 |
+------------------+

. tsmktim yq, start(1949q1)
time variable: yq, 1949q1 to 1960q4
delta: 1 quarter

. list in 1/8

+---------------------------+
| qtr airavg yq |
|---------------------------|
1. | 1949q1 120.667 1949q1 |
2. | 1949q2 128.333 1949q2 |
3. | 1949q3 144 1949q3 |
4. | 1949q4 113.667 1949q4 |
5. | 1950q1 127.333 1950q1 |
|---------------------------|
6. | 1950q2 136.333 1950q2 |
7. | 1950q3 166 1950q3 |
8. | 1950q4 129 1950q4 |
+---------------------------+

. restore

. tsset ym
time variable: ym, 1949m1 to 1960m12
delta: 1 month

. tscollap (last) aireop = air (sum) airsum = air (mean) airavg = air, to(q) gen(y
> q)


Converting from M to Q

time variable: yq, 1949q1 to 1960q4
delta: 1 quarter

. list in 1/8

+------------------------------------+
| aireop airsum airavg yq |
|------------------------------------|
1. | 132 362 120.667 1949q1 |
2. | 135 385 128.333 1949q2 |
3. | 136 432 144 1949q3 |
4. | 118 341 113.667 1949q4 |
5. | 141 382 127.333 1950q1 |
|------------------------------------|
6. | 149 409 136.333 1950q2 |
7. | 158 498 166 1950q3 |
8. | 140 387 129 1950q4 |
+------------------------------------+

时间序列与带95%置信区间的边际效应预测

1
2
3
4
5
6
7
8
9
10
11
12
13
14
// ssc install parmest
// ssc install tsmktim

webuse grunfeld, clear
qui reg invest i.year i.company
margins i.year, post
parmest, saving(yeareff, replace)

use yeareff, clear
tsmktim year, start(1935)
tsline estimate || ///
tsrline min95 max95, ///
leg(c(1)) scheme(vg_rose)
gr export "时间序列与带95%置信区间的边际效应预测.png", replace

使用对数坐标轴的箱线图

1
2
3
4
5
6
7
sysuse nlsw88, clear
clonevar wagelog10 = wage
replace wagelog10 = log10(wagelog10)
mylabels 0(10)40, myscale(log10(@)) local(labels)
gr hbox wagelog10, over(ind, sort(1)) ///
nooutside yti("") yla(`labels') ///
ti("Hourly wage, 1988, woman aged 34-46", span)

使用矩阵存储数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
. use grunfeldavg, clear

. levelsof company, local(colist)
1 2 3 4 5 6 7 8 9 10

. local ncomp: word count `colist'

. di "`ncomp'"
10

. matrix table1 = J(`ncomp', 7, 0)

. matrix list table1

table1[10,7]
c1 c2 c3 c4 c5 c6 c7
r1 0 0 0 0 0 0 0
r2 0 0 0 0 0 0 0
r3 0 0 0 0 0 0 0
r4 0 0 0 0 0 0 0
r5 0 0 0 0 0 0 0
r6 0 0 0 0 0 0 0
r7 0 0 0 0 0 0 0
r8 0 0 0 0 0 0 0
r9 0 0 0 0 0 0 0
r10 0 0 0 0 0 0 0

. local i 0

. foreach c of local colist{
2. qui{
3. local ++i
4. sum invest if company == `c', d
5. matrix table1[`i', 1] = r(p25)
6. matrix table1[`i', 2] = r(p50)
7. matrix table1[`i', 3] = r(p75)
8. correlate invest invavg if company == `c'
9. matrix table1[`i', 4] = r(rho)
10. correlate kstock kapavg if company == `c'
11. matrix table1[`i', 5] = r(rho)
12. reg invest L.kstock L.kapavg if company == `c'
13. matrix table1[`i', 6] = _b["L.kstock"]
14. matrix table1[`i', 7] = _se["L.kstock"]
15. }
16. }

. matrix rownames table1 = `colist'

. matrix colnames table1 = p25 p50 p75 rinvest rkap betak sek

. matrix list table1

table1[10,7]
p25 p50 p75 rinvest rkap betak sek
1 429.3 538.35001 665.5 .96355661 .98464086 .14971282 .36146776
2 321.75 419.54999 471.34999 .80775991 .89819793 -.62648164 .29878812
3 59.049999 93.549999 146.75 .89941256 .99063942 -.1804446 .24927241
4 55.99 71.085003 95.010002 .92792669 .9570222 .29802565 .16047287
5 51.525 60.385 72.289997 .8478581 .96597103 -.01904298 .05948012
6 27.685 43.110001 72.75 .9593111 .98986261 -.28145697 .26466183
7 33.244999 44.199999 57.679998 .87604314 .9442384 .04509232 .05995572
8 30.305 38.540001 53.92 .92841638 .96329808 -.13167999 .176705
9 29.715 38.109999 55.405001 .7234149 .97467447 .3501322 .11761613
10 1.925 2.215 4.4400001 .79690054 .88652247 .11318377 .1823479

使用eurostatuse命令从欧盟统计局下载数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// 语法:eurouse eurostat_code [ , clear ]
// eurouse eurostat_code可以根据自己需要的数据在这个网站中直接找到:
// http://ec.europa.eu/eurostat/data/database,这个网站可以在help文档中点击“here”进入
// 例如Consumers - monthly data (ei_bsco_m),这表明消费者月度数据的代码是ei_bsco_m

eurouse env_watres_rb, clear
/* 多次尝试表明,这个命令很不好用,虽然出现错误,但是会下载一个tsv格式的数据文件,因此推荐使用另外一个命令eurostatuse命令 */


// 使用eurostatuse命令从欧盟统计局下载数据
db eurostatuse
// 上述命令可以打开一个窗口进行操作
// tablename可以按照help文档中的指示到欧盟统计局网站进行查找。
eurostatuse ei_bsco_m, clear noerase geo() keepdim() start() end()
// 这个命令第一次运行需要按照指示下载一个7-zip软件安装到指定位置,然后
// 再次运行就可以自动下载并读取数据了。

使用mergemany命令横向合并多个文件

1:列出全部文件名
语法:mergemany 1:1 filename1 filename2 …, match(varlist) [options]

1
mergemany 1:1 资产负债表 利润表 直接法现金流量表, match(stkcd year)

还可以通过增加saving(filename)选项来保存合并后的文件
mergemany 1:1 资产负债表 利润表 直接法现金流量表, match(stkcd year) saving(财务报表)

2:合并当前目录下的所有文件

1
mergemany 1:1 all, match(stkcd year) all saving(财务报表1)

3:使用文件名的数字后缀(文件名的前缀相同而后缀不同)

1
2
3
4
5
6
7
webuse autosize
save auto1
webuse autoexpense
save auto2
webuse auto
save auto3
mergemany 1:1 auto, match(make) numerical(1(1)3)

使用putdocx输出list的结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
sysuse auto,clear
putdocx begin
putdocx table tbl = data("make price weight length") in 1/10, varnames
putdocx save mytable.docx, replace
shellout mytable.docx

!taskkill /F /IM WINWORD.EXE /T
putdocx begin
putdocx table tbl = data("make-rep78") in 1/10, varnames //导出make到rep78的相邻四个变量
putdocx save mytable.docx, replace
shellout mytable.docx

!taskkill /F /IM WINWORD.EXE /T
putdocx begin
putdocx table tbl = data("_all") in 1/10, varnames //导出make到rep78的相邻四个变量
putdocx save mytable.docx, replace
shellout mytable.docx

clear
!taskkill /F /IM WINWORD.EXE /T
putdocx begin
putdocx paragraph, halign(center)
putdocx text ("Auto.dta"), font("Times New Roman",18,black) bold linebreak
putdocx text ("list结果展示"), font("宋体",14,black) bold
putdocx save mytable.docx, replace
shellout mytable.docx

!taskkill /F /IM WINWORD.EXE /T
sysuse auto, clear
putdocx begin
putdocx pagebreak
putdocx paragraph, halign(left) spacing(after, 0)
putdocx text ("Auto,list in 1/20"), font("Times New Roman",14,black) bold linebreak
putdocx table tbl = data("make-rep78") in 1/20, varnames layout(autofitwindow) //根据窗口的大小自动调节列宽
putdocx save mytable.docx, append
shellout mytable.docx

!taskkill /F /IM WINWORD.EXE /T
putdocx begin
putdocx pagebreak
putdocx paragraph, halign(left) spacing(after, 0)
putdocx text ("Auto,list in 21/40"), bold
putdocx table tbl = data("make-rep78") in 21/40, varnames width(5.5) //设置表格宽度为5.5英尺
putdocx save mytable.docx,append
shellout mytable.docx

!taskkill /F /IM WINWORD.EXE /T
putdocx begin
putdocx pagebreak
putdocx paragraph, halign(left) spacing(after, 0)
putdocx text ("Auto, list in 41/60"), font("Times New Roman",14,black) bold linebreak
putdocx table tbl = data("make-rep78") in 41/60, varnames layout(autofitcontents) indent(2)
putdocx save mytable.docx, append
shellout mytable.docx

!taskkill /F /IM WINWORD.EXE /T
putdocx begin
putdocx pagebreak
putdocx paragraph, halign(left) spacing(after, 0)
putdocx text ("Auto,list in 61/74"),font("Times New Roman",14,black) bold linebreak
putdocx table tbl = data("make-rep78") in 61/74, varnames width(5.5) halign(center) cellspacing(0.08) //设置相邻单元格以及单元格与表格之间的宽度为0.08英寸
putdocx save mytable.docx, append
shellout mytable.docx

我们也可以通过循环将list结果分批输出到docx文件中:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
clear all
sysuse auto,clear
set trace on
set tracedepth 1
putdocx begin
putdocx paragraph, halign(center) spacing(after, 0)
putdocx text ("Auto list output presentatin"), font("宋体",18,black) bold linebreak
putdocx save mytable1.docx, replace
forvalues i = 1(20)`=_N'{
putdocx begin
putdocx pagebreak
putdocx paragraph, halign(left) spacing(after, 0)
if `=`i'+19' > `=_N'{
putdocx text ("Auto, list in `i'/`=_N'"), font("Times",14,black) bold linebreak
putdocx table tbl = data("make-rep78") in `i'/`=_N', varnames width(5.5)
}
else{
putdocx text ("Auto, list in `i'/`=`i'+19'"), font("Times New Roman",14,black) bold linebreak
putdocx table tbl = data("make-rep78") in `i'/`=`i'+19',varnames width(5.5)
}
putdocx save mytable1.docx, append
}
shellout mytable1.docx

使用rolling和merge产生描述性统计量

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
. use grunfeld, clear

. gen linvest = log(invest)

. mvsumm LD.linvest, gen(invrate) stat(mean) window(3) end

. qui rolling r(mean), window(3) saving(ldinvest, replace): sum LD.linvest

. cap restore

. preserve

. use ldinvest, clear
(rolling: summarize)

. ren (end _stat_1) (year rolling_ldinvest)

. keep if year >= 1939
(20 observations deleted)

. xtset company year
panel variable: company (strongly balanced)
time variable: year, 1939 to 1954
delta: 1 unit

. save ldinvest, replace
file ldinvest.dta saved

. restore

. merge m:1 company year using ldinvest

Result # of obs.
-----------------------------------------
not matched 40
from master 40 (_merge==1)
from using 0 (_merge==2)

matched 160 (_merge==3)
-----------------------------------------

. drop _m

. sum invrate rolling_ldinvest

Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
invrate | 160 .0602769 .1303373 -.2292838 .475313
rolling_ld~t | 160 .0602769 .1303373 -.2292838 .475313

使用scatteri命令在图片指定位置添加文字

1
2
3
4
5
6
7
8
9
clear all
tw ///
scatteri 6 0 "Stata可以绘制很多图", mlabsize(huge) pstyle(p1) msymbol(i) || ///
scatteri 5 0 "scheme控制着图片的整体样式", mlabsize(huge) pstyle(p1) msymbol(i) || ///
scatteri 4 0 "绘图命令有着相似的结构", mlabsize(huge) pstyle(p1) msymbol(i) || ///
scatteri 3 0 "Stata可以构建很多复杂的图形", mlabsize(huge) pstyle(p1) msymbol(i) || ///
scatteri 2 0 "这只是一个非常简单的例子", mlabsize(huge) pstyle(p1) msymbol(i) || ///
scatteri 1 0 "可视化是非常有趣的", mlabsize(huge) pstyle(p1) title(概览) xla(-0.05 1) yla(-1 7) xsc(off) ysc(off) leg(off) note("Stata绘图是非常简单和容易的" "Stata的绘图功能是非常强大的!", ring(0)) msymbol(i)
graph export 一个文字图.png, replace

使用stack命令绘图

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
sysuse auto, clear
cap restore
preserve
tempfile mylabel
label save origin using `mylabel'

replace rep78 = rep78 + 2
stack price foreign weight rep78, into(data foreign) clear
label def _stack 1 "`l1'" 2 "`l2'"
label val _stack _stack
do `mylabel'
label def origin 3 "1" 4 "2" 5 "3" 6 "4" 7 "5", add
label val foreign origin

gr hbox data, over(foreign)

使用stata做定积分计算

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
set scheme s1color
clear
set obs 5000
gen x = uniform()*4+(-1.5)
gen y1 = x^2
gen y2 = x + 2
gen low = min(y1,y2)
gen high = max(y1,y2)
twoway rarea low y1 x, sort fcolor(red) lcolor(yellow) || ///
rarea low y2 x, sort fcolor(green) lcolor(black)

// 投点法
clear
set seed 123456
set obs 10000
gen u1 = uniform()*3+(-1)
gen u2 = uniform()*3
gen n=(u2<u1+2-u1^2) //生成虚拟变量n,若随机点数落入积分区域内为1,否则为0
qui sum n
dis "acreage="(2+1)*3*r(mean)

// 平均法
clear
set obs 100000
set seed 1234567
gen x = uniform()*3-1
gen y = x+2-x^2
qui sum y
dis "acreage="3*r(mean)

// 若想得到更为精确的近似值,只需将上述程序重复多次,程序如下:
clear
// 如果使用的是Stata15运行程序,不再需要设置set more off,因为Stata15默认set more off
set seed 1234
cap postclose integration
// 运用post文件,若文件integration打开就关上,否则忽略此行
postfile integration Integration using Integration, replace
// 定义post文件integration,包含的变量名为i,包含的变量名为i,保存在D:\Desktop\Stata笔记\integration文件
forvalues i = 1(1)100{
qui set obs 1000
gen x=uniform()*3+(-1)
gen y = x+2-x^2
qui sum y
scalar i = 3*r(mean) //定义一次实验的积分值
post integration (i) // 调用post文件integration,将i值储存在变量Integration
clear
}
postclose integration //结束post文件
use Integration //调用所储存的数据文件
qui sum Integration
dis "Integration="r(mean)

使用xtset产生一幅面板数据图

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
* 产生面板数据
clear
set obs 3
gen id=_n
expand 10
bysort id: gen year=2000+_n-1
gen data=runiform()*100
list, sepby(id)

* 绘图
xtset id year
bysort year: egen ave = mean(data)
replace ave = . if id != 1

gen mean_lab = "mean" if id == 1 & year == 2009

sort id year
xtline data, overlay addplot((line ave year, lw(thick))(sc data year if year == 2009, mla(id))(sc ave year if year == 2009, mla(mean_lab)), leg(off))

适用于所有图表的绘图选项

标题

1
2
vguse allstates, clear
sc propval100 ownhome, t1ti("t1title") t2ti("t2title") b1ti("b1title") b2ti("b2title") l1ti("l1title") l2ti("l2title") r1ti("r1title") r2ti("r2title")

aspectratio:设定横纵坐标比例

1
sc propval100 ownhome, aspectratio(1)

xsize/ysize

1
sc propval100 ownhome, xsize(2) ysize(2)

1
sc propval100 ownhome, xsize(3) ysize(1)

scale

1
sc propval100 ownhome, scale(3.7)

plotr

1
sc propval100 ownhome, plotr(color(stone))

1
sc propval100 ownhome, plotr(lcolor(navy) lw(thick))

graphr

1
sc propval100 ownhome, graphr(color(erose))

1
sc propval100 ownhome, graphr(ifcolor(erose) fcolor(maroon))

1
sc propval100 ownhome, graphr(lc(navy) lw(vthick))

数据合并

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
// 先整理第一个数据,把前三列合并
clear
use c12indinc, clear
format hhid %12.0f
gen hlw = string(hhid, "%14.0f") + string(line) + string(wave)
drop hhid line wave
destring hlw, gen(id)
format id %20.0f
order id
drop hlw
save c12indinc, replace

// 同样处理第二个文件
clear
use m12educ, clear
format hhid %12.0f
gen hlw = string(hhid, "%14.0f") + string(line) + string(wave)
drop hhid line wave
destring hlw, gen(id)
format id %20.0f
order id
drop hlw
save m12educ, replace

// 删除重复值
use c12indinc, clear
duplicates report id
duplicates list
duplicates drop id, force
save c12indinc, replace

use m12educ, clear
duplicates report id
duplicates drop id, force
save m12educ, replace

// 合并文件
use c12indinc, clear
merge 1:1 id using m12educ
order id _merge

// 根据合并情况找到没有匹配的观测值
keep if _merge == 2 //现在剩下的就都是第一个文件里面没有的了。

数据类型转换之数据格式问题

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
clear
set more off
set obs 10
gen x = (rnormal())*10000000
/*
tostring选择项中有format(),通过这个选项我们就可以选择数值型数据
转换为字符型数据之后的数据显示格式问题。字符型数据显示格式通常包括
%#.*e、%#.*g、%#.*gc、%#.*f、%#.*fc(带上“-”表示左对齐,不带表示右对齐,
#表示数据列宽度,*表示小数点右侧的小数位数)。

e:科学计数法
g:一般格式
f:固定格式
c:带千分符
*/
tostring x, gen(x1) force format(%12.0gc)
tostring x, gen(x2) force format(%12.1gc)
tostring x, gen(x3) force format(%12.0fc)
tostring x, gen(x4) force format(%12.1fc)
order x1 x2 x3 x4 x
list

// sdecode命令也可以实现同样的功能
sdecode x, gen(x5) format(%12.0fc)
sdecode x, gen(x6) format(%12.2fc)
list

// 最后我们还可以使用string函数解决数值的千分符问题
gen x7 = string(x, "%12.0fc")
list x7

数据替换命令

爬虫俱乐部推文学习笔记

recode命令:替换数值变量

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
clear all
sysuse auto, clear
//例如我们把weight中的2930换成1000,3350换成2000
list weight in 1/3

+--------+
| weight |
|--------|
1. | 2,930 |
2. | 3,350 |
3. | 2,640 |
+--------+
recode weight(2930 = 1000)(3350 = 2000), gen(weight1)
list weight1 in 1/3

+---------+
| weight1 |
|---------|
1. | 1000 |
2. | 2000 |
3. | 2640 |
+---------+
  • 但是recode命令只能针对于数值型变量进行处理,针对于字符型变量我们需要使用subinstr()函数

subinstr():处理字符变量

  • subinstr(s1,s2,s3,n):将字符串s1中前n次出现的s2时的s2全部替换成字符串s3
  • 例如我们想把变量“make”中的AMC全部换成SUV
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
list make in 1/3

+-------------+
| make |
|-------------|
1. | AMC Concord |
2. | AMC Pacer |
3. | AMC Spirit |
+-------------+

replace make = subinstr(make, "AMC", "SUV", 1)
list make in 1/3

+-------------+
| make |
|-------------|
1. | SUV Concord |
2. | SUV Pacer |
3. | SUV Spirit |
+-------------+

//如果我们想删除它:
replace make = subinstr(make, "SUV", "", 1)
list make in 1/3

+----------+
| make |
|----------|
1. | Concord |
2. | Pacer |
3. | Spirit |
+----------+
  • 有时候我们需要对多个字符型变量进行替换,这个时候使用fdta命令更加方便

fdta命令:可以同时替换多个字符变量

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
ssc install fdta
* 句法规则:fdta varlist, from(str1) [to(str2)]
* 例如我们想实现上面的功能:
sysuse auto, clear
fdta make, from("AMC") to("SUV") //替换
list make in 1/3

+-------------+
| make |
|-------------|
1. | SUV Concord |
2. | SUV Pacer |
3. | SUV Spirit |
+-------------+

fdta make, from("SUV") //删除
list make in 1/3

+----------+
| make |
|----------|
1. | Concord |
2. | Pacer |
3. | Spirit |
+----------+

* 批量替换:
* 先生成多一点的字符型变量
sysuse auto, clear
gen a = word(make, 1)
gen b = substr(make, 1, 8)
list make a b in 1/3

+------------------------------+
| make a b |
|------------------------------|
1. | AMC Concord AMC AMC Conc |
2. | AMC Pacer AMC AMC Pace |
3. | AMC Spirit AMC AMC Spir |
+------------------------------+

fdta _all, from("AMC") to("SUV")
list make a b in 1/3

+------------------------------+
| make a b |
|------------------------------|
1. | SUV Concord SUV SUV Conc |
2. | SUV Pacer SUV SUV Pace |
3. | SUV Spirit SUV SUV Spir |
+------------------------------+
# Stata

评论

程振兴

程振兴 @czxa.top
截止今天,我已经在本博客上写了604.4k个字了!

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×