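Stata Old Notes Roundup (Part 10)

My old website held many notes that were never properly organized. I have tidied up some of them before, but more than two hundred remain, so this post simply gathers them in one place to make them easier to search.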

The local command: the respectcase option

/*
Suppose the working directory contains 3 files:
A1.txt, A2.txt, A3.txt
*/
* Read the file list and convert the encoding ("path" is a placeholder for the directory)
unicode encoding set gb18030
local files: dir "path" files "*.txt"
foreach f in `files' {
unicode translate "`f'"
}
* This goes wrong because the file names come back all in lowercase, so add the respectcase option:
unicode encoding set gb18030
local files: dir "path" files "*.txt", respectcase
foreach f in `files' {
unicode translate "`f'"
}

margins: computing and plotting marginal effects

. use airquality, clear
. qui sum so2, d
. gen hiso2 = (so2 > r(p50)) & !missing(so2)
. sum precip, meanonly
. gen hiprecip = (precip > r(mean)) & !missing(precip)
. qui probit hiso2 pop manuf
. margins, dydx(*) post

Average marginal effects                        Number of obs     =         41
Model VCE    : OIM

Expression   : Pr(hiso2), predict()
dy/dx w.r.t. : pop manuf

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pop |  -.0007313   .0004183    -1.75   0.080    -.0015511    .0000884
       manuf |   .0012074   .0004747     2.54   0.011      .000277    .0021377
------------------------------------------------------------------------------

. est store three
. qui probit hiso2 pop manuf i.hiprecip
. margins, dydx(*) post

Average marginal effects                        Number of obs     =         41
Model VCE    : OIM

Expression   : Pr(hiso2), predict()
dy/dx w.r.t. : pop manuf 1.hiprecip

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pop |  -.0007739   .0004107    -1.88   0.060    -.0015788    .0000311
       manuf |   .0012844   .0004731     2.72   0.007     .0003572    .0022117
  1.hiprecip |   .1299469   .1406281     0.92   0.355     -.145679    .4055729
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. est store four

. esttab three four, margin drop(0.hiprecip) nodep num nomtitles ti("Factors influencing high SO2 concentration")

Factors influencing high SO2 concentration
--------------------------------------------
                      (1)             (2)
--------------------------------------------
pop             -0.000731       -0.000774
                  (-1.75)         (-1.88)

manuf             0.00121*        0.00128**
                   (2.54)          (2.72)

1.hiprecip                          0.130
                                   (0.92)
--------------------------------------------
N                      41              41
--------------------------------------------
Marginal effects; t statistics in parentheses
(d) for discrete change of dummy variable from 0 to 1
* p<0.05, ** p<0.01, *** p<0.001

. qui probit hiso2 pop manuf i.hiprecip

. sum manuf if e(sample)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       manuf |         41    463.0976    563.4739         35       3344

. qui margins, dydx(manuf) at(manuf = (50(50)750))

. marginsplot

Variables that uniquely identify margins: manuf

Solving higher-degree polynomial equations in Mata

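/* Solving higher-degree polynomial equations in Mata */
// Solve a higher-degree polynomial equation in one variable
// Example: solve -2113.15x^8+2130.52x^3-21.4x^2-80.2x+1616.91=0
mata // enter the Mata environment
mata clear
polyroots((1616.91,-80.2,-21.4,2130.52,0,0,0,0,-2113.15))
// coefficients are entered from the lowest degree to the highest
end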
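
As a cross-check on the coefficient ordering, the following is a minimal Python sketch (not Mata) that evaluates the same polynomial with its coefficients listed from the lowest degree to the highest, and locates one real root by bisection. The helper names `poly_eval` and `bisect_root` and the bracket [1, 2] are assumptions made for this illustration; `polyroots()` returns all eight complex roots, which this sketch does not attempt.

```python
# Coefficients from lowest to highest degree, in the order passed to polyroots()
COEFS = [1616.91, -80.2, -21.4, 2130.52, 0, 0, 0, 0, -2113.15]


def poly_eval(coefs, x):
    """Evaluate coefs[0] + coefs[1]*x + ... + coefs[n]*x**n by Horner's rule."""
    total = 0.0
    for c in reversed(coefs):
        total = total * x + c
    return total


def bisect_root(coefs, lo, hi, tol=1e-12):
    """Find a root in [lo, hi]; the polynomial must change sign over the bracket."""
    f_lo = poly_eval(coefs, lo)
    if f_lo * poly_eval(coefs, hi) > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = poly_eval(coefs, mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # the root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # the root lies in [mid, hi]
    return (lo + hi) / 2


# The polynomial is positive at x = 1 and negative at x = 2 (hypothetical bracket)
root = bisect_root(COEFS, 1.0, 2.0)
print(root, poly_eval(COEFS, root))
```

If the coefficient vector were entered from the highest degree to the lowest instead, the sketch would be solving a different polynomial, which is the mistake the comment in the Mata code warns about.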

Systems of equations whose right-hand sides are all 0:

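mata
mata clear
void function myfun2(real colvector a, real colvector values)
/* void means the function returns nothing. myfun2 takes two arguments:
a column vector holding the current values of the unknowns, and a column vector
in which the function stores the value of each equation at those values */
{
values[1]=10-a[1]*exp(a[2]*1)-a[3]*1
values[2]=12-a[1]*exp(a[2]^2)-a[3]*2
values[3]=15-a[1]*exp(a[2]^3)-a[3]*3 /* the system of equations */
}
S = solvenl_init() // initialize the solver
solvenl_init_evaluator(S,&myfun2()) // pass a pointer to the evaluator function myfun2()
solvenl_init_type(S,"zero") // problem type: "zero" means the right-hand side of every equation is 0
solvenl_init_technique(S,"newton") // use Newton's method
solvenl_init_numeq(S,3) // the system has 3 equations (and 3 unknowns)
solvenl_init_startingvals(S,J(3,1,0.3)) // starting values: a 3 x 1 vector of 0.3; the choice is free, so if the solver fails, change the starting values and iterate again
solvenl_init_iter_log(S,"on") // display the iteration log
a = solvenl_solve(S) // run the solver and return the solution; on failure solvenl_solve() aborts with an error message
a // display a
end // leave the Mata environment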

Systems of equations with an unknown alone on one side (fixed-point form):

mata
mata clear
void function myfun(real colvector from, real colvector values)
{
values[1] = 5/3-2/3*from[2]
values[2] = 10/3-2/3*from[1]
}
S=solvenl_init()
solvenl_init_evaluator(S,&myfun())
solvenl_init_type(S, "fixedpoint") // problem type: "fixedpoint" means each equation has an unknown alone on one side
solvenl_init_technique(S, "gaussseidel") // use the Gauss-Seidel method, which cannot be used for systems of type "zero"
solvenl_init_numeq(S,2)
solvenl_init_iter_log(S,"on")
a = solvenl_solve(S)
a
end
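
To see what a Gauss-Seidel fixed-point iteration does with this system, here is a small Python sketch of the general idea. It assumes starting values of zero and a simple stopping rule, both chosen for illustration; it is not claimed to reproduce the exact iteration or tolerances that `solvenl_solve()` uses. The function name `gauss_seidel` is hypothetical.

```python
def gauss_seidel(x1=0.0, x2=0.0, tol=1e-12, max_iter=200):
    """Iterate x1 = 5/3 - 2/3*x2 and x2 = 10/3 - 2/3*x1 until the update is tiny."""
    for iteration in range(1, max_iter + 1):
        new_x1 = 5 / 3 - 2 / 3 * x2
        # Gauss-Seidel uses the freshly updated x1 straight away
        new_x2 = 10 / 3 - 2 / 3 * new_x1
        change = max(abs(new_x1 - x1), abs(new_x2 - x2))
        x1, x2 = new_x1, new_x2
        if change < tol:
            return x1, x2, iteration
    raise RuntimeError("did not converge")


x1, x2, n_iter = gauss_seidel()
print(x1, x2, n_iter)
```

Solving the two equations by hand gives x1 = -1 and x2 = 4, which is the point the iteration approaches; each pass shrinks the error by a factor of 4/9, so convergence is quick.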

The Mata language

// Entering numbers and strings
mata
mata clear
2+2
x=4
x
y=x+2
y
z=x+y
z
a="Ma"
a
b="ta"
c=a+b
c
end
// Entering matrices
mata
mata clear
A = (1,2\3,4)
A
B = (3/2,2\3,4)
B
AB1 = (A\B)
AB1
a = (1+1i,1-1i\2-4i,3+6i)
a
b = (-1+1i,1+1i\2-4i,3+6i)
b
ab=(a,b)
ab
ab1=(a\b)
ab1
Alpha = ("a","b"\"c","d")
Alpha
Beta = ("A","B"\"C","D")
Beta
AlphaBeta = (Alpha,Beta)
AlphaBeta
AlphaBeta1 = (Alpha\Beta)
AlphaBeta1
end

// Basic operations in Mata
mata
mata clear
2+2
(2+2<3)&(3+5>=10)
(2+2<3)|(3+5<10)
!(2+2<3)
~(2+2<3)
x=2+2
x
x~=3
2-4i+3+6i
a=2-4i
a
"matrix"+" programming"
e="matrix"
f=" programming"
g=e+f
g
end
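// Matrix operations
mata
mata clear
A=(1,2\3,4)
B=(3,4\1,2)
A+B
A*B // matrix product; requires the number of columns of A to equal the number of rows of B
A#B // Kronecker product: each element a_ij is replaced by the block a_ij*B
A' // transpose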
a=2
a*A
A*a
A/a
end

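// Matrix functions
/* Two Mata functions bring data from the Stata dataset into a matrix:
1: st_data(real matrix i, rowvector j)
2: st_view(mat=., real matrix i, rowvector j)
A1=st_data(.,.) all observations of all variables
A2=st_data(1,.) the first observation of all variables
A3=st_data((1,3),.) observations 1 through 3 of all variables
A4=st_data((1\3),.) observations 1 and 3 of all variables
A5=st_data((1,3\4,6),.) observations 1 through 3 and 4 through 6 of all variables
B1=st_data(.,1) all observations of the first variable
B2=st_data(.,(1,3)) all observations of the first and third variables
B3=st_data(.,("v","v2")) all observations of variables "v" and "v2"
*/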
use mata_input.dta,clear
list
mata
mata clear
A1B1 = st_data(.,.)
A1B1
A2 = st_data(1,.)
A2
B2 = st_data(.,1)
B2
A3 = st_data((1,3),.)
A3
i=(1,3)
i
A3 = st_data(i,.)
A3
B2 = st_data(.,(1,3))
B2
B3 = st_data(.,i)
B3
B3 = st_data(.,("var1","var3"))
B3
j = ("var1","var3")
j
B3 = st_data(.,j)
B3
A4 = st_data((1\3),.)
A4
i=(1\3)
A4 = st_data(i,.)
A4
A5 = st_data((1,3\4,5),.)
A5
i = (1,3\4,5)
i
A5 = st_data(i,.)
A5
end

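/* The difference between st_data() and st_view():
the matrix returned by st_data() is a copy with no link to the dataset;
the matrix created by st_view() is a view, a filtered way of looking at and working with
the data in the Stata dataset, so it stays linked to the original data. */
use mata_input.dta,clear
list
mata
A1 = st_data(.,.)
st_view(A=.,(1,2),(1,2))
A
st_view(A2=.,.,.) // A2=. makes sure A2 exists before it is passed in
A2
end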

/* Matrix inversion: the luinv() function */
clear
mata
mata clear
b = (3,2,1\2,2,0\1,0,2)
b
invb=luinv(b)
invb
cholinv(b) // other inversion functions; both expect a symmetric matrix, which b is
invsym(b)
b*invb

e=(3,2,2\2,1,2\0,2,1)
e
inve=luinv(e)
inve
f=(3,2,2\2,1,2\4,2,4) // the inverse of a singular matrix is a matrix of missing values
luinv(f)
end

/* Generating matrices of a special form */
// 1: I(): identity matrix
mata
mata clear
I(3)
I(3,4)
end

// 2: J(n,m,#): an n x m matrix whose elements all equal #
mata
mata clear
J(3,3,"#")
J(2,2,2)
end

// 3: diag(A): builds a diagonal matrix from the matrix or vector A
mata
mata clear
A=(1,2,3)
diag(A)
B=(1,2,3\4,5,6)
diag(B)
C=(1\2\3)
diag(C)
D=(1,2,3\4,5,6\7,8,9)
diag(D)
end

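/* lowertriangle(A): extracts the lower triangle of a matrix
uppertriangle(A): extracts the upper triangle of a matrix
lowertriangle(A,i): extracts the lower triangle and replaces the main diagonal with the scalar i
uppertriangle(A,i): extracts the upper triangle and replaces the main diagonal with the scalar i
*/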
mata
mata clear
A=(1,2\3,4)
B=(1,2,3\4,5,6)
A
lowertriangle(A)
uppertriangle(A)
lowertriangle(A,0)
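uppertriangle(A,1) // the replacement for the diagonal must be numeric when A is numeric
B
lowertriangle(B)
B'
lowertriangle(B') // note that the result differs for a wide (2 x 3) and a tall (3 x 2) matrix
C=(1,2,3\4,5,6\7,8,9)
_lowertriangle(C) // the underscore-prefixed version overwrites the original matrix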
C
E=(1,2,3\4,5,6\7,8,7)
_luinv(E)
E
end

// e(i,n): a 1 x n vector whose ith element is 1 and whose other elements are 0
mata
mata clear
e(4,5)
e(4,5)'
end

// diagonal(A): extracts the main diagonal of A as a column vector
mata
mata clear
a=(1,2,3\4,5,6)
diagonal(a)
end

// sqrt(A): takes the square root of every element of A and returns the results as a new matrix
mata
mata clear
a=(1,2,3\4,5,6\7,8,9)
sqrt(a)
end

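/* mean(X): column means of a matrix, returned as a row vector
mean(X,w): column means of X weighted by w, returned as a row vector
variance(X,w): variance matrix of the columns of X, weighted by w
correlation(X,w): correlation matrix of the columns of X, weighted by w
*/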
mata
mata clear
a=(1,2,3\4,5,6\7,8,9)
mean(a)
X=select(a,(0,1,1))
X
w=select(a,(1,0,0))
w
mean(X,w)
variance(X)
variance(X,w)
correlation(X)
correlation(X,w)
end

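/* rowsum(X): sum across each row, giving a column vector
colsum(X): sum down each column, giving a row vector
sum(X): sum of all elements of the matrix
rowsum(X,missing): if missing is nonzero, missing values are treated as missing, so any sum that includes one is missing; if it is omitted (or 0), missing values are treated as 0
*/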
mata
mata clear
X = (1,10,20,30\2,.,50,60)
X
rowsum(X)
rowsum(X,missing=10)
X = (X\3,70,80,90)
X
colsum(X)
colsum(X,missing=10)
sum(X)
end

/* rows(A): number of rows of A
cols(A): number of columns of A
length(A): rows(A)*cols(A), the number of elements of A
*/
mata
mata clear
b=(1,2,3\4,5,6)
rows(b)
cols(b)
length(b)
end

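/* select(X,V): returns the rows or columns of X selected by the vector V
st_select(A,X,V): does the same and stores the result in A (when A already exists)
st_select(A=.,X,V): use this form when A does not exist yet
*/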
clear all
use mata_input.dta,clear
mata
mata clear
X=st_data(.,.)
X
select(X,(1,0,0,1,0,1))
select(X,(1\0\1\0\1))
end
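
Several of the small matrix results in the Mata listing above can be checked by hand. The following Python sketch (plain lists, standard library only) recomputes the product and Kronecker product of the 2 x 2 matrices A and B, the determinants behind the luinv() examples, and the weighted column means from the mean(X,w) example. The function names `matmul`, `kron`, `det3`, and `weighted_col_means` are hypothetical helpers written for this check, not Mata functions.

```python
def matmul(A, B):
    """Matrix product; requires cols(A) == rows(B)."""
    if len(A[0]) != len(B):
        raise ValueError("cols(A) must equal rows(B)")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def kron(A, B):
    """Kronecker product: every element a_ij is replaced by the block a_ij * B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def weighted_col_means(X, w):
    """Column means of X with row weights w: sum(w * x) / sum(w)."""
    total = sum(w)
    return [sum(wi * row[j] for wi, row in zip(w, X)) / total for j in range(len(X[0]))]


A = [[1, 2], [3, 4]]
B = [[3, 4], [1, 2]]
print(matmul(A, B))   # A*B
print(kron(A, B))     # A#B

b_mat = [[3, 2, 1], [2, 2, 0], [1, 0, 2]]   # invertible
f_mat = [[3, 2, 2], [2, 1, 2], [4, 2, 4]]   # row 3 is twice row 2, so singular
print(det3(b_mat), det3(f_mat))

a_mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
X = [row[1:] for row in a_mat]   # columns 2 and 3, like select(a,(0,1,1))
w = [row[0] for row in a_mat]    # column 1, like select(a,(1,0,0))
print(weighted_col_means(X, w))
```

The zero determinant of f_mat is the reason luinv(f) returns missing values, while b_mat has a nonzero determinant and inverts without trouble.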

Loops in Mata

mata
mata clear
for (i = 1; i <=10; i++){
printf("%g squared is %g \n", i, i^2)
}
for (i = 10; i > 0; i = i - 2){
printf("%g squared is %g \n", i, i^2)
}
i = 1
do{
printf("%g squared is %g \n", i, i^2)
i++
} while(i <= 10)

i = 1
if(i == 1){
printf("%g equals 1.", i)
}
else{
printf("%g does not equal 1 ", i)
}

n = 10
k = 5
if (k == 0) dof = n - 1
else dof = n - k
dof

// The if/else above is equivalent to the following
dof = (k == 0 ? n - 1 : n - k)
dof
end

matlist: displaying a matrix and controlling its display format

. clear all
. matrix A = (1, 2 \ 3, 4 \ 5, 6)
. matrix list A

A[3,2]
c1 c2
r1 1 2
r2 3 4
r3 5 6

. matlist A

| c1 c2
-------------+----------------------
r1 | 1 2
r2 | 3 4
r3 | 5 6
. * border(rows): draw lines at the top and bottom of the table

. * rowtitle(rows): set the row title to "rows"

. * left(4): indent the table by 4 characters

. matlist A, border(rows) rowtitle(rows) left(4)

------------------------------------
rows | c1 c2
-------------+----------------------
r1 | 1 2
r2 | 3 4
r3 | 5 6
------------------------------------

.
. * twidth(8): set the width of the first column, which holds the row names

. matlist 2*A, border(all) lines(none) format(%6.1f) names(rows) twidth(8) left(4) title(
> "Guess what, a title") tindent(8)

Guess what, a title

+--------------------------+
| r1 2.0 4.0 |
| r2 6.0 8.0 |
| r3 10.0 12.0 |
+--------------------------+

.
. matrix E = ( 1, 2, 3, 4, 5, 6, 7 \ 8, 9, 10, 11, 12, 13, 14 \ 15, 16, 17, 18, 19, 20, 2
> 1 \ 22, 23, 24, 25, 26, 27, 28 \ 29, 30, 31, 32, 33, 34, 35 \ 36, 37, 38, 39, 40, 41, 4
> 2 )

. matrix colnames E = A:a1 A:a2 B:ab1 B:b2 C:c1 C:c2 C:c3

. matrix rownames E = D:d1 D:d2 E:e1 E:e2 F:f1 F:f2

. matlist E

| A | B | C
| a1 a2 | ab1 b2 | c1 c2
-------------+----------------------+----------------------+----------------------
D | | |
d1 | 1 2 | 3 4 | 5 6
d2 | 8 9 | 10 11 | 12 13
-------------+----------------------+----------------------+----------------------
E | | |
e1 | 15 16 | 17 18 | 19 20
e2 | 22 23 | 24 25 | 26 27
-------------+----------------------+----------------------+----------------------
F | | |
f1 | 29 30 | 31 32 | 33 34
f2 | 36 37 | 38 39 | 40 41

| C
| c3
-------------+-----------
D |
d1 | 7
d2 | 14
-------------+-----------
E |
e1 | 21
e2 | 28
-------------+-----------
F |
f1 | 35
f2 | 42

. matlist hadamard(E,E), showcoleq(c) keepcoleq border(right) left(4)

| A | B |
| a1 a2 | ab1 b2 |
-------------+----------------------+----------------------|
D | | |
d1 | 1 4 | 9 16 |
d2 | 64 81 | 100 121 |
-------------+----------------------+----------------------|
E | | |
e1 | 225 256 | 289 324 |
e2 | 484 529 | 576 625 |
-------------+----------------------+----------------------|
F | | |
f1 | 841 900 | 961 1024 |
f2 | 1296 1369 | 1444 1521 |

| C |
| c1 c2 c3 |
-------------+---------------------------------|
D | |
d1 | 25 36 49 |
d2 | 144 169 196 |
-------------+---------------------------------|
E | |
e1 | 361 400 441 |
e2 | 676 729 784 |
-------------+---------------------------------|
F | |
f1 | 1089 1156 1225 |
f2 | 1600 1681 1764 |

. * Transpose

. matlist hadamard(E,E)', showcoleq(c) keepcoleq border(right) left(4)

| D | E | F |
| d1 d2 | e1 e2 | f1 f2 |
-------------+----------------------+----------------------+----------------------|
A | | | |
a1 | 1 64 | 225 484 | 841 1296 |
a2 | 4 81 | 256 529 | 900 1369 |
-------------+----------------------+----------------------+----------------------|
B | | | |
ab1 | 9 100 | 289 576 | 961 1444 |
b2 | 16 121 | 324 625 | 1024 1521 |
-------------+----------------------+----------------------+----------------------|
C | | | |
c1 | 25 144 | 361 676 | 1089 1600 |
c2 | 36 169 | 400 729 | 1156 1681 |
c3 | 49 196 | 441 784 | 1225 1764 |

.
. matrix Htest = (12.30, 2, .00044642 \ 2.17, 1, .35332874 \ 8.81, 3, .04022625 \ 20.05, 6, .00106763)

. matrix rownames Htest = trunk length weight overall

. matrix colnames Htest = chi2 df p

.
. matrix list Htest

Htest[4,3]
chi2 df p
trunk 12.3 2 .00044642
length 2.17 1 .35332874
weight 8.81 3 .04022625
overall 20.05 6 .00106763

. matlist Htest

| chi2 df p
-------------+---------------------------------
trunk | 12.3 2 .0004464
length | 2.17 1 .3533287
weight | 8.81 3 .0402262
overall | 20.05 6 .0010676

. * cspec() and rspec() control the format of the columns and rows, respectively

. matlist Htest, rowtitle(Variables) title(Test results) cspec(o4& %12s | %8.0g & %5.0f &
> %8.4f o2&) rspec(&-&&--)

Test results

Variables | chi2 df p
-------------+----------------------------
trunk | 12.3 2 0.0004
length | 2.17 1 0.3533
weight | 8.81 3 0.0402
-------------+----------------------------
overall | 20.05 6 0.0011
------------------------------------------

.
. matrix Z = ( .z, 1 \ .c, .z )

. matrix rownames Z = row_1 row_2

. matrix colnames Z = col1 col2

. matlist Z

| col1 col2
-------------+----------------------
row_1 | .z 1
row_2 | .c .z

. * .z is an extended missing value; nodotz displays .z as blank

. * underscore displays underscores in row and column names as spaces

. matlist Z, nodotz underscore

| col1 col2
-------------+----------------------
row 1 | 1
row 2 | .c

matrix accum: computing cross products

sysuse auto, clear 
matrix accum A = price weight mpg
matlist A

* A = (y, X)'(y, X) = (y'y y'X \ X'y X'X); matrix accum also appends a constant column (_cons) by default
* Drop the first row and column to get X'X
matrix XX = A[2..., 2...]
matlist XX

matrix Xy = A[2..., 1]
matrix list Xy

matrix b = invsym(XX) * Xy
matrix list b

matrix b = invsym(A[2..., 2...]) * A[2..., 1]
matlist b
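
The partitioning used above can be traced on a tiny example. This Python sketch builds the cross-product matrix for a made-up dataset (four hypothetical observations that lie exactly on y = 1 + 2x), slices out X'X and X'y in the same way as the Stata code, and solves the 2 x 2 normal equations with Cramer's rule. The data and variable names are assumptions for illustration only; they are not taken from auto.dta.

```python
# Hypothetical data lying exactly on y = 1 + 2*x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

# Each row holds (y, x, constant), mirroring matrix accum, which adds _cons
rows = [(yi, xi, 1.0) for yi, xi in zip(y, x)]

# Cross-product matrix A = Z'Z for Z = (y, X)
A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

# Same partition as A[2..., 2...] and A[2..., 1] in the Stata code
XX = [row[1:] for row in A[1:]]
Xy = [row[0] for row in A[1:]]

# Solve XX * b = Xy for the 2 x 2 case with Cramer's rule
det = XX[0][0] * XX[1][1] - XX[0][1] * XX[1][0]
slope = (Xy[0] * XX[1][1] - XX[0][1] * Xy[1]) / det
intercept = (XX[0][0] * Xy[1] - XX[1][0] * Xy[0]) / det
print(slope, intercept)
```

Because the made-up points sit exactly on a line, the recovered coefficients are the slope and intercept of that line, which makes it easy to confirm that the slicing picks out the right blocks.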

missings: handling missing values

webuse nlswork, clear
* Report missing values
missings report
missings report, minimum(1000)
missings report, sort
missings report, sort show(10)

* List observations with missing values
missings list, minimum(5)
missings list, minimum(5) id(race)

* Tabulate the number of missing values per observation
missings table
bysort race: missings table
missings table, identify(race)

* Count the missing values in each observation and store the count in a new variable
missings tag, generate(nmissing)

generate frog = .
generate toad = .a
generate newt = ""
missings dropvars frog toad newt, force sysmiss
missings dropvars toad, force sysmiss
set obs 30000
missings dropobs, force

mvsumm: moving-window descriptive statistics

. * mvsumm computes a moving-window descriptive statistic for tsvar which must be a time series variable under the aegis of tsset.

. webuse grunfeld, clear

. list invest in 1/10

+--------+
| invest |
|--------|
1. | 317.6 |
2. | 391.8 |
3. | 410.6 |
4. | 257.7 |
5. | 330.8 |
|--------|
6. | 461.2 |
7. | 512 |
8. | 448 |
9. | 499.6 |
10. | 547.5 |
+--------+

. mvsumm invest, stat(mean) win(3) gen(inv3yavg) end

. list invest inv3yavg in 1/10

+-------------------+
| invest inv3yavg |
|-------------------|
1. | 317.6 . |
2. | 391.8 . |
3. | 410.6 373.3333 |
4. | 257.7 353.3667 |
5. | 330.8 333.0333 |
|-------------------|
6. | 461.2 349.9 |
7. | 512 434.6667 |
8. | 448 473.7333 |
9. | 499.6 486.5333 |
10. | 547.5 498.3667 |
+-------------------+

. mvsumm invest, stat(sd) win(5) gen(inv5ysd) end

. list invest inv5ysd in 1/10

+-------------------+
| invest inv5ysd |
|-------------------|
1. | 317.6 . |
2. | 391.8 . |
3. | 410.6 . |
4. | 257.7 . |
5. | 330.8 61.26344 |
|-------------------|
6. | 461.2 78.40295 |
7. | 512 101.5951 |
8. | 448 104.4181 |
9. | 499.6 71.83615 |
10. | 547.5 40.02771 |
+-------------------+

. mvsumm D.mvalue, stat(median) win(5) gen(meddmval) end

. list mvalue meddmval in 1/10

+--------------------+
| mvalue meddmval |
|--------------------|
1. | 3078.5 . |
2. | 4661.7 . |
3. | 5387.1 . |
4. | 2792.2 . |
5. | 4313.2 . |
|--------------------|
6. | 4643.9 725.3999 |
7. | 4551.2 330.6997 |
8. | 3244.1 -92.69971 |
9. | 4053.7 330.6997 |
10. | 4379.3 325.5999 |
+--------------------+
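
The windowing rule behind the `end` option can be reproduced outside Stata. The Python sketch below takes the ten `invest` values listed above and computes a trailing 3-period mean and a trailing 5-period standard deviation, leaving the result empty until a full window is available. The helper name `moving_stat` is hypothetical, and the sketch ignores panel boundaries, which is harmless here because the ten rows shown sit in one block at the top of the data.

```python
from statistics import mean, stdev

# First ten values of invest, copied from the listing above
invest = [317.6, 391.8, 410.6, 257.7, 330.8, 461.2, 512.0, 448.0, 499.6, 547.5]


def moving_stat(values, window, stat):
    """Apply stat to each trailing window; None until a full window exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(stat(values[i + 1 - window : i + 1]))
    return out


avg3 = moving_stat(invest, 3, mean)    # compare with inv3yavg
sd5 = moving_stat(invest, 5, stdev)    # compare with inv5ysd (sample standard deviation)

for value, a, s in zip(invest, avg3, sd5):
    print(value, a, s)
```

The values agree with the `inv3yavg` and `inv5ysd` columns up to the rounding of Stata's float storage type.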

netplot

clear all
input v1 v2
1 2
2 3
4 .
5 6
1 4
3 4
end
netplot v1 v2, l
gre netplot

clear all
input v1 v2
100 2
100 3
99 4
99 6
99 4
99 4
end
netplot v1 v2, l t(circle) arrow

sysuse auto, clear
netplot foreign price, t(circle) l

replace foreign = 10000 if foreign == 1
netplot foreign price, t(circle) l

nrow: renaming variables to the values in their nth observation

. sysuse auto, clear
(1978 Automobile Data)

. * Use the first observation as the variable names and drop that observation

. nrow
(1 observation deleted)

. sysuse auto, clear
(1978 Automobile Data)

. * Use the first observation as the variable names and keep that observation

. nrow, keep

. * Use the 2nd observation as the variable names and keep that observation

. nrow 2, keep

. * Restrict the renaming to a variable list

. sysuse auto, clear
(1978 Automobile Data)

. nrow 2, varlist(price-mpg)
(2 observations deleted)

openall: appending data files

* The traditional approach
clear all
local files: dir "." files "*.xls"
foreach file in `files' {
import excel using "`file'", first case(lower) clear
save "`file'.dta", replace
}
clear
foreach file in `files' {
append using "`file'.dta"
}
* openall
openall *
// * matches every .dta file in the current directory
openall *bas*
// append every file whose name contains bas
openall *, insheet
// append CSV files

The outfile command: exporting data

Study notes on a post from the 爬虫俱乐部 WeChat official account.

Exporting a dataset

clear all
cd ~/Desktop
sysuse auto, clear
keep in 1/10
keep make price mpg rep78 weight length foreign
outfile using myout.txt
"AMC Concord"             4099        22         3      2930       186
"Domestic"
"AMC Pacer" 4749 17 3 3350 173
"Domestic"
"AMC Spirit" 3799 22 . 2640 168
"Domestic"
"Buick Century" 4816 20 3 3250 196
"Domestic"
"Buick Electra" 7827 15 4 4080 222
"Domestic"
"Buick LeSabre" 5788 18 3 3670 218
"Domestic"
"Buick Opel" 4453 26 . 2230 170
"Domestic"
"Buick Regal" 5189 20 3 3280 200
"Domestic"
"Buick Riviera" 10372 16 3 3880 207
"Domestic"
"Buick Skylark" 4082 19 3 3400 200
"Domestic"

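Lifting the 80-character limit

  • Each observation wraps onto a new line before it is complete, because by default outfile writes at most 80 characters per line. The wide option lifts this limit: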
outfile using myout2.txt, wide

"AMC Concord" 4099 22 3 2930 186 "Domestic"
"AMC Pacer" 4749 17 3 3350 173 "Domestic"
"AMC Spirit" 3799 22 . 2640 168 "Domestic"
"Buick Century" 4816 20 3 3250 196 "Domestic"
"Buick Electra" 7827 15 4 4080 222 "Domestic"
"Buick LeSabre" 5788 18 3 3670 218 "Domestic"
"Buick Opel" 4453 26 . 2230 170 "Domestic"
"Buick Regal" 5189 20 3 3280 200 "Domestic"
"Buick Riviera" 10372 16 3 3880 207 "Domestic"
"Buick Skylark" 4082 19 3 3400 200 "Domestic"

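Exporting variable names: writing the data in dictionary format

  • The exported data also carry no variable names. The dict option writes the data in dictionary format, so that reading the file back with infile restores the variable names and labels: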
outfile using myout3.txt, dict

dictionary {
str18 make `"Make and Model"'
int price `"Price"'
int mpg `"Mileage (mpg)"'
int rep78 `"Repair Record 1978"'
int weight `"Weight (lbs.)"'
int length `"Length (in.)"'
_newline
byte foreign :origin `"Car type"'
}
"AMC Concord" 4099 22 3 2930 186
"Domestic"
"AMC Pacer" 4749 17 3 3350 173
"Domestic"
"AMC Spirit" 3799 22 . 2640 168
"Domestic"
"Buick Century" 4816 20 3 3250 196
"Domestic"
"Buick Electra" 7827 15 4 4080 222
"Domestic"
"Buick LeSabre" 5788 18 3 3670 218
"Domestic"
"Buick Opel" 4453 26 . 2230 170
"Domestic"
"Buick Regal" 5189 20 3 3280 200
"Domestic"
"Buick Riviera" 10372 16 3 3880 207
"Domestic"
"Buick Skylark" 4082 19 3 3400 200
"Domestic"


* The file is now in data dictionary format, so infile can read it straight back into Stata:
infile using myout3.txt, clear

Exporting comma-separated data

  • By default outfile separates values with spaces. To export comma-separated data instead, use the comma option:
outfile using myout4.txt, comma

"AMC Concord",4099,22,3,2930,186,"Domestic"
"AMC Pacer",4749,17,3,3350,173,"Domestic"
"AMC Spirit",3799,22,,2640,168,"Domestic"
"Buick Century",4816,20,3,3250,196,"Domestic"
"Buick Electra",7827,15,4,4080,222,"Domestic"
"Buick LeSabre",5788,18,3,3670,218,"Domestic"
"Buick Opel",4453,26,,2230,170,"Domestic"
"Buick Regal",5189,20,3,3280,200,"Domestic"
"Buick Riviera",10372,16,3,3880,207,"Domestic"
"Buick Skylark",4082,19,3,3400,200,"Domestic"

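One detail of the comma-separated output is that a numeric missing value is written as an empty field rather than as a dot. The Python sketch below parses the first three exported lines with the standard `csv` module and maps empty fields to `None`; the sample text is copied from the output above, while the function name `parse_rows` and the choice of `None` are assumptions made for illustration.

```python
import csv
import io

# First three lines of myout4.txt, copied from the output above
sample = '''"AMC Concord",4099,22,3,2930,186,"Domestic"
"AMC Pacer",4749,17,3,3350,173,"Domestic"
"AMC Spirit",3799,22,,2640,168,"Domestic"
'''


def parse_rows(text):
    """Parse outfile's comma format; empty numeric fields become None."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        make, *numbers, origin = fields
        rows.append([make] + [int(v) if v else None for v in numbers] + [origin])
    return rows


rows = parse_rows(sample)
for row in rows:
    print(row)
```

The third row shows the missing rep78 value for the AMC Spirit arriving as an empty field, which any downstream reader has to handle explicitly.
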
Exporting data without value labels

  • foreign is a numeric variable with a value label attached. To export the underlying numeric values rather than the labels, use the nolabel option:
outfile using myout5.txt, nolabel wide

"AMC Concord" 4099 22 3 2930 186 0
"AMC Pacer" 4749 17 3 3350 173 0
"AMC Spirit" 3799 22 . 2640 168 0
"Buick Century" 4816 20 3 3250 196 0
"Buick Electra" 7827 15 4 4080 222 0
"Buick LeSabre" 5788 18 3 3670 218 0
"Buick Opel" 4453 26 . 2230 170 0
"Buick Regal" 5189 20 3 3280 200 0
"Buick Riviera" 10372 16 3 3880 207 0
"Buick Skylark" 4082 19 3 3400 200 0

Exporting strings without double quotes

  • Finally, to drop the double quotes around strings, use the noquote option:
outfile using myout6.txt, nolabel noquote wide

AMC Concord 4099 22 3 2930 186 0
AMC Pacer 4749 17 3 3350 173 0
AMC Spirit 3799 22 . 2640 168 0
Buick Century 4816 20 3 3250 196 0
Buick Electra 7827 15 4 4080 222 0
Buick LeSabre 5788 18 3 3670 218 0
Buick Opel 4453 26 . 2230 170 0
Buick Regal 5189 20 3 3280 200 0
Buick Riviera 10372 16 3 3880 207 0
Buick Skylark 4082 19 3 3400 200 0

pairplot: plotting treatment effects

clear
set obs 76
gen treatment=mod(_n,3)+1

gen w_before=rnormal(82, 4.8)
replace w_before=rnormal(81, 5.7) if treatment==2
replace w_before=rnormal(83, 5) if treatment==3

gen w_after=rnormal(85.6, 8.3)
replace w_after=rnormal(81.1, 4.7) if treatment==2
replace w_after=rnormal(90, 5.4) if treatment==3
// end of making up the data

label var w_before " Weight before treatment, lb"
label var w_after " Weight after treatment, lb"
label define treat ///
1 "Cognitive behavioural" ///
2 "Control" ///
3 "Family therapy"

label values treatment treat

pairplot w_after w_before

pairplot w_after w_before, ms(Oh D)

pairplot w_after w_before, sort(w_before)

pairplot w_after w_before, diff mean

pairplot w_after w_before, ratio base(2)

parcoord: parallel coordinates plots

* net get gr18.pkg, from("http://www.stata.com/stb/stb29/")
* net install gr18.pkg, from("http://www.stata.com/stb/stb29/")
cuse bushfire, clear
parcoord f*, center

parcoord f*

parcoord f*, colorby(cluster)

* The following produces an animated plot
parcoord f*, colorby(cluster) tour

pdplot: Pareto plots

* ssc install pdplot
sysuse auto, clear
pdplot mpg

程振兴 @czxa.top