Stata旧笔记整理(一)

Stata旧笔记整理(一)

之前老网站上有很多没有很好整理的笔记。之前也整理过一些,但是还有两百多篇,所以就简单汇总一下,便于检索。

编写一个可以使用子样本及权重的命令

1
2
3
4
5
6
7
8
9
10
11
12
13
14
cap prog drop cwsum
prog define cwsum
version 14.0
syntax [varlist(fv ts)] [if] [in] [aweight fweight] [, Detail noFormat]
marksample touse
sum `varlist' [`weight' `exp'] if `touse', `detail' `format'
end


sysuse auto, clear
cwsum price if price > 5000
cwsum price if price > 5000 [aw = rep78]
cwsum price if price > 5000 [fw = rep78]
cwsum price in 1/20
  1. fweights: 频数权重, are weights that indicate the number of duplicated observations.
  2. pweights: 抽样权重, are weights that denote the inverse of the probability that the observation is included because of the sampling design.
  3. aweights: 分析权重, are weights that are inversely proportional to the variance of an observation; that is, the variance of the jth observation is assumed to be sigma^2/w_j, where w_j are the weights. Typically, the observations represent averages and the weights are the number of elements that gave rise to the average. For most Stata commands, the recorded scale of aweights is irrelevant; Stata internally rescales them to sum to N, the number of observations in your data, when it uses them.
  4. iweights: 重要性权重, are weights that indicate the “importance” of the observation in some vague sense. iweights have no formal statistical definition; any command that supports iweights will define exactly how they are treated. Usually, they are intended for use by programmers who want to produce a certain computation.

产生交叉数据集:cross命令

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
. input str6 sex

sex
1. male
2. female
3. end

. save sex, replace
. drop _all

. * Create agecat dataset

. input agecat

agecat
1. 20
2. 30
3. 40
4. end

. * Form every pairwise combination of agecat with sex

. cross using sex

. list

+-----------------+
| agecat sex |
|-----------------|
1. | 20 male |
2. | 30 male |
3. | 40 male |
4. | 20 female |
5. | 30 female |
|-----------------|
6. | 40 female |
+-----------------+

产生交叉数据集:joinby

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
. webuse child, clear s
file ~/Library/Application Support/Stata/ado/plus/c/child.dta saved

. list

+--------------------------------+
| family~d child_id x1 x2 |
|--------------------------------|
1. | 1025 3 11 320 |
2. | 1025 1 12 300 |
3. | 1025 4 10 275 |
4. | 1026 2 13 280 |
5. | 1027 5 15 210 |
+--------------------------------+

. webuse parent, clear s

. list, sep(0)

+--------------------------------+
| family~d parent~d x1 x3 |
|--------------------------------|
1. | 1030 10 39 600 |
2. | 1025 11 20 643 |
3. | 1025 12 27 721 |
4. | 1026 13 30 760 |
5. | 1026 14 26 668 |
6. | 1030 15 32 684 |
+--------------------------------+

. sort family_id

. * Join information on parents from data in memory with information on children from data at http:

. joinby family_id using "http://www.stata-press.com/data/r14/child"

. list, sepby(family_id) abbrev(12)

+---------------------------------------------------+
| family_id parent_id x1 x3 child_id x2 |
|---------------------------------------------------|
1. | 1025 12 27 721 1 300 |
2. | 1025 12 27 721 4 275 |
3. | 1025 12 27 721 3 320 |
4. | 1025 11 20 643 1 300 |
5. | 1025 11 20 643 4 275 |
6. | 1025 11 20 643 3 320 |
|---------------------------------------------------|
7. | 1026 13 30 760 2 280 |
8. | 1026 14 26 668 2 280 |
+---------------------------------------------------+

储存和使用回归结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
. use usairquality, clear

. qui reg so2 temp manuf pop

. est store m1

. qui reg so2 temp pop wind

. est store m2

. qui reg so2 temp pop wind precip days

. est store m3

. qui reg so2 temp manuf pop wind precip days

. est store m4

. * 输出回归表

. est table m1 m2 m3 m4, stat(r2_a rmse) b(%7.3f) se(%6.3f) p(%4.3f)

------------------------------------------------------
Variable | m1 m2 m3 m4
-------------+----------------------------------------
temp | -0.587 -1.504 -2.053 -1.268
| 0.371 0.430 0.714 0.621
| 0.122 0.001 0.007 0.049
manuf | 0.071 0.065
| 0.016 0.016
| 0.000 0.000
pop | -0.047 0.020 0.021 -0.039
| 0.015 0.005 0.005 0.015
| 0.004 0.000 0.000 0.014
wind | -2.858 -3.649 -3.181
| 2.219 2.187 1.815
| 0.206 0.104 0.089
precip | 0.670 0.512
| 0.435 0.363
| 0.133 0.167
days | -0.048 -0.052
| 0.196 0.162
| 0.808 0.750
_cons | 58.196 128.501 147.197 111.728
| 20.488 36.725 56.163 47.318
| 0.007 0.001 0.013 0.024
-------------+----------------------------------------
r2_a | 0.581 0.386 0.434 0.611
rmse | 15.191 18.393 17.666 14.636
------------------------------------------------------
legend: b/se/p

. est table m1 m2 m3 m4, stat(r2_a rmse) b(%7.3f) star varlabel ti("Regression table")

Regression table

------------------------------------------------------------------------------
Variable | m1 m2 m3 m4
-------------------------+----------------------------------------------------
Mean temperature | -0.587 -1.504** -2.053** -1.268*
Mfg. workers, 000 | 0.071*** 0.065***
Population | -0.047** 0.020*** 0.021*** -0.039*
Mean wind speed | -2.858 -3.649 -3.181
Mean precipitation | 0.670 0.512
Mean days quality=poor | -0.048 -0.052
Constant | 58.196** 128.501** 147.197* 111.728*
-------------------------+----------------------------------------------------
r2_a | 0.581 0.386 0.434 0.611
rmse | 15.191 18.393 17.666 14.636
------------------------------------------------------------------------------
legend: * p<0.05; ** p<0.01; *** p<0.001

.
. * 批量检验

. est for m1 m2 m3 m4: test temp = -1.6

----------------------------------------------------------------------------------
Model m1
----------------------------------------------------------------------------------

( 1) temp = -1.6

F( 1, 37) = 7.45
Prob > F = 0.0096

----------------------------------------------------------------------------------
Model m2
----------------------------------------------------------------------------------

( 1) temp = -1.6

F( 1, 37) = 0.05
Prob > F = 0.8236

----------------------------------------------------------------------------------
Model m3
----------------------------------------------------------------------------------

( 1) temp = -1.6

F( 1, 35) = 0.40
Prob > F = 0.5297

----------------------------------------------------------------------------------
Model m4
----------------------------------------------------------------------------------

( 1) temp = -1.6

F( 1, 34) = 0.29
Prob > F = 0.5964

.
. * 重现回归结果

. est replay m1

----------------------------------------------------------------------------------
Model m1
----------------------------------------------------------------------------------

Source | SS df MS Number of obs = 41
-------------+---------------------------------- F(3, 37) = 19.50
Model | 13499.2473 3 4499.7491 Prob > F = 0.0000
Residual | 8538.65513 37 230.774463 R-squared = 0.6125
-------------+---------------------------------- Adj R-squared = 0.5811
Total | 22037.9024 40 550.947561 Root MSE = 15.191

------------------------------------------------------------------------------
so2 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
temp | -.5871451 .3710077 -1.58 0.122 -1.338878 .1645878
manuf | .0712252 .0160601 4.43 0.000 .0386842 .1037661
pop | -.0466475 .0153719 -3.03 0.004 -.0777939 -.0155011
_cons | 58.19593 20.48789 2.84 0.007 16.68352 99.70835
------------------------------------------------------------------------------

.
. * 保存回归结果

. forval i = 1/4{
2. est restore m`i'
3. est notes: from file `c(filename)' saved `c(filedate)'
4. est save so2_m`i', replace
5. }
(results m1 are active now)
file so2_m1.ster saved
(results m2 are active now)
file so2_m2.ster saved
(results m3 are active now)
file so2_m3.ster saved
(results m4 are active now)
file so2_m4.ster saved

. * 打开回归结果的存储文件

. clear

. est clear

. est des using so2_m3

Estimation results saved on 20feb2018 14:31, produced by

. regress so2 temp pop wind precip days

Notes:
1. from file usairquality.dta saved 30 Jun 2007 08:41

. est use so2_m3

. est store so2_m3

. est table *

---------------------------
Variable | so2_m3
-------------+-------------
temp | -2.053046
pop | .02076991
wind | -3.6494827
precip | .66966028
days | -.04795291
_cons | 147.19742
---------------------------

创建虚拟变量

创建虚拟变量的常用方法:

  • gen
  • gen + replace
  • tab
  • recode
  • cond
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
. sysuse auto, clear
(1978 Automobile Data)

. gen bad = cond(rep78 > 2, 1, 0) if rep78 != .
(5 missing values generated)

. list rep78 bad in 1/5, sep(0)

+-------------+
| rep78 bad |
|-------------|
1. | 3 1 |
2. | 3 1 |
3. | . . |
4. | 3 1 |
5. | 4 1 |
+-------------+

.
. * todummy

ssc install todummy

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. * 1. 以变量的中位数为界,大于中位数虚拟变量取值为1, 小于中位数取值
> 为0

. todummy wage, values (50) percentile
(note: default prefix d_ set)

. list wage d_wage in 1/10, sep(0)

+-------------------+
| wage d_wage |
|-------------------|
1. | 11.73913 1 |
2. | 6.400963 1 |
3. | 5.016723 0 |
4. | 9.033813 1 |
5. | 8.083731 1 |
6. | 4.62963 0 |
7. | 10.49114 1 |
8. | 17.20612 1 |
9. | 13.08374 1 |
10. | 7.745568 1 |
+-------------------+

. * 2. 将变量分段,创建多个虚拟变量:当年龄大于等于45时age1为1,大于等于40岁时age2为1,当年龄在38-40岁之间时,age3 = 1,其他情况时这三个变量均为0

. todummy age, values (45\40\=38 40) cut

. list age age1 age2 age3 in 1/10, sep(0)

+--------------------------+
| age age1 age2 age3 |
|--------------------------|
1. | 37 0 0 0 |
2. | 37 0 0 0 |
3. | 42 0 1 0 |
4. | 43 0 1 0 |
5. | 42 0 1 0 |
6. | 39 0 0 1 |
7. | 37 0 0 0 |
8. | 40 0 1 1 |
9. | 40 0 1 1 |
10. | 40 0 1 1 |
+--------------------------+

. * 3. 根据变量的类别,创建多个虚拟变量: 根据三类人种。创建虚拟变量,当为白种人时,虚拟变量white为1,当为黑人或者其他时,虚拟变量other为1.

. todummy race, values (1 \ 2 3) generate(white other)

. list race white other in 1/10, sep(0)

+-----------------------+
| race white other |
|-----------------------|
1. | black 0 1 |
2. | black 0 1 |
3. | black 0 1 |
4. | white 1 0 |
5. | white 1 0 |
6. | white 1 0 |
7. | white 1 0 |
8. | white 1 0 |
9. | white 1 0 |
10. | white 1 0 |
+-----------------------+

从日期中提取年月

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
// %tm格式的日期指形如1959m12的日期,类似的还有%tw、%td、%tq的日期
clear
input str7 mydate
1960m1
1980m2
2005m3
2010m10
2015m11
2017m10
end

gen year1 = substr(mydate,1,4)
gen month1 = substr(mydate,6,2)

// substr(s,n1,n2)表示从第n1个开始,截取n2个字符
list
d
destring, replace
d

gen mydate1 = monthly(mydate, "YM")
list mydate1
gen date = dofm(mydate1)

// 该函数将mydate1换算出的月份数转换为天数(月份默认为该月的第一天)
list date

* 或用如下命令也一样
gen year3 = int(mydate1/12+1960)
gen month3 = mod(mydate1,12) + 1
list

滚动窗口与递归估计:rolling

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174

. * Example: Collecting coefficients

. webuse lutkepohl2, clear s
file ~/Library/Application Support/Stata/ado/plus/l/lutkepohl2.dta saved

. tsset qtr
time variable: qtr, 1960q1 to 1982q4
delta: 1 quarter

. * 30期的循环回归

. rolling _b, window(30): regress dln_inv dln_inc dln_consump
(running regress on estimation sample)

Rolling replications (63)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.............

. list in 1/10, abbrev(14)

+-----------------------------------------------------------+
| start end _b_dln_inc _b_dln_consump _b_cons |
|-----------------------------------------------------------|
1. | 1960q1 1967q2 .1054375 1.263474 -.0101802 |
2. | 1960q2 1967q3 .1542573 1.251464 -.0113987 |
3. | 1960q3 1967q4 .2400457 1.001518 -.0048182 |
4. | 1960q4 1968q1 .0053584 1.202571 -.0067967 |
5. | 1961q1 1968q2 .012656 1.187025 -.006777 |
|-----------------------------------------------------------|
6. | 1961q2 1968q3 -.0790168 1.094311 -.0048056 |
7. | 1961q3 1968q4 .0205408 .964076 -.0018992 |
8. | 1961q4 1969q1 -.1895722 1.169699 -.0022988 |
9. | 1962q1 1969q2 -.2074511 1.271727 -.002647 |
10. | 1962q2 1969q3 -.0170991 1.187241 -.0051391 |
+-----------------------------------------------------------+

. * Same as above, _b is default for e-class commands

. webuse lutkepohl2, clear

. tsset qtr
time variable: qtr, 1960q1 to 1982q4
delta: 1 quarter

. rolling, window(30): regress dln_inv dln_inc dln_consump
(running regress on estimation sample)

Rolling replications (63)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.............

. list in 1/10, abbrev(14)

+-----------------------------------------------------------+
| start end _b_dln_inc _b_dln_consump _b_cons |
|-----------------------------------------------------------|
1. | 1960q1 1967q2 .1054375 1.263474 -.0101802 |
2. | 1960q2 1967q3 .1542573 1.251464 -.0113987 |
3. | 1960q3 1967q4 .2400457 1.001518 -.0048182 |
4. | 1960q4 1968q1 .0053584 1.202571 -.0067967 |
5. | 1961q1 1968q2 .012656 1.187025 -.006777 |
|-----------------------------------------------------------|
6. | 1961q2 1968q3 -.0790168 1.094311 -.0048056 |
7. | 1961q3 1968q4 .0205408 .964076 -.0018992 |
8. | 1961q4 1969q1 -.1895722 1.169699 -.0022988 |
9. | 1962q1 1969q2 -.2074511 1.271727 -.002647 |
10. | 1962q2 1969q3 -.0170991 1.187241 -.0051391 |
+-----------------------------------------------------------+

.
. * Example: Collecting standard errors

. webuse lutkepohl2, clear

. tsset qtr
time variable: qtr, 1960q1 to 1982q4
delta: 1 quarter

. rolling _se, window(10): regress dln_inv dln_inc dln_consump
(running regress on estimation sample)

Rolling replications (83)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................

. list in 1/10, abbrev(14)

+-----------------------------------------------------------+
| start end _se_dln_inc _se_dln_cons~p _se_cons |
|-----------------------------------------------------------|
1. | 1960q1 1962q2 1.276762 1.215871 .049808 |
2. | 1960q2 1962q3 1.186243 1.094224 .0452442 |
3. | 1960q3 1962q4 1.110953 1.041057 .0407168 |
4. | 1960q4 1963q1 1.856414 1.414761 .0573072 |
5. | 1961q1 1963q2 3.283753 2.680502 .0994599 |
|-----------------------------------------------------------|
6. | 1961q2 1963q3 3.942591 2.935519 .1115676 |
7. | 1961q3 1963q4 3.843241 3.602231 .1282991 |
8. | 1961q4 1964q1 3.997124 4.167273 .1134084 |
9. | 1962q1 1964q2 5.683699 3.38701 .1156075 |
10. | 1962q2 1964q3 5.132209 3.315579 .107338 |
+-----------------------------------------------------------+

.
. * Example: Collecting stored results

. webuse lutkepohl2, clear

. tsset qtr
time variable: qtr, 1960q1 to 1982q4
delta: 1 quarter

. rolling mean=r(mean) median=r(p50), window(10): summarize inc, detail
(running summarize on estimation sample)

Rolling replications (83)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................

. list in 1/10

+----------------------------------+
| start end mean median |
|----------------------------------|
1. | 1960q1 1962q2 509 514.5 |
2. | 1960q2 1962q3 521.3 520.5 |
3. | 1960q3 1962q4 533.1 530.5 |
4. | 1960q4 1963q1 543.7 544 |
5. | 1961q1 1963q2 554.3 553 |
|----------------------------------|
6. | 1961q2 1963q3 564.4 566 |
7. | 1961q3 1963q4 575.1 578.5 |
8. | 1961q4 1964q1 587.2 587 |
9. | 1962q1 1964q2 598.5 595 |
10. | 1962q2 1964q3 609.7 604.5 |
+----------------------------------+

.
. webuse lutkepohl2, clear

. tsset qtr
time variable: qtr, 1960q1 to 1982q4
delta: 1 quarter

. rolling ratio=(r(mean)/r(p50)), window(10): summarize inc, detail
(running summarize on estimation sample)

Rolling replications (83)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................

. list in 1/10

+----------------------------+
| start end ratio |
|----------------------------|
1. | 1960q1 1962q2 .98931 |
2. | 1960q2 1962q3 1.001537 |
3. | 1960q3 1962q4 1.004901 |
4. | 1960q4 1963q1 .9994485 |
5. | 1961q1 1963q2 1.002351 |
|----------------------------|
6. | 1961q2 1963q3 .9971731 |
7. | 1961q3 1963q4 .9941227 |
8. | 1961q4 1964q1 1.000341 |
9. | 1962q1 1964q2 1.005882 |
10. | 1962q2 1964q3 1.008602 |
+----------------------------+

Stata15中新的宏扩展函数

爬虫俱乐部推文学习笔记
rownumb & colnumb
rowsof & colsof
rowvarlist & colvarlist
rowfullnames & colfullnames

rownumb&colnumb:返回字符串所在矩阵的行数和列数

用法:

  • local lname: colnumb matrixname string
  • local lname: rownumb matrixname string
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
. sysuse auto, clear
(1978 Automobile Data)

. reg price length weight disp

Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(3, 70) = 12.44
Model | 220789762 3 73596587.2 Prob > F = 0.0000
Residual | 414275635 70 5918223.35 R-squared = 0.3477
-------------+---------------------------------- Adj R-squared = 0.3197
Total | 635065396 73 8699525.97 Root MSE = 2432.7

------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
length | -97.63366 39.57428 -2.47 0.016 -176.5621 -18.70524
weight | 4.613288 1.397405 3.30 0.002 1.826253 7.400324
displacement | .7274381 6.969071 0.10 0.917 -13.17194 14.62681
_cons | 10440.63 4369.32 2.39 0.020 1726.294 19154.96
------------------------------------------------------------------------------

. matrix X = r(table)

. matrix list X

X[9,4]
length weight displacement _cons
b -97.633657 4.6132883 .72743814 10440.629
se 39.574283 1.3974046 6.9690715 4369.3202
t -2.4670985 3.3013262 .10438093 2.3895316
pvalue .01607078 .00151687 .91716556 .01956706
ll -176.56208 1.8262528 -13.171937 1726.2945
ul -18.705237 7.4003238 14.626813 19154.963
df 70 70 70 70
crit 1.9944371 1.9944371 1.9944371 1.9944371
eform 0 0 0 0

.
. local a: rownumb X df

. local b: colnumb X weight

. di "`a' `b'"
7 2

rowsof&colsof:返回矩阵的行数和列数

用法:

  • local lname: rowsof matrixname
  • local lname: colsof matrixname
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
. local c: rowsof X

. local d: colsof X

. di "`c' `d'"
9 4

.
. * rowvarlist&colvarlist:返回矩阵的所有行变量和列变量
. * 用法:
. * local lname: rowvarlist matrixname
. * local lname: colvarlist matrixname
. local t: rowvarlist X

. local h: colvarlist X

. di "`t'"
b se t pvalue ll ul df crit eform

. di "`h'"
length weight displacement _cons

rowfullnames&colfullnames:显示行和列的完整名称

1
2
3
4
5
6
7
8
9
. local m: rowfullnames X

. local n: colfullnames X

. di "`m'"
b se t pvalue ll ul df crit eform

. di "`n'"
length weight displacement _cons

宏拓展函数:sortedby

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
. use un, clear

. d

Contains data from un.dta
obs: 58
vars: 15 13 Jun 2006 10:52
size: 3,654
----------------------------------------------------------------------------------
storage display value
variable name type format label variable label
----------------------------------------------------------------------------------
name str10 %10s
mistype float %9.0g
contype float %9.0g
sevviol float %9.0g
area float %9.0g
loctype float %9.0g
addloc float %9.0g
borders float %9.0g
primact float %9.0g
spinv float %9.0g
duration float %9.0g
troop float %9.0g
expend float %9.0g
deaths float %9.0g
completed byte %8.0g
----------------------------------------------------------------------------------
Sorted by: duration

. local sb: sortedby

. di "dataset sorted by: `sb'."
dataset sorted by: duration.

矩阵的定义

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
mat A = (1, 2 \ 3, 4)
mat B = (5, 7 \ 9, 2)
mat C = A + B
mat list C

mat B = A - B
matlist B

mat X = (1, 1 \ 2, 5 \ 8, 0 \ 4, 5)
mat C = 3 * X * A' * B
mat l C

mat D = (X'*X - A'*A)/4
mat rownames D = dog cat
mat colnames D = bark meow
mat l D

mat rownames A = aa bb
mat colnames A = alpha beta
mat l A

* kronecker product
mat D = A#D
mat l D

mat G = A, B \ D
mat l G

mat Z = (B - A)' * (B + A' * -B)/4
mat l Z

* 一些特殊的矩阵函数
* 单位对角矩阵
mat A = I(4)
matlist A

* r行c列全是k的矩阵
mat A = J(3, 4 ,5)
matlist A

* cholesky分解
mat A = (1, 2 \ 2, 5)
mat L = cholesky(4*I(2) + A'*A)
mat l L

* 逆矩阵
mat Ainv = invsym(A)
mat l Ainv

矩阵在mata中的输入和运算

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
clear all
mat A = (2,6,1\3,10,12\2,9,5)
memory // 查看内存
mata
mata clear
A = (2,6,1\3,10,12\2,9,5)
mata memory
end
mata
A1 = (1,2,-1\3,4,-2+3i\2,5,6)
A2 = (-2,1,-2\1,-2,3-1i+3i\3,6,-1)
A3 = (A1,A2)
A3
A4 = (A1\A2)
A4
end
mata
C1 = A1[1,2]
C1
C2 = A1[1,.]
C2
C3 = A1[1,(1,3)]
C3
C4 = A1[1,(1::3)]
C4
C5 = A3[|1,4\2,6|] // 第一行第四列到第二行第六列
A3
C5
end

mata
D1 = A1'
D1
D2 = A1'A1
D2
end
// 当使用冒号符的时候,加减乘除被用于元素对元素的运算
mata
F1 = (5,0\0,2\3,8)
F2 = (1,3\5,1\3,2)
F3 = F1:*F2 //矩阵元素相乘
F3
end

// 逻辑运算符
mata
F1
G1 = (!F1)
G1
F2
G2 = (F1==F2') // 判断F1中的所有元素是否和F2的转置矩阵中的所有元素相等
G2
G3 = (F1!=F2') // 判断F1是否和F2的转置矩阵有不相等的元素
G3
G4 = (F1>=F2) // 判断F1中的所有元素是否大于等于F2的所有元素
G4
G5 = (F1[2,1]=0) // 判断是否等于0
G5

/*
!可以用于标量、向量和矩阵,并在各元素相应的位置上返回0或1,其它的逻辑运算符& | 只能用于标量
关系运算符!=可以用于不同结构的向量和矩阵,其他的关系运算符> >= < <=仅能用于相同结构且具有相同数据类型的矩阵,返回值为0或1
*/

离散型随机变量的模拟

有限离散型随机变量的模拟(一)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
clear all
set more off
set obs 1234
gen u = uniform()
egen x = cut(u), at(0, 0.1, 0.3, 0.55, 0.8, 1)
tab x
recode x 0 = 1
// recode命令的作用类似于replace,
// recode往往作用于对取值范围有限的非连续变量重新赋值。
recode x 0.1 = 2
recode x 0.3 = 3
recode x 0.55 = 4
recode x 0.8 = 5
tab x

有限离散型随机变量的模拟(二)

1
2
3
4
5
6
7
8
clear all
set more off
set seed 6789
set obs 10000
gen u = uniform()
egen x = cut(u), at(0,0.1,0.3,0.55,0.8,1)
recode x (0=1) (0.1 = 2) (0.3 = 3) (0.55 = 4) (0.8 = 5)
tab x

除了recode命令外,我们还可以用egen的group函数,对x按照取值的不同分组,并按照取值的大小升序排列编码为1,2,3,4,5。生成一个新的变量y:

1
2
3
4
5
6
7
8
clear all
set more off
set seed 8789
set obs 10000
gen u = uniform()
egen x =cut(u), at(0,0.1,0.3,0.55,0.8,1)
egen y = group(x)
tab y

六位数字体育彩票的100次模拟

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
clear all
set more off
set obs 10
set seed 45678
postfile premutation x1 x2 x3 x4 x5 x6 using temp, replace
gen u = .
forval i = 1(1)100{
qui replace u = uniform()
qui egen x = rank(u)
qui replace x = x - 1
post premutation (x[1]) (x[2]) (x[3]) (x[4]) (x[5]) (x[6])
drop x
}
postclose premutation
use temp, clear

六合彩随机投注程序

1
2
3
4
5
6
7
8
9
10
11
12
13
clear all
set more off
set obs 49
postfile premutation x1 x2 x3 x4 x5 x6 using temp, replace
gen u = .
forval i = 1/10000{
qui replace u = uniform()
qui egen x = rank(u)
post premutation (x[1]) (x[2]) (x[3]) (x[4]) (x[5]) (x[6])
drop x
}
postclose premutation
use temp, clear

几何分布离散型随机变量

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
* p(x = i) = p(1-p)^(i-1)
clear all
set more off
set obs 10000
set seed 5678
gen u = uniform()
gen x = 1 if u < 0.2
local i = 2
qui sum x
local missing = _N - r(N)
while `missing' ~= 0 {
local pi = 1 - 0.8^(`i'-1)
qui replace x = `i' if u < `pi' & x == .
local i = `i' + 1
qui sum x
local missing = _N-r(N)
}
drop u
tab x
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
local t1 = clock(c(current_date) + " " + c(current_time), "DMYhms")
clear
set more off
set obs 10000
gen u = uniform()
gen n = _n
sort u
gen x = 1 if _n == 1
local pi = 0
local x = 1
local N = _N
local i = 2
while `i' <= `N'{
if u[`i'] < `pi'{
qui replace x = `x' if _n == `i'
local i = `i' + 1
}
else{
local x = `x' + 1
local pi = 1 - 0.8^(`x'-1)
}
}
sort n
drop n u
tab x
local t2 = clock(c(current_date) + " " + c(current_time), "DMYhms")
local run_time = `t2' - `t1'
di "The program run `run_time' milliseconds"

泊松分布离散型随机变量的模拟

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
local t1 = clock(c(current_date) + " " + c(current_time), "DMYhms")
clear
set more off
set seed 1234
set obs 10000
gen u = uniform()
local F = exp(-0.2)
local pi = exp(-0.2)

gen x = 0 if u < `F'
local i = 0
qui sum x
local missing = _N - r(N)
while `missing' ~= 0{
local i = `i' + 1
local pi = `pi'*0.2/`i'
local F = `F' + `pi'
qui replace x = `i' if u < `F' & x == .
qui sum x
local missing = _N - r(N)
}
drop u
tab x
local t2 = clock(c(current_date) + " " + c(current_time), "DMYhms")
local run_time = `t2' - `t1'
di "The program run `run_time' ms"
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
local t1 = clock(c(current_date) + " " + c(current_time), "DMYhms")
clear
set more off
set seed 6789
set obs 10000
gen n = _n
gen u = uniform()
sort u
gen x = 0 if _n == 1
local F = exp(-0.2)
local pi = exp(-0.2)
local x = 0
local N = _N
local i = 2
while `i' <= `N'{
if u[`i'] < `F'{
qui replace x = `x' if _n == `i'
local i = `i' + 1
}
else{
local x = `x' + 1
local pi = `pi' * 0.2/`x'
local F = `F' + `pi'
}
}
sort n
drop n u
tab x
local t2 = clock(c(current_date) + " " + c(current_time), "DMYhms")
local rt = `t2'-`t1'
di "The program run `rt'ms"

伯努利离散型随机变量

伯努利试验B(10, 0.2)模拟

1
2
3
4
5
6
7
8
9
10
11
12
13
clear
set seed 234
set obs 10000
gen u = uniform()
local pi = 0.8^10
local F = 0.8^10
gen x = 0 if u < `pi'
forval i = 1(1)10{
local pi = `pi'*((10-`i'+1)/(`i'))*(0.2/0.8)
local F = `pi' + `F'
replace x = `i' if u < `F' & x == .
}
tab x

伯努利试验B(1000, 0.2)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
clear
set seed 234
set obs 10000
gen u = uniform()
local pi = 0.8^1000
di `p'
local F = 0.8^1000
gen x = 0 if u < `pi'
forval i = 1(1)1000{
local pi = `pi'*((1000-`i'+1)/(`i'))*(0.2/0.8)
local F = `pi' + `F'
qui replace x = `i' if u < `F' & x == .
}
tab x
1
2
3
4
5
6
7
8
9
clear
set more off
set seed 567859
set obs 1000
gen u = uniform()
gen x = (u<0.2)
qui sum if x == 1
local N = r(N)
di "`N' success out of 1000 trails"

伯努利试验B(1000,0.2)多次模拟

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
clear all
set more off
cap program drop binomial
program define binomial, rclass
version 14.2
syntax[, obs(integer 1000) p(real 0.2)]
drop _all
set obs `obs'
tempvar z1 z2
gen `z1' = uniform()
gen `z2' = (`z1'<`p')
summarize `z2' if `z2' == 1
return scalar Num = r(N)
end
qui simulate N = r(Num), reps(10000): binomial, obs(1000) p(0.2)
tab N
hist N

连续型随机变量的模拟

单位圆上的双变量均匀分布

1
2
3
4
5
6
7
8
clear all
set more off
set seed 789
set obs 10000
gen r = uniform()
gen theta = uniform()*2*c(pi)
gen x = r*cos(theta)
gen y = r*sin(theta)
1
2
3
4
5
6
7
8
9
clear all
set more off
set seed 234
set obs 15000
gen x = uniform()*2 - 1
gen y = uniform()*2 - 1
keep if x^2 + y^2 <= 1
keep if _n <= 10000
tw scatter x y

指数分布

1
2
3
4
5
6
7
clear all
set more off
set seed 78
set obs 10000
gen u = uniform()
gen x = -ln(1-u)
hist x, bin(50) norm
1
2
3
4
5
6
7
clear all
set more off
set seed 789
set obs 10000
gen u = uniform()
gen x = -ln(u)
hist x, bin(50) norm

Box-Muller方法:

1
2
3
4
5
6
7
8
9
10
11
12
13
clear all
set more off
set seed 789
set obs 10000
gen u1 = uniform()
gen u2 = uniform()
gen r2 = -2*log(u1)
gen theta = 2*c(pi)*u2
gen x = sqrt(r2)*cos(theta)
gen y = sqrt(r2)*sin(theta)
grss hist x, bin(50) norm
grss hist y, bin(50) norm
pwcorr x y

标准正态分布模拟

1
2
3
4
5
6
7
8
9
10
11
12
13
clear all
set more off
set seed 1990
set obs 20000
gen r2 = -2*log(uniform())
gen v1 = uniform()*2 - 1
gen v2 = uniform()*2 - 1
keep if v1^2+v2^2 <= 1
keep if _n<= 10000
gen x = sqrt(r2)*v1/sqrt(v1^2+v2^2)
gen y = sqrt(r2)*v2/sqrt(v1^2+v2^2)
hist x, bin(50) norm
hist y, bin(50) norm

Gamma分布(n个参数为lamda的独立指数分布的和)

1
2
3
4
5
6
7
8
9
10
11
clear all
set more off
set seed 56789
set obs 10000
forval i = 1/5{
gen exp`i' = -log(uniform())/0.8
}
egen gamma = rowtotal(exp*)
drop exp*
sum gamma, d
hist gamma, bin(50) norm

正态分布

1
2
3
4
5
6
clear all
set more off
set seed 56789
set obs 10000
gen x = invnorm(uniform()) //生成标准正态分布
hist x, bin(50) norm

对数正态分布

1
2
3
4
5
6
7
clear all
set more off
set seed 5678
set obs 10000
gen norm = invnorm(uniform())
gen lnorm = exp(norm)
hist lnorm, bin(50) norm

卡方分布

1
2
3
4
5
6
7
8
9
10
11
clear all
set more off
set obs 10000
set seed 6789
forval i = 1/10{
gen x`i' = invnorm(uniform())
gen x`i'_sq = x`i'^2
}
keep *_sq
egen chi_sq = rowtotal(x*_sq)
hist chi_sq, bin(50) norm

F分布

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
clear all
set more off
set seed 56789
set obs 10000
forval i = 1/10{
gen x`i' = invnorm(uniform())
gen x`i'_sq = x`i'^2
}
forval i = 1/5{
gen y`i' = invnorm(uniform())
gen y`i'_sq = y`i'^2
}
keep *_sq
egen Chi_x = rowtotal(x*_sq)
egen Chi_y = rowtotal(y*_sq)
gen F = (Chi_x/10)/(Chi_y/5)
hist F, bin(50) norm

t分布

1
2
3
4
5
6
7
8
9
10
11
12
13
clear all
set more off
set seed 34567
set obs 10000
gen x = invnorm(uniform())
forval i = 1/10{
gen y`i' = invnorm(uniform())
gen y`i'_sq = y`i'^2
}
keep x *_sq
egen Chi_y = rowtotal(y*_sq)
gen T = x/sqrt(Chi_y/10)
hist T, bin(50) norm

多维正态分布

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
clear all
set more off
set seed 56789
set obs 10000
mat S = (0.931,1.680,1.140\1.680,3.048,2.164\1.140,2.164,2.361) // 相关系数矩阵
mat Q = cholesky(S) //cholesky分解
gen e1 = invnorm(uniform())
gen e2 = invnorm(uniform())
gen e3 = invnorm(uniform())
gen x1 = 1.3 + e1*scalar(Q[1,1])
gen x2 = 3.5 + e1*scalar(Q[2,1]) + e2*scalar(Q[2,2])
gen x3 = -3 + e1*scalar(Q[3,1]) + e2*scalar(Q[3,2]) + e3*scalar(Q[3,3])
corr x1 x2 x3, cov
corr x1 x2 x3
sum x1 - x3
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
clear all
set more off
set obs 10000
set seed 6789
mat R = J(3,3,.)
forval i = 1/3{
forval j = 1/3{
mat R[`i',`j'] = uniform() - 0.5
}
}
mat R = R*R'
mat Q = cholesky(R)
gen z1 = invnorm(uniform())
gen z2 = invnorm(uniform())
gen z3 = invnorm(uniform())
gen y1 = 1.3 + z1*scalar(Q[1,1])
gen y2 = 3.5 + z1*scalar(Q[2,1]) + z2*scalar(Q[2,2])
gen y3 = -3 + z1*scalar(Q[3,1]) + z2*scalar(Q[3,2]) + z3*scalar(Q[3,3])
corr y1 y2 y3, cov
corr y1 y2 y3
pwcorr y1 y2 y3
sum y1 - y3
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
clear all
set more off
set obs 10000
set seed 5678
mat R = (1,-0.6,0.2\-0.6,1,0.4\0.2,0.4,1)
mat Q = cholesky(R)
gen e1 = invnorm(uniform())
gen e2 = invnorm(uniform())
gen e3 = invnorm(uniform())
gen x1 = e1*scalar(Q[1,1])
gen x2 = e1*scalar(Q[2,1]) + e2*scalar(Q[2,2])
gen x3 = e1*scalar(Q[3,1]) + e2*scalar(Q[3,2]) + e3*scalar(Q[3,3])
corr x1 x2 x3, cov
pwcorr x1 x2 x3
sum x1-x3
hist x1, bin(50) norm
hist x2, bin(50) norm
hist x3, bin(50) norm

### 截断分布抽样方法
#### 截断抽样

```stata
clear all
set more off
set seed 167
set obs 10000
gen u = uniform()
local a = normal(-1)
local b = normal(5)
gen v = `a' + u*(`b'-`a')
gen x = invnorm(v)
hist x, bin(50) norm

用接受拒绝的方法构造截断分布样本

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
clear all
set more off
set seed 678
set obs 10000000
mat S = (0.931,1.680,1.140\1.680,3.048,2.164\1.140,2.164,2.361) //相关系数矩阵
mat Q = cholesky(S) //cholesky分解
gen e1 = invnorm(uniform())
gen e2 = invnorm(uniform())
gen e3 = invnorm(uniform())
gen x1 = 1.3 + e1*scalar(Q[1,1])
gen x2 = 3.5 + e1*scalar(Q[2,1]) + e2*scalar(Q[2,2])
gen x3 = -3 + e1*scalar(Q[3,1]) + e2*scalar(Q[3,2]) + e3*scalar(Q[3,3])
corr x1 x2 x3, cov
corr x1 x2 x3
sum x1 - x3
keep if x1 > -2 & x1 < 2
keep if x2 > -2 & x2 < 2
keep if x3 > -2 & x3 < 2
grss hist x1, bin(50) norm
grss hist x2, bin(50) norm
grss hist x3, bin(50) norm
# Stata

评论

程振兴

程振兴 @czxa.top
截止今天,我已经在本博客上写了659.4k个字了!

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×