Old Stata Notes, Tidied Up (Part 9)

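My old website had many notes that were never properly organized. I have tidied up some of them before, but more than two hundred remain, so this post simply collects them in one place to make them easier to search.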

dropstringvars: drop all string variables in a given variable list

net install dropstringvars.pkg, from("https://bitbucket.org/keithk/kk-adofiles/raw/3668170c07edef8ad9f18af25b3e2a39673c62ee/")

. sysuse auto, clear
(1978 Automobile Data)

. dropstringvars
Dropped variables:
make
1 variables were dropped.

. sysuse auto, clear
(1978 Automobile Data)

. dropstringvars mpg weight make
Dropped variables:
make
1 variables were dropped.

e(sample): what e-class returned results are good for

. use un, clear

. reg deaths duration troop

Source | SS df MS Number of obs = 42
-------------+---------------------------------- F(2, 39) = 33.15
Model | 103376.04 2 51688.0199 Prob > F = 0.0000
Residual | 60815.7935 39 1559.37932 R-squared = 0.6296
-------------+---------------------------------- Adj R-squared = 0.6106
Total | 164191.833 41 4004.67886 Root MSE = 39.489

------------------------------------------------------------------------------
deaths | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
duration | .137803 .0405977 3.39 0.002 .0556864 .2199197
troop | .0059166 .0007648 7.74 0.000 .0043697 .0074635
_cons | -6.556093 8.179624 -0.80 0.428 -23.10094 9.988759
------------------------------------------------------------------------------

. eret list

scalars:
e(N) = 42
e(df_m) = 2
e(df_r) = 39
e(F) = 33.1465341558571
e(r2) = .6296052472842496
e(rmse) = 39.48897720443585
e(mss) = 103376.0398278876
e(rss) = 60815.7935054457
e(r2_a) = .6106106445808778
e(ll) = -212.4320571229659
e(ll_0) = -233.2889619227027
e(rank) = 3

macros:
e(cmdline) : "regress deaths duration troop"
e(title) : "Linear regression"
e(marginsok) : "XB default"
e(vce) : "ols"
e(depvar) : "deaths"
e(cmd) : "regress"
e(properties) : "b V"
e(predict) : "regres_p"
e(model) : "ols"
e(estat_cmd) : "regress_estat"

matrices:
e(b) : 1 x 3
e(V) : 3 x 3

functions:
e(sample)

. local regressors: colnames e(b)

. di "Regressors: `regressors'"
Regressors: duration troop _cons

.
. * The next command computes summary statistics over the estimation sample only

. sum duration if e(sample)

Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
duration | 42 90.04762 152.8856 2 641
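Because e(sample) is a function that returns 1 for observations used in the last estimation and 0 otherwise, it can also be saved as an ordinary variable before another estimation command overwrites it. The lines below are a minimal sketch of that idea; they assume the auto dataset shipped with Stata rather than the un data above, and the variable name insample is made up for illustration.

```stata
* Sketch only: auto dataset and the name "insample" are illustrative assumptions
sysuse auto, clear

* rep78 has missing values, so some observations drop out of the regression
regress price mpg rep78

* Copy the estimation-sample marker into a variable (1 = used, 0 = not used)
generate byte insample = e(sample)
tabulate insample

* Equivalent to: summarize mpg if e(sample)
summarize mpg if insample
```

Saving the marker this way keeps the sample definition available after later regressions replace the contents of e().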

ellip: draw confidence ellipses

* Draw confidence ellipses
* ssc install ellip
sysuse auto, clear
ellip mpg weight, by(for, total leg(off)) ///
total tlabel(Total as a by-group) plot(scatter mpg weight)

eregress: a new command in Stata 15 for handling endogeneity

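Basic usage of eregress:

eregress depvar [indepvars], endogenous(depvars_en = varlist_en) [options]
depvar: dependent variable
indepvars: exogenous control variables
depvars_en: endogenous covariates
varlist_en: the instruments plus any other variables that affect the endogenous covariates

Following the econometric logic of IV, eregress builds the main equation and the auxiliary equation from the variables supplied and estimates the model by maximum likelihood (MLE).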

. * Example: the effect of high-school GPA on college GPA, controlling for family background
. clear all

. webuse class10, clear
(Class of 2010 profile)

. * Start with an OLS regression
. reg gpa hsgpa income

Source | SS df MS Number of obs = 1,528
-------------+---------------------------------- F(2, 1525) = 1528.62
Model | 411.968655 2 205.984327 Prob > F = 0.0000
Residual | 205.496943 1,525 .134752094 R-squared = 0.6672
-------------+---------------------------------- Adj R-squared = 0.6668
Total | 617.465598 1,527 .404365159 Root MSE = .36709

------------------------------------------------------------------------------
gpa | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hsgpa | 1.613996 .036998 43.62 0.000 1.541424 1.686568
income | .0444945 .0032092 13.86 0.000 .0381996 .0507895
_cons | -2.298322 .1078463 -21.31 0.000 -2.509865 -2.086779
------------------------------------------------------------------------------

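. * Unobservables such as IQ are omitted here
. /* The researchers argue that a high school's competitiveness affects students'
> grades, but that once high-school GPA is controlled for, its effect on
> college GPA is negligible, so the school's competitiveness ranking (hscomp)
> is used as the IV for high-school GPA.
> */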
. eregress gpa income, endogenous(hsgpa = income i.hscomp)

Iteration 0: log likelihood = -638.58598
Iteration 1: log likelihood = -638.58194
Iteration 2: log likelihood = -638.58194

Extended linear regression Number of obs = 1,528
Wald chi2(2) = 1167.79
Log likelihood = -638.58194 Prob > chi2 = 0.0000

-------------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
gpa |
income | .0575145 .0055174 10.42 0.000 .0467007 .0683284
hsgpa | 1.235868 .133686 9.24 0.000 .9738484 1.497888
_cons | -1.217141 .3828614 -3.18 0.001 -1.967535 -.4667464
--------------------+----------------------------------------------------------------
hsgpa |
income | .0356403 .0019553 18.23 0.000 .0318079 .0394726
|
hscomp |
moderate | -.1310549 .0136503 -9.60 0.000 -.1578091 -.1043008
high | -.2331173 .0232712 -10.02 0.000 -.278728 -.1875067
|
_cons | 2.951233 .0164548 179.35 0.000 2.918982 2.983483
--------------------+----------------------------------------------------------------
var(e.gpa)| .1436991 .0083339 .1282592 .1609977
var(e.hsgpa)| .0591597 .0021403 .05511 .063507
--------------------+----------------------------------------------------------------
corr(e.hsgpa,e.gpa)| .2642138 .0832669 3.17 0.002 .0948986 .4186724
-------------------------------------------------------------------------------------

.
. * eregress vs. ivreg2
. /* To compare with eregress, first generate dummy variables to use as instruments for hsgpa: */
. tab hscomp, gen(hscomp)

High school |
competitive |
ness |
category | Freq. Percent Cum.
------------+-----------------------------------
low | 750 30.00 30.00
moderate | 1,501 60.04 90.04
high | 249 9.96 100.00
------------+-----------------------------------
Total | 2,500 100.00

. ivreg2 gpa income (hsgpa = hscomp2 hscomp3), liml savefirst

Stored estimation results
-------------------------
----------------------------------------------------------------------------
name | command depvar npar title
-------------+--------------------------------------------------------------
_ivreg2_hs~a | ivreg2 hsgpa 4 First-stage regression: hsgpa
----------------------------------------------------------------------------

LIML estimation
---------------
k =1.00006
lambda =1.00006

Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

Number of obs = 1528
F( 2, 1525) = 582.75
Prob > F = 0.0000
Total (centered) SS = 617.4655979 Centered R2 = 0.6444
Total (uncentered) SS = 13946.5284 Uncentered R2 = 0.9843
Residual SS = 219.5722398 Root MSE = .3791

------------------------------------------------------------------------------
gpa | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hsgpa | 1.235868 .1336861 9.24 0.000 .9738484 1.497888
income | .0575145 .0055174 10.42 0.000 .0467007 .0683284
_cons | -1.217141 .3828614 -3.18 0.001 -1.967535 -.4667464
------------------------------------------------------------------------------
Underidentification test (Anderson canon. corr. LM statistic): 124.894
Chi-sq(2) P-val = 0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 67.827
Stock-Yogo weak ID test critical values: 10% maximal LIML size 8.68
15% maximal LIML size 5.33
20% maximal LIML size 4.42
25% maximal LIML size 3.92
Source: Stock-Yogo (2005). Reproduced by permission.
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments): 0.099
Chi-sq(1) P-val = 0.7532
------------------------------------------------------------------------------
Anderson-Rubin statistic (overidentification test of all instruments): 0.099
Chi-sq(1) P-val = 0.7532
------------------------------------------------------------------------------
Instrumented: hsgpa
Included instruments: income
Excluded instruments: hscomp2 hscomp3
------------------------------------------------------------------------------
.
end of do-file

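Note that eregress estimates by MLE, which is why the comparison above uses the liml option of ivreg2. Findings reported in the literature include:

  1. In large samples the LIML and 2SLS estimators are asymptotically equivalent; in smaller samples LIML has better finite-sample properties, because the two estimators weight the instruments differently in finite samples;
  2. When the instruments are weak, especially in finite samples, LIML is less biased than 2SLS and GMM.

The output shows that the two commands give the same estimates.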
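The same comparison can probably be made without the user-written ivreg2, since official Stata's ivregress also offers a LIML estimator. The sketch below assumes the class10 data used above and passes the competitiveness categories as a factor variable instead of hand-made dummies; it is an alternative route, not the one used in the original notes.

```stata
* Sketch only: official-Stata alternative to the ivreg2 call above
webuse class10, clear

* LIML with hsgpa instrumented by the high-school competitiveness categories
ivregress liml gpa income (hsgpa = i.hscomp)

* First-stage diagnostics for the instruments
estat firststage
```

Using i.hscomp avoids the tab, gen() step, because ivregress accepts factor variables in the instrument list.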

esttab: adding Yes/No indicator rows

clear
webuse nlswork, clear
xtset idcode year
tab year, gen(yd)
reg ln_w ttl_exp tenure not_smsa south, vce(cluster idcode)
est store m1
reg ln_w ttl_exp tenure not_smsa south yd*, vce(cluster idcode)
est store m2
xtreg ln_w age ttl_exp tenure not_smsa south yd*, fe vce(cluster idcode)
est store m3

esttab m1 m2 m3, star(* 0.1 ** 0.05 *** 0.01) b(%6.3f) t(%6.3f) compress nogap drop(yd*) stats(N r2_a, fmt(%12.0f %9.3f)) varwidth(20) title("Table1 Wage") mtitle("OLS" "OLS" "FE") nonum
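// compress tightens the column spacing; nogap removes the blank lines between rows
// year enters only as a control and its coefficients are not of interest, so drop() hides them from the output.
// The table still has some problems: (1) it does not show whether year is controlled for; (2) it does not show the clustering; (3) it does not show whether individual fixed effects are controlled for.
// The first problem can be solved with the indicate() option: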
esttab m1 m2 m3, star(* 0.1 ** 0.05 *** 0.01) b(%6.3f) t(%6.3f) compress nogap stats(N r2_a, fmt(%12.0f %9.3f)) varwidth(20) indicate("Year FE = yd*") title("Table1 Wage") mtitle("OLS" "OLS" "FE")
// The other two problems can be solved with the estadd command.
clear all
webuse nlswork, clear
xtset idcode year
tab year, gen(yd)
reg ln_w age ttl_exp tenure not_smsa south, vce(cluster idcode)
estadd local Cluster "Yes", replace
estadd local Fixed_Effect "No", replace
est store m1
reg ln_w age ttl_exp tenure not_smsa south yd*, vce(cluster idcode)
estadd local Cluster "Yes", replace
estadd local Fixed_Effect "No", replace
est store m2
xtreg ln_w age ttl_exp tenure not_smsa south yd*, fe vce(cluster idcode)
estadd local Cluster "Yes", replace
estadd local Fixed_Effect "Yes", replace
est store m3
esttab m1 m2 m3, star(* 0.1 ** 0.05 *** 0.01) b(%6.3f) t(%6.3f) compress nogap stats(Fixed_Effect Cluster N r2_a, fmt(%3s %3s %12.0f %9.3f)) varwidth(20) indicate("Year FE = yd*") title("Table1 Wage") mtitle("OLS" "OLS" "FE")
// estadd can also be used to move the Year FE row below the rule; the code changes as follows:
reg ln_w age ttl_exp tenure not_smsa south, vce(cluster idcode)
estadd local Cluster "Yes", replace
estadd local Year_FE "No", replace
estadd local Fixed_Effect "No", replace
est store m1
reg ln_w age ttl_exp tenure not_smsa south yd*, vce(cluster idcode)
estadd local Cluster "Yes", replace
estadd local Year_FE "Yes", replace
estadd local Fixed_Effect "No", replace
est store m2
xtreg ln_w age ttl_exp tenure not_smsa south yd*, fe vce(cluster idcode)
estadd local Cluster "Yes", replace
estadd local Year_FE "Yes", replace
estadd local Fixed_Effect "Yes", replace
est store m3
esttab m1 m2 m3, star(* 0.1 ** 0.05 *** 0.01) b(%6.3f) t(%6.3f) compress nogap drop(yd*) stats(Year_FE Fixed_Effect Cluster N r2_a, fmt(%3s %3s %3s %12.0f %9.3f)) varwidth(20) title("Table1 Wage") mtitle("OLS" "OLS" "FE")

esttab: exporting marginal effects

// Export marginal effects with esttab
clear all

// First, fit a logit model
webuse lbw
logit low age lwt i.race smoke ptl ht ui
// Then compute marginal effects with the margins command. margins estimates the margins of the specified covariates and displays them as a table. Syntax: margins [marginlist] [if] [in] [weight] [, response_options options]
//where marginlist is a list of factor variables or interactions that appear in the current estimation results. The variables may be typed with or without the i. prefix.

margins race

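// margins is powerful but also complex, with many options. dydx(varlist) estimates marginal effects; eyex(varlist) estimates elasticities; dyex(varlist) estimates semielasticities dy/d(ln x); eydx(varlist) estimates semielasticities d(ln y)/dx.
// When computing marginal effects, the at() option evaluates them at specified covariate values, for example: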
margins, dydx(smoke) at(age = 20)
// over(varlist) computes the marginal effects separately for each value of varlist:
margins, dydx(smoke) over(race)

// Export the results to Word: the estpost command
// Use margins, dydx(*) to compute the marginal effects of all variables in the logit model above, then export them:
estpost margins, dydx(*)
esttab using margins.rtf, cell("b(star fmt(3)) t") pr2 replace compress nogap star(* 0.10 ** 0.05 *** 0.01) title("Marginal Effect")

// In practice we may want several sets of marginal effects in one table. For that, combine estpost with the eststo command.
clear
set more off
webuse lbw
logit low age lwt i.race smoke ptl ht ui
eststo: estpost margins, dydx(race)
logit low age lwt i.race smoke ptl ht ui
eststo: estpost margins, dydx(race age)
logit low age lwt i.race smoke ptl ht ui
eststo: estpost margins, dydx(*)
esttab est1 est2 est3 using margins1.rtf, cell("b(star fmt(3))") pr2 replace compress nogap star(* 0.10 ** 0.05 *** 0.01) title("Marginal Effect")

esttab: exporting a correlation table to an RTF document

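// Export correlation coefficients to an RTF document with esttab
// Build the correlation table with esttab and estpost
// In principle esttab can export anything stored in e(). For commands such as correlate that do not store their results there, estpost runs the command and posts the results to e().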
clear all
set more off
sysuse auto, clear
estpost correlate price weight length, matrix
esttab .,not unstack compress noobs replace
esttab using corr.rtf,not unstack compress noobs replace
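/*
matrix:   report the pairwise correlations as a matrix
not:      suppress the t statistics
unstack:  arrange the output as the familiar lower-triangular correlation table
compress: reduce the column spacing
noobs:    suppress the number of observations
.:        use the currently active estimation results
*/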
// estpost can also be combined with eststo so that esttab exports results by group, for example:
sysuse auto,clear
cap eststo clear
bysort foreign: eststo: estpost correlate price weight length, matrix
esttab est1 est2, not unstack compress noobs

// Export the correlation table to an RTF document:
esttab est1 est2 using corr.rtf, unstack not noobs compress replace star(* 0.05 ** 0.01) b(%8.3f)

estwrite: save estimation results and change the output format at will

// estwrite: save estimation results and change the output format at will
clear all

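// estwrite saves estimation results to a file with the .sters extension. estread loads them back into Stata, after which the output format can be changed freely. Below, two regressions are run and stored with estwrite: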
webuse nlswork, clear
xtset idcode year
qui tab year, gen(yd)
* Regression 1: OLS with year dummies, standard errors clustered by idcode:
reg ln_w age ttl_exp tenure not_smsa south yd*, vce(cluster idcode)
estadd local Cluster "Yes", replace
estadd local Year_FE "Yes", replace
estadd local Fixed_Effect "No", replace
est store m1

* Regression 2: panel fixed-effects model with year dummies, standard errors clustered by idcode
xtreg ln_w age ttl_exp tenure not_smsa south yd*, fe vce(cluster idcode)
estadd local Cluster "Yes", replace
estadd local Year_FE "Yes", replace
estadd local Fixed_Effect "Yes", replace
est store m2

estwrite * using mymodels // save as a .sters file

// Suppose the layout has to change when the paper is typeset: estread loads mymodels.sters, and esttab or estout then produces the output:
estread mymodels
* Output format 1:
esttab m1 m2, star(* 0.1 ** 0.05 *** 0.01) compress nogap indicate("Year=yd*") ar2(%9.3f) title("Table1 Wage") mtitle("OLS" "FE")
* Output format 2:
esttab m1 m2, star(* 0.1 ** 0.05 *** 0.01) staraux b(%6.3f) t(%6.3f) compress nogap drop(yd*) stats(Year_FE Fixed_Effect Cluster N r2_a, fmt(%3s %3s %3s %12.0f %9.3f)) varwidth(20) title("Table1 Wage") mtitle("OLS" "FE")

filelist

  • It can fail if there are too many files
  • To find all files in the current directory and its subdirectories
filelist
  • If there is a “main” directory within the current directory, you can search for all Stata datasets in “main” using
filelist, dir("main") pat("*.dta")
  • To search for all comma-separated data files in the “main” directory within the current directory and save the results to disk
filelist, dir("main") pat("*.csv") save("csv_datasets.dta")
  • You can run the following code if you want to use the saved search results to append all csv data files
use "csv_datasets.dta", clear
local obs = _N
forvalues i=1/`obs' {
use "csv_datasets.dta" in `i', clear
local f = dirname + "/" + filename
insheet using "`f'", clear
gen source = "`f'"
tempfile save`i'
save "`save`i''"
}
use "`save1'", clear
forvalues i=2/`obs' {
append using "`save`i''"
}

findsysmis: check which variables contain missing values

. sysuse auto, clear
(1978 Automobile Data)

. findname
make rep78 weight displacement
price headroom length gear_ratio
mpg trunk turn foreign

. findsysmis `=r(varlist)'

Variables with sysmis (1 of 11 numeric variables)
------------------------------------------------------
rep78

. * strinclude: also check string variables

. findname
make rep78 weight displacement
price headroom length gear_ratio
mpg trunk turn foreign

. findsysmis `=r(varlist)', strinclude

Variables with sysmis
----------------------
Numeric variables (1 of 11):
rep78


String variables (0 of 1):

. findname
make rep78 weight displacement
price headroom length gear_ratio
mpg trunk turn foreign

. findsysmis `=r(varlist)', strinclude list

Variables with sysmis
-----------------------
rep78

1 of 11 numeric variables contain system missing values
-----------------------------------------------------------

0 of 1 string variables contain system missing values

findval: search for a value

. sysuse auto, clear
(1978 Automobile Data)

. findval 100
The value of 100 is never found.

. findval 5
The value of 5 is found in variables: rep78 headroom trunk

. return list

scalars:
r(value) = 5

macros:
r(vars) : "rep78 headroom trunk"

.
. * generate creates a 0/1 variable set to one if the match is found in the ob
> servation, and missing if the if or in conditions are not satisfied.

.
. findval 5, gen(match5)
The value of 5 is found in variables: rep78 headroom trunk

. list match5 in 1/10

+--------+
| match5 |
|--------|
1. | 0 |
2. | 0 |
3. | 0 |
4. | 0 |
5. | 0 |
|--------|
6. | 0 |
7. | 0 |
8. | 0 |
9. | 0 |
10. | 0 |
+--------+

.
. findval 5, gen(match5f), if foreign
The value of 5 is found in variables: rep78 trunk

. li make `r(vars)' if match5f & match5f < .

+--------------------------------+
| make rep78 trunk |
|--------------------------------|
53. | Audi 5000 5 15 |
57. | Datsun 210 5 8 |
61. | Honda Accord 5 10 |
62. | Honda Civic 4 5 |
66. | Subaru 5 11 |
|--------------------------------|
67. | Toyota Celica 5 14 |
68. | Toyota Corolla 5 9 |
69. | Toyota Corona 5 11 |
71. | VW Diesel 5 15 |
74. | Volvo 260 5 14 |
+--------------------------------+

The format command

clear
set obs 1
gen v = "程振兴"
gen v1 = v
format %20s v
format v1 %20s
format v1 %-20s

clear
input v1
0.0023562
0.0345645
0.2345676
3.1234565
12.234566
123.12344
3456.4566
34566.456
345674.42
1234567.4
end
gen v2 = v1
gen v3 = v1
gen v4 = v1
gen v5 = -v1
format v2 %4.2f // total width 4, two decimal places
format v3 %4.2f
format v4 %8.3g // total width 8, three significant digits
format v5 %8.0g

gather & spread: reshaping data between wide and long

* Install
net install tidy, from(https://github.com/matthieugomez/tidy.ado/raw/master/)
. sysuse educ99gdp.dta, clear
(Education and GDP)

. list, sep(0)

+----------------------------------+
| country public private |
|----------------------------------|
1. | Australia .7 .7 |
2. | Britain .7 .4 |
3. | Canada 1.5 .9 |
4. | Denmark 1.5 .1 |
5. | France .9 .4 |
6. | Germany .9 .2 |
7. | Ireland 1.1 .3 |
8. | Netherlands 1 .4 |
9. | Sweden 1.5 .2 |
10. | United States 1.1 1.2 |
+----------------------------------+
. gather public private
country
. list, sep(0)

+----------------------------------+
| country variable value |
|----------------------------------|
1. | Australia public .7 |
2. | Australia private .7 |
3. | Britain public .7 |
4. | Britain private .4 |
5. | Canada public 1.5 |
6. | Canada private .9 |
7. | Denmark public 1.5 |
8. | Denmark private .1 |
9. | France public .9 |
10. | France private .4 |
11. | Germany public .9 |
12. | Germany private .2 |
13. | Ireland public 1.1 |
14. | Ireland private .3 |
15. | Netherlands public 1 |
16. | Netherlands private .4 |
17. | Sweden public 1.5 |
18. | Sweden private .2 |
19. | United States public 1.1 |
20. | United States private 1.2 |
+----------------------------------+

. spread variable value

. list, sep(0)

+----------------------------------+
| country private public |
|----------------------------------|
1. | Australia .7 .7 |
2. | Britain .4 .7 |
3. | Canada .9 1.5 |
4. | Denmark .1 1.5 |
5. | France .4 .9 |
6. | Germany .2 .9 |
7. | Ireland .3 1.1 |
8. | Netherlands .4 1 |
9. | Sweden .2 1.5 |
10. | United States 1.2 1.1 |
+----------------------------------+
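For comparison, the same round trip can be sketched with the built-in reshape command. reshape needs a common stub in the variable names, so the sketch first renames public and private; the stub value_ is an illustrative choice, not something gather or spread uses internally.

```stata
* Sketch only: built-in reshape as an alternative to gather/spread
sysuse educ99gdp, clear

* reshape long needs a shared stub, so add one to both variables
rename (public private) (value_public value_private)

* Wide -> long: one row per country and sector (like gather public private)
reshape long value_, i(country) j(variable) string
list, sep(0)

* Long -> wide again (like spread variable value)
reshape wide value_, i(country) j(variable) string
rename (value_public value_private) (public private)
list, sep(0)
```

The extra rename steps are the price of reshape's stub convention; gather and spread avoid them by taking the variable list directly.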

getfilename2: return a file's name, extension, and path

* Install: ssc install getfilename2
. getfilename2 "`=c(sysdir_plus)'n/nslwork.dta"

. ret list

macros:
r(filename) : "nslwork.dta"
r(root) : "nslwork"
r(ext) : "dta"
r(path) : "~/Library/Application Support/Stata/ado/plus/n"

graph3d

clear
set obs 600
gen x = int((_n - mod(_n-1,30)-1)/30)
gen z = mod(_n-1,30)
gen y = normalden(x,10,3)*normalden(z,15,5)*10000
graph3d x y z

// Connect the points with lines
graph3d x y z, wire

graph3d x y z, colorscheme(cr) xang(80)

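  • colorscheme: color palette: cr (cyan and red), bcgyr (blue, cyan, green, yellow, red), fade (fading black)
  • xang: angle by which the x axis is rotated about the pivot; the default is 45 degrees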
graph3d x y z, xang(80) cuboid innergrid coord(all) xlabel(x) xlangle(330) xlpos(9) yl(y) ylangle(90) ylpos(3) zlabel(z) zlangle(33) zlpos(11) colorscheme(bcgyr)

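  • cuboid: add a cuboid frame
  • innergrid: add extra grid lines on the cuboid frame
  • coord(all): label the coordinates of all vertices of the cuboid
  • xlabel: add an axis label
  • xlangle: angle of the label
  • xlpos: position of the label, from 0 to 12; 1 to 12 are clock positions, and the default, 0, centers the label on the cuboid vertex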
graph3d x y z, markeroptions(msymbol() mfcolor(green) mlcolor(orange) msize(small))

  • msymbol: marker shape, e.g. D (diamond), S (square)
  • mfcolor: marker fill color
  • mlcolor: marker outline color
  • msize: marker size

grcomb: combine graphs

* Combine graphs
* ssc install grcomb

webuse nhanes2f, clear
grcomb graph box copper zinc iron, v(1)

groups: list group frequencies and percentages

. * List group frequencies and percentages

. sysuse auto, clear
(1978 Automobile Data)

.
. groups foreign

+-------------------------------------+
| foreign Freq. Percent %<= |
|-------------------------------------|
| Domestic 52 70.27 70.27 |
| Foreign 22 29.73 100.00 |
+-------------------------------------+

. tabulate foreign

Car type | Freq. Percent Cum.
------------+-----------------------------------
Domestic | 52 70.27 70.27
Foreign | 22 29.73 100.00
------------+-----------------------------------
Total | 74 100.00

. groups foreign rep78

+------------------------------------+
| foreign rep78 Freq. Percent |
|------------------------------------|
| Domestic 1 2 2.90 |
| Domestic 2 8 11.59 |
| Domestic 3 27 39.13 |
| Domestic 4 9 13.04 |
| Domestic 5 2 2.90 |
|------------------------------------|
| Foreign 3 3 4.35 |
| Foreign 4 9 13.04 |
| Foreign 5 9 13.04 |
+------------------------------------+

. tabulate foreign rep78

| Repair Record 1978
Car type | 1 2 3 4 5 | Total
-----------+-------------------------------------------------------+----------
Domestic | 2 8 27 9 2 | 48
Foreign | 0 0 3 9 9 | 21
-----------+-------------------------------------------------------+----------
Total | 2 8 30 18 11 | 69


.
. groups foreign rep78, fillin

+------------------------------------+
| foreign rep78 Freq. Percent |
|------------------------------------|
| Domestic 1 2 2.90 |
| Domestic 2 8 11.59 |
| Domestic 3 27 39.13 |
| Domestic 4 9 13.04 |
| Domestic 5 2 2.90 |
|------------------------------------|
| Foreign 1 0 0.00 |
| Foreign 2 0 0.00 |
| Foreign 3 3 4.35 |
| Foreign 4 9 13.04 |
| Foreign 5 9 13.04 |
+------------------------------------+

.
. bysort foreign: groups rep78

----------------------------------------------------------------------------------------
-> foreign = Domestic

+----------------------------------+
| rep78 Freq. Percent %<= |
|----------------------------------|
| 1 2 4.17 4.17 |
| 2 8 16.67 20.83 |
| 3 27 56.25 77.08 |
| 4 9 18.75 95.83 |
| 5 2 4.17 100.00 |
+----------------------------------+

----------------------------------------------------------------------------------------
-> foreign = Foreign

+----------------------------------+
| rep78 Freq. Percent %<= |
|----------------------------------|
| 3 3 14.29 14.29 |
| 4 9 42.86 57.14 |
| 5 9 42.86 100.00 |
+----------------------------------+

. groups foreign rep78, percentvar(foreign)

+------------------------------------+
| foreign rep78 Freq. Percent |
|------------------------------------|
| Domestic 1 2 4.17 |
| Domestic 2 8 16.67 |
| Domestic 3 27 56.25 |
| Domestic 4 9 18.75 |
| Domestic 5 2 4.17 |
|------------------------------------|
| Foreign 3 3 14.29 |
| Foreign 4 9 42.86 |
| Foreign 5 9 42.86 |
+------------------------------------+

. groups foreign rep78, percentvar(foreign) show(f p P)

+---------------------------------------------+
| foreign rep78 Freq. Percent %<= |
|---------------------------------------------|
| Domestic 1 2 4.17 4.17 |
| Domestic 2 8 16.67 20.83 |
| Domestic 3 27 56.25 77.08 |
| Domestic 4 9 18.75 95.83 |
| Domestic 5 2 4.17 100.00 |
|---------------------------------------------|
| Foreign 3 3 14.29 14.29 |
| Foreign 4 9 42.86 57.14 |
| Foreign 5 9 42.86 100.00 |
+---------------------------------------------+

.
. groups mpg, select(f == 1) show(none)

+-----+
| mpg |
|-----|
| 29 |
| 31 |
| 34 |
| 41 |
+-----+

. groups mpg, select(5)

+-------------------------------+
| mpg Freq. Percent %<= |
|-------------------------------|
| 12 2 2.70 2.70 |
| 14 6 8.11 10.81 |
| 15 2 2.70 13.51 |
| 16 4 5.41 18.92 |
| 17 4 5.41 24.32 |
+-------------------------------+

. groups mpg, select(-5)

+--------------------------------+
| mpg Freq. Percent %<= |
|--------------------------------|
| 30 2 2.70 93.24 |
| 31 1 1.35 94.59 |
| 34 1 1.35 95.95 |
| 35 2 2.70 98.65 |
| 41 1 1.35 100.00 |
+--------------------------------+

. groups mpg, select(5) order(h)

+-------------------------------+
| mpg Freq. Percent %<= |
|-------------------------------|
| 18 9 12.16 12.16 |
| 19 8 10.81 22.97 |
| 14 6 8.11 31.08 |
| 21 5 6.76 37.84 |
| 22 5 6.76 44.59 |
+-------------------------------+

.
. groups foreign rep78, fillin select(f == 0) show(none)

+-----------------+
| foreign rep78 |
|-----------------|
| Foreign 1 |
| Foreign 2 |
+-----------------+

.
. groups foreign rep78, sepby(foreign)

+------------------------------------+
| foreign rep78 Freq. Percent |
|------------------------------------|
| Domestic 1 2 2.90 |
| Domestic 2 8 11.59 |
| Domestic 3 27 39.13 |
| Domestic 4 9 13.04 |
| Domestic 5 2 2.90 |
|------------------------------------|
| Foreign 3 3 4.35 |
| Foreign 4 9 13.04 |
| Foreign 5 9 13.04 |
+------------------------------------+

. groups foreign rep78, sepby(foreign) showhead(# %)

+-------------------------------+
| foreign rep78 # % |
|-------------------------------|
| Domestic 1 2 2.90 |
| Domestic 2 8 11.59 |
| Domestic 3 27 39.13 |
| Domestic 4 9 13.04 |
| Domestic 5 2 2.90 |
|-------------------------------|
| Foreign 3 3 4.35 |
| Foreign 4 9 13.04 |
| Foreign 5 9 13.04 |
+-------------------------------+

.
. groups rep78, missing show(freq percent vpercent) separator(0)

+-----------------------------------+
| rep78 Freq. Percent % Valid |
|-----------------------------------|
| 1 2 2.70 2.90 |
| 2 8 10.81 11.59 |
| 3 30 40.54 43.48 |
| 4 18 24.32 26.09 |
| 5 11 14.86 15.94 |
| . 5 6.76 . |
+-----------------------------------+

. groups rep78, show(freq rfreq RPercent) ge

+------------------------------+
| rep78 Freq. #>= %>= |
|------------------------------|
| 1 2 69 100.00 |
| 2 8 67 97.10 |
| 3 30 59 85.51 |
| 4 18 29 42.03 |
| 5 11 11 15.94 |
+------------------------------+

. groups rep78, show(F f Rf) lt showhead(< = >)

+----------------------+
| rep78 < = > |
|----------------------|
| 1 0 2 67 |
| 2 2 8 59 |
| 3 10 30 29 |
| 4 40 18 11 |
| 5 58 11 0 |
+----------------------+

.
. groups mpg, reverse

+--------------------------------+
| mpg Freq. Percent %<= |
|--------------------------------|
| 41 1 1.35 100.00 |
| 35 2 2.70 98.65 |
| 34 1 1.35 95.95 |
| 31 1 1.35 94.59 |
| 30 2 2.70 93.24 |
|--------------------------------|
| 29 1 1.35 90.54 |
| 28 3 4.05 89.19 |
| 26 3 4.05 85.14 |
| 25 5 6.76 81.08 |
| 24 4 5.41 74.32 |
|--------------------------------|
| 23 3 4.05 68.92 |
| 22 5 6.76 64.86 |
| 21 5 6.76 58.11 |
| 20 3 4.05 51.35 |
| 19 8 10.81 47.30 |
|--------------------------------|
| 18 9 12.16 36.49 |
| 17 4 5.41 24.32 |
| 16 4 5.41 18.92 |
| 15 2 2.70 13.51 |
| 14 6 8.11 10.81 |
|--------------------------------|
| 12 2 2.70 2.70 |
+--------------------------------+

. groups mpg, reverse show(f p RP) ge

+--------------------------------+
| mpg Freq. Percent %>= |
|--------------------------------|
| 41 1 1.35 1.35 |
| 35 2 2.70 4.05 |
| 34 1 1.35 5.41 |
| 31 1 1.35 6.76 |
| 30 2 2.70 9.46 |
|--------------------------------|
| 29 1 1.35 10.81 |
| 28 3 4.05 14.86 |
| 26 3 4.05 18.92 |
| 25 5 6.76 25.68 |
| 24 4 5.41 31.08 |
|--------------------------------|
| 23 3 4.05 35.14 |
| 22 5 6.76 41.89 |
| 21 5 6.76 48.65 |
| 20 3 4.05 52.70 |
| 19 8 10.81 63.51 |
|--------------------------------|
| 18 9 12.16 75.68 |
| 17 4 5.41 81.08 |
| 16 4 5.41 86.49 |
| 15 2 2.70 89.19 |
| 14 6 8.11 97.30 |
|--------------------------------|
| 12 2 2.70 100.00 |
+--------------------------------+

.
. webuse nlswork, clear

. groups collgrad not_smsa c_city south, order(high) separator(0)

+--------------------------------------------------------+
| collgrad not_smsa c_city south Freq. Percent |
|--------------------------------------------------------|
| 0 0 0 0 5742 20.13 |
| 0 0 1 0 4941 17.32 |
| 0 1 0 1 3982 13.96 |
| 0 0 1 1 3455 12.11 |
| 0 1 0 0 3086 10.82 |
| 0 0 0 1 2527 8.86 |
| 1 0 0 0 1412 4.95 |
| 1 0 1 0 1096 3.84 |
| 1 0 1 1 698 2.45 |
| 1 0 0 1 598 2.10 |
| 1 1 0 0 566 1.98 |
| 1 1 0 1 423 1.48 |
+--------------------------------------------------------+

. groups collgrad not_smsa c_city south, order(high) separator(0) colorder(5 6)

+--------------------------------------------------------+
| Freq. Percent collgrad not_smsa c_city south |
|--------------------------------------------------------|
| 5742 20.13 0 0 0 0 |
| 4941 17.32 0 0 1 0 |
| 3982 13.96 0 1 0 1 |
| 3455 12.11 0 0 1 1 |
| 3086 10.82 0 1 0 0 |
| 2527 8.86 0 0 0 1 |
| 1412 4.95 1 0 0 0 |
| 1096 3.84 1 0 1 0 |
| 698 2.45 1 0 1 1 |
| 598 2.10 1 0 0 1 |
| 566 1.98 1 1 0 0 |
| 423 1.48 1 1 0 1 |
+--------------------------------------------------------+

insheetjson: read JSON files

clear all
* Inspect the JSON response
insheetjson using "http://finance.stockstar.com/finance/industrialdata/ajax/GetData.ashx?tablename=V_BS_IND_SPRIDXB70LMSC_M&filters=UNICNATRG-int=120000&pricecolumns=GRYOYSPRIDXRBNC,GRAPSPRIDXRBNC,GRYOYSPINCCRB,GRAPSPINCCRB,GRYOYSPRIDXSHRB,GRAPSPRIDXSHRB&orderby=STCYR,STCM", showresponse
* 加入flatten选项可以以键值对的形式查看数据
insheetjson using "http://finance.stockstar.com/finance/industrialdata/ajax/GetData.ashx?tablename=V_BS_IND_SPRIDXB70LMSC_M&filters=UNICNATRG-int=120000&pricecolumns=GRYOYSPRIDXRBNC,GRAPSPRIDXRBNC,GRYOYSPINCCRB,GRAPSPINCCRB,GRYOYSPRIDXSHRB,GRAPSPRIDXSHRB&orderby=STCYR,STCM", showresponse flatten
* 转换成数据集
gen str80 EDATE = ""
gen str80 PriceList = ""
insheetjson EDATE PriceList using "http://finance.stockstar.com/finance/industrialdata/ajax/GetData.ashx?tablename=V_BS_IND_SPRIDXB70LMSC_M&filters=UNICNATRG-int=120000&pricecolumns=GRYOYSPRIDXRBNC,GRAPSPRIDXRBNC,GRYOYSPINCCRB,GRAPSPINCCRB,GRYOYSPRIDXSHRB,GRAPSPRIDXSHRB&orderby=STCYR,STCM", table(Prices) col(EDATE PriceList)
"http://finance.stockstar.com/finance/industrialdata/ajax/GetData.ashx?tablename=V_BS_IND_SPRIDXB70LMSC_M&filters=UNICNATRG-int=110000&pricecolumns=GRYOYSPRIDXRBNC,GRAPSPRIDXRBNC,GRYOYSPINCCRB,GRAPSPINCCRB,GRYOYSPRIDXSHRB,GRAPSPRIDXSHRB&orderby=STCYR,STCM"
compress
gen clock = clock(EDATE, "YMDhms")
drop EDATE
order clock
format clock %tcCCYY-NN-DD
split PriceList, parse(,)
drop PriceList
ren PriceList1 新建住宅价格指数同比
ren PriceList2 新建住宅价格指数环比
ren PriceList3 新新建商品住宅价格指数同比
ren PriceList4 新建商品住宅价格指数环比
ren PriceList5 二手住宅价格指数同比
ren PriceList6 二手住宅价格指数环比
foreach i of varlist _all{
label var `i' "`i'"
}
label data "70个大中城市住宅销售价格指数_北京"
gen city = "北京"
order clock city
save 70个大中城市住宅销售价格指数_北京, replace

isid: dealing with duplicate values

. clear all

. sysuse auto, clear
(1978 Automobile Data)

. * Check whether mpg uniquely identifies observations

. isid mpg
variable mpg does not uniquely identify the observations
r(459);

. * Check whether make uniquely identifies observations

. isid make

. replace make = "" in 1
(1 real change made)

. isid make
variable make should never be missing
r(459);

. * missok indicates that missing values are permitted in varlist.

. isid make, missok

.
. webuse grunfeld, clear

. * Check whether panel and time variables uniquely identify observations

. isid company year

Jackknife_estimation

. * Jackknife estimation

. * Setup

. sysuse auto
(1978 Automobile Data)

. * Jackknifed standard error of the sample mean

. jackknife r(mean): summarize mpg
(running summarize on estimation sample)

Jackknife replications (74)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
........................

Jackknife results Number of obs = 74
Replications = 74

command: summarize mpg
_jk_1: r(mean)
n(): r(N)

------------------------------------------------------------------------------
| Jackknife
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_jk_1 | 21.2973 .6725542 31.67 0.000 19.9569 22.6377
------------------------------------------------------------------------------

. * Jackknifed standard errors of the coefficients from a regression

. jackknife: regress mpg weight trunk
(running regress on estimation sample)

Jackknife replications (74)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
........................

Linear regression Number of obs = 74
Replications = 74
F( 2, 73) = 78.10
Prob > F = 0.0000
R-squared = 0.6543
Adj R-squared = 0.6446
Root MSE = 3.4492

------------------------------------------------------------------------------
| Jackknife
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | -.0056527 .0010216 -5.53 0.000 -.0076887 -.0036167
trunk | -.096229 .1486236 -0.65 0.519 -.3924354 .1999773
_cons | 39.68913 1.873324 21.19 0.000 35.9556 43.42266
------------------------------------------------------------------------------
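
For estimation commands that accept the `vce()` option, the prefix form above can usually also be written as an option. A short sketch, assuming the auto dataset is already loaded as in the setup step:

```stata
* Same regression, requesting jackknife standard errors via vce()
regress mpg weight trunk, vce(jackknife)
```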

keyplot

  • keyplot produces a standard scatter plot with between one and ten y variables and one x variable, except that it attempts a more flexible approach to placing symbol keys for the y variables, either in the top title space or elsewhere.
sysuse auto, clear
separate mpg, by(rep78)
keyplot mpg1-mpg5 weight, varlbl l2(Miles per gallon)

keyplot mpg1-mpg5 weight, k(1 2 3 4 5) l2(Miles per gallon)
keyplot mpg1-mpg5 weight, k(Very poor\Poor\Fair\Good\Very good) sep(\) l2(Miles per gallon)

keyplot mpg1-mpg5 weight, k(1 2 3 4 5) l2(Miles per gallon) row(2000(2000)10000) col(26000)

keyplot mpg1-mpg5 weight, k(1 2 3 4 5) l2(Miles per gallon) row(2000(2000)10000) col(26000) t1(Report record) t1pos(5000 25500)


程振兴

程振兴 @czxa.top