Stata Old Notes Roundup (Part 11)

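The old site held many notes that were never properly organized. I tidied up some of them earlier, but more than two hundred remain, so this post simply gathers them in one place to make them easier to search.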

plotbeta: visualizing regression results

* Visualize regression results
sysuse auto, clear
reg price mpg turn length
plotbeta mpg | turn | length

plotbeta mpg | turn | length, labels

plotbeta mpg | turn | length, addplot((scatteri 2.5 2.5, ms(S) msize(*2))) labels

plotbeta mpg | turn | length | mpg-length , xtitle(Parameters) yscale(range(0.7 4.3) axis(1)) yscale(range(0.7 4.3) axis(2)) title(Coefficients and Confidence Intervals) subtitle(from a Simple Linear Regression) xline(0, lp(dash))

predictnl: nonlinear predictions, standard errors, and more after estimation

Fitted values and a 95% confidence interval

sysuse auto, clear
reg price mpg
* ci() takes the lower bound first and the upper bound second
predictnl pprice = predict(xb), ci(min95 max95) level(95)
tw rarea min95 max95 mpg, sort color(gs12) || line pprice mpg, legend(off) ||, yla(, ang(0)) xti("Mileage (mpg)") xla(10(5)40, alt)

Fitted probabilities and standard errors in a probit model

webuse lbw, clear
probit low lwt smoke ptl ht
predictnl phat = normal(_b[_cons] + _b[ht]*ht + _b[ptl]*ptl + _b[smoke]*smoke + _b[lwt]*lwt), se(phat_se)
list low phat phat_se in 1/5

     +---------------------------+
     | low       phat    phat_se |
     |---------------------------|
  1. |   0   .0954207   .0423407 |
  2. |   0   .1465948   .0402328 |
  3. |   0   .4103567   .0714723 |
  4. |   0   .3992932   .0693101 |
  5. |   0   .4029725   .0700096 |
     +---------------------------+
* Next, build an approximate 95% confidence interval as well
gen pmax95 = phat + phat_se*1.96
gen pmin95 = phat - phat_se*1.96
tw rarea pmax95 pmin95 lwt, sort fcolor(green*0.8) || line phat lwt, sort
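
As a quick sanity check on the hand-written formula, the same probabilities should come out of the built-in predict with the pr option. The sketch below uses only official commands and the shipped lbw data; the variable names p_nl, p_nl_se and p_check are made up for illustration.

```stata
* Sketch: compare the predictnl formula with the built-in probit prediction
webuse lbw, clear
probit low lwt smoke ptl ht

* Hand-written index function, as in the notes
predictnl p_nl = normal(_b[_cons] + _b[ht]*ht + _b[ptl]*ptl + _b[smoke]*smoke + _b[lwt]*lwt), se(p_nl_se)

* Built-in predicted probability for comparison
predict p_check, pr

* The two should agree up to floating-point rounding
assert reldif(p_nl, p_check) < 1e-6
```

If the assert passes silently, the formula and the built-in prediction match; predictnl is then only needed for the standard error.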

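putexcel: from Stata to Excel

Basic syntax of putexcel: putexcel cellexportlist [, options]
cellexportlist has two parts:
1: cell: the cell (or range) in the Excel sheet to write to
2: exportlist: the content to export to Excel
options:
1: modify: modify the contents of an existing Excel file
2: replace: overwrite the existing Excel file
3: sheet("sheet name" [, replace]): write to the named worksheet
4: colwise: write returned results in consecutive columns; the default is consecutive rows
5: keepcellformat: keep the worksheet's existing cell formatting when writing data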

sysuse auto, clear
reg price mpg rep78 headroom weight length
return list
mat a = r(table)
mat a = a'
mat list a
putexcel set mytable.xls, replace
putexcel D5 = matrix(a), names nfor(number_d2) font(Arial,14,black)
putexcel D4 = "Regression output table", font(Arial,18,blue)
putexcel D4:M4, merge bold hcenter vcenter border(bottom)
putexcel D12:M12,border(top)
shellout mytable.xls

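putexcel: exporting a matrix to Excel

Basic syntax: putexcel set filename [, set_options]
set_options include: modify, replace, sheet(sheetname [, replace])
Writing content to a given cell or range:
putexcel ul_cell = exp [, export_options format_options]
Writes the given expression to the cell; ul means upper-left, that is, the cell where writing starts
putexcel ul_cell = matrix(name) [, export_options format_options]
Writes a matrix starting at the given cell
putexcel ul_cell = picture(filename)
Inserts a picture at the given cell
putexcel ul_cell = returnset [, export_options]
Writes a command's returned results starting at the given cell
putexcel ul_cell = formula(formula) [, export_options]
Writes an Excel formula to the given cell
putexcel cellrange, format_options
Changes the formatting of the given cell range
putexcel describe
Describes the current Excel export settings
putexcel clear
Ends writing to the current Excel file. Without it, later putexcel commands keep writing to or modifying the workbook that was set earlier.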
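
A minimal sketch that walks through several of these forms in order, assuming a Stata version that supports the putexcel set workflow and the format options used elsewhere in these notes. The file name demo_putexcel.xlsx and the matrix M are made up for illustration.

```stata
* Sketch: set -> write -> format -> describe -> clear
matrix M = (1, 2 \ 3, 4)
matrix rownames M = r1 r2
matrix colnames M = c1 c2

putexcel set demo_putexcel.xlsx, replace   // choose the target workbook
putexcel A1 = "A small demo table"         // ul_cell = exp
putexcel A2 = matrix(M), names             // ul_cell = matrix(name), keeping row and column names
putexcel A1:C1, merge hcenter bold         // cellrange, format_options
putexcel describe                          // show the current export settings
putexcel clear                             // stop targeting this workbook
```

Ending with putexcel clear keeps later putexcel calls in the same session from silently writing into this file.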

clear
sysuse auto, clear
spearman price-gear
return list

putexcel set putexcel.xls, replace
putexcel D5 = matrix(r(Rho)),names nfor(number_d2)
// the names option keeps the matrix row and column names; nfor(number_d2) displays two decimal places
shellout putexcel.xls

import excel using putexcel.xls, describe
putexcel D4 = "Spearman correlation coefficients", font(Arial,15,black)
putexcel D4:N4,merge bold hcenter vcenter border(bottom)
putexcel D16:N16,border(top)
putexcel E5:E16, hcenter vcenter
putexcel clear
shellout putexcel.xls

The putexcel command

// putexcel syntax: putexcel ul_cell = returnset [, export_options]
// ul_cell sets where the exported table starts in the Excel sheet;
// returnset selects which kind of returned results (values or names) to export
clear all
global PATH "D:\Desktop\Stata笔记"
cd $PATH

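// Using the auto data as an example, suppose we want summary statistics of mpg for each category of rep78,
// and want to export the returned values to an Excel sheet with putexcel.
// 1. rep78 = 1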
sysuse auto, clear
putexcel set sum, replace
sum mpg if rep78 == 1, d
return list

// Next, export the scalars returned by the command to Excel,
// using the colwise option so that the scalar values and names run horizontally.
putexcel D3 = rscalars D2 = rscalarnames C3 = 1 C2 = "rep78",colwise
import excel using sum.xlsx, describe

// Then the table can be tidied up:
putexcel C1:V1 C3:V3,border(bottom) // add a bottom border to the first and last rows
putexcel C1:V1, merge
putexcel C1 = "Summary statistics of mpg",hcenter vcenter bold font(Arial,15,black)

/* 2. Loop over the values of rep78
Next, loop over the distinct values of rep78, summarize mpg for each value, and export the returned values to
Excel. */
putexcel set sum1, replace
levelsof rep78, local(type)
local row = 3 // rows 1 and 2 hold the title and the variable names, so start at row 3
foreach byvar in `type' {
    sum mpg if rep78 == `byvar', d
    putexcel D`row' = rscalars C`row' = `byvar', colwise
    local row = `row' + 1
}
putexcel C2 = "rep78" D2 = rscalarnames, colwise
import excel using sum1.xlsx, describe
putexcel C1:V1 C7:V7, border(bottom)
putexcel C1:V1, merge
putexcel C1 = "Summary statistics of mpg", hcenter vcenter bold font(Arial,15,black)
putexcel C2:C7,left
putexcel D2:V7,hcenter vcenter
shellout sum1.xlsx

putpdf: creating PDF files

putpdf begin
putpdf paragraph, halign("center")
putpdf text ("Creating PDF files with putpdf"), font("Times New Roman", 18, "black") linebreak
putpdf paragraph, halign("right")
putpdf text ("程振兴"), font("Times New Roman", 14, "black")
putpdf paragraph, halign("right")
putpdf text ("2017/9/26"), font("Times New Roman", 14, "black")

clear
sysuse auto,clear
putpdf paragraph, halign(center)
putpdf text ("Table 1: first 10 observations"),bold
putpdf table tbl1 = data("make price mpg") in 1/10, varnames

putpdf paragraph, halign(center)
putpdf text ("Table 2: observations 11-20"), bold
putpdf table tbl2 = data("make price mpg") in 11/20, ///
    varnames width(5) halign(center) border(all, , blue)

forvalues row = 2 (2) 11 {
    forvalues col = 1/3 {
        putpdf table tbl2(`row', `col'), bgcolor(lightblue) // light blue background for the cell
    }
}
forvalues col = 1/3 {
    putpdf table tbl2(1, `col'), bold bgcolor(blue) font(,,white) // blue background with white text for the header cells
}

// Insert a Stata graph
putpdf pagebreak
grss twoway scatter mpg weight if foreign == 0 ///
|| scatter mpg weight if foreign == 1, msymbol(sh)
graph export auto.png, replace
grss clear
putpdf paragraph, halign(center)
putpdf text ("Mileage versus vehicle weight"), bold
putpdf paragraph, halign(center)
putpdf image auto.png, width(6)

// Export summary results as a table
statsby total = r(N) average = r(mean) Max = r(max) ///
    Min = r(min), by(foreign):sum mpg
rename foreign Origin
putpdf paragraph, halign(center)
putpdf text ("Summary statistics for domestic and foreign cars"), bold font(, 14, "black")
putpdf table tbl1 = data("Origin total average Max Min"), varnames

// The table above can be polished a little further:
putpdf table tbl2 = data("Origin total average Max Min"), varnames border(start, nil) border(end, nil) border(insideV, nil) border(insideH, nil)
forvalues row = 1/3 {
    forvalues col = 1/5 {
        putpdf table tbl2(`row', `col'), halign(center)
    }
}
forvalues i = 1/5 {
    putpdf table tbl2(1,`i'), border(bottom, single, black)
}
putpdf save mypdf.pdf, replace

rcof: verifying a return code

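* Verify a return code
* Examples

sysuse auto, clear /* load data so that mpg and weight exist */
discard /* clear estimation results */
rcof "regress" == 301
rcof "regress mpg weight badvar" == 111
* The return code of this command is 111; that is, typing the command directly would give:
*     regress mpg weight badvar
*     variable badvar not found
*     r(111);
* rcof then lets the verification script continue. The command's output is suppressed. To see the output, including the error message, put noisily in front of the command:
rcof "noisily regress mpg weight badvar" == 111

* If the return code is not the one asserted, rcof reports the mismatch and execution stops:

rcof "regress mpg weight badvar" == 198

* The double quotes may be omitted when the command contains none of the characters =, <, >, ~, and !:

rcof noisily regress mpg weight badvar == 111

* The following command therefore fails:

rcof gen mpg = 3 == 110

* Because it contains an equals sign, the double quotes are required:

rcof "gen mpg = 3" == 110

* Omitting the double quotes is discouraged.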
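
The same kind of check can be sketched with official Stata only: capture stores the return code in _rc, and assert stops execution when the expectation is false. This illustrates the idea and makes no claim about how rcof is implemented internally.

```stata
* Sketch: verifying return codes with capture and _rc (official commands only)
sysuse auto, clear

capture regress mpg weight badvar    // badvar does not exist
assert _rc == 111                    // 111 = variable not found

capture generate mpg = 3             // mpg already exists
assert _rc == 110                    // 110 = variable already defined

capture noisily regress mpg weight   // noisily shows the output; a valid command returns 0
assert _rc == 0
```

Because capture swallows the error, the do-file keeps running after each failing command, and only a wrong expectation in assert halts it.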

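reclink: fuzzy record matching

reclink performs fuzzy matching, which can improve how well records get matched. It is most useful when the matching variables are not recorded identically in the two datasets. While matching, reclink computes a match score, that is, the similarity between two observations. The score lies between 0 and 1, and a score of 1 means the two observations are identical.

reclink syntax: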
reclink varlist using filename, idmaster(varname) idusing(varname) gen(newvarname) [wmatch(match weight list) wnomatch(non-match weight list) orblock(varlist) required(varlist) exactstr(varlist) exclude(filename) _merge(newvarname) uvarlist(varlist) uprefix(text) minscore(#) minbigram(#)]
  • gen(newvarname): stores the match score
  • idmaster(varname) and idusing(varname) name the variables that uniquely identify each observation in the master data and the using data. After fuzzy matching, they show which master observation was paired with which using observation.

First, create the two datasets:
clear all
set obs 8
input idmaster str10 prov str10 city str10 address
1 hubei wuhan "beijingroad"
2 beijing haidian "No.59"
3 zhejiang hangzhou "866"
4 shanghai shanghai "handanroad"
5 fujian xiamen "No.422"
6 sichuan chengdu "No.24"
7 beijing haidian "No.59"
8 guangzhou guangzhou "beijingroad"
save master, replace

clear
set obs 8
input idusing str10 prov str10 city str10 address
1 zhejiang hangzhou "866"
2 shanghai beijing "handanroad"
3 shanghai beijing "handan Rd"
4 sichuan chengdu "No.24"
5 fujian xiamen "siming south"
6 hubei wuhan "beijingroad"
7 beijing beijing "Yiheyuan"
8 beijing beijing "No.59"
save using, replace
clear
use master, clear
// Now merge the two datasets on "prov + city + address":
reclink prov city address using using.dta, idmaster(idmaster) idusing(idusing) gen(matchscore)
// _merge records the match result: 3 means a match was found, 1 means none was found.
// Matching with specified weights:
clear
use master, clear
reclink prov city address using using.dta, idmaster(idmaster) idusing(idusing) gen(matchscore) wmatch(5 10 15)
// The option above sets the weights given to agreement; weights for disagreement can be set as well:
clear
use master, clear
reclink prov city address using using.dta, idmaster(idmaster) idusing(idusing) gen(matchscore) wmatch(15 5 10) wnomatch(2 3 4)
// Next, require prov and city to match exactly for a pair to count as a match:
clear
use master, clear
reclink prov city address using using.dta, idmaster(idmaster) idusing(idusing) gen(matchscore) required(prov city)

regcheck: checking six linear regression assumptions at once

clear all
webuse nlsw88, clear
codebook // shows basic properties of every variable
duplicates report wage // reports how often values of a variable are duplicated

reg wage hours ttl_exp age tenure
regcheck

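The output shows the assumptions the command checks:
1: No heteroskedasticity: Breusch-Pagan test
2: No multicollinearity: variance inflation factors
3: Normally distributed residuals: Shapiro-Wilk normality test
4: Correctly specified model: link test
5: Appropriate functional form: F test
6: No influential outliers: Cook's distance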
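
For readers without regcheck installed, roughly the same six checks can be run one at a time with official postestimation commands. This is a sketch under that assumption: it uses the small auto data rather than nlsw88, the variable names res and cook are made up, and it does not claim to reproduce the exact tests or thresholds regcheck applies.

```stata
* Sketch: the six checks one at a time with official commands
sysuse auto, clear
regress price mpg weight length

estat hettest                  // 1. Breusch-Pagan test for heteroskedasticity
estat vif                      // 2. variance inflation factors
predict double res, residuals
swilk res                      // 3. Shapiro-Wilk test applied to the residuals
estat ovtest                   // 5. Ramsey RESET F test for functional form
predict double cook, cooksd    // 6. Cook's distance
count if cook > 4/e(N) & !missing(cook)   // a common rule of thumb, not necessarily regcheck's
linktest                       // 4. link test, run last because it fits another regression
```

Running linktest last is deliberate: it fits an auxiliary regression, so the postestimation commands that depend on the original fit come first.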

regen: labeling a variable as it is created

* Install
ssc install regen
sysuse auto, clear
* Create a new variable and label it in one step
regen mpg2 "Mileage squared" = mpg^2

gen mpgid = 1
replace mpgid = 0 if inrange(mpg, 22, 25)
* Create a dummy for domestic cars with mpgid = 1
regen byte id = anymatch(mpgid) if (foreign == 0), values(1)
regen byte id = anymatch(mpgid) if (foreign == 1), values(1) replace
regen byte id = anymatch(mpgid) if (foreign == 1), values(1) replace else(.)

* Use the expression as the variable label
regen ln_weight "\`s(expr)'" = ln(weight)
regen ln_weight "\`s(if)'" = ln(weight) if foreign == 0, replace
regen ln_weight "\`s(in)'" = ln(weight) in 1/20, replace
bysort foreign: regen ln_weight "\`s(by)'" = ln(weight), replace

regplot

sysuse auto, clear
regress mpg weight
regplot

gen weightsq = weight^2
regress mpg weight weightsq
regplot

regress mpg weight foreign
regplot, by(foreign)

regplot, sep(foreign)

regress mpg weight weightsq foreign
regplot, by(foreign)

regplot, sep(foreign)

gen fw = foreign * weight
regress mpg weight foreign fw
regplot, by(foreign)

regplot, sep(foreign)

logit foreign weight
regplot

glm mpg weight foreign, link(log)
regplot, by(foreign)

renames: renaming variables in bulk

. sysuse auto, clear
(1978 Automobile Data)

. renames foreign \ orign

. a1
make price mpg rep78 headroom trunk weight length turn displacement gear_ratio orign

. renames `=r(czx)', p(uk)

. a1
ukmake ukprice ukmpg ukrep78 ukheadroom uktrunk ukweight uklength ukturn ukdisplacement ukgear_ratio ukorign

. renames `=r(czx)', s(00)

reshape: converting between wide and long data

. clear all

. webuse reshape1, clear s

. list

+-------------------------------------------------------+
| id sex inc80 inc81 inc82 ue80 ue81 ue82 |
|-------------------------------------------------------|
1. | 1 0 5000 5500 6000 0 1 0 |
2. | 2 1 2000 2200 3300 1 0 0 |
3. | 3 0 3000 2000 1000 0 0 1 |
+-------------------------------------------------------+

. qui reshape long inc ue, i(id) j(year)

. list

+-----------------------------+
| id year sex inc ue |
|-----------------------------|
1. | 1 80 0 5000 0 |
2. | 1 81 0 5500 1 |
3. | 1 82 0 6000 0 |
4. | 2 80 1 2000 1 |
5. | 2 81 1 2200 0 |
|-----------------------------|
6. | 2 82 1 3300 0 |
7. | 3 80 0 3000 0 |
8. | 3 81 0 2000 0 |
9. | 3 82 0 1000 1 |
+-----------------------------+

. reshape wide
(note: j = 80 81 82)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        9   ->       3
Number of variables                   5   ->       8
j variable (3 values)              year   ->   (dropped)
xij variables:
                                    inc   ->   inc80 inc81 inc82
                                     ue   ->   ue80 ue81 ue82
-----------------------------------------------------------------------------

. list

+-------------------------------------------------------+
| id inc80 ue80 inc81 ue81 inc82 ue82 sex |
|-------------------------------------------------------|
1. | 1 5000 0 5500 1 6000 0 0 |
2. | 2 2000 1 2200 0 3300 0 1 |
3. | 3 3000 0 2000 0 1000 1 0 |
+-------------------------------------------------------+

.
. clear all

. webuse reshape1, clear s
file ~/Library/Application Support/Stata/ado/plus/r/reshape1.dta saved

. list

+-------------------------------------------------------+
| id sex inc80 inc81 inc82 ue80 ue81 ue82 |
|-------------------------------------------------------|
1. | 1 0 5000 5500 6000 0 1 0 |
2. | 2 1 2000 2200 3300 1 0 0 |
3. | 3 0 3000 2000 1000 0 0 1 |
+-------------------------------------------------------+

. qui reshape long inc ue, i(id) j(year)

. list

+-----------------------------+
| id year sex inc ue |
|-----------------------------|
1. | 1 80 0 5000 0 |
2. | 1 81 0 5500 1 |
3. | 1 82 0 6000 0 |
4. | 2 80 1 2000 1 |
5. | 2 81 1 2200 0 |
|-----------------------------|
6. | 2 82 1 3300 0 |
7. | 3 80 0 3000 0 |
8. | 3 81 0 2000 0 |
9. | 3 82 0 1000 1 |
+-----------------------------+

. qui reshape wide inc ue, i(id) j(year)

. list

+-------------------------------------------------------+
| id inc80 ue80 inc81 ue81 inc82 ue82 sex |
|-------------------------------------------------------|
1. | 1 5000 0 5500 1 6000 0 0 |
2. | 2 2000 1 2200 0 3300 0 1 |
3. | 3 3000 0 2000 0 1000 1 0 |
+-------------------------------------------------------+
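
The round trip can also be tried without downloading anything. The sketch below types in a small wide dataset by hand (the values are copied from the reshape1 listing, with the ue variables left out for brevity) and uses assert and isid to confirm each step.

```stata
* Sketch: wide -> long -> wide round trip on a small typed-in dataset
clear
input id sex inc80 inc81 inc82
1 0 5000 5500 6000
2 1 2000 2200 3300
3 0 3000 2000 1000
end

reshape long inc, i(id) j(year)   // one row per id-year
assert _N == 9
isid id year                      // id and year jointly identify rows

reshape wide inc, i(id) j(year)   // back to one row per id
assert _N == 3
assert inc81 == 5500 if id == 1
```

Variables that do not vary within id, such as sex, are carried along automatically in both directions.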

riskplot: distribution of risk factors

* net install gr0044.pkg, from("http://www.stata-journal.com/software/sj10-1/")
* net get gr0044.pkg, from("http://www.stata-journal.com/software/sj10-1/")
cuse data_riskplot.dta, clear
riskplot AH4 sclass depr1991, path ///
ytitle(Fluid intelligence 1995) saving(riskplotAH4, replace)
graph export riskplotAH4.png, replace

riskplot AH4 sex sclass depr1991, path ///
obs ytitle(Fluid intelligence 1995) trim(5) ///
saving(riskplotAH4trim, replace)
graph export riskplotAH4trim.png, replace

riskplot depr1995 sex sclass if Idep91==1, all ///
thick(20) scale(0.9) c(. red) ytitle(depression ///
score 1995) title(Risk plot for subjects with mild or severe depression at baseline, margin(b+5)) saving(riskplotDEP2, replace)
graph export riskplotDEP2.png, replace

riskplot depr1995 sex sclass if Idep91==1 [pw=wg], path obs thick(20) c(. red) /*
*/ title(Risk plot for subjects with mild or severe depression at baseline) /*
*/ subtitle((results using sampling weights), margin(b+5)) scale(0.9) /*
*/ ytitle(depression score 1995) saving(riskplotWG, replace)
graph export riskplotWG.png, replace

* An application: plotting how Xiaoxiao's friends and mine are distributed
cuse me_xiaoxiao, c
egen cat_id = group(cat)
replace cat_id = cat_id - 1
riskplot id cat_id sex, ///
title("Friend distribution: Xiaoxiao and me") ///
path obs c(blue red green) ///
text(375 1.15 "Xiaoxiao") ///
text(250 1.1 "Me")
gre treeplot

scat3

* Draw a 3D scatter plot (the variables come from the auto data)
sysuse auto, clear
scat3 weight length mpg

scat3 weight length mpg, rot(30) elev(60) axistype(z) titlez(, mlabang(0) mlabpos(9) mlabgap(*12)) titley(, mlabpos(7) mlabgap(*7)) title("This is a title", size(medium)) spikes(blw(vvthin)) ms(oh)

scatter_hist: scatter plots with marginal histograms

net install scatter_hist.pkg, from("https://bitbucket.org/keithk/kk-adofiles/raw/3668170c07edef8ad9f18af25b3e2a39673c62ee/")
sysuse lifeexp, clear
gen loggnp = log10(gnppc)
label var loggnp "Log10 of GNP per capita"
scatter_hist lexp loggnp, percent color(dkorange) m(s) msize(small) xtitle("Log GNP per capita") ytitle("Life expectancy at birth")

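  • percent: puts the histogram axes on a percent scale;
  • color(dkorange): sets the histogram bar color, here dark orange;
  • msize(small): sets the marker size in the scatter plot;
  • m(s): sets the marker shape; s is a square and t is a triangle.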

separate: creating separate variables by group

. sysuse auto, clear
(1978 Automobile Data)

. separate mpg, by(foreign)

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------
mpg0            byte    %8.0g                 mpg, foreign == Domestic
mpg1            byte    %8.0g                 mpg, foreign == Foreign

. qqplot mpg0 mpg1

. sysuse auto, clear
(1978 Automobile Data)

. separate mpg, by(price > 6000)

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------
mpg0            byte    %8.0g                 mpg, !(price > 6000)
mpg1            byte    %8.0g                 mpg, price > 6000

.
. sysuse auto, clear
(1978 Automobile Data)

. separate mpg, by(price > 6000) gen(mpgpr)

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------
mpgpr0          byte    %8.0g                 mpg, !(price > 6000)
mpgpr1          byte    %8.0g                 mpg, price > 6000

. ret list

macros:
r(varlist) : "mpgpr0 mpgpr1"
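
To see what separate does under the hood, the by(foreign) case can be written by hand with generate. This is only a sketch of the idea; the names mpg_dom and mpg_for are made up, and separate itself also takes care of naming and labels automatically.

```stata
* Sketch: what separate mpg, by(foreign) amounts to, done by hand
sysuse auto, clear
generate mpg_dom = mpg if foreign == 0
generate mpg_for = mpg if foreign == 1
label variable mpg_dom "mpg, foreign == Domestic"
label variable mpg_for "mpg, foreign == Foreign"

* Each observation contributes to exactly one of the two new variables
assert missing(mpg_dom) != missing(mpg_for)
```

Each new variable is missing wherever the observation belongs to the other group, which is what lets commands such as qqplot compare the two groups as if they were separate variables.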

simpplot: plotting simulated p-values against significance levels

* ssc install simpplot
prog drop _all
prog def sim, rclass
    drop _all
    set obs 500
    gen x = rchi2(2)

    ttest x = 2 in 1/50
    ret scalar p50 = r(p)

    ttest x = 2
    ret scalar p500 = r(p)
end

set seed 12345
simulate p50 = r(p50) p500 = r(p500), ///
reps(5000): sim

label var p50 "N = 50"
label var p500 "N = 500"
simpplot p50 p500, main1opt(mcolor(red*0.5)) ///
main2opt(mcolor(blue*0.5))

The sixplot

  • The plot in the (1,1) position is a sequence plot of varname versus the sequence.
  • The plot in the (1,2) position is a residual versus fitted plot of the regression of varname versus sequence.
  • The plot in the (1,3) position is a boxplot of varname.
  • The plot in the (2,1) position is a first difference plot of varname versus sequence.
  • The plot in the (2,2) position is a histogram of varname.
  • The plot in the (2,3) position is a normal quantile plot of varname.
sysuse uslifeexp.dta, clear
sixplot le_male

sixplot le_female

sysuse nlsw88.dta, clear
sixplot wage

sixplot wage in 1/300

sliceplot

* Install
* ssc install sliceplot
* net get gr0025.pkg, from("http://www.stata-journal.com/software/sj6-3/")

cuse maunaloa, clear
sliceplot line res date, slices(6) ytitle(residual (ppm)) yla(-6(2)4, ang(h)) xti("") combine(saving(fig, replace))

sparkline: plotting several y variables against one x

* install
* ssc install sparkline

webuse grunfeld, clear
sparkline invest mvalue kstock year if company == 1

sparkline invest year, over(company)

sparkline invest year, over(company) extremes

sparkline invest year, by(company) extremes

sparkline invest year, by(company, col(2) compact) subtitle(, pos(9) ring(1) nobexpand bcolor(none) placement(e)) extremes ysc(log)

sparkline invest mvalue kstock year, by(company) xtitle("") extremes
gre sparkline6

sparkline invest mvalue kstock year, by(company, note("")) xtitle("") extremes extremeslabel ysc(r(0.3 3.7))

bysort company (year): gen clabel= string(invest[_N], "%9.0g") + " " + string(company)
* for labmask: net describe gr0034, from(http://www.stata-journal.com/software/sj8-2)
labmask company, values(clabel)
sparkline invest year, over(company) flipy xtick(1935/1954) xla(1935(5)1950 1954, tlength(*1.6)) extremes

sysuse auto, clear
gen gpm = 1/mpg
sort rep78 gpm weight
gen observation = _n
sparkline rep78 gpm weight observation, recast(scatter) xla(1 10(10)70 74)

* iris data in Stata 11 up
webuse iris, clear
pca sep* pet*
predict PC1
sparkline sep* pet* PC1, recast(scatter) variablelabels format(%3.1f)

sparkline sep* pet* PC1, recast(scatter) yla(1 "sepal length" 2 "sepal width" 3 "petal length" 4 "petal width", axis(2)) subtitle(all measurements in cm, place(w) size(*0.8)) format(%3.1f) yli(1.5 2.5 3.5, lstyle(grid)) flipy

* stocks data in Stata 12 up
webuse stocks, clear
sparkline toyota nissan honda t

sparkline toyota nissan honda t, limits(-0.2 0.2)

sparkline toyota nissan honda t, limits(-0.2 0.2) height(0.8) yla(0.6 "-0.2" 1 "0" 1.4 "0.2" 1.6 "-0.2" 2 "0" 2.4 "0.2" 2.6 "-0.2" 3 "0" 3.4 "0.2", axis(2) labgap(*1) ticks) yli(1.5 2.5, lstyle(grid))

clear
input levels freqcores freqblanks freqtools
25 21 32 70
24 36 52 115
23 126 650 549
22 159 2342 1633
21 75 487 511
20 176 1090 912
19 132 713 578
18 46 374 266
17 550 6182 1541
16 76 846 349
15 17 182 51
14 4 51 14
13 29 228 130
12 135 2227 729
end

foreach k in cores blanks tools {
gen `k' = 100 * freq`k' / (freqcores + freqblanks + freqtools)
}

sparkline cores blanks tools levels, yaxis(1 2) vertical ysc(reverse) yla(12/25, axis(1) nogrid) format(%2.1f) recast(connected) ms(Oh Dh Th) plotregion(color(gs13)) xli(1.5 2.5, lw(*2) lstyle(grid)) yla(12/25, nogrid ang(h) axis(2)) flipy

spgrid: density plots

sysuse auto, clear
sum price mpg
clonevar x = mpg
clonevar y = price
replace x = (x - 0) / (50 - 0)
replace y = (y - 0) / (20000 - 0)
* The next two lines build the axis labels
mylabels 0(10)50, myscale((@ - 0) / (50 - 0)) local(XLAB)
mylabels 0(5000)20000, myscale((@ - 0) / (20000 - 0)) local(YLAB)

keep x y
save "xy.dta", replace

* Generate a 1000*1000 grid of cells
spgrid, shape(hexagonal) xdim(1000) ///
xr(0 1) yr(0 1) ///
dots replace ///
cells("2D-GridCells.dta") ///
points("2D-GridPoints.dta")
* Estimate the bivariate density function
spkde using "2D-GridPoints", ///
xcoord(x) ycoord(y) ///
bandwidth(fbw) fbw(0.1) dots ///
saving("2D-Kde.dta", replace)

* Draw the density plot
use "2D-Kde", clear
recode lambda (. = 0)
spmap lambda using "2D-GridCells.dta", ///
id(spgrid_id) clnum(20) fc(Rainbow) ///
ocolor(none ..) leg(off) ///
point(data("xy.dta") x(x) y(y)) ///
freestyle aspectratio(1) ///
xti("" "MPG") ///
xlab(`XLAB') ///
ylab(`YLAB') ///
yti(" " "Price({c S|}US)")
gr export 密度图.png, width(3600) height(2400)

spikeplot

webuse ghanaage, clear s
spikeplot age [fw=pop], ytitle("Population in 1000s") xlab(0(10)90) xmtick(5(10)85)
gre spikeplot

webuse splotxmpl, clear s
spikeplot normal, round(.10) xlab(-4(1)4) root yti("Square root of frequency")
gre spikeplot1

split&nsplit

. clear

. set obs 5
number of observations (_N) was 0, now 5

. input empid

empid
1. 116
2. 117
3. 118
4. 119
5. 120

. format empid %03.0f
. nsplit empid, digits(1)

. list

+----------------------------------+
| empid empid1 empid2 empid3 |
|----------------------------------|
1. | 116 1 1 6 |
2. | 117 1 1 7 |
3. | 118 1 1 8 |
4. | 119 1 1 9 |
5. | 120 1 2 0 |
+----------------------------------+

.
. nsplit empid, digits(1 2) generate(ep1 ep2)

. list

+----------------------------------------------+
| empid empid1 empid2 empid3 ep1 ep2 |
|----------------------------------------------|
1. | 116 1 1 6 1 16 |
2. | 117 1 1 7 1 17 |
3. | 118 1 1 8 1 18 |
4. | 119 1 1 9 1 19 |
5. | 120 1 2 0 1 20 |
+----------------------------------------------+
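
When nsplit is not installed, the same digits can be pulled out with plain arithmetic. The sketch below assumes three-digit non-negative integers like the empid values in the log; the variable names d1, d2, d3 and last2 are made up.

```stata
* Sketch: splitting a three-digit number into digits with floor() and mod()
clear
input empid
116
117
118
119
120
end

generate d1 = floor(empid/100)           // hundreds digit
generate d2 = mod(floor(empid/10), 10)   // tens digit
generate d3 = mod(empid, 10)             // units digit
generate last2 = mod(empid, 100)         // last two digits together
list
```

The columns d1, d2 and d3 correspond to empid1, empid2 and empid3 in the log, and d1 with last2 correspond to ep1 and ep2.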

. clear

. input str20 date

date
1. "January 21, 1952"
2. "July 11, 1948"
3. "May 31, 1971"
4. "October 7, 2000"
5. end

. split date, parse("," " ") gen(ndate) notrim
variables created as string:
ndate1 ndate2 ndate3 ndate4

. list

     +-------------------------------------------------------+
     |             date    ndate1   ndate2   ndate3   ndate4 |
     |-------------------------------------------------------|
  1. | January 21, 1952   January       21              1952 |
  2. |    July 11, 1948      July       11              1948 |
  3. |     May 31, 1971       May       31              1971 |
  4. |  October 7, 2000   October        7              2000 |
     +-------------------------------------------------------+

. replace ndate3 = ndate4
variable ndate3 was str1 now str4
(4 real changes made)

. drop ndate4

. list

+----------------------------------------------+
| date ndate1 ndate2 ndate3 |
|----------------------------------------------|
1. | January 21, 1952 January 21 1952 |
2. | July 11, 1948 July 11 1948 |
3. | May 31, 1971 May 31 1971 |
4. | October 7, 2000 October 7 2000 |
+----------------------------------------------+

. return list

macros:
r(nvars) : "4"
r(varlist) : "ndate1 ndate2 ndate3 ndate4 "

. foreach v in `r(varlist)' {
2. cap replace `v' = "+" + `v'
3. }

. list

+-----------------------------------------------+
| date ndate1 ndate2 ndate3 |
|-----------------------------------------------|
1. | January 21, 1952 +January +21 +1952 |
2. | July 11, 1948 +July +11 +1948 |
3. | May 31, 1971 +May +31 +1971 |
4. | October 7, 2000 +October +7 +2000 |
+-----------------------------------------------+

. order ndate3 ndate1 ndate2

. list

+-----------------------------------------------+
| ndate3 ndate1 ndate2 date |
|-----------------------------------------------|
1. | +1952 +January +21 January 21, 1952 |
2. | +1948 +July +11 July 11, 1948 |
3. | +1971 +May +31 May 31, 1971 |
4. | +2000 +October +7 October 7, 2000 |
+-----------------------------------------------+
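
If the goal is the date parts rather than string handling as such, the built-in date functions get there without splitting. A sketch using only official functions; the variable names d, yr, mo and dy are made up.

```stata
* Sketch: parsing the same strings with date() instead of split
clear
input str20 date
"January 21, 1952"
"July 11, 1948"
"May 31, 1971"
"October 7, 2000"
end

generate d = date(date, "MDY")   // daily date from a month-day-year string
format d %td
generate yr = year(d)
generate mo = month(d)
generate dy = day(d)
list
```

This avoids the empty ndate3 column that appears when both "," and " " are used as parse characters.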