Stata Plotting Tips: A Summary (Part 3)

This post consolidates older notes and collects a number of tips for plotting in Stata.

neat: avoiding overlapping scatter points

neat: A Stata layout module to create geometric shapes out of replicates in scatter plot.

* github install haghish/neat
set scheme plottig
use "https://raw.githubusercontent.com/haghish/neat/master/test/neat3.dta", clear
scatter v1 v2

neat v1 v2
scatter v1 v2

  • msize(num): Takes the size (numeric only) of the marker into account. The default value is 1.5.
  • xsize(num): Takes the size of the X-axis into account. The default value is 5.5.
  • ysize(num): Takes the size of the Y-axis into account. The default value is 4.
neat v1 v2 , msize(1.1) xsize(6.5)
scatter v1 v2, msize(1.1) xsize(6.5)

  • dsize(num): Changes the distance scale between the duplicated observations. The default is 5.
neat v1 v2 , dsize(7)
scatter v1 v2
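
For comparison, official Stata can also reveal overlapping points without neat, by counting duplicated coordinates and by jittering the markers. The sketch below is not part of the original notes; it assumes the shipped auto.dta as stand-in data, and the jitter amount of 5 is an arbitrary choice.

```stata
* Illustration only: auto.dta stands in for the neat test data
sysuse auto, clear
* rep78 and foreign are both discrete, so many cars share the same coordinates
duplicates report rep78 foreign
* Official alternative to neat: add random noise to the marker positions
scatter foreign rep78, jitter(5)
```

Jitter places points randomly, whereas neat arranges replicates into regular shapes, so the two pictures will differ.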

drarea: overlapping range-area plots with the overlap shaded

sysuse sp500, clear
gen high2 = high + 15*runiform()
gen low2 = low + 15*runiform()
* ssc install drarea
drarea high low high2 low2 date in 1/20

distplot: cumulative distribution plots

sysuse citytemp, clear
label var tempjan "Mean January temperature ({&degree}F)"
distplot tempjan, over(region)



distplot tempjan, by(region)

distplot tempjan, by(region) reverse

distplot tempjan, by(region) reverse(ge)

distplot tempjan tempjul, by(region) legend(order(1 "January" 2 "July")) xtitle("Mean temperature ({&degree}F)")

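  • Add a "Total" panel: record the first row number past the current data, duplicate every observation, then assign the copies to a new region code 5.
    count
    local np1 = r(N) + 1
    expand 2
    replace region = 5 in `np1'/L
    label def region 5 "Total", add
    distplot tempjan, by(region)
    gre distplot5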
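
To see what a cumulative distribution plot contains, an empirical CDF can also be built by hand with the official cumul command. This is a minimal sketch, not the author's method; the variable name cdf_jan is made up for the illustration.

```stata
sysuse citytemp, clear
* cumul stores the empirical cumulative distribution function of tempjan
cumul tempjan, generate(cdf_jan)
* Plot the cumulative probability against temperature, sorted on x
line cdf_jan tempjan, sort ytitle("Cumulative probability")
```

distplot adds conveniences on top of this, such as the by(), over(), and reverse options used above.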

devnplot: deviation plots

webuse systolic, clear
gen dd = drug*disease
anova systolic drug disease dd
predict p
predict r, r
devnplot systolic drug disease, level(p) superplines
gre devnplot

dashgph: dashed-line graphs

sysuse auto, clear
dashgph price mpg, c(D)

dashgph price weight mpg, c(lD)

dashgph price weight mpg, c(DD) dash(200 1000) space(100 50) ylab
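
For comparison, official twoway graphs can draw dashed lines through the lpattern() option. The sketch below is an alternative under that assumption, not a reproduction of dashgph output.

```stata
sysuse auto, clear
* Dashed line of price against mpg; sort orders the points along the x axis
twoway line price mpg, sort lpattern(dash)
```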

cycleplot: cycle plots

webuse air2, clear
set scheme plottig, permanently
egen month = seq(), to(12)
gen year = floor(time)
cycleplot air month year, xla(1/12) start(2) yscale(log) su(median) leg(pos(6))

cycleplot air month year, xla(1/12) start(2) yscale(log) su(median) myla(`c(Mons)') leg(pos(6))

cycleplot air month year, xla(1/12) start(2) yscale(log) su(median) myla(J F M A M J J A S O N D) leg(pos(6))

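  • For example, a cycle plot of Ping An Bank's share price:
    stkpv 1, c f(19900101) t(20180101)
    gen year = yofd(date)
    gen month = mofd(date)
    collapse clsprc opnprc, by(year month)
    * Month of year (1-12) taken from the monthly date, so the result stays correct even if the series does not start in January or has gaps
    gen month1 = month(dofm(month))
    set scheme plottig, permanently
    cycleplot clsprc month1 year, sch(plottig) xla(1/12) myla(`c(Mons)') ysc(log) su(median) leg(label(1 "Ping An Bank closing price, 1991-2017") label(2 "Median") pos(6) c(2)) xti(Month) yti(Closing price)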

cycleplot clsprc opnprc month1 year, sch(plottig) xla(1/12) myla(`c(Mons)') ysc(log) su(median) leg(label(1 "Ping An Bank closing price, 1991-2017") label(2 "Ping An Bank opening price, 1991-2017") label(3 "Median closing price") label(4 "Median opening price") pos(6) c(2) span) xti(Month) yti(Closing price)

cycleplot clsprc opnprc month1 year, sch(plottig) xla(1/12) myla(`c(Mons)') ysc(log) su(median) leg(label(1 "Ping An Bank closing price, 1991-2017") label(2 "Ping An Bank opening price, 1991-2017") label(3 "Median closing price") label(4 "Median opening price") pos(6) c(2) span) xti(Month) yti(Closing price) msize(*0.8) recast(connected)
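
cycleplot needs a month-of-year variable (1 to 12) and a year variable. When these are derived from a daily date, it helps to keep mofd(), which returns a monthly date counted from 1960m1, apart from month(), which returns the month of the year. The sketch below uses three made-up dates; the variable names are hypothetical.

```stata
clear
set obs 3
* Three made-up daily dates: 3 Apr 1991, 13 May 1991, 22 Jun 1991
gen date = mdy(4, 3, 1991) + 40*(_n - 1)
format date %td
gen year = year(date)      // calendar year
gen month = month(date)    // month of year, 1 to 12
gen ym = mofd(date)        // monthly date: months elapsed since 1960m1
format ym %tm
list
```

Here month holds 4, 5, and 6, while ym holds 375, 376, and 377 (displayed as 1991m4 to 1991m6).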

compuse: comparing the dataset in memory with a dataset on disk

* net install compuse.pkg, from("http://digital.cgdev.org/doc/stata/MO/Misc")
* Prepare a dataset saved on disk
webuse fullauto, clear
tempfile otherfile
save `otherfile'
* Perturb the variable "price"
replace price = price + 10000*runiform()
* Plot the differences between the dataset in memory and the saved dataset
compuse price using `otherfile', sortvars(order) marker(make)

* Change the color scheme.
compuse price using `otherfile', sortvars(order) marker(make) scheme(s1rcolor)
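
For a non-graphical check of the same kind, the official cf command compares variables in memory with a file on disk. This is a sketch, assuming auto.dta as stand-in data and a made-up change to the first five prices; it is not part of the compuse package.

```stata
sysuse auto, clear
tempfile otherfile
save `otherfile'
* Change the first five prices so the two copies differ
replace price = price + 1 in 1/5
* cf exits with return code 9 when it finds differences; capture keeps the do-file running
capture noisily cf price using `otherfile', verbose
display _rc
```

cf reports which observations differ, while compuse shows the size of the differences in a graph.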

coefplot: visualizing regression coefficients, or point estimates with interval estimates

  • Compute medians and 95% confidence intervals with centile, then plot them with coefplot:
    sysuse auto, clear
    * 3 x 3 matrix: rows hold the median and its interval limits, columns the variables
    matrix C = J(3,3,.)
    matrix rownames C = median ll95 ul95
    matrix colnames C = mpg trunk turn
    local i 0
    foreach v of var mpg trunk turn {
        local ++i
        centile `v'
        matrix C[1,`i'] = r(c_1) \ r(lb_1) \ r(ub_1)
    }
    matrix list C
    * Rows 2 and 3 of C supply the interval limits
    coefplot matrix(C), ci((2 3))
    gre coefplot

  • Visualizing the mean, minimum, and maximum:
    sysuse auto, clear
    * 3 x 3 matrix: rows hold the mean, minimum, and maximum, columns the variables
    matrix C = J(3,3,.)
    matrix rownames C = mean min max
    matrix colnames C = mpg trunk turn
    local i 0
    foreach v of var mpg trunk turn {
        local ++i
        sum `v'
        matrix C[1,`i'] = r(mean) \ r(min) \ r(max)
    }
    matrix list C
    * Rows 2 and 3 of C (min and max) are drawn as the range around the mean
    coefplot matrix(C), ci((2 3))
    gre coefplot1
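
The matrix examples above cover the point-and-interval use. For the regression-coefficient use named in the section title, a minimal sketch looks like this; the choice of model is made up for illustration, and coefplot must be installed (ssc install coefplot).

```stata
sysuse auto, clear
regress price mpg trunk turn
* Plot the coefficients with their confidence intervals, omitting the constant
coefplot, drop(_cons) xline(0)
```

The xline(0) reference line makes it easy to see which intervals include zero.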

eclplot: plotting estimates with confidence intervals

* ssc install eclplot
* ssc install sencode
* ssc install parmest (provides parmby)

sysuse auto, clear
tab rep78, gen(rep78_)
* parmby: create a dataset by calling an estimation command once for each by-group
parmby "regress mpg rep78_*", by(foreign) label norestore
* sencode: encode a string variable as a numeric variable with value labels
sencode label if parm != "_cons", gen(parmlab)
label var parmlab "Repair record 1978"
label var estimate "Mean mileage (MPG)"
eclplot estimate min95 max95 parmlab, eplot(bar) ///
    estopts(barwidth(0.25)) supby(for, spaceby(0.25)) ///
    xsc(r(0 6)) xla(1(1)5, ang(30)) leg(pos(6) row(1))
gre eclplot

byhist: histograms for two groups

* Histograms of one variable for two groups
* Install:
* ssc install byhist
sysuse auto, clear
byhist mpg, by(foreign)
gre byhist

binscatter: binned scatterplots

binscatter generates binned scatterplots, and is optimized for speed in large datasets.
Binned scatterplots provide a non-parametric way of visualizing the relationship between two variables. With a large number of observations, a scatterplot that plots every data point would become too crowded to interpret visually. binscatter groups the x-axis variable into equal-sized bins, computes the mean of the x-axis and y-axis variables within each bin, then creates a scatterplot of these data points. The result is a non-parametric visualization of the conditional expectation function.
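
The binning step described above can be sketched by hand with official commands. This is an illustration of the idea, not binscatter's actual implementation; the 20 bins and the variable name bin are assumptions for the example.

```stata
sysuse nlsw88, clear
keep if inrange(age, 35, 44) & inrange(race, 1, 2)
drop if missing(wage, tenure)
* Group the x variable into (up to) 20 quantile bins
xtile bin = tenure, nquantiles(20)
* One row per bin: the mean of x and the mean of y within the bin
collapse (mean) wage tenure, by(bin)
* Scatter the bin means
scatter wage tenure
```

Because tenure has many tied values, fewer than 20 distinct bins may result.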

sysuse nlsw88, clear
keep if inrange(age, 35, 44) & inrange(race, 1, 2)

sc wage tenure
gr export binscatter1.png, replace

binscatter wage tenure
gr export binscatter2.png, replace

qfit

binscatter wage tenure, line(qfit)
gr export binscatter3.png, replace

rd

binscatter wage tenure, rd(2.5) line(qfit)
gr export binscatter4.png, replace

sc wage age
gr export binscatter5.png, replace

binscatter wage age
gr export binscatter6.png, replace

by()

binscatter wage age, by(race)
gr export binscatter7.png, replace

absorb(): controlling for a categorical variable

binscatter wage age, by(race) absorb(occupation)
gr export binscatter8.png, replace

graph nicely

binscatter wage age, by(race) ///
absorb(occupation) ///
ms(O T) ///
xti(Age) ///
yti(Hourly Wage) ///
leg(lab(1 White) lab(2 Black))
gr export binscatter9.png, replace

  • ereturn list
ereturn list
`e(graphcmd)'
gr export binscatter10.png, replace

bmp2dta: processing satellite imagery

Image download

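. * bmp2dta: processing satellite imagery

. * bmp2dta converts a 24-bit BMP file (here, a satellite image) into a dta file

. * With the pixel data loaded, a cluster analysis on the RGB values can separate water, forest, and buildings:

. clear all

. bmp2dta using temp, pic(WechatIMG20.bmp) replace

. use temp, clear

. cluster kmeans r g b, k(3) gen(c)
cluster name: _clus_1

. tab c

          c |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     27,289       35.82       35.82
          2 |     14,432       18.95       54.77
          3 |     34,455       45.23      100.00
------------+-----------------------------------
      Total |     76,176      100.00

. * Reading the tabulation of c as a rough land-cover breakdown, the area in this image is about 19% water, 45% forest, and 36% urban. The exact shares vary slightly between runs because k-means starts from random centers.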
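
Two practical points follow from the clustering step: the cluster numbers are arbitrary, and the default random start changes results between runs. The sketch below shows one way to handle both, using made-up random pixel values instead of a real image; the seed values are arbitrary assumptions.

```stata
clear
set seed 2024
set obs 300
* Made-up pixel colours in the range 0-255 (stand-in for bmp2dta output)
gen r = floor(256*runiform())
gen g = floor(256*runiform())
gen b = floor(256*runiform())
* Fix the random starting centers so the clustering is reproducible
cluster kmeans r g b, k(3) generate(c) start(krandom(12345))
* Mean colour of each cluster, to help decide which cluster is which land type
tabstat r g b, by(c) statistics(mean)
```

On a real satellite image, a cluster with a high mean b relative to r and g would be a candidate for water, but that mapping has to be checked against the picture.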

cbarplot: centered (symmetric) bar plots

* ssc install cbarplot
clear
input levels freqcores freqblanks freqtools
25 21 32 70
24 36 52 115
23 126 650 549
22 159 2342 1633
21 75 487 511
20 176 1090 912
19 132 713 578
18 46 374 266
17 550 6182 1541
16 76 846 349
15 17 182 51
14 4 51 14
13 29 228 130
12 135 2227 729
end
reshape long freq, i(levels) j(kind) string
cbarplot levels kind [fw=freq]
gre cbarplot1
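
The reshape long step is what turns the three frequency columns into the single freq variable and the string variable kind that cbarplot uses. A two-row sketch of the same data shows the result:

```stata
clear
input levels freqcores freqblanks freqtools
25 21 32 70
24 36 52 115
end
* Stack freqcores, freqblanks, freqtools into freq; the suffix goes into the string variable kind
reshape long freq, i(levels) j(kind) string
list, sepby(levels)
```

Each level now has three rows, one each for blanks, cores, and tools.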

cbarplot levels kind [fw=freq], percent(levels)
gre cbarplot2

cbarplot levels kind [fw=freq], percent(levels) mlabsize(*.6)
gre cbarplot3

cbarplot levels kind [fw=freq], percent(levels) mlabcolor(bg) rbaropts(bfcolor(blue))
gre cbarplot4

cbarplot levels [fw=freq], by(kind, row(1)) percent(levels) mlabcolor(bg) rbaropts(bfcolor(blue))
gre cbarplot5

cbarplot levels [fw=freq], by(kind, row(1)) percent(levels) mlabcolor(bg) rbaropts(bfcolor(blue)) ysc(reverse)
gre cbarplot6

catplot: plots of categorical variables

sysuse auto, clear
catplot rep78
gre catplot

catplot rep78, blabel(bar, pos(base) size(4)) bar(1, bfcolor(none)) ysc(off)
gre catplot1

catplot rep78 foreign
gre catplot2

catplot rep78 foreign, nofill
gre catplot3


catplot rep78, by(foreign) percent(foreign)
gre catplot4

catplot rep78, by(foreign) percent(foreign) recast(bar)
gre catplot5

catplot rep78 foreign, percent(foreign) bar(1, bcolor(blue)) blabel(bar, position(outside) format(%3.1f)) ylabel(none) yscale(r(0,60))
gre catplot6

gen himpg = mpg > 25
label def himpg 1 "mpg > 25" 0 "mpg <= 25"
label val himpg himpg
catplot himpg rep78 foreign
gre catplot7

catplot rep78 foreign, by(himpg, col(1) note("")) subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e))
gre catplot8

catplot rep78 foreign, recast(dot) by(himpg, col(1) note("")) subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e))
gre catplot9

catplot rep78 foreign, recast(bar) by(himpg, row(1) note("")) subtitle(, pos(6) ring(1) bcolor(none) nobexpand)
gre catplot10

catplot rep78, var1opts(sort(1))
gre catplot11

catplot rep78, var1opts(sort(1) descending)
gre catplot12

cuse titanic, clear
collapse survived, by(age sex pclass)
catplot age sex [aw=100*survived], by(pclass, compact note("") col(1)) bar(1, blcolor(gs8) bfcolor(gs14)) blabel(bar, format(%4.1f) pos(base)) subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e)) ytitle(% survived from Titanic, place(e)) var1opts(gap(0)) var2opts(gap(*.2)) outergap(*.2) ysize(5) yla(0(25)100, glcolor(gs14) glw(*.5))
gre catplot13

Stata
1
2
catplot age sex [aw=100*survived], by(pclass, compact note("") col(1) ) bar(1, blcolor(gs8) bfcolor(pink*.2)) blabel(bar, format(%4.1f) pos(base)) subtitle(, pos(9) ring(1) bcolor(none) nobexpand place(e)) ytitle(% survived from Titanic) var1opts(gap(*0.1) axis(noline)) var2opts(gap(*.2)) ysize(5) yla(none) ysc(noline) plotregion(lcolor(none))
gre catplot14


程振兴 @czxa.top