egen functions in Stata

This post introduces a number of egen functions I have collected.

egen and common summary statistics

clear
set obs 10
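/* Generate the sequence 1-10 */
gen A = _n
/* Store the total number of observations */
gen num = _N
/* Mean, median, standard deviation, minimum and maximum of A (the same value is stored in every observation) */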
egen avg = mean(A)
egen med = median(A)
egen std = sd(A)
egen min = min(A)
egen max = max(A)

Result:

. list

+--------------------------------------------+
| A num avg med std min max |
|--------------------------------------------|
1. | 1 10 5.5 5.5 3.02765 1 10 |
2. | 2 10 5.5 5.5 3.02765 1 10 |
3. | 3 10 5.5 5.5 3.02765 1 10 |
4. | 4 10 5.5 5.5 3.02765 1 10 |
5. | 5 10 5.5 5.5 3.02765 1 10 |
|--------------------------------------------|
6. | 6 10 5.5 5.5 3.02765 1 10 |
7. | 7 10 5.5 5.5 3.02765 1 10 |
8. | 8 10 5.5 5.5 3.02765 1 10 |
9. | 9 10 5.5 5.5 3.02765 1 10 |
10. | 10 10 5.5 5.5 3.02765 1 10 |
+--------------------------------------------+
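egen stores one column-level statistic in every observation, while summarize only leaves it in r(). As a quick check (a sketch of my own, not part of the original example), the egen results above can be compared with what summarize, detail returns:

```stata
clear
set obs 10
gen A = _n
egen avg = mean(A)
egen med = median(A)
egen std = sd(A)
egen min = min(A)
egen max = max(A)
* summarize, detail returns the same statistics in r()
quietly summarize A, detail
display "mean = " r(mean) ", median = " r(p50) ", sd = " r(sd)
assert reldif(avg, r(mean)) < 1e-6 & reldif(med, r(p50)) < 1e-6
assert reldif(std, r(sd)) < 1e-6 & min == r(min) & max == r(max)
```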

egen with grouping and sorting

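* Group by parity: even numbers get 0, odd numbers get 1
gen se = mod(A, 2)
* Group by se; A only sets the sort order within groups, so sum is the total of each group
bysort se (A): egen sum = sum(A)
* A is also a grouping variable, so every (se, A) group has one observation and sum1 equals A
bysort se A: egen sum1 = sum(A)
* gen with sum() gives the running (cumulative) sum within each se group
bysort se (A): gen sum2 = sum(A)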
sort se
list A se sum sum1 sum2

Result:

. list A se sum sum1 sum2

+-----------------------------+
| A se sum sum1 sum2 |
|-----------------------------|
1. | 2 0 30 2 2 |
2. | 4 0 30 4 6 |
3. | 6 0 30 6 12 |
4. | 8 0 30 8 20 |
5. | 10 0 30 10 30 |
|-----------------------------|
6. | 1 1 25 1 1 |
7. | 3 1 25 3 4 |
8. | 5 1 25 5 9 |
9. | 7 1 25 7 16 |
10. | 9 1 25 9 25 |
+-----------------------------+
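The difference between egen sum() (a group total repeated in every row) and gen with sum() (a running sum) can be checked directly: the last running sum in each group equals the group total. A minimal sketch that rebuilds the same data; it uses total(), the documented name of egen's sum():

```stata
clear
set obs 10
gen A = _n
gen se = mod(A, 2)
* Group total, repeated in every observation of the group
bysort se (A): egen grp_total = total(A)
* Running sum within each group (data are now sorted by se and A)
by se: gen running = sum(A)
* The last running sum of each group equals the group total
by se: assert running[_N] == grp_total
list A se grp_total running, sepby(se)
```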

egen with seq()

Combined with seq(), egen generates sequences that follow a regular pattern:

clear
set obs 10
gen a1 = _n
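* Generate 1-10
egen a2 = seq()
* Also generates 1-10
range a3 1 _N
* Group by parity
gen se = mod(a1, 2)
* Within each parity group, repeat every number of the sequence twice
bysort se (a1): egen b = seq(), block(2)
* a1 is also a grouping variable, so every group has one observation and b1 is always 1
bysort se a1: egen b1 = seq(), block(2)
* Within each group, c counts down from 3 to 1 and then repeats
bysort se (a1): egen c = seq(), from(3) to(1)
* Within each group, d cycles from 1 to 3
bysort se (a1): egen d = seq(), to(3)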

Result:

. list

+------------------------------------+
| a1 a2 a3 se b b1 c d |
|------------------------------------|
1. | 2 2 2 0 1 1 3 1 |
2. | 4 4 4 0 1 1 2 2 |
3. | 6 6 6 0 2 1 1 3 |
4. | 8 8 8 0 2 1 3 1 |
5. | 10 10 10 0 3 1 2 2 |
|------------------------------------|
6. | 1 1 1 1 1 1 3 1 |
7. | 3 3 3 1 1 1 2 2 |
8. | 5 5 5 1 2 1 1 3 |
9. | 7 7 7 1 2 1 3 1 |
10. | 9 9 9 1 3 1 2 2 |
+------------------------------------+
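Under by, the seq() patterns above can also be written with _n, the observation number within the group. A sketch (the variables with a 2 suffix are illustrative) that rebuilds b, c and d and checks them against seq():

```stata
clear
set obs 10
gen a1 = _n
gen se = mod(a1, 2)
bysort se (a1): egen b = seq(), block(2)
bysort se (a1): egen c = seq(), from(3) to(1)
bysort se (a1): egen d = seq(), to(3)
* Inside by, _n counts 1, 2, 3, ... within each group
by se: gen b2 = ceil(_n / 2)
by se: gen c2 = 3 - mod(_n - 1, 3)
by se: gen d2 = mod(_n - 1, 3) + 1
assert b == b2 & c == c2 & d == d2
```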

egen with fill()

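Combined with fill(), egen can also generate a regular sequence. In some cases fill() and seq() give the same result; the difference is that seq() always moves in steps of 1, whereas with fill() the step is set by the numbers you supply.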

clear
set obs 10
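/* Continue the pattern 1, 3, 5 (step 2) */
egen a = fill(1 3 5)
/* Continue the pattern 1, 2 (step 1) */
egen b = fill(1 2)
/* Start at 6, then 3, i.e. step -3 */
egen c = fill(6(-3)3)
/* Each number twice: 1, 1, 2, 2, 3, 3, ... */
egen d = fill(1 1 2 2)
/* Repeat the block 1, 1, 2 */
egen e = fill(1 1 2 1 1 2)
/* Repeat the block -3, 0, 3, 6 */
egen f = fill(-3(3)6 -3(3)6)
/* Repeat the block 10, 20, 30, 40, 50 */
egen g = fill(10 20 to 50 10 20 to 50)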

Result:

. list

+---------------------------------+
| a b c d e f g |
|---------------------------------|
1. | 1 1 6 1 1 -3 10 |
2. | 3 2 3 1 1 0 20 |
3. | 5 3 0 2 2 3 30 |
4. | 7 4 -3 2 1 6 40 |
5. | 9 5 -6 3 1 -3 50 |
|---------------------------------|
6. | 11 6 -9 3 2 0 10 |
7. | 13 7 -12 4 1 3 20 |
8. | 15 8 -15 4 1 6 30 |
9. | 17 9 -18 5 2 -3 40 |
10. | 19 10 -21 5 1 0 50 |
+---------------------------------+

Note: fill() cannot be combined with by or bysort!
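Because fill() cannot be used with by, it can help to write the same patterns as expressions in _n, which do work under by. A sketch; the formulas are my own equivalents of the fill() calls above, not what fill() does internally, and groupvar in the last comment is a placeholder name:

```stata
clear
set obs 10
egen a = fill(1 3 5)
egen d = fill(1 1 2 2)
egen f = fill(-3(3)6 -3(3)6)
egen g = fill(10 20 to 50 10 20 to 50)
* Equivalent expressions in _n (the observation number)
gen a2 = 1 + 2 * (_n - 1)
gen d2 = ceil(_n / 2)
gen f2 = -3 + 3 * mod(_n - 1, 4)
gen g2 = 10 * (mod(_n - 1, 5) + 1)
assert a == a2 & d == d2 & f == f2 & g == g2
* Unlike fill(), these expressions also work within groups, e.g.
* by groupvar: gen step2 = 1 + 2 * (_n - 1)
```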

egen with diff()

With diff(varlist), the new variable is 0 when all variables in varlist take the same value in an observation, and 1 otherwise.

clear
use "http://www.stata-press.com/data/r15/egenxmpl3", clear
egen differ = diff(inc*)

Result:

. list in 7/11

+-----------------------------------------+
| inc1 inc2 inc3 id differ |
|-----------------------------------------|
7. | 26,290 26,290 26,290 107 0 |
8. | 25,805 25,805 25,805 108 0 |
9. | 43,148 43,148 43,148 109 0 |
10. | 42,491 41,491 41,491 110 1 |
11. | 26,075 25,075 25,075 111 1 |
+-----------------------------------------+
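For numeric variables, diff() amounts to asking whether every variable equals the first one. A self-contained sketch that types in the five observations listed above instead of loading egenxmpl3:

```stata
clear
input inc1 inc2 inc3 id
26290 26290 26290 107
25805 25805 25805 108
43148 43148 43148 109
42491 41491 41491 110
26075 25075 25075 111
end
egen differ = diff(inc*)
* 1 unless all three incomes are equal in the observation
gen byte differ2 = !(inc1 == inc2 & inc1 == inc3)
list, noobs
assert differ == differ2
```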

egen with row functions

clear
use "http://www.stata-press.com/data/r15/egenxmpl4", clear
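* rowtotal(), rowmean() and rowsd() ignore missing values
egen total = rowtotal(a b c)
egen avg = rowmean(a b c)
egen std = rowsd(a b c)
* Number of missing values in the row
egen miss = rowmiss(a b c)
* Row minimum (rowmin() is the current name of the older rmin())
egen min = rowmin(a b c)
* Last nonmissing value in the row (rowlast() is the current name of the older rlast())
egen last = rowlast(a b c)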

Result:

. list

+-----------------------------------------------------------+
| a b c total avg std miss min last |
|-----------------------------------------------------------|
1. | . 2 3 5 2.5 .7071068 1 2 3 |
2. | 4 . 6 10 5 1.414214 1 4 6 |
3. | 7 8 . 15 7.5 .7071068 1 7 8 |
4. | 10 11 12 33 11 1 0 10 12 |
+-----------------------------------------------------------+
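rowmean() can be read as rowtotal() divided by the number of nonmissing values in the row, which rownonmiss() counts. A sketch that types in the four observations listed above and checks this:

```stata
clear
input a b c
. 2 3
4 . 6
7 8 .
10 11 12
end
egen total = rowtotal(a b c)
egen avg = rowmean(a b c)
* Number of nonmissing values in each row
egen n_nonmiss = rownonmiss(a b c)
gen avg2 = total / n_nonmiss
list
assert reldif(avg, avg2) < 1e-6
```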

egen with pc() and pctile()

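To get each observation's share of a variable's column total, as a proportion or a percentage, or the value at a given percentile, we can do the following: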

sysuse auto, clear
keep mpg
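* Column total
egen sum = sum(mpg)
* Share of each observation in the column total, as a proportion
egen per = pc(mpg), prop
* The same share as a percentage (the default)
egen per_1 = pc(mpg)
* 25th percentile of mpg
egen pct = pctile(mpg), p(25)
* Median (p(50) is the default)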
egen pct_1 = pctile(mpg)

Result:

. sort mpg

. list in 1/10

+------------------------------------------------+
| mpg sum per per_1 pct pct_1 |
|------------------------------------------------|
1. | 12 1576 .0076142 .7614213 18 20 |
2. | 12 1576 .0076142 .7614213 18 20 |
3. | 14 1576 .0088832 .8883249 18 20 |
4. | 14 1576 .0088832 .8883249 18 20 |
5. | 14 1576 .0088832 .8883249 18 20 |
|------------------------------------------------|
6. | 14 1576 .0088832 .8883249 18 20 |
7. | 14 1576 .0088832 .8883249 18 20 |
8. | 14 1576 .0088832 .8883249 18 20 |
9. | 15 1576 .0095178 .9517766 18 20 |
10. | 15 1576 .0095178 .9517766 18 20 |
+------------------------------------------------+
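pc(mpg), prop is simply mpg divided by its column total, and pctile(mpg), p(25) should agree with the 25th percentile reported by summarize, detail. A quick check of both in a fresh copy of auto (the comparison is my own, not from the original post):

```stata
sysuse auto, clear
keep mpg
egen double per = pc(mpg), prop
egen pct = pctile(mpg), p(25)
* Proportion computed by hand: each value over the column total
quietly summarize mpg
gen double per2 = mpg / r(sum)
assert reldif(per, per2) < 1e-6
* 25th percentile reported by summarize, detail
quietly summarize mpg, detail
display "egen pctile: " pct[1] "   summarize r(p25): " r(p25)
```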

egen with rank()

Combined with rank(), egen produces several different styles of ranking:

sysuse auto, clear
keep in 16/30
keep mpg
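* Ascending rank without ties (tied values are ordered arbitrarily)
egen rank_u = rank(mpg), unique
* Ascending rank; tied values share their average rank, so ranks may end in .5
egen rank = rank(mpg)
* Descending rank
egen rank_r = rank(-mpg)
* Smallest value ranked 1; tied values share the same rank
egen rank_t = rank(mpg), track
* Largest value ranked 1; tied values share the same rank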
egen rank_f = rank(mpg), field

Result:

. sort rank_u

. list

+------------------------------------------------+
| mpg rank_u rank rank_r rank_t rank_f |
|------------------------------------------------|
1. | 12 1 1.5 14.5 1 14 |
2. | 12 2 1.5 14.5 1 14 |
3. | 14 3 3.5 12.5 3 12 |
4. | 14 4 3.5 12.5 3 12 |
5. | 16 5 5 11 5 11 |
|------------------------------------------------|
6. | 17 6 6 10 6 10 |
7. | 18 7 7 9 7 9 |
8. | 19 8 8 8 8 8 |
9. | 21 9 9 7 9 7 |
10. | 22 10 11 5 10 4 |
|------------------------------------------------|
11. | 22 11 11 5 10 4 |
12. | 22 12 11 5 10 4 |
13. | 24 13 13 3 13 3 |
14. | 28 14 14 2 14 2 |
15. | 30 15 15 1 15 1 |
+------------------------------------------------+
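The default, track and field rankings can all be rebuilt from positions in the sorted data: for each run of tied mpg values, track takes the first position, field counts back from the end of the data, and the default averages the first and last positions. A sketch that reproduces them for the same 15 cars (helper names pos, first and last are my own):

```stata
sysuse auto, clear
keep in 16/30
keep mpg
egen rank = rank(mpg)
egen rank_t = rank(mpg), track
egen rank_f = rank(mpg), field
* Position of each observation after sorting by mpg
sort mpg
gen pos = _n
* First and last position of each run of tied values
by mpg: egen first = min(pos)
by mpg: egen last = max(pos)
gen rank2 = (first + last) / 2
gen rank_t2 = first
gen rank_f2 = _N - last + 1
assert rank == rank2 & rank_t == rank_t2 & rank_f == rank_f2
```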

egen with anyvalue() and anymatch()

sysuse auto, clear
keep rep78
* a equals rep78 when rep78 is 3, 4 or 5, and is missing otherwise
egen a = anyvalue(rep78), v(3/5)
* b equals 1 when rep78 is 3, 4 or 5, and 0 otherwise
egen b = anymatch(rep78), v(3/5)

Result:

. list in 1/10

+---------------+
| rep78 a b |
|---------------|
1. | 3 3 1 |
2. | 3 3 1 |
3. | . . 0 |
4. | 3 3 1 |
5. | 4 4 1 |
|---------------|
6. | 3 3 1 |
7. | . . 0 |
8. | 3 3 1 |
9. | 3 3 1 |
10. | 3 3 1 |
+---------------+
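For a short list of integer values, the same two variables can be built with the built-in inlist() function. A sketch comparing the two approaches:

```stata
sysuse auto, clear
keep rep78
egen a = anyvalue(rep78), v(3/5)
egen b = anymatch(rep78), v(3/5)
* inlist() is 1 when rep78 equals one of the listed values, else 0 (also when rep78 is missing)
gen a2 = rep78 if inlist(rep78, 3, 4, 5)
gen b2 = inlist(rep78, 3, 4, 5)
* In Stata, missing == missing is true, so missing rows compare as equal
assert a == a2 & b == b2
```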

egen with std()

use "http://www.stata-press.com/data/r15/states1", clear
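* Standardize age to mean 0 and standard deviation 1
egen stdage = std(age)
sum age stdage
* The correlation between the two is 1
corr age stdage
* Mean 0 (the default when mean() is not specified) and standard deviation 2
egen newage1 = std(age), std(2)
* Mean 2 and standard deviation 4
egen newage2 = std(age), mean(2) std(4)
* Mean 2 and standard deviation 1 (the default)
egen newage3 = std(age), mean(2)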
sum age new*

Result:

. sum age new*

Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
age | 50 29.54 1.693445 24.2 34.7
newage1 | 50 1.28e-08 2 -6.306671 6.094089
newage2 | 50 2 4 -10.61334 14.18818
newage3 | 50 2 1 -1.153336 5.047044
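std() with mean() and std() is just the usual z-score, shifted and rescaled. A sketch of the same transformation by hand, on auto's mpg rather than the states data:

```stata
sysuse auto, clear
egen double z = std(mpg), mean(2) std(4)
* By hand: target mean + target sd * (x - mean) / sd
quietly summarize mpg
gen double z2 = 2 + 4 * (mpg - r(mean)) / r(sd)
assert reldif(z, z2) < 1e-6
summarize z z2
```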

apport: seat apportionment

[Installation]:

* net install st0265.pkg, from("http://www.stata-journal.com/software/sj12-3/")
* net get st0265.pkg, from("http://www.stata-journal.com/software/sj12-3/")

[Usage]:

cuse uspop, clear
save uspop, replace
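* use also accepts an if qualifier!
use uspop if year == 1790, clear
* Apportion 105 seats among the states in proportion to population, using the Hamilton method
egen ham = apport(pop), method(hamilton) size(105)
* Running (cumulative) sum
gen sumham = sum(ham)
* Total
egen sumham1 = sum(ham)
* Total
egen sumham2 = total(ham)
* The Jefferson method is used by default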
egen jeff = apport(pop), size(105)

Result:

. list

+---------------------------------------------------------------------------------+
| state pop year size ham sumham sumham1 sumham2 jeff |
|---------------------------------------------------------------------------------|
1. | Virginia 630560 1790 105 18 18 105 105 19 |
2. | Massachusetts 475327 1790 105 14 32 105 105 14 |
3. | Pennsylvania 432879 1790 105 13 45 105 105 13 |
4. | North Carolina 353523 1790 105 10 55 105 105 10 |
5. | New York 331589 1790 105 10 65 105 105 10 |
|---------------------------------------------------------------------------------|
6. | Maryland 278514 1790 105 8 73 105 105 8 |
7. | Connecticut 236841 1790 105 7 80 105 105 7 |
8. | South Carolina 206236 1790 105 6 86 105 105 6 |
9. | New Jersey 179570 1790 105 5 91 105 105 5 |
10. | New Hampshire 141822 1790 105 4 95 105 105 4 |
|---------------------------------------------------------------------------------|
11. | Vermont 85533 1790 105 2 97 105 105 2 |
12. | Georgia 70835 1790 105 2 99 105 105 2 |
13. | Kentucky 68705 1790 105 2 101 105 105 2 |
14. | Rhode Island 68446 1790 105 2 103 105 105 2 |
15. | Delaware 55540 1790 105 2 105 105 105 1 |
+---------------------------------------------------------------------------------+

apport() can also be used with the by prefix:

* Apportion the seats among the states separately for each year (bysort sorts by year first)
cuse uspop, clear
bysort year: egen jeff = apport(pop), method(jefferson) size(size)
erase uspop.dta
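The Hamilton method is the largest-remainder rule: give each state the integer part of its exact quota, then hand the leftover seats, one each, to the states with the largest fractional remainders. A sketch of that rule with official commands, typing in the 1790 populations listed above; it ignores ties among remainders and is not the code apport uses internally, but for these data it reproduces the ham column:

```stata
clear
input str15 state double pop
"Virginia"       630560
"Massachusetts"  475327
"Pennsylvania"   432879
"North Carolina" 353523
"New York"       331589
"Maryland"       278514
"Connecticut"    236841
"South Carolina" 206236
"New Jersey"     179570
"New Hampshire"  141822
"Vermont"        85533
"Georgia"        70835
"Kentucky"       68705
"Rhode Island"   68446
"Delaware"       55540
end
local seats 105
* Exact quota of each state
quietly summarize pop
gen double quota = `seats' * pop / r(sum)
* Integer part first
gen ham2 = floor(quota)
gen double remainder = quota - ham2
quietly summarize ham2
local left = `seats' - r(sum)
* One extra seat each for the states with the largest remainders
gsort -remainder
if `left' > 0 {
    replace ham2 = ham2 + 1 in 1/`left'
}
gsort -pop
list state pop ham2
```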

clsort: sort the values of one variable without changing the order of the other variables

[Installation]:

net install _gclsort.pkg, from("http://fmwww.bc.edu/RePEc/bocode/_/")

clear
input ///
varname1 select
20 0
40 1
15 1
10 1
55 0
60 1
end
egen default = clsort(varname1) if select
egen inplace = clsort(varname1) if select, inplace

Result:

. list

+---------------------------------------+
| varname1 select default inplace |
|---------------------------------------|
1. | 20 0 10 . |
2. | 40 1 15 10 |
3. | 15 1 40 15 |
4. | 10 1 60 40 |
5. | 55 0 . . |
|---------------------------------------|
6. | 60 1 . 60 |
+---------------------------------------+
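Judging from the listing, the inplace option writes the sorted selected values back into the selected rows, in their original order. That behavior can be imitated with official commands: sort the selected values, then hand them out, in order, to the selected rows. A sketch using merge on a running count of selected rows (the helper variables obs and j are my own):

```stata
clear
input varname1 select
20 0
40 1
15 1
10 1
55 0
60 1
end
* Remember the original row order
gen long obs = _n
* Selected values in ascending order, numbered 1, 2, 3, ...
preserve
keep if select
sort varname1
gen long j = _n
keep j varname1
rename varname1 inplace2
tempfile vals
save `vals'
restore
* Number the selected rows 1, 2, 3, ... in their original order
gen long j = sum(select) if select
merge m:1 j using `vals', nogenerate
sort obs
list varname1 select inplace2
```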

inequal: inequality measures

[Installation]:

net install egen_inequal.pkg, from("http://fmwww.bc.edu/RePEc/bocode/e/")
cuse fincome, clear

Inequality measures available in the egen_inequal package:

index() argument   Measure
rmd                relative mean deviation
cov                coefficient of variation
sdl                standard deviation of logs
gini               Gini index
mehran             Mehran index
piesch             Piesch index
kakwani            Kakwani index
theil              Theil entropy index
mld                mean log deviation
entropy            generalized entropy measure GE(-1)
half               generalized entropy measure GE(2)

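Example:
Here fincome is household income, fnum is the number of people in each household and provcd is the province code. The code below computes each inequality measure separately for every province.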

egen rmd = inequal(fincome), index(rmd) weight(fnum) by(provcd)
egen cov = inequal(fincome), index(cov) weight(fnum) by(provcd)
egen sdl = inequal(fincome), index(sdl) weight(fnum) by(provcd)
egen mehran = inequal(fincome), index(mehran) weight(fnum) by(provcd)
egen piesch = inequal(fincome), index(piesch) weight(fnum) by(provcd)
egen kakwani = inequal(fincome), index(kakwani) weight(fnum) by(provcd)
egen theil = inequal(fincome), index(theil) weight(fnum) by(provcd)
egen mld = inequal(fincome), index(mld) weight(fnum) by(provcd)
egen entropy = inequal(fincome), index(entropy) weight(fnum) by(provcd)
egen half = inequal(fincome), index(half) weight(fnum) by(provcd)
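To see what one of these indices measures, the unweighted Gini index of n values sorted so that x_1 <= ... <= x_n can be computed as G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n. A sketch on four made-up values; inequal may use a different convention (for example a small-sample correction) and supports weights, which this ignores:

```stata
clear
input double x
1
2
3
4
end
* Sort ascending and number the observations
sort x
gen long i = _n
gen double ix = i * x
quietly summarize x
local n = r(N)
local total = r(sum)
quietly summarize ix
scalar gini = 2 * r(sum) / (`n' * `total') - (`n' + 1) / `n'
display "Gini = " gini
```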

peers: generate a peer mean, i.e. for each observation the mean of the other members of its group, excluding the observation itself

[Installation]:

net install _peers.pkg, from("http://fmwww.bc.edu/RePEc/bocode/_/")
h _gpeers

For example:

sysuse auto, clear
keep weight foreign
egen peers = peers(weight), by(foreign)
gsort foreign

Result:

. list in 50/54

+------------------------------+
| weight foreign peers |
|------------------------------|
50. | 3,200 Domestic 3319.412 |
51. | 3,420 Domestic 3315.098 |
52. | 2,690 Domestic 3329.412 |
53. | 2,830 Foreign 2291.429 |
54. | 2,070 Foreign 2327.619 |
+------------------------------+
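Since the peer mean leaves the observation itself out, it equals (group total minus own value) / (group size minus 1). A sketch of the same calculation with official egen functions; it should reproduce the peers column above, e.g. for the domestic car weighing 3,200 lbs and the foreign car weighing 2,830 lbs:

```stata
sysuse auto, clear
keep weight foreign
* Group total and number of nonmissing values of weight
bysort foreign: egen double tot = total(weight)
by foreign: egen cnt = count(weight)
* Mean of all other members of the group (leave-one-out mean)
gen double peer2 = (tot - weight) / (cnt - 1)
```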

rndraw: random number generation

Draws random numbers from the GB2 (generalized beta of the second kind), Singh-Maddala, Dagum, Fisk and Pareto distributions.
[Installation]:

net install _grndraw.pkg, from("http://fmwww.bc.edu/RePEc/bocode/_/")
h _grndraw

clear
set obs 1000
/* Singh-Maddala distribution */
egen double ysm = rndraw() , sm(5 100 1.2)
/* Generalized beta distribution of the second kind (GB2) */
egen double ygb2 = rndraw() , gb2(5 100 0.8 1.2)
/* Pareto distribution */
egen double ypareto = rndraw() , pareto(100 2.5)
tw ///
kdensity ysm || ///
kdensity ygb2 || ///
kdensity ypareto, range(0 500)
gre rndraw
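For the Pareto case, draws can also be produced by inverting the CDF with runiform(). A sketch assuming pareto(100 2.5) means a minimum (scale) of 100 and a shape of 2.5; that reading of rndraw's options is my assumption, so check h _grndraw before comparing the two:

```stata
clear
set obs 1000
set seed 12345
* Inverse CDF of Pareto(x_m = 100, alpha = 2.5): x = x_m * u^(-1/alpha), u uniform on (0, 1]
gen double ypar = 100 * (1 - runiform())^(-1 / 2.5)
summarize ypar, detail
```

Under these assumptions every draw is at least 100, and the sample median should be close to 100 * 2^(1/2.5), roughly 132.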

expgen: expand the dataset according to a variable

This is not an egen function, but it looks similar, so I include it here as well.
[Installation]:

net install expgen.pkg, from("http://fmwww.bc.edu/RePEc/bocode/e/")

Usage:

sysuse auto, clear
keep in 1/3
keep make price rep78 foreign
sort foreign make

. list

+----------------------------------------+
| make price rep78 foreign |
|----------------------------------------|
1. | AMC Concord 4,099 3 Domestic |
2. | AMC Pacer 4,749 3 Domestic |
3. | AMC Spirit 3,799 . Domestic |
+----------------------------------------+

. expgen nreps=rep78, copy(repseq) sortedby(unique) order

. list

+---------------------------------------------------------+
| foreign make repseq price rep78 nreps |
|---------------------------------------------------------|
1. | Domestic AMC Concord 1 4,099 3 3 |
2. | Domestic AMC Concord 2 4,099 3 3 |
3. | Domestic AMC Concord 3 4,099 3 3 |
4. | Domestic AMC Pacer 1 4,749 3 3 |
5. | Domestic AMC Pacer 2 4,749 3 3 |
|---------------------------------------------------------|
6. | Domestic AMC Pacer 3 4,749 3 3 |
+---------------------------------------------------------+
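With official commands, roughly the same result comes from expand plus a within-group counter. A sketch; expand treats a missing count as 1 and keeps the observation, whereas expgen dropped AMC Spirit above, so the sketch drops missing counts explicitly first:

```stata
sysuse auto, clear
keep in 1/3
keep make price rep78 foreign
sort foreign make
gen nreps = rep78
* expand would keep observations with a missing count once, so drop them first
drop if missing(nreps)
expand nreps
* Copy number within each original observation
bysort foreign make: gen repseq = _n
list foreign make repseq price rep78 nreps
```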

pct9010: an example of a user-written egen function

This function computes the distance between the 90th and 10th percentiles. It also works with if and in.

prog _gpct9010
    version 14.0
    * egen calls _gpct9010 with: type newvar = (exp) [if] [in] [, options]
    syntax newvarname = /exp [if] [in] [, *]
    tempvar touse p90 p10
    mark `touse' `if' `in'
    qui {
        * Pass any options (such as by()) on to the official pctile() function
        egen double `p90' = pctile(`exp') if `touse', `options' p(90)
        egen double `p10' = pctile(`exp') if `touse', `options' p(10)
        gen `typlist' `varlist' = `p90' - `p10' if `touse'
    }
end

pctrange: another user-written egen function

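Computes the difference between two chosen percentiles (by default the 75th and 25th, i.e. the interquartile range).
It can be used with the by prefix and with if and in.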

prog _gpctrange
    version 14
    * lo() and hi() default to the 25th and 75th percentiles (the interquartile range)
    syntax newvarname = /exp [if] [in] [, LO(integer 25) HI(integer 75) *]
    if `hi' > 99 | `lo' < 1 {
        di as error ///
            "percentiles `hi' and `lo' must lie between 1 and 99"
        error 198
    }
    if `hi' <= `lo' {
        di as error ///
            "hi(`hi') must be greater than lo(`lo')"
        error 198
    }
    tempvar touse phi plo
    mark `touse' `if' `in'
    qui {
        egen double `phi' = pctile(`exp') if `touse', `options' p(`hi')
        egen double `plo' = pctile(`exp') if `touse', `options' p(`lo')
        gen `typlist' `varlist' = `phi' - `plo' if `touse'
    }
end

Usage example:

sysuse auto, clear
keep in 1/10
keep price rep78
bysort rep78: egen iqr = pctrange(price) if inrange(rep78, 3, 5)
bysort rep78: egen p8020 = pctrange(price) if inrange(rep78, 3, 5), hi(80) lo(20)

Result:

. list

+-------------------------------+
| price rep78 iqr p8020 |
|-------------------------------|
1. | 5,788 3 1689 1689 |
2. | 10,372 3 1689 1689 |
3. | 5,189 3 1689 1689 |
4. | 4,082 3 1689 1689 |
5. | 4,816 3 1689 1689 |
|-------------------------------|
6. | 4,749 3 1689 1689 |
7. | 4,099 3 1689 1689 |
8. | 7,827 4 0 0 |
9. | 4,453 . . . |
10. | 3,799 . . . |
+-------------------------------+

concat: join variable values into a string

Usage:

cuse cb5a, clear
egen rowid = concat(state year)

Result:

. list

+-----------------------------------------------+
| state year cand1 cand2 cand3 rowid |
|-----------------------------------------------|
1. | TX 2001 Tom Dick Harry TX2001 |
2. | TX 2005 Dick Jane Harry TX2005 |
3. | MA 2002 John Jim Jack MA2002 |
4. | MA 2003 Jim Jill Joan MA2003 |
5. | MA 2005 John Jill Jim MA2005 |
+-----------------------------------------------+

The new identifier can then serve as i() when reshaping from wide to long:

reshape long cand, i(rowid) j(candnr)

Result:

. list

+----------------------------------------+
| rowid candnr state year cand |
|----------------------------------------|
1. | MA2002 1 MA 2002 John |
2. | MA2002 2 MA 2002 Jim |
3. | MA2002 3 MA 2002 Jack |
4. | MA2003 1 MA 2003 Jim |
5. | MA2003 2 MA 2003 Jill |
|----------------------------------------|
6. | MA2003 3 MA 2003 Joan |
7. | MA2005 1 MA 2005 John |
8. | MA2005 2 MA 2005 Jill |
9. | MA2005 3 MA 2005 Jim |
10. | TX2001 1 TX 2001 Tom |
|----------------------------------------|
11. | TX2001 2 TX 2001 Dick |
12. | TX2001 3 TX 2001 Harry |
13. | TX2005 1 TX 2005 Dick |
14. | TX2005 2 TX 2005 Jane |
15. | TX2005 3 TX 2005 Harry |
+----------------------------------------+
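When the joined pieces could run together ambiguously, concat() accepts a punct() option that inserts a separator; for a numeric variable the same key can also be built by hand with string(). A small sketch with a few rows typed in from the listing above:

```stata
clear
input str2 state int year
"TX" 2001
"TX" 2005
"MA" 2002
end
* punct() inserts a separator between the concatenated values
egen rowid = concat(state year), punct(_)
* The same key built by hand from a string and a number
gen rowid2 = state + "_" + string(year)
list
assert rowid == rowid2
```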

group: generate a grouping variable

fillin adds observations with missing data so that all interactions of varlist exist, thus making a complete rectangularization of varlist. fillin also adds the variable _fillin to the dataset. _fillin is 1 for observations created by using fillin and 0 for previously existing observations.

sysuse auto, clear
fillin rep78 foreign
tw sc price mpg, by(foreign rep78, cols(5) compact)
gre 20180917a1

sysuse auto, clear
fillin rep78 foreign
tw sc price mpg, by(foreign rep78, cols(5) compact)
gre 20180917a1

egen group = group(foreign rep78)
label define group ///
1 "Poor" ///
2 "Fair" ///
3 "Average" ///
4 "Good" ///
5 "Excellent" ///
6 " " ///
7 " " ///
8 " " ///
9 " " ///
10 " "

label values group group

tw sc price mpg, by(group, cols(5) ///
r1title("Car type", orientation(rvertical) ///
size(medsmall)) ///
t1title("Repair record", size(medsmall)) ///
note("") compact) xti("Mileage (mpg)") yti("Price")
gre 20180917a2
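If hand-written value labels are not needed, egen group() has a label option that labels each group with the values of the grouping variables. A quick sketch without fillin (the variable name grp is illustrative); in auto, foreign cars have no rep78 values of 1 or 2, so only 8 combinations exist:

```stata
sysuse auto, clear
* One group per observed combination of foreign and rep78; missing rep78 gets no group
egen grp = group(foreign rep78), label
tabulate grp
```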

Using split, reshape, levelsof and egen together

clear all
input byte id str15 v
1 "16,23"
2 "1,5,42"
3 "34,38,44,51,6,7"
end
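* Goal: for each number, create a 0/1 variable that says whether the number appears in v
* Wrong approach: strpos() matches substrings, so "1" is also found inside "16" and "51"
forv s = 1/51 {
gen v_`s' = strpos(v, "`s'") > 0
}
* Correct approach: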
clear all
input byte id str15 v
1 "16,23"
2 "1,5,42"
3 "34,38,44,51,6,7"
end
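* Split v at the commas into numeric variables number1, number2, ...
split v, gen(number) destring parse(",")
* One row per (id, number)
reshape long number, i(id) j(_j)
* Every distinct number that occurs in the data
levelsof number, local(options)
* v_i = 1 if any row of this id holds the number i
foreach i of local options {
egen byte v_`i' = max(number == `i'), by(id)
}
* Back to one row per id
drop _j number
by id, sort: keep if _n == 1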
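The substring problem can also be avoided without reshaping: wrap v in commas and search for the number with a comma on both sides, so that ",1," cannot match inside ",16,". A sketch of that alternative; unlike the reshape approach, it creates a variable for every number from 1 to 51, not only for the numbers that occur:

```stata
clear all
input byte id str15 v
1 "16,23"
2 "1,5,42"
3 "34,38,44,51,6,7"
end
* Search ",v," for ",s," so that only whole numbers match
forvalues s = 1/51 {
    gen byte v_`s' = strpos("," + v + ",", ",`s',") > 0
}
list id v v_1 v_5 v_6 v_16 v_51
```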

egen mlabvpos(): set marker-label positions to reduce overlapping labels

vguse allstates, clear
egen clock = mlabvpos(ownhome propval100)
tw sc ownhome propval100, ///
mla(stateab) ///
mlabsize(small) ///
mlabvpos(clock)
gr export sc3.png, replace

wtmean: weighted mean

[Installation]:

net install _gwtmean.pkg, from("http://fmwww.bc.edu/RePEc/bocode/_/")

sysuse auto, clear
keep rep78 price
egen wtmean = wtmean(price), weight(rep78)
* The result above is the same as the weighted mean from summarize below
sum price [w=rep78]
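The weighted mean is sum(w * x) / sum(w). A sketch of the same number with official egen functions only; observations with missing rep78 drop out of both sums because total() treats missing values as 0:

```stata
sysuse auto, clear
keep rep78 price
* Weighted mean = sum(w * x) / sum(w) over observations with a nonmissing weight
egen double num = total(price * rep78)
egen double den = total(rep78)
gen double wm = num / den
quietly summarize price [aweight = rep78]
display "by hand: " wm[1] "   summarize: " r(mean)
```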

程振兴 @czxa.top