Stata最优化与马科维茨有效前沿

Stata最优化与马科维茨有效前沿

在我之前的两篇推文中,我用 Stata 和 Python 分别实现了马科维茨有效前沿的拟合:马科维茨有效前沿实现(Stata 版本)马科维茨有效前沿实现(Python 版本)。但是用 Stata 实现的不完美,今天终于解决了这个问题——如何在 Stata 中实现最优化。

获取股票数据

首先复习一下前面的代码,这一次我使用的是:银泰资源、大悦城、中国天楹和通化金马四只股票。时间范围是2019年1月1日到今天8月14日。
首先获取四只股票的数据,这里我使用的是我的finance命令包里面的cntrae2命令,该命令包的使用可以参考:FINANCE:Stata 中的金融工具包,这个命令修改自爬虫俱乐部的cntrade命令(主要是修改了变量的命名):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
clear all
cd "Stata马科维茨有效前沿的实现"
* 使用上证50四根票进行组合,首先获取四只股票的交易数据
foreach i in "000975" "000031" "000035" "000766"{
cntrade2 `i', start(20190101) end(20190814)
save `i', replace
}

use 000975, clear
keep close date
ren close 银泰资源
save a000975, replace

use 000031, clear
keep close date
ren close 大悦城
save a000031, replace

use 000035, clear
keep close date
ren close 中国天楹
save a000035, replace

use 000766, clear
keep close date
ren close 通化金马
save a000766, replace

横向合并四个数据集:

1
2
3
4
5
6
use a000975, clear
foreach i in "000031" "000035" "000766"{
merge 1:1 date using a`i'
drop _m
}
save rawdata, replace

观测股价走势

把期初的股价标准化为100,观测四只股票的股价走势关系:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
use rawdata, clear
/* 股价标准化(2019年1月2日的股价设定为100) */
foreach i of varlist _all{
if "`i'" != "date"{
replace `i' = (`i' / `=`i'[1]') * 100
format `i' %6.2f
}
}
colorscheme 4, palette(Dark2)
tw ///
line 银泰资源 大悦城 中国天楹 通化金马 date, ///
xla(21550(45)21775, ang(20)) leg(order(1 "银泰资源" 2 "大悦城" ///
3 "中国天楹" 4 "通化金马") pos(1) ring(0)) ///
lp(solid solid solid solid) lc(`r(colors)')

接下来计算每只股票的对数收益率,这里需要注意由于股票交易日之间可能存在间隔,所以不能直接用日期作为时间变量:

计算对数收益率

对数收益率的计算公式:

$$R_{n} = \log{P_{n}} - \log{P_{n-1}}$$

其中$P_{n}$表示第 $n$ 天的股价。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
/* 计算对数收益率 */
use rawdata, clear
* 由于股票交易日之间可能存在间隔,所以不能直接用日期作为时间变量
gen dateid = _n
tsset dateid
foreach i of varlist _all{
if "`i'" != "date" & "`i'" != "dateid"{
gen l`i' = l.`i'
replace `i' = log(`i'/l`i')
drop l`i'
format `i' %6.4f
}
}
drop dateid
save returndata, replace

观测一下股票对数收益率的分布:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
use returndata, clear
foreach i of varlist _all{
if "`i'" != "date"{
ren `i' returns`i'
}
}

reshape long returns, i(date) j(name) string
hist returns, by(name, r note("")) bin(50) ///
xti(日对数收益率) yti(频率) ///
xla(, format(%6.2f)) norm

use returndata, clear
drop in 1

构建投资组合

最大化夏普比率

夏普比率代表交易者每多承担一份风险,相应的就可以拿到几份超额报酬。若其为正,代表回报率高过波动风险;若其为负值,代表操作风险大过回报率。这样,每个投资组合都可以计算夏普比率,即投资回报与多冒风险的比例,这个比率越高对应的投资组合越好。

$$Sharpe-ratio = \frac{R_{p} - R_{f}}{\sigma_{p}}$$

其中,$R_{p}$表示投资组合的组合收益率,$R_{f}$ 表示无风险利率,在下文中假设无风险收益率为 $4\%$,$\sigma_{p}$ 为组合收益的标准差,代表着组合的收益波动率。

假设组合中有$n$种证券,证券$i$的权重为$w_{i}$,收益率为$r_{i}$,那么该组合收益率的标准差为:$\sqrt{w’Vw}$,其中$w$为权重向量,$V$为收益率的协方差矩阵。下面就可以据此编写最大化夏普比率的程序了。我们现在有四只股票,也就是需要确定四只股票的权重,由于权重和要求为1,所以实际上只要确定三个权重参数就好了:

首先计算每只股票对数收益率的均值,然后存储在矩阵 returns 中(其实是个一维的行向量):

1
2
3
4
5
6
7
8
9
10
11
12
13
use returndata, clear
drop in 1

local j = 1
foreach i of varlist _all{
if "`i'" != "date"{
qui sum `i'
local r`j' = r(mean)
local j = `j' + 1
}
}
di "`r1', `r2', `r3', `r4'"
mat returns = (`r1', `r2', `r3', `r4')

然后计算协方差矩阵,并将协方差矩阵存储在矩阵variance中:

1
2
corr 银泰资源 大悦城 中国天楹 通化金马, cov
mat variance = r(C)

接下来就是使用 Stata 进行最优化的,为了便于大家理解,我先举一个 Stata 帮助文档里面的例子。假如我想寻找 $x$ 最大化 $y = e^{-x^2 + x - 3}$,代码如下:

1
2
3
4
5
6
7
8
9
10
11
mata:
void myeval(todo, x, y, g, H)
{
y = exp(-x^2 + x - 3)
}
S = optimize_init()
optimize_init_evaluator(S, &myeval())
optimize_init_params(S, 0)
x = optimize(S)
x
end

mata: 表示进入 Mata 语言的环境,然后是一个 myeval 函数,这个函数名前面有个 void,表示这个函数是无返回值的。函数有五个参数——$todo, x, y, g, H$,我们这里只需要用到 $x$ 和 $y$,剩下三个参数的含义可以自行阅读帮助文档(help optimize)。

也就是说,待调优的参数是放在函数的第二个参数上;函数的返回值是放在第三个参数上。

S = optimize_init() 表示初始化,optimize_init_evaluator(S, &myeval()) 设定要最优化的函数;optimize_init_params(S, 0)是设定初始值,这里是设定初始的$x = 0$,然后默认使用的最优化方法是Newton–Raphson方法。最后执行x = optimize(S)进行最优化,$x$就是我们需要的结果。

于是就有了下面的代码,首先我们允许卖空,就是说权重参数可以为负数:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
/* 允许卖空 */
mata:
mata clear
void sharpe_ratio(real scalar todo, real vector w123, sharpe, g, H)
{
real scalar w4
w4 = 1 - sum(w123)
real vector rmean
rmean = st_matrix("returns")
real scalar rp
real vector w
w = (w123, w4)
rp = w * rmean'
rp = rp * 252
real matrix variancep
variancep = st_matrix("variance")
variancep = w * variancep * w'
variancep = variancep * 252
real scalar std
std = sqrt(variancep[1, 1])
sharpe = (rp - 0.04) / std
}

S = optimize_init()
optimize_init_evaluator(S, &sharpe_ratio())
optimize_init_technique(S, "nr")
optimize_init_params(S, J(1, 3, 0.25))
wh = optimize(S)
wh

w = (wh, 1-sum(wh))
rp_yearly = (wh, 1-sum(wh)) * st_matrix("returns")' * 252
variancep = (w * st_matrix("variance") * w') * 252
std = sqrt(variancep[1, 1])
sharpe = (rp_yearly - 0.04) / std
"权重分别为:"
w

"此时的组合收益为:"
rp_yearly

"此时的组合标准差为:"
std

"此时组合的夏普比率为:"
sharpe
end

这里我的$w123$ 就是类似于简单示例中的 $x$,是个三维的行向量。上面代码的运行结果是:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
Iteration 0:   f(p) =  .40753973  (not concave)
Iteration 1: f(p) = 1.7935652 (not concave)
Iteration 2: f(p) = 2.4598186
Iteration 3: f(p) = 2.7009405
Iteration 4: f(p) = 2.7899491
Iteration 5: f(p) = 2.7955035
Iteration 6: f(p) = 2.7955184
Iteration 7: f(p) = 2.7955184

权重分别为:
1 2 3 4
+-------------------------------------------------------------+
1 | .8653939127 1.508855624 -.5683459377 -.8059035992 |
+-------------------------------------------------------------+

此时的组合收益为:
1.764668396
此时的组合标准差为:
.6169404465
此时组合的夏普比率为:
2.795518443

这就是允许卖空时夏普比率最大的组合。

再考虑不允许卖空的情形,为了保证权重为正且总和为1,我借用了 $e^x$函数的特点:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
/* 不允许卖空 */
use returndata, clear
drop in 1

/* 计算每只股票对数收益率的均值 */
local j = 1
foreach i of varlist _all{
if "`i'" != "date"{
qui sum `i'
local r`j' = r(mean)
local j = `j' + 1
}
}
di "`r1', `r2', `r3', `r4'"
mat returns = (`r1', `r2', `r3', `r4')


/* 计算对数收益率的协方差矩阵 */
corr 银泰资源 大悦城 中国天楹 通化金马, cov
mat variance = r(C)


mata:
mata clear
void sharpe_ratio_no_short(real scalar todo, real vector w1234, sharpe, g, H)
{

real vector rmean
rmean = st_matrix("returns")
real scalar rp
real vector w
w = (exp(w1234[1]), exp(w1234[2]), exp(w1234[3]), exp(w1234[4])) / sum((exp(w1234[1]), exp(w1234[2]), exp(w1234[3]), exp(w1234[4])))
rp = w * rmean'
rp = rp * 252
real matrix variancep
variancep = st_matrix("variance")
variancep = w * variancep * w'
variancep = variancep * 252
real scalar std
std = sqrt(variancep[1, 1])
sharpe = (rp - 0.04) / std
}

S = optimize_init()
optimize_init_evaluator(S, &sharpe_ratio_no_short())
optimize_init_technique(S, "nr")
optimize_init_params(S, J(1, 4, 0.01))
wh = optimize(S)
wh

w = (exp(wh[1]), exp(wh[2]), exp(wh[3]), exp(wh[4])) / sum((exp(wh[1]), exp(wh[2]), exp(wh[3]), exp(wh[4])))
rp_yearly = ((exp(wh[1]), exp(wh[2]), exp(wh[3]), exp(wh[4])) / sum((exp(wh[1]), exp(wh[2]), exp(wh[3]), exp(wh[4])))) * st_matrix("returns")' * 252
variancep = (w * st_matrix("variance") * w') * 252
std = sqrt(variancep[1, 1])
sharpe = (rp_yearly - 0.04) / std
"权重分别为:"
w

"此时的组合收益为:"
rp_yearly

"此时的组合标准差为:"
std

"此时组合的夏普比率为:"
sharpe
end
`

运行结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
Iteration 0:   f(p) =  .40753973  (not concave)
Iteration 1: f(p) = 1.515826 (not concave)
Iteration 2: f(p) = 1.618913 (not concave)
Iteration 3: f(p) = 1.7170518 (not concave)
Iteration 4: f(p) = 1.7523951 (not concave)
Iteration 5: f(p) = 1.7614576 (not concave)
Iteration 6: f(p) = 1.7657154 (not concave)
Iteration 7: f(p) = 1.7662434 (not concave)
Iteration 8: f(p) = 1.7663534 (not concave)
Iteration 9: f(p) = 1.7663884 (not concave)
Iteration 10: f(p) = 1.7663981
Iteration 11: f(p) = 1.7664021 (not concave)
Iteration 12: f(p) = 1.7664028
Iteration 13: f(p) = 1.766403 (not concave)
Iteration 14: f(p) = 1.766403 (not concave)
Iteration 15: f(p) = 1.766403 (not concave)
numerical derivatives are approximate
nearby values are missing
Iteration 16: f(p) = 1.766403 (not concave)
numerical derivatives are approximate
nearby values are missing
numerical derivatives are approximate
nearby values are missing
Iteration 17: f(p) = 1.766403
权重分别为:
1 2 3 4
+---------------------------------------------------------+
1 | .5094358622 .4905641378 1.72079e-16 6.59778e-14 |
+---------------------------------------------------------+
此时的组合收益为:
.6179797997
此时的组合标准差为:
.3272072048
此时组合的夏普比率为:
1.766403035

夏普比率函数

我还写了个可以计算夏普比率的函数:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
/* 夏普比率函数 */
use returndata, clear
drop in 1

/* 计算每只股票对数收益率的均值 */
local j = 1
foreach i of varlist _all{
if "`i'" != "date"{
qui sum `i'
local r`j' = r(mean)
local j = `j' + 1
}
}
di "`r1', `r2', `r3', `r4'"
mat returns = (`r1', `r2', `r3', `r4')


/* 计算对数收益率的协方差矩阵 */
corr 银泰资源 大悦城 中国天楹 通化金马, cov
mat variance = r(C)
mata:
mata clear
real scalar sharpe_ratio(real scalar w1, real scalar w2, real scalar w3)
{
real scalar w4
w4 = 1 - w1 - w2 - w3
real colvector w
w = (w1 \ w2 \ w3 \ w4)
real matrix r
real colvector rmean
rmean = st_matrix("returns")
real scalar rp
rp = w' * rmean'
rp = rp * 252
real matrix variancep
variancep = st_matrix("variance")
variancep = w' * variancep * w
variancep = variancep * 252
real scalar std
std = sqrt(variancep[1, 1])
real scalar sharpe
sharpe = (rp - 0.04) / std
return(sharpe)
}
end

mata: sharpe_ratio(0.5, 0.5, 0)

结果:

1
2
. mata: sharpe_ratio(0.5, 0.5, 0)
1.766177785

和上面最优化运行得到的结果是一致的,也就是说,在不允许卖空的时候,夏普比率最大的组合是买入一半银泰资源和一半的大悦城,剩下两只股票不买入。

下面我们再用蒙特卡洛模拟观测不允许卖空时,所有可能的组合的收益和标准差的关系:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
cap prog drop front
prog def front, rclass
version 15.1
use returndata, clear
local w1 = runiform()
local w2 = runiform()
local w3 = runiform()
local w4 = runiform()
mat weight = (`w1' \ `w2' \ `w3' \ `w4')
mat list weight
mat weight = weight / (`w1'+`w2'+`w3'+`w4')
mat list weight
ret scalar w1 = weight[1, 1]
ret scalar w2 = weight[2, 1]
ret scalar w3 = weight[3, 1]
ret scalar w4 = weight[4, 1]
local j = 1
foreach i of varlist _all{
if "`i'" != "date"{
qui sum `i'
local r`j' = r(mean)
local j = `j' + 1
}
}
mat returns = (`r1', `r2', `r3', `r4')
mat list returns
mat a = returns * weight * 252
mat list a
ret scalar ret = a[1, 1]
corr 银泰资源 大悦城 中国天楹 通化金马, cov
ret list
mat variance = r(C)
mat list variance
mat b = weight' * variance * 252 * weight
mat list b
ret scalar var = b[1, 1]
ret scalar std = sqrt(b[1, 1])
end

simulate ret = r(ret) var = r(var) std = r(std) w1 = r(w1) w2 = r(w2) w3 = r(w3) w4 = r(w4), reps(100000): front
* 计算夏普比率
gen sharpe_ratio = (ret - 0.04)/std
save montecarlo, replace

绘图展示:

1
2
3
4
5
6
7
8
use montecarlo, clear
tw ///
sc ret std, msize(*0.01) xti(波动率) yti(收益率) msymbol(o) ///
xla(#6, format(%6.2f)) yla(, format(%6.1f)) ///
ti("图:投资组合收益率与波动率的关系") || ///
scatteri 1.764668396 0.6169404465 "允许卖空", mc("red") msymbol(o) || ///
scatteri 0.6179797997 0.3272072048 "不允许卖空", mc("green") msymbol(o) ///
mlabp(2)

显然,卖空即提高了收益又增加了风险。

最小化方差

最小化方差也就是最小化标准差,显然就是上图中最右边的那个点了。下面用程序寻找这个点对于的组合构建方法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
use returndata, clear
drop in 1

/* 计算每只股票对数收益率的均值 */
local j = 1
foreach i of varlist _all{
if "`i'" != "date"{
qui sum `i'
local r`j' = r(mean)
local j = `j' + 1
}
}
di "`r1', `r2', `r3', `r4'"
mat returns = (`r1', `r2', `r3', `r4')


/* 计算对数收益率的协方差矩阵 */
corr 银泰资源 大悦城 中国天楹 通化金马, cov
mat variance = r(C)

/* 允许卖空 */
mata:
mata clear
void min_std(real scalar todo, real vector w123, std, g, H)
{
real scalar w4
w4 = 1 - sum(w123)
real vector rmean
rmean = st_matrix("returns")
real scalar rp
real vector w
w = (w123, w4)
rp = w * rmean'
rp = rp * 252
real matrix variancep
variancep = st_matrix("variance")
variancep = w * variancep * w'
variancep = variancep * 252
std = -sqrt(variancep[1, 1])
}

S = optimize_init()
optimize_init_evaluator(S, &min_std())
optimize_init_technique(S, "nr")
optimize_init_params(S, J(1, 3, 0.25))
wh = optimize(S)
wh

w = (wh, 1-sum(wh))
rp_yearly = (wh, 1-sum(wh)) * st_matrix("returns")' * 252
variancep = (w * st_matrix("variance") * w') * 252
std = sqrt(variancep[1, 1])
sharpe = (rp_yearly - 0.04) / std
"权重分别为:"
w

"此时的组合收益为:"
rp_yearly

"此时的组合标准差为:"
std

"此时组合的夏普比率为:"
sharpe
end

注意上面的代码里面,我在 std = -sqrt(variancep[1, 1]) 里面加了个负号,这样最大化 std 的时候就最小化标准差了。

运行结果:

1
2
3
4
5
6
7
8
9
10
11
12
Iteration 0:   f(p) = -.33170283  
Iteration 1: f(p) = -.30015054
Iteration 2: f(p) = -.29815559
Iteration 3: f(p) = -.29815517
Iteration 4: f(p) = -.29815517
权重分别为:
此时的组合收益为:
.4428133874
此时的组合标准差为:
.2981551745
此时组合的夏普比率为:
1.351019274

这个点在这里:

1
2
3
4
5
6
7
8
9
10
11
use montecarlo, clear
tw ///
sc ret std, msize(*0.01) xti(波动率) yti(收益率) msymbol(o) ///
xla(#6, format(%6.2f)) yla(, format(%6.1f)) ///
ti("图:投资组合收益率与波动率的关系") || ///
scatteri 1.764668396 0.6169404465 "允许卖空", ///
mc("red") msymbol(o) mlabc("red") || ///
scatteri 0.6179797997 0.3272072048 "不允许卖空", ///
mc("green") msymbol(o) mlabc("green") mlabp(12) || ///
scatteri 0.4428133874 0.2981551745 "最小方差点", ///
mc("pink") msymbol(o) mlabc("pink")

构造有效前沿

有效前沿是一条线,其上的点表示既定收益率下方差最小的投资组合。显然就是最小方差点到最大夏普比率点的边界线。从图中可以看出从最小方差到允许卖空的最大夏普比率点的收益率范围为:[0.44, 1.77],以 0.01 为间隔,计算每个收益率水平上的最小方差。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
use returndata, clear
drop in 1
local j = 1
foreach i of varlist _all{
if "`i'" != "date"{
qui sum `i'
local r`j' = r(mean)
local j = `j' + 1
}
}
di "`r1', `r2', `r3', `r4'"
mat returns = (`r1', `r2', `r3', `r4')
corr 银泰资源 大悦城 中国天楹 通化金马, cov
mat variance = r(C)
/* 允许卖空 */
mata:
mata clear
void min_std(real scalar todo, real vector w1234, std, g, H)
{
real vector rmean
rmean = st_matrix("returns")
real scalar rp
real vector w
w = w1234
rp = w * rmean'
rp = rp * 252
real matrix variancep
variancep = st_matrix("variance")
variancep = w * variancep * w'
variancep = variancep * 252
std = -sqrt(variancep[1, 1])
}
std_vector = J(1, 134, 0)
j = 1
for(i = 0.44; i < 1.78; i = i + 0.01)
{
S = optimize_init()
optimize_init_evaluator(S, &min_std())
optimize_init_technique(S, "nr")
optimize_init_constraints(S, ((st_matrix("returns") * 252 \ J(1, 4, 1)), (i \ 1)))
optimize_init_params(S, J(1, 4, 0.25))
wh = optimize(S)
wh

w = wh
variancep = (w * st_matrix("variance") * w') * 252
std = sqrt(variancep[1, 1])
std_vector[1, j] = std
j++
}
st_matrix("std_vector", std_vector)
end
clear
set obs 134
gen ret = (_n + 43) / 100
gen std = .
forval i = 1/134{
replace std = std_vector[1, `i'] in `i'
}
gen front = 1
save front, replace

同样我们把刚刚得到的134个点绘图展示:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
use montecarlo, clear
append using front
tw ///
sc ret std, msize(*0.01) xti(波动率) yti(收益率) msymbol(o) ///
xla(#6, format(%6.2f)) yla(, format(%6.1f)) ///
ti("图:投资组合收益率与波动率的关系") || ///
scatteri 1.764668396 0.6169404465 "允许卖空", ///
mc("red") msymbol(o) mlabc("red") || ///
scatteri 0.6179797997 0.3272072048 "不允许卖空", ///
mc("green") msymbol(o) mlabc("green") mlabp(2) || ///
scatteri 0.4428133874 0.2981551745 "最小方差点", ///
mc("pink") msymbol(o) mlabc("pink") || ///
sc ret std if front == 1, msize(tiny) msymbol(D) ///
mc("dkorange") leg(off)

买股票至少也得买在有效前沿上呀!





点击下载本文的相关代码


封面图:unsplash-logoKeytion

# Stata

评论

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×