使用R构造有效前沿

使用R构造有效前沿

本文是学习A Gentle Introduction to Finance using R: Efficient Frontier and CAPM – Part 1的笔记。原文中作者讲述了如何使用R构造资产组合的有效前沿和CAPM模型。

准备

首先准备好本文用到的数据(如果你无法从雅虎金融上下载数据可能会使用到):
fin_data.csv
mult_assets.csv

作为示例,作者搜集了道琼斯工业指数中的三只有名的股票:IBM(IBM)、Google/Alphabet(GOOG)和JP Morgan(JPM)。

股价走势

首先观察三只股票的股价走势:

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
setwd("~/Desktop/使用R构造有效前沿和CAPM")
library(data.table)
library(scales)
library(ggplot2)
library(quantmod)
library(gridExtra)
library(magrittr)
theme_set(theme_bw(base_size = 15,
base_family = "STSong") +
theme(plot.title = element_text(hjust = 0.5, size = 20)))

# 从雅虎金融下载数据的函数
getData <- function(tickers = c("IBM", "GOOG", "JPM"), long = T){
res <- lapply(tickers, function(x){
dat <- getSymbols(x, from = "2000-01-01", auto.assign = F)
dt <- data.table(date = as.Date(index(dat)), ticker = x, price = as.numeric(Ad(dat)))
return(dt)
})
res <- rbindlist(res)
if(!long){
res <- dcast(res, date ~ ticker)
}
return(res)
}
# 把日度数据变成月度数据
dt <- getData()
dt[, month_id := paste(month(date), year(date), sep = "-")]
dt[, ':=' (mt_price = mean(price),
mt_date = min(date)), by = c("ticker", "month_id")]
dt <- dt[, .(date = mt_date, ticker, price = mt_price)]
dt <- unique(dt)

# 如果你无法下载雅虎金融的数据,可以读取csv文件里面的数据
# dt <- fread("fin_data.csv")
dt[, date := as.Date(date)]
# 把基期设定为第一个月,所有的价格都标准化成第一期的倍数
dt[, idx_price := price/price[1], by = ticker]
ggplot(dt, aes(date, idx_price, colour = ticker)) +
geom_line() +
labs(title = "图:IBM、GOOG和JPM价格走势比较",
x = "日期", y = "价格指数(2000年1月为基期)") +
scale_color_discrete(name = "公司")

收益与风险的计算

从图中可以看出,谷歌的表现显然超过了另外两家公司,但是谷歌的股价明显波动更大,这就是风险-收益权衡 的原则,是CAPM的基础,通常使用收益率的标准差表示波动率即风险。下面计算三只股票的平均月度收益率和平均波动率:

R
1
2
3
4
5
6
7
8
9
10
11
12
13
dt[, ret := price/shift(price, 1) - 1, by = ticker]
tab <- dt[!is.na(ret), .(ticker, ret)]
tab <- tab[, .(er = round(mean(ret), 5),
sd = round(sd(ret), 5)), by = "ticker"]

tab

# 结果:
> tab
ticker er sd
1: IBM 0.00383 0.05338
2: GOOG 0.02087 0.07466
3: JPM 0.00852 0.07078

更加直觉的办法是把这个表格画出来:

R
1
2
3
4
5
6
7
ggplot(tab, aes(sd, er)) +
geom_point(aes(colour = ticker), size = 4) +
scale_color_hue("股票") +
labs(title = "图:风险和收益之间的关系",
x = "波动率", y = "预期收益") +
scale_x_continuous(labels = percent_format()) +
scale_y_continuous(labels = percent_format())

来自中国股市的例子

在进入下一部分之前我想用中国的股票数据看一下结果,我找了10只股票:

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
dtcn <- getData(c("000001.ss", "000002.ss", "600000.ss", "600004.ss", "600007.ss", "600008.ss", "600009.ss", "600010.ss", "600011.ss", "600012.ss"))
dtcn[, month_id := paste(month(date), year(date), sep = "-")]
dtcn[, ':=' (mt_price = mean(price),
mt_date = min(date)), by = c("ticker", "month_id")]
dtcn <- dtcn[, .(date = mt_date, ticker, price = mt_price)]
dtcn <- unique(dtcn)

dtcn[, date := as.Date(date)]
dtcn[, idx_price := price/price[1], by = ticker]
ggplot(dtcn, aes(date, idx_price, colour = ticker)) +
geom_line() +
labs(title = "图:中国十家上市公司的价格走势比较",
x = "日期", y = "价格指数(2000年1月为基期)") +
scale_color_discrete(name = "公司")


R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
dtcn[, ret := price/shift(price, 1) - 1, by = ticker]
tabcn <- dtcn[!is.na(ret), .(ticker, ret)]
tabcn <- tabcn[, .(er = round(mean(ret), 5),
sd = round(sd(ret), 5)), by = "ticker"]
tabcn

# 结果
> tabcn
ticker er sd
1: 000001.ss 0.00414 0.06551
2: 000002.ss 0.00293 0.06166
3: 600000.ss 0.01285 0.08583
4: 600004.ss 0.01093 0.07634
5: 600007.ss 0.00865 0.08703
6: 600008.ss 0.00723 0.11279
7: 600009.ss 0.01665 0.07818
8: 600010.ss 0.01150 0.10202
9: 600011.ss 0.00523 0.08201
10: 600012.ss 0.00916 0.09153

同样画在 图上:

R
1
2
3
4
5
6
7
ggplot(tabcn, aes(sd, er)) +
geom_point(aes(colour = ticker), size = 4) +
scale_color_hue("股票") +
labs(title = "图:风险和收益之间的关系",
x = "波动率", y = "预期收益") +
scale_x_continuous(labels = percent_format()) +
scale_y_continuous(labels = percent_format())

计算投资组合的风险收益权衡

两种资产

考虑两种资产,则投资组合的预期回报为:
$$
\hat{r_p} = \omega_x\hat{r_x} + \omega_y\hat{r_y}
$$
预期波动率为:
$$
\sigma_p = \sqrt{ω^2_xσ^2_x + ω^2_yσ^2_y + 2ω_xω_yσ_{x,y}}
$$

下面模拟一些资产的回报,首先需要编写一个根据一个向量生成一个与之相关系数为r、均值为y_mean, 标准差为y_sd的向量y的函数。这个函数的原理是这样的:
首先将$x$中心化生成$x_2$:
$$
x_2 = \frac{x - mean(x)}{sd(x)}
$$

那么这个时候$x_2$就是一个均值为0,标准差为1的随机变量了,然后为了生成一个与之相关系数为r的随机变量y,y的形式一定是下面这样的:
$$
y = r \times x_2 + e
$$

我们可以先生成一个均值为0,标准差为1的y,所以y的标准差为:

$$
σ_y = r^2 + σ^2_e = 1
$$

那么为了让上式成立,$σ^2_e = 1-r^2$,所以需要生成一个均值为0,标准差为$\sqrt{1-r^2}$的随机变量(正态性可以保证它不与x相关)。

最后再把y的标准差乘上、均值加上就生成了符合要求的变量y。

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
rmultvar <- function(x, r, y_mean, y_sd){
# x是一个向量,就是根据这个向量创建与之相关系数为r、均值为y_mean, 标准差为y_sd的向量y的
x2 <- (x - mean(x))/sd(x)
r2 <- r^2
ve <- 1 - r2
SD <- sqrt(ve)
e <- rnorm(length(x2), mean = 0, sd = SD)
y <- r*x2 + e
y <- (y - mean(y)) * y_sd + y_mean
return(y)
}

# 然后就可以模拟一个有三个资产x、y和z的数据集了:
set.seed(12345)
df <- data.table(x = rnorm(10000, mean = 0.07, sd = 0.05))
y_mean <- 0.03
y_sd <- 0.02

z_mean <- 0.04
z_sd <- 0.03

df[, y := rmultvar(x, r = 0, y_mean, y_sd)]
df[, z := rmultvar(x, r = 0, z_mean, z_sd)]

计算x资产和y资产的预期收益、预期波动率以及相关系数:

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# 计算三种资产的预期收益率和标准差
df_table <- melt(df)[, .(mean = mean(value), sd = sd(value)), by = variable]

# 计算x资产和y资产的预期收益和预期波动率以及相关系数
er_x <- mean(df$x)
er_y <- mean(df$y)
sd_x <- sd(df$x)
sd_y <- sd(df$y)
cov_xy <- cov(df$x, df$y)

# 创建10000个权重
two_assets_seq <- seq(0, 1, length.out = 10000)
two <- data.table(wx = two_assets_seq,
wy = 1 - two_assets_seq)
two[, ':=' (er_p = wx*er_x + wy*er_y,
sd_p = sqrt(
wx^2 * sd_x^2 +
wy^2 * sd_y^2 +
2 * wx * (1 - wx) * cov_xy
))]
plot_two <- ggplot() +
geom_point(data = two, aes(x = sd_p,
y = er_p,
colour = wx)) +
geom_point(data = df_table[variable != "z"],
aes(x = sd, y = mean),
colour = "cyan", size = 3, shape = 18) +
labs(title = "图:两资产可能性边界",
x = "波动率", y = "预期收益率") +
scale_y_continuous(labels = percent_format(), limits = c(0, max(two$er_p) * 1.2)) +
scale_x_continuous(labels = percent_format(), limits = c(0, max(two$sd_p) * 1.2)) +
scale_color_continuous(name = expression(omega[x]), labels = percent_format())
plot_two

实际上我觉得把夏普比例映射到颜色上效果更好。

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
# 设定无风险利率为rf = 0.025
rf <- 0.025
ggplot() +
geom_point(data = two, aes(x = sd_p,
y = er_p,
colour = (er_p - 0.025)/sd_p)) +
geom_point(data = df_table[variable != "z"],
aes(x = sd, y = mean),
colour = "cyan", size = 3, shape = 18) +
labs(title = "图:两资产可能性边界",
x = "波动率", y = "预期收益率") +
scale_y_continuous(labels = percent_format(), limits = c(0, max(two$er_p) * 1.2)) +
scale_x_continuous(labels = percent_format(), limits = c(0, max(two$sd_p) * 1.2)) +
scale_color_continuous(name = "夏普比率", labels = percent_format(), low = "green", high ="red")

引入第三个资产

三种资产的情况要更加复杂一点,对于包含三种资产的组合:

$$
r^2_p = ω_x\hat{r_x} + ω_y\hat{r_y} + ω_z\hat{r_z} \\
σ_p = \sqrt{ω_x^2\hat{r_x}^2 + ω_y^2\hat{r_y}^2 + ω_z^2\hat{r_z}^2 + 2ω_xω_yσ_{x, y} + 2ω_xω_zσ_{x, z} + 2ω_zω_yσ_{z, y}}
$$

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
er_z <- mean(df$z)
sd_z <- sd(df$z)
cov_xz <- cov(df$x, df$z)
cov_yz <- cov(df$y, df$z)

three_assets_seq <- seq(0, 1, length.out = 1000)
three <- data.table(wx = rep(three_assets_seq, each = length(three_assets_seq)),
wy = rep(three_assets_seq, length(three_assets_seq)))
three[, wz := 1 - wx - wy]

three[, ':=' (er_p = wx * er_x + wy * er_y + wz * er_z,
sd_p = sqrt(
wx^2 * sd_x^2 + wy^2 * sd_y^2 + wz^2 * sd_z^2 +
2 * wx * wy * cov_xy +
2 * wz * wy * cov_yz +
2 * wx * wz * cov_xz
))]
# 设定不允许卖空
three <- three[wx >= 0 & wy >= 0 & wz >= 0]

plot_three <- ggplot() +
geom_point(data = three, aes(sd_p,
er_p,
colour = (er_p - 0.025)/sd_p)) +
geom_point(data = df_table, aes(sd, mean), colour = "pink",
size = 3, shape = 18) +
labs(title = "图:三资产可能性边界",
x = "波动率", y = "预期收益率") +
scale_y_continuous(labels = percent_format(),
limits = c(0, max(three$er_p) * 1.2)) +
scale_x_continuous(labels = percent_format(),
limits = c(0, max(three$sd_p) * 1.2)) +
scale_color_gradientn(colours = c("red", "blue", "yellow"),
name = "夏普比率")

plot_three

计算有效前沿

因为我们想要的有效是指给定风险水平的最大收益,所以我们需要输入风险水平,得到收益率。即:
$$
\hat{r_{e,f}}(σ) = \frac{β}{α} + \sqrt{⁠(\frac{β}{α})^2 - \frac{γ-δ*σ^2}{α}} \\
α = 1^Ts^{-1}1 \\
s 为投资组合的方差-协方差矩阵 \\
β = 1^Ts^{-1}\overline{ret} \\
\overline{ret}表示每种股票的平均回报率向量 \\
γ = \overline{ret}^Ts^{-1}\overline{ret} \\
δ = αγ - β^2
$$

由此可以看出,唯一的输入是股票的回报,在R中我们可以编写一个简单的函数(calcEFParams)来计算参数并返回参数列表。再编写另外一个函数(calcEFValues)用于根据这些参数计算有效前沿的值。

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
calcEFParams <- function(rets){
retbar <- colMeans(rets, na.rm = T)
covs <- var(rets, na.rm = T)
invS <- solve(covs)
i <- matrix(1, nrow = length(retbar))
alpha <- t(i) %*% invS %*% i
beta <- t(i) %*% invS %*% retbar
gamma <- t(retbar) %*% invS %*% retbar
delta <- alpha * gamma - beta * beta
retlist <- list(
alpha = as.numeric(alpha),
beta = as.numeric(beta),
gamma = as.numeric(gamma),
delta = as.numeric(delta)
)
return(retlist)
}
(abcds <- calcEFParams(df))

# 结果
> (abcds <- calcEFParams(df))
$alpha
[1] 4037.551

$beta
[1] 147.8334

$gamma
[1] 5.992395

$delta
[1] 2339.881

然后是计算有效前沿的函数,abcd参数就是上个函数返回的list:

R
1
2
3
4
5
6
7
8
9
10
11
12
calcEFValues <- function(x, abcd, upper = T){
alpha <- abcd$alpha
beta <- abcd$beta
gamma <- abcd$gamma
delta <- abcd$delta
if(upper){
retval <- beta / alpha + sqrt((beta / alpha)^2 - (gamma - delta * x^2) / alpha)
} else{
retval <- beta / alpha - sqrt((beta / alpha)^2 - (gamma - delta * x^2) / alpha)
}
return(retval)
}

绘制有效前沿:

R
1
2
3
4
5
6
7
8
9
10
11
# 绘制有效前沿
ggplot(df_table, aes(sd, mean)) +
geom_point(size = 4, colour = "red", shape = 18) +
stat_function(fun = calcEFValues, args = list(abcd = abcds), n = 10000, colour = "red", size = 1) +
stat_function(fun = calcEFValues, args = list(abcd = abcds, upper = F), n = 10000, colour = "blue", size = 1) +
labs(title = "图:允许卖空情况下的有效前沿",
x = "波动率", y = "预期收益率") +
scale_y_continuous(label = percent_format(),
limits = c(0, max(df_table$mean)*1.2)) +
scale_x_continuous(label = percent_format(),
limits = c(0, max(df_table$sd)*1.2))

红色的曲线为有效前沿,由于允许卖空,所以曲线向外延伸并且有效前沿并不一定会穿过三种资产。

下面再考虑没有卖空的情况,这里使用了tseries包的portfolio.optim-package函数。

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# 不允许卖空的情形
library(tseries)
er_vals <- seq(min(df_table$mean), max(df_table$mean), length.out = 1000)
# 寻找每个可能收益下的最优组合,而组合的收益一定位于其各成分最小收益和最大收益之间
sd_vals <- sapply(er_vals, function(er){
op <- portfolio.optim(as.matrix(df), er)
return(op$ps)
})

plot_dt <- data.table(sd = sd_vals, er = er_vals)

# 上半部分才是有效前沿
minsd <- min(plot_dt$sd)
minsd_er <- plot_dt[sd == minsd, er]
plot_dt[, efficient := er >= minsd_er]

然后绘图:

R
1
2
3
4
5
6
7
8
9
10
11
ggplot() +
geom_point(data = plot_dt[efficient == F],
aes(sd, er), size = 0.5, colour = "blue") +
geom_point(data = plot_dt[efficient == T],
aes(sd, er), size = 0.5, colour = "red") +
geom_point(data = df_table, aes(sd, mean),
size = 4, colour = "red", shape = 18) +
labs(title = "图:不允许卖空时的有效前沿",
x = "波动率", y = "预期收益率") +
scale_y_continuous(label = percent_format(), limits = c(0, max(df_table$mean) * 1.2)) +
scale_x_continuous(label = percent_format(), limits = c(0, max(df_table$sd) * 1.2))

要直接比较两幅图可以使用下面的代码:

R
1
2
3
4
5
6
7
8
9
ggplot() +
geom_line(data = pdat, aes(sd, er, colour = type, linetype = efficient), size = 1) +
geom_point(data = df_table, aes(sd, mean), size = 4, colour = "red", shape = 18) +
labs(title = "图:有效前沿对比",
x = "波动率", y = "预期收益率") +
scale_y_continuous(label = percent, limits = c(0, max(df_table$mean) * 1.2)) +
scale_x_continuous(label = percent, limits = c(0, max(df_table$sd) * 1.2)) +
scale_color_manual(name = "是否允许卖空", values = c("red", "blue"), labels = c("允许", "不允许")) +
scale_linetype_manual(name = "是否为有效前沿", values = c(2, 1))

最后再回头画相关系数对资产组合可能性边界的影响的图:

R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
# 相关系数对有效前沿的影响
plotCombinations <- function(df, r){
df_table <- melt(df)[, .(mean = mean(value), sd = sd(value)), by = variable]
er_vals <- seq(min(df_table$mean), max(df_table$mean), length.out = 1000)
sd_vals <- sapply(er_vals, function(er){
op <- portfolio.optim(as.matrix(df), er)
return(op$ps)
})
plot_dt <- data.table(sd = sd_vals, er = er_vals)
minsd <- min(plot_dt$sd)
minsd_er <- plot_dt[sd == minsd, er]
plot_dt[, efficient := er >= minsd_er]

ggplot() +
geom_point(data = plot_dt[efficient == F],
aes(sd, er), size = 0.5, colour = "blue") +
geom_point(data = plot_dt[efficient == T],
aes(sd, er), size = 0.5, colour = "red") +
geom_point(data = df_table, aes(sd, mean),
size = 4, colour = "red", shape = 18) +
labs(title = paste("相关系数为", r),
x = "波动率", y = "预期收益率") +
scale_y_continuous(label = percent_format(), limits = c(0, max(df_table$mean) * 1.2)) +
scale_x_continuous(label = percent_format(), limits = c(0, max(df_table$sd) * 1.2))
}

set.seed(13452)
df1 <- data.table(x = rnorm(10000, mean = 0.05, sd = 0.04))
y_mean <- 0.03
y_sd <- 0.02
df1[, y1 := rmultvar(x, r = 0.99, y_mean, y_sd)]
df1[, y2 := rmultvar(x, r = 0.5, y_mean, y_sd)]
df1[, y3 := rmultvar(x, r = 0, y_mean, y_sd)]
df1[, y4 := rmultvar(x, r = -0.5, y_mean, y_sd)]
df1[, y5 := rmultvar(x, r = -0.75, y_mean, y_sd)]
df1[, y6 := rmultvar(x, r = -0.98, y_mean, y_sd)]

p1 <- plotCombinations(df = df1[, .(x, y1)], r = 1)
p2 <- plotCombinations(df = df1[, .(x, y2)], r = 0.5)
p3 <- plotCombinations(df = df1[, .(x, y3)], r = 0)
p4 <- plotCombinations(df = df1[, .(x, y4)], r = -0.5)
p5 <- plotCombinations(df = df1[, .(x, y5)], r = -0.75)
p6 <- plotCombinations(df = df1[, .(x, y6)], r = -1)

p_all <- grid.arrange(p1, p2, p3, p4, p5, p6)

需要注意的是如果最优求解失败,可以改改种子或者相关系数,再次运行。

# R

评论

程振兴

程振兴 @czxa.top
截止今天,我已经在本博客上写了659.4k个字了!

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×