sweep:时间序列预测工具包

sweep:时间序列预测工具包

sweep 包是帮助预测的 tidy 工具包。本文是学习:Introduction to sweepForecasting Time Series Groups in the tidyverseForecasting Using Multiple Models 三篇小品文的笔记。

sweep 导论

获取数据

1
2
3
4
library(forecast)
library(tidyquant)
library(timetk)
library(sweep)

预测酒精的销售量

首先从 FRED 获取美国酒精销售量的数据:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
alcohol_sales_tbl <- tq_get("S4248SM144NCEN", get = "economic.data", from = "1992-01-01", to = "2019-02-01")
alcohol_sales_tbl

#> # A tibble: 326 x 2
#> date price
#> <date> <int>
#> 1 1992-01-01 3459
#> 2 1992-02-01 3458
#> 3 1992-03-01 4002
#> 4 1992-04-01 4564
#> 5 1992-05-01 4221
#> 6 1992-06-01 4529
#> 7 1992-07-01 4466
#> 8 1992-08-01 4137
#> 9 1992-09-01 4126
#> 10 1992-10-01 4259
#> # … with 316 more rows
1
2
3
4
5
6
7
8
9
alcohol_sales_tbl %>%
ggplot(aes(x = date, y = price)) +
geom_line(size = 1, color = palette_light()[1]) +
geom_smooth(method = "loess") +
labs(title = "US Alcohol Sales: Monthly",
x = "", y = "Millions") +
scale_y_continuous(labels = scales::dollar) +
scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
theme_tq(base_family = enfont)

预测工作流程

  1. 强制转换为 ts 对象类
1
2
3
4
5
6
7
8
alcohol_sales_ts <- tk_ts(alcohol_sales_tbl, start = 1992, freq = 12, silent = TRUE)
alcohol_sales_ts
#> Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 1992 3459 3458 4002 4564 4221 4529 4466 4137 4126 4259 4240 4936
#> 1993 3031 3261 4160 4377 4307 4696 4458 4457 4364 4236 4500 4974
#> ······
#> 2018 9564 10415 12683 11919 14138 14583 12640 14257 12396 13914 14174 15504
#> 2019 10768 11147

一个显著的好处是生成的 ts 对象维护着一个 timetk 索引,这会帮助预测,我们可以使用 has_timetk_idx() 进行验证:

1
2
has_timetk_idx(alcohol_sales_ts)
#> [1] TRUE
  1. 对时间序列进行建模

使用 forecast 包中的 ets() 函数获得指数平滑 ETS。这里 sweep 可以帮助评估模型:

1
2
fit_ets <- alcohol_sales_ts %>%
ets()

sw_tidy():返回模型的参数

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
sw_tidy(fit_ets)
#> # A tibble: 17 x 2
#> term estimate
#> <chr> <dbl>
#> 1 alpha 0.122
#> 2 beta 0.0154
#> 3 gamma 0.000134
#> 4 phi 0.979
#> 5 l 4199.
#> 6 b 3.43
#> 7 s0 1.16
#> 8 s1 1.04
#> 9 s2 1.04
#> 10 s3 0.982
#> 11 s4 1.06
#> 12 s5 1.01
#> 13 s6 1.11
#> 14 s7 1.06
#> 15 s8 0.973
#> 16 s9 0.976
#> 17 s10 0.827

sw_glance():返回模型质量的参数

1
2
3
4
5
sw_glance(fit_ets)
#> # A tibble: 1 x 12
#> model.desc sigma logLik AIC BIC ME RMSE MAE MPE MAPE MASE ACF1
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 ETS(M,Ad,M) 0.0455 -2828. 5692. 5760. 44.4 364. 286. 0.390 3.67 0.676 -0.326

sw_augment():返回实际值、拟合值和残差值

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
augment_fit_ets <- sw_augment(fit_ets)
augment_fit_ets
#> # A tibble: 326 x 4
#> index .actual .fitted .resid
#> <S3: yearmon> <dbl> <dbl> <dbl>
#> 1 1 1992 3459 3237. 0.0686
#> 2 2 1992 3458 3509. -0.0145
#> 3 3 1992 4002 4143. -0.0340
#> 4 4 1992 4564 4115. 0.109
#> 5 5 1992 4221 4571. -0.0767
#> 6 6 1992 4529 4738. -0.0441
#> 7 7 1992 4466 4276. 0.0445
#> 8 8 1992 4137 4531. -0.0870
#> 9 9 1992 4126 4151. -0.00598
#> 10 10 1992 4259 4385. -0.0287
#> # … with 316 more rows

检查残差:

1
2
3
4
5
6
7
8
augment_fit_ets %>%
ggplot(aes(x = index, y = .resid)) +
geom_hline(yintercept = 0, color = "grey40") +
geom_point(color = palette_light()[[1]], alpha = 0.5) +
geom_smooth(method = "loess") +
scale_x_yearmon(n = 10) +
labs(title = "US Alcohol Sales: ETS Residuals", x= "") +
theme_tq(base_family = enfont)

sw_tidy_decomp():返回 ETS 模型的分解:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
decomp_fit_ets <- sw_tidy_decomp(fit_ets)
decomp_fit_ets
#> # A tibble: 327 x 5
#> index observed level slope season
#> <S3: yearmon> <dbl> <dbl> <dbl> <dbl>
#> 1 12 1991 NA 4199. 3.43 1.16
#> 2 1 1992 3459 4238. 7.80 0.770
#> 3 2 1992 3458 4238. 6.68 0.827
#> 4 3 1992 4002 4227. 4.31 0.976
#> 5 4 1992 4564 4287. 11.3 0.973
#> 6 5 1992 4221 4258. 6.01 1.06
#> 7 6 1992 4529 4241. 2.98 1.11
#> 8 7 1992 4466 4267. 5.83 1.01
#> 9 8 1992 4137 4227. -0.0207 1.06
#> 10 9 1992 4126 4224. -0.410 0.982
#> # … with 317 more rows

绘制模型分解图:

1
2
3
4
5
6
7
8
9
10
11
decomp_fit_ets %>%
gather(key = key, value = value, -index) %>%
mutate(key = forcats::as_factor(key)) %>%
ggplot(aes(x = index, y = value, group = key)) +
geom_line(color = palette_light()[[1]]) +
geom_ma(ma_fun = SMA, n = 12, size = 1, color = palette_light()[[2]]) +
facet_wrap(~ key, scales = "free_y") +
scale_x_yearmon(n = 10) +
labs(title = "US Alcohol Sales:: ETS Decomposition", x = "") +
theme_tq(base_family = enfont) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

  1. 预测模型

接下来,我们使用 forecast() 函数预测 ETS 模型,返回的 forecast 对象不是 tidy 格式,这就是 sw_sweep() 函数有帮助的地方:

1
2
3
4
5
6
7
8
9
10
11
12
13
fcast_ets  <- fit_ets %>%
forecast(h = 100)
sw_sweep(fcast_ets) %>%
ggplot(aes(x = index, y = price, color = key)) +
geom_ribbon(aes(ymin = lo.95, ymax = hi.95), fill = "#D5DBFF", color = NA, size = 0) +
geom_ribbon(aes(ymin = lo.80, ymax = hi.80), fill = "#596DD5", color = NA, size = 0, alpha = 0.8) +
geom_line(size = 1) +
labs(title = "US Alcohol Sales, ETS Model Forecast", x = "", y = "Millions", subtitle = "Regular Time Index") +
scale_y_continuous(labels = scales::dollar) +
scale_x_yearmon(n = 12, format = "%Y") +
scale_color_tq() +
scale_fill_tq() +
theme_tq(base_family = enfont)

预测时间序列组

自行车销售数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
bike_sales
#> # A tibble: 15,644 x 17
#> order.date order.id order.line quantity price price.ext customer.id bikeshop.name bikeshop.city bikeshop.state
#> <date> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2011-01-07 1 1 1 6070 6070 2 Ithaca Mount… Ithaca NY
#> 2 2011-01-07 1 2 1 5970 5970 2 Ithaca Mount… Ithaca NY
#> 3 2011-01-10 2 1 1 2770 2770 10 Kansas City … Kansas City KS
#> 4 2011-01-10 2 2 1 5970 5970 10 Kansas City … Kansas City KS
#> 5 2011-01-10 3 1 1 10660 10660 6 Louisville R… Louisville KY
#> 6 2011-01-10 3 2 1 3200 3200 6 Louisville R… Louisville KY
#> 7 2011-01-10 3 3 1 12790 12790 6 Louisville R… Louisville KY
#> 8 2011-01-10 3 4 1 5330 5330 6 Louisville R… Louisville KY
#> 9 2011-01-10 3 5 1 1570 1570 6 Louisville R… Louisville KY
#> 10 2011-01-11 4 1 1 4800 4800 22 Ann Arbor Sp… Ann Arbor MI
#> # … with 15,634 more rows, and 7 more variables: latitude <dbl>, longitude <dbl>, product.id <dbl>, model <chr>,
#> # category.primary <chr>, category.secondary <chr>, frame <chr>
1
2
3
4
5
6
7
8
9
10
11
bike_sales_monthly <- bike_sales %>%
mutate(month = month(order.date),
year = year(order.date)) %>%
group_by(year, month) %>%
summarise(total.qty = sum(quantity))
bike_sales_monthly %>%
ggplot(aes(x = month, y = total.qty, group = year)) +
geom_area(aes(fill = year), position = "stack") +
labs(title = "Quantity Sold: Month Plot", x = "", y = "Sales", subtitle = "March through July tend to be most active") +
scale_y_continuous() +
theme_tq(base_family = enfont)

分组预测:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
month_qty_by_cat2 <- bike_sales %>%
mutate(order.month = as_date(as.yearmon(order.date))) %>%
group_by(category.secondary, order.month) %>%
summarise(total.qty = sum(quantity))

month_qty_by_cat2_nest <- month_qty_by_cat2 %>%
group_by(category.secondary) %>%
nest(.key = "data.tbl")
month_qty_by_cat2_nest

#> # A tibble: 9 x 2
#> category.secondary data.tbl
#> <chr> <list>
#> 1 Cross Country Race <tibble [60 × 2]>
#> 2 Cyclocross <tibble [60 × 2]>
#> 3 Elite Road <tibble [60 × 2]>
#> 4 Endurance Road <tibble [60 × 2]>
#> 5 Fat Bike <tibble [58 × 2]>
#> 6 Over Mountain <tibble [60 × 2]>
#> 7 Sport <tibble [60 × 2]>
#> 8 Trail <tibble [60 × 2]>
#> 9 Triathalon <tibble [60 × 2]>

预测工作流程

  1. 转换为 ts 对象
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
monthly_qty_by_cat2_ts <- month_qty_by_cat2_nest %>%
mutate(data.ts = map(
.x = data.tbl,
.f = tk_ts,
select = -order.month,
start = 2011,
freq = 12))
monthly_qty_by_cat2_ts
#> # A tibble: 9 x 3
#> category.secondary data.tbl data.ts
#> <chr> <list> <list>
#> 1 Cross Country Race <tibble [60 × 2]> <S3: ts>
#> 2 Cyclocross <tibble [60 × 2]> <S3: ts>
#> 3 Elite Road <tibble [60 × 2]> <S3: ts>
#> 4 Endurance Road <tibble [60 × 2]> <S3: ts>
#> 5 Fat Bike <tibble [58 × 2]> <S3: ts>
#> 6 Over Mountain <tibble [60 × 2]> <S3: ts>
#> 7 Sport <tibble [60 × 2]> <S3: ts>
#> 8 Trail <tibble [60 × 2]> <S3: ts>
#> 9 Triathalon <tibble [60 × 2]> <S3: ts>
  1. 对时间序列建模
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
monthly_qty_by_cat2_fit <- monthly_qty_by_cat2_ts %>%
mutate(fit.ets = map(
data.ts, ets
))
monthly_qty_by_cat2_fit
#> # A tibble: 9 x 4
#> category.secondary data.tbl data.ts fit.ets
#> <chr> <list> <list> <list>
#> 1 Cross Country Race <tibble [60 × 2]> <S3: ts> <S3: ets>
#> 2 Cyclocross <tibble [60 × 2]> <S3: ts> <S3: ets>
#> 3 Elite Road <tibble [60 × 2]> <S3: ts> <S3: ets>
#> 4 Endurance Road <tibble [60 × 2]> <S3: ts> <S3: ets>
#> 5 Fat Bike <tibble [58 × 2]> <S3: ts> <S3: ets>
#> 6 Over Mountain <tibble [60 × 2]> <S3: ts> <S3: ets>
#> 7 Sport <tibble [60 × 2]> <S3: ts> <S3: ets>
#> 8 Trail <tibble [60 × 2]> <S3: ts> <S3: ets>
#> 9 Triathalon <tibble [60 × 2]> <S3: ts> <S3: ets>

sw_tidy():查看拟合结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
monthly_qty_by_cat2_fit %>%
mutate(tidy = map(
fit.ets, sw_tidy
)) %>%
unnest(tidy, .drop = TRUE) %>%
spread(key = category.secondary, value = estimate)

#> # A tibble: 16 x 10
#> term `Cross Country … Cyclocross `Elite Road`
#> <chr> <dbl> <dbl> <dbl>
#> 1 alpha 0.0398 0.000110 0.0651
#> 2 b NA NA NA
#> 3 beta NA NA NA
#> 4 gamma 0.000101 0.00256 0.000100
#> 5 l 321. 210. 490.
#> 6 s0 0.503 0.0788 0.871
#> 7 s1 1.10 1.32 0.556
#> 8 s10 0.643 0.212 0.266
#> 9 s2 0.375 0.0969 0.617
#> 10 s3 1.12 0.349 1.52
#> 11 s4 0.630 1.34 0.663
#> 12 s5 2.06 2.06 0.545
#> 13 s6 0.873 2.01 1.69
#> 14 s7 1.64 1.38 1.91
#> 15 s8 0.487 2.29 1.27
#> 16 s9 1.41 0.754 1.86
#> # … with 6 more variables: `Endurance Road` <dbl>, `Fat
#> # Bike` <dbl>, `Over Mountain` <dbl>, Sport <dbl>,
#> # Trail <dbl>, Triathalon <dbl>

sw_glance():查看模型拟合的效果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
monthly_qty_by_cat2_fit %>%
mutate(glance = map(fit.ets, sw_glance)) %>%
unnest(glance, .drop = TRUE)

#> # A tibble: 9 x 13
#> category.second… model.desc sigma logLik AIC BIC ME
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Cross Country R… ETS(M,N,M) 1.06 -464. 957. 989. 33.1
#> 2 Cyclocross ETS(M,N,M) 1.12 -409. 848. 879. -79.8
#> 3 Elite Road ETS(M,N,M) 0.895 -471. 972. 1004. 24.6
#> 4 Endurance Road ETS(M,N,M) 0.759 -439. 909. 940. -35.9
#> 5 Fat Bike ETS(M,N,M) 2.73 -343. 715. 746. -18.4
#> 6 Over Mountain ETS(M,N,M) 0.910 -423. 877. 908. -35.5
#> 7 Sport ETS(M,N,M) 0.872 -427. 884. 915. -55.1
#> 8 Trail ETS(M,A,M) 0.741 -411. 855. 891. -11.4
#> 9 Triathalon ETS(M,N,M) 1.52 -410. 850. 881. 17.6
#> # … with 6 more variables: RMSE <dbl>, MAE <dbl>, MPE <dbl>,
#> # MAPE <dbl>, MASE <dbl>, ACF1 <dbl>

sw_augment():查看实际值、拟合值和残差:

1
2
3
4
5
augment_fit_ets <- monthly_qty_by_cat2_fit %>%
mutate(augment = map(
fit.ets, sw_augment, timetk_idx = TRUE, rename_index = "date"
)) %>%
unnest(augment, .drop = T)
1
2
3
4
5
6
7
8
9
10
# 观察残差
augment_fit_ets %>%
ggplot(aes(x = date, y = .resid, group = category.secondary)) +
geom_hline(yintercept = 0, color = "grey40") +
geom_line(color = palette_light()[[2]]) +
geom_smooth(method = "loess") +
labs(title = "Bike Quantity Sold By Secondary Category", subtitle = "ETS Model Residuals", x = "") +
theme_tq(base_family = enfont) +
facet_wrap(~ category.secondary, scales = "free_y", ncol = 3) +
scale_x_date(date_labels = "%Y")

分解:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
monthly_qty_by_cat2_fit %>%
mutate(decomp = map(
fit.ets, sw_tidy_decomp,
timetk_idx = TRUE,
rename_index = "date"
)) %>%
unnest(decomp)
#> # A tibble: 538 x 6
#> category.secondary date observed level season slope
#> <chr> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 Cross Country Race 2011-01-01 122 313. 1.16 NA
#> 2 Cross Country Race 2011-02-01 489 331. 0.643 NA
#> 3 Cross Country Race 2011-03-01 505 332. 1.41 NA
#> 4 Cross Country Race 2011-04-01 343 347. 0.487 NA
#> 5 Cross Country Race 2011-05-01 263 339. 1.64 NA
#> 6 Cross Country Race 2011-06-01 735 359. 0.873 NA
#> 7 Cross Country Race 2011-07-01 183 348. 2.06 NA
#> 8 Cross Country Race 2011-08-01 66 339. 0.630 NA
#> 9 Cross Country Race 2011-09-01 97 329. 1.12 NA
#> 10 Cross Country Race 2011-10-01 189 336. 0.375 NA
#> # … with 528 more rows
  1. 模型预测:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
monthly_qty_by_cat2_fcast <- monthly_qty_by_cat2_fit %>%
mutate(fcast.ets = map(
fit.ets, forecast, h = 100
))
monthly_qty_by_cat2_fcast
#> # A tibble: 9 x 5
#> category.secondary data.tbl data.ts fit.ets fcast.ets
#> <chr> <list> <list> <list> <list>
#> 1 Cross Country Race <tibble [60 × 2]> <S3: ts> <S3: ets> <S3: forecast>
#> 2 Cyclocross <tibble [60 × 2]> <S3: ts> <S3: ets> <S3: forecast>
#> 3 Elite Road <tibble [60 × 2]> <S3: ts> <S3: ets> <S3: forecast>
#> 4 Endurance Road <tibble [60 × 2]> <S3: ts> <S3: ets> <S3: forecast>
#> 5 Fat Bike <tibble [58 × 2]> <S3: ts> <S3: ets> <S3: forecast>
#> 6 Over Mountain <tibble [60 × 2]> <S3: ts> <S3: ets> <S3: forecast>
#> 7 Sport <tibble [60 × 2]> <S3: ts> <S3: ets> <S3: forecast>
#> 8 Trail <tibble [60 × 2]> <S3: ts> <S3: ets> <S3: forecast>
#> 9 Triathalon <tibble [60 × 2]> <S3: ts> <S3: ets> <S3: forecast>
  1. tidy 预测
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
monthly_qty_by_cat2_fcast_tidy <- monthly_qty_by_cat2_fcast %>%
mutate(
sweep = map(fcast.ets, sw_sweep,
fitted = FALSE, timetk_idx = TRUE)
) %>%
unnest(sweep)
monthly_qty_by_cat2_fcast_tidy %>%
ggplot(aes(x = index, y = total.qty, color = key, group = category.secondary)) +
geom_ribbon(aes(ymin = lo.95, ymax = hi.95), fill = "#D5DBFF", color = NA, size = 0) +
geom_ribbon(aes(ymin = lo.80, ymax = hi.80), fill = "#596DD5", color = NA, size = 0, alpha = 0.8) +
geom_line() +
labs(title = "Bike Quantity ßold By Secondary Category", subtitle = "ETS Model Forecasts", x = "", y = "Units") +
scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
scale_color_tq() +
scale_fill_tq() +
facet_wrap(~ category.secondary, scales = "free_y", ncol = 3) +
theme_tq(base_family = enfont) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

使用多个模型预测

油价数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
gas_prices_monthly_raw <- tq_get(
x = "GASREGCOVM",
get = "economic.data",
from = "1990-01-01",
to = "2016-12-31"
)
gas_prices_monthly_raw
#> # A tibble: 316 x 2
#> date price
#> <date> <dbl>
#> 1 1990-09-01 1.26
#> 2 1990-10-01 1.34
#> 3 1990-11-01 1.32
#> 4 1990-12-01 NA
#> 5 1991-01-01 NA
#> 6 1991-02-01 1.09
#> 7 1991-03-01 1.04
#> 8 1991-04-01 1.08
#> 9 1991-05-01 1.13
#> 10 1991-06-01 1.13
#> # … with 306 more rows

summary(gas_prices_monthly_raw$price)
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> 0.900 1.138 1.615 1.974 2.697 4.002 2

可以看出有两个缺失值,可以使用 fill 函数进行填充:

1
2
3
4
5
6
7
8
9
10
gas_prices_monthly <- gas_prices_monthly_raw %>%
fill(price, .direction = "down") %>%
fill(price, .direction = "up")
gas_prices_monthly %>%
ggplot(aes(x = date, y = price)) +
geom_line(color = palette_light()[[1]]) +
geom_point(color = palette_light()[[1]]) +
labs(title = "Gasoline Prices, Monthly", x = "", y = "USD") +
scale_y_continuous(labels = scales::dollar) +
theme_tq(base_family = enfont)

把月度数据变成季度数据:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
gas_prices_quarterly <- gas_prices_monthly %>%
tq_transmute(
mutate_fun = to.period,
period = "quarters"
)

gas_prices_quarterly %>%
ggplot(aes(x = date, y = price)) +
geom_line(color = palette_light()[[1]],
size = 1) +
geom_point(color = palette_light()[[1]]) +
labs(title = "Gasoline Prices, Quarterly", x = "", y = "USD") +
scale_y_continuous(labels = scales::dollar) +
scale_x_date(date_breaks = "5 years",
date_labels = "%Y") +
theme_tq(base_family = enfont)

在这部分,我们将会使用下面三个模型预测汽油的价格:

  1. ARIMA;
  2. ETS;
  3. BATS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
df <- tibble(
f = c("runif", "rpois", "rnorm"),
params = list(
list(n = 10),
list(n = 5, lambda = 10),
list(n = 10, mean = -3, sd = 10)
)
)
df
#> # A tibble: 3 x 2
#> f params
#> <chr> <list>
#> 1 runif <list [1]>
#> 2 rpois <list [2]>
#> 3 rnorm <list [3]>

df_out <- df %>%
mutate(out = invoke_map(f, params))
df_out
#> # A tibble: 3 x 3
#> f params out
#> <chr> <list> <list>
#> 1 runif <list [1]> <dbl [10]>
#> 2 rpois <list [2]> <int [5]>
#> 3 rnorm <list [3]> <dbl [10]>

构建多个模型

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
gas_prices_quarterly_ts <- gas_prices_quarterly %>%
tk_ts(select = -date, start = c(1990, 3), freq = 4)

model_list <- list(
auto.arima = list(
y = gas_prices_quarterly_ts
),
ets = list(
y = gas_prices_quarterly_ts,
damped = T
),
bats = list(
y = gas_prices_quarterly_ts
)
)

models_tbl <- enframe(model_list, name = "f", value = "params")
models_tbl
#> # A tibble: 3 x 2
#> f params
#> <chr> <list>
#> 1 auto.arima <list [1]>
#> 2 ets <list [2]>
#> 3 bats <list [1]>

models_tbl_fit <- models_tbl %>%
mutate(fit = invoke_map(f, params))
models_tbl_fit

#> # A tibble: 3 x 3
#> f params fit
#> <chr> <list> <list>
#> 1 auto.arima <list [1]> <S3: ARIMA>
#> 2 ets <list [2]> <S3: ets>
#> 3 bats <list [1]> <S3: bats>

检查拟合的模型

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# 查看拟合模型的参数
models_tbl_fit %>%
mutate(tidy = map(fit, sw_tidy)) %>%
unnest(tidy) %>%
spread(key = f, value = estimate)
#> # A tibble: 18 x 4
#> term auto.arima bats ets
#> <chr> <dbl> <dbl> <dbl>
#> 1 alpha NA 0.588 0.831
#> 2 ar.coefficients NA NA NA
#> 3 ar1 0.834 NA NA
#> 4 b NA NA -0.0524
#> 5 beta NA NA 0.000100
#> 6 damping.parameter NA NA NA
#> 7 gamma NA NA 0.0521
#> 8 gamma.values NA -0.0262 NA
#> 9 l NA NA 1.29
#> 10 lambda NA 0.0000605 NA
#> 11 ma.coefficients NA 0.256 NA
#> 12 ma1 -0.964 NA NA
#> 13 phi NA NA 0.837
#> 14 s0 NA NA 0.0469
#> 15 s1 NA NA -0.0209
#> 16 s2 NA NA -0.0407
#> 17 sar1 0.939 NA NA
#> 18 sma1 -0.776 NA NA
1
2
3
4
5
6
7
8
9
10
11
12
# 检查模型的拟合效果
models_tbl_fit %>%
mutate(glance = map(fit, sw_glance)) %>%
unnest(glance, .drop = TRUE)
#> # A tibble: 3 x 13
#> f model.desc sigma logLik AIC BIC ME RMSE MAE
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 auto… ARIMA(1,1… 0.298 -20.6 51.2 64.4 0.0180 0.291 0.177
#> 2 ets ETS(M,Ad,… 0.118 -76.6 173. 200. 0.0149 0.292 0.170
#> 3 bats BATS(0, {… 0.116 159. 179. 184. 0.0193 0.259 0.159
#> # … with 4 more variables: MPE <dbl>, MAPE <dbl>, MASE <dbl>,
#> # ACF1 <dbl>
1
2
3
4
5
6
7
8
9
10
11
# 检查模型的实际值、拟合值和残差
models_tbl_fit %>%
mutate(augment = map(fit, sw_augment, rename_index = "date")) %>%
unnest(augment) %>%
ggplot(aes(x = date, y = .resid, group = f)) +
geom_line(color = palette_light()[[1]]) +
geom_point(color = palette_light()[[1]]) +
geom_smooth(method = "loess") +
facet_wrap(~ f, nrow = 3) +
labs(title = "Residuals Plot") +
theme_tq(base_family = enfont)

使用模型预测

1
2
3
4
5
6
7
8
9
models_tbl_fcast <- models_tbl_fit %>%
mutate(fcast = map(fit, forecast, h = 6))
models_tbl_fcast
#> # A tibble: 3 x 4
#> f params fit fcast
#> <chr> <list> <list> <list>
#> 1 auto.arima <list [1]> <S3: ARIMA> <S3: forecast>
#> 2 ets <list [2]> <S3: ets> <S3: forecast>
#> 3 bats <list [1]> <S3: bats> <S3: forecast>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# 整理预测结果
models_tbl_fcast_tidy <- models_tbl_fcast %>%
mutate(sweep = map(fcast, sw_sweep, fitted = FALSE, timetk_idx = TRUE, rename_index = "date"))
models_tbl_fcast_tidy %>%
unnest(sweep) %>%
ggplot(aes(x = date, y = price, color = key, group = f)) +
geom_ribbon(aes(ymin = lo.95, ymax = hi.95), fill = "#D5DBFF", color = NA, size = 0) +
geom_ribbon(aes(ymin = lo.80, ymax = hi.80), fill = "#596DD5", color = NA, size = 0, alpha = 0.8) +
geom_line(size = 1) +
facet_wrap(~f, nrow = 3) +
labs(title = "Gasoline Price Forecasts",
subtitle = "Forecasting multiple models with sweep: ARIMA, BATS, ETS",
x = "", y = "Price") +
scale_y_continuous(labels = scales::dollar) +
scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
theme_tq(base_family = enfont) +
scale_color_tq()

# R

评论

程振兴

程振兴 @czxa.top
截止今天,我已经在本博客上写了662.8k个字了!

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×