RITCH: Parsing ITCH Files in R (Finance & Market Microstructure Data)

Study notes on "Introducing RITCH: Parsing ITCH Files in R (Finance & Market Microstructure)".

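First, what is ITCH? ITCH is the outbound protocol NASDAQ uses to communicate market data to its clients: all information, including market status, orders, trades, circuit breakers, and so on, for each day and each exchange, with nanosecond timestamps (counted from midnight, so the values in the sample output below are on the order of 10^13). It is mainly used for market microstructure analysis.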
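
To make the timestamp format concrete, here is a minimal sketch that turns nanoseconds since midnight into a clock time. The helper name `ns_to_time` is hypothetical (it is not part of RITCH), the trading date is assumed to be known from the file name, and UTC is used only as a neutral container for the exchange's local clock time.

```r
# Hypothetical helper: convert ITCH timestamps (nanoseconds since midnight)
# into date-times. Assumes the trading date is known, e.g. from the file name.
ns_to_time <- function(ts_ns, date = "2018-08-30") {
  midnight <- as.POSIXct(date, tz = "UTC")  # UTC only as a neutral container
  midnight + ts_ns / 1e9                    # nanoseconds -> seconds
}

# Two timestamps taken from the sample output in these notes
ts <- c(2.52e13, 2.883108e13)
format(ns_to_time(ts), "%H:%M:%S")
# → "07:00:00" "08:00:31"
```

The results agree with the `datetime` column that RITCH prints for the same rows.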

NASDAQ provides some sample datasets on its FTP server (6 days across 3 markets (NASDAQ, PSX, and BX), about 25 GB of compressed files in total): ftp://emi.nasdaq.com/ITCH/

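  • NASDAQ BX (BX) offers a quote protected under Reg NMS (Regulation National Market System), with a price/time priority market structure and popular trading functionality. By paying rebates for removing liquidity, BX offers attractive economics to liquidity takers. This pricing structure also creates opportunities for liquidity providers, giving them a way to maximize their probability of execution.
  • PSX uses a Price Setter Pro Rata model to offer greater incentives while encouraging participants to compete actively on price. (Price Setter Pro Rata: the price setter keeps its price-setter status as long as no better price appears on PSX. The price-setting order is guaranteed the greater of its pro rata share or 40% of the size of any incoming order.)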
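
The "greater of pro rata share or 40%" rule can be illustrated with a small calculation. This is a simplified sketch under stated assumptions, not the actual PSX matching logic: the function name `price_setter_fill` and its arguments are hypothetical, and the caps at the resting size and the incoming size are added here for sanity rather than taken from the text.

```r
# Hypothetical helper: shares allocated to the price-setting order when an
# incoming order executes against the resting orders at that price.
price_setter_fill <- function(incoming, setter_size, total_resting) {
  pro_rata   <- incoming * setter_size / total_resting  # proportional share
  guaranteed <- 0.40 * incoming                         # 40% floor from the rule
  # Assumption: never allocate more than the setter rests or the order holds
  min(max(pro_rata, guaranteed), setter_size, incoming)
}

# Pro rata share (750) is larger than the 40% floor (400)
price_setter_fill(incoming = 1000, setter_size = 1500, total_resting = 2000)
# → 750

# 40% floor (400) is larger than the pro rata share (100)
price_setter_fill(incoming = 1000, setter_size = 500, total_resting = 5000)
# → 400
```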

To parse ITCH protocol files, the author wrote the RITCH package:

devtools::install_github("DavZim/RITCH")

Then download the data you need, for example the 20180830.BX_ITCH_50.gz dataset, which is over 800 MB and contains 66,793,375 messages.

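# The file is large, so a dedicated download manager (e.g. Xunlei) may be more reliable
download.file("ftp://emi.nasdaq.com/ITCH/20180830.BX_ITCH_50.gz", "20180830.BX_ITCH_50.gz", mode = "wb")
# Decompress, keeping the original archive
R.utils::gunzip("20180830.BX_ITCH_50.gz", "20180830.BX_ITCH_50", remove = FALSE)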
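
If you do download from within R, note that `download.file` honours the `timeout` option (60 seconds by default), which is likely too short for a file of this size. The following is a minimal sketch; `fetch_if_missing` is a hypothetical helper name, and the one-hour timeout is an arbitrary assumption.

```r
# Hypothetical helper: download a file only if it is not already present,
# temporarily raising R's download timeout for large files.
fetch_if_missing <- function(url, dest, timeout_sec = 3600) {
  if (file.exists(dest)) return(invisible(dest))  # nothing to do
  old <- options(timeout = max(timeout_sec, getOption("timeout")))
  on.exit(options(old), add = TRUE)               # restore the old timeout
  download.file(url, dest, mode = "wb")           # binary mode for .gz files
  invisible(dest)
}

# Usage (commented out so that running this block does not start a download):
# fetch_if_missing("ftp://emi.nasdaq.com/ITCH/20180830.BX_ITCH_50.gz",
#                  "20180830.BX_ITCH_50.gz")
```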

Then get an overview of the dataset:

library(RITCH)
file <- "20180830.BX_ITCH_50"
msg_count <- count_message(file, add_meta_data = TRUE)
msg_count

> msg_count
    msg_type    count  msg_name                                   msg_group                                     doc_nr
 1:        S        6  System Event Message                       System Event Message                          4.1
 2:        R     8664  Stock Directory                            Stock Related Messages                        4.2.1
 3:        H     8702  Stock Trading Action                       Stock Related Messages                        4.2.2
 4:        Y     8759  Reg SHO Restriction                        Stock Related Messages                        4.2.3
 5:        L     5977  Market Participant Position                Stock Related Messages                        4.2.4
 6:        V        1  MWCB Decline Level Message                 Stock Related Messages                        4.2.5.1
 7:        W        0  MWCB Status Message                        Stock Related Messages                        4.2.5.2
 8:        K        0  IPO Quoting Period Update                  Stock Related Messages                        4.2.6
 9:        J        0  LULD Auction Collar                        Stock Related Messages                        4.2.7
10:        A 27469221  Add Order Message                          Add Order Message                             4.3.1
11:        F    32789  Add Order - MPID Attribution Message       Add Order Message                             4.3.2
12:        E  1485269  Order Executed Message                     Modify Order Messages                         4.4.1
13:        C     6409  Order Executed Message With Price Message  Modify Order Messages                         4.4.2
14:        X   561109  Order Cancel Message                       Modify Order Messages                         4.4.3
15:        D 26446823  Order Delete Message                       Modify Order Messages                         4.4.4
16:        U  3158739  Order Replace Message                      Modify Order Messages                         4.4.5
17:        P   282534  Trade Message (Non-Cross)                  Trade Messages                                4.5.1
18:        Q        0  Cross Trade Message                        Trade Messages                                4.5.2
19:        B        0  Broken Trade Message                       Trade Messages                                4.5.3
20:        I        0  NOII Message                               Net Order Imbalance Indicator (NOII) Message  4.6
21:        N  7318373  Retail Interest Message                    Retail Price Improvement Indicator (RPII)     4.7
    msg_type    count  msg_name                                   msg_group                                     doc_nr

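The dataset contains many different message types. At present the package only parses "Add Order Messages" (types 'A' and 'F'), "Modify Order Messages" (types 'E', 'C', 'X', 'D', and 'U'), and "Trade Messages" (types 'P', 'Q', and 'B'). These are parsed with the get_orders, get_modifications, and get_trades functions, respectively. The doc_nr column refers to the section of the official documentation, which also contains a more detailed description of each type.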
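
As a quick check on how much of the file these three functions cover, the counts from the overview can be summed per parser. This is a base R sketch with the counts copied by hand from the table; the vector names `counts` and `parser` are illustrative only.

```r
# Message counts copied from the msg_count output, for the parsed types only
counts <- c(A = 27469221, F = 32789,
            E = 1485269, C = 6409, X = 561109, D = 26446823, U = 3158739,
            P = 282534, Q = 0, B = 0)

# Which RITCH function parses which message type (as described in the text)
parser <- c(A = "get_orders", F = "get_orders",
            E = "get_modifications", C = "get_modifications",
            X = "get_modifications", D = "get_modifications",
            U = "get_modifications",
            P = "get_trades", Q = "get_trades", B = "get_trades")

# Total number of messages each function would return
per_parser <- tapply(counts, parser[names(counts)], sum)
per_parser

# Share of all 66,793,375 messages that the package parses (about 89%)
sum(per_parser) / 66793375
```

Most of the remainder consists of the 7,318,373 Retail Interest messages (type 'N').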

For example, to parse the first 10 orders:

> orders <- get_orders(file, 1, 10)
10 messages found
[Loading] .
[Converting] to data.table
[Formatting]
> orders
    msg_type locate_code tracking_number timestamp order_ref   buy shares stock price mpid       date            datetime
 1:        A         956               0  2.52e+13     35650  TRUE    400  BOIL 28.42 <NA> 2018-08-30 2018-08-30 07:00:00
 2:        A        6894               0  2.52e+13     35651  TRUE   2600   SCO 15.15 <NA> 2018-08-30 2018-08-30 07:00:00
 3:        A        8007               0  2.52e+13     35652  TRUE   3000   USO 14.70 <NA> 2018-08-30 2018-08-30 07:00:00
 4:        A        1965               0  2.52e+13     35653  TRUE    200  DGAZ 22.17 <NA> 2018-08-30 2018-08-30 07:00:00
 5:        A         956               0  2.52e+13     35655 FALSE    400  BOIL 28.62 <NA> 2018-08-30 2018-08-30 07:00:00
 6:        A        8007               0  2.52e+13     35656 FALSE   3000   USO 14.72 <NA> 2018-08-30 2018-08-30 07:00:00
 7:        A        1965               0  2.52e+13     35657 FALSE    200  DGAZ 22.28 <NA> 2018-08-30 2018-08-30 07:00:00
 8:        A        6894               0  2.52e+13     35658 FALSE   2600   SCO 15.25 <NA> 2018-08-30 2018-08-30 07:00:00
 9:        A        7942               0  2.52e+13     35659  TRUE   2500   UNG 23.63 <NA> 2018-08-30 2018-08-30 07:00:00
10:        A        2194               0  2.52e+13     35660  TRUE   2500   DWT  6.20 <NA> 2018-08-30 2018-08-30 07:00:00
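
To show how the `buy` flag and `price` columns can be used, the sketch below computes a bid/ask spread per stock from these ten rows. The values are copied by hand from the output above into a plain data frame, so the example runs without the ITCH file. It only looks at these first ten orders, so it is an illustration and not the true state of the order book.

```r
# The ten orders shown above, reduced to the columns needed here
orders <- data.frame(
  stock = c("BOIL", "SCO", "USO", "DGAZ", "BOIL", "USO", "DGAZ", "SCO", "UNG", "DWT"),
  buy   = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  price = c(28.42, 15.15, 14.70, 22.17, 28.62, 14.72, 22.28, 15.25, 23.63, 6.20),
  stringsAsFactors = FALSE
)

# Highest buy price and lowest sell price per stock
best_bid <- tapply(orders$price[orders$buy], orders$stock[orders$buy], max)
best_ask <- tapply(orders$price[!orders$buy], orders$stock[!orders$buy], min)

# Spread for the stocks that have orders on both sides
both   <- intersect(names(best_bid), names(best_ask))
spread <- round(best_ask[both] - best_bid[both], 2)
spread
```

UNG and DWT are skipped because only buy orders for them appear in the sample.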

The first 10 order modification messages:

> (mod <- get_modifications(file, 1, 10))
10 messages found
[Loading] .
[Converting] to data.table
[Formatting]
    msg_type locate_code tracking_number    timestamp order_ref shares match_number printable price new_order_ref       date            datetime
 1:        X        7942               0 2.520000e+13     35662    600           NA        NA    NA            NA 2018-08-30 2018-08-30 07:00:00
 2:        X        7901               0 2.520001e+13     35672    100           NA        NA    NA            NA 2018-08-30 2018-08-30 07:00:00
 3:        D        7942               0 2.520186e+13     35680     NA           NA        NA    NA            NA 2018-08-30 2018-08-30 07:00:01
 4:        D        7901               0 2.520186e+13     35715     NA           NA        NA    NA            NA 2018-08-30 2018-08-30 07:00:01
 5:        X        7942               0 2.520186e+13     35662    300           NA        NA    NA            NA 2018-08-30 2018-08-30 07:00:01
 6:        X        7901               0 2.520186e+13     35672    100           NA        NA    NA            NA 2018-08-30 2018-08-30 07:00:01
 7:        D        7901               0 2.520186e+13     35672     NA           NA        NA    NA            NA 2018-08-30 2018-08-30 07:00:01
 8:        D        7942               0 2.520187e+13     35870     NA           NA        NA    NA            NA 2018-08-30 2018-08-30 07:00:01
 9:        X        7901               0 2.520187e+13     35670    400           NA        NA    NA            NA 2018-08-30 2018-08-30 07:00:01
10:        X        7901               0 2.520187e+13     35871   1100           NA        NA    NA            NA 2018-08-30 2018-08-30 07:00:01
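
In ITCH, an 'X' message cancels part of an order (the `shares` column gives the number of shares removed), while a 'D' message deletes the whole order, which is why `shares` is NA for those rows. The sketch below sums the cancelled shares per order reference, using values copied by hand from the output above; it is only an illustration of how the two message types might be handled.

```r
# The ten modification messages shown above, reduced to three columns
mods <- data.frame(
  msg_type  = c("X", "X", "D", "D", "X", "X", "D", "D", "X", "X"),
  order_ref = c(35662, 35672, 35680, 35715, 35662, 35672, 35672, 35870, 35670, 35871),
  shares    = c(600, 100, NA, NA, 300, 100, NA, NA, 400, 1100),
  stringsAsFactors = FALSE
)

# Partial cancellations: total shares removed per order
cancels   <- mods[mods$msg_type == "X", ]
cancelled <- tapply(cancels$shares, cancels$order_ref, sum)
cancelled

# Full deletions: orders removed from the book entirely
deleted <- unique(mods$order_ref[mods$msg_type == "D"])
deleted
```

Order 35672, for example, is cancelled twice (100 shares each time) and then deleted.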

The first 10 trade messages:

> (trades <- get_trades(file, 1, 10))
10 messages found
[Loading] .
[Converting] to data.table
[Formatting]
    msg_type locate_code tracking_number    timestamp order_ref  buy shares stock    price match_number cross_type       date            datetime
 1:        P        7901               2 2.883108e+13         0 TRUE    100  UGAZ   63.212        17346       <NA> 2018-08-30 2018-08-30 08:00:31
 2:        P        7901               4 2.883108e+13         0 TRUE    100  UGAZ   63.211        17347       <NA> 2018-08-30 2018-08-30 08:00:31
 3:        P        1965               2 2.884118e+13         0 TRUE    100  DGAZ   22.158        17348       <NA> 2018-08-30 2018-08-30 08:00:41
 4:        P        7901               2 2.884548e+13         0 TRUE    100  UGAZ   63.150        17349       <NA> 2018-08-30 2018-08-30 08:00:45
 5:        P        8001               2 2.899981e+13         0 TRUE    600  USLV    6.620        17352       <NA> 2018-08-30 2018-08-30 08:03:19
 6:        P         376               2 2.901043e+13         0 TRUE    100  AMZN 2000.900        17353       <NA> 2018-08-30 2018-08-30 08:03:30
 7:        P         376               2 2.920351e+13         0 TRUE      1  AMZN 2000.110        17360       <NA> 2018-08-30 2018-08-30 08:06:43
 8:        P         376               2 2.925918e+13         0 TRUE     10  AMZN 2000.900        17361       <NA> 2018-08-30 2018-08-30 08:07:39
 9:        P        3983               2 3.071357e+13         0 TRUE     50   IGC    1.760        17382       <NA> 2018-08-30 2018-08-30 08:31:53
10:        P        4612               2 3.090364e+13         0 TRUE   1100  KTWO   27.250        17388       <NA> 2018-08-30 2018-08-30 08:35:03
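
As one example of what can be computed from trade messages, the sketch below derives the traded volume and the volume-weighted average price (VWAP) per stock. The rows are copied by hand from the output above so that the block is self-contained; with only ten trades this is purely illustrative.

```r
# The ten trades shown above, reduced to the columns needed here
trades <- data.frame(
  stock  = c("UGAZ", "UGAZ", "DGAZ", "UGAZ", "USLV", "AMZN", "AMZN", "AMZN", "IGC", "KTWO"),
  shares = c(100, 100, 100, 100, 600, 100, 1, 10, 50, 1100),
  price  = c(63.212, 63.211, 22.158, 63.150, 6.620, 2000.900, 2000.110, 2000.900, 1.760, 27.250),
  stringsAsFactors = FALSE
)

# Traded volume and notional value per stock
volume   <- tapply(trades$shares, trades$stock, sum)
notional <- tapply(trades$shares * trades$price, trades$stock, sum)

# VWAP = sum(shares * price) / sum(shares)
vwap <- notional / volume
round(vwap, 3)
```

For AMZN, the single-share trade at 2000.110 barely moves the VWAP away from 2000.900 because it carries almost no weight.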

Finally, use the data to plot the most active ETFs. First, find the four tickers with the most orders:

> library(magrittr)
> # dt is a table object
> dt <- get_orders(file, 1, count_orders(msg_count), quiet = TRUE) %>% .$stock %>% table %>% sort(decreasing = TRUE)
> dt %>% head(4)
.
   SPY    IWO    VXX   URTY 
149949 142919 134074 111515 
> df <- as.data.frame(dt)
> df %>% head(4)
     .   Freq
1  SPY 149949
2  IWO 142919
3  VXX 134074
4 URTY 111515
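
The same count, sort, and take-the-top pattern is shown below on a toy vector in base R, so it can be tried without the 800 MB file. The data are made up. Naming the argument in `table(stock = ...)` is a small variation on the code above that gives the resulting data frame a `stock` column instead of one called `.`.

```r
# Made-up ticker column standing in for orders$stock
stock <- c("SPY", "IWO", "SPY", "VXX", "SPY", "IWO")

# Count orders per ticker and sort from most to least frequent
freq <- sort(table(stock = stock), decreasing = TRUE)
freq

# Names of the two most frequent tickers
top2 <- names(freq)[1:2]
top2

# As a data frame: the named argument becomes the column name
freq_df <- as.data.frame(freq, stringsAsFactors = FALSE)
names(freq_df)
```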

library(ggplot2)
orders <- get_orders(file, 1, count_orders(msg_count))
trades <- get_trades(file, 1, count_trades(msg_count))

(tickers <- as.vector(df$. %>% head(4)))
dt_orders <- orders[stock %in% tickers]
dt_trades <- trades[stock %in% tickers]
# For each ticker, keep only the orders priced within 1% of the range of traded prices
ranges <- dt_trades[, .(min_price = min(price), max_price = max(price)), by = stock]
ranges
# Filter the orders
dt_orders <- dt_orders[ranges, on = "stock"][price >= 0.99 * min_price & price <= 1.01 * max_price]
dt_orders %>% head(4)
# Make the buy variable more readable
dt_orders[, buy := ifelse(buy, "Bid", "Ask")]
dt_orders[, stock := factor(stock, levels = tickers)]

# Plot
ggplot() +
  geom_point(data = dt_orders,
             aes(x = datetime, y = price, color = buy), size = 0.5) +
  geom_step(data = dt_trades,
            aes(x = datetime, y = price)) +
  facet_wrap(~stock, scales = "free_y") +
  theme_light() +
  labs(title = "Orders and Trades of the Most Active ETFs",
       subtitle = "Date: 2018-08-30 | Exchange: BX",
       caption = "Source: NASDAQ",
       x = "Time",
       y = "Price",
       color = "Side") +
  scale_y_continuous(labels = scales::dollar) +
  scale_color_brewer(palette = "Set1")
ggsave("20181015d9.png")
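
The 1% price filter is the least obvious step above, so here it is again on toy data. This sketch deliberately swaps the data.table join for base R `aggregate` and `merge`, so that it runs without any packages; the prices are made up and the variable names are illustrative.

```r
# Made-up orders and trades for two tickers
orders <- data.frame(stock = c("SPY", "SPY", "SPY", "IWO", "IWO"),
                     price = c(290.00, 250.00, 292.50, 210.00, 300.00),
                     stringsAsFactors = FALSE)
trades <- data.frame(stock = c("SPY", "SPY", "IWO", "IWO"),
                     price = c(289.50, 291.00, 209.00, 211.00),
                     stringsAsFactors = FALSE)

# Lowest and highest traded price per ticker
ranges <- merge(aggregate(price ~ stock, data = trades, FUN = min),
                aggregate(price ~ stock, data = trades, FUN = max),
                by = "stock", suffixes = c("_min", "_max"))

# Attach the range to every order, then keep orders within 1% of it
merged <- merge(orders, ranges, by = "stock")
kept <- merged[merged$price >= 0.99 * merged$price_min &
               merged$price <= 1.01 * merged$price_max, ]
kept
```

The SPY order at 250 and the IWO order at 300 are dropped. Orders that far from the traded prices would otherwise stretch the y-axis and flatten the interesting part of the plot.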
