Rendering a Word Tree!

This is a simple R package that I wrote while riding a train. It draws word trees using the JavaScript files from the Google Charts library.

Installation

devtools::install_github("czxa/gwordtree")

Usage

data(worddf)
gwordtree(word = worddf$word, firstword = "cats")

worddf looks like this:

worddf %>% slice(1:10) %>% knitr::kable(align = "c")
word
cats are better than dogs
cats eat kibble
cats are better than hamsters
cats are awesome
cats are people too
cats eat mice
cats meowing
cats in the cradle
cats eat mice
cats in the cradle lyrics
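
To make concrete what the tree groups together, the following base R sketch counts which word directly follows the root word "cats" in the ten sample sentences above. It only illustrates the grouping idea and makes no claim about how gwordtree() or the Google Charts library is implemented; the helper name next_word_counts is hypothetical.

```r
# Ten sample sentences copied from the worddf preview above.
sentences <- c(
  "cats are better than dogs",
  "cats eat kibble",
  "cats are better than hamsters",
  "cats are awesome",
  "cats are people too",
  "cats eat mice",
  "cats meowing",
  "cats in the cradle",
  "cats eat mice",
  "cats in the cradle lyrics"
)

# Hypothetical helper: tabulate the word that directly follows `root`
# in the sentences that start with `root`.
next_word_counts <- function(sentences, root) {
  tokens <- strsplit(sentences, " ", fixed = TRUE)
  starts_with_root <- vapply(
    tokens,
    function(x) length(x) >= 2 && x[1] == root,
    logical(1)
  )
  followers <- vapply(tokens[starts_with_root], function(x) x[2], character(1))
  sort(table(followers), decreasing = TRUE)
}

counts <- next_word_counts(sentences, "cats")
print(counts)
```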

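English sentences can be passed to gwordtree() directly, but Chinese sentences must first be segmented into words with jiebaR. Here is a rather risqué example: I first scraped the titles of all photo sets on the 妹子图 site, then segmented each title into words, and finally rendered word trees with gwordtree():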

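# First, scrape the photo set titles from the 妹子图 site
library(rvest)
df <- data.frame(
  title = rep(NA, 214 * 27)
)
for (i in 1:214) {
  # If a page fails to load, skip it rather than reusing the previous page
  html <- tryCatch(
    read_html(paste0("https://www.mzitu.com/page/", i, "/")),
    error = function(e) NULL
  )
  if (is.null(html)) next
  for (j in 1:27) {
    # A missing node makes the assignment fail; try() leaves that row as NA
    try(
      html %>%
        html_nodes(css = paste0("#pins > li:nth-child(", j, ") > span:nth-child(2) > a")) %>%
        html_text() -> df$title[(i - 1) * 27 + j]  # one distinct row per (page, item)
    )
  }
}
df <- df %>%
  dplyr::filter(!is.na(title))
# Word segmentation
library(jiebaR)
engine_s <- worker(stop_word = "stopwords.txt",
                   user = "dictionary.txt")

# Join the tokens of each title with single spaces
df$segment <- character(nrow(df))
for (i in seq_len(nrow(df))) {
  df$segment[i] <- paste(segment(df$title[i], engine_s), collapse = " ")
}
# Save the scraped and segmented results
readr::write_rds(df, "mztutitle.rds")

df <- readr::read_rds("mztutitle.rds")
library(gwordtree)
# Keyword 1: 黑丝
gwordtree(word = df$segment, firstword = "黑丝")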
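
The scraping loop above stores one title per (page, item) pair in a preallocated data frame, so each pair needs a row of its own. A minimal sketch, using small hypothetical sizes in place of 214 pages and 27 items, of an index that gives every pair a distinct row, next to a plain product of the two loop counters, which does not:

```r
n_pages <- 4  # hypothetical, small stand-in for 214
n_items <- 3  # hypothetical, small stand-in for 27

# All (page i, item j) pairs visited by the nested loops.
grid <- expand.grid(j = seq_len(n_items), i = seq_len(n_pages))

# Row-major index: page i occupies rows (i - 1) * n_items + 1 to i * n_items.
grid$row_major <- (grid$i - 1) * n_items + grid$j

# Product index: different pairs can collide, e.g. (1, 2) and (2, 1).
grid$product <- grid$i * grid$j

length(unique(grid$row_major))  # one row per pair
length(unique(grid$product))    # fewer distinct rows than pairs
```

With a colliding index, later titles overwrite earlier ones and most of the preallocated rows stay NA, so the row-major form is the safer choice for this kind of loop.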

# Keyword 2: 尤物
gwordtree(word = df$segment, firstword = "尤物")

# Keyword 3: 波霸
gwordtree(word = df$segment, firstword = "波霸")

# Keyword 4: 制服
gwordtree(word = df$segment, firstword = "制服")

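As you can see, a word tree shows how the words in a set of sentences relate to one another. It seems less friendly to Chinese users, though, since Chinese text has to be segmented into words before it can be used.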
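
The extra step for Chinese comes from the input format: in the examples above, each sentence is passed as a single string whose words are separated by spaces. A minimal base R sketch of that shape follows. The token vectors are written by hand and are purely hypothetical; they stand in for the output of a segmenter such as jiebaR.

```r
# Hypothetical, hand-segmented sentences; a real pipeline would get these
# token vectors from a segmenter instead of typing them by hand.
tokens <- list(
  c("猫", "喜欢", "吃", "鱼"),
  c("猫", "喜欢", "睡觉"),
  c("狗", "喜欢", "散步")
)

# Join each token vector into one space-separated string, the same shape
# as the English sentences in worddf.
segmented <- vapply(tokens, paste, character(1), collapse = " ")

# Keep only the sentences that contain the chosen root word as a token.
root <- "猫"
has_root <- vapply(tokens, function(x) root %in% x, logical(1))
segmented[has_root]
```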

Shiny app example:

dir <- system.file("examples", "gwordtree", package = "gwordtree")
setwd(dir)
shiny::shinyAppDir(".")