Removing correct English words from tweets in R

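I am working with Twitter data in R and trying to remove all correctly spelled English words from the tweets. The idea is to look at the colloquial abbreviations, misspellings, and slang used by a particular group of people in the tweets I have collected.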

Example:

tweet <- c("Trying to find the solution frustrated af")

After the operation, I would like to be left with only 'af'.

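I thought about filtering the words against a dictionary (which I would download), but there must be a simpler option. Any solution in Python would also help.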

1 Answer

Here is a solution based on hunspell, a fairly new and interesting package:

# install.packages("hunspell") # uncomment & run if needed
library(hunspell)
tweet <- c("Trying to find the solution frustrated af")
( tokens <- strsplit(tweet, " ")[[1]] )
# [1] "Trying"     "to"         "find"       "the"        "solution"   "frustrated" "af"        
# keep only the tokens that the en_US dictionary does not recognize
tokens[!hunspell_check(tokens, dict = "en_US")]
# [1] "af"
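
Since the question also asks about Python, the same idea (split the tweet into tokens, then keep those a dictionary does not recognize) can be sketched with the standard library alone. This is only a minimal sketch under stated assumptions: the function name `uncommon_tokens` and the tiny `KNOWN_WORDS` set are hypothetical and exist purely for illustration. They are not part of the hunspell answer above, and a real run would need a full English word list in place of the six words shown here.

```python
import re


def uncommon_tokens(tweet, known_words):
    """Return the tokens of `tweet` that are not in `known_words`.

    The lookup is case-insensitive; `known_words` must hold lowercase words.
    """
    # Split on anything that is not a letter or an apostrophe.
    tokens = re.findall(r"[A-Za-z']+", tweet)
    return [t for t in tokens if t.lower() not in known_words]


# Hypothetical, deliberately tiny word list used only for this example.
# In practice, load a complete English dictionary into the set instead.
KNOWN_WORDS = {"trying", "to", "find", "the", "solution", "frustrated"}

tweet = "Trying to find the solution frustrated af"
print(uncommon_tokens(tweet, KNOWN_WORDS))  # ['af']
```

A set is used for the word list so that each lookup takes constant time, which matters once the list holds a full dictionary and the input is a large batch of tweets.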

Answered on 2024-04-20T11:22:25+08:00
