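This question has two parts, and an answer to either part would be an acceptable solution. I would be very grateful if suggestions could be shown as R code.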
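1) The NRC lexicon in the syuzhet package produces the widest range of emotions, but it does not seem to handle negators. After reading the documentation I am still unsure how to get around this. Perhaps it could be done by multiplying the non-zero (positively or negatively coded) word values within each sentence, e.g. I(0) AM(0) NOT(-1) ANGRY(-1) = (-1 * -1) = 1. However, I don't know how to write this in proper code.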
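A minimal base-R sketch of the idea in 1), under stated assumptions. Multiplying every non-zero value in a sentence gives the wrong sign once a sentence has two words of the same polarity ("bad and sad" would become (-1 * -1) = +1), so this variant instead flips the sign of each sentiment word once per negator found in the few words before it, then sums. `toy_lexicon`, `negators`, `score_sentence()` and the 3-word window are hypothetical illustrations, not part of syuzhet; with real data you would fill the lexicon from NRC's positive/negative word lists. Only positive/negative polarity is flipped, since discrete emotions such as anger or joy have no obvious opposite, and amplifiers like "very" are ignored.

```r
# HYPOTHETICAL toy lexicon standing in for NRC's positive/negative lists
toy_lexicon <- c(happy = 1, love = 1, bad = -1, hate = -1, angry = -1)
# HYPOTHETICAL negator list; extend it for real text
negators <- c("not", "never", "no", "isn't", "don't")

# Score one sentence: each sentiment word's polarity is multiplied by -1
# once per negator found in the `window` words before it, then summed.
score_sentence <- function(sentence, lexicon = toy_lexicon,
                           negs = negators, window = 3) {
  # Lower-case and split on anything that is not a letter or apostrophe
  words <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  words <- words[words != ""]
  total <- 0
  for (i in seq_along(words)) {
    w <- words[i]
    if (!(w %in% names(lexicon))) next   # not a sentiment word
    n_neg <- 0
    if (i > 1) {
      # Words preceding the sentiment word, at most `window` of them
      before <- words[max(1, i - window):(i - 1)]
      n_neg <- sum(before %in% negs)
    }
    total <- total + lexicon[[w]] * (-1)^n_neg
  }
  total
}

MySentiments <- c("I am happy", "I am very happy", "I am not happy",
                  "It was bad", "It is never bad", "I love it", "I hate it")
vapply(MySentiments, score_sentence, numeric(1), USE.NAMES = FALSE)
# returns c(1, 1, -1, -1, 1, 1, -1)
```

Unlike the NRC output, "I am not happy" and "It is never bad" now get the opposite sign of their sentiment word, which matches the direction of the sentimentr results.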
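2) After a lot of research and testing, I found that the jockers_rinker lexicon in sentimentr handles negators and modifiers better (https://github.com/trinker/sentimentr#comparing-sentimentr-syuzhet-meanr-and-stanford). I could use sentimentr as a "quality test" for the syuzhet/NRC results by comparing the binary (positive/negative) sentiment output of the two packages: if they diverge too much, NRC is not accurate enough for that particular text. However, I only know how to get the individual sentence scores, not a total score for each polarity (the sum of the positive scores and the sum of the negative scores).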
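A minimal base-R sketch for 2). It assumes a data frame shaped like the output of `sentimentr::sentiment()` and NRC `positive`/`negative` columns; both are hard-coded here with the values from the outputs in this post, so replace them with the real `sentiment()` and `get_nrc_sentiment()` results. It sums the positive and negative sentence scores separately, collapses sentences to one score per input element, and compares signs to get an agreement rate. The 0.8 cut-off is a hypothetical threshold.

```r
# Values copied from the outputs in this post (sketch only)
sr <- data.frame(
  element_id  = 1:7,
  sentence_id = 1L,
  word_count  = c(3, 4, 4, 3, 4, 3, 3),
  sentiment   = c(0.4330127, 0.6750000, -0.3750000, -0.4330127,
                  0.3750000, 0.4330127, -0.4330127)
)
nrc <- data.frame(positive = c(1, 1, 1, 0, 0, 1, 0),
                  negative = c(0, 0, 0, 1, 1, 0, 1))

# Totals over all sentences: sum of positive scores, sum of negative scores
pos_total <- sum(sr$sentiment[sr$sentiment > 0])
neg_total <- sum(sr$sentiment[sr$sentiment < 0])
net_total <- pos_total + neg_total

# One score per input element (needed when an element has several sentences)
by_element <- tapply(sr$sentiment, sr$element_id, sum)
sr_sign  <- sign(as.numeric(by_element))
nrc_sign <- sign(nrc$positive - nrc$negative)

# Share of elements where both packages agree on the polarity
agreement <- mean(sr_sign == nrc_sign)
disagree  <- which(sr_sign != nrc_sign)

threshold <- 0.8  # HYPOTHETICAL cut-off; tune it for your corpus
if (agreement < threshold) {
  message("NRC disagrees with sentimentr on elements: ",
          paste(disagree, collapse = ", "))
}
```

With these values `pos_total` is about 1.916, `neg_total` about -1.241, and the two packages disagree on elements 3 ("I am not happy") and 5 ("It is never bad"), an agreement of 5/7. On the NRC side, `colSums()` on the data frame returned by `get_nrc_sentiment()` gives one total per emotion column; sentimentr's `sentiment_by()` can also return per-element averages directly.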
Below you can see how my test results compare on a vector of sentences that express sentiment with and without modifiers and negators.
# syuzhet:
library("syuzhet")
MySentiments <- c("I am happy", "I am very happy", "I am not happy",
                  "It was bad", "It is never bad", "I love it", "I hate it")
# One row per sentence: counts for the eight NRC emotions plus negative/positive
get_nrc_sentiment(MySentiments, cl = NULL, language = "english")
#Result:
anger anticipation disgust fear joy sadness surprise trust negative positive
0 1 0 0 1 0 0 1 0 1
0 1 0 0 1 0 0 1 0 1
0 1 0 0 1 0 0 1 0 1
1 0 1 1 0 1 0 0 1 0
1 0 1 1 0 1 0 0 1 0
0 0 0 0 1 0 0 0 0 1
1 0 1 1 0 1 0 0 1 0
# sentimentr:
library("sentimentr")
MySentiments <- c("I am happy", "I am very happy", "I am not happy",
                  "It was bad", "It is never bad", "I love it", "I hate it")
# Sentence-level polarity, adjusted by valence shifters (negators, amplifiers, etc.)
sentiment(MySentiments,
          polarity_dt = lexicon::hash_sentiment_jockers_rinker,
          valence_shifters_dt = lexicon::hash_valence_shifters,
          hyphen = "", amplifier.weight = 0.8, n.before = 5, n.after = 2,
          question.weight = 1, adversative.weight = 0.25,
          neutral.nonverb.like = FALSE, missing_value = NULL)
#Results:
element_id sentence_id word_count sentiment
1 1 3 0.4330127
2 1 4 0.6750000
3 1 4 -0.3750000
4 1 3 -0.4330127
5 1 4 0.3750000
6 1 3 0.4330127
7 1 3 -0.4330127
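The first (syuzhet/NRC) output does not seem to recognize the effect of "very", "not", and "never": "I am happy", "I am very happy" and "I am not happy" get identical scores, as do "It was bad" and "It is never bad".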