
Is there a way to reorder a variable's levels after grouping with group_by?


I want to reproduce Figure 4.3 from section 4.1.3 of the book "Text Mining with R", in the part on sentiment analysis.


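That section groups all bigrams by four negation words ("not", "no", "never", and "without") and, for each group, plots the sentiment contribution of the word that follows the negation word. In the book's terms, this is how much each word contributed in the wrong direction.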

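So I plot the words on the y-axis and the contribution on the x-axis, and to make the plot look good I also want the bars in each group sorted in descending order. As in earlier sections of the book, I reorder the levels of the word by its contribution value.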

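The problem is that a word's contribution differs from group to group. For example, in group 1 "happy" appears more often than "hope" and so has a higher contribution, but in group 2 it is the other way round. Worse, I can't do mutate(word2 = reorder(word2, contribution)) while the data frame is still grouped by group_by(word1).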
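
To make the problem concrete, here is a minimal sketch with made-up scores (the data frame `toy` and its values are hypothetical, not taken from the book). A factor has only one set of levels, so `reorder()` has to settle on a single order for the whole column:

```r
# Hypothetical toy data: the same two words scored differently in two groups.
toy <- data.frame(
  grp   = c("A", "A", "B", "B"),
  term  = c("happy", "hope", "happy", "hope"),
  score = c(5, 2, 1, 6)
)

# reorder() ranks the levels by the mean score over ALL rows, ignoring grp:
# "happy" averages 3 and "hope" averages 4.
toy$term <- reorder(toy$term, toy$score)
levels(toy$term)
# → "happy" "hope"
```

That single order is ascending for group B but not for group A, where "hope" (2) should come before "happy" (5).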

The book produces the plot correctly, so I assume there is some way to reorder within each group.

Here is the code. Everything before #preparing the data for plotting is taken from the book, so it should be fine; the code from that point on is mine.

library(dplyr)
library(tidytext)
library(janeaustenr)
library(tidyr)
library(ggplot2)  #needed for the ggplot() calls below

#getting bigrams

austen_bigrams <- austen_books() %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)  
bigrams_separated <- austen_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")  

#four negation words to look at

negation_words <- c("not", "no", "never", "without")
AFINN <- get_sentiments("afinn")

#get the sentiment score of words preceded by the four negation words

negated_words <- bigrams_separated %>%
  filter(word1 %in% negation_words) %>%  #word1 as negation words
  inner_join(AFINN, by = c(word2 = "word")) %>%  #word2 as the word following negation words
  count(word1, word2, score, sort = TRUE) %>%
  ungroup()

#preparing the data for plotting

bigrams_plot <- bigrams_separated %>%
  filter(word1 %in% negation_words) %>% 
  inner_join(AFINN, by = c(word2 = "word")) %>%  #getting sentiment score
  count(word1, word2, score, sort = TRUE) %>%
  mutate(contribution = n * score) %>%  #defining contribution as n*score
  group_by(word1) %>%  #group by negation words
  top_n(12,abs(contribution)) %>%
  arrange(desc(abs(contribution))) %>%
  ungroup() %>%
  mutate(word2 = reorder(word2, contribution)) 

#plotting sentiment score contribution grouped by the four negation words

ggplot(bigrams_plot, aes(word2, n * score, fill = n * score > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~word1, ncol = 2, scales = "free") +
  coord_flip()

I created a simpler version below:

v1_grp <- c(rep('A',10),rep('B',10))
v2_Aterm <- sample(letters[1:10],10,replace=F)
v2_Bterm <- sample(letters[1:10],10,replace=F)
v3_score <- sample(-10:10,20,replace=T)

data1 <- tibble(grp=v1_grp,term=c(v2_Aterm,v2_Bterm),score=v3_score)  #data_frame() is deprecated

dataplot <- data1 %>%
  arrange(desc(score)) %>%
  mutate(term=reorder(term,score)) 

ggplot(dataplot, aes(term,score,fill=score>0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~grp, ncol = 2, scales = "free") +
  coord_flip()

1 Answer

    (Adapted from https://drsimonj.svbtle.com/ordering-categories-within-ggplot2-facets)

    dataplot <- data1 %>%
      arrange(grp, score) %>%
      mutate(order = row_number())
    
    ggplot(dataplot, aes(order,score,fill=score>0)) +
      geom_col(show.legend = FALSE) +
      facet_wrap(~grp, ncol = 2, scales = "free") +
      coord_flip() +
      scale_x_continuous(
        breaks = dataplot$order,
        labels = dataplot$term,
        expand = c(0,0)
      )
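
An alternative that may be worth trying, assuming the installed tidytext version provides `reorder_within()` and `scale_x_reordered()` (the question does not say which version is in use). The sketch below rebuilds the simplified data with a fixed seed; the column name `term_in_grp` and the seed are introduced here only for illustration:

```r
library(dplyr)
library(ggplot2)
library(tidytext)

set.seed(1)  # assumption: fixed seed so the toy data is reproducible
data1 <- tibble(
  grp   = rep(c("A", "B"), each = 10),
  term  = c(sample(letters[1:10]), sample(letters[1:10])),
  score = sample(-10:10, 20, replace = TRUE)
)

# reorder_within() pastes the group onto each term, so every (term, grp)
# pair becomes its own factor level and can be ordered by its own score.
dataplot <- data1 %>%
  mutate(term_in_grp = reorder_within(term, score, grp))

# scale_x_reordered() strips the pasted group suffix from the axis labels.
ggplot(dataplot, aes(term_in_grp, score, fill = score > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~grp, ncol = 2, scales = "free") +
  coord_flip() +
  scale_x_reordered()
```

Compared with the `row_number()` approach, this keeps the axis discrete, so no manual `breaks` and `labels` are needed. `scales = "free"` is still required so that each facet shows only its own levels.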
    

