dplyr / tidyr - Summarise data with conditions

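Problem I am trying to use dplyr and tidyr to produce an output table (a contingency table, I think) that summarises this data into frequencies (e.g. the number of titles, descriptions and bodies that are negative, neutral and positive). I have tried a number of different approaches, and the closest example I can find is Using Tidyr/Dplyr to summarise counts of groups of strings. But that doesn't quite fit.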

Example Data The data looks a bit like this...

df <- data.frame( "story_title"=c(0.0,0.0,0.0,-1.0,1.0),
                  "story_description"=c(-0.3,-0.3,-0.3,0.5,0.3),
                  "story_body"=c(-0.3,0.2,0.4,0.2,0))

Desired Output The output would ideally look like this, showing summary frequencies for each story part...

                   Negative  Neutral  Positive
story_title               1        3         1
story_description         3        0         2
story_body                1        1         3

(Edited the story_body totals - thanks Akrun)

Attempted Approach

If I'm right, the first step is to use gather to reshape the data...

df <- df %>% gather(type,score,starts_with("story"))

> df 
                type score
1        story_title   0.0
2        story_title   0.0
3        story_title   0.0
4        story_title  -1.0
5        story_title   1.0
6  story_description  -0.3
7  story_description  -0.3
8  story_description  -0.3
9  story_description   0.5
10 story_description   0.3
11        story_body  -0.3
12        story_body   0.2
13        story_body   0.4
14        story_body   0.2
15        story_body   0.0

From here I think it needs a combination of group_by and summarise, so I tried...

df %>% group_by(sentiment) %>%
          summarise(Negative = count("sentiment_title"<0),
                    Neutral  = count("sentiment_title"=0),
                    Positive  = count("sentiment_title">0)
                   )

Obviously this didn't work.

Can anyone help with a dplyr/tidyr solution (a base table answer would also be useful as an example)?

3 Answers

  • 1

    Try

    library(dplyr)
    library(tidyr)
    gather(df) %>% 
          group_by(key,value= sign(value))%>%
          tally()  %>% 
          mutate(ind= factor(value, levels=c(-1,0,1), 
                        labels=c('Negative', 'Neutral', 'Positive'))) %>% 
          select(-value) %>% 
          spread(ind, n, fill=0)
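
    As a side note, gather() and spread() are superseded in tidyr 1.0.0 and later in favour of pivot_longer() and pivot_wider(). Below is a minimal sketch of the same idea with the newer verbs, assuming a recent tidyr and dplyr are installed. The column names type, score and sentiment and the object res are illustrative choices, not part of the answer above.

    ```r
    library(dplyr)
    library(tidyr)

    # Example data from the question
    df <- data.frame(story_title       = c(0.0, 0.0, 0.0, -1.0, 1.0),
                     story_description = c(-0.3, -0.3, -0.3, 0.5, 0.3),
                     story_body        = c(-0.3, 0.2, 0.4, 0.2, 0))

    res <- df %>%
      # wide -> long: one row per (story part, score)
      pivot_longer(everything(), names_to = "type", values_to = "score") %>%
      # sign() maps each score to -1, 0 or 1, which is then labelled
      mutate(sentiment = factor(sign(score), levels = c(-1, 0, 1),
                                labels = c("Negative", "Neutral", "Positive"))) %>%
      # count rows per story part and sentiment
      count(type, sentiment) %>%
      # long -> wide: one column per sentiment, missing combinations become 0
      pivot_wider(names_from = sentiment, values_from = n,
                  values_fill = list(n = 0L))

    res
    ```

    The named-list form of values_fill is used here because it should work both in tidyr 1.0.0 and in later releases.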
    
  • 2

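    Try using cut to relabel the values into the three categories. After that, it is just a matter of melting the data with gather and reshaping it to 'wide' with dcast.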

    library(tidyr)
    library(reshape2)
    df[] <- lapply(df, function(x) {cut(x, c(-Inf,-1e-4,0,Inf), c("Negative", "Neutral", "Positive"))})
    dcast(gather(df), key~value)
    #            key Negative Neutral Positive
    #1       story_title        1       3        1
    #2 story_description        3       0        2
    #3        story_body        1       1        3
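
    To see how those break points bin the scores, here is a small check on a few hand-picked values (the vector x is made up for illustration). With cut's default right = TRUE, the intervals are (-Inf, -1e-4], (-1e-4, 0] and (0, Inf], so a tiny negative score between -1e-4 and 0 would be labelled Neutral. That does not affect the example data.

    ```r
    # Hypothetical scores, including one tiny negative value
    x <- c(-0.3, 0, 0.2, -0.00005)

    labs <- cut(x,
                breaks = c(-Inf, -1e-4, 0, Inf),
                labels = c("Negative", "Neutral", "Positive"))

    as.character(labs)
    # "Negative" "Neutral" "Positive" "Neutral"
    ```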
    
  • 1

    Why not just use base R's xtabs?

    Following on from your code:

    > df <- df %>% gather(type, score, starts_with("story"))
    > df$movement <- ifelse(df$score == 0, "Neutral", ifelse(df$score < 0, "Negative", "Positive"))
    > xtabs(~df$type + df$movement)
    
                          df$movement
      df$type             Negative Neutral Positive
      story_title              1       3        1
      story_description        3       0        2
      story_body               1       1        3
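
    Since the question also asks for a base answer, here is a sketch that avoids gather as well, assuming the original wide df from the question. stack() and table() are base R; the column name sentiment and the object tab are illustrative.

    ```r
    # Example data from the question
    df <- data.frame(story_title       = c(0.0, 0.0, 0.0, -1.0, 1.0),
                     story_description = c(-0.3, -0.3, -0.3, 0.5, 0.3),
                     story_body        = c(-0.3, 0.2, 0.4, 0.2, 0))

    # stack() reshapes to long form: 'values' holds the scores,
    # 'ind' holds the source column name as a factor in column order
    long <- stack(df)

    # sign() gives -1, 0 or 1; label those as the three categories
    long$sentiment <- factor(sign(long$values), levels = c(-1, 0, 1),
                             labels = c("Negative", "Neutral", "Positive"))

    # Cross-tabulate story part against sentiment
    tab <- table(long$ind, long$sentiment)
    tab
    ```

    Because sentiment is a factor with all three levels declared, a category with no observations (Neutral for story_description) still appears as a 0 column entry.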
    
