
Implicit sorting in tidyr::spread and dplyr::summarize

My data are ordered observations, and I want to preserve that order as far as possible while manipulating them.

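Taking the answer to this question as an example, I put "B" before "A" in the data frame. The resulting wide data comes back sorted by the "name" column, i.e. "A" first, then "B".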

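library(tidyr)   # gather(), unite(), spread()
library(dplyr)   # %>%, arrange(), group_by(), summarise()

# "B" deliberately comes before "A" in the input
df = data.frame(name=c("B","B","A","A"),
                group=c("g1","g2","g1","g2"),
                V1=c(10,40,20,30),
                V2=c(6,3,1,7))

# long format, combine variable and group into one key, then back to wide
gather(df, Var, Val, V1:V2) %>% 
unite(VarG, Var, group) %>% 
spread(VarG, Val)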

  name V1_g1 V1_g2 V2_g1 V2_g2
1    A    20    30     1     7
2    B    10    40     6     3

Is there a way to keep the original order? Like this:

  name V1_g1 V1_g2 V2_g1 V2_g2
1    B    10    40     6     3
2    A    20    30     1     7

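04/02 edit: I just found that dplyr::summarise sorts as well. arrange(match(name, df$name)) still restores the order, but I wonder whether this extra sorting is really required by the design of the packages.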

df %>% 
  group_by(name) %>% 
  summarise(n())

  name n()
1    A   2
2    B   2

2 Answers

  • 8

    You can sort by name according to the order in which the names appear in the original data frame:

    gather(df, Var, Val, V1:V2) %>% 
      unite(VarG, Var, group) %>% 
      spread(VarG, Val) %>%
      arrange(match(name, df$name))   # sort key: position of first appearance in df
    
    #   name V1_g1 V1_g2 V2_g1 V2_g2
    # 1    B    10    40     6     3
    # 2    A    20    30     1     7
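
The idea behind the match() call can be sketched in base R, without tidyr or dplyr. This is only an illustration: `sorted`, `first_seen` and `restored` are hypothetical names, and `sorted` is a hand-written stand-in for a result that came back ordered by name.

```r
# Original data from the question; `name` appears in the order B, B, A, A.
df <- data.frame(name  = c("B", "B", "A", "A"),
                 group = c("g1", "g2", "g1", "g2"),
                 V1    = c(10, 40, 20, 30),
                 V2    = c(6, 3, 1, 7))

# Hypothetical stand-in for a result that was returned sorted by name.
sorted <- data.frame(name = c("A", "B"), n = c(2, 2))

# match() gives, for each name, the position of its first occurrence in df$name.
first_seen <- match(sorted$name, df$name)

# Ordering the rows by those positions restores the order of first appearance.
restored <- sorted[order(first_seen), ]
print(restored)
```

Base R row indexing needs order() to turn the key into row positions, whereas arrange() takes the sort key itself.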
    
  • 10

    The order is taken from the order of the factor levels.

    str(df)
    'data.frame':   4 obs. of  4 variables:
     $ name : Factor w/ 2 levels "A","B": 2 2 1 1
     $ group: Factor w/ 2 levels "g1","g2": 1 2 1 2
     $ V1   : num  10 40 20 30
     $ V2   : num  6 3 1 7
    

    Note that the levels are "A", "B".

    So if you set the order of the levels to the order in which the values appear, it works:

    df = data.frame(name=c("B","B","A","A"),
                    group=c("g1","g2","g1","g2"),
                    V1=c(10,40,20,30),
                    V2=c(6,3,1,7))
    
    df %>% 
        mutate(name = factor(name,levels=unique(name))) %>% 
        mutate(group = factor(group,levels=unique(group))) %>% 
        gather(Var, Val, V1:V2) %>% 
        unite(VarG, Var, group) %>% 
        spread(VarG, Val)
    

    The result is:

      name V1_g1 V1_g2 V2_g1 V2_g2
    1    B    10    40     6     3
    2    A    20    30     1     7
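
The effect of the levels argument can be checked in isolation with a small base R sketch. The variable names here are hypothetical and table() is used only as a convenient example of a function that follows level order. Also note that since R 4.0.0 data.frame() no longer converts strings to factors by default, so str(df) may report chr columns rather than the Factor columns shown above.

```r
x <- c("B", "B", "A", "A")

# Default: the levels are the sorted unique values, whatever the input order.
default_levels <- levels(factor(x))

# Explicit levels, given in order of first appearance.
appearance_levels <- levels(factor(x, levels = unique(x)))

# Functions that iterate over the levels follow that order, e.g. table().
counts <- table(factor(x, levels = unique(x)))

print(default_levels)
print(appearance_levels)
print(names(counts))
```

The same conversion placed ahead of group_by() should keep the summarise() output in B, A order as well, since groups on a factor are ordered by level.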
    
