
Summarising logical values with dplyr, grouped by multiple factors

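I want to group a data frame by two columns (department and product line) and output a new data frame containing, for each department and product line, counts of selected logical conditions. The original data is structured like this: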

product department  line date
apple   A   big      201707
cherry  A   midlle   201609
potato  B   midlle   201801
peach   C   small    201807
pear    B   big      201807

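date is numeric; the other variables are character.

I want to add two columns, x and y, where x indicates that the date falls in 2018 and y indicates that the date is 201807. The result should be grouped by department and line and sorted in descending order. The output data frame should look like this: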

department line    x  y
A          big     0  0
A          midlle  0  0
B          big     1  1
B          midlle  1  0
C          small   1  1

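I tried using dplyr. First, I subset the original data, keeping only the department, line and date columns. Then I used factor() to convert department and line to factors. When I run str(subdata), I can see that department and line are of class factor.

Finally, I used group_by and summarise to build the data frame I want, but the result is not what I expected.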

DF <- subdata %>% 
    group_by(department, line) %>% 
    summarise(x = sum(data$date >= 201800, na.rm = TRUE),
              y = sum(data$date == 201807, na.rm = TRUE))

Am I doing something wrong? I also tried the reshape2 package, but couldn't get what I want with it either. My data has 2936 rows. What I get looks like this:

str(DF)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 4 variables:
$ department    : chr department
$ line :  chr line
$ x : int 220
$ y : int 29

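I suspect the problem lies in how the department and line variables were converted to factors, since after group_by and summarise they show up as "chr" rather than as factors. But I can't work out a solution.

Can anyone help?

3 Answers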

  • 0

    I would suggest creating columns x and y on the original data frame beforehand, using ifelse:

    df$x <- ifelse(df$date > 201800, 1, 0)
    df$y <- ifelse(df$date == 201807, 1, 0)
    

    Now summarise with dplyr:

    library(dplyr)
    df_new <- df %>% group_by(department, line) %>% summarise(X = sum(x), Y = sum(y))
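
    The intermediate 0/1 columns are optional, because sum() counts TRUE as 1. The sketch below sums the comparisons directly inside summarise(). It also hints at one likely reason the question's code returned overall totals: `data$date` always refers to the whole column of another object, while the bare name `date` is evaluated per group. The names `df` and `counts` are illustrative, the threshold `>= 201800` is copied from the question (it would also count dates after 2018), and `.groups = "drop"` assumes dplyr 1.0.0 or later.

    ```r
    library(dplyr)

    # Sample data from the question; date is numeric, as the asker describes
    df <- data.frame(
      product    = c("apple", "cherry", "potato", "peach", "pear"),
      department = c("A", "A", "B", "C", "B"),
      line       = c("big", "midlle", "midlle", "small", "big"),
      date       = c(201707, 201609, 201801, 201807, 201807),
      stringsAsFactors = FALSE
    )

    counts <- df %>%
      group_by(department, line) %>%
      summarise(
        # Bare `date` is the current group's values; `df$date` would be all rows
        x = sum(date >= 201800, na.rm = TRUE),
        y = sum(date == 201807, na.rm = TRUE),
        .groups = "drop"  # return an ungrouped tibble
      )

    counts
    ```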
    
  • 0

    Try this:

    library(tidyverse)
     df<-data.frame(product=as.character(c("apple","cherry","potato","peach","pear")),
                     department=as.character(c("A","A","B","C","B")),
                     line=c("big","midlle","midlle","small","big"),
                     date=as.character(c("201707","201609","201801","201807","201807")))
    
     df%>%
       mutate(yr= as.numeric(str_sub(date,1,4)),
              x=ifelse(yr==2018,1,0),
              y=ifelse(date=="201807",1,0))%>%
       group_by(department,line)%>%
       summarise(x=sum(x,na.rm = T),
                 y=sum(y,na.rm = T))
    # A tibble: 5 x 4
    # Groups:   department [?]
      department line       x     y
      <fct>      <fct>  <dbl> <dbl>
    1 A          big        0     0
    2 A          midlle     0     0
    3 B          big        1     1
    4 B          midlle     1     0
    5 C          small      1     1
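
    The question states that date is numeric, so the year can also be obtained without converting to character. A minimal sketch, assuming dates are always six-digit YYYYMM numbers: integer division by 100 drops the month. The names `df`, `yr` and `counts` are illustrative only.

    ```r
    library(dplyr)

    # Sample data from the question, with date kept numeric
    df <- data.frame(
      department = c("A", "A", "B", "C", "B"),
      line       = c("big", "midlle", "midlle", "small", "big"),
      date       = c(201707, 201609, 201801, 201807, 201807)
    )

    counts <- df %>%
      mutate(yr = date %/% 100) %>%  # 201807 %/% 100 is 2018
      group_by(department, line) %>%
      summarise(
        x = sum(yr == 2018, na.rm = TRUE),      # rows dated in 2018
        y = sum(date == 201807, na.rm = TRUE),  # rows dated 201807
        .groups = "drop"
      )

    counts
    ```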
    
  • 0

    Here is a different approach using grepl:

    library(tidyverse)
    
    result <- data %>% 
      group_by(department, line) %>% 
      # sum() collapses each group to one count; "^" anchors the match at the start of date
      summarise(x = sum(as.numeric(grepl("^2018", date))),
                y = sum(as.numeric(grepl("^201807", date))))
    
    result
    ## A tibble: 5 x 4
    ## Groups:   department [?]
    #  department line       x     y
    #  <fct>      <fct>  <dbl> <dbl>
    #1 A          big        0     0
    #2 A          midlle     0     0
    #3 B          big        1     1
    #4 B          midlle     1     0
    #5 C          small      1     1
    

    Data:

    data <- read.table(header = TRUE, text = "
                   product department  line date
        apple   A   big      201707
        cherry  A   midlle   201609
        potato  B   midlle   201801
        peach   C   small    201807
        pear    B   big      201807")
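
    The question also asks for the result in descending order but does not say by which column. As a sketch under the assumption that the counts should drive the ordering (x first, then y), arrange() with desc() can be applied to the summary table. The `counts` data frame below simply restates the expected output for the sample data; `sorted` is an illustrative name.

    ```r
    library(dplyr)

    # Summary table for the sample data (same values as the expected output)
    counts <- data.frame(
      department = c("A", "A", "B", "B", "C"),
      line       = c("big", "midlle", "big", "midlle", "small"),
      x          = c(0, 0, 1, 1, 1),
      y          = c(0, 0, 1, 0, 1)
    )

    # Assumption: sort by x descending, breaking ties by y descending
    sorted <- counts %>%
      arrange(desc(x), desc(y))

    sorted
    ```

    To sort by the grouping columns instead, replace the arrange() call with arrange(desc(department), desc(line)).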
    
