
Dividing one set of columns by another (and naming the results) in dplyr


After a (rather scary) dplyr pipeline, I ended up with a dataset like this:

year   A    B     C  [....] Z  count.A    count.B     count.C [....] count.Z
1999  10    20    10  ...   6      3          5           67    ...      6
2000   3    5      5  ...   7      5          2            5    ...      5

Some example data to reproduce the problem:

df <- data.frame(year = c(1999, 2000), 
                 A = c(10, 20), 
                 B = c(3, 6), 
                 C = c(1, 2), 
                 count.A = c(1, 2), 
                 count.B = c(8, 9), 
                 count.C = c(5, 7))

What I actually need is to divide each column by its "count" counterpart, i.e.

weight.A = A / count.A, 
weight.B = B / count.B

I want to do this programmatically because I have hundreds of columns. Is there a way to do this inside a dplyr pipeline?

4 Answers

  • 1

    Assuming you can build a vector of all the column names programmatically, here is what I did for the example above:

    for (c.name in c("A", "B", "C")) {
        c.weight <- sprintf("weight.%s", c.name)
        c.count <- sprintf("count.%s", c.name)
        df[,c.weight] <- df[,c.name] / df[,c.count]
    }
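The loop above hardcodes `c("A", "B", "C")`. As a minimal sketch of how that vector might be built programmatically, the base names can be derived from the `count.*` columns. This assumes every `count.X` column has a matching `X` column, as in the example data; the variable names `count.cols` and `base.cols` are only illustrative.

```r
df <- data.frame(year = c(1999, 2000),
                 A = c(10, 20), B = c(3, 6), C = c(1, 2),
                 count.A = c(1, 2), count.B = c(8, 9), count.C = c(5, 7))

# Find the count.* columns and strip the prefix to get the base names
count.cols <- grep("^count\\.", names(df), value = TRUE)
base.cols  <- sub("^count\\.", "", count.cols)

# Assumption: each count.X has a matching X column; stop early if not
stopifnot(all(base.cols %in% names(df)))

# Same loop as above, driven by the derived names
for (c.name in base.cols) {
  df[[paste0("weight.", c.name)]] <- df[[c.name]] / df[[paste0("count.", c.name)]]
}
```

With hundreds of columns, the `stopifnot()` check is a cheap way to catch a `count.*` column that has no counterpart before any division happens.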
    
  • 1

    Don't store variables in column names. If you reshape the data so that it is tidy, the calculation is very simple:

    library(tidyverse)
    
    df %>% gather(var, val, -year) %>%    # reshape to long
        separate(var, c('var', 'letter'), fill = 'left') %>%    # extract var from former col names
        mutate(var = coalesce(var, 'value')) %>%    # add name for unnamed var
        spread(var, val) %>%    # reshape back to wide
        mutate(weight = value / count)    # now this is very simple
    
    #>   year letter count value     weight
    #> 1 1999      A     1    10 10.0000000
    #> 2 1999      B     8     3  0.3750000
    #> 3 1999      C     5     1  0.2000000
    #> 4 2000      A     2    20 10.0000000
    #> 5 2000      B     9     6  0.6666667
    #> 6 2000      C     7     2  0.2857143
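In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The following is a sketch of the same reshaping idea with the newer functions, not the answer author's code. It assumes dplyr 1.0.0 or later for `rename_with()`, and it first prefixes the unprefixed columns with `value.` (an illustrative name) so that every column follows a `<variable>.<letter>` pattern.

```r
library(dplyr)
library(tidyr)

df <- data.frame(year = c(1999, 2000),
                 A = c(10, 20), B = c(3, 6), C = c(1, 2),
                 count.A = c(1, 2), count.B = c(8, 9), count.C = c(5, 7))

long <- df %>%
  # Give the unprefixed columns a "value." prefix so all names split on "."
  rename_with(~ paste0("value.", .x), .cols = !c(year, starts_with("count."))) %>%
  # ".value" turns the part before the dot into column names (value, count);
  # the part after the dot becomes the "letter" column
  pivot_longer(-year, names_to = c(".value", "letter"), names_sep = "\\.") %>%
  mutate(weight = value / count)
```

Because `".value"` keeps `value` and `count` as separate columns, no reshape back to wide is needed before dividing.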
    
  • 3

    If your columns are named consistently (and the names are easy to retrieve), you can do this easily with lapply:

    cols <- c("A","B","C")
    df[,paste0("weighted.",cols)] <- lapply(cols, function(x) df[,x] / df[, paste0("count.",x)])
    
    #  year  A B C count.A count.B count.C weighted.A weighted.B weighted.C
    #1 1999 10 3 1       1       8       5         10  0.3750000  0.2000000
    #2 2000 20 6 2       2       9       7         10  0.6666667  0.2857143
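Since the question asks for something that fits in a dplyr pipeline, the same idea can be sketched with `across()`. This is one possible approach, assuming dplyr 1.0.0 or later: `cur_column()` gives the name of the column currently being processed, and `get()` looks up the matching `count.*` column in the data.

```r
library(dplyr)

df <- data.frame(year = c(1999, 2000),
                 A = c(10, 20), B = c(3, 6), C = c(1, 2),
                 count.A = c(1, 2), count.B = c(8, 9), count.C = c(5, 7))

cols <- c("A", "B", "C")

result <- df %>%
  mutate(across(all_of(cols),
                # Divide each column by its count.* counterpart
                ~ .x / get(paste0("count.", cur_column())),
                # Name the new columns weight.A, weight.B, ...
                .names = "weight.{.col}"))
```

The `.names` argument keeps the original columns and adds the new ones beside them, which matches the `weight.A = A / count.A` naming in the question.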
    
  • 2

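    Assuming the columns are ordered, we can use data.table. Specify the columns of interest in .SDcols, divide the first half of the Subset of Data.table (.SD) by the second half, and assign ( := ) the result to new columns: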

    library(data.table)
    setDT(df)[, paste0("weighted.",names(df)[2:4]) := .SD[,1:3]/.SD[,4:6], .SDcols = A:count.C]
    df
    #   year  A B C count.A count.B count.C weighted.A weighted.B weighted.C
    #1: 1999 10 3 1       1       8       5         10  0.3750000  0.2000000
    #2: 2000 20 6 2       2       9       7         10  0.6666667  0.2857143
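The positional indices above depend on the column order. As a hedged alternative sketch, the columns can be paired by name instead, using `mget()` inside `j` to fetch them as lists and `Map()` to divide them pairwise. The names `dt`, `cols`, and `count.cols` are illustrative.

```r
library(data.table)

dt <- data.table(year = c(1999, 2000),
                 A = c(10, 20), B = c(3, 6), C = c(1, 2),
                 count.A = c(1, 2), count.B = c(8, 9), count.C = c(5, 7))

cols       <- c("A", "B", "C")
count.cols <- paste0("count.", cols)

# mget() returns the named columns as lists; Map() divides them pairwise,
# and := assigns the resulting list to the new columns by reference
dt[, paste0("weighted.", cols) := Map(`/`, mget(cols), mget(count.cols))]
```

This version keeps working if the columns are reordered, as long as each name in `cols` has a `count.*` counterpart.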
    
