dplyr: mutate multiple columns based on names stored in vectors

I want to multiply two columns with each other using dplyr's mutate function.

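However, instead of writing a new line for every mutate condition, I would like to use the column names stored in the vectors var1 and var2. For example, in the end I want to add a column named result1 to the existing bankdata that contains the result of multiplying the cash and loans columns. This continues until 3 new columns have been created.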

Reproducible code:

bankname <- c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E")
bankid <- c(1, 2, 3, 4, 5)
year <- c(1881, 1881, 1881, 1881, 1881)
totass <- c(244789, 195755, 107736, 170600, 32000000)
cash <- c(7250, 10243, 13357, 35000, 351266)
bond <- c(20218, 185151, 177612, 20000, 314012)
loans <- c(29513, 2800, NA, 5000, NA)
bankdata <- data.frame(bankname, bankid, year, totass, cash, bond, loans)

The vectors var1 and var2 contain the names of the columns I want to multiply ( cash*loans, bond*cash, loans*bankid ), and output holds the names of the new columns:

var1 <- c("cash", "bond", "loans")
var2 <- c("loans","cash", "bankid")
output <- c("result1", "result2", "result3")

I would like to do something like this:

bankdata %>%
  mutate_at(.funs = funs(output = var1*var2), vars(var1, var2))

bankdata %>%
  mutate_at(.funs = funs(result1 = cash*., result2 = bond*., result3 = loans*.), vars(var2))

1 Answer

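    Using the tidyeval approach, we build a function that takes strings as input and creates the new column. Note the use of rlang::sym and !! (bang bang): sym turns each string into a symbol, and !! unquotes that symbol inside mutate so it is treated as a column name.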

    After that, we can use purrr::pmap_dfc to loop over var1, var2 and output in parallel and create the new columns, whose names are given by output.

    library(tidyverse)
    
    bankname <- c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E")
    bankid <- c(1, 2, 3, 4, 5)
    year <- c(1881, 1881, 1881, 1881, 1881)
    totass <- c(244789, 195755, 107736, 170600, 32000000)
    cash <- c(7250, 10243, 13357, 35000, 351266)
    bond <- c(20218, 185151, 177612, 20000, 314012)
    loans <- c(29513, 2800, NA, 5000, NA)
    bankdata <- data.frame(bankname, bankid, year, totass, cash, bond, loans)
    
    originalNames <- names(bankdata)
    var1   <- c("cash", "bond", "loans")
    var2   <- c("loans","cash", "bankid")
    output <- c("result1", "result2", "result3")
    
    my_mutate <- function(df, var1, var2, output) {      
      var1   <- rlang::sym(var1)
      var2   <- rlang::sym(var2)
      output <- rlang::sym(output)
    
      df <- df %>% 
        mutate(!! output := !! var1 * !! var2)
    
      return(df)
    }
    
    # test
    my_mutate(bankdata, var1[1], var2[1], output[1])
    
    #>   bankname bankid year   totass   cash   bond loans   result1
    #> 1   Bank A      1 1881   244789   7250  20218 29513 213969250
    #> 2   Bank B      2 1881   195755  10243 185151  2800  28680400
    #> 3   Bank C      3 1881   107736  13357 177612    NA        NA
    #> 4   Bank D      4 1881   170600  35000  20000  5000 175000000
    #> 5   Bank E      5 1881 32000000 351266 314012    NA        NA
    
    # loop through 3 lists simultaneously 
    # keep only original and result* columns
    pmap_dfc(list(var1, var2, output), ~ my_mutate(bankdata, ..1, ..2, ..3)) %>% 
      select(!! originalNames, starts_with("result"))
    
    #>   bankname bankid year   totass   cash   bond loans   result1      result2
    #> 1   Bank A      1 1881   244789   7250  20218 29513 213969250    146580500
    #> 2   Bank B      2 1881   195755  10243 185151  2800  28680400   1896501693
    #> 3   Bank C      3 1881   107736  13357 177612    NA        NA   2372363484
    #> 4   Bank D      4 1881   170600  35000  20000  5000 175000000    700000000
    #> 5   Bank E      5 1881 32000000 351266 314012    NA        NA 110301739192
    #>   result3
    #> 1   29513
    #> 2    5600
    #> 3      NA
    #> 4   20000
    #> 5      NA
    

    Created on 2018-04-18 by the reprex package (v0.2.0).
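As an alternative sketch that is not part of the original answer: the same three columns can be added with a plain loop and the `.data` pronoun, so that only one data frame is built up step by step. This may be useful on newer tidyverse releases, where binding several full data frames column-wise can rename the duplicated columns and a later `select()` on the original names might not find them. The sketch assumes a dplyr version that supports `.data[[...]]` and `:=`; the names `result`, `lhs_col`, `rhs_col` and `new_col` are illustrative, and the data frame is a trimmed copy of `bankdata` with only the numeric columns involved.

```r
library(dplyr)

# Trimmed copy of the question's data (only the columns that are multiplied)
bankdata <- data.frame(
  bankid = c(1, 2, 3, 4, 5),
  cash   = c(7250, 10243, 13357, 35000, 351266),
  bond   = c(20218, 185151, 177612, 20000, 314012),
  loans  = c(29513, 2800, NA, 5000, NA)
)

var1   <- c("cash", "bond", "loans")
var2   <- c("loans", "cash", "bankid")
output <- c("result1", "result2", "result3")

# Start from the original data and add one product column per iteration
result <- bankdata
for (i in seq_along(output)) {
  lhs_col <- var1[i]    # name of the first factor column
  rhs_col <- var2[i]    # name of the second factor column
  new_col <- output[i]  # name of the column to create

  # .data[[name]] looks a column up by its string name;
  # !!new_col := ... sets the name of the new column from a string
  result <- result %>%
    mutate(!!new_col := .data[[lhs_col]] * .data[[rhs_col]])
}

print(result)
```

Because each iteration works on the accumulated `result`, no columns are duplicated and no `select()` step is needed afterwards. Rows where either factor is `NA` stay `NA`, as in the output shown in the answer.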
