首页 文章

使用列名称的查找表中的值有条件地替换列中的值

提问于
浏览
0

我正在使用两个表:

t1<-data.frame(Name=c("Waldo","Mark","Harold","Earl"),Number=c(1,4,3,9))

t2<-data.frame(Whatever=c("does","not","really","matter","at","all"),Waldo=c(0,1,1,0,0,1),Mark=c(1,0,1,1,0,0),Harold=c(0,1,0,0,0,0),Earl=c(1,1,1,1,0,0),Extra=c("another","column","appearing","in","this","table"))

我想做的是用 t1 中的查找值替换 t2 中的1 . t2 的列名称显示为 t1 中的记录 . t2 中的所有0值应保持不变 .

在我的真实数据中, t2 中有数百列, t1 中有数百行 .

t2 中还有几个列不应受此编码的影响,但应保留在最终输出中 .

有编码的最佳做法吗?

该示例的所需输出如下:

Whatever   Waldo  Mark  Harold  Earl  Extra
does       0      4     0       9     another
not        1      0     3       9     column
really     1      4     0       9     appearing
matter     0      4     0       9     in
at         0      0     0       0     this
all        1      0     0       0     table

先感谢您!

2 回答

  • 1

    这是一个 tidyverse 工作流程,对于此示例可能有点过量,但应该适用于较大的数据集 . 我很复杂,从宽形数据再到长到宽:

    首先,我将 t2 重新整形为长格式并过滤观察1:

    library(tidyverse)
    
    t2 %>%
      gather(key = Name, value = value, -Whatever, -Extra) %>%
      filter(value == 1)
    #>    Whatever     Extra   Name value
    #> 1       not    column  Waldo     1
    #> 2    really appearing  Waldo     1
    #> 3       all     table  Waldo     1
    #> 4      does   another   Mark     1
    #> 5    really appearing   Mark     1
    #> 6    matter        in   Mark     1
    #> 7       not    column Harold     1
    #> 8      does   another   Earl     1
    #> 9       not    column   Earl     1
    #> 10   really appearing   Earl     1
    #> 11   matter        in   Earl     1
    

    然后我使用 t1 加入 t1 ,以防 t2 中的任何观察在 t1 中没有匹配 . 这让我从 t1 获得 Number 列,所以现在我可以从收集中删除 value 列:

    t2 %>%
      gather(key = Name, value = value, -Whatever, -Extra) %>%
      filter(value == 1) %>%
      left_join(t1, by = "Name") %>%
      select(-value)
    #>    Whatever     Extra   Name Number
    #> 1       not    column  Waldo      1
    #> 2    really appearing  Waldo      1
    #> 3       all     table  Waldo      1
    #> 4      does   another   Mark      4
    #> 5    really appearing   Mark      4
    #> 6    matter        in   Mark      4
    #> 7       not    column Harold      3
    #> 8      does   another   Earl      9
    #> 9       not    column   Earl      9
    #> 10   really appearing   Earl      9
    #> 11   matter        in   Earl      9
    

    然后我使用 spread 将其恢复为宽格式 . 请注意,这些函数创建了对它们进行排序的因子,因此最后,扩展列按字母顺序排列 . 如果需要,可以使用 select 更改列的顺序 .

    开始完成过程:

    t2 %>%
      gather(key = Name, value = value, -Whatever, -Extra) %>%
      filter(value == 1) %>%
      left_join(t1, by = "Name") %>%
      select(-value) %>%
      spread(key = Name, value = Number, fill = 0)
    #>   Whatever     Extra Earl Harold Mark Waldo
    #> 1      all     table    0      0    0     1
    #> 2     does   another    9      0    4     0
    #> 3   matter        in    9      0    4     0
    #> 4      not    column    9      3    0     1
    #> 5   really appearing    9      0    4     1
    

    reprex package(v0.2.0)创建于2018-08-14 .

  • 1

    这应该足够灵活,适合您的实际数据集:

    my_function <- function(df, lookup) {
      for(i in names(df)) {
        df[[as.character(i)]][df[[as.character(i)]] == 1] <- lookup$Number[lookup$Name == as.character(i)]
      }
      return(df)
    }
    
    my_function(t2, t1)
    #   Whatever Waldo Mark Harold Earl     Extra
    # 1     does     0    4      0    9   another
    # 2      not     1    0      3    9    column
    # 3   really     1    4      0    9 appearing
    # 4   matter     0    4      0    9        in
    # 5       at     0    0      0    0      this
    # 6      all     1    0      0    0     table
    

相关问题