
R: How to select data from a data frame according to specific rules and add it as new columns to an existing data frame


I have two data frames, df1 and df2.

df1 <- data.frame(x1=c("A35", "A41", "A49"),
                  x2=c(8, 24, 33),
                  x3=c(15, 63, 54))

df2 <- data.frame(y1=c("A35", "A38", "A41", "A41", "A49"),
                  y2 = c(9, 20, 24, 32, 84))

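I want to select rows from df2 according to the following three conditions:

(1) y1 of df2 matches x1 of df1;

(2) y2 of df2 >= x2 of df1;

(3) y2 of df2 <= x3 of df1.

The matching data is added to df1 as new columns. If a row of df1 has more than one match, the additional matches are also added as new columns.

The expected result is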

data.frame(x1=c("A35", "A41", "A49"),
           x2=c(8, 24, 33),
           x3=c(15, 63, 54),
           z1 = c("A35", "A41", ""),
           z2 = c(9, 24,""),
           z3 = c("", "A41", ""),
           z4 = c("", 32, ""))

x1  x2 x3 z1  z2 z3  z4
A35  8 15 A35  9
A41 24 63 A41 24 A41 32
A49 33 54

Thanks in advance!

2 Answers

  • 0

    If I understand your question correctly, this should work:

    ### rbind.fill() used below comes from the plyr package
    library(plyr)

    ### we use the matches to pick our values from df1
    ### we use our conditions to pick our values from df2
    matches <- match(df2$y1,df1$x1)
    matches <- matches[!is.na(matches)]
    condition1 <- df2$y1 %in% df1$x1
    condition2 <- df2$y2[condition1] >= df1$x2[matches]
    condition3 <- df2$y2[condition1] <= df1$x3[matches]
    
    ### i create these tmp variables so you can see step by step
    ### what each line of code is doing
    ### here i am finding the values that meet all the conditions
    ### then i am pulling the associated y2 values
    tmp <- data.frame(x1=df1$x1[matches],y2=df2$y2[condition1])
    tmp <- tmp[condition2&condition3,]
    tmp <- droplevels(tmp)
    
    ### now that we have the values we want
    ### we are organizing the data in the desired output you 
    ### specified. 
    x <- split(tmp[-1], tmp[[1]])
    tmp2 <- data.frame()
    for(i in seq_along(x)){
    
      df <- data.frame(t(unlist(x[[i]], use.names=FALSE)))
      colnames(df) <- seq(1,nrow(x[[i]]))
      tmp2 <- rbind.fill(tmp2,df)
    
    }
    colnames(tmp2) <- paste(rep("z",ncol(tmp2)),1:ncol(tmp2),sep="")
    res <- data.frame(df1[df1$x1 %in% names(x),],tmp2)
    res <- rbind.fill(res,df1[!df1$x1 %in% names(x),])
    
    > res
       x1 x2 x3 z1 z2
    1 A35  8 15  9 NA
    2 A41 24 63 24 32
    3 A49 33 54 NA NA
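
    The filtering step above can also be sketched more compactly. This is only an illustrative alternative, assuming base R's merge() for the join; `joined` and `hits` are names introduced here and are not part of the answer's code.

    ```r
    df1 <- data.frame(x1 = c("A35", "A41", "A49"),
                      x2 = c(8, 24, 33),
                      x3 = c(15, 63, 54))
    df2 <- data.frame(y1 = c("A35", "A38", "A41", "A41", "A49"),
                      y2 = c(9, 20, 24, 32, 84))

    # Inner join on the key columns (condition 1); rows of df2 whose
    # y1 has no partner in df1$x1 are dropped.
    joined <- merge(df1, df2, by.x = "x1", by.y = "y1")

    # Keep only the pairs where y2 lies in the closed interval [x2, x3]
    # (conditions 2 and 3).
    hits <- joined[joined$y2 >= joined$x2 & joined$y2 <= joined$x3, ]
    ```

    `hits` holds one row per (df1 row, df2 row) pair that satisfies all three conditions, in long form; it still has to be spread into z columns as in the loop above.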
    
  • 0

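    I would advise against a data frame whose rows have unequal lengths; a list should be better suited to this purpose.

    I wrote code that does the job, although I am not sure it is the most efficient way.

    First, every row of one data frame has to be compared with every row of the other. This can be done with an apply call nested inside another apply call (basically: for each row in df1, compare with each row in df2), returning the matched values together with their index.

    This is stored in a messy list that contains empty elements for the non-matches. So, after cleaning up the list, the resulting matches can be appended to each individual row of df1 with sapply.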

    df1 <- data.frame(x1=c("A35", "A41", "A49"),
                  x2=c(8, 24, 33),
                  x3=c(15, 63, 54))
    
    df2 <- data.frame(y1=c("A35", "A38", "A41", "A41", "A49"),
                      y2 = c(9, 20, 24, 32, 84))
    
    matches <- apply(df2,1,function(x) apply(df1,1,function(y) 
      if(x[1]==y[1]&&x[2]>=y[2]&&x[2]<=y[3]){
        c(which(df1==x[1]),x[1:2])
      }))
    addedelem <- t(array(unlist(matches), dim=c(3,length(unlist(matches))/3)))
    result <- sapply(1:length(df1$x1), function(x) (c(as.matrix(df1[x,]),t(addedelem[which(addedelem[,1]==x),2:3]))))
    

    The resulting list is what you are looking for. If necessary, you can convert it back into a data frame.

    > result
    [[1]]
    [1] "A35" "8"   "15"  "A35" " 9" 
    
    [[2]]
    [1] "A41" "24"  "63"  "A41" "24"  "A41" "32" 
    
    [[3]]
    [1] "A49" "33"  "54"
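
    One possible way to do that conversion is sketched below. It is a minimal example under stated assumptions, not the code above: it rebuilds the matches with lapply so that y2 is compared numerically (apply() on a data frame with a character column works on a character matrix), then pads every row with NA to a common width. The names `matched`, `padded` and `wide` are hypothetical, and the sketch assumes at least one row of df1 has a match.

    ```r
    df1 <- data.frame(x1 = c("A35", "A41", "A49"),
                      x2 = c(8, 24, 33),
                      x3 = c(15, 63, 54))
    df2 <- data.frame(y1 = c("A35", "A38", "A41", "A41", "A49"),
                      y2 = c(9, 20, 24, 32, 84))

    # For each row of df1, collect the rows of df2 that meet all three conditions.
    matched <- lapply(seq_len(nrow(df1)), function(i) {
      keep <- df2$y1 == df1$x1[i] & df2$y2 >= df1$x2[i] & df2$y2 <= df1$x3[i]
      df2[keep, , drop = FALSE]
    })

    # The row with the most matches decides how many (y1, y2) pairs are needed.
    max_n <- max(vapply(matched, nrow, integer(1)))

    # Flatten each match set to y1, y2, y1, y2, ... and pad with NA.
    padded <- do.call(rbind, lapply(matched, function(m) {
      flat <- c(rbind(as.character(m$y1), as.character(m$y2)))
      c(flat, rep(NA_character_, 2 * max_n - length(flat)))
    }))
    colnames(padded) <- paste0("z", seq_len(ncol(padded)))

    # Bind the padded matches onto df1 as new columns z1, z2, ...
    wide <- cbind(df1, as.data.frame(padded, stringsAsFactors = FALSE))
    ```

    The z columns come out as character because keys and values share one matrix; convert the value columns with as.numeric() if numbers are needed.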
    
