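Dear Stack Overflow community,
I am still a beginner in R and have run into the following problem, for which I could not find a solution on Stack Overflow or the wider web. It seems straightforward to me, but I don't know what I am missing or which coding convention I am violating. The problem is part of a larger function, but the example below reproduces it.
I have two data frames, a and b, and want to create a new variable foo1 using nested ifelse statements whose conditions are based on elements of both a and b.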
library(dplyr)
a <- data.frame(foo = c("a", "b", "c", "d"), bar = c("e", "f", "g", "h"))
b <- data.frame(foo = c(1, NA, 2, 3), bar = c(1, 2, 3, 4))
a <- mutate(a, foo1 = ifelse(is.na(b$foo[1]), NA,
                             ifelse(a$foo == "a", "a", "f")))
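What I expect, or am looking for, is this: the first ifelse statement checks whether the value in the first row of b$foo is NA. Since it is not in this case, it should move on to the second ifelse statement and give me
a <- data.frame(foo = c("a", "b", "c", "d"), bar = c("e", "f", "g", "h"), foo1 = c("a", "f", "f", "f"))
because the first row of a$foo is "a" and the others (b, c, d) are not.
Instead, what it gives me is
a <- data.frame(foo = c("a", "b", "c", "d"), bar = c("e", "f", "g", "h"), foo1 = c("a", "a", "a", "a"))
It puts "a" in every row of foo1 instead of recognizing that rows 2 to 4 should get the else value "f". This seems to be due to the different dimensions of the ifelse conditions: the first condition is based on a single element, whereas the second should evaluate each row of a$foo separately, which it does not appear to do.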
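A minimal sketch of what seems to be going on, using base R only: ifelse() returns a result shaped like its test, so a length-1 test yields a single value, which is then recycled down the whole column. One possible way around it is a plain if/else for the scalar check, keeping ifelse() for the per-row check. The name short is hypothetical and used only for illustration.

```r
a <- data.frame(foo = c("a", "b", "c", "d"), bar = c("e", "f", "g", "h"),
                stringsAsFactors = FALSE)
b <- data.frame(foo = c(1, NA, 2, 3), bar = c(1, 2, 3, 4))

# The outer test has length 1, so ifelse() returns only ONE value,
# taken from the first element of the inner result.
short <- ifelse(is.na(b$foo[1]), NA, ifelse(a$foo == "a", "a", "f"))
length(short)

# Scalar decision with if/else, per-row decision with ifelse().
a$foo1 <- if (is.na(b$foo[1])) {
  NA_character_                      # recycled to every row
} else {
  ifelse(a$foo == "a", "a", "f")     # one value per row
}
a
```

Because the if/else branch is chosen once and the inner ifelse() is fully vectorised, no row-by-row evaluation is needed.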
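The larger function, not shown here, uses an is.na() condition inside the first ifelse call. However, I suspect the problem is not caused by is.na() but rather by my using two ifelse conditions that refer to elements from two different data frames.
UPDATE: Prem's solution of adding rowwise() to the pipe fixes the problem for the simplified example above, but unfortunately not for the more complex one. The more complex example uses lapply to apply a function to a list of data frames (a, b, c and d). As in the simplified example, it uses a second data frame as a lookup table for the first ifelse statement. Here is the code: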
a <- data.frame(foo=c("a","b","c","d"), bar=c("e","f","g","h"))
b <- data.frame(foo=c("a","b","c","d"), bar=c("e","f","g","h"))
c <- data.frame(foo=c("a","b","c","d"), bar=c("e","f","g","h"))
d <- data.frame(foo=c("a","b","c","d"), bar=c("e","f","g","h"))
LookUp <- data.frame(foo=c(1,NA,2,3), bar=c("a","b","c","d"))
List <- list(a,b,c,d)
names(List) <- c("a","b","c","d")
library(dplyr)
List2 <- lapply(seq_along(List), function(i) {
  a <- filter(LookUp, bar == names(List[i]))
  temp <- List[[i]] %>%
    rowwise() %>%
    mutate(foo1 = ifelse(is.na(LookUp$foo[1]), NA,
                         ifelse(List[[i]]$foo == "a", "a", "f"))) %>%
    data.frame()
})
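Now, for all data frames, every value in the new column foo1 is "a". What I want is foo1 = c("a","f","f","f") for all list elements except list element 2, which should give foo1 = c(NA,NA,NA,NA) because of the first ifelse statement.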
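One likely reason rowwise() does not help here, sketched under the assumption that dplyr is installed: inside mutate(), an expression such as List[[i]]$foo still refers to the entire original column, not to the current row, so each row again receives the first element. Referring to the bare column name foo lets dplyr supply the right values. The data frame df and the results bad and good are hypothetical names for illustration.

```r
library(dplyr)

df <- data.frame(foo = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

# df$foo is the full 4-element column on every row, and the length-1
# test makes ifelse() keep only its first element.
bad <- df %>%
  rowwise() %>%
  mutate(foo1 = ifelse(FALSE, NA, ifelse(df$foo == "a", "a", "f"))) %>%
  as.data.frame()

# The bare name foo is resolved inside the data frame, row by row,
# and no rowwise() is needed.
good <- df %>%
  mutate(foo1 = ifelse(foo == "a", "a", "f"))

bad$foo1
good$foo1
```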
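In addition, some of the data frames in my list are very large, and rowwise() slows the function down considerably. Is there a better/faster way to write my function?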
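A possible faster formulation, offered as a sketch rather than a definitive answer: look up one scalar per list element with match(), decide once with if/else, and let a single vectorised ifelse() handle the rows, so rowwise() is not needed at all. The helper add_foo1 and the variables template and key are hypothetical names. The sketch assumes the NA flag lives in LookUp$foo and the element names in LookUp$bar.

```r
template <- data.frame(foo = c("a", "b", "c", "d"), bar = c("e", "f", "g", "h"),
                       stringsAsFactors = FALSE)
List <- list(a = template, b = template, c = template, d = template)
LookUp <- data.frame(foo = c(1, NA, 2, 3), bar = c("a", "b", "c", "d"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: df is one list element, nm is its name.
add_foo1 <- function(df, nm) {
  key <- LookUp$foo[match(nm, LookUp$bar)]   # one lookup value per element
  df$foo1 <- if (is.na(key)) {
    NA_character_                            # whole column becomes NA
  } else {
    ifelse(df$foo == "a", "a", "f")          # vectorised over rows
  }
  df
}

# Map() passes each data frame together with its name.
List2 <- Map(add_foo1, List, names(List))
List2$a$foo1
List2$b$foo1
```

Passing names(List) as a second argument to Map() avoids indexing by i inside the function.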
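UPDATE 2: I apologize for making this question even more complicated. Prem's second solution, using Map(), works like a charm for the example I gave. Unfortunately, I made a mistake in the more complex example: I wrote is.na(LookUp$foo[1]) in the first ifelse statement instead of is.na(a$foo[1]). Here a is the subsetted lookup table, which stores information about the variable names for each element in the list. However, if I change the code to is.na(a$foo[1]), the Map() solution no longer works, because the function does not specify how to loop through i. I want the code to subset the lookup table differently on each run of the function. The updated value of b$bar should therefore be c(NA,NA,NA,NA). The updated code follows.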
a <- data.frame(foo=c("a","b","c","d"), bar=c("e","f","g","h"))
b <- data.frame(foo=c("a","b","c","d"), bar=c("e","f","g","h"))
c <- data.frame(foo=c("a","b","c","d"), bar=c("e","f","g","h"))
d <- data.frame(foo=c("a","b","c","d"), bar=c("e","f","g","h"))
LookUp <- data.frame(foo=c(1,NA,2,3), bar=c("a","b","c","d"))
List <- list(a,b,c,d)
names(List) <- c("a","b","c","d")
library(dplyr)
List2 <- lapply(seq_along(List), function(i) {
  LookUp2 <- filter(LookUp, bar == names(List[i]))
  temp <- List[[i]] %>%
    rowwise() %>%
    mutate(foo1 = ifelse(is.na(LookUp2$bar[1]), NA,
                         ifelse(List[[i]]$foo == "a", "a", "f"))) %>%
    data.frame()
})
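I tried adding the names as a second vector, which should let me change my function dynamically as suggested in the post "How do I extract the index or name of the list item within FUN of lapply?", but without success. It keeps giving the same row values within and between list elements. Thank you for your help and patience.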
a <- data.frame(foo=c("a","b","c","d"), bar=c("e","f","g","h"))
b <- data.frame(foo=c("a","a","c","d"), bar=c("e","f","g","h"))
c <- data.frame(foo=c("a","a","a","d"), bar=c("e","f","g","h"))
d <- data.frame(foo=c("a","b","a","d"), bar=c("e","f","g","h"))
LookUp <- data.frame(foo=c(1,NA,2,3), bar=c("a","b","c","d"))
List <- list(a,b,c,d)
names(List) <- c("a","b","c","d")
library(dplyr)
List_new <- Map(function(x, name) {
  i = which(LookUp$bar == name)
  Lookup2 <- filter(LookUp, bar == names(List[i]))
  x %>%
    rowwise() %>%
    mutate(foo1 = ifelse(is.na(Lookup2$bar[1]), NA,
                         ifelse(foo == "a", "a", "f")))
}, List, names(List))
List_new
Any help would be greatly appreciated.
1 Answer
Hope this helps!
The output is:
UPDATE: addresses the added requirement
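The following is only a sketch of one way the added requirement might be met, not the answerer's original code. It assumes dplyr is installed and that the NA flag is stored in LookUp$foo (LookUp$bar holds the element names and is never NA). The data frames are stacked with bind_rows(), the lookup value is joined on by name, and one vectorised nested ifelse() fills foo1 without rowwise(). The column names name and key and the object combined are hypothetical.

```r
library(dplyr)

mk <- function(foo) data.frame(foo = foo, bar = c("e", "f", "g", "h"),
                               stringsAsFactors = FALSE)
List <- list(a = mk(c("a", "b", "c", "d")), b = mk(c("a", "a", "c", "d")),
             c = mk(c("a", "a", "a", "d")), d = mk(c("a", "b", "a", "d")))
LookUp <- data.frame(foo = c(1, NA, 2, 3), bar = c("a", "b", "c", "d"),
                     stringsAsFactors = FALSE)

combined <- bind_rows(List, .id = "name") %>%               # stack, keep element name
  left_join(rename(LookUp, name = bar, key = foo), by = "name") %>%
  mutate(foo1 = ifelse(is.na(key), NA_character_,           # element-level check
                       ifelse(foo == "a", "a", "f")))       # row-level check

# Split back into a named list, dropping the helper columns.
List_new <- split(combined[, c("foo", "bar", "foo1")], combined$name)
List_new$b$foo1
List_new$c$foo1
```

Because key has one value per row after the join, both ifelse() tests have the same length, which avoids the length-1 test problem described in the question.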