I want to use dplyr, piping, and approx() to interpolate missing values.
Data:
test <- structure(list(site = structure(c(3L, 3L, 3L, 3L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L), .Label = c("lake", "stream", "wetland"), class = "factor"),
depth = c(0L, -3L, -4L, -8L, 0L, -1L, -3L, -5L, 0L, -2L,
-4L, -6L), var1 = c(1L, NA, 3L, 4L, 1L, 2L, NA, 4L, 1L, NA,
NA, 4L), var2 = c(1L, NA, 3L, 4L, NA, NA, NA, NA, NA, 2L,
NA, NA)), .Names = c("site", "depth", "var1", "var2"), class = "data.frame", row.names = c(NA,
-12L))
This code works:
library(tidyverse)
# interpolate missing var1 values for each site using approx()
test_int <- test %>%
  group_by(site) %>%
  mutate_at(vars(c(var1)),
            funs("i" = approx(depth, ., depth, rule=1, method="linear")[["y"]]))
However, the code no longer works if it encounters a grouping (site & var) that does not have at least 2 non-NA values, e.g.,
# here I'm trying to interpolate missing values for var1 & var2
test_int2 <- test %>%
  group_by(site) %>%
  mutate_at(vars(c(var1, var2)),
            funs("i" = approx(depth, ., depth, rule=1, method="linear")[["y"]]))
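R appropriately throws this error: Error in mutate_impl(.data, dots) : Evaluation error: need at least two non-NA values to interpolate.
How can I include a conditional statement or filter so that it only attempts to interpolate where a site has at least 2 non-NA values for the variable, and either skips the remaining cases or returns NA for them?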
1 Answer
This will do what you want...
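The answer's own code is not reproduced here, so the following is only a minimal sketch of one way to add the condition, not necessarily the approach the answer took. It assumes dplyr 1.0.0 or later (for across(), used in place of mutate_at()/funs()), and the helper name approx_or_na is hypothetical. The helper counts the usable (depth, value) pairs in each group and calls approx() only when there are at least two; otherwise it returns NA for every row of that group.

```r
library(dplyr)

# Compact rebuild of the question's data so the sketch runs on its own.
test <- data.frame(
  site  = factor(rep(c("wetland", "lake", "stream"), each = 4)),
  depth = c(0, -3, -4, -8, 0, -1, -3, -5, 0, -2, -4, -6),
  var1  = c(1, NA, 3, 4, 1, 2, NA, 4, 1, NA, NA, 4),
  var2  = c(1, NA, 3, 4, NA, NA, NA, NA, NA, 2, NA, NA)
)

# Hypothetical helper (not from the original answer): linear interpolation
# of y over x, guarded so approx() is never called with < 2 usable points.
approx_or_na <- function(x, y) {
  if (sum(!is.na(x) & !is.na(y)) < 2) {
    # Too few points: return NA for the whole group.
    # (Returning as.numeric(y) instead would keep any observed value.)
    return(rep(NA_real_, length(y)))
  }
  approx(x, y, xout = x, rule = 1, method = "linear")[["y"]]
}

# Interpolate var1 and var2 within each site; results go to var1_i, var2_i.
test_int2 <- test %>%
  group_by(site) %>%
  mutate(across(c(var1, var2),
                ~ approx_or_na(depth, .x),
                .names = "{.col}_i")) %>%
  ungroup()

print(test_int2)
```

With this guard, groups such as lake and stream for var2 (zero and one non-NA values) yield NA in var2_i instead of stopping the whole pipeline with an error. The same guard could also be written inline with if/else inside the lambda; a named helper just keeps the mutate() call readable.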