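I am running multiple imputation with mice, but I was surprised to see that original values in variables that had no NA were changed and distorted.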

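See below for a reproducible example. I use mtcars (base R) and insert random NAs into two of its columns, disp and hp. I flag the positions of those NAs. Then I impute the missing values and compare the result with the original values. Finally, I plot the results in a scatter plot: original vs. imputed values. I expect the original values to match the imputed values wherever there was no NA, because no imputation should happen there. But this is not the case. The code and plots are below: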

library(data.table)
library(ggplot2)
library(ggthemes)  # provides theme_economist()
library(mice)
data(mtcars)
setDT(mtcars)
dim(mtcars)
# 32 11
# Keep an untouched copy for the comparison later on
mtcars_original <- copy(mtcars)
# Set hp and disp to NA in 7 randomly sampled rows each
mtcars[as.numeric(sample(row.names(mtcars), 7)), ]$hp <- NA
mtcars[as.numeric(sample(row.names(mtcars), 7)), ]$disp <- NA
# Flag the NA positions (1 = missing, 0 = observed)
mtcars[, `:=`(hp_NA = ifelse(is.na(hp), 1, 0), disp_NA = ifelse(is.na(disp), 1, 0))]
# Impute with mice defaults and take the first completed data set
mtcars_imputed <- complete(mice(mtcars))
mtcars_imputed$disp_original <- mtcars_original$disp
mtcars_imputed$hp_original <- mtcars_original$hp

# Original vs. imputed disp, colored by whether the value was missing
ggplot(mtcars_imputed, aes(x = disp_original, y = disp, color = as.factor(disp_NA))) +
  geom_point(size = 2) + ggtitle("Match between original and imputed values \n disp") +
  geom_smooth(method = "lm", color = "red", alpha = 0.3, size = 2) + theme_economist()

# Original vs. imputed hp, colored by whether the value was missing
ggplot(mtcars_imputed, aes(x = hp_original, y = hp, color = as.factor(hp_NA))) +
  geom_point(size = 2) + ggtitle("Match between original and imputed values \n hp") +
  geom_smooth(method = "lm", color = "red", alpha = 0.3, size = 2) + theme_economist()

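[Figure: scatter plot of disp_original vs. disp, "Match between original and imputed values, disp"]

[Figure: scatter plot of hp_original vs. hp, "Match between original and imputed values, hp"]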
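
The expectation behind the plots can also be checked numerically instead of visually. The following is only a minimal sketch under stated assumptions: `count_changed_observed` is a hypothetical helper (not part of mice or data.table), a simple column-mean fill stands in for `complete(mice(...))` so that the snippet runs with base R alone, and the NAs are inserted with plain data.frame indexing rather than the data.table assignment used in my code.

```r
# Hypothetical helper: count observed (non-NA) cells whose value differs
# between the original data and the completed data.
count_changed_observed <- function(original, with_na, completed, col) {
  observed <- !is.na(with_na[[col]])  # rows that were never set to NA
  sum(original[[col]][observed] != completed[[col]][observed])
}

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
original <- datasets::mtcars
with_na <- original
with_na$hp[sample(nrow(with_na), 7)] <- NA  # 7 distinct rows set to NA

# Stand-in for complete(mice(with_na)): fill the NAs with the column mean.
completed <- with_na
completed$hp[is.na(completed$hp)] <- mean(with_na$hp, na.rm = TRUE)

# Observed values should be untouched by any imputation, so 0 is expected.
changed <- count_changed_observed(original, with_na, completed, "hp")
print(changed)
```

Calling the same helper with mtcars_original, mtcars and mtcars_imputed from the code above would give the number of observed values that differ after imputation, which is the quantity the plots are meant to show.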

Any suggestions would be greatly appreciated.