
R: impute NAs with a linear increase based on the time interval

PROBLEM

I have NAs in my data frame, which comes from a repeated measures study. On this particular outcome, I need to impute each NA with the last observed non-NA value plus 1 for every 52-week interval elapsed since that last observed value.

EXAMPLE

Example data frame, with the desired imputation result in the goal column.

df <- data.frame(
  subject = rep(1:3, each = 12),
  week = rep(c(8, 10, 12, 16, 20, 26, 32, 44, 52, 64, 78, 104),3),
  value = c(112, 97, 130, 104, NA, NA, NA, NA, NA, NA, NA, NA,
            89, 86, 94, 96, 88,107, 110, 102, 107, NA, NA, NA,
            107, 110, 102, 130, 104, 88, 82, 79, 92, 106, NA, NA),
  goal = c(112, 97, 130, 104, 104, 104, 104, 104, 104, 104, 105, 105,
            89, 86, 94, 96, 88,107, 110, 102, 107, 107,107, 108,
            107, 110, 102, 130, 104, 88, 82, 79, 92, 106, 106, 106)
)
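
To make the rule concrete, here is a minimal base R sketch that works through subject 1 by hand. The helper names (last_idx, last_week, last_value, intervals, imputed) are illustrative only, and the sketch assumes the NAs all come after the last observation, as they do in the example data.

```r
# Worked example of the rule for subject 1 (values copied from the example data).
week  <- c(8, 10, 12, 16, 20, 26, 32, 44, 52, 64, 78, 104)
value <- c(112, 97, 130, 104, NA, NA, NA, NA, NA, NA, NA, NA)

last_idx   <- max(which(!is.na(value)))  # position of the last observed value
last_week  <- week[last_idx]             # week of the last observation
last_value <- value[last_idx]            # last observed non-NA value

# Number of full 52-week intervals elapsed since the last observation.
intervals <- pmax(0, week - last_week) %/% 52

# Keep observed values; fill NAs with the last value plus 1 per full interval.
imputed <- ifelse(is.na(value), last_value + intervals, value)
imputed
```

The last observation is at week 16, so weeks 78 and 104 are 62 and 88 weeks later. Each contains one full 52-week interval, which is why the goal for those two rows is 104 + 1.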

2 Answers

  • 2

    I left the intermediate columns in to make it more obvious what is happening, but you can drop them with a simple select.

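    library(dplyr)

    df = df %>%
      group_by(subject) %>%
      mutate(last_obs_week = max(week[!is.na(value)]),      # week of the last non-NA value
             since_last_week = pmax(0, week - last_obs_week), # weeks elapsed since then
             inc_52 = since_last_week %/% 52,                 # full 52-week intervals elapsed
             # na.rm = FALSE keeps any leading NAs so the length still matches
             result = zoo::na.locf(value, na.rm = FALSE) + inc_52
      )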
    
    all(df$goal == df$result)
    # [1] TRUE
    
    print.data.frame(df)
    #    subject week value goal last_obs_week since_last_week inc_52 result
    # 1        1    8   112  112            16               0      0    112
    # 2        1   10    97   97            16               0      0     97
    # 3        1   12   130  130            16               0      0    130
    # 4        1   16   104  104            16               0      0    104
    # 5        1   20    NA  104            16               4      0    104
    # 6        1   26    NA  104            16              10      0    104
    # 7        1   32    NA  104            16              16      0    104
    # 8        1   44    NA  104            16              28      0    104
    # 9        1   52    NA  104            16              36      0    104
    # 10       1   64    NA  104            16              48      0    104
    # 11       1   78    NA  105            16              62      1    105
    # 12       1  104    NA  105            16              88      1    105
    # 13       2    8    89   89            52               0      0     89
    # ...
    
  • 4

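    You can use dplyr and tidyr::fill to get the desired result. The logic is to add a column that tracks the week of each non-NA value, use tidyr::fill to carry the last non-NA value and its week forward, and then add 1 to the value for every full 52 weeks between the current week and the last non-NA week.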

    library(dplyr)
    library(tidyr)
    
    df %>% group_by(subject) %>%
      mutate(weekWithLastNonNaValue = ifelse(is.na(value), NA, week)) %>%
      fill(value, weekWithLastNonNaValue) %>%
      mutate(value = value + (week-weekWithLastNonNaValue) %/% 52) %>%
      select(-weekWithLastNonNaValue) %>%
      as.data.frame()
    
    #    subject week value goal
    # 1        1    8   112  112
    # 2        1   10    97   97
    # 3        1   12   130  130
    # 4        1   16   104  104
    # 5        1   20   104  104
    # 6        1   26   104  104
    # 7        1   32   104  104
    # 8        1   44   104  104
    # 9        1   52   104  104
    # 10       1   64   104  104
    # 11       1   78   105  105
    # 12       1  104   105  105
    # 13       2    8    89   89
    # 14       2   10    86   86
    # 15       2   12    94   94
    # 16       2   16    96   96
    # 17       2   20    88   88
    # 18       2   26   107  107
    # 19       2   32   110  110
    # 20       2   44   102  102
    #
    # so on
    #
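
For reference, the same logic can also be sketched in base R without dplyr or tidyr. This is only a sketch under a few assumptions: the helper impute_52 is a hypothetical name, not part of any package, rows are assumed to be sorted by week within each subject, and NAs before a subject's first observation are left as NA.

```r
df <- data.frame(
  subject = rep(1:3, each = 12),
  week = rep(c(8, 10, 12, 16, 20, 26, 32, 44, 52, 64, 78, 104), 3),
  value = c(112, 97, 130, 104, NA, NA, NA, NA, NA, NA, NA, NA,
            89, 86, 94, 96, 88, 107, 110, 102, 107, NA, NA, NA,
            107, 110, 102, 130, 104, 88, 82, 79, 92, 106, NA, NA),
  goal = c(112, 97, 130, 104, 104, 104, 104, 104, 104, 104, 105, 105,
           89, 86, 94, 96, 88, 107, 110, 102, 107, 107, 107, 108,
           107, 110, 102, 130, 104, 88, 82, 79, 92, 106, 106, 106)
)

# Hypothetical helper: carry the last observed value forward and add 1 for
# every full 52 weeks since that observation. Assumes week is ascending.
impute_52 <- function(value, week) {
  obs <- !is.na(value)
  idx <- cumsum(obs)         # index of the most recent observation so far
  idx[idx == 0] <- NA        # nothing observed yet: result stays NA
  last_value <- value[obs][idx]
  last_week  <- week[obs][idx]
  last_value + (week - last_week) %/% 52
}

# Apply the helper one subject at a time.
df$result <- NA_real_
for (s in unique(df$subject)) {
  rows <- df$subject == s
  df$result[rows] <- impute_52(df$value[rows], df$week[rows])
}

all(df$result == df$goal)
```

Unlike the first worked example, this version also handles NAs that fall between two observations, because it looks up the most recent observation for every row rather than only the final one.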
    
