Descriptive statistics for MI data in R: Take 3

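As an R beginner, I have found it surprisingly difficult to figure out how to compute descriptive statistics on multiply imputed data (more so than running some of the other basic analyses, such as correlations and regressions).

Questions of this type are prefaced with apologies (Descriptive statistics (Means, StdDevs) using multiply imputed data: R), yet they either go unanswered (https://stats.stackexchange.com/questions/296193/pooling-basic-descriptives-from-several-multiply-imputed-datasets-using-mice) or are quickly downvoted.

Here is the description of a miceadds function (https://www.rdocumentation.org/packages/miceadds/versions/2.10-14/topics/stats0), which I find difficult to follow for data stored in mids format.

I have obtained some output, such as the mean, median, minimum, and maximum, using summary(complete(imp)), but I would like to know how to get additional summary output (e.g., skewness/kurtosis, standard deviation, variance).

An illustration borrowed from the previous poster above: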

> imp <- mice(nhanes, seed = 23109)

    iter imp variable
    1   1  bmi  hyp  chl
    1   2  bmi  hyp  chl
    1   3  bmi  hyp  chl
    1   4  bmi  hyp  chl
    1   5  bmi  hyp  chl
    2   1  bmi  hyp  chl
    2   2  bmi  hyp  chl
    2   3  bmi  hyp  chl

> summary(complete(imp))
   age         bmi        hyp         chl     
   1:12   Min.   :20.40   1:18   Min.   :113  
   2: 7   1st Qu.:24.90   2: 7   1st Qu.:186  
   3: 6   Median :27.40          Median :199  
          Mean   :27.37          Mean   :194  
          3rd Qu.:30.10          3rd Qu.:218  
          Max.   :35.30          Max.   :284

Would someone be willing to take the time to illustrate how to use the mids object to get basic descriptives?

1 Answer

  • 5

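    Below are some steps you can take to better understand what happens to the R objects after each step. I would also recommend looking at this tutorial: https://gerkovink.github.io/miceVignettes/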

    library(mice)
    
    # The nhanes object is just a simple data frame:
    data(nhanes)
    str(nhanes)
    # 'data.frame':  25 obs. of  4 variables:
    #  $ age: num  1 2 1 3 1 3 1 1 2 2 ...
    #  $ bmi: num  NA 22.7 NA NA 20.4 NA 22.5 30.1 22 NA ...
    #  $ hyp: num  NA 1 1 NA 1 NA 1 1 1 NA ...
    #  $ chl: num  NA 187 187 NA 113 184 118 187 238 NA ...
    
    # Generate the multiple imputations with the mice() function
    imp <- mice(nhanes, seed=23109)
    
    # The result is an object of class "mids", which you can explore with str()
    str(imp)
    # List of 17
    #  $ call           : language mice(data = nhanes)
    #  $ data           :'data.frame':  25 obs. of  4 variables:
    #   ..$ age: num [1:25] 1 2 1 3 1 3 1 1 2 2 ...
    #   ..$ bmi: num [1:25] NA 22.7 NA NA 20.4 NA 22.5 30.1 22 NA ...
    #   ..$ hyp: num [1:25] NA 1 1 NA 1 NA 1 1 1 NA ...
    #   ..$ chl: num [1:25] NA 187 187 NA 113 184 118 187 238 NA ...
    #  $ m              : num 5
    #  ...
    #  $ imp            :List of 4
    #   ..$ age: NULL
    #   ..$ bmi:'data.frame':    9 obs. of  5 variables:
    #   .. ..$ 1: num [1:9] 28.7 30.1 22.7 24.9 30.1 35.3 27.5 29.6 33.2
    #   .. ..$ 2: num [1:9] 27.2 30.1 27.2 25.5 29.6 26.3 26.3 30.1 30.1
    #   .. ..$ 3: num [1:9] 22.5 30.1 20.4 22.5 27.4 22 26.3 27.4 35.3
    #   .. ..$ 4: num [1:9] 27.2 22 22.7 21.7 25.5 27.2 24.9 30.1 22
    #   .. ..$ 5: num [1:9] 28.7 28.7 20.4 21.7 25.5 22.5 22.5 25.5 22.7
    #  ...
    
    
    # You can extract individual components of this object using $.
    # For example, to view the actual imputations for the bmi column
    # (rows are the originally missing cases, columns are the m imputations):
    imp$imp$bmi
    #       1    2    3    4    5
    # 1  28.7 27.2 22.5 27.2 28.7
    # 3  30.1 30.1 30.1 22.0 28.7
    # 4  22.7 27.2 20.4 22.7 20.4
    # 6  24.9 25.5 22.5 21.7 21.7
    # 10 30.1 29.6 27.4 25.5 25.5
    # 11 35.3 26.3 22.0 27.2 22.5
    # 12 27.5 26.3 26.3 24.9 22.5
    # 16 29.6 30.1 27.4 30.1 25.5
    # 21 33.2 30.1 35.3 22.0 22.7
    
    # The above output is again just a regular data frame:
    str(imp$imp$bmi)
    # 'data.frame':  9 obs. of  5 variables:
    #  $ 1: num  28.7 30.1 22.7 24.9 30.1 35.3 27.5 29.6 33.2
    #  $ 2: num  27.2 30.1 27.2 25.5 29.6 26.3 26.3 30.1 30.1
    #  $ 3: num  22.5 30.1 20.4 22.5 27.4 22 26.3 27.4 35.3
    #  $ 4: num  27.2 22 22.7 21.7 25.5 27.2 24.9 30.1 22
    #  $ 5: num  28.7 28.7 20.4 21.7 25.5 22.5 22.5 25.5 22.7
    
    # complete() returns a completed dataset; by default (action = 1) this is
    # only the first of the m imputed datasets:
    mat <- complete(imp)
    
    # The output of this function is a regular data frame:
    str(mat)
    # 'data.frame':  25 obs. of  4 variables:
    # $ age: num  1 2 1 3 1 3 1 1 2 2 ...
    # $ bmi: num  28.7 22.7 30.1 22.7 20.4 24.9 22.5 30.1 22 30.1 ...
    # $ hyp: num  1 1 1 2 1 2 1 1 1 1 ...
    # $ chl: num  199 187 187 204 113 184 118 187 238 229 ...
    
    # So you can run any descriptive statistics you need with this object
    # Just like you would do with a regular dataframe:
    summary(mat)
    # age            bmi             hyp            chl       
    # Min.   :1.00   Min.   :20.40   Min.   :1.00   Min.   :113.0  
    # 1st Qu.:1.00   1st Qu.:24.90   1st Qu.:1.00   1st Qu.:187.0  
    # Median :2.00   Median :27.50   Median :1.00   Median :204.0  
    # Mean   :1.76   Mean   :27.48   Mean   :1.24   Mean   :204.9  
    # 3rd Qu.:2.00   3rd Qu.:30.10   3rd Qu.:1.00   3rd Qu.:229.0  
    # Max.   :3.00   Max.   :35.30   Max.   :2.00   Max.   :284.0
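
The question also asks for standard deviation, variance, skewness, and kurtosis, and `complete(imp)` on its own covers only the first of the `m` imputed datasets. The following is a minimal sketch, not a definitive method: it computes the statistics for `bmi` in each completed dataset and then averages them across imputations. The helpers `skewness`, `kurtosis`, and `describe` are hypothetical names written here for illustration (simple moment-based estimators, not part of mice), and plain averaging is an assumption that may not be the most appropriate pooling rule for every statistic.

```r
library(mice)

# Same imputation as above; printFlag = FALSE only silences the iteration log
imp <- mice(nhanes, seed = 23109, printFlag = FALSE)

# Hypothetical helpers: moment-based skewness and excess kurtosis
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Descriptives for one numeric vector
describe <- function(x) {
  c(mean = mean(x), sd = sd(x), var = var(x),
    skew = skewness(x), kurt = kurtosis(x))
}

# One row per imputed dataset: complete(imp, i) returns the i-th completed data frame
per_imp <- t(sapply(seq_len(imp$m), function(i) describe(complete(imp, i)$bmi)))

# Assumption: pool by simple averaging over the m imputations
pooled <- colMeans(per_imp)

print(round(per_imp, 2))
print(round(pooled, 2))
```

Inspecting `per_imp` before looking at `pooled` shows how much each statistic varies between imputations, which is itself useful information about the uncertainty introduced by the missing data.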
    
