Strange behavior of dplyr's group_by function in RStudio version 0.98.1087

I am new to R and am working with the data frame 'damageData' in RStudio. A brief overview of the data frame:

>str(damageData)  
'data.frame':    902297 obs. of  9 variables:
  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
  $ PROPDMGEXP: num  1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
  $ CROPDMGEXP: num  0 0 0 0 0 0 0 0 0 0 ...
  $ Property  : num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
  $ Crops     : num  0 0 0 0 0 0 0 0 0 0 ...

> head(damageData, 10)
      EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
 1  TORNADO          0       15    25.0       1000       0          0
 2  TORNADO          0        0     2.5       1000       0          0
 3  TORNADO          0        2    25.0       1000       0          0
 4  TORNADO          0        2     2.5       1000       0          0
 5  TORNADO          0        2     2.5       1000       0          0
 6  TORNADO          0        6     2.5       1000       0          0
 7  TORNADO          0        1     2.5       1000       0          0
 8  TORNADO          0        0     2.5       1000       0          0
 9  TORNADO          1       14    25.0       1000       0          0
 10 TORNADO          0        0    25.0       1000       0          0
    Property Crops
 1     25000     0
 2      2500     0
 3     25000     0
 4      2500     0
 5      2500     0
 6      2500     0
 7      2500     0
 8      2500     0
 9     25000     0
 10    25000     0

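I want to group the data frame by EVTYPE. When I use the dplyr package and call 'group_by(EVTYPE)' followed by 'summarize(TotalInjuries = sum(INJURIES), TotalFatalities = sum(FATALITIES))', the data frame is not grouped by EVTYPE. Instead, I get a single row of totals: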

  TotalInjuries TotalFatalities
1        140528           15145

I tried changing EVTYPE from 'factor' to 'character', but I still get the same result. Please help me solve this strange problem!

1 Answer

  • 1

    Without a reproducible example, it is hard to say exactly what is happening. Perhaps you are using the dplyr syntax incorrectly? See below:

    damageData <- data.frame(
      EVTYPE = factor(c("Y","N","Y","N","Y","N","Y","N","Y","N")),
      FATALITIES = c(0,0,0,0,0,0,0,0,1,0),
      INJURIES = c(15,0,2,2,2,6,1,0,14,0))
    
    str(damageData)
    
    library(dplyr)
    
    damageData %>%
      group_by( EVTYPE ) %>%
      summarize( TotalInjuries=sum(INJURIES),
                 TotalFatalities=sum(FATALITIES))
    

    I get the following:

    Source: local data frame [2 x 3]  
    
      EVTYPE TotalInjuries TotalFatalities  
    1      N             8               0  
    2      Y            34               1
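
One common cause of this exact symptom (a single row of totals despite 'group_by') is that the plyr package was attached after dplyr, so the unqualified name 'summarize' resolves to plyr's version, which ignores dplyr grouping. The question does not say whether plyr was loaded, so treat this as an assumption to check rather than a diagnosis. The sketch below reuses the small example data from the answer, checks which package the unqualified 'summarise' comes from, and then uses namespace-qualified calls, which behave the same regardless of the order of library() calls.

```r
library(dplyr)

# Small stand-in for the asker's data (same values as the answer's example).
damageData <- data.frame(
  EVTYPE = factor(c("Y","N","Y","N","Y","N","Y","N","Y","N")),
  FATALITIES = c(0,0,0,0,0,0,0,0,1,0),
  INJURIES = c(15,0,2,2,2,6,1,0,14,0))

# Which package does the unqualified name resolve to in this session?
# "dplyr" is what we want; "plyr" would explain an ungrouped, one-row result.
environmentName(environment(summarise))

# Namespace-qualified calls do not depend on which package was attached last.
result <- damageData %>%
  dplyr::group_by(EVTYPE) %>%
  dplyr::summarise(TotalInjuries = sum(INJURIES),
                   TotalFatalities = sum(FATALITIES))

# One row per EVTYPE level is expected here (two rows for this data).
nrow(result)
```

If the check reports "plyr", either load plyr before dplyr or keep the 'dplyr::' prefix on the affected calls.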
    
