How do I compute an 8-hour rolling average over a year-long dataset in R?

I am trying to resample an hourly ozone measurement dataset from this source: https://aqs.epa.gov/aqsweb/airdata/hourly_44201_2016.zip

Here is the head of the data:

structure(list(date_time = structure(c(1456844400, 1456848000, 
1456851600, 1456855200, 1456858800, 1456862400, 1456866000, 1456869600, 
1456873200, 1456880400, 1456884000, 1456887600, 1456891200, 1456894800, 
1456898400, 1456902000, 1456905600, 1456912800, 1456916400, 1456920000, 
1456923600, 1456927200, 1456930800, 1456934400, 1456938000, 1456941600, 
1456945200, 1456948800, 1456952400, 1456956000), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Sample.Measurement = c(0.041, 0.041, 
0.042, 0.041, 0.038, 0.038, 0.036, 0.035, 0.029, 0.026, 0.03, 
0.03, 0.028, 0.027, 0.025, 0.023, 0.025, 0.034, 0.036, 0.038, 
0.041, 0.042, 0.043, 0.043, 0.041, 0.033, 0.01, 0.01, 0.011, 
0.007)), .Names = c("date_time", "Sample.Measurement"), row.names = c(NA, 
30L), class = "data.frame")

I combined the local date and time columns into a single date-time column using lubridate:

df$date_time = ymd_hm(paste(df$Date.Local, df$Time.Local))

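What I want to do next is resample the Sample.Measurement data into an 8-hour rolling average. From there, I want to select the maximum value for each day.

In Pandas, this would be trivial using the resample() method.

How can I do this in R with dplyr?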

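1 Answer

You can use rollmean from the zoo package together with group_by and summarise from dplyr, as shown below. I edited the answer so that it returns the maximum for each day and month. If your data span more than one year, you can also create a year column (just uncomment the third line in the call to mutate) and then group_by day, month, and year.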

library(zoo)
library(dplyr)
library(lubridate)
df %>% 
 # Add grouping columns and the 8-hour rolling mean
 mutate(day = as.factor(day(date_time)),
        month = as.factor(month(date_time)),
        #year = as.factor(year(date_time)),
        rolling_mean = rollmean(Sample.Measurement,
                                k = 8,
                                fill = NA,
                                align = "center")) %>% 
 # Take the maximum rolling mean within each day
 group_by(day, month) %>% 
 summarise(max_day = max(rolling_mean, na.rm = TRUE)) %>% 
 ungroup()
 # A tibble: 2 x 3
   day   month max_day
 <fct> <fct>   <dbl>
 1 1     3      0.0390
 2 2     3      0.0398
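
As a variation on the grouping described above, one could group by the calendar date directly instead of building separate day, month, and year columns. The following is only a minimal sketch under stated assumptions: the data frame, its values, and the `date` and `daily_max` names are made up for illustration, and the timestamps are assumed to be in UTC.

```r
library(zoo)
library(dplyr)

# Hypothetical hourly data spanning two calendar days (values are made up,
# not taken from the EPA file)
df <- data.frame(
  date_time = seq(as.POSIXct("2016-03-01 00:00", tz = "UTC"),
                  by = "hour", length.out = 48),
  Sample.Measurement = rep(c(0.02, 0.04), each = 24)
)

daily_max <- df %>%
  # 8-hour rolling mean plus a single calendar-date grouping column
  mutate(rolling_mean = rollmean(Sample.Measurement, k = 8,
                                 fill = NA, align = "center"),
         date = as.Date(date_time)) %>%
  # One row per calendar date, holding that date's largest rolling mean
  group_by(date) %>%
  summarise(max_day = max(rolling_mean, na.rm = TRUE))

print(daily_max)
```

A single date column keeps the grouping unambiguous when the data cover more than one year, at the cost of losing the separate day and month columns in the result.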

The argument align = "center" is the default, so it is not strictly needed. I only included it to point out that your results may depend on it.
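
To illustrate how the alignment changes where each window's average is placed, here is a small sketch on a toy vector. The values and variable names are hypothetical and chosen only to make the shift visible.

```r
library(zoo)

# Toy hourly series (hypothetical values, not from the EPA file)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Same 8-value windows, but the result is attached to a different position
centered <- rollmean(x, k = 8, fill = NA, align = "center")
trailing <- rollmean(x, k = 8, fill = NA, align = "right")  # window ends at the row
leading  <- rollmean(x, k = 8, fill = NA, align = "left")   # window starts at the row

# Compare the three alignments side by side
print(data.frame(x, centered, trailing, leading))
```

With align = "right", each row holds the average of that hour and the seven hours before it, so values near midnight are assigned to a different day than with the centered window. Which alignment is appropriate depends on how the daily maximum is supposed to be defined.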

Answered 2024-05-14T23:12:56+08:00
