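I am trying to write a function that computes half-hourly averages (always on the full hour and at half past the hour), using dplyr.
I pass the name of the date column as an argument, group the data with "group_by_", and then summarise it. However, I keep getting this error: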
Error in cut.default(colName, cuts) : 'x' must be numeric
The code I am using is below. My data frame is simply called "data".
dateColumn = "date"
measurevar = "temperature"
cuts <- seq(round(min(data[,dateColumn]), "hours") - 30*60,
            max(data[,dateColumn]) + 30*60, "30 min")
data_avg = data %>%
  group_by_(dateColumn = cut(dateColumn, cuts)) %>%
  summarise_at(.vars = vars(measurevar),
               funs(mean = mean(., na.rm = T),
                    sd   = sd(., na.rm = T)))
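The message itself points at the cause. Inside group_by_(), cut(dateColumn, cuts) receives dateColumn as it is: the character string "date", not the date column. cut() therefore dispatches to cut.default(), which only accepts numeric input. A minimal base-R reproduction (the cuts vector here is a small stand-in, not the one from the question):

```r
# dateColumn holds the *name* of the column, i.e. a character string
dateColumn <- "date"

# Stand-in breaks: 16:30, 17:00, 17:30 UTC
cuts <- seq(as.POSIXct("2017-10-17 16:30:00", tz = "UTC"),
            as.POSIXct("2017-10-17 17:30:00", tz = "UTC"), by = "30 min")

# cut() on a character vector falls through to cut.default(), which
# stops because x is not numeric; capture the message instead of failing
msg <- tryCatch(cut(dateColumn, cuts),
                error = function(e) conditionMessage(e))
print(msg)
```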
Can you help me fix this?
Note that the date column is POSIXct. Here is a sample of the data:
data <- structure(list(date = structure(c(1508258822, 1508258827,
1508258832, 1508258837, 1508258842, 1508258847, 1508258852, 1508258857,
1508258862, 1508258867, 1508258877, 1508259298, 1508259303, 1508259308,
1508259313, 1508259318, 1508259323, 1508259328, 1508259333, 1508259338,
1508259343, 1508259348, 1508259353, 1508259778, 1508259783, 1508259788,
1508259793, 1508259798, 1508259803, 1508259813, 1508259818, 1508259823,
1508259828, 1508259833, 1508260259, 1508260264, 1508260269, 1508260274,
1508260279, 1508260284, 1508260289, 1508260294, 1508260299, 1508260304,
1508260309, 1508260314, 1508260739, 1508260744, 1508260749, 1508260754
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), temperature = c(295.49,
295.49, 295.48, 295.47, 295.46, 295.45, 295.45, 295.45, 295.45,
295.45, 295.44, 295.24, 294.98, 295.24, 295.24, 295.24, 295.24,
295.23, 295.23, 295.21, 295.2, 295.2, 295.19, 294.93, 294.93,
294.88, 294.93, 294.93, 294.93, 294.92, 294.92, 294.91, 294.9,
294.9, 294.73, 294.72, 294.72, 294.71, 294.71, 294.71, 294.71,
294.71, 294.72, 294.71, 294.71, 294.7, 294.55, 294.55, 294.55,
294.54)), .Names = c("date", "temperature"), row.names = c(NA,
50L), class = "data.frame")
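To check which half-hour bin each reading should land in, one option is to floor the timestamps to multiples of 1800 seconds. This is a minimal base-R sketch, not part of the question's code: floor_half_hour() is a hypothetical helper, and it assumes the timestamps are in UTC, as in the sample, so that multiples of 1800 s fall on :00 and :30.

```r
# Hypothetical helper: floor POSIXct times to the preceding :00 or :30.
# Assumes UTC timestamps, as in the question's sample.
floor_half_hour <- function(x, tz = "UTC") {
  secs <- floor(as.numeric(x) / 1800) * 1800  # 1800 s = 30 min
  as.POSIXct(secs, origin = "1970-01-01", tz = tz)
}

# First, twelfth and last timestamps from the sample above
times <- as.POSIXct(c(1508258822, 1508259298, 1508260754),
                    origin = "1970-01-01", tz = "UTC")
print(format(floor_half_hour(times), "%Y-%m-%d %H:%M:%S"))
```

The sample runs from 16:47:02 to 17:19:14 UTC, so only the 16:30 and 17:00 bins contain readings, matching the two rows expected below.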
The resulting "data_avg" should look something like this:
date mean sd
1 2017-10-17 16:30:00 295.46 0.1305597
2 2017-10-17 17:00:00 295.55 0.1137462
1 Answer
Have you tried something like this?
See Programming with dplyr.
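In the spirit of Programming with dplyr, here is a sketch of the same pipeline that looks the date column up by name through the .data pronoun and selects the measurement column with all_of(). It is not the answerer's original code (which is not reproduced here). It assumes dplyr >= 1.0.0 for across() and .groups, and it uses a small hypothetical subset of the question's data.

```r
library(dplyr)

# Hypothetical subset of the question's data: two readings per half hour
data <- data.frame(
  date = as.POSIXct(c(1508258822, 1508258827, 1508260749, 1508260754),
                    origin = "1970-01-01", tz = "UTC"),
  temperature = c(295.49, 295.49, 294.55, 294.54)
)

dateColumn <- "date"
measurevar <- "temperature"

# Breaks as in the question: start 30 min before min(date) rounded to the
# nearest hour, end 30 min after the last reading, step 30 min
cuts <- seq(round(min(data[[dateColumn]]), "hours") - 30 * 60,
            max(data[[dateColumn]]) + 30 * 60, by = "30 min")

data_avg <- data %>%
  # .data[[dateColumn]] fetches the column whose name is stored in dateColumn
  group_by(date = cut(.data[[dateColumn]], cuts)) %>%
  # all_of() selects the measurement column by name; .names = "{.fn}"
  # names the outputs "mean" and "sd"
  summarise(across(all_of(measurevar),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE)),
                   .names = "{.fn}"),
            .groups = "drop")

print(data_avg)
```

The essential change is .data[[dateColumn]]: in the original call, cut() was handed the string itself, whereas the pronoun resolves the string to the actual column. Naming the grouping variable date keeps the output header in the shape the question expects.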