I have a sample data frame that I am working with:
Datetime <- c("2015-09-29 08:22:00", "2015-09-29 09:45:00", "2015-09-29 09:53:00", "2015-09-29 10:22:00", "2015-09-29 10:42:00",
"2015-09-29 11:31:00", "2015-09-29 11:47:00", "2015-09-29 12:45:00", "2015-09-29 13:11:00", "2015-09-29 13:44:00",
"2015-09-29 15:24:00", "2015-09-29 16:28:00", "2015-09-29 20:22:00", "2015-09-29 21:38:00", "2015-09-29 23:34:00")
Measurement <- c("Length","Length","Width","Height","Width","Height","Length","Width","Width","Height","Width","Length",
"Length","Height","Height")
PASSFAIL <- c("PASS","PASS","FAIL","PASS","PASS","FAIL_AVG_HIGH","FAIL#Pts","FAIL","FAIL_AVG_LOW","FAIL","PASS","PASS","FAIL#RNG#HIGH","PASS","FAIL")
df1 <- data.frame(Datetime,Measurement,PASSFAIL)
df1
              Datetime Measurement      PASSFAIL
1  2015-09-29 08:22:00      Length          PASS
2  2015-09-29 09:45:00      Length          PASS
3  2015-09-29 09:53:00       Width          FAIL
4  2015-09-29 10:22:00      Height          PASS
5  2015-09-29 10:42:00       Width          PASS
6  2015-09-29 11:31:00      Height FAIL_AVG_HIGH
7  2015-09-29 11:47:00      Length      FAIL#Pts
8  2015-09-29 12:45:00       Width          FAIL
9  2015-09-29 13:11:00       Width  FAIL_AVG_LOW
10 2015-09-29 13:44:00      Height          FAIL
11 2015-09-29 15:24:00       Width          PASS
12 2015-09-29 16:28:00      Length          PASS
13 2015-09-29 20:22:00      Length FAIL#RNG#HIGH
14 2015-09-29 21:38:00      Height          PASS
15 2015-09-29 23:34:00      Height          FAIL
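I am working on an interesting problem: finding the fail rate for each measurement within the 12 AM-12 PM and 12 PM-12 AM (next day) intervals of a day.
Note: in df1, anything in the PASSFAIL column that contains FAIL counts as a failure.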
Fail Rate = (Number of Fails)/(Number of Fails + Number of Pass)
My desired output looks like this:
                Datetime FailRate_length Total_length FailRate_Width Total_Width FailRate_Height Total_Height
1 2015-09-29 00:00:00 AM            0.33            3           0.50           2            0.50            2
2 2015-09-29 12:00:00 PM            0.50            2           0.66           3            0.66            3
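I am trying to solve this with the dplyr and data.table packages, but I just can't figure out how to split df1 into time intervals so that I get a df2 with 2 values: 12 AM (the first 7 observations of df1) and 12 PM (the next 8 observations of df1). Can anyone help me?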
2 Answers
Using data.table ...
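A minimal sketch of how such a data.table approach might look, not necessarily the answerer's exact code. It assumes the timestamps can be read as UTC (so the 12-hour boundaries fall on 00:00 and 12:00) and introduces a hypothetical helper column `Bin` holding the start of each half-day interval:

```r
library(data.table)

# Sample data from the question.
Datetime <- c("2015-09-29 08:22:00", "2015-09-29 09:45:00", "2015-09-29 09:53:00",
              "2015-09-29 10:22:00", "2015-09-29 10:42:00", "2015-09-29 11:31:00",
              "2015-09-29 11:47:00", "2015-09-29 12:45:00", "2015-09-29 13:11:00",
              "2015-09-29 13:44:00", "2015-09-29 15:24:00", "2015-09-29 16:28:00",
              "2015-09-29 20:22:00", "2015-09-29 21:38:00", "2015-09-29 23:34:00")
Measurement <- c("Length", "Length", "Width", "Height", "Width", "Height", "Length",
                 "Width", "Width", "Height", "Width", "Length", "Length", "Height", "Height")
PASSFAIL <- c("PASS", "PASS", "FAIL", "PASS", "PASS", "FAIL_AVG_HIGH", "FAIL#Pts", "FAIL",
              "FAIL_AVG_LOW", "FAIL", "PASS", "PASS", "FAIL#RNG#HIGH", "PASS", "FAIL")
DT <- data.table(Datetime, Measurement, PASSFAIL)

# Assumption: parse the timestamps as UTC to avoid daylight-saving surprises.
DT[, Datetime := as.POSIXct(Datetime, tz = "UTC")]

# Bin (hypothetical name): floor each timestamp to the start of its 12-hour
# interval; 43200 is the number of seconds in 12 hours.
DT[, Bin := as.POSIXct(floor(as.numeric(Datetime) / 43200) * 43200,
                       origin = "1970-01-01", tz = "UTC")]

# Any PASSFAIL value containing "FAIL" counts as a failure, so the mean of
# the logical vector is fails / (fails + passes).
fail_rates <- DT[, .(FailRate = mean(grepl("FAIL", PASSFAIL)), Total = .N),
                 by = .(Bin, Measurement)]
print(fail_rates)
```

The result is in long format, with one row per interval and measurement.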
This gives
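The syntax is DT[i,j,by], where by is for grouping variables and j is for working with columns. Using := inside j creates new columns.

To reshape to the OP's desired output ...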
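One way the reshaping step might look, as a sketch under the same assumptions as before (UTC timestamps, a hypothetical `Bin` column for the start of each 12-hour interval). It first builds the long-format summary and then casts it to wide format with `dcast`, passing both summary columns as `value.var`:

```r
library(data.table)

# Sample data from the question.
Datetime <- c("2015-09-29 08:22:00", "2015-09-29 09:45:00", "2015-09-29 09:53:00",
              "2015-09-29 10:22:00", "2015-09-29 10:42:00", "2015-09-29 11:31:00",
              "2015-09-29 11:47:00", "2015-09-29 12:45:00", "2015-09-29 13:11:00",
              "2015-09-29 13:44:00", "2015-09-29 15:24:00", "2015-09-29 16:28:00",
              "2015-09-29 20:22:00", "2015-09-29 21:38:00", "2015-09-29 23:34:00")
Measurement <- c("Length", "Length", "Width", "Height", "Width", "Height", "Length",
                 "Width", "Width", "Height", "Width", "Length", "Length", "Height", "Height")
PASSFAIL <- c("PASS", "PASS", "FAIL", "PASS", "PASS", "FAIL_AVG_HIGH", "FAIL#Pts", "FAIL",
              "FAIL_AVG_LOW", "FAIL", "PASS", "PASS", "FAIL#RNG#HIGH", "PASS", "FAIL")
DT <- data.table(Datetime, Measurement, PASSFAIL)

# Assumption: UTC timestamps; Bin is the start of each 12-hour interval.
DT[, Datetime := as.POSIXct(Datetime, tz = "UTC")]
DT[, Bin := as.POSIXct(floor(as.numeric(Datetime) / 43200) * 43200,
                       origin = "1970-01-01", tz = "UTC")]

# Long format: one row per (interval, measurement).
long <- DT[, .(FailRate = mean(grepl("FAIL", PASSFAIL)), Total = .N),
           by = .(Bin, Measurement)]

# Wide format: one row per interval, one FailRate_* and Total_* column
# per measurement.
wide <- dcast(long, Bin ~ Measurement, value.var = c("FailRate", "Total"))
print(wide)
```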
This gives
Thanks to @Arun, here is a way to do everything in one step:
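A sketch of what a single-step version could look like, assuming a data.table version whose `dcast` accepts a list of aggregation functions in `fun.aggregate`. The function names `FailRate` and `Total` and the `Bin` column are illustrative choices, not taken from the original answer:

```r
library(data.table)

# Sample data from the question.
Datetime <- c("2015-09-29 08:22:00", "2015-09-29 09:45:00", "2015-09-29 09:53:00",
              "2015-09-29 10:22:00", "2015-09-29 10:42:00", "2015-09-29 11:31:00",
              "2015-09-29 11:47:00", "2015-09-29 12:45:00", "2015-09-29 13:11:00",
              "2015-09-29 13:44:00", "2015-09-29 15:24:00", "2015-09-29 16:28:00",
              "2015-09-29 20:22:00", "2015-09-29 21:38:00", "2015-09-29 23:34:00")
Measurement <- c("Length", "Length", "Width", "Height", "Width", "Height", "Length",
                 "Width", "Width", "Height", "Width", "Length", "Length", "Height", "Height")
PASSFAIL <- c("PASS", "PASS", "FAIL", "PASS", "PASS", "FAIL_AVG_HIGH", "FAIL#Pts", "FAIL",
              "FAIL_AVG_LOW", "FAIL", "PASS", "PASS", "FAIL#RNG#HIGH", "PASS", "FAIL")
DT <- data.table(Datetime, Measurement, PASSFAIL)

# Assumption: UTC timestamps; Bin is the start of each 12-hour interval.
DT[, Datetime := as.POSIXct(Datetime, tz = "UTC")]
DT[, Bin := as.POSIXct(floor(as.numeric(Datetime) / 43200) * 43200,
                       origin = "1970-01-01", tz = "UTC")]

# Hypothetical helper functions: share of values containing "FAIL",
# and the number of observations.
FailRate <- function(x) mean(grepl("FAIL", x))
Total    <- function(x) length(x)

# Aggregate and reshape in a single dcast call.
wide <- dcast(DT, Bin ~ Measurement,
              fun.aggregate = list(FailRate, Total),
              value.var = "PASSFAIL")
print(wide)
```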
This gives
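The column names are generated automatically from the root variable in the ~ part and the first word of each function definition.

A dplyr + tidyr equivalent (with slightly different binning, though the one above is elegant):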
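A sketch of how a dplyr + tidyr pipeline along these lines might look. The binning shown here (flooring UTC timestamps to 12-hour intervals in a hypothetical `Bin` column) is an assumption and may differ from the binning the answer actually used; the names `Stat`, `Value` and `Key` are also illustrative:

```r
library(dplyr)
library(tidyr)

# Sample data from the question.
Datetime <- c("2015-09-29 08:22:00", "2015-09-29 09:45:00", "2015-09-29 09:53:00",
              "2015-09-29 10:22:00", "2015-09-29 10:42:00", "2015-09-29 11:31:00",
              "2015-09-29 11:47:00", "2015-09-29 12:45:00", "2015-09-29 13:11:00",
              "2015-09-29 13:44:00", "2015-09-29 15:24:00", "2015-09-29 16:28:00",
              "2015-09-29 20:22:00", "2015-09-29 21:38:00", "2015-09-29 23:34:00")
Measurement <- c("Length", "Length", "Width", "Height", "Width", "Height", "Length",
                 "Width", "Width", "Height", "Width", "Length", "Length", "Height", "Height")
PASSFAIL <- c("PASS", "PASS", "FAIL", "PASS", "PASS", "FAIL_AVG_HIGH", "FAIL#Pts", "FAIL",
              "FAIL_AVG_LOW", "FAIL", "PASS", "PASS", "FAIL#RNG#HIGH", "PASS", "FAIL")
df1 <- data.frame(Datetime, Measurement, PASSFAIL, stringsAsFactors = FALSE)

wide <- df1 %>%
  # Assumption: UTC timestamps; Bin is the start of each 12-hour interval.
  mutate(Datetime = as.POSIXct(Datetime, tz = "UTC"),
         Bin = as.POSIXct(floor(as.numeric(Datetime) / 43200) * 43200,
                          origin = "1970-01-01", tz = "UTC")) %>%
  group_by(Bin, Measurement) %>%
  summarise(FailRate = mean(grepl("FAIL", PASSFAIL)),
            Total = n(), .groups = "drop") %>%
  # Stack the two summary columns, build names such as "FailRate_Length",
  # then spread them back out into one column each.
  gather(Stat, Value, FailRate, Total) %>%
  unite(Key, Stat, Measurement) %>%
  spread(Key, Value)
print(wide)
```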
The gather, unite, spread sequence is the tidyr equivalent of dcast. Note