This question already has an answer here:
I want to subset my tbl based on the minimum value of a variable.
I found an SO post here that does this with data.table. Is there a way to do it with dplyr?
> glimpse(x)
Observations: 3,074,921
Variables: 9
$ sessionId <chr> "1468614023881.kvz0h9ofxbt9", "1469063434066.e9h65wdygb9", "1469240810386.2k47r07tx1or", "146933076...
$ dateHour <chr> "2016080106", "2016080118", "2016080119", "2016080120", "2016080108", "2016080106", "2016080117", "...
$ minute <ord> 25, 10, 30, 38, 32, 12, 42, 32, 42, 39, 32, 20, 0, 4, 39, 46, 54, 32, 46, 46, 33, 53, 51, 2, 22, 36...
$ userType <chr> "New Visitor", "New Visitor", "New Visitor", "New Visitor", "New Visitor", "New Visitor", "Returnin...
$ region <chr> "Virginia", "Washington", "Chihuahua", "Missouri", "Nevada", "Minnesota", "Oklahoma", "(not set)", ...
$ metro <chr> "Roanoke-Lynchburg VA", "Seattle-Tacoma WA", "(not set)", "Joplin MO-Pittsburg KS", "Reno NV", "Min...
$ city <chr> "Roanoke", "Camano Island", "Ciudad Juarez", "Joplin", "Reno", "Owatonna", "Edmond", "Port-au-Princ...
$ sessions <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ dhm <chr> "201608010625", "201608011810", "201608011930", "201608012038", "201608010832", "201608010612", "20...
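The dhm variable is the concatenation of the dateHour and minute columns. My data has some duplicated session IDs. For each duplicated sessionId, I want to keep only the earliest entry, i.e. the row with min(dhm).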
1 Answer
Group the data by session and arrange it by dhm, then keep only the first row of each session.
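The group/arrange/first-row approach can be sketched as follows. This is an illustrative sketch, not the answerer's original code. The toy tibble is hypothetical and has only the two columns that matter, standing in for the real `x`. The approach assumes dhm is always a 12-digit `yyyymmddhhmm` string, so text order matches chronological order.

```r
library(dplyr)

# Hypothetical toy data standing in for the real tbl `x`;
# only the columns needed for the filter are included.
x <- tibble(
  sessionId = c("a", "a", "b", "c", "c"),
  dhm       = c("201608011810", "201608010625", "201608012038",
                "201608010832", "201608010832")
)

# dhm is a fixed-width 12-digit string (yyyymmddhhmm), so sorting it
# as text gives the same order as sorting it chronologically.
earliest <- x %>%
  group_by(sessionId) %>%                # one group per session
  arrange(dhm, .by_group = TRUE) %>%     # earliest dhm first within each session
  slice(1) %>%                           # keep only the first row of each session
  ungroup()

earliest
```

`slice(1)` returns exactly one row per session even when two rows share the same minimum dhm, as session "c" does here. A `filter(dhm == min(dhm))` on the grouped data would keep both tied rows instead.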
Or, as pointed out in the comments: