
Converting a numeric variable to a character variable to group by id in data.table


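I have two datasets and I am trying to get the first observation of each group. In the example below, grouping by "id" in the first dataset ("df1") works as expected (case 1). Grouping by "id2" in the second dataset ("df2") also works (case 2a). However, grouping by "id1" in the second dataset (case 2b) does not work as expected: only a single row comes back. Surprisingly, when I convert "id1" to a character vector, I get the expected output.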

library(data.table)

#case1
df1<- structure(list(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3), stopId = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"), 
    stopSequence = c(1, 2, 3, 3, 1, 4, 3, 1, 2)), .Names = c("id", 
"stopId", "stopSequence"), row.names = c(NA, -9L), class = "data.frame")

# first observation of each id: 

setDT(df1)[,.SD[1,],by=.(id)] #worked

#df2
df2<-structure(list(id1 = c(201601072952201, 201601072952201, 201601072952201, 
201601072952213, 201601072952213, 201601072952213, 201601072952212, 
201601072952212, 201601072952212, 201601072952176), id2 = c("TXT", 
"TXT", "TXT", "TXT", "TXT", "TXT", "PLP", "PLP", "PLP", "KYK"
), sb = c(32L, 32L, 32L, 32L, 32L, 32L, 58L, 58L, 58L, 6L), bb = c(7L, 
7L, 7L, 56L, 56L, 56L, 28L, 28L, 28L, 47L), qt = c(21, 21, 21, 
420, 420, 420, 1000, 1000, 1000, 13), amt = c(301, 301, 301, 
306, 306, 306, 515, 515, 515, 368), rate = c(6321, 6321, 6321, 
128520, 128520, 128520, 515000, 515000, 515000, 4784)), .Names = c("id1", 
"id2", "sb", "bb", "qt", "amt", "rate"), class = "data.frame", row.names = c(NA, 
-10L))
#case2a
setDT(df2)[,.SD[1,],by=.(id2)] #worked
   id2             id1 sb bb   qt amt   rate
1: TXT 201601072952201 32  7   21 301   6321
2: PLP 201601072952212 58 28 1000 515 515000
3: KYK 201601072952176  6 47   13 368   4784

#case2b
setDT(df2)[,.SD[1,],by=.(id1)] # did not work as expected: only one row
               id1 id2 sb bb qt amt rate
1: 201601072952201 TXT 32  7 21 301 6321

df2$id1<-as.character(df2$id1)
setDT(df2)[,.SD[1,],by=.(id1)] # worked
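A quick check of why the character version behaves differently: the four id1 values are genuinely distinct doubles, but R's default printing (7 significant digits) shows them all as 2.016011e+14. The sketch below is a hypothetical diagnostic, not part of the question; it only assumes base R and copies the four distinct id1 values from df2.

```r
# Hypothetical diagnostic (not from the question): the four distinct id1 values
ids <- c(201601072952201, 201601072952213, 201601072952212, 201601072952176)

# Base R compares doubles at full precision, so all four are distinct
n_distinct <- length(unique(ids))
print(n_distinct)

# Default printing uses 7 significant digits and hides the differences
print(ids)

# as.character() keeps 15 significant digits, so each id survives intact
ids_chr <- as.character(ids)
print(ids_chr)
```

Because every id is below 2^53 and has 15 digits, the character copy is an exact, unambiguous key, which is why grouping on it gives one group per id.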

So my question is: why do I need to convert the numeric variable to character in case 2b, but not in case 1?

1 Answer


    Try using the standard functions from base R. For example:

    df2[!duplicated(df2$id1),]
    

    Output:

                id1 id2 sb bb   qt amt   rate
    1: 2.016011e+14 TXT 32  7   21 301   6321
    2: 2.016011e+14 TXT 32 56  420 306 128520
    3: 2.016011e+14 PLP 58 28 1000 515 515000
    4: 2.016011e+14 KYK  6 47   13 368   4784
    
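A hedged explanation of case 2b, not part of the original answer: as far as I know, data.table releases before 1.9.8 rounded the last 2 bytes of doubles by default when grouping, ordering and joining (the `setNumericRounding()` setting), and 1.9.8 changed the default to no rounding. The id1 values are around 2e14 and differ only in their last two or three digits, so after that rounding they can collapse into a single group. Small ids such as 1, 2, 3 in case 1 are unaffected, and a character column is never rounded. The sketch below assumes a data.table version that exports `setNumericRounding()` and `getNumericRounding()`, and rebuilds only the id1 and id2 columns of df2.

```r
library(data.table)

# Reduced copy of df2 from the question: only id1 and id2 are kept
df2 <- data.table(
  id1 = rep(c(201601072952201, 201601072952213,
              201601072952212, 201601072952176), times = c(3, 3, 3, 1)),
  id2 = rep(c("TXT", "TXT", "PLP", "KYK"), times = c(3, 3, 3, 1))
)

# Bytes of rounding currently applied to doubles (0 = none;
# assumed to have been 2 by default in releases before 1.9.8)
print(getNumericRounding())

# Turn rounding off explicitly, then group on the full-precision id1
setNumericRounding(0)
first_by_id1 <- df2[, .SD[1], by = id1]
print(first_by_id1)
```

With rounding off, grouping on the numeric id1 should give the same four groups as the character conversion or the base-R `!duplicated()` approach, without changing the column type.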
