首页 文章

某些条件下数据帧的单列中的值

提问于
浏览
1

我有两个数据帧,我想“按照某些条件组合它们” . 我的第一个数据帧具有星期几的平均Sms(avgSms)(1 =星期日,2 =星期一,3 =星期二......),小时和分钟:

df1:
       I   | Day_of_week | hour | min |    AvgSms
       #1          1          0     0      12
       #2          1          0     30     20
       #3          1          1     0      19
       #4          1          1     30     11
       #5          1          2     0      12
       #6          1          2     30     7
       ...        ...       ...    ...   ....
       #10         2          0     0      14
       #11         2          0     30     20
       #12         2          1     0      19
       #13         2          1     30     11
       ...        ...       ...    ...   ....
       #222        7         23     30     13

第二个数据帧有时间和短信,如下所示:

df2:       Time           Sms
     1 2012-01-01 00:00:00  10
     2 2012-01-01 00:30:00  11
     3 2012-01-01 01:00:00  13
     4 2012-01-01 01:30:00  10
     5 2012-01-01 02:00:00  7
     6 2012-01-01 02:30:00  3
     7 2012-01-01 03:00:00  3
     8 2012-01-01 03:30:00  2
      .......................
    400 2015-12-31 23:30:00  16

我想添加到数据帧2,dataframe1的相应avgSms取决于变量时间,无论是第二个还是第三个以及x小时和y和min .

我想要这样的东西:

Time           Sms     avg
     1 2012-01-01 00:00:00  10  12 --> 2012-01-01 was Sunday=1, h=0 and min=0
     2 2012-01-01 00:30:00  11  20
     3 2012-01-01 01:00:00  13  19
     4 2012-01-01 01:30:00  10  11
     5 2012-01-01 02:00:00  7   ..
     6 2012-01-01 02:30:00  3
     7 2012-01-01 03:00:00  3
     8 2012-01-01 03:30:00  2
      .......................
    400 2015-12-31 23:30:00  16

1 回答

  • 0

    在df2中的日期/时间格式之后考虑 merge 并使用这些字段作为by值:

    Data

    txt=' I   Day_of_week hour min     AvgSms
    1          1          0     0      12
    2          1          0     30     20
    3          1          1     0      19
    4          1          1     30     11
    5          1          2     0      12
    6          1          2     30     7
    10         2          0     0      14
    11         2          0     30     20
    12         2          1     0      19
    13         2          1     30     11
    222        7         23     30     13'
    
    df1 <- read.table(text=txt, header = TRUE)
    
    txt = '
        Time           Sms
    1 "2012-01-01 00:00:00"  10
    2 "2012-01-01 00:30:00"  11
    3 "2012-01-01 01:00:00"  13
    4 "2012-01-01 01:30:00"  10
    5 "2012-01-01 02:00:00"  7
    6 "2012-01-01 02:30:00"  3
    7 "2012-01-01 03:00:00"  3
    8 "2012-01-01 03:30:00"  2
    '
    df2 <- read.table(text=txt, header = TRUE)
    

    Date/Time Conversion

    df2$Date <- as.Date(df2$Time, format="%Y-%m-%d")
    df2$Time <- as.POSIXct(df2$Time)
    
    df2$Day_of_week <- as.integer(strftime(df2$Date,format="%w")) + 1
    df2$hour <- as.integer(strftime(df2$Time,format="%H"))
    df2$min <- as.integer(strftime(df2$Time,format="%M"))
    

    Merge

    merge(df2, df1[-1], by=c("Day_of_week", "hour", "min"), all.x=TRUE)
    
    #   Day_of_week hour min                Time Sms       Date AvgSms
    # 1           1    0   0 2012-01-01 00:00:00  10 2012-01-01     12
    # 2           1    0  30 2012-01-01 00:30:00  11 2012-01-01     20
    # 3           1    1   0 2012-01-01 01:00:00  13 2012-01-01     19
    # 4           1    1  30 2012-01-01 01:30:00  10 2012-01-01     11
    # 5           1    2   0 2012-01-01 02:00:00   7 2012-01-01     12
    # 6           1    2  30 2012-01-01 02:30:00   3 2012-01-01      7
    # 7           1    3   0 2012-01-01 03:00:00   3 2012-01-01     NA
    # 8           1    3  30 2012-01-01 03:30:00   2 2012-01-01     NA
    

相关问题