朋友们,
该场景与wifi控制器日志文件有关,其中用户活动被跟踪,就像用户“加入”,“进入”,断开“和”离开wifi网络一样 . 我已经清理并在表格中上传结构化数据 . 我想在用户级别为他/她的活动创建会话ID . 确定一个会话可以按事件“加入”和“离开”之间的所有事件进行分组 . 我附上了示例数据以便更好地理解 .
一些有助于创建会话ID的指针:
-
可以通过"joins"事件识别用户的新会话,并且_11627_事件表示结束 .
-
"disconnects from"和"leave"具有相同的时间戳,因此如果按时间戳排序,则顺序可以是随机的 .
每个"disconnects from" -
都有"leave"的条目,但反之亦然 . 因此,我们可以在没有"disconnects from"记录的情况下使用"leave" .
-
"in"表示漫游,因此它应该是同一会话的一部分 . 它将在"joins"和"leave"事件之间 .
-
您可能只看到记录只有"in"或说"disconnects from"因为完整的日志文件不可用,仍然希望将这些记录标记为不同的会话ID .
请帮我创建会话ID,考虑给定的指针 . 以下是样本数据,几乎没有可能的场景,以便更好地理解我在寻找什么 . 已经为“必需数据”分配了虚拟会话ID,这应该是它的外观 . 发布我的任何进一步澄清 . 我想用SQL实现这一点 .
场景1
原始数据:
usermac datetime event
xx:xx:xx:xx:0e:72 2018-04-19 09:30:58.000 joins
xx:xx:xx:xx:0e:72 2018-04-19 09:51:54.000 in
xx:xx:xx:xx:0e:72 2018-04-19 09:51:54.000 in
xx:xx:xx:xx:0e:72 2018-04-19 12:01:39.000 disconnects from
xx:xx:xx:xx:0e:72 2018-04-19 12:01:39.000 leave
xx:xx:xx:xx:0e:72 2018-04-19 12:46:05.000 joins
xx:xx:xx:xx:0e:72 2018-04-19 15:27:28.000 disconnects from
xx:xx:xx:xx:0e:72 2018-04-19 15:27:28.000 leave
所需格式:
usermac datetime event sessionid
xx:xx:xx:xx:0e:72 2018-04-19 09:30:58.000 joins 1
xx:xx:xx:xx:0e:72 2018-04-19 09:51:54.000 in 1
xx:xx:xx:xx:0e:72 2018-04-19 09:51:54.000 in 1
xx:xx:xx:xx:0e:72 2018-04-19 12:01:39.000 disconnects from 1
xx:xx:xx:xx:0e:72 2018-04-19 12:01:39.000 leave 1
xx:xx:xx:xx:0e:72 2018-04-19 12:46:05.000 joins 2
xx:xx:xx:xx:0e:72 2018-04-19 15:27:28.000 disconnects from 2
xx:xx:xx:xx:0e:72 2018-04-19 15:27:28.000 leave 2
场景2
原始数据:
usermac datetime event
xx:xx:xx:xx:xx:80 2018-04-12 21:35:48.000 joins
xx:xx:xx:xx:xx:80 2018-04-12 21:40:01.000 leave
xx:xx:xx:xx:xx:80 2018-04-12 21:40:01.000 disconnects from
xx:xx:xx:xx:xx:80 2018-04-12 21:40:08.000 joins
xx:xx:xx:xx:xx:80 2018-04-12 21:43:45.000 leave
xx:xx:xx:xx:xx:80 2018-04-12 21:43:45.000 disconnects from
xx:xx:xx:xx:xx:80 2018-04-12 21:44:00.000 joins
xx:xx:xx:xx:xx:80 2018-04-12 22:01:29.000 leave
所需格式:
usermac datetime event sessionid
xx:xx:xx:xx:xx:80 2018-04-12 21:35:48.000 joins 3
xx:xx:xx:xx:xx:80 2018-04-12 21:40:01.000 leave 3
xx:xx:xx:xx:xx:80 2018-04-12 21:40:01.000 disconnects from 3
xx:xx:xx:xx:xx:80 2018-04-12 21:40:08.000 joins 4
xx:xx:xx:xx:xx:80 2018-04-12 21:43:45.000 leave 4
xx:xx:xx:xx:xx:80 2018-04-12 21:43:45.000 disconnects from 4
xx:xx:xx:xx:xx:80 2018-04-12 21:44:00.000 joins 5
xx:xx:xx:xx:xx:80 2018-04-12 22:01:29.000 leave 5
场景3
原始数据:
usermac datetime event
xx:xx:xx:xx:xx:92 2018-04-23 09:15:18.000 joins
xx:xx:xx:xx:xx:92 2018-04-23 11:02:30.000 in
xx:xx:xx:xx:xx:92 2018-04-23 11:02:30.000 in
xx:xx:xx:xx:xx:92 2018-04-23 11:06:09.000 disconnects from
xx:xx:xx:xx:xx:92 2018-04-23 11:06:09.000 leave
xx:xx:xx:xx:xx:92 2018-04-23 11:07:03.000 joins
xx:xx:xx:xx:xx:92 2018-04-23 13:35:51.000 leave
所需格式:
usermac datetime event sessionid
xx:xx:xx:xx:xx:92 2018-04-23 09:15:18.000 joins 6
xx:xx:xx:xx:xx:92 2018-04-23 11:02:30.000 in 6
xx:xx:xx:xx:xx:92 2018-04-23 11:02:30.000 in 6
xx:xx:xx:xx:xx:92 2018-04-23 11:06:09.000 disconnects from 6
xx:xx:xx:xx:xx:92 2018-04-23 11:06:09.000 leave 6
xx:xx:xx:xx:xx:92 2018-04-23 11:07:03.000 joins 7
xx:xx:xx:xx:xx:92 2018-04-23 13:35:51.000 leave 7
场景4
原始数据:
usermac datetime event
xx:xx:xx:xx:xx:5f 2018-06-23 07:03:24.000 disconnects from
xx:xx:xx:xx:xx:5f 2018-06-23 07:03:24.000 leave
xx:xx:xx:xx:xx:5f 2018-06-23 07:05:45.000 joins
xx:xx:xx:xx:xx:5f 2018-06-23 07:05:48.000 leave
xx:xx:xx:xx:xx:5f 2018-06-23 07:05:48.000 disconnects from
xx:xx:xx:xx:xx:5f 2018-06-23 07:31:59.000 joins
xx:xx:xx:xx:xx:5f 2018-06-23 07:32:02.000 leave
xx:xx:xx:xx:xx:5f 2018-06-23 07:32:02.000 disconnects from
所需格式:
usermac datetime event sessionid
xx:xx:xx:xx:xx:5f 2018-06-23 07:03:24.000 disconnects from 8
xx:xx:xx:xx:xx:5f 2018-06-23 07:03:24.000 leave 8
xx:xx:xx:xx:xx:5f 2018-06-23 07:05:45.000 joins 9
xx:xx:xx:xx:xx:5f 2018-06-23 07:05:48.000 leave 9
xx:xx:xx:xx:xx:5f 2018-06-23 07:05:48.000 disconnects from 9
xx:xx:xx:xx:xx:5f 2018-06-23 07:31:59.000 joins 10
xx:xx:xx:xx:xx:5f 2018-06-23 07:32:02.000 leave 10
xx:xx:xx:xx:xx:5f 2018-06-23 07:32:02.000 disconnects from 10
场景5
原始数据:
usermac datetime event
xx:xx:xx:xx:xx:b2 2018-06-12 07:22:14.000 in
xx:xx:xx:xx:xx:b2 2018-06-12 07:22:14.000 in
xx:xx:xx:xx:xx:b2 2018-06-12 07:22:40.000 in
xx:xx:xx:xx:xx:b2 2018-06-12 07:22:40.000 in
xx:xx:xx:xx:xx:b2 2018-06-12 07:23:39.000 leave
所需格式:
usermac datetime event sessionid
xx:xx:xx:xx:xx:b2 2018-06-12 07:22:14.000 in 11
xx:xx:xx:xx:xx:b2 2018-06-12 07:22:14.000 in 11
xx:xx:xx:xx:xx:b2 2018-06-12 07:22:40.000 in 11
xx:xx:xx:xx:xx:b2 2018-06-12 07:22:40.000 in 11
xx:xx:xx:xx:xx:b2 2018-06-12 07:23:39.000 leave 11
场景6
原始数据:
usermac datetime event
xx:xx:xx:xx:xx:16 2018-06-23 07:55:26.000 in
xx:xx:xx:xx:xx:16 2018-06-23 07:55:26.000 in
所需格式:
usermac datetime event sessionid
xx:xx:xx:xx:xx:16 2018-06-23 07:55:26.000 in 12
xx:xx:xx:xx:xx:16 2018-06-23 07:55:26.000 in 12
场景7
原始数据:
usermac datetime event
xx:xx:xx:xx:xx:m6 2018-06-24 06:56:30.000 rejoins
xx:xx:xx:xx:xx:m6 2018-06-24 06:56:51.000 rejoins
xx:xx:xx:xx:xx:m6 2018-06-24 06:56:59.000 leave
xx:xx:xx:xx:xx:m6 2018-06-24 06:56:59.000 disconnects from
xx:xx:xx:xx:xx:m6 2018-06-24 06:57:03.000 joins
所需格式:
usermac datetime event sessionid
xx:xx:xx:xx:xx:m6 2018-06-24 06:56:30.000 rejoins 13
xx:xx:xx:xx:xx:m6 2018-06-24 06:56:51.000 rejoins 13
xx:xx:xx:xx:xx:m6 2018-06-24 06:56:59.000 leave 13
xx:xx:xx:xx:xx:m6 2018-06-24 06:56:59.000 disconnects from 13
xx:xx:xx:xx:xx:m6 2018-06-24 06:57:03.000 joins 14
事情到现在为止:
with cte1 as ( SELECT row_number() over(partition by usermac ORDER BY datetime) indexno, row_number() over(partition by event ORDER BY usermac, datetime) sessionid, datetime, usermac, event from t8_wfc_log_apr_jun_syslog_cast where usermac='xx:xx:xx:xx:16' order by usermac, datetime ) SELECT case when event = 'joins' then sessionid else (select max(sessionid) from cte1 innie where innie.usermac = oute.usermac and innie.indexno < oute.indexno and innie.event = 'joins') end sessionid, datetime, usermac, event FROM cte1 oute order by usermac, datetime;