运行客户ID Bigquery的外观计数-Java 学习之路

这里有类似的问题，但要么我无法弄清楚如何转换到我的情况（可能）或他们不是那么相似但阅读接近我想做的事情（BigQuery: How to calculate the running count of distinct visitors for each day and category?）

无论如何...

我在bigquery中有一个订单表，它有许多列 Headers ，我需要使用它们，但我会在这里列出其中一些

orderID，customerID，transactionDate，Revenue

（我需要获得所有领域）

我想在表中计算出客户ID的实例作为新列，所以如果我有3个订单，而我的客户ID是1234，那么数据表中的第一个实例在新列中将为1，第二名是2，第三名是3

例如，说我的数据看起来像这样

> OrderID     ||    CustomerID    ||    TransactionDate    ||    Revenue
> 1           ||    1             ||     01/01/15          ||     £20 
> 2           ||    2             ||     01/01/15          ||     £20 
> 3           ||    3             ||     01/01/15          ||     £20 
> 4           ||    1             ||     01/01/15          ||     £20 
> 5           ||    1             ||     01/01/15          ||     £20 
> 6           ||    2             ||     01/01/15          ||     £20 
> 7           ||    4             ||     01/01/15          ||     £20

我想对它运行一个查询，在一个新列中添加一个新的列，说明如果CustomerID记录所以它想要的实例

> OrderID     ||    CustomerID    ||    TransactionDate    ||    Revenue ||Instance
> 1           ||    1             ||     01/01/15          ||     £20    ||1 
> 2           ||    2             ||     01/01/15          ||     £20    ||1
> 3           ||    3             ||     01/01/15          ||     £20    ||1 
> 4           ||    1             ||     01/01/15          ||     £20    ||2
> 5           ||    1             ||     01/01/15          ||     £20    ||3 
> 6           ||    2             ||     01/01/15          ||     £20    ||2 
> 7           ||    4             ||     01/01/15          ||     £20    ||1

每次发生已经看到的customerID时，Instance会增加1

此外，我需要针对一个目前在160万行的不断增长的表运行它 .

我希望有人可以帮助我 .

干杯

约翰

2 回答

Window Functions在这里帮助你：

窗口函数可以对结果集的特定分区或“窗口”进行计算 . 每个窗口函数都需要一个指定分区的OVER子句，使用以下语法：

OVER (
      [PARTITION BY <expr>]
      [ORDER BY <expr>]
      [ROWS <expr> | RANGE <expr>]
     )

PARTITION BY 始终是可选的 . ORDER BY 在某些情况下是可选的，但某些窗口函数（例如 rank() 或 dense_rank() ）需要该子句 .

JOIN EACH 和 GROUP EACH BY 子句不能用于窗口函数的输出 . 要在使用窗口函数时生成大型查询结果，必须使用 PARTITION BY .

select *,
row_number() over (partition by CustomerID order by TransactionDate) as Instance
from  (select 1 as OrderID, 1 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue), 
 (select 2 as OrderID, 2 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue), 
 (select 3 as OrderID, 3 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue), 
 (select 4 as OrderID, 1 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue), 
 (select 5 as OrderID, 1 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue), 
 (select 6 as OrderID, 2 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue), 
 (select 7 as OrderID, 4 as CustomerID, '01/01/15' as TransactionDate,'£20' as Revenue)
 order by OrderID

+-----+---------+------------+-----------------+---------+----------+---+
| Row | OrderID | CustomerID | TransactionDate | Revenue | Instance |   |
+-----+---------+------------+-----------------+---------+----------+---+
|   1 |       1 |          1 | 01/01/15        | £20     |        1 |   |
|   2 |       2 |          2 | 01/01/15        | £20     |        1 |   |
|   3 |       3 |          3 | 01/01/15        | £20     |        1 |   |
|   4 |       4 |          1 | 01/01/15        | £20     |        2 |   |
|   5 |       5 |          1 | 01/01/15        | £20     |        3 |   |
|   6 |       6 |          2 | 01/01/15        | £20     |        2 |   |
|   7 |       7 |          4 | 01/01/15        | £20     |        1 |   |
+-----+---------+------------+-----------------+---------+----------+---+

回复于 2024-05-19T06:08:36+08:00

1

你应该使用像row_number OVER这样的窗口函数（bygroupbyfields order by transaction_date）

回复于 2024-05-19T06:08:36+08:00

运行客户ID Bigquery的外观计数

2 回答

相关问题