首页 文章

在大查询中连接多个表

提问于
浏览
-1

我想在BigQuery中加入多个表,但是Joining multiple tables in bigquery的解决方案并没有帮助我得到我想要的输出 .

我的出发点如下 . 我正在创建5个单独的表,显示特定页面可能的每个评级值 . 请参阅此处的示例输出:

raw tables

该表按以下方式创建:

#standardSQL
  CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN (
    SELECT
      FORMAT_DATE('%y%m%d',
        DATE('2018-06-01')))
    AND (
    SELECT
      FORMAT_DATE('%y%m%d',
        DATE('2018-06-30'))));

SELECT
  h.page.pagePath AS page,
  Count(h.eventInfo.eventLabel)as five_star
FROM
  `table.ga_sessions_20*` AS t,
  t.hits AS h
WHERE
  h.eventInfo.eventAction='rating'
  AND h.eventInfo.eventLabel ='5'
  AND tables_in_range(_TABLE_SUFFIX)
  AND REGEXP_CONTAINS(h.page.pagePath,
    r'/xyz/')
  AND h.type='EVENT'
group by 1

当按照此处所述加入表格时Joining multiple tables in bigquery我很遗憾没有得到预期的结果 . 相反,Join仅查看所有5个表共有的页面 - 这意味着这些页面的五个可能值中的每一个都具有1-5的评级 . 见下面的示例输出 . joint table results

select
five_star.page as page,
five_star.five_star as five_star,
four_star.four_star as four_star,
three_star.three_star as three_star,
two_star.two_star as two_star,
one_star.one_star as one_star
from five_star
join four_star using (page)
join three_star using (page)
join two_star using (page)
JOIN one_star using (page)

我希望通过我的加入实现的是这样一个表:desired output . 我看到的问题是,如果页面没有收到某个评级,它将不会在查询atm中加入 . 不幸的是,我无法找到Union all,Cross Join或者left join的解决方案,所以我非常感谢这里的任何支持!

3 回答

  • 0

    您的查询中的问题:您're only adding to those event pages, which had a 5-star rating. That'为什么建议使用full outer join - 它会向最左侧的表添加新行 .

    我认为在您的情况下,解决方案更容易,根本不需要连接,因为所有数据都在同一个表中 . 这个是扁平的非枢轴:

    #standardSQL
      CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN '20180601' AND '20180630');
    
    SELECT
      h.page.pagePath AS page,
      h.eventInfo.eventLabel stars,
      COUNT(1) as events
    FROM
      `project.dataset.ga_sessions_*` AS t, t.hits AS h
    WHERE
      h.eventInfo.eventAction='rating'
      AND h.eventInfo.eventLabel between '1' and '5'
      AND tables_in_range(_TABLE_SUFFIX)
      AND REGEXP_CONTAINS(h.page.pagePath,
        r'/xyz/')
      AND h.type='EVENT'
    GROUP BY 1, 2
    

    如果你真的需要类似于pivot的列,它将如下所示:

    #standardSQL
      CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN '20180601' AND '20180630');
    
    SELECT
      h.page.pagePath AS page,
      SUM( IF(h.eventInfo.eventLabel = '1', 1, 0) ) as oneStarEvents,
      SUM( IF(h.eventInfo.eventLabel = '2', 1, 0) ) as twoStarEvents,
      SUM( IF(h.eventInfo.eventLabel = '3', 1, 0) ) as threeStarEvents,
      SUM( IF(h.eventInfo.eventLabel = '4', 1, 0) ) as fourStarEvents,
      SUM( IF(h.eventInfo.eventLabel = '5', 1, 0) ) as fiveStarEvents
    FROM
      `project.dataset.ga_sessions_*` AS t, t.hits AS h
    WHERE
      h.eventInfo.eventAction='rating'
      AND h.eventInfo.eventLabel between '1' and '5'
      AND tables_in_range(_TABLE_SUFFIX)
      AND REGEXP_CONTAINS(h.page.pagePath,
        r'/xyz/')
      AND h.type='EVENT'
    GROUP BY 1
    

    而不是 SUM(IF(condition,1,0)) 你也可以 COUNT(IF(condition,1,NULL))

    心连心!

  • 0

    以下是BigQuery Standard SQL

    #standardSQL
    SELECT 
      page,
      SUM(five_star_rating) five_star_rating,
      SUM(four_star_rating) four_star_rating,
      SUM(three_star_rating) three_star_rating,
      SUM(two_star_rating) two_star_rating,
      SUM(one_star_rating) one_star_rating
    FROM (
      SELECT page, 0 one_star_rating, 0 two_star_rating, 0 three_star_rating, 0 four_star_rating, five_star_rating FROM `project.dataset.table5` UNION ALL
      SELECT page, 0, 0, 0, four_star_rating, 0 FROM `project.dataset.table4` UNION ALL
      SELECT page, 0, 0, three_star_rating, 0, 0 FROM `project.dataset.table3` UNION ALL
      SELECT page, 0, two_star_rating, 0, 0, 0 FROM `project.dataset.table2` UNION ALL
      SELECT page, one_star_rating, 0, 0, 0, 0 FROM `project.dataset.table1` 
    )
    GROUP BY page
    

    您可以使用您问题中的虚拟数据进行测试,使用上面的数据,如下所示

    #standardSQL
    WITH `project.dataset.table5` AS (
      SELECT 'A' page, 1 five_star_rating UNION ALL
      SELECT 'B', 1 UNION ALL
      SELECT 'C', 1 
    ), `project.dataset.table4` AS (
      SELECT 'C' page, 1 four_star_rating UNION ALL
      SELECT 'D', 1 UNION ALL
      SELECT 'F', 1 
    ), `project.dataset.table3` AS (
      SELECT 'F' page, 1 three_star_rating UNION ALL
      SELECT 'G', 1 UNION ALL
      SELECT 'H', 1 
    ), `project.dataset.table2` AS (
      SELECT 'H' page, 1 two_star_rating UNION ALL
      SELECT 'I', 1 UNION ALL
      SELECT 'J', 1 
    ), `project.dataset.table1` AS (
      SELECT 'J' page, 1 one_star_rating UNION ALL
      SELECT 'K', 1 UNION ALL
      SELECT 'L', 1 
    )
    SELECT 
      page,
      SUM(five_star_rating) five_star_rating,
      SUM(four_star_rating) four_star_rating,
      SUM(three_star_rating) three_star_rating,
      SUM(two_star_rating) two_star_rating,
      SUM(one_star_rating) one_star_rating
    FROM (
      SELECT page, 0 one_star_rating, 0 two_star_rating, 0 three_star_rating, 0 four_star_rating, five_star_rating FROM `project.dataset.table5` UNION ALL
      SELECT page, 0, 0, 0, four_star_rating, 0 FROM `project.dataset.table4` UNION ALL
      SELECT page, 0, 0, three_star_rating, 0, 0 FROM `project.dataset.table3` UNION ALL
      SELECT page, 0, two_star_rating, 0, 0, 0 FROM `project.dataset.table2` UNION ALL
      SELECT page, one_star_rating, 0, 0, 0, 0 FROM `project.dataset.table1` 
    )
    GROUP BY page
    
  • 0

    不幸的是,我无法找到Union all,Cross Join或者left join的解决方案......

    另一种选择是使用FULL JOIN,如下例所示

    #standardSQL
    SELECT
      COALESCE(five_star.page, four_star.page, three_star.page, two_star.page, one_star.page) AS page,
      IFNULL(five_star.five_star_rating, 0) AS five_star,
      IFNULL(four_star.four_star_rating, 0) AS four_star,
      IFNULL(three_star.three_star_rating, 0) AS three_star,
      IFNULL(two_star.two_star_rating, 0) AS two_star,
      IFNULL(one_star.one_star_rating, 0) AS one_star
    FROM `project.dataset.table5` five_star
    FULL JOIN `project.dataset.table4` four_star USING (page)
    FULL JOIN `project.dataset.table3` three_star USING (page)
    FULL JOIN `project.dataset.table2` two_star USING (page)
    FULL JOIN `project.dataset.table1` one_star USING (page)
    

    您可以使用您问题中的虚拟数据进行测试,使用上面的数据,如下所示

    #standardSQL
    WITH `project.dataset.table5` AS (
      SELECT 'A' page, 1 five_star_rating UNION ALL
      SELECT 'B', 1 UNION ALL
      SELECT 'C', 1 
    ), `project.dataset.table4` AS (
      SELECT 'C' page, 1 four_star_rating UNION ALL
      SELECT 'D', 1 UNION ALL
      SELECT 'F', 1 
    ), `project.dataset.table3` AS (
      SELECT 'F' page, 1 three_star_rating UNION ALL
      SELECT 'G', 1 UNION ALL
      SELECT 'H', 1 
    ), `project.dataset.table2` AS (
      SELECT 'H' page, 1 two_star_rating UNION ALL
      SELECT 'I', 1 UNION ALL
      SELECT 'J', 1 
    ), `project.dataset.table1` AS (
      SELECT 'J' page, 1 one_star_rating UNION ALL
      SELECT 'K', 1 UNION ALL
      SELECT 'L', 1 
    )
    SELECT
      COALESCE(five_star.page, four_star.page, three_star.page, two_star.page, one_star.page) AS page,
      IFNULL(five_star.five_star_rating, 0) AS five_star,
      IFNULL(four_star.four_star_rating, 0) AS four_star,
      IFNULL(three_star.three_star_rating, 0) AS three_star,
      IFNULL(two_star.two_star_rating, 0) AS two_star,
      IFNULL(one_star.one_star_rating, 0) AS one_star
    FROM `project.dataset.table5` five_star
    FULL JOIN `project.dataset.table4` four_star USING (page)
    FULL JOIN `project.dataset.table3` three_star USING (page)
    FULL JOIN `project.dataset.table2` two_star USING (page)
    FULL JOIN `project.dataset.table1` one_star USING (page)
    

    结果如预期:

    Row page    five_star   four_star   three_star  two_star    one_star     
    1   A       1           0           0           0           1    
    2   B       1           0           0           0           1    
    3   C       1           1           0           0           1    
    4   D       0           1           0           0           0    
    5   F       0           1           1           0           0    
    6   G       0           0           1           0           0    
    7   H       0           0           1           1           0    
    8   I       0           0           0           1           0    
    9   J       0           0           0           1           0
    

相关问题