How can I make this query run efficiently?

In BigQuery, we are trying to run:

SELECT day, AVG(value)/(1024*1024) FROM ( 
    SELECT value, UTC_USEC_TO_DAY(timestamp) as day, 
         PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank 
    FROM [Datastore.PerformanceDatum]
    WHERE type = "MemoryPerf"
) WHERE rank >= 0.9 AND rank <= 0.91 
GROUP BY day 
ORDER BY day desc;

It returns a relatively small amount of data, but we get this message:

Error: Resources exceeded during query execution. The query contained a GROUP BY operator, consider using GROUP EACH BY instead. For more details, please see https://developers.google.com/bigquery/docs/query-reference#groupby

What makes this query fail: is it the size of the subquery? Is there an equivalent query we could run that avoids the problem?


Edit in response to comments: if I add GROUP EACH BY (and remove the outer ORDER BY), the query fails, claiming that GROUP EACH BY is not parallelizable here.

1 Answer

    I wrote an equivalent query that works for me:

    SELECT day, AVG(value)/(1024*1024) FROM (
        SELECT data AS value, UTC_USEC_TO_DAY(dtimestamp) AS day,
             PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) AS rank
        FROM [io_sensor_data.moscone_io13]
        WHERE sensortype = "humidity"
    ) WHERE rank >= 0.9 AND rank <= 0.91
    GROUP BY day
    ORDER BY day DESC;
    

    If I run only the inner query, I get 3,660,624 results. Is your dataset larger than that?

    The outer SELECT returns only 4 results when grouping by day. I tried a different grouping to see whether I could hit the limit:

    SELECT day, AVG(value)/(1024*1024) FROM (
        SELECT data AS value, dtimestamp / 1000 AS day,
             PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) AS rank
        FROM [io_sensor_data.moscone_io13]
        WHERE sensortype = "humidity"
    ) WHERE rank >= 0.9 AND rank <= 0.91
    GROUP BY day
    ORDER BY day DESC;
    

    This also runs, now with 57,562 distinct groups.

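    I tried different combinations to reproduce your error, and I was able to get it by doubling the initial amount of data. A simple "hack" for doubling the data (in BigQuery's legacy SQL, a comma between tables in FROM means UNION ALL, not a join) is to change: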

    FROM [io_sensor_data.moscone_io13]
    

    to:

    FROM [io_sensor_data.moscone_io13], [io_sensor_data.moscone_io13]
    

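    Then I get the same error. How much data do you have? Can you apply an additional filter? Since you are already partitioning PERCENTILE_RANK by day, could you add a condition to the query so that it analyzes only a fraction of the days (for example, only the last month)?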
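
A minimal sketch of that kind of filter, applied to the query from the question. It assumes the `timestamp` column holds microseconds since the Unix epoch (which the use of UTC_USEC_TO_DAY suggests but the question does not state) and that a 30-day window is acceptable; both are assumptions, so adjust them to your data:

```sql
SELECT day, AVG(value)/(1024*1024) FROM (
    SELECT value, UTC_USEC_TO_DAY(timestamp) AS day,
         PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) AS rank
    FROM [Datastore.PerformanceDatum]
    WHERE type = "MemoryPerf"
      # Assumption: timestamp is in microseconds. NOW() returns the current
      # time in microseconds, so this keeps roughly the last 30 days.
      AND timestamp >= NOW() - 30*24*60*60*1000000
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day DESC;
```

Because the filter sits inside the subquery, it reduces the number of rows the window function has to rank, which is the part that appeared to hit the resource limit when the input was doubled.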
