我有以下查询,大约需要15-20秒才能运行 .
with cte0 as (
SELECT
label,
date,
CASE
WHEN
Lead(label || date || "number") OVER (PARTITION BY label || date || "number" ORDER BY "label", "date", "number", "time") IS NULL
THEN
'1'::numeric
ELSE
'0'::numeric
END As "unique"
FROM table_data
LEFT JOIN table_mapper ON
table_mapper."type" = table_data."type"
WHERE Date BETWEEN date_trunc('month', current_date - 1) and current_date - 1
)
SELECT 'MTD' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" BETWEEN date_trunc('month', current_date - 1) AND current_date -1
UNION ALL
SELECT 'Week' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" BETWEEN date_trunc('week', current_date - 1) AND current_date -1
UNION ALL
SELECT 'FTD' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" = current_date -1
在表 table_data
中,我在 date
列上有一个索引 .
CREATE INDEX ix_cli_date
ON table_data
USING btree
(date);
表定义(\ d table_data)
Table "public.table_data"
Column | Type | Modifiers
------------------+------------------------+-----------
date | date | not null
number | bigint | not null
time | time without time zone | not null
end time | time without time zone | not null
duration | integer | not null
time1 | integer | not null
time2 | integer | not null
time3 | integer | not null
time4 | integer | not null
time5 | integer | not null
time6 | integer | not null
time7 | integer | not null
type | text | not null
name | text | not null
id1 | integer | not null
id2 | integer | not null
key | integer | not null
status | text | not null
Indexes:
"ix_cli_date" btree (date)
Table Definition (\d table_mapper)
Table "public.table_mapper"
Column | Type | Modifiers
------------+------+-----------
type | text | not null
label | text | not null
label2 | text | not null
label3 | text | not null
label4 | text | not null
label5 | text | not null
EXPLAIN ANALYZE查询
Result (cost=184342.66..230332.86 rows=3 width=64) (actual time=23377.923..25695.478 rows=3 loops=1)"
CTE cte0"
-> WindowAgg (cost=121516.06..156751.65 rows=612793 width=23) (actual time=14578.000..18985.958 rows=696157 loops=1)"
-> Sort (cost=121516.06..123048.04 rows=612793 width=23) (actual time=14577.975..17084.405 rows=696157 loops=1)"
Sort Key: (((table_mapper.label || (table_data.date)::text) || (table_data."number")::text)), table_mapper.label, table_data.date, table_data."number", table_data."time""
Sort Method: external merge Disk: 39480kB"
-> Hash Left Join (cost=11.96..37474.21 rows=612793 width=23) (actual time=1.449..3308.718 rows=696157 loops=1)"
Hash Cond: (table_data."type" = table_mapper."type")"
-> Index Scan using ix_cli_date on table_data (cost=0.02..29036.36 rows=612793 width=38) (actual time=0.141..946.648 rows=696157 loops=1)"
Index Cond: ((date >= date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)) AND (date Hash (cost=7.53..7.53 rows=353 width=25) (actual time=1.275..1.275 rows=336 loops=1)"
Buckets: 1024 Batches: 1 Memory Usage: 15kB"
-> Seq Scan on table_mapper (cost=0.00..7.53 rows=353 width=25) (actual time=0.020..0.589 rows=336 loops=1)"
-> Append (cost=27591.00..73581.21 rows=3 width=64) (actual time=23377.920..25695.467 rows=3 loops=1)"
-> Aggregate (cost=27591.00..27591.02 rows=1 width=32) (actual time=23377.917..23377.918 rows=1 loops=1)"
-> CTE Scan on cte0 (cost=0.00..27575.68 rows=3064 width=32) (actual time=14578.052..22335.236 rows=696157 loops=1)"
Filter: ((date = date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)))"
-> Aggregate (cost=27591.00..27591.02 rows=1 width=32) (actual time=1741.509..1741.510 rows=1 loops=1)"
-> CTE Scan on cte0 (cost=0.00..27575.68 rows=3064 width=32) (actual time=20.009..1522.352 rows=168261 loops=1)"
Filter: ((date = date_trunc('week'::text, ((('now'::text)::date - 1))::timestamp with time zone)))"
-> Aggregate (cost=18399.11..18399.13 rows=1 width=32) (actual time=576.029..576.030 rows=1 loops=1)"
-> CTE Scan on cte0 (cost=0.00..18383.79 rows=3064 width=32) (actual time=9.308..546.735 rows=23486 loops=1)"
Filter: (date = (('now'::text)::date - 1))"
Total runtime: 25710.506 ms"
Description :
我正在从 table_data
获取唯一计数和重复计数,并且 LEAD
帮助我在哪里为列的最后一个重复值赋值0 .
假设我在列中有3个 x
. 我将 1
值赋予前2个 x
,并将第3个 x
赋予0 .
实际上通过 cte
我从表 table_data
获取整行并使用线索进行一些计算并将字符串连接到定义的日期范围,其中每个行 1
和 0
值按照标准定义 .
如果前导为空,则它将被计为1,如果它不为空,则为0 .
并且我分别返回3行 MTD
, Current Week
和 FTD
,并计算了我从引导线和 count(*)
整行获取的 sum()
.
对于MTD,我有当月的总和和计数 .
周 - 这是本周,FTD是昨天 .
2 回答
CTE对大表有意义,因此您只需要扫描一次 . 对于较小的表,没有...可能会更快
使用
thedate
而不是保留字date
(在标准SQL中) .thetime
,uni
而不是time
,unique
. 等等 .简化了
lead()
调用 . 您获得前导行的值或NULL . 这似乎是唯一的相关信息 .在window function的
ORDER BY
子句中重复PARTITION
子句中的列也没有意义 .在此基础上,
count(leader) / count(*)
而不是sum(uni) / count(uni)
. 那有点快 .count(column)
仅计算非空值,而count(*)
计算所有行 .UNION查询第一站的条件是多余的 .
在问题评论中有关数据定义的更多建议和链接 .
表设计/索引
你应该有主键 . 我建议将serial列作为
table_data
的代理pk:使
type
成为table_mapper
的主键(以下fk约束也需要):为
type
添加外键约束以保证参照完整性 . 就像是:为了获得最终的读取性能(以一定的写入成本),添加一个多列索引以允许index-only scans用于上述查询:
在编写查询时,您指的是CTE三次 . 相反,如果您愿意将值包含在三列而不是三行中,则可以使用条件聚合:
这可以加快查询速度 . 此外,您可以将此逻辑合并到CTE查询本身中,从而消除了实现步骤 .