从较大的jsonb字段中快速检索多个值（postgresql 9.4）-Java 学习之路

tl;dr

使用PSQL 9.4，有没有办法从jsonb字段中检索多个值，例如使用虚函数：

jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])

希望加快选择多个值所需的几乎线性时间（1值= 300ms，2值= 450ms，3值= 600ms）

Background

我有以下jsonb表：

CREATE TABLE "public"."analysis" (
  "date" date NOT NULL,
  "name" character varying (10) NOT NULL,
  "country" character (3) NOT NULL,
  "x" jsonb,
  PRIMARY KEY(date,name)
);

大约有100 000行，其中每行有一个带有90个键和相应值的jsonb字典 . 我正在尝试编写一个SQL查询，以相当快的方式选择一些（<10）键值（<500 ms）

Index and querying: 190ms

我开始添加一个索引：

CREATE INDEX ON analysis USING GIN (x);

这使得基于“x”字典中的值快速查询，例如：

SELECT date, name, country FROM analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;

这需要大约190毫秒（我们可以接受）

Retrieving dictionary values

但是，一旦我开始在SELECT部分中添加要返回的键，执行时间几乎呈线性增长：

1 value: 300ms

select jsonb_extract_path(x, 'a_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;

需要366毫秒（175毫秒）

select x#>'{a_dictionary_key}' as gear_down_altitude from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;

需要300毫秒（110毫秒）

3 values: 600ms

select jsonb_extract_path(x, 'a_dictionary_key'), jsonb_extract_path(x, 'a_second_dictionary_key'), jsonb_extract_path(x, 'a_third_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;

需要600毫秒（选择每个值410或100）

select x#>'{a_dictionary_key}' as a_dictionary_key, x#>'{a_second_dictionary_key}' as a_second_dictionary_key, x#>'{a_third_dictionary_key}' as a_third_dictionary_key from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;

需要600毫秒（选择每个值410或100）

Retrieving more values faster

有没有办法从jsonb字段中检索多个值，例如使用虚函数：

jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])

这可能会加速这些查找 . 它可以将它们作为列或列表/数组甚至是json对象返回 .

Retrieving an array using PL/Python

只是为了它，我使用PL / Python创建了一个自定义函数，但这要慢得多（5s），可能是由于json.loads：

CREATE OR REPLACE FUNCTION retrieve_objects(data jsonb, k VARCHAR[])
RETURNS TEXT[] AS $$
  if not data:
    return []

  import simplejson as json
  j = json.loads(data) 

  l = []
  for i in k:
    l.append(j[i])

  return l

$$ LANGUAGE plpython2u;

# Usage:
# select retrieve_objects(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key']) from analysis  where date > '2014-01-01' and date < '2014-05-01'

Update 2015-05-21

我使用带有GIN索引的hstore重新实现了表，并且性能几乎与使用jsonb相同，即在我的情况下没有帮助 .

1 回答

您正在使用#> operator，它看起来像执行路径搜索 . 你尝试过正常的 -> 查询吗？喜欢：

select  json_column->'json_field1'
,       json_column->'json_field2'

如果你使用临时表会发生什么事情会很有趣 . 喜欢：

create temporary table tmp_doclist (doc jsonb)
;
insert  tmp_doclist
        (doc)
select  x
from    analysis
where   ... your conditions here ...
;
select  doc->'col1'
,       doc->'col2'
,       doc->'col3'
from    tmp_doclist
;

回复于 2024-04-27T03:13:34+08:00

从较大的jsonb字段中快速检索多个值（postgresql 9.4）

1 回答

相关问题