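For a search application I need to display the values of the available data fields. These values also need to be narrowed down by the search term the user enters. The values of a data/index field can be collected with a bucket aggregation. However, when there is a one-to-many relationship, the problem is that the buckets contain values that do not match the entered search term.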

For example, with a mapping like this:

PUT my_index
{
  "mappings": {
    "product": {
      "properties": {
        "product_name": {"type": "text", "index": true},
          "tags": {
            "properties": {
                "tag": {"type": "keyword", "index": true}
                }
            }
      }
    }
  }
}

And this data:

PUT _bulk
{"index": {"_index": "my_index", "_type": "product", "_id": "1"}}
{"product_name": "the book you were looking for", "tags": [{"tag": "book"},{"tag": "suspense"}]}
{"index": {"_index": "my_index", "_type": "product", "_id": "2"}}
{"product_name": "combinatorics for the wicked", "tags": [{"tag": "book"},{"tag": "mathematics"}, {"tag": "combinatorics"}]}
{"index": {"_index": "my_index", "_type": "product", "_id": "3"}}
{"product_name": "the story of the lonely bit", "tags": [{"tag": "book"},{"tag": "mathematics"}, {"tag": "suspense"},{"tag": "drama"}]}
{"index": {"_index": "my_index", "_type": "product", "_id": "4"}}
{"product_name": "a vector growing wrong", "tags": [{"tag": "book"},{"tag": "mathematics"},{"tag": "suspense"},{"tag": "drama"}]}

An aggregation over all documents in the index:

GET my_index/_search
{
    "aggs" : {
        "tags" : {
            "terms" : { "field" : "tags.tag" }
        }
    }
}

The result:

... "buckets": [
{
  "key": "book",
  "doc_count": 4
},
{
  "key": "mathematics",
  "doc_count": 3
},
{
  "key": "suspense",
  "doc_count": 3
},
{
  "key": "drama",
  "doc_count": 2
},
{
  "key": "combinatorics",
  "doc_count": 1
},
{
  "key": "mystery",
  "doc_count": 1
}

] ...

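This result is fine as long as the values don't need to be narrowed down. Now suppose the user wants to see only those field values that start with "m".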

GET my_index/_search
{
     "query": {
        "wildcard" : { "tags.tag" : { "value" : "m*" } }
    },
  "aggs" : {
        "tags" : {
            "terms" : { "field" : "tags.tag" }
        }
    }
}

The result of this aggregation is:

... "buckets": [
{
  "key": "book",
  "doc_count": 3
},
{
  "key": "mathematics",
  "doc_count": 3
},
{
  "key": "drama",
  "doc_count": 2
},
{
  "key": "suspense",
  "doc_count": 2
},
{
  "key": "combinatorics",
  "doc_count": 1
}

] ...

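The buckets are computed over the documents that match the query. But because there is a one-to-many relationship between products and tags, all tag values of the products matched by the query end up in the buckets.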
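To make the behaviour concrete, here is a rough Python sketch that imitates the two steps on the sample data. It is only a simulation under my own assumptions, not Elasticsearch code: the function name `simulate_terms_agg` and the use of `fnmatchcase` as a stand-in for the wildcard query are made up for illustration.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Tags per product id, copied from the bulk request above.
PRODUCT_TAGS = {
    "1": ["book", "suspense"],
    "2": ["book", "mathematics", "combinatorics"],
    "3": ["book", "mathematics", "suspense", "drama"],
    "4": ["book", "mathematics", "suspense", "drama"],
}


def simulate_terms_agg(product_tags, pattern):
    """Hypothetical helper: mimic 'wildcard query + terms aggregation'."""
    # Step 1 (query): a product matches if ANY of its tags fits the pattern.
    # The whole product is kept, not just the matching tag.
    matching = {
        product_id: tags
        for product_id, tags in product_tags.items()
        if any(fnmatchcase(tag, pattern) for tag in tags)
    }
    # Step 2 (aggregation): count EVERY tag of the matching products,
    # once per product, whether or not the tag itself fits the pattern.
    return Counter(tag for tags in matching.values() for tag in set(tags))


buckets = simulate_terms_agg(PRODUCT_TAGS, "m*")
for key, doc_count in buckets.most_common():
    print(key, doc_count)
```

Products 2, 3 and 4 match because each carries the tag "mathematics", and the counting step then sees all of their tags, which is why "book", "drama", "suspense" and "combinatorics" show up next to "mathematics".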

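Is there a way to exclude the non-matching buckets from the bucket list? In this example there should be only one bucket: "mathematics". Or is it necessary to fully denormalize the data for this kind of requirement?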