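For a search application I need to display the values of the available data fields. These values also need to be narrowed down by a search term the user enters. The values of a data/index field can be collected with a bucket aggregation. However, when there is a one-to-many relationship, the problem is that the buckets contain values that do not match the entered search term.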
For example, with a mapping like this:
PUT my_index
{
  "mappings": {
    "product": {
      "properties": {
        "product_name": {"type": "text", "index": true},
        "tags": {
          "properties": {
            "tag": {"type": "keyword", "index": true}
          }
        }
      }
    }
  }
}
And this data:
PUT _bulk
{"index": {"_index": "my_index", "_type": "product", "_id": "1"}}
{"product_name": "the book you were looking for", "tags": [{"tag": "book"},{"tag": "suspense"}]}
{"index": {"_index": "my_index", "_type": "product", "_id": "2"}}
{"product_name": "combinatorics for the wicked", "tags": [{"tag": "book"},{"tag": "mathematics"}, {"tag": "combinatorics"}]}
{"index": {"_index": "my_index", "_type": "product", "_id": "3"}}
{"product_name": "the story of the lonely bit", "tags": [{"tag": "book"},{"tag": "mathematics"}, {"tag": "suspense"},{"tag": "drama"}]}
{"index": {"_index": "my_index", "_type": "product", "_id": "4"}}
{"product_name": "a vector growing wrong", "tags": [{"tag": "book"},{"tag": "mathematics"},{"tag": "suspense"},{"tag": "drama"}]}
An aggregation over all documents in the index:
GET my_index/_search
{
  "aggs" : {
    "tags" : {
      "terms" : { "field" : "tags.tag" }
    }
  }
}
The result:
... "buckets": [
{
"key": "book",
"doc_count": 4
},
{
"key": "mathematics",
"doc_count": 3
},
{
"key": "suspense",
"doc_count": 3
},
{
"key": "drama",
"doc_count": 2
},
{
"key": "combinatorics",
"doc_count": 1
},
{
"key": "mystery",
"doc_count": 1
}
] ...
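This result is fine as long as the values do not need to be narrowed down. Now suppose the user wants to see only those field values that start with "m":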
GET my_index/_search
{
  "query": {
    "wildcard" : { "tags.tag" : { "value" : "m*" } }
  },
  "aggs" : {
    "tags" : {
      "terms" : { "field" : "tags.tag" }
    }
  }
}
The result of this aggregation is:
... "buckets": [
{
"key": "book",
"doc_count": 3
},
{
"key": "mathematics",
"doc_count": 3
},
{
"key": "drama",
"doc_count": 2
},
{
"key": "suspense",
"doc_count": 2
},
{
"key": "combinatorics",
"doc_count": 1
}
] ...
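The buckets are computed over the documents that match the query. But because of the one-to-many relationship between products and tags, all tag values of the matching products end up in the buckets, not only the values that match the search term.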
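To make the effect concrete, here is a small plain-Python sketch that mimics the two steps on the sample data above. It is not Elasticsearch code and only approximates the behavior: the names `docs`, `matching`, `buckets`, and `wanted` are made up for illustration, and `fnmatchcase` stands in for the wildcard query.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Sample data from above: product id -> tag values (illustrative structure).
docs = {
    1: ["book", "suspense"],
    2: ["book", "mathematics", "combinatorics"],
    3: ["book", "mathematics", "suspense", "drama"],
    4: ["book", "mathematics", "suspense", "drama"],
}

pattern = "m*"

# Step 1 (query): a product matches if ANY of its tags matches the pattern.
matching = {
    doc_id: tags
    for doc_id, tags in docs.items()
    if any(fnmatchcase(tag, pattern) for tag in tags)
}

# Step 2 (aggregation): EVERY tag of each matching product is counted,
# which is why "book", "drama", etc. show up in the buckets.
buckets = Counter(tag for tags in matching.values() for tag in tags)

# The result I am actually after: only buckets whose key matches the pattern.
wanted = {key: count for key, count in buckets.items() if fnmatchcase(key, pattern)}

print(sorted(matching))  # [2, 3, 4]
print(wanted)            # {'mathematics': 3}
```

The counts in `buckets` agree with the aggregation result shown above (book 3, mathematics 3, suspense 2, drama 2, combinatorics 1), while `wanted` contains only the single bucket I would like to get back.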
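Is there a way to exclude the non-matching buckets from the bucket list? In this example there should be only one bucket: "mathematics". Or is it necessary to fully denormalize the data for this kind of requirement?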