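I couldn't find an answer in earlier posts, so I hope this question is relevant. I'm having trouble with the Elasticsearch terms facet.
When I query the document count for every term, some field value comes back with a count of, say, 8; but when I query the document count for that specific field value, I get, say, 19.
To be more specific: I'm using Kibana, and here are the queries and responses (FYI, I was asked to rename the field values):
The query for all term facet counts: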
{
"facets" : {
"terms" : {
"terms" : {
"fields" : ["field.name"],
"size" : 6,
"order" : "count",
"exclude" : []
},
"facet_filter" : {
"fquery" : {
"query" : {
"filtered" : {
"query" : {
"bool" : {
"should" : [{
"query_string" : {
"query" : "*"
}
}
]
}
},
"filter" : {
"bool" : {
"must" : [{
"match_all" : {}
}
]
}
}
}
}
}
}
}
},
"size" : 0
}
the response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 20374,
"max_score" : 0.0,
"hits" : []
},
"facets" : {
"terms" : {
"_type" : "terms",
"missing" : 10567,
"total" : 9918,
"other" : 9781,
"terms" : [{
"term" : "fieldValue1",
"count" : 43
}, {
"term" : "fieldValue2",
"count" : 27
}, {
"term" : "fieldValue3",
"count" : 23
}, {
"term" : "fieldValue4",
"count" : 23
}, {
"term" : "fieldValue5",
"count" : 13
}, {
"term" : "fieldValue6",
"count" : 8
}
]
}
}
}
The query on "fieldValue6":
{
"facets" : {
"terms" : {
"terms" : {
"fields" : ["field.name"],
"size" : 6,
"order" : "count",
"exclude" : []
},
"facet_filter" : {
"fquery" : {
"query" : {
"filtered" : {
"query" : {
"bool" : {
"should" : [{
"query_string" : {
"query" : "*"
}
}
]
}
},
"filter" : {
"bool" : {
"must" : [{
"terms" : {
"field.name" : ["fieldValue6"]
}
}
]
}
}
}
}
}
}
}
},
"size" : 0
}
The response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 20374,
"max_score" : 0.0,
"hits" : []
},
"facets" : {
"terms" : {
"_type" : "terms",
"missing" : 0,
"total" : 19,
"other" : 0,
"terms" : [{
"term" : "fieldValue6",
"count" : 19
}
]
}
}
}
The field I apply the facet filter to (or whatever it is properly called) is mapped as "not_analyzed":
properties: {
type_ref2Strack: {
properties: {
position: {
type: long
}
name: {
index: not_analyzed
norms: {
enabled: false
}
index_options: docs
type: string
}
}
}
}
1 Answer
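This is a long-known limitation of Elasticsearch facets (now called aggregations).
The key problem is that the facet is run on each shard with the given size and the per-shard results are then combined, which means counts can be cut off: a term that does not make a shard's top list contributes nothing from that shard to the merged count.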
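The truncation described above can be reproduced with a small simulation. This is only a sketch of the idea, not Elasticsearch's actual merge code: the three shards, the terms `a`, `b`, `c`, `d`, `z`, and the helper names are all made up for illustration, with numbers chosen so that `z` is reported as 8 while its true count is 19, as in the question.

```python
from collections import Counter

def shard_top_terms(shard_docs, shard_size):
    """Top `shard_size` (term, count) pairs computed locally on one shard."""
    return Counter(shard_docs).most_common(shard_size)

def merged_facet(shards, size, shard_size=None):
    """Simplified coordinating-node merge: sum the per-shard top lists."""
    shard_size = size if shard_size is None else shard_size
    merged = Counter()
    for docs in shards:
        for term, count in shard_top_terms(docs, shard_size):
            merged[term] += count
    return merged.most_common(size)

# Hypothetical data: one list of field values per shard.
shards = [
    ["a"] * 20 + ["b"] * 10 + ["z"] * 8 + ["c"] * 1,
    ["a"] * 15 + ["b"] * 9 + ["c"] * 7 + ["z"] * 6,
    ["a"] * 12 + ["b"] * 8 + ["d"] * 6 + ["z"] * 5,
]

# Exact count of "z", as a filtered query on that single value would see it.
true_z = sum(docs.count("z") for docs in shards)

print(merged_facet(shards, size=3))                # [('a', 47), ('b', 27), ('z', 8)]
print(true_z)                                      # 19
print(merged_facet(shards, size=3, shard_size=4))  # [('a', 47), ('b', 27), ('z', 19)]
```

With `size=3`, only the first shard keeps `z` in its local top three, so the 6 and 5 documents on the other two shards never reach the merge step. Filtering on `z` first makes it the only term on every shard, which is why the second query in the question returns the full 19.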
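There are two non-ideal ways to work around this:
Set a "shard_size" that is larger than the size you actually need. This works in most cases, but it still does not guarantee exact counts.
Index into a single shard. That way the facet always collects exact results. This limits how far the index can scale to large numbers of documents, but YMMV.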
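As a rough sketch of the two workarounds, the request bodies could be built like this. It assumes the terms facet in your Elasticsearch version accepts a `shard_size` parameter (check the documentation for your release); the value 60 is an arbitrary choice, and the helper name `terms_facet_body` is hypothetical.

```python
import json

def terms_facet_body(field, size, shard_size):
    """Terms-facet request that asks each shard for more terms than the final size."""
    return {
        "facets": {
            "terms": {
                "terms": {
                    "fields": [field],
                    "size": size,
                    # Assumption: supported by the terms facet in your version.
                    "shard_size": shard_size,
                    "order": "count",
                }
            }
        },
        "size": 0,
    }

# Workaround 1: larger per-shard candidate list (60 is an arbitrary example value).
body = terms_facet_body("field.name", size=6, shard_size=60)

# Workaround 2: index settings for a new index that has a single primary shard.
single_shard_settings = {"settings": {"index": {"number_of_shards": 1}}}

print(json.dumps(body, indent=2))
print(json.dumps(single_shard_settings, indent=2))
```

The shard count is fixed when an index is created, so the second option means creating a new index with these settings and reindexing into it.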
For details, see:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_document_counts_are_approximate