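I am running a fuzzy Elasticsearch query against the 'text' and 'keywords' fields. I have two documents in Elasticsearch, one whose "text" is "testPhone 5" and another whose "text" is "testPhone 4s". When I run a fuzzy query for "testPhone 5", both documents come back with exactly the same score. Why does this happen?
Extra info: I index the documents with the 'uax_url_email' tokenizer and the 'lowercase' filter.
Here is the query I am making: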
{
  query: {
    bool: {
      // match one or the other fuzzy query
      should: [
        {
          fuzzy: {
            text: {
              min_similarity: 0.4,
              value: 'testphone 5',
              prefix_length: 0,
              boost: 5,
            }
          }
        },
        {
          fuzzy: {
            keywords: {
              min_similarity: 0.4,
              value: 'testphone 5',
              prefix_length: 0,
              boost: 1,
            }
          }
        }
      ]
    }
  },
  sort: [
    '_score'
  ],
  explain: true
}
Here is the result:
{ max_score: 0.47213298,
total: 2,
hits:
[ { _index: 'test',
_shard: 0,
_id: '51fbf95f82e89ae8c300002c',
_node: '0Mtfzbe1RDinU71Ordx-Ag',
_source:
{ next: { id: '51fbf95f82e89ae8c3000027' },
cards: [ '51fbf95f82e89ae8c3000027', [length]: 1 ],
other: false,
_id: '51fbf95f82e89ae8c300002c',
category: '51fbf95f82e89ae8c300002b',
image: 'https://s3.amazonaws.com/sold_category_icons/Smartphones.png',
text: 'testPhone 5',
keywords: [ [length]: 0 ],
__v: 0 },
_type: 'productgroup',
_explanation:
{ details:
[ { details:
[ { details:
[ { details:
[ { details:
[ { value: 3.8888888, description: 'boost' },
{ value: 1.5108256,
description: 'idf(docFreq=2, maxDocs=5)' },
{ value: 0.17020021,
description: 'queryNorm' },
[length]: 3 ],
value: 0.99999994,
description: 'queryWeight, product of:' },
{ details:
[ { details:
[ { value: 1, description: 'termFreq=1.0' },
[length]: 1 ],
value: 1,
description: 'tf(freq=1.0), with freq of:' },
{ value: 1.5108256,
description: 'idf(docFreq=2, maxDocs=5)' },
{ value: 0.625,
description: 'fieldNorm(doc=0)' },
[length]: 3 ],
value: 0.944266,
description: 'fieldWeight in 0, product of:' },
[length]: 2 ],
value: 0.94426596,
description: 'score(doc=0,freq=1.0 = termFreq=1.0\n), product of:' },
[length]: 1 ],
value: 0.94426596,
description: 'weight(text:testphone^3.8888888 in 0) [PerFieldSimilarity], result of:' },
[length]: 1 ],
value: 0.94426596,
description: 'sum of:' },
{ value: 0.5, description: 'coord(1/2)' },
[length]: 2 ],
value: 0.47213298,
description: 'product of:' },
_score: 0.47213298 },
{ _index: 'test',
_shard: 4,
_id: '51fbf95f82e89ae8c300002d',
_node: '0Mtfzbe1RDinU71Ordx-Ag',
_source:
{ next: { id: '51fbf95f82e89ae8c3000027' },
cards: [ '51fbf95f82e89ae8c3000029', [length]: 1 ],
other: false,
_id: '51fbf95f82e89ae8c300002d',
category: '51fbf95f82e89ae8c300002b',
image: 'https://s3.amazonaws.com/sold_category_icons/Smartphones.png',
text: 'testPhone 4s',
keywords: [ 'apple', [length]: 1 ],
__v: 0 },
_type: 'productgroup',
_explanation:
{ details:
[ { details:
[ { details:
[ { details:
[ { details:
[ { value: 3.8888888, description: 'boost' },
{ value: 1.5108256,
description: 'idf(docFreq=2, maxDocs=5)' },
{ value: 0.17020021,
description: 'queryNorm' },
[length]: 3 ],
value: 0.99999994,
description: 'queryWeight, product of:' },
{ details:
[ { details:
[ { value: 1, description: 'termFreq=1.0' },
[length]: 1 ],
value: 1,
description: 'tf(freq=1.0), with freq of:' },
{ value: 1.5108256,
description: 'idf(docFreq=2, maxDocs=5)' },
{ value: 0.625,
description: 'fieldNorm(doc=0)' },
[length]: 3 ],
value: 0.944266,
description: 'fieldWeight in 0, product of:' },
[length]: 2 ],
value: 0.94426596,
description: 'score(doc=0,freq=1.0 = termFreq=1.0\n), product of:' },
[length]: 1 ],
value: 0.94426596,
description: 'weight(text:testphone^3.8888888 in 0) [PerFieldSimilarity], result of:' },
[length]: 1 ],
value: 0.94426596,
description: 'sum of:' },
{ value: 0.5, description: 'coord(1/2)' },
[length]: 2 ],
value: 0.47213298,
description: 'product of:' },
_score: 0.47213298 },
[length]: 2 ] }
2 Answers
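Fuzzy queries are not analyzed, but the field is. Your search for testphone 5 with a min_similarity of 0.4 is therefore compared as one string against the analyzed terms, and in both documents the term it matches is testphone. That term is what the score is computed from, as the explanation shows: 'weight(text:testphone^3.8888888 in 0) [PerFieldSimilarity], result of:'. Both documents match on the same term with the same tf, idf and fieldNorm, so they get the same score.
See also @imotov's excellent answer: ElasticSearch's Fuzzy Query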
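To make the "same score" point concrete, the numbers in the explain output can simply be multiplied back together. The sketch below copies the values reported for either hit (they are identical) and assumes the score is the product of queryWeight, fieldWeight and coord, which is how the explanation nests them. The variable names are only illustrative.

```javascript
// Values copied from the explain output; both hits report the same ones.
const boost = 3.8888888;      // 'boost'
const idf = 1.5108256;        // 'idf(docFreq=2, maxDocs=5)'
const queryNorm = 0.17020021; // 'queryNorm'
const tf = 1;                 // 'tf(freq=1.0)'
const fieldNorm = 0.625;      // 'fieldNorm(doc=0)'
const coord = 0.5;            // 'coord(1/2)': one of the two should clauses matched

// 'queryWeight, product of:' and 'fieldWeight in 0, product of:'
const queryWeight = boost * idf * queryNorm;
const fieldWeight = tf * idf * fieldNorm;

// Final score: every factor is the same for both documents.
const score = queryWeight * fieldWeight * coord;
console.log(score.toFixed(4)); // 0.4721
```

None of these factors depends on the second token of the document ("5" or "4s"), which is why the two hits cannot be told apart by score.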
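You can use the _analyze API to see how a string is tokenized: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html
For example, http://localhost:9200/prefix_test/_analyze?field=text&text=testphone+5 returns the tokens that the field's analyzer produces for that input.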
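So even if you index the value testphone sammsung, a fuzzy query for "testphone samsunk" will not return anything; only samsunk would. You may get better results by not analyzing the field (or by using the keyword analyzer).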
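The behaviour described above can be imitated outside Elasticsearch with a rough sketch. It rests on two assumptions: the analyzer is approximated by lowercasing and splitting on whitespace (the real uax_url_email tokenizer follows Unicode word-break rules, but should give the same tokens for these inputs), and fuzzy similarity is taken to be 1 - editDistance / min(length). The second assumption is at least consistent with the explain output, since 5 times that similarity for 'testphone 5' against 'testphone' gives the reported 3.8888888 boost. All function names here are hypothetical.

```javascript
// Simplified stand-in for the field's analyzer (assumption, see above).
function analyze(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Classic dynamic-programming Levenshtein edit distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed similarity measure: 1 - distance / length of the shorter string.
function similarity(query, term) {
  return 1 - levenshtein(query, term) / Math.min(query.length, term.length);
}

// The fuzzy query string is NOT analyzed: it is compared whole
// against each term that analysis produced for the indexed text.
function fuzzyMatches(query, indexedText, minSimilarity) {
  return analyze(indexedText)
    .map((term) => ({ term, similarity: similarity(query, term) }))
    .filter((match) => match.similarity > minSimilarity);
}

// Both documents match through the same term, 'testphone'.
console.log(fuzzyMatches('testphone 5', 'testPhone 5', 0.4));
console.log(fuzzyMatches('testphone 5', 'testPhone 4s', 0.4));

// The multi-word query matches nothing; the single word does.
console.log(fuzzyMatches('testphone samsunk', 'testphone sammsung', 0.4));
console.log(fuzzyMatches('samsunk', 'testphone sammsung', 0.4));
```

In this model the query 'testphone 5' is two edits away from the term 'testphone' and far from the terms '5' and '4s', so both documents are matched by the same single term.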
If you want a single field to be analyzed in more than one way, you can use the multi_field construct: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-multi-field-type.html
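As a rough illustration of the multi_field idea, a mapping along these lines keeps the analyzed version of the field and adds an unanalyzed copy next to it. This is only a sketch: the type name productgroup is taken from the question's results, the sub-field name raw is made up, and the analyzer setting for the main field is left out (you would keep whatever the field uses today).

```javascript
// Hypothetical mapping for the 'productgroup' type.
const mapping = {
  productgroup: {
    properties: {
      text: {
        type: 'multi_field',
        fields: {
          // Default sub-field: same name as the parent, still analyzed.
          text: { type: 'string', index: 'analyzed' },
          // Extra sub-field ('raw' is an assumed name): the whole value
          // is indexed as a single term, with its original casing.
          raw: { type: 'string', index: 'not_analyzed' },
        },
      },
    },
  },
};

// A fuzzy clause can then target the unanalyzed copy as 'text.raw'.
const rawFieldQuery = {
  fuzzy: { 'text.raw': { value: 'testPhone 5', min_similarity: 0.4 } },
};
```

Because a not_analyzed field is not lowercased, the query value has to use the same casing as the indexed text, or the case difference counts as an edit.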
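I recently ran into this problem myself. I can't tell you exactly why it happens, but I can tell you how I fixed it:
I ran 2 queries on the same field: one with an exact match, and then the exact same query on the same field with fuzzy matching enabled and a lower boost.
This ensured that my exact matches always ranked higher than the fuzzy matches.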
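The two-query approach described here might look roughly like the following. This is a sketch, not the exact query from this answer: the helper name, the use of a match query for the exact clause, and the boost values 10 and 1 are all assumptions.

```javascript
// Hypothetical helper: one exact-style clause with a high boost and one
// fuzzy clause with a low boost, both on the same field.
function buildExactPlusFuzzyQuery(field, value) {
  return {
    query: {
      bool: {
        should: [
          // Exact clause (assumed to be a match query; boost is arbitrary).
          { match: { [field]: { query: value, boost: 10 } } },
          // Fuzzy clause with a lower boost, so fuzzy-only hits rank below.
          { fuzzy: { [field]: { value: value, min_similarity: 0.4, boost: 1 } } },
        ],
      },
    },
  };
}

const body = buildExactPlusFuzzyQuery('text', 'testphone 5');
console.log(JSON.stringify(body, null, 2));
```

A document that satisfies both clauses collects score from each, while a document that only matches fuzzily collects the smaller fuzzy contribution alone.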
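P.S. I think they score equally because, with fuzziness enabled, both of them match, and ES doesn't care that one of them is an exact match as long as both match. That is pure theorycrafting, though, since I'm not very familiar with the scoring algorithm.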