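I'm trying to add synonyms for words that may also be written space-separated (for example "britishshirt" vs. "british shirt"). When I analyze the text, the second token of the multi-word synonym overlaps a different word. This matters a lot for a multi_match query of type cross_fields.
Here is an example: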
DELETE my_index

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "britishshirt => british shirt"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}

GET my_index/_analyze
{
  "analyzer": "my_synonyms",
  "text": "britishshirt cat"
}
The response is:
{
  "tokens": [
    {
      "token": "british",
      "start_offset": 0,
      "end_offset": 12,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "cat",
      "start_offset": 13,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "shirt",
      "start_offset": 13,
      "end_offset": 16,
      "type": "SYNONYM",
      "position": 1
    }
  ]
}
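So, grouped by position: (british) (shirt, cat)
A cross_fields search for "britishshirt cat" should require both terms to appear in the document. However, "cat" now shares position 1 with the synonym token "shirt", so the two are treated as alternatives for the same term, and the resulting query logic is wrong.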
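A simplified model of why this breaks. It assumes operator "and" and assumes that tokens sharing a position are ORed together as synonyms, which is roughly how Lucene builds a query from stacked tokens. The ACTUAL pairs are copied from the _analyze response above. EXPECTED is a hypothetical stream with one position per word, included for contrast. Plain set membership stands in for real scoring.

```python
from collections import defaultdict

# (token, position) pairs copied from the _analyze response above.
ACTUAL = [("british", 0), ("cat", 1), ("shirt", 1)]
# Hypothetical stream with one position per word, for contrast.
EXPECTED = [("british", 0), ("shirt", 1), ("cat", 2)]


def group_by_position(tokens):
    """Collect tokens that share a position; stacked tokens become alternatives."""
    groups = defaultdict(list)
    for term, pos in tokens:
        groups[pos].append(term)
    return [sorted(groups[pos]) for pos in sorted(groups)]


def matches(doc_terms, groups):
    """Simplified AND of ORs: each position needs at least one of its tokens."""
    return all(any(term in doc_terms for term in group) for group in groups)


doc = {"british", "cat"}  # a document that never mentions "shirt"
print(group_by_position(ACTUAL))                  # [['british'], ['cat', 'shirt']]
print(matches(doc, group_by_position(ACTUAL)))    # True
print(matches(doc, group_by_position(EXPECTED)))  # False
```

Under this model, the overlapping position lets a document containing only "british" and "cat" satisfy the query, even though "shirt" is missing.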
I thought of something like _1074731, but that would fail for documents that match "british" in one field and "shirt" in another.
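The failure mode described above is the difference between field-centric matching (one field must contain every term) and term-centric matching (each term may come from any field), which is what cross_fields is for. Below is a minimal sketch of that difference. The field names "title" and "description" and the document are hypothetical, and plain set membership stands in for real scoring. It does not model the specific approach referenced above.

```python
# Hypothetical document: "british" in one field, "shirt cat" in another.
DOC = {
    "title": {"british"},
    "description": {"shirt", "cat"},
}
QUERY_TERMS = ["british", "shirt", "cat"]


def field_centric(doc, terms):
    """Match only if some single field contains every query term."""
    return any(all(t in field_terms for t in terms) for field_terms in doc.values())


def term_centric(doc, terms):
    """Match if every query term appears in at least one field (cross_fields-style)."""
    return all(any(t in field_terms for field_terms in doc.values()) for t in terms)


print(field_centric(DOC, QUERY_TERMS))  # False
print(term_centric(DOC, QUERY_TERMS))   # True
```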