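I'm trying to add synonyms for words that may be written as separate, space-delimited words. When I analyze the text, the second token of the synonym overlaps a different word. This matters a lot for multi_match cross_fields queries.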

Here is an example:

DELETE my_index
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym", 
          "synonyms": [ 
            "britishshirt => british shirt"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter" 
          ]
        }
      }
    }
  }
}

GET my_index/_analyze
{
  "analyzer": "my_synonyms",
  "text":"britishshirt cat"
}

The response is:

{
  "tokens": [
    {
      "token": "british",
      "start_offset": 0,
      "end_offset": 12,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "cat",
      "start_offset": 13,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "shirt",
      "start_offset": 13,
      "end_offset": 16,
      "type": "SYNONYM",
      "position": 1
    }
  ]
}

So the token stream is: (british) (shirt, cat)
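
The overlap can be made explicit with a small Python sketch that groups the tokens by position. It is only an illustration: the token list is copied by hand from the _analyze response above, and `group_by_position` is a hypothetical helper name, not part of any Elasticsearch client.

```python
from collections import defaultdict

# Tokens copied from the _analyze response (only the fields needed here).
tokens = [
    {"token": "british", "position": 0, "type": "SYNONYM"},
    {"token": "cat", "position": 1, "type": "<ALPHANUM>"},
    {"token": "shirt", "position": 1, "type": "SYNONYM"},
]


def group_by_position(tokens):
    """Return one list of token strings per position, in position order."""
    by_position = defaultdict(list)
    for tok in tokens:
        by_position[tok["position"]].append(tok["token"])
    return [by_position[pos] for pos in sorted(by_position)]


groups = group_by_position(tokens)
print(groups)  # [['british'], ['cat', 'shirt']]
```

Tokens that share a position are treated as alternatives for the same slot, which is why "shirt" ends up looking like a synonym of "cat".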

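A cross_fields search for "britishshirt cat" should guarantee that both terms appear in the document. However, because "shirt" lands at the same position as "cat" and is therefore treated as its synonym, the resulting query logic is wrong.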
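
For reference, the kind of query this affects can be sketched as a request body built in Python. This is an assumption about the query shape, not the exact query in use: the field names `title` and `description` are hypothetical, `cross_fields_query` is a hypothetical helper, and `"operator": "and"` is assumed because every term is expected to match.

```python
import json


def cross_fields_query(text, fields, analyzer="my_synonyms"):
    """Build a multi_match cross_fields body that requires every term to match."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "operator": "and",      # assumed: all terms must be present
                "analyzer": analyzer,   # the analyzer defined in the index settings
                "fields": fields,       # hypothetical field names
            }
        }
    }


body = cross_fields_query("britishshirt cat", ["title", "description"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `GET my_index/_search`.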

I thought of something like _1074731, but that would fail for documents that match "british" in one field and "shirt" in another.