Synonym filter with "&" not working in Elasticsearch suggest with the standard tokenizer

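My goal is this: if I have indexed content such as "s & p indices", then a user searching for s and p, s & p, or s p should get it as a suggestion. However, there seems to be something special about &, because the synonym setting below does not work. My suggest index has the following settings.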

{
  "settings": {
    "analysis": {
      "analyzer": {
        "suggest_analyzer": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    [ "lowercase", "my_synonym_filter" ]
        }
      },
      "filter": {
        "my_synonym_filter": {
          "type": "synonym", 
          "synonyms": [ "&, and", "foo, bar" ]
        }
      }
    }
  }
}

My type has the following mapping:

{
  "properties" : {
    "name" : { "type" : "string" },
    "name_suggest" : {
      "type" : "completion",
      "index_analyzer" :  "suggest_analyzer",
      "search_analyzer" : "suggest_analyzer"
    }
  } 
}

If I index the following object:

{
  "name" : "s & p indices",
  "name_suggest" : { 
    "input" : [ "s & p indices"] 
  }
}

then searching for s and does not return the indexed suggestion. The foo and bar synonyms, however, work as expected.

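I assume it is related to how the standard tokenizer handles &, but I don't know how to work around it. Is there a way to make the tokenizer exclude & and/or handle it differently?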

2 Answers

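Your immediate problem clearly lies in the tokenizer chosen for suggest_analyzer. The standard tokenizer does not emit a token for &, so the token stream passed to the filters never contains an & token, and the synonym filter has nothing to replace. You can use the _analyze endpoint to see how this works.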

In this case, the tokens the standard tokenizer produces for the text s & p look like this:

"tokens": [
      {
         "token": "s",
         "start_offset": 5,
         "end_offset": 6,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "p",
         "start_offset": 9,
         "end_offset": 10,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
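
Output like the token list above comes from the _analyze endpoint. As a minimal sketch, such a request could be built as follows. The host `http://localhost:9200`, the index name `suggest`, and the helper name `build_analyze_request` are assumptions for illustration; the JSON request body is the form accepted by recent Elasticsearch releases, while older releases took `analyzer` and `text` as query-string parameters instead.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer,
                          base_url="http://localhost:9200",  # assumed host
                          index="suggest"):                  # assumed index name
    """Build (but do not send) an _analyze request for `text`."""
    body = {"analyzer": analyzer, "text": text}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


analyze_req = build_analyze_request("s & p", "suggest_analyzer")
print(analyze_req.full_url)
print(analyze_req.data.decode("utf-8"))
# Sending it needs a running cluster:
# urllib.request.urlopen(analyze_req).read()
```

Running the same request against an analyzer that uses a different tokenizer is a quick way to compare which tokens survive.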

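The standard tokenizer eats the &. The easiest way to make this work is to switch your analyzer to the whitespace tokenizer, which does not strip special characters or do much else; its only job is to split on whitespace.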

I modified your settings to:

"settings": {
    "analysis": {
      "analyzer": {
        "suggest_analyzer": {
          "type":      "custom",
          "tokenizer": "whitespace",
          "filter":    [ "lowercase", "my_synonym_filter" ]
        }
      },
      "filter": {
        "my_synonym_filter": {
          "type": "synonym", 
          "synonyms": [
              "&, and",
              "foo, bar" ]
        }
      }
    }
  }

This gives the following result:

{
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "name_suggest": [
      {
         "text": "s and",
         "offset": 0,
         "length": 5,
         "options": [
            {
               "text": "s & p",
               "score": 1
            }
         ]
      }
   ]
}
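
To illustrate why the tokenizer choice matters, here is a rough Python simulation of the two analysis chains. It is only a sketch, not how Elasticsearch is implemented: `standard_like` approximates the standard tokenizer with a regular expression, the synonym step maps "&" to "and" rather than emitting both tokens at the same position, and all function names are hypothetical.

```python
import re

# Simplification: normalise "&" to "and". A real synonym filter with the rule
# "&, and" would emit both tokens at the same position instead.
SYNONYMS = {"&": "and"}


def standard_like(text):
    """Crude stand-in for the standard tokenizer: keep runs of word characters."""
    return re.findall(r"\w+", text)


def whitespace(text):
    """Stand-in for the whitespace tokenizer: split on whitespace, keep symbols."""
    return text.split()


def analyze(text, tokenizer):
    """Tokenize, lowercase each token, then apply the synonym mapping."""
    return [SYNONYMS.get(tok.lower(), tok.lower()) for tok in tokenizer(text)]


def is_prefix(query_tokens, indexed_tokens):
    """Approximate completion matching as a prefix test on the analyzed text."""
    return " ".join(indexed_tokens).startswith(" ".join(query_tokens))


indexed = "s & p indices"
query = "s and"

for name, tokenizer in (("standard", standard_like), ("whitespace", whitespace)):
    indexed_tokens = analyze(indexed, tokenizer)
    matched = is_prefix(analyze(query, tokenizer), indexed_tokens)
    print(name, indexed_tokens, matched)
```

With the standard-like tokenizer the "&" is gone before the synonym step runs, so the query "s and" is not a prefix of the indexed tokens; with whitespace splitting the "&" survives and is mapped to "and".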

Answered 2024-05-15T17:12:37+08:00

Another option is to use a char filter to replace the ampersand before the text reaches the tokenizer. Like this:

...
            "char_filter" : {
                "replace_ampersands" : {
                    "type" : "mapping",
                    "mappings" : ["&=>and"]
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "char_filter" : ["replace_ampersands"],
                    "filter": [
                        "lowercase",
                        "addy_synonym_filter",
                        "autocomplete_filter"
                    ]
                }
            }
            ...
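
The effect of the char filter can be sketched in Python as well. This is an approximation under stated assumptions rather than Elasticsearch's actual behaviour: `char_filter` mimics the "&=>and" mapping with a plain string replacement, `standard_like` is a regex stand-in for the standard tokenizer that also folds in the lowercase filter, and the function names are hypothetical.

```python
import re

CHAR_MAPPINGS = {"&": "and"}  # mirrors the "&=>and" mapping rule


def char_filter(text):
    """Rewrite the raw text before tokenization, as a mapping char filter does."""
    for source, replacement in CHAR_MAPPINGS.items():
        text = text.replace(source, replacement)
    return text


def standard_like(text):
    """Crude stand-in for standard tokenizer plus lowercase filter."""
    return re.findall(r"\w+", text.lower())


def analyze(text):
    """Char filter first, tokenizer second: the order is what makes this work."""
    return standard_like(char_filter(text))


print(analyze("s & p indices"))  # ['s', 'and', 'p', 'indices']
print(analyze("s and"))          # ['s', 'and']
print(analyze("s&p"))            # ['sandp']
```

Because the replacement happens before tokenization, the tokenizer never sees the "&". Note the last case: in this sketch, input written without spaces, such as "s&p", collapses into the single token "sandp", so it is worth checking how such input is analyzed before relying on this approach.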

Answered 2024-05-15T17:12:37+08:00
