How to get word trigrams in Elasticsearch - Java Learning Path

I have been trying to build trigrams with Elasticsearch tokenizers. I followed the tutorials at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html and http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams.

Following those docs, I tested the analyzer with:

curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_ngram_analyzer' -d 'FC Schalke 04'

which produces character nGrams such as FC, Sc, Sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04.
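To see why this output is made of character fragments rather than words, here is a rough pure-Python emulation of an ngram tokenizer. The question does not show its analyzer settings, so min_gram=2, max_gram=3 and token_chars of letters and digits are assumptions inferred from the listed output; the function name char_ngrams is hypothetical.

```python
import re

def char_ngrams(text, min_gram=2, max_gram=3):
    """Rough emulation of an ngram tokenizer (assumed token_chars: letter, digit)."""
    grams = []
    # Split on anything that is not a letter or digit, like token_chars would.
    for token in re.findall(r"[^\W_]+", text):
        # For each start offset, emit grams of increasing length.
        for start in range(len(token)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(token):
                    grams.append(token[start:start + size])
    return grams

print(char_ngrams("FC Schalke 04"))
```

Every gram is cut from inside a single token, so this kind of tokenizer can never yield multi-word phrases.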

What I actually want, however, is whole-word trigrams.

For example, the trigrams of "the quick red fox jumps over the lazy brown dog" would be:

the quick red
quick red fox
red fox jumps
fox jumps over
jumps over the
over the lazy
the lazy brown
lazy brown dog
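The list above is simply a three-word window sliding over the sentence one word at a time. A minimal Python sketch of that idea, outside Elasticsearch (the function name word_trigrams is hypothetical):

```python
def word_trigrams(sentence):
    """Slide a three-word window over whitespace-separated words."""
    words = sentence.split()
    # A sentence with n words has n - 2 trigrams (none if n < 3).
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]

for trigram in word_trigrams("the quick red fox jumps over the lazy brown dog"):
    print(trigram)
```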

In short, how do I create trigrams like the ones above with Elasticsearch?

1 Answer

Found it. The answer lies in the shingle token filter. These index settings make it work:

{
   "settings": {
      "analysis": {
         "filter": {
            "nGram_filter": {
               "type": "shingle",
               "max_shingle_size": 3,
               "min_shingle_size": 3,
               "output_unigrams": false
            }
         },
         "analyzer": {
            "nGram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "nGram_filter"
               ]
            },
            "whitespace_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   }
}
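A sketch of applying these settings and testing the analyzer from Python's standard library, assuming the localhost:9200 cluster and the index name "test" from the question's curl command. It only builds the requests (nothing is sent) and omits whitespace_analyzer for brevity. Newer Elasticsearch versions expect a JSON body with a Content-Type header for both index creation and _analyze, which is why the analyzer name goes in the body here rather than in the query string. The helper json_request is hypothetical.

```python
import json
import urllib.request

# Assumption: cluster URL and index name taken from the question's curl command.
ES_URL = "http://localhost:9200"

# The answer's settings, with "output_unigrams" quoted so the body is valid JSON.
SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "nGram_filter": {
                    "type": "shingle",
                    "min_shingle_size": 3,
                    "max_shingle_size": 3,
                    "output_unigrams": False,
                }
            },
            "analyzer": {
                "nGram_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "asciifolding", "nGram_filter"],
                }
            },
        }
    }
}

def json_request(method, path, payload):
    """Build (but do not send) a JSON request against the cluster."""
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )

create_index = json_request("PUT", "/test", SETTINGS)
analyze = json_request(
    "POST",
    "/test/_analyze",
    {"analyzer": "nGram_analyzer", "text": "the quick red fox jumps over the lazy brown dog"},
)

# Against a live cluster, send them in order, e.g.:
# for req in (create_index, analyze):
#     with urllib.request.urlopen(req) as resp:
#         print(resp.read().decode("utf-8"))
```

If the settings are accepted, the _analyze response should list the same eight trigrams as the question, each as a separate token.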

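In these settings, the key properties are "type": "shingle" together with min_shingle_size and max_shingle_size: setting both to 3 produces only three-word shingles, and "output_unigrams": false stops the single words from being emitted alongside them.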
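To make the role of those properties concrete, here is a pure-Python approximation of the nGram_analyzer chain: whitespace split, lowercase, asciifolding, then shingles. It is not Elasticsearch code. The accent stripping via Unicode NFKD only approximates asciifolding (which maps more characters than this), and the function names are hypothetical. The defaults of shingles() mirror the shingle filter's documented defaults (min 2, max 2, unigrams on).

```python
import unicodedata

def fold_ascii(token):
    """Approximate asciifolding: decompose, then drop combining accents."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def shingles(tokens, min_size=2, max_size=2, output_unigrams=True, sep=" "):
    """Emit, per position, the unigram (optional) then shingles of growing size."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(sep.join(tokens[i:i + size]))
    return out

def ngram_analyzer(text, min_size=3, max_size=3, output_unigrams=False):
    """Whitespace split -> lowercase -> asciifolding -> shingle, as in the answer."""
    tokens = [fold_ascii(t.lower()) for t in text.split()]
    return shingles(tokens, min_size, max_size, output_unigrams)

print(ngram_analyzer("FC Schalke 04"))
print(ngram_analyzer("a b c", min_size=2, max_size=3, output_unigrams=True))
```

With the answer's values only whole trigrams come out; loosening min_shingle_size or re-enabling output_unigrams mixes in unigrams and bigrams, which is usually what an autocomplete field wants but not what the question asks for.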

Answered 2024-05-03T19:07:37+08:00
