
How to get word trigrams in Elasticsearch


I have been trying to produce trigrams with the Elasticsearch tokenizers. I followed the tutorials at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html and http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams

After following those documents, testing the analyzer with

curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_ngram_analyzer' -d 'FC Schalke 04'

produces character nGrams such as: FC, Sc, Sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04

What I want instead are whole-word trigrams.

For example, the trigrams for "the quick red fox jumps over the lazy brown dog" would be:

the quick red
quick red fox
red fox jumps
fox jumps over
jumps over the
over the lazy
the lazy brown
lazy brown dog

In short, how can I create the trigrams above with Elasticsearch?

1 Answer

  • 2

    Found it. The answer is the shingle token filter. The following index settings make it work:

    {
       "settings": {
          "analysis": {
             "filter": {
                "nGram_filter": {
                   "type": "shingle",
                   "max_shingle_size": 3,
                   "min_shingle_size": 3,
                   "output_unigrams": false
                }
             },
             "analyzer": {
                "nGram_analyzer": {
                   "type": "custom",
                   "tokenizer": "whitespace",
                   "filter": [
                      "lowercase",
                      "asciifolding",
                      "nGram_filter"
                   ]
                },
                "whitespace_analyzer": {
                   "type": "custom",
                   "tokenizer": "whitespace",
                   "filter": [
                      "lowercase",
                      "asciifolding"
                   ]
                }
             }
          }
       }
    }
    

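    The key properties here are "type": "shingle" and the min/max shingle sizes. Setting both sizes to 3 and output_unigrams to false means only three-word shingles are emitted.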
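
To see what the analyzer should emit without a running cluster, here is a minimal Python sketch that approximates the same pipeline: split on whitespace, lowercase, then slide a three-word window over the tokens. The function name `word_shingles` is hypothetical, and the `asciifolding` step is deliberately left out, so treat it as an illustration of the idea rather than a reimplementation of the Elasticsearch filter.

```python
def word_shingles(text, size=3):
    """Approximate whitespace tokenizer + lowercase + fixed-size shingles.

    Assumption: this only mimics the settings above (min = max = size,
    no unigrams); it does not perform ASCII folding.
    """
    # Whitespace tokenization plus lowercasing.
    tokens = text.lower().split()
    # Slide a window of `size` tokens; fewer than `size` tokens yields nothing.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


trigrams = word_shingles("the quick red fox jumps over the lazy brown dog")
for trigram in trigrams:
    print(trigram)
```

For the ten-word sentence in the question this yields the eight trigrams listed there, starting with "the quick red" and ending with "lazy brown dog".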
