
Indexing and searching currency symbols ($ and £) in Elasticsearch


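Some of my documents contain $ or £ symbols. I want to search for £ and retrieve the documents that contain that symbol. I have gone through the documentation, but I am experiencing some cognitive dissonance.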

# Delete the `my_index` index
DELETE /my_index    

# Create a custom analyzer
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": [
            "&=> and ",
            "$=> dollar "
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [
            "html_strip",
            "&_to_and"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}

This returns "the", "quick", "and", "brown", "fox", just as the documentation says:

# Test out the new analyzer
GET /my_index/_analyze?analyzer=my_analyzer&text=The%20quick%20%26%20brown%20fox

This returns "the", "quick", "dollar", "brown", "fox":

GET /my_index/_analyze?analyzer=my_analyzer&text=The%20quick%20%24%20brown%20fox

Add some records:

PUT /my_index/test/1
{
  "title": "The quick & fast fox"
}    

PUT /my_index/test/2
{
  "title": "The daft fox owes me $100"
}

I expected that searching for "dollar" would return a result. Instead I get nothing:

GET /my_index/test/_search
{ "query": {
    "simple_query_string": {
      "query": "dollar"
    }
  }
}

Or even using '$' with the analyzer:

GET /my_index/test/_search
{ "query": {
  "query_string": {
    "query": "dollar10",
    "analyzer": "my_analyzer"
  }
 }
}

1 Answer


    Your problem is that you define a custom analyzer but never use it. You can verify this with term vectors. Follow these steps:

    When creating the index, set the custom analyzer for the `title` field:

    PUT /my_index
    {
      "settings": {
        "analysis": {
          "char_filter": {
            "&_to_and": {
              "type": "mapping",
              "mappings": [
                "&=> and ",
                "$=> dollar "
              ]
            }
          },
          "analyzer": {
            "my_analyzer": {
              "type": "custom",
              "char_filter": [
                "html_strip",
                "&_to_and"
              ],
              "tokenizer": "standard",
              "filter": [
                "lowercase"
              ]
            }
          }
        }
      }, "mappings" :{
        "test" : {
          "properties" : {
            "title" : {
              "type":"string",
              "analyzer":"my_analyzer"
            }
          }
        }
      }
    }
    

    Insert the data:

    PUT /my_index/test/1
    {
      "title": "The daft fox owes me $100"
    }
    

    Check the term vectors:

    GET /my_index/test/1/_termvectors?fields=title
    

    Response:

    {
       "_index":"my_index",
       "_type":"test",
       "_id":"1",
       "_version":1,
       "found":true,
       "took":3,
       "term_vectors":{
          "title":{
             "field_statistics":{
                "sum_doc_freq":6,
                "doc_count":1,
                "sum_ttf":6
             },
             "terms":{
                "daft":{
                   "term_freq":1,
                   "tokens":[
                      {
                         "position":1,
                         "start_offset":4,
                         "end_offset":8
                      }
                   ]
                },
                "dollar100":{       <-- You can see it here
                   "term_freq":1,
                   "tokens":[
                      {
                         "position":5,
                         "start_offset":21,
                         "end_offset":25
                      }
                   ]
                },
                "fox":{
                   "term_freq":1,
                   "tokens":[
                      {
                         "position":2,
                         "start_offset":9,
                         "end_offset":12
                      }
                   ]
                },
                "me":{
                   "term_freq":1,
                   "tokens":[
                      {
                         "position":4,
                         "start_offset":18,
                         "end_offset":20
                      }
                   ]
                },
                "owes":{
                   "term_freq":1,
                   "tokens":[
                      {
                         "position":3,
                         "start_offset":13,
                         "end_offset":17
                      }
                   ]
                },
                "the":{
                   "term_freq":1,
                   "tokens":[
                      {
                         "position":0,
                         "start_offset":0,
                         "end_offset":3
                      }
                   ]
                }
             }
          }
       }
    }
    
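The indexed term is `dollar100`, not `dollar`, which is why a search for "dollar" alone cannot match. The following is a minimal Python sketch of the analysis chain, not the real Lucene implementation. It assumes that whitespace around the replacement in each mapping rule is trimmed (which the `dollar100` term above suggests), it approximates the standard tokenizer with a simple regular expression, and it leaves out `html_strip`. The helper names `parse_mappings` and `analyze` are made up for illustration.

```python
import re

# Rules copied from the index settings. Trimming the whitespace around the
# replacement is an assumption, based on the "dollar100" term in the response.
RAW_MAPPINGS = ["&=> and ", "$=> dollar "]


def parse_mappings(rules):
    """Turn 'key=> value ' rule strings into a {key: value} dict."""
    parsed = {}
    for rule in rules:
        key, value = rule.split("=>", 1)
        parsed[key.strip()] = value.strip()
    return parsed


def analyze(text, mappings):
    """Rough stand-in for my_analyzer: char filter, tokenizer, lowercase."""
    for key, value in mappings.items():
        text = text.replace(key, value)
    # Crude approximation of the standard tokenizer: runs of letters/digits.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [token.lower() for token in tokens]


MAPPINGS = parse_mappings(RAW_MAPPINGS)
print(analyze("The quick $ brown fox", MAPPINGS))
print(analyze("The daft fox owes me $100", MAPPINGS))
```

Under these assumptions a standalone `$` becomes its own `dollar` token, while `$100` collapses into the single token `dollar100`.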

    Now search:

    GET /my_index/test/_search
    {
      "query": {
        "match": {
          "title": "dollar100"
        }
      }
    }
    

    That finds a match. But searching with a query string:

    GET /my_index/test/_search
    { "query": {
        "simple_query_string": {
          "query": "dollar100"
        }
      }
    }
    

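    finds nothing, because it searches the special `_all` field. As far as I can see, `_all` aggregates the raw field values, before the custom analyzer has been applied to them: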

    GET /my_index/test/_search
    {
      "query": {
        "match": {
          "_all": "dollar100"
        }
      }
    }
    

    That finds no results either. But:

    GET /my_index/test/_search
    {
      "query": {
        "match": {
          "_all": "$100"
        }
      }
    }
    

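    That one does find the document. I am not sure, but the reason may be that the default analyzer is not the custom analyzer. To set the custom analyzer as the default, see: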

    Changing the default analyzer in ElasticSearch or LogStash

    http://elasticsearch-users.115913.n3.nabble.com/How-we-can-change-Elasticsearch-default-analyzer-td4040411.html

    http://grokbase.com/t/gg/elasticsearch/148kwsxzee/overriding-built-in-analyzer-and-set-it-as-default

    http://elasticsearch-users.115913.n3.nabble.com/How-to-set-the-default-analyzer-td3935275.html
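The behaviour of the three `match` queries can be reproduced with a small Python simulation. This is only a sketch under stated assumptions: `_all` is taken to be analyzed with the default (standard) analyzer at both index and query time, the standard analyzer is approximated by a regular expression, and the function names are hypothetical.

```python
import re


def standard_like(text):
    """Crude stand-in for the default (standard) analyzer."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def custom_like(text):
    """Crude stand-in for my_analyzer: map '$' to 'dollar' first."""
    return standard_like(text.replace("$", "dollar"))


def matches(indexed_terms, query_terms):
    """A match query hits if any query term is among the indexed terms."""
    return any(term in indexed_terms for term in query_terms)


doc = "The daft fox owes me $100"
all_terms = standard_like(doc)   # what _all is assumed to contain
title_terms = custom_like(doc)   # what the title field contains

# title is searched with my_analyzer, so "dollar100" matches.
print(matches(title_terms, custom_like("dollar100")))
# _all never saw the char filter: it holds "100", not "dollar100".
print(matches(all_terms, standard_like("dollar100")))
# "$100" is reduced to "100" by the default analyzer, which _all does hold.
print(matches(all_terms, standard_like("$100")))
```

In this model the `$` is simply dropped on the `_all` side, so only the bare number is searchable there.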

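Two possible ways out are sketched below as request bodies built in Python. The first registers the custom analyzer under the name `default`, which Elasticsearch treats as the index-wide default analyzer. The second keeps the mapping from this answer and restricts `simple_query_string` to the `title` field through its `fields` parameter. Both are untested against the Elasticsearch version used in the question, and the function names are invented for this example.

```python
import json

# Same char filter as in the question.
CHAR_FILTER = {
    "&_to_and": {
        "type": "mapping",
        "mappings": ["&=> and ", "$=> dollar "],
    }
}


def index_body_with_default_analyzer():
    """Index settings that register the custom chain as the default analyzer."""
    return {
        "settings": {
            "analysis": {
                "char_filter": CHAR_FILTER,
                "analyzer": {
                    # The name "default" makes this the index default.
                    "default": {
                        "type": "custom",
                        "char_filter": ["html_strip", "&_to_and"],
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    }
                },
            }
        }
    }


def title_only_query(text):
    """simple_query_string limited to title, so title's analyzer applies."""
    return {
        "query": {
            "simple_query_string": {"query": text, "fields": ["title"]}
        }
    }


print("PUT /my_index")
print(json.dumps(index_body_with_default_analyzer(), indent=2))
print("GET /my_index/test/_search")
print(json.dumps(title_only_query("dollar100"), indent=2))
```

The second option is the smaller change, since it leaves the index as created in this answer and only changes the query.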