
Applying html_strip and lowercase filters on a keyword-analyzed field


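I am trying to apply the html_strip char filter and the lowercase filter on a field analyzed with the keyword tokenizer. When searching, I noticed that the results are not what I expected.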

Here is the index we are trying to create:

PUT /test_index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "ExportPrimaryAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": "lowercase",
          "char_filter": "html_strip"
        },
        "ExportRawAnalyzer": {
          "type": "custom",
          "buffer_size": "1000",
          "tokenizer": "keyword",
          "filter": "lowercase",
          "char_filter": "html_strip"
        }
      }
    }
  },
  "mappings": {
    "test_type": {
      "properties": {
        "city": {
          "type": "string",
          "analyzer" : "ExportPrimaryAnalyzer"
        },
        "city_raw":{
          "type": "string",
          "analyzer" : "ExportRawAnalyzer"
        }
      }
    }
  }
}

Here is a sample document:

PUT test_index/test_type/4
{
  "city": "<p>I am from Pune</p>",
  "city_raw": "<p>I am from Pune</p>"
}

When we run a wildcard query against city_raw, we get no results. Here is the query we are running:

{
  "query": {
    "wildcard": {
      "city_raw": "i am*"
    }
  }
}

Any help is appreciated.

1 Answer


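    The html_strip char filter replaces HTML block-level elements such as <p> with newlines, so with the keyword tokenizer the single indexed token for "<p>I am from Pune</p>" is "\ni am from pune\n". A wildcard pattern has to match the whole token, and "i am*" fails because the token starts with a newline. So if you use the keyword tokenizer, you need an extra char filter that replaces the newlines with an empty string.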
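A simplified Python model of the original ExportRawAnalyzer makes this concrete. It is a sketch, not Elasticsearch code: html_strip (a function named after the filter), export_raw_analyzer and wildcard_match are hypothetical helpers, and the html_strip stand-in assumes a small set of block-level tags that become newlines while every other tag is dropped.

```python
import re

# Hypothetical, simplified stand-in for the html_strip char filter (not Lucene's
# implementation): tags in this assumed block-level set become "\n", every
# other tag is dropped.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


def html_strip(text: str) -> str:
    def replace_tag(match):
        return "\n" if match.group(1).lower() in BLOCK_TAGS else ""

    return re.sub(r"</?([A-Za-z0-9]+)[^>]*>", replace_tag, text)


def export_raw_analyzer(text: str) -> list[str]:
    """Original ExportRawAnalyzer: html_strip -> keyword tokenizer -> lowercase."""
    tokens = [html_strip(text)]  # keyword tokenizer: the whole input is one token
    return [token.lower() for token in tokens]


def wildcard_match(pattern: str, token: str) -> bool:
    """Whole-token wildcard match: '*' = any run of chars, '?' = exactly one char."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, token, flags=re.DOTALL) is not None


tokens = export_raw_analyzer("<p>I am from Pune</p>")
print(repr(tokens))                        # ['\ni am from pune\n']
print(wildcard_match("i am*", tokens[0]))  # False: the token starts with "\n"
```

Because the keyword tokenizer keeps the whole string as one token, the newlines produced by html_strip stay at both ends of the indexed term, and the pattern fails on the very first character.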

    Example:

    PUT test
    {
       "settings": {
          "number_of_shards": 5,
          "number_of_replicas": 0,
          "analysis": {
             "char_filter": {
                "remove_new_line": {
                   "type": "mapping",
                   "mappings": [
                      "\\n =>"
                   ]
                }
             },
             "analyzer": {
                "ExportPrimaryAnalyzer": {
                   "type": "custom",
                   "tokenizer": "whitespace",
                   "filter": [
                      "lowercase"
                   ],
                   "char_filter": [
                      "html_strip"
                   ]
                },
                "ExportRawAnalyzer": {
                   "type": "custom",
                   "buffer_size": "1000",
                   "tokenizer": "keyword",
                   "filter": [
                      "lowercase"
                   ],
                   "char_filter": [
                      "html_strip",
                      "remove_new_line"
                   ]
                }
             }
          }
       },
       "mappings": {
          "test_type": {
             "properties": {
                "city": {
                   "type": "string",
                   "analyzer": "ExportPrimaryAnalyzer"
                },
                "city_raw": {
                   "type": "string",
                   "analyzer": "ExportRawAnalyzer"
                }
             }
          }
       }
    }
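The remove_new_line rule is written as "\\n =>" because the request body is JSON: JSON decodes \\ to a single backslash, so the filter receives a backslash followed by n, which it then reads as an escaped newline mapped to nothing. The Python sketch below walks through both steps. parse_rule, unescape and the ESCAPES table are hypothetical helpers that assume the mapping char filter trims whitespace around "=>" and unescapes \n (which the answer's working result suggests); they are not a copy of Elasticsearch's parser.

```python
import json

# The rule exactly as it appears in the answer's JSON request body.
body = r'{"char_filter": {"remove_new_line": {"type": "mapping", "mappings": ["\\n =>"]}}}'
rule = json.loads(body)["char_filter"]["remove_new_line"]["mappings"][0]
print(rule)  # \n =>   (a backslash and an "n", not yet a newline)

# Hypothetical, assumed subset of the escapes the mapping filter understands.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}


def unescape(text: str) -> str:
    """Turn backslash escapes such as \\n into the characters they stand for."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            out.append(ESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_rule(rule: str) -> tuple[str, str]:
    """Split "key => value", trim both sides, then unescape them (assumed behavior)."""
    key, _, value = rule.partition("=>")
    return unescape(key.strip()), unescape(value.strip())


key, value = parse_rule(rule)
print(repr(key), repr(value))  # '\n' ''
```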
    
    PUT test/test_type/4
    {
      "city": "<p>I am from Bangalore I like Pune too</p>",
      "city_raw": "<p>I am from Bangalore I like Pune too</p>"
    }
    
    POST test/_search
    {
      "query": {
        "wildcard": {
          "city_raw": "i am *"
        }
      }
    }
    

    Result:

    "hits": [
         {
            "_index": "test",
            "_type": "test_type",
            "_id": "4",
            "_score": 1,
            "_source": {
               "city": "<p>I am from Bangalore I like Pune too</p>",
               "city_raw": "<p>I am from Bangalore I like Pune too</p>"
            }
         }
      ]
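The hit above can be reproduced with a simplified Python model of the fixed ExportRawAnalyzer, which adds the remove_new_line step after html_strip. As a sketch under stated assumptions: html_strip, remove_new_line, fixed_raw_analyzer and wildcard_match are hypothetical helpers approximating the analysis chain, not Elasticsearch APIs, and the html_strip stand-in only knows an assumed set of block-level tags.

```python
import re

# Hypothetical, simplified stand-in for the html_strip char filter:
# assumed block-level tags become "\n", every other tag is dropped.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


def html_strip(text: str) -> str:
    def replace_tag(match):
        return "\n" if match.group(1).lower() in BLOCK_TAGS else ""

    return re.sub(r"</?([A-Za-z0-9]+)[^>]*>", replace_tag, text)


def remove_new_line(text: str) -> str:
    """Models the answer's mapping char filter rule "\\n =>": newline -> nothing."""
    return text.replace("\n", "")


def fixed_raw_analyzer(text: str) -> list[str]:
    """Fixed ExportRawAnalyzer: html_strip, remove_new_line, keyword, lowercase."""
    # Char filters run in the listed order, before the tokenizer sees the text.
    filtered = remove_new_line(html_strip(text))
    return [filtered.lower()]  # keyword tokenizer emits a single token


def wildcard_match(pattern: str, token: str) -> bool:
    """Whole-token wildcard match: '*' = any run of chars, '?' = exactly one char."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, token, flags=re.DOTALL) is not None


token = fixed_raw_analyzer("<p>I am from Bangalore I like Pune too</p>")[0]
print(repr(token))  # 'i am from bangalore i like pune too'
print(wildcard_match("i am *", token), wildcard_match("i am*", token))  # True True
```

One trade-off of mapping newlines to an empty string: text from adjacent block elements is glued together, so <p>a</p><p>b</p> becomes the single token "ab". Mapping the newline to a space instead would keep the words apart, at the cost of leading and trailing spaces in the token.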
    
