Applying the html_strip and lowercase filters to a keyword-analyzed field

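I am trying to apply the html_strip and lowercase filters to a field analyzed with the keyword tokenizer. When searching, I noticed that the results are not what I expected.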

Here is the index we are trying to create:

PUT /test_index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "ExportPrimaryAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": "lowercase",
          "char_filter": "html_strip"
        },
        "ExportRawAnalyzer": {
          "type": "custom",
          "buffer_size": "1000",
          "tokenizer": "keyword",
          "filter": "lowercase",
          "char_filter": "html_strip"
        }
      }
    }
  },
  "mappings": {
    "test_type": {
      "properties": {
        "city": {
          "type": "string",
          "analyzer": "ExportPrimaryAnalyzer"
        },
        "city_raw": {
          "type": "string",
          "analyzer": "ExportRawAnalyzer"
        }
      }
    }
  }
}

Here is a sample document:

PUT test_index/test_type/4
{
  "city": "<p>I am from Pune</p>",
  "city_raw": "<p>I am from Pune</p>"
}
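To see why the two fields behave so differently, here is a rough Python model of the two analyzer chains applied to the sample value. It is only a sketch under stated assumptions, not Elasticsearch code: the hypothetical strip_html helper approximates html_strip by turning a couple of assumed block-level tags (<p>, <div>) into "\n" and dropping every other tag, mirroring the example in the Elasticsearch html_strip documentation where a <p>...</p> value comes out wrapped in newlines. Entity decoding and the full list of block-level tags are not modelled.

```python
import re
from typing import List

# Hypothetical, simplified stand-in for the html_strip char filter:
# a few assumed block-level tags become "\n", every other tag is dropped.
BLOCK_TAGS = {"p", "div"}
TAG_RE = re.compile(r"</?\s*([A-Za-z0-9]+)[^>]*>")

def strip_html(text: str) -> str:
    def replace(match):
        return "\n" if match.group(1).lower() in BLOCK_TAGS else ""
    return TAG_RE.sub(replace, text)

def primary_analyzer(text: str) -> List[str]:
    # ExportPrimaryAnalyzer: html_strip -> whitespace tokenizer -> lowercase
    return [token.lower() for token in strip_html(text).split()]

def raw_analyzer(text: str) -> List[str]:
    # ExportRawAnalyzer: html_strip -> keyword tokenizer (whole input = one token) -> lowercase
    return [strip_html(text).lower()]

sample = "<p>I am from Pune</p>"
print(primary_analyzer(sample))  # ['i', 'am', 'from', 'pune']
print(raw_analyzer(sample))      # ['\ni am from pune\n']
```

The whitespace tokenizer discards the newlines together with the spaces, which is why the city field looks fine; the keyword tokenizer keeps them inside its single token.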

When we run a wildcard query, we get no results. Here is the query we are running:

{
  "query": {
    "wildcard": {
      "city_raw": "i am*"
    }
  }
}

Any help is appreciated.

1 Answer

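The html_strip char filter replaces HTML block elements such as <p> with newlines. With the keyword tokenizer the whole value stays one token, so "<p>I am from Pune</p>" is indexed as "\ni am from pune\n", and the pattern "i am*" fails because the term starts with a newline rather than "i". If you use the keyword tokenizer, you therefore need an additional char filter that replaces the newlines with an empty string.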
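The effect on the wildcard query can be reproduced with a small Python sketch. It assumes Lucene-style wildcard semantics (* matches any run of characters, ? matches exactly one, and the pattern must cover the whole indexed term), and it takes the indexed term for the question's document as the newline-wrapped string described above instead of running a real analyzer. wildcard_matches is a hypothetical helper, not an Elasticsearch API.

```python
import re

def wildcard_matches(pattern: str, term: str) -> bool:
    """Hypothetical helper: Lucene-style wildcard match over the whole term.
    '*' matches any run of characters (including newlines), '?' exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, term, flags=re.DOTALL) is not None

# Assumed single term produced by html_strip + keyword + lowercase
# for the question's value "<p>I am from Pune</p>"
term = "\ni am from pune\n"

print(wildcard_matches("i am*", term))                    # False: term starts with "\n"
print(wildcard_matches("i am*", term.replace("\n", "")))  # True once newlines are removed
print(wildcard_matches("i am *", "i am from bangalore i like pune too"))  # True
```

Removing the newlines before matching is what the remove_new_line mapping char filter in the following configuration does at index time.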

Example:

PUT test
{
   "settings": {
      "number_of_shards": 5,
      "number_of_replicas": 0,
      "analysis": {
         "char_filter": {
            "remove_new_line": {
               "type": "mapping",
               "mappings": [
                  "\\n =>"
               ]
            }
         },
         "analyzer": {
            "ExportPrimaryAnalyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase"
               ],
               "char_filter": [
                  "html_strip"
               ]
            },
            "ExportRawAnalyzer": {
               "type": "custom",
               "buffer_size": "1000",
               "tokenizer": "keyword",
               "filter": [
                  "lowercase"
               ],
               "char_filter": [
                  "html_strip",
                  "remove_new_line"
               ]
            }
         }
      }
   },
   "mappings": {
      "test_type": {
         "properties": {
            "city": {
               "type": "string",
               "analyzer": "ExportPrimaryAnalyzer"
            },
            "city_raw": {
               "type": "string",
               "analyzer": "ExportRawAnalyzer"
            }
         }
      }
   }
}
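One detail in remove_new_line that is easy to get wrong is the double backslash. In JSON, "\\n =>" decodes to the five characters \n => (a backslash followed by n, not a newline), and the mapping char filter then reads \n as an escape for a newline and maps it to nothing. The Python sketch below models that two-step reading. parse_mapping_rule is a hypothetical helper that assumes both sides of => are trimmed and that only the \n and \\ escapes need handling; it is not Elasticsearch's actual rule parser.

```python
import json
from typing import Tuple

def parse_mapping_rule(rule: str) -> Tuple[str, str]:
    """Hypothetical helper: split a 'lhs => rhs' rule, trim both sides,
    then unescape backslash-n and double backslash (other escapes are not modelled)."""
    lhs, sep, rhs = rule.partition("=>")
    if not sep:
        raise ValueError(f"invalid mapping rule: {rule!r}")

    def unescape(s: str) -> str:
        out, i = [], 0
        while i < len(s):
            if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in "n\\":
                out.append("\n" if s[i + 1] == "n" else "\\")
                i += 2
            else:
                out.append(s[i])
                i += 1
        return "".join(out)

    return unescape(lhs.strip()), unescape(rhs.strip())

# The rule exactly as written in the JSON request body above
rule = json.loads('"\\\\n =>"')
print(repr(rule))                # '\\n =>'  (backslash + n, not yet a newline)
print(parse_mapping_rule(rule))  # ('\n', '')  -> newline mapped to the empty string
```

With a single backslash in the request body, JSON itself would already turn \n into a real newline character before the char filter ever sees the rule.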

PUT test/test_type/4
{
  "city": "<p>I am from Bangalore I like Pune too</p>",
  "city_raw": "<p>I am from Bangalore I like Pune too</p>"
}

POST test/_search
{
  "query": {
    "wildcard": {
      "city_raw": "i am *"
    }
  }
}

Result:

"hits": [
   {
      "_index": "test",
      "_type": "test_type",
      "_id": "4",
      "_score": 1,
      "_source": {
         "city": "<p>I am from Bangalore I like Pune too</p>",
         "city_raw": "<p>I am from Bangalore I like Pune too</p>"
      }
   }
]

Answered on 2024-05-05T00:52:18+08:00
