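I am trying to apply the html_strip char filter and the lowercase filter to a field analyzed with the keyword tokenizer. When searching, I noticed that the results are not what I expected.
This is the index we are trying to create: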
PUT /test_index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "ExportPrimaryAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": "lowercase",
          "char_filter": "html_strip"
        },
        "ExportRawAnalyzer": {
          "type": "custom",
          "buffer_size": "1000",
          "tokenizer": "keyword",
          "filter": "lowercase",
          "char_filter": "html_strip"
        }
      }
    }
  },
  "mappings": {
    "test_type": {
      "properties": {
        "city": {
          "type": "string",
          "analyzer": "ExportPrimaryAnalyzer"
        },
        "city_raw": {
          "type": "string",
          "analyzer": "ExportRawAnalyzer"
        }
      }
    }
  }
}
Here is a sample document:
PUT test_index/test_type/4
{
  "city": "<p>I am from Pune</p>",
  "city_raw": "<p>I am from Pune</p>"
}
When we search with a wildcard, we get no results. This is the query we are running:
{
  "query": {
    "wildcard": {
      "city_raw": "i am*"
    }
  }
}
Any help is appreciated.
1 Answer
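The html_strip char filter replaces HTML block elements with newlines. The keyword tokenizer emits the whole value as a single token, so those newlines stay in the indexed term and a wildcard such as "i am*" no longer matches from the start of the term. If you use the keyword tokenizer, you therefore need an additional filter that replaces the newlines with an empty string. Example: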
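A minimal sketch of such a setup, assuming a pattern_replace char filter chained after html_strip. The name strip_newlines is hypothetical and chosen only for illustration, so this may differ from the answerer's exact configuration. Only the analysis settings are shown; the mappings from the question would stay as they are, and the index would need to be recreated for the new analyzer to apply.

```
# Sketch only: "strip_newlines" is a hypothetical name, not taken from the question
PUT /test_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "strip_newlines": {
          "type": "pattern_replace",
          "pattern": "\\n",
          "replacement": ""
        }
      },
      "analyzer": {
        "ExportRawAnalyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"],
          "char_filter": ["html_strip", "strip_newlines"]
        }
      }
    }
  }
}
```

Char filters run in the order listed, so html_strip removes the tags first and strip_newlines then drops the newlines it leaves behind. After reindexing, the city_raw term should no longer begin with a newline, which is what the "i am*" wildcard needs in order to match.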
Result: