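I have the following data to index in Elasticsearch.
I want to implement autocomplete and highlight why a particular document matched the query.
These are the settings of my index: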
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 15
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"autocomplete_filter"
]
}
}
}
}
}
Index analysis:
- Splits the text on word boundaries.
- Removes punctuation.
- Lowercases.
- Edge n-grams each token.
So the inverted index looks like this:
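To make the analysis steps above concrete, here is a minimal Python sketch that imitates them outside Elasticsearch. It is only an approximation: the regular expression `\w+` stands in for the standard tokenizer, the lowercasing follows the step list above, and the function names `edge_ngrams` and `analyze_for_index` are hypothetical helpers, not part of any Elasticsearch API.

```python
import re

def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

def analyze_for_index(text):
    """Rough stand-in for the `autocomplete` analyzer described above."""
    # \w+ approximates the standard tokenizer: split on word
    # boundaries and drop punctuation (a simplification).
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    grams = []
    for token in tokens:
        grams.extend(edge_ngrams(token))
    return grams

print(analyze_for_index("Software AG"))
# ['s', 'so', 'sof', 'soft', 'softw', 'softwa', 'softwar', 'software', 'a', 'ag']
```

Every prefix of every word becomes a term in the index, which is why a query for "soft" can find "Software AG".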
This is how I defined the mapping for the name field:
{
"index_type": {
"properties": {
"name": {
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
When I query:
GET http://localhost:9200/index/type/_search
{
"query": {
"match": {
"name": "soft"
}
},
"highlight": {
"fields" : {
"name" : {}
}
}
}
Search for: soft
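Applying the standard analyzer, "soft" is the term looked up in the inverted index. This search matches documents 1, 3, 4, 5, 6 and 7, which is correct, but I expected the highlighted part to be "soft" rather than the whole word: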
{
"hits": [
{
"_source": {
"name": "SoftwareRocks everytime"
},
"highlight": {
"name": [
"<em>SoftwareRocks</em> everytime"
]
}
},
{
"_source": {
"name": "Software AG"
},
"highlight": {
"name": [
"<em>Software</em> AG"
]
}
},
{
"_source": {
"name": "Software AG2"
},
"highlight": {
"name": [
"<em>Software</em> AG2"
]
}
},
{
"_source": {
"name": "Op Software AG good software better"
},
"highlight": {
"name": [
"Op <em>Software</em> AG good <em>software</em> better"
]
}
},
{
"_source": {
"name": "Op Software AG"
},
"highlight": {
"name": [
"Op <em>Software</em> AG"
]
}
},
{
"_source": {
"name": "is soft ware ok"
},
"highlight": {
"name": [
"is <em>soft</em> ware ok"
]
}
}
]
}
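One plausible explanation for the response above, stated here as an assumption rather than something confirmed in this thread: the edge n-gram token filter emits each gram with the start and end offsets of the whole source word, and the highlighter wraps whatever span those offsets describe. The Python sketch below imitates that behaviour. `index_tokens` and `highlight` are hypothetical helpers, and `\w+` again only approximates the standard tokenizer.

```python
import re

def index_tokens(text, min_gram=1, max_gram=15):
    """Yield (gram, start, end); start/end are the offsets of the whole source word."""
    for m in re.finditer(r"\w+", text):
        word = m.group().lower()
        for n in range(min_gram, min(len(word), max_gram) + 1):
            # Assumption: every gram inherits the offsets of its source word.
            yield word[:n], m.start(), m.end()

def highlight(text, query_term, pre="<em>", post="</em>"):
    """Wrap the source-word span of every gram equal to `query_term`."""
    spans = sorted({(s, e) for gram, s, e in index_tokens(text) if gram == query_term})
    out, last = [], 0
    for start, end in spans:
        out.append(text[last:start])
        out.append(pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)

print(highlight("SoftwareRocks everytime", "soft"))
# <em>SoftwareRocks</em> everytime
```

Under that assumption the sketch reproduces the whole-word highlights shown in the response, which suggests the behaviour comes from the token offsets rather than from the query.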
Search for: software ag
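Applying the standard analyzer, "software ag" is converted into "software" and "ag", which are looked up in the inverted index. This search matches documents 1, 3, 4, 5 and 6, which is correct, but I expected only "software" and "ag" to be highlighted, not the whole words that contain them: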
{
"hits": [
{
"_source": {
"name": "Software AG"
},
"highlight": {
"name": [
"<em>Software</em> <em>AG</em>"
]
}
},
{
"_source": {
"name": "Software AG2"
},
"highlight": {
"name": [
"<em>Software</em> <em>AG2</em>"
]
}
},
{
"_source": {
"name": "Op Software AG"
},
"highlight": {
"name": [
"Op <em>Software</em> <em>AG</em>"
]
}
},
{
"_source": {
"name": "Op Software AG good software better"
},
"highlight": {
"name": [
"Op <em>Software</em> <em>AG</em> good <em>software</em> better"
]
}
},
{
"_source": {
"name": "SoftwareRocks everytime"
},
"highlight": {
"name": [
"<em>SoftwareRocks</em> everytime"
]
}
}
]
}
I read the highlighting documentation for Elasticsearch, but I cannot work out how the highlighting is performed. For the two examples above, I expected only the token matched in the inverted index to be highlighted, not the whole word. Can anyone explain how to highlight only the value that was passed in?
Update
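So, it seems that the autocomplete on the Elasticsearch website is implemented on the server side much like mine. However, they appear to highlight the matched query on the client. If that is what they do, I am starting to think there is no proper solution on the Elasticsearch side, so I implemented the highlighting on the server side instead of on the client side (as they seem to do).
My server-side implementation (using PHP) is: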
public function search($term)
{
$params = [
'index' => $this->getIndexName(),
'type' => $this->getIndexType(),
'body' => [
'query' => [
'match' => [
'name' => $term
]
]
]
];
$results = $this->client->search($params);
$hits = $results['hits']['hits'];
$data = [];
$wrapBefore = '<strong>';
$wrapAfter = '</strong>';
foreach ($hits as $hit) {
$data[] = [
$hit['_source']['id'],
$hit['_source']['name'],
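// preg_quote() stops characters such as "(" or "/" in $term from being read as regex syntax
preg_replace('/(' . preg_quote($term, '/') . ')/i', "$wrapBefore$1$wrapAfter", strip_tags($hit['_source']['name']))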
];
}
return $data;
}
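The output of my implementation, which is what I was aiming for with this question:
I added a bounty to see whether there is a solution at the Elasticsearch level that achieves what I described above.
1 Answer
As of now, with the latest version of Elasticsearch, this is not possible, because the highlighting documentation does not mention any setting or query for it. I inspected the Elastic autocomplete example in the browser console, under the XHR requests tab, and found the autocomplete response for the keyword "att", shown below.
But on the front end they show only "att" marked in the autosuggest results. So they are handling the highlighting in the browser layer.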
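Whether it runs on the server (as in the PHP in the question) or in the browser, the highlighting step itself is small. Below is a minimal Python sketch of one way to do it, not the approach used by the Elastic site. It assumes the query is split on whitespace and that each term should only be marked where it starts a word, mirroring the edge n-gram matching. `highlight_terms` is a hypothetical helper name.

```python
import html
import re

def highlight_terms(name, query, before="<strong>", after="</strong>"):
    """Wrap each query term where it starts a word in `name`; HTML-escape the rest."""
    # Deduplicate terms, longest first so "software" is tried before "soft".
    terms = sorted(dict.fromkeys(query.split()), key=len, reverse=True)
    if not terms:
        return html.escape(name)
    # re.escape keeps characters such as "." or "(" from acting as regex syntax.
    alternation = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(r"\b(" + alternation + r")", re.IGNORECASE)
    parts, last = [], 0
    for match in pattern.finditer(name):
        parts.append(html.escape(name[last:match.start()]))
        parts.append(before + html.escape(match.group(1)) + after)
        last = match.end()
    parts.append(html.escape(name[last:]))
    return "".join(parts)

print(highlight_terms("Op Software AG", "software ag"))
# Op <strong>Software</strong> <strong>AG</strong>
```

Escaping the text around each match means a name containing `<` or `&` cannot inject markup into the suggestion list, and the `\b` anchor keeps a term like "ware" from being marked in the middle of "Software".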