
Highlighting Elasticsearch autocomplete matches


I have the following data to index in Elasticsearch:

[image: the sample documents]

I want to implement autocomplete and highlight why a particular document matched the query.

These are the settings of my index:

{
    "settings": {
        "number_of_shards": 1, 
        "analysis": {
            "filter": {
                "autocomplete_filter": { 
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 15
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter" 
                    ]
                }
            }
        }
    }
}

Index analysis:

  • Splits the text on word boundaries.

  • Removes punctuation.

  • Lowercases each token.

  • Edge n-grams each token.
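The four steps above can be sketched in a few lines of Python. This is only an approximation for illustration: the regex `\w+` stands in for the standard tokenizer, and the function names (`tokenize`, `edge_ngrams`, `autocomplete_analyze`) are hypothetical, not part of Elasticsearch.

```python
import re

# Mirrors min_gram / max_gram of autocomplete_filter in the settings above.
MIN_GRAM, MAX_GRAM = 1, 15


def tokenize(text):
    """Rough stand-in for the standard tokenizer: split on word
    boundaries and drop punctuation (an approximation)."""
    return re.findall(r"\w+", text)


def edge_ngrams(token, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    """Return the prefixes of `token` from min_gram to max_gram characters."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def autocomplete_analyze(text):
    """Tokenize, lowercase, then edge n-gram every token."""
    terms = []
    for token in tokenize(text):
        terms.extend(edge_ngrams(token.lower()))
    return terms


print(autocomplete_analyze("Software AG"))
# ['s', 'so', 'sof', 'soft', 'softw', 'softwa', 'softwar', 'software', 'a', 'ag']
```

Because "soft" is one of the prefixes stored for "Software", a plain `match` query for `soft` can find the document without any wildcard.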

So the inverted index looks like this:

[image: the resulting inverted index]

This is how I defined the mapping for the name field:

{
    "index_type": {
        "properties": {
            "name": {
                "type":     "string",
                "index_analyzer":  "autocomplete", 
                "search_analyzer": "standard" 
            }
        }
    }
}

When I query:

GET http://localhost:9200/index/type/_search

{
    "query": {
        "match": {
            "name": "soft"
        }
    },
    "highlight": {
        "fields" : {
            "name" : {}
        }
    }
}

Search for: soft

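Applying the standard analyzer at search time, "soft" is the term that is looked up in the inverted index. This search matches documents 1, 3, 4, 5, 6 and 7, which is correct, but I expected the highlighted part to be "soft" and not the whole word: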

{
  "hits": [
    {
      "_source": {
        "name": "SoftwareRocks everytime"
      },
      "highlight": {
        "name": [
          "<em>SoftwareRocks</em> everytime"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG"
      },
      "highlight": {
        "name": [
          "<em>Software</em> AG"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG2"
      },
      "highlight": {
        "name": [
          "<em>Software</em> AG2"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG good software better"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> AG good <em>software</em> better"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> AG"
        ]
      }
    },
    {
      "_source": {
        "name": "is soft ware ok"
      },
      "highlight": {
        "name": [
          "is <em>soft</em> ware ok"
        ]
      }
    }
  ]
}

Search for: software ag

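Applying the standard analyzer at search time, "software ag" is split into the terms "software" and "ag", which are looked up in the inverted index. This search matches documents 1, 3, 4, 5 and 6, which is correct, but I expected only "software" and "ag" to be highlighted, not the whole words that contain them: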

{
  "hits": [
    {
      "_source": {
        "name": "Software AG"
      },
      "highlight": {
        "name": [
          "<em>Software</em> <em>AG</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG2"
      },
      "highlight": {
        "name": [
          "<em>Software</em> <em>AG2</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> <em>AG</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG good software better"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> <em>AG</em> good <em>software</em> better"
        ]
      }
    },
    {
      "_source": {
        "name": "SoftwareRocks everytime"
      },
      "highlight": {
        "name": [
          "<em>SoftwareRocks</em> everytime"
        ]
      }
    }
  ]
}

I read the highlighting documentation for Elasticsearch, but I cannot work out how the highlighting is performed. For the two examples above I expected only the token matched in the inverted index to be highlighted, not the whole word. Can anyone explain how to highlight only the value that was passed in?
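One way to reason about the two responses above is the following Python model. It rests on an assumption that this post does not confirm: that every edge n-gram produced by a token filter keeps the start and end offsets of the whole token it was cut from, and that the highlighter wraps whatever those offsets cover. The names `index_tokens` and `highlight` are hypothetical, and `\w+` again only approximates the standard tokenizer.

```python
import re


def index_tokens(text, min_gram=1, max_gram=15):
    """Yield (gram, start, end) for every edge n-gram of every word.
    ASSUMPTION: start/end are the offsets of the ORIGINAL word,
    not of the gram itself."""
    for m in re.finditer(r"\w+", text):
        word = m.group(0).lower()
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            yield word[:n], m.start(), m.end()


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every span whose gram equals one of the query terms."""
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    spans = sorted({(s, e) for gram, s, e in index_tokens(text) if gram in terms})
    out, last = [], 0
    for start, end in spans:
        out.append(text[last:start])
        out.append(pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)


print(highlight("SoftwareRocks everytime", "soft"))
# <em>SoftwareRocks</em> everytime
print(highlight("Software AG2", "software ag"))
# <em>Software</em> <em>AG2</em>
```

Under that assumption the model reproduces the whole-word highlights shown in both responses: the gram "soft" matches, but the only position information attached to it is the span of "SoftwareRocks".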

Update

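So, it seems that the server-side autocomplete on the Elasticsearch website is similar to my implementation. However, they appear to highlight the matched query on the client. If that is what they do, I am starting to think there is no proper solution on the Elasticsearch side, so I implemented the highlighting on the server side rather than on the client side (as they seem to do).

My server-side implementation (in PHP) is: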

public function search($term)
{
    $params = [
        'index' => $this->getIndexName(),
        'type' => $this->getIndexType(),
        'body' => [
            'query' => [
                'match' => [
                    'name' => $term
                ]
            ]
        ]
    ];

    $results = $this->client->search($params);

    $hits = $results['hits']['hits'];

    $data = [];

    $wrapBefore = '<strong>';
    $wrapAfter = '</strong>';

    foreach ($hits as $hit) {
        $data[] = [
            $hit['_source']['id'],
            $hit['_source']['name'],
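            // preg_quote() stops regex metacharacters in $term from being read as a pattern.
            preg_replace('/(' . preg_quote($term, '/') . ')/i', "$wrapBefore$1$wrapAfter", strip_tags($hit['_source']['name']))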
        ];
    }

    return $data;
}
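A variant of the same post-processing idea, sketched in Python purely for illustration rather than as a replacement for the PHP above. It makes two assumptions that go beyond the original function: multi-word input such as "software ag" should highlight each word separately, and only prefixes at the start of a word should be wrapped, since that is what the edge n-gram index matches. The function name `highlight_prefixes` and its parameters are hypothetical.

```python
import html
import re


def highlight_prefixes(name, term, before="<strong>", after="</strong>"):
    """Wrap every word prefix in `name` that equals one of the words in `term`."""
    tokens = {t.lower() for t in re.findall(r"\w+", term)}
    if not tokens:
        return html.escape(name)
    # Longest first, so "software" wins over "soft" if both were typed.
    ordered = sorted(tokens, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + ")",
        re.IGNORECASE,
    )
    out, last = [], 0
    for m in pattern.finditer(name):
        # Escape the untouched text and the match separately so the
        # wrapper tags are the only markup in the result.
        out.append(html.escape(name[last:m.start()]))
        out.append(before + html.escape(m.group(1)) + after)
        last = m.end()
    out.append(html.escape(name[last:]))
    return "".join(out)


print(highlight_prefixes("SoftwareRocks everytime", "soft"))
# <strong>Soft</strong>wareRocks everytime
print(highlight_prefixes("Op Software AG good software better", "software ag"))
# Op <strong>Software</strong> <strong>AG</strong> good <strong>software</strong> better
```

Matching on the raw text and escaping each piece afterwards avoids accidentally highlighting inside an HTML entity, which can happen if the text is escaped before the regex runs.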

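This is the output I am aiming for in this question:

[image: the desired highlighted output]

I added a bounty to see whether there is a solution at the Elasticsearch level that achieves what I described above.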

1 Answer

  • 1

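    As of now, with the latest version of Elastic, this is not possible, because the highlighting documentation does not mention any setting or query for it. I inspected the Elastic autocomplete example in the browser console, under the XHR requests tab, and found that the autocomplete response for the keyword "att" looks as follows.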

    url - https://search.elastic.co/suggest?q=att
        {
            "current_page": 1,
            "last_page": 4,
            "total_hits": 49,
            "hits": [
                {
                    "tags": [],
                    "url": "/elasticon/tour/2016/jp/not-attending",
                    "section": "Elasticon",
                    "title": "Not <em>Attending</em> - JP"
                },
                {
                    "section": "Elasticon",
                    "title": "<em>Attending</em> from Training - JP",
                    "tags": [],
                    "url": "/elasticon/tour/2016/jp/attending-training"
                },
                {
                    "tags": [],
                    "url": "/elasticon/tour/2016/jp/attending-keynote",
                    "title": "<em>Attending</em> from Keynote - JP",
                    "section": "Elasticon"
                },
                {
                    "tags": [],
                    "url": "/elasticon/tour/2016/not-attending",
                    "section": "Elasticon",
                    "title": "Thank You - Not <em>Attending</em>"
                },
                {
                    "tags": [],
                    "url": "/elasticon/tour/2016/attending",
                    "section": "Elasticon",
                    "title": "Thank You - <em>Attending</em>"
                },
                {
                    "section": "Blog",
                    "title": "What It's Like to <em>Attend</em> Elastic Training",
                    "tags": [],
                    "url": "/blog/what-its-like-to-attend-elastic-training"
                },
                {
                    "tags": "Elasticsearch",
                    "url": "/guide/en/elasticsearch/plugins/5.0/mapper-attachments-highlighting.html",
                    "section": "Docs/",
                    "title": "Highlighting <em>attachments</em>"
                },
                {
                    "title": "<em>attachments</em> » email",
                    "section": "Docs/",
                    "tags": "Logstash",
                    "url": "/guide/en/logstash/5.0/plugins-outputs-email.html#plugins-outputs-email-attachments"
                },
                {
                    "section": "Docs/",
                    "title": "Configuring Email <em>Attachments</em> » Actions",
                    "tags": "Watcher",
                    "url": "/guide/en/watcher/2.4/actions.html#configuring-email-attachments"
                },
                {
                    "url": "/guide/en/watcher/2.4/actions.html#hipchat-action-attributes",
                    "tags": "Watcher",
                    "title": "HipChat Action <em>Attributes</em> » Actions",
                    "section": "Docs/"
                },
                {
                    "title": "Slack Action <em>Attributes</em> » Actions",
                    "section": "Docs/",
                    "tags": "Watcher",
                    "url": "/guide/en/watcher/2.4/actions.html#slack-action-attributes"
                }
            ],
            "aggs": {
                "sections": [
                    {
                        "Elasticon": 5
                    },
                    {
                        "Blog": 1
                    },
                    {
                        "Docs/": 43
                    }
                ],
                "top_tags": [
                    {
                        "XPack": 14
                    },
                    {
                        "Elasticsearch": 12
                    },
                    {
                        "Watcher": 9
                    },
                    {
                        "Logstash": 4
                    },
                    {
                        "Clients": 3
                    },
                    {
                        "Shield": 1
                    }
                ]
            }
        }
    

    But on the front end, only "att" is shown highlighted in the autosuggest results. So they are handling the highlighting in the browser layer.
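What that browser-side step might look like can be sketched as follows. This is a guess at the general technique, not the site's actual code, and it is written in Python only for illustration (in a browser it would be JavaScript). The function name `narrow_highlight` is hypothetical. It takes a `title` as returned in the response above, with whole words wrapped in `<em>`, and shrinks each highlight to the characters the user typed.

```python
import re

# Matches one server-side highlight such as <em>Attending</em>.
EM_TAG = re.compile(r"<em>(.*?)</em>", re.IGNORECASE)


def narrow_highlight(title, typed):
    """Shrink each <em>word</em> so that only the typed prefix stays wrapped."""
    def shrink(match):
        word = match.group(1)
        if typed and word.lower().startswith(typed.lower()):
            n = len(typed)
            return "<em>" + word[:n] + "</em>" + word[n:]
        # Leave highlights that do not start with the typed text untouched.
        return match.group(0)

    return EM_TAG.sub(shrink, title)


print(narrow_highlight("Not <em>Attending</em> - JP", "att"))
# Not <em>Att</em>ending - JP
print(narrow_highlight("HipChat Action <em>Attributes</em> » Actions", "att"))
# HipChat Action <em>Att</em>ributes » Actions
```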
