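I want to use a function score query that combines text proximity with weights, but the query does not correctly compute a score for the "match_phrase" inside "query.function_score.functions".
For example, let's say I'm building a curated media site and placing a banner link for "Financial articles in 2017".
I want to filter and score as follows: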
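- Filter
  - Articles must have been created in 2017.
  - The category must be "finance".
- Score
  - The more favorites an article has, the higher its score.
  - An article scores higher if it was commented on within the last month.
  - An article scores higher if it has specific tags.
    - (The tags could be 100+ words.)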
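The data has the following preconditions:
- Preconditions
  - The data set has more than 2 million documents.
  - Each article must have a "category".
  - Each article may have one or more "tags".
    - A single article may have 1000+ tags.
  - "tags_text" is a string of the tags, sorted alphabetically and joined by spaces.
    - ref: Finding most similar arrays of integers in elasticsearch
  - "favorite" is the number of people who marked the article as a "favorite" (e.g. like a Facebook Like button).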
Sample data and query
# create index
curl -XPUT 'http://localhost:9200/blog'
Index the articles:
# create articles
curl -XPUT http://localhost:9200/blog/article/1 -d '
{
  "article_id": 1,
  "title": "Fintech company list in London",
  "tags": ["fintech", "uk", "london"],
  "tags_text": "fintech london uk",
  "category": "finance",
  "created_at": "2016-12-01T00:00:00Z",
  "last_comment_at": null,
  "favorite": 100
}'
curl -XPUT http://localhost:9200/blog/article/2 -d '
{
  "article_id": 2,
  "title": "World economy",
  "tags": ["world", "economy", "regression", "war"],
  "tags_text": "economy regression war world",
  "category": "finance",
  "created_at": "2017-02-15T00:00:00Z",
  "last_comment_at": "2017-11-01T00:00:00Z",
  "favorite": 20
}'
curl -XPUT http://localhost:9200/blog/article/3 -d '
{
  "article_id": 3,
  "title": "Bitcoin bubble",
  "tags": ["bitcoin", "bubble", "btc", "mtgox", "wizsec"],
  "tags_text": "bitcoin btc bubble mtgox wizsec",
  "category": "finance",
  "created_at": "2017-08-03T00:00:00Z",
  "last_comment_at": null,
  "favorite": 50
}'
curl -XPUT http://localhost:9200/blog/article/4 -d '
{
  "article_id": 4,
  "title": "Virtual currency in China",
  "tags": ["bitcoin", "ico", "china"],
  "tags_text": "bitcoin china ico",
  "category": "finance",
  "created_at": "2017-09-03T00:00:00Z",
  "last_comment_at": null,
  "favorite": 10
}'
curl -XPUT http://localhost:9200/blog/article/5 -d '
{
  "article_id": 5,
  "title": "Average FX rate in 2017-10",
  "tags": ["fx", "currency", "doller"],
  "tags_text": "currency doller fx",
  "category": "finance",
  "created_at": "2017-11-01T00:00:00Z",
  "last_comment_at": null,
  "favorite": 10
}'
curl -XPUT http://localhost:9200/blog/article/6 -d '
{
  "article_id": 6,
  "title": "Cat and Dog",
  "tags": ["pet", "cat", "dog", "family"],
  "tags_text": "cat dog family pet",
  "category": "pet",
  "created_at": "2017-11-02T00:00:00Z",
  "last_comment_at": null,
  "favorite": 500
}'
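For illustration, the "tags_text" values in the sample documents above are consistent with sorting "tags" alphabetically and joining them with spaces. The following is a minimal sketch of that derivation; the helper name `build_tags_text` is hypothetical and not part of the original setup.

```python
def build_tags_text(tags):
    """Hypothetical helper: sort tags alphabetically and join them with spaces."""
    return " ".join(sorted(tags))


# Tags taken from sample articles 1 and 3.
print(build_tags_text(["fintech", "uk", "london"]))
print(build_tags_text(["bitcoin", "bubble", "btc", "mtgox", "wizsec"]))
```

Keeping the terms in a fixed order is what makes a positional query such as "match_phrase" with a large "slop" usable on this field at all.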
Then run the query:
curl -XGET 'http://localhost:9200/blog/article/_search' -d '
{
  "_source": {
    "includes": ["article_id", "title", "tags_text"]
  },
  "query": {
    "function_score": {
      "functions": [
        {
          "field_value_factor": {
            "factor": 1,
            "modifier": "log",
            "field": "favorite"
          },
          "weight": 0.3
        },
        {
          "filter": {
            "range": {
              "last_comment_at": {
                "from": "now-30d",
                "to": null,
                "include_lower": true,
                "include_upper": false
              }
            }
          },
          "weight": 0.3
        },
        {
          "filter": {
            "match_phrase": {
              "tags_text": {
                "query": "bitcoin fintech smartphone",
                "slop": 100
              }
            }
          },
          "weight": 0.4
        }
      ],
      "query": {
        "bool": {
          "filter": [
            {"term": {"category": "finance"} },
            {
              "range": {
                "created_at": {
                  "from": "2017-01-01T00:00:00",
                  "to": "2017-12-31T23:59:59",
                  "include_lower": true,
                  "include_upper": true
                }
              }
            }
          ],
          "must": {
            "match_all": {}
          }
        }
      },
      "score_mode": "sum"
    }
  }
}'
The result is as follows:
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.69030905,
    "hits": [
      {
        "_index": "blog",
        "_type": "article",
        "_id": "2",
        "_score": 0.69030905,
        "_source": {
          "article_id": 2,
          "tags_text": "economy regression war world",
          "title": "World economy"
        }
      },
      {
        "_index": "blog",
        "_type": "article",
        "_id": "3",
        "_score": 0.509691,
        "_source": {
          "article_id": 3,
          "tags_text": "bitcoin btc bubble mtgox wizsec",
          "title": "Bitcoin bubble"
        }
      },
      {
        "_index": "blog",
        "_type": "article",
        "_id": "5",
        "_score": 0.3,
        "_source": {
          "article_id": 5,
          "tags_text": "currency doller fx",
          "title": "Average FX rate in 2017-10"
        }
      },
      {
        "_index": "blog",
        "_type": "article",
        "_id": "4",
        "_score": 0.3,
        "_source": {
          "article_id": 4,
          "tags_text": "bitcoin china ico",
          "title": "Virtual currency in China"
        }
      }
    ]
  }
}
I checked the result with "explain", but the "match_phrase" query on the "tags_text" field does not seem to affect the score at all.
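As a rough cross-check of that observation, the reported "_score" values can be recomputed by hand. This is only a sketch under assumptions that are not stated explicitly above: that the "log" modifier is a base-10 logarithm, that a filter-only function contributes exactly its weight when its filter matches and nothing otherwise, that the filtered "match_all" base query scores 1.0, and that only article 2 had a comment within 30 days of the query time. The helper name `expected_score` is hypothetical.

```python
import math

# Weights copied from the function_score query.
W_FAVORITE = 0.3
W_RECENT_COMMENT = 0.3
W_TAGS_PHRASE = 0.4


def expected_score(favorite, recent_comment, phrase_matched):
    """Hypothetical helper: sum of the weighted functions (score_mode "sum").

    Assumes "log" means log10 and that a filter-only function adds its
    weight when the filter matches and adds nothing otherwise.
    """
    score = W_FAVORITE * math.log10(favorite)
    if recent_comment:
        score += W_RECENT_COMMENT
    if phrase_matched:
        score += W_TAGS_PHRASE
    return score


# article_id -> (favorite, commented within 30 days, phrase filter matched).
# phrase_matched is assumed False everywhere to test the "no effect" reading.
hits = {
    2: (20, True, False),
    3: (50, False, False),
    5: (10, False, False),
    4: (10, False, False),
}

for article_id, args in hits.items():
    print(article_id, round(expected_score(*args), 6))
```

Under these assumptions the four values agree with the reported scores (about 0.690309, 0.509691, 0.3 and 0.3) without the 0.4 term ever being added, which is consistent with the phrase filter matching none of the hits. One possible reading is that no sample document contains all three terms "bitcoin", "fintech" and "smartphone", and that a function filter can only add its full weight or nothing rather than a graded similarity.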
How can I combine weighted similarity scoring with a function score query? (I am testing on ES v2.4.0.)