Elasticsearch custom sort / adding scores to filter clauses

I have this simple set of documents:

{
  id : 1,
  book_ids : [2,3],
  collection_ids : ['a','b']
},
{
  id : 2,
  book_ids : [1,2]
}

If I run the following filter query, it matches both documents:

{
    bool: {
        filter: [
            {
                bool: {
                    should: [
                        {
                            bool: {
                                must_not: {
                                    exists: {
                                        field: 'book_ids'
                                    }
                                }
                            }
                        },
                        {
                            bool: {
                                filter: {
                                    term: {
                                        book_ids: 2
                                    }
                                }
                            }
                        }
                    ]
                }
            },
            {
                bool: {
                    should: [
                        {
                            bool: {
                                must_not: {
                                    exists: {
                                        field: 'collection_ids'
                                    }
                                }
                            }
                        },
                        {
                            bool: {
                                filter: {
                                    term: {
                                        collection_ids: 'a'
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        ]
    }
}

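The problem is that I want to sort these documents. I want the first one (id: 1) returned first, because it matches both the supplied book_ids value and the supplied collection_ids value.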

A simple sort clause like this does not work:

[
  'book_ids',
  'collection_ids'
]

because it returns document 2 first, on account of the first value in its book_ids array.

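Edit: this is a simplified example of the problem I am facing; the real query has N such clauses in the should clause. The clauses are also ordered, as I tried to express with the sort snippet: results matching the first clause (book_ids) should come before results matching the second clause (collection_ids). What I am really looking for is something like an SQL ORDER BY that only considers the matching values of the array fields. One workable option might be to assign descending constant_score values to each term clause according to the desired sort order, and have ES add up these sub-scores to compute the final score. But I cannot figure out how to do that, or whether it is possible at all.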

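Bonus question: is there any way for Elasticsearch to return some kind of new document that contains only the matching values? This is the response I would expect for the filter query above: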

{
  id : 1,
  book_ids : [2],
  collection_ids : ['a']
},
{
  id : 2,
  book_ids : [2]
}

1 Answer

I think your constant score idea is on the right track. I think you could do it like this:

{
  query: {
    bool: {
      must: [
        {
          bool: {
            should: [
              {
                bool: {
                  must_not: {
                    exists: {
                      field: 'book_ids'
                    }
                  }
                }
              },
              {
                constant_score: {
                  filter: {
                    term: {
                      book_ids: 2
                    }
                  },
                  boost: 100
                }
              }
            ]
          }
        },
        {
          bool: {
            should: [
              {
                bool: {
                  must_not: {
                    exists: {
                      field: 'collection_ids'
                    }
                  }
                }
              },
              {
                constant_score: {
                  filter: {
                    term: {
                      collection_ids: 'a'
                    }
                  },
                  boost: 50
                }
              }
            ]
          }
        }
      ]
    }  
  }
}

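I think the only thing missing from your constant score approach was probably that the top-level query needs to use must rather than filter. (Filters are not scored; every score is 0.) Under must, the boosts of the matching constant_score clauses count toward the final score.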
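
For the N-clause case mentioned in the question, the request body can be generated rather than written by hand. The following is a minimal Python sketch, not a definitive implementation: the helper name `build_ranked_query` is hypothetical, and the power-of-two boosts are an assumption of this sketch (the query above uses 100 and 50). Powers of two are chosen so that matching an earlier clause outweighs matching all later clauses combined. The score contributed by the must_not branch is not modeled here.

```python
import json


def build_ranked_query(ordered_terms):
    """Build a bool/must query whose term clauses carry descending boosts.

    ordered_terms: list of (field, value) pairs, highest priority first.
    Boosts are powers of two (an assumption of this sketch), so an earlier
    clause always outweighs the sum of every later clause.
    """
    must = []
    count = len(ordered_terms)
    for position, (field, value) in enumerate(ordered_terms):
        boost = 2 ** (count - position)
        must.append({
            "bool": {
                "should": [
                    # Documents that lack the field still pass this clause.
                    {"bool": {"must_not": {"exists": {"field": field}}}},
                    # Documents that contain the value earn a fixed score.
                    {"constant_score": {
                        "filter": {"term": {field: value}},
                        "boost": boost,
                    }},
                ]
            }
        })
    return {"query": {"bool": {"must": must}}}


query = build_ranked_query([("book_ids", 2), ("collection_ids", "a")])
print(json.dumps(query, indent=2))
```

The resulting dictionary has the same shape as the query above and can be sent as the search request body with whichever client is in use.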

Another approach is to put the filter inside a function_score query (keeping it as a filter) and then compute the score however you need (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html).
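
As a rough illustration of the function_score variant, the Python sketch below builds such a request body. The helper names `missing_or_term` and `build_function_score_query` are hypothetical, and the weights 100 and 50 simply mirror the boosts used earlier. It relies on the `functions`, `filter`, `weight`, `score_mode` and `boost_mode` options of function_score; check the linked documentation for how your version treats documents that match none of the functions.

```python
def missing_or_term(field, value):
    """Match documents that lack `field` or contain `value` in it."""
    return {
        "bool": {
            "should": [
                {"bool": {"must_not": {"exists": {"field": field}}}},
                {"term": {field: value}},
            ]
        }
    }


def build_function_score_query(weighted_terms):
    """weighted_terms: list of (field, value, weight), highest priority first."""
    return {
        "query": {
            "function_score": {
                # Same matching logic as the original query, still unscored.
                "query": {
                    "bool": {
                        "filter": [missing_or_term(f, v) for f, v, _ in weighted_terms]
                    }
                },
                # Each function contributes its weight when its filter matches.
                "functions": [
                    {"filter": {"term": {f: v}}, "weight": w}
                    for f, v, w in weighted_terms
                ],
                "score_mode": "sum",      # add the matching weights together
                "boost_mode": "replace",  # use only the function score
            }
        }
    }


body = build_function_score_query([("book_ids", 2, 100), ("collection_ids", "a", 50)])
```

Compared with the must/constant_score version, this keeps the matching logic in filter context and moves all scoring into one place.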

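As for the bonus question, you could use a script field to filter the values and add a new field with what you want (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html), but it is probably easier, and makes more sense, to filter after you receive the results, unless your values contain very long lists.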
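
Filtering after the results arrive could look like the Python sketch below. It is only an illustration: `keep_matching_values` and `wanted` are hypothetical names, and `hits` stands in for the `_source` documents of the search response, using the two sample documents from the question.

```python
def keep_matching_values(source, wanted):
    """Return a copy of `source` whose array fields keep only wanted values.

    wanted: dict mapping a field name to the set of values searched for.
    Fields not listed in `wanted` are copied unchanged.
    """
    trimmed = {}
    for field, value in source.items():
        if field in wanted and isinstance(value, list):
            trimmed[field] = [v for v in value if v in wanted[field]]
        else:
            trimmed[field] = value
    return trimmed


# Stand-ins for the documents returned by the search.
hits = [
    {"id": 1, "book_ids": [2, 3], "collection_ids": ["a", "b"]},
    {"id": 2, "book_ids": [1, 2]},
]
wanted = {"book_ids": {2}, "collection_ids": {"a"}}

trimmed_hits = [keep_matching_values(hit, wanted) for hit in hits]
print(trimmed_hits)
```

For the sample documents this yields the trimmed shape asked for in the bonus question.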

Answered 2024-05-02T06:52:05+08:00
