Elasticsearch: boosting the score with a nested query

I have the following query in Elasticsearch 1.3.4:

{
   "filtered": {
      "query": {
         "bool": {
            "should": [
               {
                  "bool": {
                     "should": [
                        {
                           "match_phrase": {
                              "_all": "java"
                           }
                        },
                        {
                           "bool": {
                              "should": [
                                 {
                                    "match_phrase": {
                                       "_all": "adobe creative suite"
                                    }
                                 }
                              ]
                           }
                        }
                     ]
                  }
               },
               {
                  "bool": {
                     "should": [
                        {
                           "nested": {
                              "path": "skills",
                              "query": {
                                 "bool": {
                                    "must": [
                                       {
                                          "term": {
                                             "skills.name.original": "java"
                                          }
                                       },
                                       {
                                          "bool": {
                                             "should": [
                                                {
                                                   "match": {
                                                      "skills.source": {
                                                         "query": "linkedin",
                                                         "boost": 5
                                                      }
                                                   }
                                                }, 
                                                {
                                                   "match": {
                                                      "skills.source": {
                                                         "query": "meetup",
                                                         "boost": 5
                                                      }
                                                   }
                                                }                                                
                                             ]
                                          }
                                       }
                                    ],
                                    "minimum_should_match": "100%"
                                 }
                              }
                           }
                        }
                     ]
                  }
               }
            ],
            "minimum_should_match": "100%"
         }
      },
      "filter": {
         "and": [
            {
               "bool": {
                  "should": [
                     {
                        "term": {
                           "skills.name.original": "java"
                        }
                     }
                  ]
               }
            },
            {
               "bool": {
                  "should": [
                     {
                        "term": {
                           "skills.name.original": "ajax"
                        }
                     },
                     {
                        "term": {
                           "skills.name.original": "html"
                        }
                     }
                  ]
               }
            }
         ]
      }
   }
}

The mapping looks like this:

skills: {
    type: "nested", 
    include_in_parent: true, 
    properties: {                 
      name: {
        type: "multi_field",
        fields: {
          name: {type: "string"},
          original: {type : "string", analyzer : "string_lowercase"} 
        }              
      }                                                       
    }
  }

Finally, the document structure for skills (other parts omitted) looks like this:

"skills": 
  [
    {
      "name": "java",
      "source": [
         "linkedin", 
         "facebook"
      ]
    },
    {
      "name": "html",
      "source": [
         "meetup"
      ]
    }
  ]

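My goal with this query is to first filter out irrelevant hits with the filter (at the bottom of the query), then score a person by searching the whole document for the match_phrase "java", with an extra boost if it also contains the match_phrase "adobe creative suite". After that, I want to check the nested value where we got a hit in "skills" to see which "source" or sources the skill came from, and boost the query based on the source or sources of that nested object.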
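The nested part of that goal (match the skill, then boost by source) could be generated rather than written by hand. The following is only a minimal sketch: the helper name `nested_skill_clause` and the `source_boosts` dictionary are hypothetical, and the clause it builds makes the source matches optional `should` clauses next to a required `term` clause, which is not identical to the first query above (there the source matches sit inside the `must` list).

```python
import json


def nested_skill_clause(skill, source_boosts):
    """Build a nested clause that requires the skill and boosts by source.

    `skill` is matched against skills.name.original (as in the queries above);
    `source_boosts` maps a source name to the boost for that source.
    Hypothetical helper, not part of any Elasticsearch client.
    """
    should = [
        {"match": {"skills.source": {"query": source, "boost": boost}}}
        for source, boost in source_boosts.items()
    ]
    return {
        "nested": {
            "path": "skills",
            "query": {
                "bool": {
                    # The skill itself must match inside the nested object.
                    "must": [{"term": {"skills.name.original": skill}}],
                    # Sources only add to the score when they match.
                    "should": should,
                }
            },
        }
    }


clause = nested_skill_clause("java", {"linkedin": 5, "meetup": 5})
print(json.dumps(clause, indent=2))
```

Because the term and the source matches live in the same nested query, both conditions are evaluated against the same "skills" object rather than against the flattened parent document.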

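This kind of works, or at least I don't get any errors, but the final score is strange and it is hard to tell whether it is working. With boost = 1, my top hit currently has a score of 32.176407. If I give a small boost, say 2, the score goes down slightly. With a boost of 5 it drops to 31.637703. I would expect it to go up, not down. With a boost of 1000, the score drops to 2.433376.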

Is this the right approach, or is there a better/easier way? I can change the structure, the mapping, and so on. And why does my score decrease?
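One effect that may contribute to a drop like this, assuming the index uses Lucene's classic TF-IDF similarity (the default in Elasticsearch 1.x), is query normalization: every clause weight is multiplied by queryNorm = 1 / sqrt(sum of (idf * boost)^2 over all clauses). Raising the boost of one clause shrinks queryNorm and therefore the contribution of every other clause. The sketch below uses made-up IDF values, sets tf and the field norm to 1, and ignores the coord factor, so it only illustrates the direction of the effect and does not reproduce the scores quoted above.

```python
import math


def clause_scores(idfs, boosts):
    """Per-clause contributions under classic TF-IDF with tf = norm = 1.

    Each clause contributes idf^2 * boost * queryNorm, where
    queryNorm = 1 / sqrt(sum((idf * boost)^2)).
    """
    sum_of_squared_weights = sum((idf * b) ** 2 for idf, b in zip(idfs, boosts))
    query_norm = 1.0 / math.sqrt(sum_of_squared_weights)
    return [idf * idf * b * query_norm for idf, b in zip(idfs, boosts)]


# Hypothetical IDFs: a rare unboosted term (3.0) and a common boosted one (1.0).
IDFS = [3.0, 1.0]

for boost in (1, 5, 1000):
    parts = clause_scores(IDFS, [1.0, boost])
    print("boost", boost, "total", round(sum(parts), 3))
```

With these assumed numbers the total decreases as the boost grows, because the unboosted clause, which dominated the score, is normalized away.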

Edit: I've simplified the query a bit so that it only deals with one "skill":

{
   "filtered": {
      "query": {
         "bool": {
            "must": [
               {
                  "bool": {
                     "must": [
                        {
                           "bool": {
                              "should": [
                                 {
                                    "match_phrase": {
                                       "_all": "java"
                                    }
                                 }
                              ],
                              "minimum_should_match": 1
                           }
                        }
                     ]
                  }
               }
            ],
            "should": [
               {
                  "nested": {
                     "path": "skills",
                     "score_mode": "avg",
                     "query": {
                        "bool": {
                           "must": [
                              {
                                 "term": {
                                    "skills.name.original": "java"
                                 }
                              }
                           ],
                           "should": [
                              {
                                 "match": {
                                    "skills.source": {
                                       "query": "linkedin",
                                       "boost": 1.2
                                    }
                                 }
                              },
                              {
                                 "match": {
                                    "skills.source": {
                                       "query": "meetup",
                                       "boost": 1.2
                                    }
                                 }
                              }
                           ]
                        }
                     }
                  }
               }
            ]
         }
      },
      "filter": {
         "and": [
            {
               "bool": {
                  "should": [
                     {
                        "term": {
                           "skills.name.original": "java"
                        }
                     }
                  ]
               }
            }
         ]
      }
   }
}

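The problem now is that I have two similar documents whose only difference is the "source" value of the skill "java": "linkedin" and "meetup" respectively. I would expect them to score alike, since in my new query both sources get the same boost, but the final _score is very different for the two documents.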

From the query explanation for doc 1:

"value": 3.82485,
"description": "Score based on child doc range from 0 to 125"

For doc 2:

"value": 2.1993546,
"description": "Score based on child doc range from 0 to 125"

These are the only values that differ, and I don't understand why.

1 Answer

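I can't answer the question about boosting, but how many shards does your index have? TF and IDF are calculated per shard, not per index, and that can produce differences in score. See https://groups.google.com/forum/#!topic/elasticsearch/FK-PYb43zcQ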
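To illustrate the per-shard point, here is a minimal sketch of the IDF formula used by Lucene's classic TF-IDF similarity, idf = 1 + ln(numDocs / (docFreq + 1)). The shard names and statistics are invented for the example; the real values would come from the explain output of each document.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF as computed by Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics: the same term, but each shard only sees its own
# document frequency and document count.
shard_stats = {
    "shard_0": {"doc_freq": 10, "num_docs": 126},
    "shard_1": {"doc_freq": 40, "num_docs": 126},
}

for name, stats in shard_stats.items():
    idf = classic_idf(stats["doc_freq"], stats["num_docs"])
    print(name, round(idf, 4))
```

Two documents that match the same term with the same boost can therefore end up with different scores simply because they live on shards where the term is more or less common.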

If you reindex with only 1 shard, does that change the results?
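A quick way to try this is to create a test index with a single shard and load the same documents into it. The sketch below only builds the index-creation body; the helper name `single_shard_index_body` is hypothetical, and sending the body to the cluster (for example with a PUT to a new index name of your choosing) is left out.

```python
import json


def single_shard_index_body(mappings=None):
    """Body for creating an index with exactly one primary shard.

    `mappings` is an optional dict with the existing type mappings, so the
    test index analyzes documents the same way as the original one.
    """
    body = {"settings": {"number_of_shards": 1}}
    if mappings is not None:
        body["mappings"] = mappings
    return body


print(json.dumps(single_shard_index_body(), indent=2))
```

If reindexing is inconvenient, running the search with `search_type=dfs_query_then_fetch` is another way to test the same idea, since it gathers term statistics across shards before scoring.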

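Edit: Also, the doc range is the range of documents within the shard for each document. You can use it to calculate the IDF for each document and verify the score.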

Answered on 2024-05-01T19:21:43+08:00
