
How can I get the average number of missing fields per document with Elasticsearch?


In short: using Elasticsearch, given a list of fields, how can I get the average number of missing fields per document as an aggregation?

Details

Using the missing aggregation type, I can get the total number of documents that lack a given field. So, given the following data:

"hits": [{
    "name": "A name",
    "nickname": "A nickname",
    "bestfriend": "A friend",
    "hobby": "An hobby"
},{
    "name": "A name",
    "hobby": "An hobby"
},{
    "name": "A name",
    "nickname": "A nickname",
    "hobby": "An hobby"
},{
    "name": "A name",
    "bestfriend": "A friend"
}]

I can run the following query:

{
    "aggs": {
        "name_missing": {
            "missing": {"field": "name"}
        },
        "nickname_missing": {
            "missing": {"field": "nickname"}
        },
        "hobby_missing": {
            "missing": {"field": "hobby"}
        },
        "bestfriend_missing": {
            "missing": {"field": "bestfriend"}
        }
    }
}

And I get the following aggregations:

...
"aggregations": {
    "name_missing": {
        "doc_count": 0
    },
    "nickname_missing": {
        "doc_count": 2
    },
    "hobby_missing": {
        "doc_count": 1
    },
    "bestfriend_missing": {
        "doc_count": 2
    }
}
...

What I need now is the average number of missing fields per document. I can do the math on the results in code:

  • Sum the doc_count of all the missing aggregations

  • Divide by the total number of hits
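
For reference, the client-side calculation described in the two steps above could look like the following minimal Python sketch. It assumes the search response has already been parsed into a dict; the function name `average_missing_fields` is illustrative, and the sample response is hand-written, with counts recomputed from the four sample documents rather than taken from a live cluster.

```python
def average_missing_fields(response):
    """Sum the doc_count of every aggregation and divide by the total hits.

    Assumes every entry under "aggregations" is a `missing` aggregation.
    """
    total = response["hits"]["total"]
    # Elasticsearch 7+ reports hits.total as {"value": ..., "relation": ...};
    # older versions return a plain integer.
    if isinstance(total, dict):
        total = total["value"]
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty result set
    missing = sum(agg["doc_count"] for agg in response["aggregations"].values())
    return missing / total


# Hand-written response for the four sample documents (illustrative only).
response = {
    "hits": {"total": 4},
    "aggregations": {
        "name_missing": {"doc_count": 0},
        "nickname_missing": {"doc_count": 2},
        "hobby_missing": {"doc_count": 1},
        "bestfriend_missing": {"doc_count": 2},
    },
}

print(average_missing_fields(response))  # 5 missing fields / 4 documents = 1.25
```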

But how can I get the same result from Elasticsearch as an aggregation?

Thanks for any replies/suggestions.

1 Answer

  • 1

    This is an ugly solution, but it does the job.

    GET missing/missing/_search
    {
      "size": 0,
      "aggs": {
        "result": {
          "terms": {
            "script": "'aaa'"
          },
          "aggs": {
            "name_missing": {
              "missing": {
                "field": "name"
              }
            },
            "nickname_missing": {
              "missing": {
                "field": "nickname"
              }
            },
            "hobby_missing": {
              "missing": {
                "field": "hobby"
              }
            },
            "bestfriend_missing": {
              "missing": {
                "field": "bestfriend"
              }
            },
            "avg_missing": {
              "bucket_script": {
                "buckets_path": {
                  "name_missing": "name_missing._count",
                  "nickname_missing": "nickname_missing._count",
                  "hobby_missing": "hobby_missing._count",
                  "bestfriend_missing": "bestfriend_missing._count",
                  "count": "_count"
                },
                "script": "(name_missing+nickname_missing+hobby_missing+bestfriend_missing)/count"
              }
            }
          }
        }
      }
    }

    The terms aggregation with the constant script 'aaa' puts every document into a single bucket, which gives bucket_script the parent multi-bucket aggregation it needs. The buckets_path block works like a set of variable definitions: name_missing._count takes the doc_count of the name_missing aggregation, and likewise for nickname_missing, hobby_missing, and bestfriend_missing. "count": "_count" takes the doc_count of the bucket the aggregations run on, which is the total number of hits. The script then adds up all the missing counts and divides by that total, as you require.

    I have shown you how to do it; now it is up to you to adjust the parameters and extract what you need.
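
Since the question starts from an arbitrary list of fields, the request body above can also be generated rather than written by hand. The following is a minimal Python sketch of that idea, not part of the original answer: `build_avg_missing_query` and its `var_prefix` parameter are hypothetical names, and it assumes simple field names that are valid as script variable names (no dots). The default output mirrors the script style used above; on versions where Painless is the scripting language, bucket_script variables are read as `params.<name>`, which is what `var_prefix="params."` is for.

```python
import json


def build_avg_missing_query(fields, var_prefix=""):
    """Build a search body that averages the number of missing fields per document.

    `fields` is the list of field names to check. `var_prefix` is prepended to
    every variable in the bucket_script (pass "params." for Painless).
    """
    if not fields:
        raise ValueError("at least one field is required")

    names = [f"{field}_missing" for field in fields]

    # One `missing` sub-aggregation per field, named "<field>_missing".
    aggs = {name: {"missing": {"field": field}} for name, field in zip(names, fields)}

    # Each script variable points at the doc_count of its sub-aggregation;
    # "count" is the doc_count of the enclosing bucket.
    buckets_path = {name: f"{name}._count" for name in names}
    buckets_path["count"] = "_count"

    total = "+".join(f"{var_prefix}{name}" for name in names)
    aggs["avg_missing"] = {
        "bucket_script": {
            "buckets_path": buckets_path,
            "script": f"({total})/{var_prefix}count",
        }
    }

    # The constant terms script puts every document into one bucket.
    return {
        "size": 0,
        "aggs": {"result": {"terms": {"script": "'aaa'"}, "aggs": aggs}},
    }


body = build_avg_missing_query(["name", "nickname", "hobby", "bestfriend"])
print(json.dumps(body, indent=2))
```

In the parsed response, the computed average should then be found under `aggregations.result.buckets[0].avg_missing.value`.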
