MongoDB nested object aggregation count

I have a highly nested set of MongoDB documents, and I want to count the number of subdocuments (Edit: in each document) that match a given condition. For example:

{"_id":{"chr":"20","pos":"14371","ref":"A","alt":"G"},
"studies":[
    {
        "study_id":"Study1",
        "samples":[
            {
                "sample_id":"NA00001",
                "formatdata":[
                    {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            },
            {
                "sample_id":"NA00002",
                "formatdata":[
                    {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            }
        ]
    }
]
}
{"_id":{"chr":"20","pos":"14372","ref":"T","alt":"AA"},
"studies":[
    {
        "study_id":"Study3",
        "samples":[
            {
                "sample_id":"SAMPLE1",
                "formatdata":[
                    {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            },
            {
                "sample_id":"SAMPLE2",
                "formatdata":[
                    {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            }
        ]
    }
]
}
{"_id":{"chr":"20","pos":"14373","ref":"C","alt":"A"},
"studies":[
    {
        "study_id":"Study3",
        "samples":[
            {
                "sample_id":"SAMPLE3",
                "formatdata":[
                    {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            },
            {
                "sample_id":"SAMPLE7",
                "formatdata":[
                    {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            }
        ]
    }
]
}

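I want to know how many subdocuments contain GT: "1|0". In this case that would be 1 in the first document, 2 in the second and 0 in the third. I have tried $unwind and the aggregation functions, but I am clearly not doing it right. When I try to count the subdocuments by the "GT" field, mongo complains: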
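To pin down the expected result before touching the aggregation framework, here is a plain JavaScript sketch (no MongoDB involved) that walks the three sample documents and counts the `formatdata` entries whose GT is exactly "1|0", per document. The documents are trimmed copies of the ones above; `makeDoc` and `countPerDocument` are hypothetical names used only for illustration.

```javascript
// Trimmed copies of the three sample documents: only _id and the GT values are kept.
const makeDoc = (pos, gts) => ({
  _id: { chr: "20", pos },
  studies: [{ samples: gts.map(gt => ({ formatdata: [{ GT: gt }] })) }],
});
const docs = [
  makeDoc("14371", ["1|0", "0|0"]),
  makeDoc("14372", ["1|0", "1|0"]),
  makeDoc("14373", ["0|0", "0|0"]),
];

// Hypothetical helper: count formatdata entries whose GT equals `target`, separately for each document.
function countPerDocument(documents, target) {
  return documents.map(doc => {
    let count = 0;
    for (const study of doc.studies) {
      for (const sample of study.samples) {
        for (const entry of sample.formatdata) {
          if (entry.GT === target) count += 1;
        }
      }
    }
    return { pos: doc._id.pos, count };
  });
}

console.log(countPerDocument(docs, "1|0"));
// counts 1, 2 and 0 for positions 14371, 14372 and 14373
```

Three nested loops mirror the three array levels (`studies`, `samples`, `formatdata`), which is also why the aggregation answer needs one `$unwind` per level.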

db.collection.aggregate([{$group: {"$studies.samples.formatdata.GT":1,_id:0}}])

because field names in my $group cannot contain ".". But if I leave them out:

db.collection.aggregate([{$group: {"$GT":1,_id:0}}])

it complains that "$GT cannot be an operator name".

Any ideas?

1 Answer

When working with arrays you need to process them with $unwind, and here you need to do it three times, once per array level:

db.collection.aggregate([

     // Unwind each array level so the nested GT field can be reached
     { "$unwind": "$studies" },
     { "$unwind": "$studies.samples" },
     { "$unwind": "$studies.samples.formatdata" },

     // Group results to obtain the matched count per GT value
     { "$group": {
         "_id": "$studies.samples.formatdata.GT",
         "count": { "$sum": 1 }
     }}
 ])
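The pipeline above can be imitated in plain JavaScript to see what each stage produces on the sample data. This is a simplified sketch, not MongoDB itself: `unwind` and `groupCount` are hypothetical stand-ins for `$unwind` and `$group`, and the documents are trimmed to their `_id` and GT values.

```javascript
// Trimmed copies of the question's three documents (only _id and GT values kept).
const makeDoc = (pos, gts) => ({
  _id: { chr: "20", pos },
  studies: [{ samples: gts.map(gt => ({ formatdata: [{ GT: gt }] })) }],
});
const docs = [
  makeDoc("14371", ["1|0", "0|0"]),
  makeDoc("14372", ["1|0", "1|0"]),
  makeDoc("14373", ["0|0", "0|0"]),
];

// Read the value at a dotted path such as "studies.samples" (no array traversal).
const getPath = (obj, keys) => keys.reduce((o, k) => (o == null ? undefined : o[k]), obj);

// Stand-in for { $unwind: "$<path>" }: one copy of the document per array element.
// Simplification: non-array values are dropped here, whereas MongoDB treats a
// non-null scalar as a one-element array.
function unwind(documents, path) {
  const keys = path.split(".");
  const out = [];
  for (const doc of documents) {
    const arr = getPath(doc, keys);
    if (!Array.isArray(arr)) continue;
    for (const item of arr) {
      const copy = JSON.parse(JSON.stringify(doc)); // deep copy of the document
      getPath(copy, keys.slice(0, -1))[keys[keys.length - 1]] = item; // array -> single element
      out.push(copy);
    }
  }
  return out;
}

// Stand-in for { $group: { _id: "$<path>", count: { $sum: 1 } } }.
function groupCount(documents, path) {
  const keys = path.split(".");
  const counts = {};
  for (const doc of documents) {
    const key = getPath(doc, keys);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

let stream = unwind(docs, "studies");
stream = unwind(stream, "studies.samples");
stream = unwind(stream, "studies.samples.formatdata");
console.log(groupCount(stream, "studies.samples.formatdata.GT"));
// 3 entries for "1|0" and 3 for "0|0", summed over the whole collection
```

Because the `$group` key is only the GT value, this first pipeline totals matches across the whole collection rather than per document; putting `$_id` into the group key, as the second pipeline does, is what yields per-document counts.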

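Ideally you want to filter the input. You can do this with $match both before and after the $unwind stages, using $regex to match documents where the GT value starts with "1".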
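Note that `/^1/` is a prefix match, not an exact match. On the sample data the two are equivalent, but the sketch below shows the difference when other genotypes beginning with "1" are present ("1|1" and "1/0" are hypothetical values, not taken from the question). If only "1|0" should count, the same `$match` stages could use an equality condition such as `{ "studies.samples.formatdata.GT": "1|0" }` instead.

```javascript
// "1|0" and "0|0" appear in the question; "1|1" and "1/0" are hypothetical extra genotype values.
const gtValues = ["1|0", "0|0", "1|1", "1/0"];

// What the answer's /^1/ filter keeps: any value whose first character is "1".
const prefixMatches = gtValues.filter(v => /^1/.test(v));

// What an exact equality condition on "1|0" would keep.
const exactMatches = gtValues.filter(v => v === "1|0");

console.log(prefixMatches); // keeps "1|0", "1|1" and "1/0"
console.log(exactMatches);  // keeps only "1|0"
```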

db.collection.aggregate([

     // Match first to exclude documents where this is not present in any array member
     { "$match": { "studies.samples.formatdata.GT": /^1/ } },

     // Unwind each array level so the nested GT field can be filtered
     { "$unwind": "$studies" },
     { "$unwind": "$studies.samples" },
     { "$unwind": "$studies.samples.formatdata" },

     // Match again to keep only the matching unwound entries
     { "$match": { "studies.samples.formatdata.GT": /^1/ } },

     // Group by document and GT value to obtain the matched count per document
     { "$group": {
         "_id": {
              "_id": "$_id",
              "key": "$studies.samples.formatdata.GT"
         },
         "count": { "$sum": 1 }
     }}
 ])
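Running the same stages by hand on the sample data shows what this pipeline returns. The sketch below is a simplified plain-JavaScript imitation, not MongoDB: `match`, `unwind`, `collect` and `groupByDocAndKey` are hypothetical stand-ins for `$match`, `$unwind` and `$group`, and the documents are trimmed copies of the question's. One consequence worth noticing: a document with no matching entries (the third one) is filtered out before `$group`, so its count of 0 shows up as a missing result rather than as a zero.

```javascript
// Trimmed copies of the question's three documents (only _id and GT values kept).
const makeDoc = (pos, gts) => ({
  _id: { chr: "20", pos },
  studies: [{ samples: gts.map(gt => ({ formatdata: [{ GT: gt }] })) }],
});
const docs = [
  makeDoc("14371", ["1|0", "0|0"]),
  makeDoc("14372", ["1|0", "1|0"]),
  makeDoc("14373", ["0|0", "0|0"]),
];

const getPath = (obj, keys) => keys.reduce((o, k) => (o == null ? undefined : o[k]), obj);

// Stand-in for $unwind (simplified: non-array values are dropped).
function unwind(documents, path) {
  const keys = path.split(".");
  const out = [];
  for (const doc of documents) {
    const arr = getPath(doc, keys);
    if (!Array.isArray(arr)) continue;
    for (const item of arr) {
      const copy = JSON.parse(JSON.stringify(doc));
      getPath(copy, keys.slice(0, -1))[keys[keys.length - 1]] = item;
      out.push(copy);
    }
  }
  return out;
}

// Every value reachable at a dotted path, descending into arrays roughly like query dot notation.
function collect(value, keys) {
  if (Array.isArray(value)) return value.flatMap(v => collect(v, keys));
  if (keys.length === 0) return [value];
  if (value === null || typeof value !== "object") return [];
  return collect(value[keys[0]], keys.slice(1));
}

// Stand-in for { $match: { "<path>": /regex/ } }: keep documents where any reachable value matches.
const match = (documents, path, re) =>
  documents.filter(doc => collect(doc, path.split(".")).some(v => typeof v === "string" && re.test(v)));

// Stand-in for $group on { _id: "$_id", key: "$<path>" } with count: { $sum: 1 }.
// Results come out in insertion order here; MongoDB does not guarantee $group output order.
function groupByDocAndKey(documents, path) {
  const groups = new Map();
  for (const doc of documents) {
    const id = { _id: doc._id, key: getPath(doc, path.split(".")) };
    const mapKey = JSON.stringify(id);
    if (!groups.has(mapKey)) groups.set(mapKey, { _id: id, count: 0 });
    groups.get(mapKey).count += 1;
  }
  return [...groups.values()];
}

const GT = "studies.samples.formatdata.GT";
let stream = match(docs, GT, /^1/);
stream = unwind(stream, "studies");
stream = unwind(stream, "studies.samples");
stream = unwind(stream, "studies.samples.formatdata");
stream = match(stream, GT, /^1/);
for (const g of groupByDocAndKey(stream, GT)) {
  console.log(g._id._id.pos, g._id.key, g.count);
}
// Position 14373 is never printed: it has no matching entries, so it never reaches $group.
```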

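Note that in all cases the "$"-prefixed entries are "variables" that point to properties of the document. They are used on the right-hand side, where they supply the input's "values". The left-hand "key" must be given as a plain string key; no variable can be used to name a key. That is why both of your $group attempts fail: "$studies.samples.formatdata.GT" and "$GT" were used as output key names.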
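The key/value rule can be made concrete by building a pipeline as ordinary JavaScript objects. The sketch below is an assumption-laden variation, not the answer's exact pipeline: `buildCountPipeline` swaps the `/^1/` regex for an exact GT value, and `invalidGroupKeys` is a hypothetical checker that flags `$group` output names starting with "$" or containing "."; its results are this sketch's own, not MongoDB's error messages.

```javascript
// Hypothetical helper: the per-document counting pipeline, with an exact GT value instead of /^1/.
function buildCountPipeline(gtValue) {
  const gtPath = "studies.samples.formatdata.GT";
  return [
    // In $match the key is a plain field path (no "$"); dotted query keys are allowed here.
    { $match: { [gtPath]: gtValue } },
    { $unwind: "$studies" },
    { $unwind: "$studies.samples" },
    { $unwind: "$studies.samples.formatdata" },
    { $match: { [gtPath]: gtValue } },
    { $group: {
        // Right-hand "$"-prefixed strings read values from the current document.
        _id: { _id: "$_id", key: "$" + gtPath },
        // Left-hand "count" is a plain output name; its value is an accumulator.
        count: { $sum: 1 },
    } },
  ];
}

// Hypothetical checker: $group output names (other than _id) that start with "$" or contain ".".
function invalidGroupKeys(groupStage) {
  return Object.keys(groupStage).filter(k => k !== "_id" && (k.startsWith("$") || k.includes(".")));
}

const pipeline = buildCountPipeline("1|0");
console.log(invalidGroupKeys(pipeline[5].$group));                              // nothing flagged
console.log(invalidGroupKeys({ "$studies.samples.formatdata.GT": 1, _id: 0 })); // first attempt in the question
console.log(invalidGroupKeys({ "$GT": 1, _id: 0 }));                            // second attempt in the question
```

The resulting `pipeline` array has the same shape as the one written inline above, so it could be passed to `db.collection.aggregate(pipeline)` in the shell.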

Answered 2024-05-02T05:02:43+08:00
