MongoDB - aggregation performance tuning

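One of my aggregation pipelines runs slowly.

About the collection

The collection is named Document. Each document can belong to multiple campaigns and has one of five statuses, 'a' to 'e'. A small fraction of documents do not belong to any campaign; their campaigns field is set to null.

Sample document: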

{_id:id, campaigns:['c1', 'c2'], status:'a', ...other fields...}

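Some collection statistics

Number of documents: only 2 million :(
Size: 2GB
Average document size: 980 bytes
Storage size: 780MB
Total index size: 134MB
Number of indexes: 12
Fields per document: 30-40, possibly including arrays or objects

About the query

The goal of the query is to count the documents per campaign and per status, restricted to documents whose status is in ['a', 'b', 'c'].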

[
    {$match:{campaigns:{$ne:null}, status:{$in:['a','b','c']}}},
    {$unwind:'$campaigns'},
    {$group:{_id:{campaign:'$campaigns', status:'$status'}, total:{$sum:1}}}
]
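
To make concrete what this pipeline computes, here is a minimal sketch in plain JavaScript that mimics the three stages on a few in-memory documents. The sample documents and the helper name countByCampaignAndStatus are made up for illustration, and the code says nothing about how MongoDB actually executes the pipeline; it only reproduces the shape of the result.

```javascript
// Hypothetical sample documents; only the two fields the pipeline reads.
const docs = [
  { _id: 1, campaigns: ['c1', 'c2'], status: 'a' },
  { _id: 2, campaigns: ['c1'], status: 'b' },
  { _id: 3, campaigns: null, status: 'a' },   // no campaign: dropped by $match
  { _id: 4, campaigns: ['c2'], status: 'e' }, // status not in a/b/c: dropped
  { _id: 5, campaigns: ['c1'], status: 'a' },
];

// Mimics $match -> $unwind -> $group for the pipeline above.
// Simplification: campaigns is assumed to be null, missing, or an array.
function countByCampaignAndStatus(documents) {
  const wanted = new Set(['a', 'b', 'c']);
  const totals = new Map();
  for (const doc of documents) {
    // $match: campaigns is not null/missing and status is in ['a','b','c']
    if (doc.campaigns == null || !wanted.has(doc.status)) continue;
    // $unwind: one record per array element
    for (const campaign of doc.campaigns) {
      // $group: count per (campaign, status) pair
      const key = JSON.stringify([campaign, doc.status]);
      totals.set(key, (totals.get(key) || 0) + 1);
    }
  }
  // Rebuild documents shaped like the $group output.
  return [...totals].map(([key, total]) => {
    const [campaign, status] = JSON.parse(key);
    return { _id: { campaign, status }, total };
  });
}

console.log(countByCampaignAndStatus(docs));
```

For these five sample documents the sketch yields three groups: (c1, a) with total 2, (c2, a) with total 1, and (c1, b) with total 1.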

The aggregation is expected to touch almost the entire collection. Without an index, it completes in about 8 seconds.

I tried creating an index on

{campaigns:1, status:1}

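The explain plan shows that the index is scanned, but the aggregation now takes nearly 11 seconds to complete.

Question

The index contains every field the aggregation needs to produce the counts. Shouldn't the aggregation touch only the index? The index is only 10MB in size. How can it be slower? If an index is not the answer, are there any other suggestions for tuning the query?

The winning plan shows: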

{
    "stage" : "FETCH",
    "filter" : {"$not" : {"campaigns" : {"$eq" : null}}},
    "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {"campaigns" : 1.0,"status" : 1.0},
        "indexName" : "campaigns_1_status_1",
        "isMultiKey" : true,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
            "campaigns" : ["[MinKey, null)", "(null, MaxKey]"],
            "status" : [ "[\"a\", \"a\"]", "[\"b\", \"b\"]", "[\"c\", \"c\"]"]
        }
    }
}

Without the index, the winning plan is:

{
    "stage" : "COLLSCAN",
    "filter" : {
        "$and":[
            {"status": {"$in": ["a", "b", "c"]}},
            {"$not" : {"campaigns": {"$eq" : null}}}
        ]
    },
    "direction" : "forward"
}

Update

As requested by @Kevin, here are some details about all the other indexes; sizes are in MB.

"indexSizes" : {
    "_id_" : 32,
    "team_1" : 8, //Single value field of ObjectId
    "created_time_1" : 16, //Document publish time in source system.
    "parent_1" : 2, //_id of parent document. 
    "by.id_1" : 13, //_id of author from a different collection. 
    "feedids_1" : 8, //Array, _id of ETL jobs contributing to sync of this doc.
    "init_-1" : 2, //Initial load time of the doc.
    "campaigns_1" : 10, //Array, _id of campaigns
    "last_fetch_-1" : 13, //Last sync time of the doc. 
    "categories_1" : 8, //Array, _id of document categories. 
    "status_1" : 8, //Status
    "campaigns_1_status_1" : 10 //Combined index of campaign _id and status. 
},

1 Answer

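After reading the MongoDB documentation, I found this:

The inequality operator $ne is not very selective, since it often matches a large portion of the index. As a result, in many cases a $ne query with an index may perform no better than a $ne query that must scan all documents in the collection. See also Query Selectivity.

Judging from a few different articles, using the $type operator instead might solve the problem.

You can use this query: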
```
db.data.aggregate([
    {$match:{campaigns:{$type:2},status:{$in:["a","b","c"]}}},
    {$unwind:'$campaigns'},
    {$group:{_id:{campaign:'$campaigns', status:'$status'}, total:{$sum:1}}}])
```
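
Before swapping the predicate, it is worth checking that the two filters select the same documents. The sketch below uses simplified plain-JavaScript stand-ins for the two conditions, assuming campaigns is always null, missing, or an array of strings as in the sample document; the function names are made up for illustration and this is not MongoDB's own matching code.

```javascript
// Simplified stand-ins for the two $match predicates on the campaigns field.
// Assumption: campaigns is null, missing, or an array of strings.

// {campaigns: {$ne: null}}: rejects null, missing, and arrays containing null.
function matchesNeNull(value) {
  if (value === null || value === undefined) return false;
  if (Array.isArray(value) && value.includes(null)) return false;
  return true;
}

// {campaigns: {$type: 2}}: type 2 is "string"; for an array field the
// operator matches when at least one element is a string.
function matchesTypeString(value) {
  if (typeof value === 'string') return true;
  return Array.isArray(value) && value.some((e) => typeof e === 'string');
}

// Hypothetical values of the campaigns field, including the edge cases.
const samples = [['c1', 'c2'], ['c1'], null, undefined, []];
const comparison = samples.map((value) => ({
  value,
  neNull: matchesNeNull(value),
  typeString: matchesTypeString(value),
}));
console.log(comparison);
```

Under these assumptions the two predicates disagree only on an empty array, which $ne: null accepts and $type: 2 rejects. Since $unwind by default emits nothing for an empty array, the grouped counts should come out the same either way.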
Answered on 2024-05-12T17:09:06+08:00
