
De-duplicating the results returned by a Mongo aggregation query


Some background:

Three collections are involved:

  • posts

  • postsubcategories

  • postsupercategories

An example document from posts:

{
    "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"),
    "__v" : 6,
    "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"),
    "postSubCategories" : [ 
        ObjectId("5806344baa0bbf284a2316e4")//reference to document in postsubcategories collection
    ],
    "postSuperCategories" : [ 
        ObjectId("580679958a5f5f448ba5aae9"), 
        ObjectId("580679958a5f5f448ba5aaf2")//references to documents in postsupercategories collection
    ],
    "publishedDate" : ISODate("2016-10-10T04:00:00.000Z"),
    "state" : "published",
    "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"),
    "title" : "My title",
    "topics" : []
}

My query is:

db.posts.aggregate([
{'$unwind': 
    {'path':"$postSubCategories"}
},
{'$lookup': {
  'from':"postsubcategories",
  'localField': "postSubCategories",
  'foreignField': "_id",
  'as': "subObject"
}},
{'$unwind': 
    {'path':"$postSuperCategories"}
},
{'$lookup': {
  'from':"postsupercategories",
  'localField': "postSuperCategories",
  'foreignField': "_id",
  'as': "superObject"
}},
{'$match': {
    '$or':
        [{ "subObject.searchKeywords": "home monitor" }, 
        { "superObject.searchKeywords": "home monitor" }]
    }
},
{'$match': {
    "state": "published"
}}
])

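The postsubcategories and postsupercategories collections both contain a field named searchKeywords, which holds an array of text in each of their documents. I want to be able to query these searchKeywords fields and return the matching post documents. I need a de-duplicated set of the returned _ids.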
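For reference, the $or condition in the query works on arrays at two levels: subObject (the $lookup output) is an array of documents, and each of those documents' searchKeywords is itself an array. The plain-JavaScript sketch below imitates, under simplifying assumptions, how MongoDB resolves a dotted-path equality such as "subObject.searchKeywords": "home monitor": it walks into arrays along the path and matches if any reachable value equals the target. It is an in-memory illustration only (scalar targets, no numeric index segments), and the function names and keyword values are hypothetical.

```javascript
// In-memory illustration only (not MongoDB's matcher): how an equality
// filter on a dotted path such as "subObject.searchKeywords" treats arrays.
// Simplified: scalar targets only, no numeric array-index path segments.

// Collect every value reachable at the dotted path, walking into arrays.
function valuesAtPath(value, parts) {
  if (parts.length === 0) {
    // At the end of the path, an array contributes each of its elements.
    return Array.isArray(value) ? value : [value];
  }
  if (Array.isArray(value)) {
    // An array in the middle of the path: continue into every element.
    return value.flatMap((element) => valuesAtPath(element, parts));
  }
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// { path: target } matches if any reachable value equals target.
function matchesEquality(doc, path, target) {
  return valuesAtPath(doc, path.split(".")).some((v) => v === target);
}

// Mirrors the $or in the question's first $match stage.
function matchesKeyword(doc, keyword) {
  return (
    matchesEquality(doc, "subObject.searchKeywords", keyword) ||
    matchesEquality(doc, "superObject.searchKeywords", keyword)
  );
}

// Hypothetical row after the $lookup stages (keyword values are made up).
const exampleRow = {
  subObject: [{ searchKeywords: ["home monitor", "camera"] }],
  superObject: [{ searchKeywords: ["security"] }],
};
console.log(matchesKeyword(exampleRow, "home monitor")); // true
```

Because the match is "any element equals", a row passes as soon as one looked-up category lists the keyword, which is also why every unwound copy of a post can pass the $match.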

This query returns four documents. Example:

ObjectId("57fbf3ce7ccbc906ed87cef6")
ObjectId("57fbf3ce7ccbc906ed87cef6")
ObjectId("57fbf40b7ccbc906ed87cef7")
ObjectId("57fbf40b7ccbc906ed87cef7")

I understand why it returns 4. One document contains the postSubCategories object 5806344baa0bbf284a2316e4 and the postSuperCategories id 580679958a5f5f448ba5aae9.

The second document contains the postSubCategories object 5806344baa0bbf284a2316e4 and the postSuperCategories id 580679958a5f5f448ba5aaf2. The same thing happens again for the second post.
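The duplication comes from the $unwind stages; the $lookup stages add a field to each document but never change how many documents flow through the pipeline. The sketch below is a plain-JavaScript imitation of $unwind with its default options (not MongoDB code; the unwind function name is hypothetical), applied to the example post with its ObjectIds written as plain strings. Unwinding postSuperCategories, which holds two ids, turns the one post into two rows that share the same _id.

```javascript
// In-memory illustration of { $unwind: { path: "$field" } } with default options
// (not MongoDB code): one output document per array element, with the array
// replaced by that element.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    // Missing or null fields produce no output (preserveNullAndEmptyArrays: false).
    if (value === undefined || value === null) return [];
    // A non-array value is treated as a one-element array.
    if (!Array.isArray(value)) return [doc];
    // An empty array also produces no output, since map() returns [].
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

// The example post from the question, ObjectIds written as strings.
const post = {
  _id: "57fbf3ce7ccbc906ed87cef6",
  postSubCategories: ["5806344baa0bbf284a2316e4"],
  postSuperCategories: ["580679958a5f5f448ba5aae9", "580679958a5f5f448ba5aaf2"],
};

// Same order as the pipeline: unwind sub-categories, then super-categories.
const rows = unwind(unwind([post], "postSubCategories"), "postSuperCategories");
for (const row of rows) {
  console.log(row._id, row.postSuperCategories);
}
```

One post with 1 sub-category and 2 super-categories yields 1 × 2 = 2 rows; in general the row count per post is the product of the two array lengths.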

Is there a way I can "de-duplicate" based on the returned _id?

My desired end result would be:

ObjectId("57fbf3ce7ccbc906ed87cef6")
ObjectId("57fbf40b7ccbc906ed87cef7")

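I know that, technically, the two matching _ids in the list of 4 are not exactly the same documents, because each one contains a different postSuperCategories object. At this point, though, I no longer care about that field; I just need a single post document, because I need access to its other fields.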

Any help would be greatly appreciated. I've tried looking into $group, $addToSet and $setUnion, without success so far.

1 Answer


    You can add a $group stage on _id to get the distinct _id values, keeping, for each property you want to extract, the first value found for that _id.

    To keep the whole root document, use this $group stage:

    {
        '$group': {
            _id: '$_id',
            item: { $first: "$$ROOT" } 
        }
    }
    

    This gives you, in the item field, the first root document found for each _id:

    { "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "item" : { "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "__v" : 6, "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "postSubCategories" : ObjectId("5806344baa0bbf284a2316e4"), "postSuperCategories" : ObjectId("580679958a5f5f448ba5aae9"), "publishedDate" : ISODate("2016-12-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef4"), "title" : "My title2", "topics" : [ "a", "b" ], "subObject" : [ { "_id" : ObjectId("5806344baa0bbf284a2316e4"), "searchKeywords" : "home monitor" } ], "superObject" : [ { "_id" : ObjectId("580679958a5f5f448ba5aae9"), "searchKeywords" : "home monitor2" } ] } }
    { "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "item" : { "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "__v" : 6, "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "postSubCategories" : ObjectId("5806344baa0bbf284a2316e4"), "postSuperCategories" : ObjectId("580679958a5f5f448ba5aae9"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ], "subObject" : [ { "_id" : ObjectId("5806344baa0bbf284a2316e4"), "searchKeywords" : "home monitor" } ], "superObject" : [ { "_id" : ObjectId("580679958a5f5f448ba5aae9"), "searchKeywords" : "home monitor2" } ] } }
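To see why a single $group stage is enough, here is a plain-JavaScript imitation of { $group: { _id: "$_id", item: { $first: "$$ROOT" } } }. It is an illustration only; groupFirstById and the four sample rows are hypothetical. One entry is kept per distinct _id, holding the first whole row seen for it. Since $first depends on input order, a $sort before the $group may be worth adding if it matters which duplicate row is kept; note also that MongoDB does not guarantee the order of the groups it returns, unlike the Map used here.

```javascript
// In-memory illustration of
//   { $group: { _id: "$_id", item: { $first: "$$ROOT" } } }
// (not MongoDB code). _id values are plain strings here so that Map
// lookups compare them by value.
function groupFirstById(rows) {
  const groups = new Map(); // one entry per distinct _id
  for (const row of rows) {
    if (!groups.has(row._id)) {
      // Only the first row seen for this _id is kept, like $first.
      groups.set(row._id, { _id: row._id, item: row });
    }
  }
  return [...groups.values()];
}

// Four hypothetical rows shaped like the question's results: two per post.
const unwoundRows = [
  { _id: "57fbf3ce7ccbc906ed87cef6", row: 1 },
  { _id: "57fbf3ce7ccbc906ed87cef6", row: 2 },
  { _id: "57fbf40b7ccbc906ed87cef7", row: 3 },
  { _id: "57fbf40b7ccbc906ed87cef7", row: 4 },
];

const deduped = groupFirstById(unwoundRows);
console.log(deduped.map((group) => group._id));
```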
    

    Otherwise, to pick individual fields for the response:

    {
        '$group': {
            _id: '$_id',
            author: {
                $first: '$author'
            },
            publishedDate: {
                $first: '$publishedDate'
            },
            state: {
                $first: '$state'
            },
            templateName: {
                $first: '$templateName'
            },
            title: {
                $first: '$title'
            },
            topics: {
                $first: '$topics'
            }
        }
    }
    

    You'll get something like:

    { "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "publishedDate" : ISODate("2016-12-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef4"), "title" : "My title2", "topics" : [ "a", "b" ] }
    { "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ] }
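Listing every field by hand gets repetitive. Since pipeline stages are plain JavaScript objects in the mongo shell, the same field-by-field $group stage can be generated from a list of field names. The sketch below assumes that approach; firstPerIdGroup is a hypothetical helper, not part of MongoDB or any driver.

```javascript
// Hypothetical helper (not part of MongoDB or any driver): builds a $group
// stage that keeps one document per _id and, for each listed field, the
// first value seen in that group.
function firstPerIdGroup(fields) {
  const group = { _id: "$_id" };
  for (const field of fields) {
    group[field] = { $first: "$" + field }; // e.g. title: { $first: "$title" }
  }
  return { $group: group };
}

// Same fields as the hand-written $group stage in the answer.
const stage = firstPerIdGroup([
  "author",
  "publishedDate",
  "state",
  "templateName",
  "title",
  "topics",
]);
console.log(JSON.stringify(stage, null, 2));
```

The resulting object would go at the end of the array passed to db.posts.aggregate(...), after the two $match stages, exactly where the hand-written stage would sit.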
    
