Can anyone offer insight into how to troubleshoot the sporadic slowness I'm experiencing with Mongo, or point out anything obviously wrong based on my slow-operation log?

I'm running Mongo v3.2.6 with the WiredTiger engine on an m4.large EC2 instance (8 GB RAM, 2 virtual cores) running Ubuntu Server 14.04. A Java process installed on the same server uses many threads to read from and write to the mongo process. There is no replica set (I know there should be), and there are no memory problems that I can see; however, I keep hitting occasional slow queries that I have been unable to diagnose.

Below is the full log from 2016-06-21, which shows some very long operations (10+ seconds) for queries that normally complete in a few milliseconds.

2016-06-21T01:27:44.806Z I COMMAND  [conn156] update company.token query: { _id: "1111" } update: { _id: "1111", _class: "company.domain.Token", userUUID: "5555", expires: 1466476050734 } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keyUpdates:0 writeConflicts:0 numYields:1 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } 14071ms
2016-06-21T01:27:44.806Z I COMMAND  [conn151] update company.token query: { _id: "2222" } update: { _id: "2222", _class: "company.domain.Token", userUUID: "6666", expires: 1466476050414 } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keyUpdates:0 writeConflicts:0 numYields:1 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } 14391ms
2016-06-21T01:27:44.811Z I COMMAND  [conn150] insert company.count ninserted:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } 14074ms
2016-06-21T01:39:45.807Z I COMMAND  [conn151] update company.token query: { _id: "3333" } update: { _id: "3333", _class: "company.domain.Token", userUUID: "7777", expires: 1466476776277 } keysExamined:0 docsExamined:0 nMatched:1 nModified:1 upsert:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } 9529ms
2016-06-21T10:34:14.596Z I COMMAND  [conn156] command company.count command: aggregate { aggregate: "count", pipeline: [ { $match: { lId: "4444", $and: [ { iTS: { $gte: 1466475578989 } }, { iTS: { $lte: 1466505254467 } } ] } }, { $group: { _id: { rId: "$rId", sType: "$sType" }, count: { $sum: "$rVal" } } } ] } keyUpdates:0 writeConflicts:0 numYields:92 reslen:162 locks:{ Global: { acquireCount: { r: 190 } }, Database: { acquireCount: { r: 95 } }, Collection: { acquireCount: { r: 95 } } } protocol:op_query 113ms
2016-06-21T10:39:17.113Z I COMMAND  [conn146] command company.count command: aggregate { aggregate: "count", pipeline: [ { $match: { lId: "4444", $and: [ { iTS: { $gte: 1466475589300 } }, { iTS: { $lte: 1466505556971 } } ] } }, { $group: { _id: { rId: "$rId", sType: "$sType" }, count: { $sum: "$rVal" } } } ] } keyUpdates:0 writeConflicts:0 numYields:93 reslen:162 locks:{ Global: { acquireCount: { r: 192 } }, Database: { acquireCount: { r: 96 } }, Collection: { acquireCount: { r: 96 } } } protocol:op_query 141ms
2016-06-21T10:41:16.871Z I COMMAND  [conn156] command company.count command: aggregate { aggregate: "count", pipeline: [ { $match: { lId: "4444", $and: [ { iTS: { $gte: 1466475578989 } }, { iTS: { $lte: 1466505676744 } } ] } }, { $group: { _id: { rId: "$rId", sType: "$sType" }, count: { $sum: "$rVal" } } } ] } keyUpdates:0 writeConflicts:0 numYields:94 reslen:162 locks:{ Global: { acquireCount: { r: 194 } }, Database: { acquireCount: { r: 97 } }, Collection: { acquireCount: { r: 97 } } } protocol:op_query 110ms
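To spot these outliers systematically rather than by eyeballing the log, I've been using a small helper along these lines (a hypothetical sketch, not part of my deployment) that pulls the trailing duration off each mongod log line and flags anything over a threshold:

```python
import re

# mongod 3.2 terminates each slow-operation log line with its duration,
# e.g. "... protocol:op_query 113ms" or "... numYields:1 ... 14071ms".
DURATION_RE = re.compile(r"(\d+)ms$")

def slow_ops(lines, threshold_ms=100):
    """Return (duration_ms, line) pairs for operations slower than threshold_ms."""
    results = []
    for line in lines:
        m = DURATION_RE.search(line.strip())
        if m and int(m.group(1)) >= threshold_ms:
            results.append((int(m.group(1)), line.strip()))
    return results

# Abbreviated lines from the log above.
log = [
    "2016-06-21T01:27:44.806Z I COMMAND [conn156] update company.token ... 14071ms",
    "2016-06-21T10:34:14.596Z I COMMAND [conn156] command company.count ... 113ms",
]
print([d for d, _ in slow_ops(log, threshold_ms=1000)])  # [14071]
```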

The slow queries always seem to be update or insert operations. One of the collections appearing in the slow operations above is company.token, which has ~55k documents with an average size of 160 B. This collection is read very frequently (in bursts roughly every 15 seconds, at around 100 reads per second). The read queries look up documents by the primary _id key, so they hit the primary-key index.

The other collection in the log is company.count, which has about 2.7M documents with an average size of 587 B and is written to very frequently (bursts of ~100 inserts every 15 seconds). This collection also serves a steady stream of aggregation read queries (also shown in the log), a few dozen per minute. The aggregation's match clause hits a compound index and never seems to scan more documents than it returns (executionStats shown at the bottom). The insert queries normally complete very quickly, around 1-4 ms each; however, once or twice a day mongo seems to seize up and can take 20 seconds to finish normally fast operations.
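For reference, the $group stage in that pipeline just sums rVal per (rId, sType) pair over the $match results. An in-memory equivalent (the field names come from the log above; the sample values are made up) is:

```python
from collections import defaultdict

# Sample documents standing in for company.count after the $match stage.
docs = [
    {"rId": "r1", "sType": "a", "rVal": 2},
    {"rId": "r1", "sType": "a", "rVal": 3},
    {"rId": "r2", "sType": "b", "rVal": 1},
]

# Equivalent of { $group: { _id: { rId: "$rId", sType: "$sType" },
#                           count: { $sum: "$rVal" } } }
counts = defaultdict(int)
for d in docs:
    counts[(d["rId"], d["sType"])] += d["rVal"]

print(dict(counts))  # {('r1', 'a'): 5, ('r2', 'b'): 1}
```

So the reslen of 162 bytes in the log is consistent with only a handful of groups coming back despite ~12k documents being examined.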

My best guess is that the lag comes from the highly concurrent writes and reads against the two collections in the log; however, my understanding is that WiredTiger uses document-level locking (with intent-exclusive and intent-shared locks at the higher levels), which should allow high concurrency.

One last note: I sometimes see another collection, which contains only a single document, take up to 10 seconds to update, so it looks like reads and writes may be queuing up at the global or database level. Is there any way I can confirm this?
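One check I'm considering: db.serverStatus() exposes globalLock.currentQueue (operations queued behind locks) and wiredTiger.concurrentTransactions (read/write ticket usage), which seem like the right signals for this. Here's a sketch of a helper that summarizes those fields from a serverStatus document passed in as a plain dict (the sample values below are made up):

```python
def queue_pressure(server_status):
    """Summarize queued operations and WiredTiger ticket usage from a
    db.serverStatus() document (supplied here as a plain dict)."""
    queue = server_status["globalLock"]["currentQueue"]
    tickets = server_status["wiredTiger"]["concurrentTransactions"]
    return {
        "queued_readers": queue["readers"],
        "queued_writers": queue["writers"],
        # WiredTiger allows 128 concurrent read and write transactions by
        # default; "available" hitting 0 means operations are waiting for a ticket.
        "read_tickets_available": tickets["read"]["available"],
        "write_tickets_available": tickets["write"]["available"],
    }

# Sample serverStatus fragment with illustrative (fabricated) values.
status = {
    "globalLock": {"currentQueue": {"total": 7, "readers": 2, "writers": 5}},
    "wiredTiger": {"concurrentTransactions": {
        "read": {"out": 3, "available": 125, "totalTickets": 128},
        "write": {"out": 128, "available": 0, "totalTickets": 128},
    }},
}
print(queue_pressure(status))
```

Against the live server this would be fed from db.command("serverStatus") in PyMongo (or db.serverStatus() in the shell); sampling it during one of the stalls should show whether writers are queuing globally or whether write tickets are exhausted.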

{
  "executionStats":{
    "executionSuccess":true,
    "nReturned":12006,
    "executionTimeMillis":19,
    "totalKeysExamined":12006,
    "totalDocsExamined":12006,
    "executionStages":{
      "stage":"FETCH",
      "nReturned":12006,
      "executionTimeMillisEstimate":20,
      "works":12007,
      "advanced":12006,
      "needTime":0,
      "needYield":0,
      "saveState":93,
      "restoreState":93,
      "isEOF":1,
      "invalidates":0,
      "docsExamined":12006,
      "alreadyHasObj":0,
      "inputStage":{
        "stage":"IXSCAN",
        "nReturned":12006,
        "executionTimeMillisEstimate":0,
        "works":12007,
        "advanced":12006,
        "needTime":0,
        "needYield":0,
        "saveState":93,
        "restoreState":93,
        "isEOF":1,
        "invalidates":0,
        "keyPattern":{
          "lId":1,
          "iTS":1
        },
        "indexName":"lId_1_iTS_1",
        "isMultiKey":false,
        "isUnique":false,
        "isSparse":false,
        "isPartial":false,
        "indexVersion":1,
        "direction":"forward",
        "indexBounds":{
          "lId":[
            "[\"d4444\", \"4444\"]"
          ],
          "iTS":[
            "[1466475578989.0, 1466505676744.0]"
          ]
        },
        "keysExamined":12006,
        "dupsTested":0,
        "dupsDropped":0,
        "seenInvalidated":0
      }
    }
  }
}