Cassandra DC2 nodes go down after increasing write requests on DC1 nodes

We are running Cassandra 2.1.2 in a multi-datacenter cluster (30 servers in DC1, 10 servers in DC2), with a keyspace replication factor of 1 on DC1 and 2 on DC2.
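For reference, a keyspace with those per-datacenter replication factors is typically defined like this (the keyspace name `myks` is an assumption; the datacenter names must match the ones your snitch reports, e.g. in `nodetool status`):

```sql
-- Hypothetical keyspace matching the replication described above.
-- 'myks' is a placeholder name; 'DC1'/'DC2' must match the snitch's DC names.
CREATE KEYSPACE myks
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 1,
    'DC2': 2
  };
```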

For some reason, when we increase the volume of write requests on DC1 (using consistency level ONE or LOCAL_ONE), the Cassandra Java process on the DC2 nodes dies at random.

When the DC2 nodes start going down, the load average is around 3-5 on the DC1 nodes and around 7-10 on the DC2 nodes, so nothing dramatic.

Looking at Cassandra's system.log, we found some exceptions:

ERROR [SharedPool-Worker-43] 2014-11-15 00:39:48,596 JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
ERROR [CompactionExecutor:8] 2014-11-15 00:39:48,596 CassandraDaemon.java:153 - Exception in thread Thread[CompactionExecutor:8,1,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [Thrift-Selector_2] 2014-11-15 00:39:48,596 Message.java:238 - Got an IOException during write!
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_25]
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_25]
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_25]
        at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_25]
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470) ~[na:1.8.0_25]
        at org.apache.thrift.transport.TNonblockingSocket.write(TNonblockingSocket.java:164) ~[libthrift-0.9.1.jar:0.9.1]
        at com.thinkaurelius.thrift.util.mem.Buffer.writeTo(Buffer.java:104) ~[thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.util.mem.FastMemoryOutputTransport.streamTo(FastMemoryOutputTransport.java:112) ~[thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.Message.write(Message.java:222) ~[thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.handleWrite(TDisruptorServer.java:598) [thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.processKey(TDisruptorServer.java:569) [thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.select(TDisruptorServer.java:423) [thrift-server-0.3.7.jar:na]
        at com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.run(TDisruptorServer.java:383) [thrift-server-0.3.7.jar:na]
ERROR [Thread-94] 2014-11-15 00:39:48,597 CassandraDaemon.java:153 - Exception in thread Thread[Thread-94,5,main]
java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107) ~[na:1.8.0_25]
        at org.apache.cassandra.db.composites.AbstractCType.sliceBytes(AbstractCType.java:369) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:101) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:397) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:381) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:117) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:109) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:106) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:110) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:168) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:150) ~[apache-cassandra-2.1.2.jar:2.1.2]
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:82) ~[apache-cassandra-2.1.2.jar:2.1.2]

Memory:

  • DC1 servers have 32 GB of RAM, with the heap configured at 8 GB.

  • DC2 servers have 16 GB of RAM, with the heap also configured at 8 GB.
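As a side note, Cassandra's own `cassandra-env.sh` sizes the heap automatically when no explicit value is set. A sketch of that default formula (as I understand it for the 2.1 line; verify against your own `cassandra-env.sh`):

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Approximation of the default heap sizing in cassandra-env.sh:
    max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))."""
    return max(min(system_memory_mb // 2, 1024),
               min(system_memory_mb // 4, 8192))

print(default_max_heap_mb(32 * 1024))  # DC1 servers, 32 GB RAM -> 8192
print(default_max_heap_mb(16 * 1024))  # DC2 servers, 16 GB RAM -> 4096
```

If that formula holds, an explicit 8 GB heap on the 16 GB DC2 boxes is twice what the default sizing would pick, leaving only half the machine for the page cache and off-heap structures; the smaller machines are running much closer to their limits than the DC1 ones.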

Any hints?

Thanks in advance.

1 Answer

When you specify a consistency level of LOCAL_ONE, you are telling Cassandra to consider the write request complete as soon as one of the local replicas has received the update. However, the request is still sent to all replicas; the nodes in the other DC receive it at the same time. Because of network latency, the actual work for a request most likely finishes shortly after the write has already reported success, which I would guess is the source of the "random" timing of the other DC dying. In essence, one or more nodes in that cluster are overloaded.

TL;DR: for writes, LOCAL_ONE is essentially the same as ONE. LOCAL_ONE only makes a significant difference for reads, where only the local DC is queried (avoiding the network cost). The cluster described above is hitting its throughput ceiling in DC2.
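The point above, that LOCAL_ONE changes only which acknowledgements unblock the client, not where the mutation is sent, can be sketched as follows (a simplified model, not Cassandra's actual implementation; node and DC names are made up):

```python
# Simplified model of coordinator write behavior: the mutation goes to
# every replica regardless of consistency level; the level only decides
# which acks can complete the client request.
def write_targets(consistency, replicas, local_dc):
    targets = list(replicas)  # every replica receives the mutation
    if consistency == "ONE":
        eligible = targets    # any single ack completes the write
    elif consistency == "LOCAL_ONE":
        eligible = [r for r in targets if r["dc"] == local_dc]
    else:
        raise ValueError(f"unsupported consistency: {consistency}")
    return targets, eligible

replicas = [{"node": "dc1-n1", "dc": "DC1"},
            {"node": "dc2-n1", "dc": "DC2"},
            {"node": "dc2-n2", "dc": "DC2"}]

sent, eligible = write_targets("LOCAL_ONE", replicas, "DC1")
print(len(sent))      # 3: the DC2 replicas still receive the write
print(len(eligible))  # 1: only the DC1 replica can satisfy LOCAL_ONE
```

So the DC2 replicas do the same mutation work under LOCAL_ONE as under ONE; the client just stops waiting for them.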
