
Cassandra terminates with an OutOfMemory (OOM) error


We have a 3-node Cassandra cluster on AWS. The nodes run Cassandra 1.2.2 and have 8 GB of memory. We have not changed any of the default heap or GC settings, so each node allocates a 1.8 GB heap. The rows are wide; each row stores about 260,000 columns. We read the data using Astyanax. If our application attempts to read 80,000 columns from each of 10 or more rows at the same time, some of the nodes run out of heap space and terminate with an OOM error. Here is the error message:

java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
        at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
        at org.apache.cassandra.db.Table.getRow(Table.java:355)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main] java.lang.OutOfMemoryError: Java heap space
        at java.lang.Long.toString(Long.java:269)
        at java.lang.Long.toString(Long.java:764)
        at org.apache.cassandra.dht.Murmur3Partitioner$1.toString(Murmur3Partitioner.java:171)
        at org.apache.cassandra.service.StorageService.describeRing(StorageService.java:1068)
        at org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:1192)
        at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3766)
        at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3754)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

ERROR 02:14:05,350 Exception in thread Thread[ACCEPT-/10.0.0.170,5,main] java.lang.RuntimeException: java.nio.channels.ClosedChannelException
        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:893)
Caused by: java.nio.channels.ClosedChannelException
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:211)
        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:99)
        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:882)

Each column holds less than 50 bytes of data. Even after adding all the per-column overhead (column-name metadata), it should not exceed 100 bytes per column. So reading 80,000 columns from each of 10 rows means we are reading 80,000 * 10 * 100 = 80 MB of data. That is large, but nowhere near large enough to fill a 1.8 GB heap, so I wonder why the heap fills up. If a request is too large to serve in a reasonable time, I would expect Cassandra to return a TimedOutException rather than terminate.
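The back-of-the-envelope estimate above can be checked directly; note that the 100-byte per-column figure is the question's own assumption:

```java
public class ReadSizeEstimate {
    public static void main(String[] args) {
        int rows = 10;                // rows read at the same time
        int columnsPerRow = 80_000;   // columns requested per row
        int bytesPerColumn = 100;     // ~50 B data + column-name metadata (assumed)

        long totalBytes = (long) rows * columnsPerRow * bytesPerColumn;
        System.out.println(totalBytes / 1_000_000 + " MB"); // 80 MB
    }
}
```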

A simple solution would be to increase the heap size, but that would only mask the problem. Reading 80 MB of data should not exhaust a 1.8 GB heap.

Are there any other Cassandra settings I can tweak to prevent the OOM exception?
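One way to avoid materializing 80,000 columns per row at once is to read the slice in bounded pages, resuming each page after the last column returned. A self-contained sketch of the paging loop, with a `TreeMap` standing in for one wide row (Astyanax exposes the same pattern through its column-range pagination; the names and page size here are illustrative):

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

public class PagedSliceRead {
    // Stand-in for one wide row: column name -> column value.
    static NavigableMap<Long, byte[]> row = new TreeMap<>();

    // Return up to pageSize columns, starting strictly after 'from'
    // (or from the beginning when 'from' is null).
    static SortedMap<Long, byte[]> readPage(Long from, int pageSize) {
        NavigableMap<Long, byte[]> tail =
            (from == null) ? row : row.tailMap(from, false);
        SortedMap<Long, byte[]> page = new TreeMap<>();
        for (Map.Entry<Long, byte[]> e : tail.entrySet()) {
            if (page.size() == pageSize) break;
            page.put(e.getKey(), e.getValue());
        }
        return page;
    }

    public static void main(String[] args) {
        for (long c = 0; c < 260_000; c++) row.put(c, new byte[1]);

        long seen = 0;
        Long cursor = null;
        SortedMap<Long, byte[]> page;
        while (!(page = readPage(cursor, 5_000)).isEmpty()) {
            seen += page.size();     // process one bounded page at a time
            cursor = page.lastKey(); // resume after the last column read
        }
        System.out.println(seen);    // 260000
    }
}
```

With a page size of 5,000, only one page per row is held in memory at a time instead of the full 80,000-column slice, which bounds heap usage on both client and server.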

1 Answer


No, there were no write operations in progress while I was reading the data. I am sure increasing the heap space would help, but I am trying to understand why reading 80 MB of data fills a 1.8 GB heap.

Cassandra uses both on-heap and off-heap caching. First, loading 80 MB of user data can easily result in 200-400 MB of Java heap usage (which VM? 64-bit?). Second, that memory is added on top of what is already being used for caching; Cassandra does not appear to release cache memory to serve an individual query, since keeping it can improve throughput.
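The 200-400 MB figure is plausible from per-object overhead alone: on the heap, each column becomes several small objects (name and value buffers, the column object itself), each carrying headers and references, so the payload bytes are a fraction of the footprint. A rough estimate, where the per-column overhead is an assumed ballpark, not a measured value:

```java
public class HeapExpansionEstimate {
    public static void main(String[] args) {
        long columns = 10L * 80_000; // columns materialized at once
        long payload = 100;          // user data + metadata per column (assumed)

        // Assumed overhead on a 64-bit JVM: roughly 4 small objects per
        // column (name/value buffers, backing arrays, the column object),
        // at ~48 bytes of headers and references each.
        long overheadPerColumn = 4 * 48;

        long userData = columns * payload;
        long heapUsed = columns * (payload + overheadPerColumn);
        System.out.println(userData / 1_000_000 + " MB of user data"); // 80 MB
        System.out.println("~" + heapUsed / 1_000_000 + " MB on heap"); // ~233 MB
    }
}
```

Under these assumptions 80 MB of user data already occupies roughly three times as much heap, before any caching or concurrent requests are counted.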

In the meantime, did you manage to solve your problem by increasing MaxHeap?
