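We are running out of memory on a standalone Spark cluster that runs multiple jobs. While investigating, we found these messages and began to suspect that too little memory is free: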
16/09/23 12:30:38 INFO MemoryStore: Block broadcast_50802_piece0 stored as bytes in memory (estimated size 5.1 KB, free 233.5 KB)
16/09/23 12:30:38 INFO TorrentBroadcast: Reading broadcast variable 50802 took 9 ms
16/09/23 12:30:38 INFO MemoryStore: Block broadcast_50802 stored as values in memory (estimated size 11.3 KB, free 244.9 KB)
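On another cluster we usually see the free value reported as around 500 MB, and many log traces on Stack Overflow show free memory in the GB range.
After analyzing the code, this message appears to be misleading. The value reported as free memory is actually blocksMemoryUsed: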
if (enoughMemory) {
  // We acquired enough memory for the block, so go ahead and put it
  val entry = new MemoryEntry(value(), size, deserialized)
  entries.synchronized {
    entries.put(blockId, entry)
  }
  val valuesOrBytes = if (deserialized) "values" else "bytes"
  logInfo("Block %s stored as %s in memory (estimated size %s, free %s)".format(
    blockId, valuesOrBytes, Utils.bytesToString(size), Utils.bytesToString(blocksMemoryUsed)))
} else {
  // Tell the block manager that we couldn't put it in memory so that it can drop it to
  // disk if the block allows disk storage.
  lazy val data = if (deserialized) {
    Left(value().asInstanceOf[Array[Any]])
  } else {
    Right(value().asInstanceOf[ByteBuffer].duplicate())
  }
  val droppedBlockStatus = blockManager.dropFromMemory(blockId, () => data)
  droppedBlockStatus.foreach { status => droppedBlocks += ((blockId, status)) }
}
The doc comment states that this is used memory, not free memory:
/**
 * Amount of storage memory, in bytes, used for caching blocks.
 * This does not include memory used for unrolling.
 */
private def blocksMemoryUsed: Long = memoryManager.synchronized {
  memoryUsed - currentUnrollMemory
}
The question is: why is this value labeled free if it is actually used memory? Or am I misinterpreting it?
1 Answer
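It appears to be a bug, and it has been fixed in Spark 2.0 (in a PR that contained many other changes).
Indeed, the report is simply wrong: it shows the occupied memory rather than the free memory.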
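The log lines in the question are consistent with this reading: the "free" figure grows from 233.5 KB to 244.9 KB right after an 11.3 KB block is stored, which is how a used-memory counter behaves, not a free-memory one. As a rough illustration of the difference, here is a minimal, self-contained Scala sketch. It is not Spark's code: the object name, the `maxMemory` value, and the figures are assumptions chosen for the example, and only `blocksMemoryUsed` mirrors a name from the snippet in the question.

```scala
// Hypothetical, simplified stand-in for the accounting behind the log message.
// This is NOT Spark's MemoryStore; names and numbers are illustrative assumptions.
object FreeMemorySketch {
  // Assumed total storage memory available for cached blocks (512 MB).
  val maxMemory: Long = 512L * 1024 * 1024

  // Assumed memory currently occupied by cached blocks (244 KB).
  val blocksMemoryUsed: Long = 244L * 1024

  // What the misleading log line effectively printed under the label "free".
  def reportedAsFree: Long = blocksMemoryUsed

  // What "free" would normally mean: capacity minus what is in use.
  def actuallyFree: Long = maxMemory - blocksMemoryUsed

  def main(args: Array[String]): Unit = {
    println(s"reported as free: $reportedAsFree bytes")
    println(s"actually free:    $actuallyFree bytes")
  }
}
```

Under these assumed numbers, the value printed as "free" is a few hundred KB while the memory that is really available is close to the full 512 MB, so a small "free" figure in these pre-2.0 log lines is not by itself evidence of memory pressure.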