
What does tryToPut mean in Apache Spark's MemoryStore?


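We are running out of memory on a standalone Spark cluster that runs multiple jobs. While investigating, we found these messages and began to suspect that too little memory is free: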

16/09/23 12:30:38 INFO MemoryStore: Block broadcast_50802_piece0 stored as bytes in memory (estimated size 5.1 KB, free 233.5 KB)
16/09/23 12:30:38 INFO TorrentBroadcast: Reading broadcast variable 50802 took 9 ms
16/09/23 12:30:38 INFO MemoryStore: Block broadcast_50802 stored as values in memory (estimated size 11.3 KB, free 244.9 KB)

On another cluster we usually see about 500 MB reported as free, and many log traces posted on Stack Overflow show free memory in the GB range.

After analyzing the code, this message seems misleading: the value reported as free memory is actually blocksMemoryUsed.

https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala

// Excerpt from MemoryStore.tryToPut (branch-1.6). Note the last argument of the log message:
// it prints blocksMemoryUsed under the label "free".
if (enoughMemory) {
  // We acquired enough memory for the block, so go ahead and put it
  val entry = new MemoryEntry(value(), size, deserialized)
  entries.synchronized {
    entries.put(blockId, entry)
  }
  val valuesOrBytes = if (deserialized) "values" else "bytes"
  logInfo("Block %s stored as %s in memory (estimated size %s, free %s)".format(
    blockId, valuesOrBytes, Utils.bytesToString(size), Utils.bytesToString(blocksMemoryUsed)))
} else {
  // Tell the block manager that we couldn't put it in memory so that it can drop it to
  // disk if the block allows disk storage.
  lazy val data = if (deserialized) {
    Left(value().asInstanceOf[Array[Any]])
  } else {
    Right(value().asInstanceOf[ByteBuffer].duplicate())
  }
  val droppedBlockStatus = blockManager.dropFromMemory(blockId, () => data)
  droppedBlockStatus.foreach { status => droppedBlocks += ((blockId, status)) }
}
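To make the put-or-drop logic of this excerpt concrete, here is a toy model written as a Scala script. `ToyMemoryStore` and its methods are hypothetical names, not Spark's API. Sizes are plain byte counts, and the eviction that Spark's memory manager may attempt before giving up, unroll memory, and locking are all left out. `tryToPut` records the block only when it fits and otherwise returns false, which is the point where Spark would call `blockManager.dropFromMemory`.

```scala
import scala.collection.mutable

// Hypothetical toy store (not Spark's API): sizes in bytes, no eviction, no unroll memory.
final class ToyMemoryStore(val maxMemory: Long) {
  // Insertion-ordered, loosely like the LinkedHashMap Spark uses for `entries`
  private val entries = mutable.LinkedHashMap.empty[String, Long]

  // Memory held by cached blocks (the quantity the 1.6 log line printed as "free")
  def memoryUsed: Long = entries.valuesIterator.sum

  // Memory still available for new blocks
  def memoryFree: Long = maxMemory - memoryUsed

  /** Stores the block and returns true if it fits; otherwise returns false,
    * leaving the caller to fall back (in Spark: dropFromMemory, e.g. to disk). */
  def tryToPut(blockId: String, size: Long): Boolean = {
    val enoughMemory = size <= memoryFree
    if (enoughMemory) {
      entries.put(blockId, size)
    }
    enoughMemory
  }
}

val store = new ToyMemoryStore(maxMemory = 100L)
println(store.tryToPut("broadcast_0", 60L)) // fits: 60 of 100 bytes now used
println(store.tryToPut("broadcast_1", 50L)) // does not fit: only 40 bytes free
println(s"used=${store.memoryUsed} free=${store.memoryFree}")
```

In this model a log line that printed `memoryUsed` under the label "free" would show the number growing as blocks are added, which is the symptom described in the question.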

The doc comment says it is the memory used, not the memory free:

/**
 * Amount of storage memory, in bytes, used for caching blocks.
 * This does not include memory used for unrolling.
 */
private def blocksMemoryUsed: Long = memoryManager.synchronized {
  memoryUsed - currentUnrollMemory
}
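The two MemoryStore log lines in the question are consistent with this reading. The short Scala script below only replays the printed numbers (copied from the log, so each one is rounded to 0.1 KB); it does not call Spark. Right after the 11.3 KB block broadcast_50802 is stored, the "free" figure rises from 233.5 KB to 244.9 KB instead of falling, which is what used memory would do.

```scala
// Values copied from the two MemoryStore log lines in the question (KB, rounded to 0.1)
val reportedAfterPiece0 = 233.5 // "free" after broadcast_50802_piece0 was stored
val blockSize           = 11.3  // estimated size of broadcast_50802
val reportedAfterBlock  = 244.9 // "free" after broadcast_50802 was stored

// Genuinely free memory would shrink by roughly blockSize; used memory would grow by it
val change = reportedAfterBlock - reportedAfterPiece0

// Three values each rounded to 0.1 KB can be off by up to 0.05 KB apiece
val looksLikeUsedMemory = change > 0 && math.abs(change - blockSize) <= 0.15

println(f"change = $change%+.1f KB, block size = $blockSize%.1f KB, looks like used memory: $looksLikeUsedMemory")
```

This is only a consistency check: other blocks or unroll memory changing between the two lines could also move the number, so it supports the reading of the code rather than proving it.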

So my question is: why is this value labelled free if it is actually used memory? Or am I misinterpreting it?

1 Answer


    It appears to be a bug, and it has been fixed in Spark 2.0 (in a PR that also contains many other changes).

    In fact, the report is simply wrong: it shows occupied memory instead of free memory.
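For comparison, here is a hedged sketch of what a corrected message reports. As far as I can tell from the 2.x sources, the message there passes `maxMemory - blocksMemoryUsed` as the free value; treat that formula as an assumption and check the branch you actually run. `StorageSnapshot` and `logLine` are hypothetical names, and sizes are printed as raw bytes instead of going through Spark's `Utils.bytesToString`.

```scala
// Hypothetical snapshot of storage memory accounting, all values in bytes.
final case class StorageSnapshot(maxMemory: Long, memoryUsed: Long, currentUnrollMemory: Long) {
  // Same definition as the branch-1.6 doc comment: cached blocks only, unroll memory excluded
  def blocksMemoryUsed: Long = memoryUsed - currentUnrollMemory

  // What the branch-1.6 message printed under the label "free"
  def reportedAsFreeIn16: Long = blocksMemoryUsed

  // Assumed corrected value: storage capacity minus what cached blocks occupy
  def actuallyFree: Long = maxMemory - blocksMemoryUsed
}

// Mimics the shape of the MemoryStore message, with raw byte counts
def logLine(blockId: String, size: Long, free: Long): String =
  "Block %s stored as bytes in memory (estimated size %d B, free %d B)".format(blockId, size, free)

val snap = StorageSnapshot(maxMemory = 500L * 1024 * 1024, memoryUsed = 240L * 1024, currentUnrollMemory = 0L)
println(logLine("broadcast_0", 5L * 1024, snap.reportedAsFreeIn16)) // misleading: shows used memory
println(logLine("broadcast_0", 5L * 1024, snap.actuallyFree))       // what "free" should mean
```

With a few hundred KB of cached blocks, the first line reports a tiny "free" value even though almost all of the (hypothetical) 500 MB is available, matching the confusion in the question.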
