
Reading from and writing to Cassandra from Spark workers throws an error

I am using the DataStax Cassandra Java driver to write to Cassandra from the Spark workers. Code snippet:

    rdd.foreachPartition(record => {
      val cluster = SimpleApp.connect_cluster(Spark.cassandraip)
      val session = cluster.connect()
      record.foreach { case (bin_key: (Int, Int), kpi_map_seq: Iterable[Map[String, String]]) =>
        kpi_map_seq.foreach { kpi_map: Map[String, String] =>
          update_tables(session, bin_key, kpi_map)
        }
      }
      session.close()
      cluster.close()
    })

For reading I am using the Spark Cassandra Connector (which uses the same driver internally):

    val bin_table = javaFunctions(Spark.sc).cassandraTable("keyspace", "bin_1")
      .select("bin").where("cell = ?", cellname) // assuming this will run on worker nodes
    println(s"get_bins_for_cell: count of bins for cell $cellname is ${bin_table.count()}")
    return bin_table

Doing either of these on its own does not cause any problems. Doing them together is what throws the stack trace below.

My main goal is not to do the writes or reads directly from the Spark driver. It still seems to have something to do with the context; are two contexts being used?

16/07/06 06:21:29 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 22, euca-10-254-179-202.eucalyptus.internal): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_5_piece0 of broadcast_5
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1222)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
        at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
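
For reference, here is a minimal sketch of how both operations can share a single SparkContext, with CassandraConnector handing out cached sessions on the executors instead of each partition building its own Cluster. The wiring below is illustrative only and assumes the rdd, update_tables and cellname from the snippets above:

    import com.datastax.spark.connector._
    import com.datastax.spark.connector.cql.CassandraConnector

    // Read: runs as Spark tasks on the workers, against the single SparkContext.
    val bin_table = Spark.sc.cassandraTable("keyspace", "bin_1")
      .select("bin").where("cell = ?", cellname)

    // Write: the connector serializes the configuration and each executor reuses
    // a cached session, so no Cluster/Session is created per partition by hand.
    val connector = CassandraConnector(Spark.sc.getConf)
    rdd.foreachPartition { partition =>
      connector.withSessionDo { session =>
        partition.foreach { case (bin_key, kpi_map_seq) =>
          kpi_map_seq.foreach(kpi_map => update_tables(session, bin_key, kpi_map))
        }
      }
    }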

1 Answer


The Spark Context was getting closed after using the session with Cassandra, like this:

    def update_table_using_cassandra_driver() = {
      CassandraConnector(SparkWriter.conf).withSessionDo { session =>
        val statement_4: Statement = QueryBuilder.insertInto("keyspace", "table")
          .value("bin", my_tuple_value)
          .value("cell", my_val("CName"))
        session.executeAsync(statement_4)
        ...
      }
    }
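
One detail in the snippet above: executeAsync returns a future, so withSessionDo can return before Cassandra has acknowledged the insert. A minimal sketch of a blocking variant (the update_table_blocking name is hypothetical; it reuses SparkWriter.conf, my_tuple_value and my_val from above):

    import com.datastax.driver.core.ResultSet
    import com.datastax.driver.core.querybuilder.QueryBuilder
    import com.datastax.spark.connector.cql.CassandraConnector

    // Same insert, but blocking until Cassandra acknowledges the write, so nothing
    // is still in flight when the surrounding loop moves on.
    def update_table_blocking(): ResultSet =
      CassandraConnector(SparkWriter.conf).withSessionDo { session =>
        val stmt = QueryBuilder.insertInto("keyspace", "table")
          .value("bin", my_tuple_value)
          .value("cell", my_val("CName"))
        session.executeAsync(stmt).getUninterruptibly()
      }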
    

So the next time I called it inside the loop I got the exception. It looks like a bug in the Cassandra driver; I will have to check this. For the time being I did the following to get around the problem:

    for (a <- 1 to 1000) {
      val sc = new SparkContext(SparkWriter.conf)
      update_table_using_cassandra_driver()
      sc.stop()
      ...sleep(xxx)
    }
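
"Failed to get broadcast_..." is the typical symptom of tasks running against a SparkContext that has already been stopped, so recreating the context on every iteration mainly side-steps that. If nothing else needs to stop the context between iterations, a sketch of keeping a single one alive for the whole loop (illustrative only; the sleep interval is a placeholder):

    import org.apache.spark.SparkContext

    // One long-lived context for the whole loop instead of a new one per iteration;
    // the connector still manages Cassandra sessions on the executors.
    val sc = new SparkContext(SparkWriter.conf)
    for (a <- 1 to 1000) {
      update_table_using_cassandra_driver()
      Thread.sleep(1000L) // placeholder pause, standing in for the sleep(xxx) above
    }
    sc.stop()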
    
