
Invalid null value for partition key part url


I have the following code, which tries to join two Cassandra tables in Spark.

val imageKeywords = sc.cassandraTable[ImageMetadata]("images", "metadata")
val imageAndPageKeywords = imageKeywords
  .joinWithCassandraTable[PagesMetadata]("pages2", "metadata")
  .on(SomeColumns("tid", "url" as "pu"))

The case classes I use to map the data are as follows:

case class ImageMetadata(tid: String, iu: String, pu: Option[String],
mk: List[String], fk: List[String], ak: List[String], ipk: List[String], pk: List[String], ik: List[String], ck: List[String])

case class PagesMetadata(tid: String, url: String, pk: List[String], uk: List[String], hk: List[String], ok: List[String], tc: List[String])

When I try to run the following, I get an error:

imageAndPageKeywords.collect.toList.sortBy(_._1.tid).take(10).foreach(println)

Error stack trace:

Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Invalid null value for partition key part url
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:103)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:293)
    at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:455)
    at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:734)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.handler.timeout.IdleStateAwareChannelUpstreamHandler.handleUpstream(IdleStateAwareChannelUpstreamHandler.java:36)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.timeout.IdleStateHandler.messageReceived(IdleStateHandler.java:294)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    ... 3 more

1 Answer

  • 2

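    It's simple: the exception tells you that the join cannot be performed because one of the columns used to join ImageMetadata and PagesMetadata is null.

    In your case, some of the url (pu) values in ImageMetadata are null.

    What is odd is that the join column is nullable on the ImageMetadata side (pu is an Option[String]), while url appears to be part of the primary key of the pages table, where a null value is not allowed.

    One way to make it work is to drop the rows without a pu before joining: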

    val imageAndPageKeywords = imageKeywords
      .filter(im => im.pu.isDefined) // keep only rows whose join key is present
      .joinWithCassandraTable[PagesMetadata]("pages2", "metadata")
      .on(SomeColumns("tid", "url" as "pu"))
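
To see what the filter does without a Spark cluster, here is a minimal sketch on plain Scala collections. `ImageKey` and `NullKeyFilterSketch` are hypothetical names introduced only for this illustration, and the sample rows are made up; the sketch keeps just the two join columns from the question's `ImageMetadata`.

```scala
// Simplified stand-in for ImageMetadata: only the join columns are kept.
// (Hypothetical type for illustration; not part of the original code.)
final case class ImageKey(tid: String, pu: Option[String])

object NullKeyFilterSketch {
  // Made-up sample rows; the second one has no page URL.
  val rows: List[ImageKey] = List(
    ImageKey("t1", Some("http://example.com/a")),
    ImageKey("t1", None),
    ImageKey("t2", Some("http://example.com/b"))
  )

  // Split into rows that can take part in the join and rows whose key is missing.
  val (joinable, skipped) = rows.partition(_.pu.isDefined)

  def main(args: Array[String]): Unit = {
    println(s"joinable: ${joinable.size}, skipped: ${skipped.size}")
    skipped.foreach(r => println(s"no page URL for tid=${r.tid}"))
  }
}
```

The same predicate should carry over to the RDD from the question: counting `imageKeywords.filter(_.pu.isEmpty)` before the join would show how many images are silently dropped by the fix.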
    
