I'm running Spark 2.0.0 on EMR in cluster mode with databricks spark-redshift 2.0.1, and my job works fine with some simple Redshift queries, such as this one:

val easyQueryWorks =
  s"""
     |select
     |s.session_id, s.user_id,
     |e.ex_id, e.pre_id
     |from schem1.sessions s
     |join schem2.entries e
     |on s.session_id = e.session_id
     |where e.entry_id = 200
     """.stripMargin

This result set has roughly 100,000 rows.
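For context, this is roughly how we load the query (a minimal sketch using spark-redshift's documented `url`, `query`, and `tempdir` reader options; the JDBC URL and tempdir bucket below are placeholders, and our S3 credentials are configured in the EMR/Hadoop config rather than shown here):

val df = spark.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://host:5439/mydb?user=me&password=secret") // placeholder
  .option("query", easyQueryWorks)             // the SQL string above
  .option("tempdir", "s3n://my-bucket/tmp/")   // spark-redshift UNLOADs the result here
  .load()

println(df.count())  // ~100,000 rows for the simple query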

When I try to run it with a query like the one below instead:

val loadQuery =
  s"""
     |select
     |          e.created_at, e.session_id, e.entry_id,
     |          t1.user_id,
     |          t1.dim1, t1.dim2, t1.dim3,
     |          c.p_id, c.name,
     |          s.dim0,
     |          s.dim1, s.dim2, s.dim3, s.dim4,
     |          s.dim5, s.dim6, s.dim7,
     |          s.dim8, s.dim9, s.dim10
     |        from
     |          schem1.entries e
     |        left join
     |        (
     |         select
     |            session_id, MAX(id) as id
     |         from
     |           schem1.tool_events
     |         where
     |           value in (1,2,3,4)
     |         group by
     |           session_id
     |         ) t2
     |         on
     |           t2.session_id = e.session_id
     |        left join
     |          schem1.tool_events t1
     |        on
     |         t1.id = t2.id
     |        join
     |          schem2.cooks c
     |        on
     |          c.id = e.entry_id
     |        join
     |          schem1.sessions s
     |        on
     |         s.session_id = e.session_id
     |        where
     |          e.entry_id = 200
    """.stripMargin

(Please ignore any small mistakes in the query; I've just replaced the table and field names.)

After about 2 hours (we can see in S3 that a temporary directory was created and that it contains some files), we get this exception:

FATAL CampaignJob: [Job Name -> MyJob] java.sql.SQLException: Amazon Error setting/closing connection: Connection reset by peer.
java.sql.SQLException: Amazon Error setting/closing connection: Connection reset by peer.
    at com.amazon.jdbc.communications.channels.MessagesSocketChannel.readMessages(Unknown Source)
    at com.amazon.jdbc.communications.channels.AbstractMessagesSocketChannel.read(Unknown Source)
Caused by: com.amazon.support.exceptions.GeneralException: Amazon Error setting/closing connection: Connection reset by peer.
    ... 2 more

The job fails.

It returns roughly the same number of rows as the simple query, and we know that Redshift access is fine for simple queries. The original query itself, run directly on Redshift, takes about 7-8 minutes. It only fails when run through Spark with the databricks spark-redshift library.
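Since the failure looks like the JDBC connection being dropped while Redshift spends hours on the UNLOAD, one idea we have not verified is making TCP keepalives more aggressive via the Redshift JDBC driver's documented tcpKeepAlive / TCPKeepAliveMinutes URL parameters (a sketch only; the host, database, and one-minute interval are guesses, and whether this addresses this particular reset is exactly what we're unsure about):

// Untested sketch: tcpKeepAlive and TCPKeepAliveMinutes are documented
// Redshift JDBC driver options; values below are placeholders/guesses.
val jdbcUrl =
  "jdbc:redshift://host:5439/mydb" +
  "?user=me&password=secret" +
  "&tcpKeepAlive=true&TCPKeepAliveMinutes=1"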

Please let me know if I can provide any more information.