
Spark SASL not working on EMR with YARN


First off, I want to say that the only solution I have seen to this issue is here: Spark 1.6.1 SASL. However, when adding the spark and yarn authentication configuration, it is still not working. Below is my spark configuration, launched with spark-submit on Amazon's EMR:

    SparkConf sparkConf = new SparkConf().setAppName("secure-test");
    // Enable RPC authentication and SASL encryption between Spark components.
    sparkConf.set("spark.authenticate.enableSaslEncryption", "true");
    sparkConf.set("spark.network.sasl.serverAlwaysEncrypt", "true");
    sparkConf.set("spark.authenticate", "true");
    sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    sparkConf.set("spark.kryo.registrator", "org.nd4j.Nd4jRegistrator");
    try {
        sparkConf.registerKryoClasses(new Class<?>[]{
                Class.forName("org.apache.hadoop.io.LongWritable"),
                Class.forName("org.apache.hadoop.io.Text")
        });
    } catch (Exception e) {}

    sparkContext = new JavaSparkContext(sparkConf);
    // S3A filesystem settings plus spark.authenticate, pushed into the
    // Hadoop configuration from code rather than core-site.xml.
    sparkContext.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
    sparkContext.hadoopConfiguration().set("fs.s3a.enableServerSideEncryption", "true");
    sparkContext.hadoopConfiguration().set("spark.authenticate", "true");

Notice that I am adding spark.authenticate to the sparkContext's hadoop configuration in code instead of in core-site.xml (which I am assuming I can do, since other things work as well).
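
As a side note, SparkConf keys prefixed with spark.hadoop. are copied into the Hadoop Configuration that the SparkContext builds, so the same properties could also be set without touching hadoopConfiguration() after the fact. A minimal sketch of that alternative (not from the original post):

    SparkConf sparkConf = new SparkConf().setAppName("secure-test");
    // The "spark.hadoop." prefix is stripped and the remainder of each key is
    // copied into the Hadoop Configuration created for the SparkContext.
    sparkConf.set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
    sparkConf.set("spark.hadoop.fs.s3a.enableServerSideEncryption", "true");
    sparkConf.set("spark.hadoop.spark.authenticate", "true");
    JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);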

Looking here: https://github.com/apache/spark/blob/master/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java it seems like both spark.authenticate settings are necessary. When I run this application, I get the following stack trace.

    17/01/03 22:10:23 INFO storage.BlockManager: Registering executor with local external shuffle service.
    17/01/03 22:10:23 ERROR client.TransportClientFactory: Exception while bootstrapping client after 178 ms
    java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown message type: -22
        at org.apache.spark.network.shuffle.protocol.BlockTransferMessage$Decoder.fromByteBuffer(BlockTransferMessage.java:67)
        at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:71)
        at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
        at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at java.lang.Thread.run(Thread.java:745)

It says this in Spark's documentation:

For Spark on YARN deployments, configuring spark.authenticate to true will automatically handle generating and distributing the shared secret. Each application will use a unique shared secret.

Based on the comments in the YARN file above, that seems wrong, but after troubleshooting the failure I am still lost on where I should go to get SASL to work. Am I missing something obvious that is documented somewhere?

1 Answer

    So I finally figured it out. The previous StackOverflow thread was technically correct. I needed to add spark.authenticate to the yarn configuration. Maybe it is possible to do this in code, but I couldn't figure out how to add that configuration programmatically, which at a high level makes sense why that is the case. I will post my configuration below in case anyone else runs into this issue in the future.

    First, I used an aws emr configuration file (an example of this is when using the aws cli: aws emr create-cluster --configurations file://youpathhere.json)
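
    For reference, a full create-cluster invocation might look something like the following sketch (the cluster name, release label, and instance settings are assumed placeholders, not values from the original answer):

        aws emr create-cluster \
            --name "secure-test" \
            --release-label emr-5.2.0 \
            --applications Name=Spark \
            --configurations file://youpathhere.json \
            --instance-type m3.xlarge \
            --instance-count 3 \
            --use-default-roles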

    Then, I added the following json to the file:

    [{
        "Classification": "spark-defaults",
        "Properties": {
            "spark.authenticate": "true",
            "spark.authenticate.enableSaslEncryption": "true",
            "spark.network.sasl.serverAlwaysEncrypt": "true"
        }
    },
    {
        "Classification": "core-site",
        "Properties": {
            "spark.authenticate": "true"
        }
    }]
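
    To sanity-check that the core-site classification actually took effect on the cluster, one option (a hypothetical check, not part of the original fix) is to read the property back from the Hadoop configuration at runtime:

        import org.apache.spark.SparkConf;
        import org.apache.spark.api.java.JavaSparkContext;

        // Hypothetical check: the "core-site" classification above writes
        // spark.authenticate into core-site.xml on the cluster nodes, so it
        // should now be visible in the Hadoop configuration.
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("auth-check"));
        System.out.println("spark.authenticate = "
                + sc.hadoopConfiguration().get("spark.authenticate", "unset")); // expect "true"
        sc.stop();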
    
