Clarified Question: SparkContext is available in Java but wants a Scala sequence. How do I make it happy -- in Java?
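I have this code that does a simple jsc.parallelize.
I was using JavaSparkContext, but SparkContext wants a Scala collection. My thinking here was to build a Scala Range and convert it to a Java list, but I am not sure how to get that Range as a Scala Seq, which is what parallelize from SparkContext is asking for.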
// The JavaSparkContext way, was trying to get around MAXINT limit, not the issue here
// setup bogus Lists of size M and N for parallelize
//List<Integer> rangeM = rangeClosed(startM, endM).boxed().collect(Collectors.toList());
//List<Integer> rangeN = rangeClosed(startN, endN).boxed().collect(Collectors.toList());
Next comes the money line: how do I create a Scala Seq in Java to pass to parallelize?
// these lists above need to be scala objects now that we switched to SparkContext
scala.collection.Seq<Integer> rangeMscala = scala.collection.immutable.List(startM to endM);
// setup sparkConf and create SparkContext
... SparkConf setup
SparkContext jsc = new SparkContext(sparkConf);
RDD<Integer> dataSetMscala = jsc.parallelize(rangeMscala);
1 Answer
You should use it like this:
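What follows is a minimal sketch of one way to do this, not necessarily the exact code this answer originally contained. It assumes Spark built against Scala 2.12, where `JavaConverters` wraps a Java list and `toSeq()` yields a `scala.collection.Seq`. The class name, the `toScalaSeq` helper, the bounds, the app name, and the `local[*]` master are hypothetical, chosen only for illustration. From Java, `SparkContext.parallelize` has no default arguments, so the slice count and the implicit `ClassTag` have to be passed explicitly.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.rdd.RDD;

import scala.collection.JavaConverters;
import scala.collection.Seq;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;

public class ParallelizeFromJava {

    // Hypothetical helper: wrap a Java list as a Scala Seq (Scala 2.12 converters assumed).
    static <T> Seq<T> toScalaSeq(List<T> javaList) {
        return JavaConverters.asScalaBufferConverter(javaList).asScala().toSeq();
    }

    public static void main(String[] args) {
        int startM = 1;    // hypothetical bounds standing in for the question's values
        int endM = 1000;

        // Build the range on the Java side, then hand it to Scala as a Seq.
        List<Integer> rangeM = IntStream.rangeClosed(startM, endM)
                .boxed()
                .collect(Collectors.toList());
        Seq<Integer> rangeMscala = toScalaSeq(rangeM);

        SparkConf sparkConf = new SparkConf()
                .setAppName("parallelize-from-java") // hypothetical app name
                .setMaster("local[*]");              // assumption: local run for illustration
        SparkContext sc = new SparkContext(sparkConf);

        // Scala's implicit ClassTag and default numSlices must be supplied by hand from Java.
        ClassTag<Integer> intTag = ClassTag$.MODULE$.apply(Integer.class);
        RDD<Integer> dataSetMscala =
                sc.parallelize(rangeMscala, sc.defaultParallelism(), intTag);

        System.out.println(dataSetMscala.count());
        sc.stop();
    }
}
```

If a plain `RDD` is all that is needed, it may be simpler to stay with `JavaSparkContext.parallelize(rangeM)` and call `.rdd()` on the resulting `JavaRDD`, which avoids the manual `ClassTag` altogether.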
Hope this helps! Regards