
Maven build error for a Spark and MongoDB application with Scala 2.10


I want to build a Scala application with Maven dependencies for Spark and MongoDB. The Scala version I am using is 2.10. My pom looks like this (irrelevant parts omitted):

<properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.tools.version>2.10</scala.tools.version>
    <!-- Put the Scala version of the cluster -->
    <scala.version>2.10.5</scala.version>
</properties>

<!-- repository to add org.apache.spark -->
<repositories>
    <repository>
        <id>cloudera-repo-releases</id>
        <url>https://repository.cloudera.com/artifactory/repo/</url>
    </repository>
</repositories>

<build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <!-- <pluginManagement>  -->    
        <plugins>
            <plugin>
                <!-- see http://davidb.github.com/scala-maven-plugin -->
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.1.3</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-make:transitive</arg>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.13</version>
                <configuration>
                    <useFile>false</useFile>
                    <disableXmlReport>true</disableXmlReport>
                    <!-- If you have classpath issue like NoDefClassError,... -->
                    <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
                    <includes>
                        <include>**/*Test.*</include>
                        <include>**/*Suite.*</include>
                    </includes>
                </configuration>
            </plugin>

            <!-- "package" command plugin -->
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4.1</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    <!-- </pluginManagement>  -->       
</build>

<dependencies>  
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.mongodb.spark</groupId>
        <artifactId>mongo-spark-connector_2.10</artifactId>
        <version>1.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.mongodb.scala</groupId>
        <artifactId>mongo-scala-driver_2.11</artifactId>
        <version>1.1.1</version>
    </dependency>
</dependencies>

When I run mvn clean assembly:assembly, the following error occurs:

C:\Develop\workspace\SparkApplication>mvn clean assembly:assembly
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building SparkApplication 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ SparkApplication ---
[INFO] Deleting C:\Develop\workspace\SparkApplication\target
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building SparkApplication 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-assembly-plugin:2.4.1:assembly (default-cli) > package @ SparkApplication >>>
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ SparkApplication ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory C:\Develop\workspace\SparkApplication\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ SparkApplication ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- scala-maven-plugin:3.1.3:compile (default) @ SparkApplication ---
[WARNING]  Expected all dependencies to require Scala version: 2.10.5
[WARNING]  xx.xxx.xxx:SparkApplication:0.0.1-SNAPSHOT requires scala version: 2.10.5
[WARNING]  com.twitter:chill_2.10:0.5.0 requires scala version: 2.10.4
[WARNING] Multiple versions of scala libraries detected!
[INFO] C:\Develop\workspace\SparkApplication\src\main\scala:-1: info: compiling
[INFO] Compiling 1 source files to C:\Develop\workspace\SparkApplication\target\classes at 1477993255625
[INFO] No known dependencies. Compiling everything
[ERROR] error: bad symbolic reference. A signature in package.class refers to type compileTimeOnly
[INFO] in package scala.annotation which is not available.
[INFO] It may be completely missing from the current classpath, or the version on
[INFO] the classpath might be incompatible with the version used when compiling package.class.
[ERROR] C:\Develop\workspace\SparkApplication\src\main\scala\com\examples\MainExample.scala:33: error: Reference to method intWrapper in class LowPriorityImplicits should not have survived past type checking,
[ERROR] it should have been processed and eliminated during expansion of an enclosing macro.
[ERROR]     val count = sc.parallelize(1 to NUM_SAMPLES).map{i =>
[ERROR]                                ^
[ERROR] two errors found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.363 s
[INFO] Finished at: 2016-11-01T10:40:58+01:00
[INFO] Final Memory: 20M/353M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.3:compile (default) on project SparkApplication: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1(Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

This error only occurs when I add the mongo-scala-driver_2.11 dependency. Without that dependency the jar builds fine. My code is currently the Pi estimation example from the Spark website:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("Cluster Application")
  //.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val sc = new SparkContext(conf)

// NUM_SAMPLES is a constant defined elsewhere in the class.
val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random()
  val y = Math.random()
  if (x * x + y * y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

I also tried adding the following block to each dependency element, because I found it in some GitHub issue. It did not help, though.

<exclusions>
    <exclusion>
        <!-- make sure wrong scala version is not pulled in -->
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
    </exclusion>
</exclusions>

How can I fix this? The MongoDB Scala driver seems to be built against Scala 2.11, while Spark requires Scala 2.10.

1 Answer


Remove the Mongo Scala Driver dependency; it is not compiled for Scala 2.10 and is therefore incompatible.

The good news is that the MongoDB Spark Connector is a standalone connector. It uses the synchronous Mongo Java Driver, because Spark is designed for CPU-intensive, synchronous tasks. It is built to follow Spark idioms and is all you need to connect MongoDB to Spark.
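For illustration, here is a minimal read/write sketch with the connector on Spark 1.6 / mongo-spark-connector_2.10 1.1.0. The spark.mongodb.input.uri and spark.mongodb.output.uri settings are the connector's standard configuration; the local URI and the test.coll namespace are placeholders, not values from the question.

import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark.MongoSpark
import org.bson.Document

// Placeholder URIs: a local mongod and the "test.coll" namespace.
val conf = new SparkConf()
  .setAppName("Mongo Spark Connector Example")
  .set("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll")
  .set("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll")

val sc = new SparkContext(conf)

// Write a few documents through the connector...
val docs = sc.parallelize(1 to 10).map(i => Document.parse(s"{ value : $i }"))
MongoSpark.save(docs)

// ...and read the collection back as an RDD[Document].
val rdd = MongoSpark.load(sc)
println("Documents in collection: " + rdd.count())

Everything here stays on _2.10 artifacts, so it does not reintroduce the binary-compatibility clash from the question.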

The Mongo Scala Driver, on the other hand, is idiomatic for modern Scala conventions: all IO is fully asynchronous. That is a great fit for web applications and improves scalability on a single machine.
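For contrast only (not something to add back into this Spark build), a minimal sketch of that asynchronous style, assuming the Scala driver's Observable-to-Future helpers; the connection string and the test/coll names are placeholders:

import scala.concurrent.ExecutionContext.Implicits.global
import org.mongodb.scala._

// Placeholder connection details: a local mongod, database "test", collection "coll".
val client: MongoClient = MongoClient("mongodb://127.0.0.1:27017")
val collection: MongoCollection[Document] =
  client.getDatabase("test").getCollection("coll")

// find() does not block: toFuture() gathers the results asynchronously,
// and the callback runs on another thread once they arrive.
collection.find().toFuture().foreach { docs =>
  println(s"Fetched ${docs.size} documents")
  client.close()
}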
