How do I add an external jar to Spark in HDInsight?

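I am trying to install the Azure CosmosDB Spark connector on an HDInsight Spark cluster on Azure. (GitHub)

I am new to the Spark environment, and I have not been able to find the correct way to add the connector jar to the Spark configuration.

The methods I tried:

Method 1 I uploaded the jar to the Azure Blob storage container associated with the HDInsight cluster (e.g. /jars/). I opened an SSH connection to the Spark cluster head node and ran the following command: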

spark-shell --master yarn --conf "spark.executor.extraClassPath="wasb:///example/jars/azure-cosmosdb-spark_2.0.2_2.11-0.0.3.jar" --conf "spar.driver.extraClassPath= "wasb:///example/jar/azure-cosmosdb-spark_2.0.2_2.11-0.0.3.jar"

spark-shell returned the following:

SPARK_MAJOR_VERSION is set to 2, using Spark2
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/10/19 15:10:48 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.0.0.20:4040
Spark context available as 'sc' (master = yarn, app id = application_1508418631913_0014).
Spark session available as 'spark'.

I think the problem here is this line:

SparkContext: Use an existing SparkContext, some configuration may not take effect.

Method 2

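After uploading the jar to /examples/jars as in the first method, I opened the Ambari UI and added spark.executor.extraClassPath and spark.driver.extraClassPath to spark-Custom-Defaults, using the same values as in Method 1.

Neither Method 1 nor Method 2 had any effect on my development environment. I tried to import com.microsoft.azure.cosmosdb and the interpreter could not find it.

Method 3 I created an HDInsight 3.6 Spark cluster (not recommended in my case, because the connector was tested on HDInsight 3.5) and used Zeppelin to add the configuration to the Livy interpreter. I tried the sample code found here and got this error: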

java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.analysis.TypeCoercion$.findTightestCommonType()Lscala/Function2;

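After some searching, I concluded this is a class version mismatch, so I went back to HDInsight 3.5, still with no result.

My questions are:

Does spark-shell --conf apply a persistent configuration, or does it apply only to that shell session?

How can I make this configuration persistent? Later I will use the Livy REST API to run remote PySpark jobs that may include this package, and I do not want to repeat the configuration every time I submit a remote job.

1 Answer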

1
You can add extra dependencies when launching spark-shell with the following command:
```
spark-shell --packages <maven-coordinates-of-the-package>
```
In your case:
```
spark-shell --packages com.microsoft.azure:azure-cosmosdb-spark_2.1.0_2.11:1.1.2
```
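Spark's `--packages` option takes a comma-separated list of coordinates in the form `groupId:artifactId:version`. If you launch the shell from a script, a small check like the one below can catch a malformed coordinate before Spark does. This is only a sketch: `packages_args` is a hypothetical helper name, not part of Spark.

```python
def packages_args(*coordinates):
    """Return ['--packages', 'a,b,...'] after checking each coordinate.

    Hypothetical helper for illustration; it only validates the
    three-part groupId:artifactId:version shape, not that the
    artifact actually exists in a repository.
    """
    for coord in coordinates:
        parts = coord.split(":")
        if len(parts) != 3 or not all(parts):
            raise ValueError(
                f"expected groupId:artifactId:version, got {coord!r}"
            )
    return ["--packages", ",".join(coordinates)]


# Coordinate taken from the answer above.
COSMOSDB = "com.microsoft.azure:azure-cosmosdb-spark_2.1.0_2.11:1.1.2"

args = ["spark-shell", "--master", "yarn"] + packages_args(COSMOSDB)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run(args)` on a machine where `spark-shell` is on the PATH.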
A good practice is to package your application together with all of its dependencies:

https://spark.apache.org/docs/latest/submitting-applications.html#bundling-your-applications-dependencies

This should also work with Livy.
Answered on 2024-04-20T11:39:37+08:00
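For Livy, the Spark property equivalent to `--packages` is `spark.jars.packages`, and Livy's `POST /sessions` request body accepts a `conf` map of Spark properties. A minimal sketch of building that body for a PySpark session follows. The function name is hypothetical, and whether your cluster's Livy configuration allows this property to be set per request is an assumption you would need to verify.

```python
import json

# Coordinate taken from the answer above.
COSMOSDB = "com.microsoft.azure:azure-cosmosdb-spark_2.1.0_2.11:1.1.2"


def livy_session_payload(packages, kind="pyspark"):
    """Build the JSON body for Livy's POST /sessions.

    Hypothetical helper: 'kind' and 'conf' are fields of the Livy
    session request; 'spark.jars.packages' is the Spark property
    behind the --packages command-line option.
    """
    return {
        "kind": kind,
        "conf": {"spark.jars.packages": ",".join(packages)},
    }


payload = livy_session_payload([COSMOSDB])
print(json.dumps(payload, indent=2))
```

You would send this body with any HTTP client to the cluster's Livy endpoint, using a `Content-Type: application/json` header and the cluster credentials. Because the package list travels with each request body, wrapping it in a helper like this keeps you from retyping the configuration for every remote job.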
