
Connecting to Microsoft SQL Server with PySpark throws an error:


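Please walk me through the steps to connect to MS SQL Server and read its data with PySpark. Below is my code, followed by the error message I get when I try to load data from MS SQL Server.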

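import findspark
findspark.init()  # locate the local Spark installation and put pyspark on sys.path

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

APP_NAME = 'My Spark Application'

# Pass the variable; the string literal "APP_NAME" would become the app name
conf = SparkConf().setAppName(APP_NAME).setMaster("local[4]")
sc = SparkContext(conf=conf)

sqlcontext = SQLContext(sc)

# SQL Server JDBC URLs take the form jdbc:sqlserver://host:port (note the "//").
# The Microsoft JDBC driver jar must also be on Spark's classpath.
jdbcDF = (sqlcontext.read.format("jdbc")
          .option("url", "jdbc:sqlserver://XXXX:1433")
          .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
          .option("dbtable", "dbo.XXXX")
          .option("user", "XXXX")
          .option("password", "XXX")
          .load())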

******** Error ********

teway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\spark-2.0.1-bin-hadoop2.6\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\spark-2.0.1-bin-hadoop2.6\python\lib\py4j-0.10.3-src.zip\py4j\protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o66.load.
: java.lang.NullPointerException
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:167)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:117)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
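A NullPointerException at JDBCRDD.resolveTable most likely means Spark got no connection object back from the driver, rather than a failed login. A probable trigger is a URL like jdbc:sqlserver:XXXX:1433, which has no "//" after "jdbc:sqlserver:". Under the JDBC contract, a driver's connect() returns null for a URL it does not recognize, and Spark 2.0.x appears to use that result without checking it. The sketch below shows the expected form, jdbc:sqlserver://host:port;property=value;... The helper names build_sqlserver_url and looks_like_sqlserver_url are hypothetical, and the check is only a rough syntactic one, not the driver's own parser.

```python
import re

# Prefix the Microsoft JDBC driver expects at the start of every URL.
_SQLSERVER_PREFIX = "jdbc:sqlserver://"


def build_sqlserver_url(host, port=1433, **props):
    """Build jdbc:sqlserver://host:port;key1=value1;key2=value2 ...

    Connection properties such as databaseName are passed as keyword
    arguments and appended as ";name=value" pairs.
    """
    url = f"{_SQLSERVER_PREFIX}{host}:{port}"
    for key, value in props.items():
        url += f";{key}={value}"
    return url


def looks_like_sqlserver_url(url):
    """Rough check: correct prefix, and every property is name=value."""
    if not url.startswith(_SQLSERVER_PREFIX):
        return False
    _, *props = url.split(";")
    # Property names are letters only; "name:value" or stray spaces fail.
    return all(re.fullmatch(r"[A-Za-z]+=[^;]*", p) for p in props if p)


print(looks_like_sqlserver_url("jdbc:sqlserver:XXXX:1433"))
print(build_sqlserver_url("XXXX", databaseName="ExampleDB"))
```

The first call prints False because of the missing "//"; the second builds a URL that the check accepts.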

1 Answer


    The following solution worked for me:

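    Copy the mssql-jdbc-7.0.0.jre8.jar file into the jars subfolder of your Spark installation (for example C:\spark\spark-2.2.2-bin-hadoop2.7\jars), or use whichever version of the driver jar suits your system (the jre8 suffix means this build targets Java 8).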
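To confirm the driver jar actually sits where Spark loads jars from, a small check like the one below may help. It is a sketch: find_mssql_jdbc_jars is a hypothetical helper, and spark_home is assumed to be the Spark installation directory that contains the jars folder. Besides the mssql-jdbc-*.jar naming used above, it also looks for the older sqljdbc*.jar names that earlier driver releases used.

```python
import glob
import os


def find_mssql_jdbc_jars(spark_home):
    """Return sorted paths of Microsoft JDBC driver jars in <spark_home>/jars."""
    jars_dir = os.path.join(spark_home, "jars")
    found = []
    # Newer releases: mssql-jdbc-<version>.<jre>.jar; older ones: sqljdbc*.jar
    for pattern in ("mssql-jdbc-*.jar", "sqljdbc*.jar"):
        found.extend(glob.glob(os.path.join(jars_dir, pattern)))
    return sorted(found)
```

An empty list means Spark will not find the driver in that folder. Instead of copying the jar, it can generally be supplied with the --jars option of pyspark/spark-submit or the spark.jars setting, as long as that is set before the Spark session is created.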

    Then connect to MS SQL Server and create a Spark DataFrame with:

    dbData = spark.read.jdbc("jdbc:sqlserver://servername;databaseName=ExampleDB;user=username;password=password", "tablename")
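Putting user and password in the URL works, but the same read can also pass them through the properties argument of DataFrameReader.jdbc(url, table, properties=...), together with the driver class name. A minimal sketch: sqlserver_jdbc_args is a hypothetical helper, and servername, ExampleDB, tablename, username and password are the placeholders from the answer.

```python
def sqlserver_jdbc_args(server, database, table, user, password, port=1433):
    """Return keyword arguments for pyspark's DataFrameReader.jdbc().

    Credentials go into the properties dict instead of the URL, and the
    driver class is named explicitly so Spark loads the Microsoft driver.
    """
    return {
        "url": f"jdbc:sqlserver://{server}:{port};databaseName={database}",
        "table": table,
        "properties": {
            "user": user,
            "password": password,
            "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        },
    }
```

Usage, assuming an existing SparkSession named spark:

    dbData = spark.read.jdbc(**sqlserver_jdbc_args("servername", "ExampleDB", "tablename", "username", "password"))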
