I am using the following code to read a CSV file with pyspark:
import os
import sys

# Point Python at the local Spark installation (raw string keeps the backslashes literal).
os.environ["SPARK_HOME"] = r"D:\ProgramFiles\spark-2.1.0-bin-hadoop2.7"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.4-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")

from pyspark import SparkConf
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *

conf = SparkConf()
conf.setMaster('local')
conf.setAppName('test')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# customSchema is a StructType whose definition is not shown in this post.
df = sqlContext.read.format("com.databricks.spark.csv").schema(customSchema).option("header", "true").option("mode", "DROPMALFORMED").load("iris.csv")
df.show()
The following error is thrown:

File "", line 1, in
    df = sqlContext.read.format("com.databricks.spark.csv").schema(customSchema).option("header", "true").option("mode", "DROPMALFORMED").load("iris.csv")
File "D:\ProgramFiles\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\context.py", line 464, in read
    return DataFrameReader(self)
File "D:\ProgramFiles\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\readwriter.py", line 70, in __init__
    self._jreader = spark._ssql_ctx.read()
File "D:\ProgramFiles\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
File "D:\ProgramFiles\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.internal.SessionState':"
1 Answer
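The way of reading a CSV file shown above is for Spark versions < 2.0.0.
For Spark >= 2.0.0 you need to read through the Spark session (spark.read), either by passing the schema and options as keyword arguments to csv(), or by chaining schema() and option() calls before csv().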