I want to run a simple Spark script that contains a few Spark SQL queries (basically HiveQL). The corresponding tables are saved in the spark-warehouse folder.

from pyspark.sql import SparkSession

from pyspark.sql import Row


spark = SparkSession.builder \
    .config("spark.sql.warehouse.dir", "file:///C:/tmp") \
    .appName("TestApp") \
    .enableHiveSupport() \
    .getOrCreate()
sqlstring="SELECT lflow1.LeaseType as LeaseType, lflow1.Status as Status, lflow1.Property as property, lflow1.City as City, lesflow2.DealType as DealType, lesflow2.Area as Area, lflow1.Did as DID, lesflow2.MID as MID from lflow1, lesflow2  WHERE lflow1.Did = lesflow2.MID"

def queryBuilder(sqlval):
    df = spark.sql(sqlval)  # run the HiveQL query
    df.show()
    return df  # return the DataFrame so the caller can use it

result = queryBuilder(sqlstring)
print(result.collect())
print("Type of result:", type(result))

After running it with spark-submit, I am facing the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o27.sql. : org.apache.spark.sql.AnalysisException: Table or view not found: lflow1; line 1 pos 211

I cannot figure out why this is happening. I have seen some Stack Overflow posts suggesting that Hive support must be enabled by calling enableHiveSupport(), which I have already done in my script, but I still get this error. I am running PySpark 2.2 on Windows 10. Please help me figure this out.
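One thing I am not sure about: my script overrides spark.sql.warehouse.dir to file:///C:/tmp, while the shell session that saved the tables presumably used the default spark-warehouse directory, so the two sessions may be pointing at different warehouses/metastores. A minimal diagnostic sketch I could run (assuming it is launched from the same working directory as the original shell) to see which tables the session can actually find:

from pyspark.sql import SparkSession

# Build the session WITHOUT overriding the warehouse dir, so it should
# pick up the same default spark-warehouse/metastore_db the shell used
# (assumption: run from the same working directory as the shell).
spark = SparkSession.builder \
    .appName("TestApp") \
    .enableHiveSupport() \
    .getOrCreate()

# Which warehouse is in effect, and which tables are visible?
print(spark.conf.get("spark.sql.warehouse.dir"))
spark.sql("SHOW TABLES").show()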

I have already created lflow1 and lesflow2 from DataFrames and saved them as permanent tables in the PySpark shell. Here is my code:

df = spark.read.json("C:/Users/codemen/Desktop/test for sparkreport engine/LeaseFlow1.json")

df2 = spark.read.json("C:/Users/codemen/Desktop/test for sparkreport engine/LeaseFlow2.json")

df.write.saveAsTable("lflow1")
df2.write.saveAsTable("lesflow2")
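
If the warehouse mismatch is indeed the problem, one workaround might be to save the tables with an explicit path, so any session can locate the data regardless of its spark.sql.warehouse.dir setting. A sketch under that assumption (the file:///C:/tmp locations below are illustrative, not from my original setup):

# Save as unmanaged tables with an explicit location
# (paths are illustrative)
df.write.option("path", "file:///C:/tmp/lflow1").saveAsTable("lflow1")
df2.write.option("path", "file:///C:/tmp/lesflow2").saveAsTable("lesflow2")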

In the PySpark shell I have executed the query:

spark.sql("SELECT lflow1.LeaseType as LeaseType, lflow1.Status as Status, lflow1.Property as property, lflow1.City as City, lesflow2.DealType as DealType, lesflow2.Area as Area, lflow1.Did as DID, lesflow2.MID as MID from lflow1, lesflow2  WHERE lflow1.Did = lesflow2.MID").show()

and the PySpark console displays the expected result.

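For reference, the same query can also be written with explicit JOIN syntax, which should be equivalent to the comma join above:

spark.sql("""SELECT f1.LeaseType AS LeaseType, f1.Status AS Status,
    f1.Property AS property, f1.City AS City,
    f2.DealType AS DealType, f2.Area AS Area,
    f1.Did AS DID, f2.MID AS MID
FROM lflow1 f1
JOIN lesflow2 f2 ON f1.Did = f2.MID""").show()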