I am trying to build a data pipeline on the HDP 2.6.3 sandbox (Docker). I am using PySpark with Phoenix (4.7) and HBase.

I have installed the Phoenix project from Maven and successfully created a table with test records. I can also see the data in HBase.

Now I am trying to read the data from the table with PySpark, using the following code:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="Phoenix test")
sqlContext = SQLContext(sc)

# Read the Phoenix table through the phoenix-spark connector
table = sqlContext.read \
    .format("org.apache.phoenix.spark") \
    .option("table", "INPUT_TABLE") \
    .option("zkUrl", "localhost:2181:/hbase-unsecure") \
    .load()

The Phoenix DDL:

CREATE TABLE INPUT_TABLE (id BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 INTEGER);
UPSERT INTO INPUT_TABLE (id, col1, col2) VALUES (1, 'test_row_1', 111);
UPSERT INTO INPUT_TABLE (id, col1, col2) VALUES (2, 'test_row_2', 111);

The spark-submit call:

spark-submit --class org.apache.phoenix.spark --jars /usr/hdp/current/phoenix-server/phoenix-4.7.0.2.5.0.0-1245-client.jar --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/spark2/conf/hbase-site.xml phoenix_test.py

The traceback:

Traceback (most recent call last):
  File "/root/hdp/process_data.py", line 42, in <module>
    .format(data_source_format) \
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 593, in save
  File "/usr/lib/python2.6/site-packages/py4j-0.10.6-py2.6.egg/py4j/java_gateway.py", line 1160, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/lib/python2.6/site-packages/py4j-0.10.6-py2.6.egg/py4j/protocol.py", line 320, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o55.save.
: java.lang.UnsupportedOperationException: empty.tail

Thanks!