I'm getting an error when running logistic regression from a PySpark script:

from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression

# spark-submit does not provide a SparkSession automatically, so build one here
spark = SparkSession.builder.appName("spark_logistic").getOrCreate()

# raw string so the backslashes in the Windows path are not treated as escapes
training = spark.read.format("libsvm").load(r"D:\gnanasekaran\software\spark-2.0.2-bin-hadoop2.7\data\data\mllib\sample_libsvm_data.txt")

lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)

lrModel = lr.fit(training)

print("Coefficients: " + str(lrModel.coefficients))
print("Intercept: " + str(lrModel.intercept))

mlr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8, family="multinomial")

mlrModel = mlr.fit(training)

print("Multinomial coefficients: " + str(mlrModel.coefficientMatrix))
print("Multinomial intercepts: " + str(mlrModel.interceptVector))

The error:

D:\gnanasekaran\own_work\pyspark>spark-submit spark_logistic.py

2018-03-12 14:39:01 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "D:/gnanasekaran/own_work/pyspark/spark_logistic.py", line 1, in <module>
    from pyspark.ml.classification import LogisticRegression
  File "D:\gnanasekaran\spark-2.3.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\ml\__init__.py", line 22, in <module>
  File "D:\gnanasekaran\spark-2.3.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\ml\base.py", line 24, in <module>
  File "D:\gnanasekaran\spark-2.3.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\ml\param\__init__.py", line 26, in <module>
ModuleNotFoundError: No module named 'numpy'
2018-03-12 14:39:02 INFO ShutdownHookManager:54 - Shutdown hook called
2018-03-12 14:39:02 INFO ShutdownHookManager:54 - Deleting directory C:\Users\gs00497896\AppData\Local\Temp\spark-f6f572f8-5c9d-4428-9992-b2ca2e4a5d78

I downloaded numpy and tried to install it on Windows with:

python setup.py install

but it did not work.
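
For comparison, numpy is more commonly installed on Windows with pip than with setup.py; assuming pip is available for the same Python interpreter that spark-submit launches, the usual command would be:

python -m pip install numpy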