I am new to Spark. I have a local cluster set up with a master (192.168.33.10) and a slave (192.168.33.12), and I wrote the script below to demonstrate that get_ip_wrap() runs on both the master and the slave, each on its own machine.

However, when I run it with ./bin/spark-submit ip.py, I only see 192.168.33.10 in the output; I expected 192.168.33.12 to appear in the output as well.

I have also included the traces from my master and worker output files.

import socket
import fcntl
import struct
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

def get_ip_address(ifname):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    return socket.inet_ntoa(fcntl.ioctl(
        s.fileno(),
        0x8915,  # SIOCGIFADDR                                                                         
        struct.pack('256s', ifname[:15])
    )[20:24])

def get_ip_wrap(num):
    return get_ip_address('eth1')

#spark = SparkSession\                                                                                 
#        .builder\                                                                                     
#        .appName("PythonALS")\                                                                        
#        .getOrCreate()                                                                                
#sc = spark.sparkContext                                                                               

conf = SparkConf().setAppName('appName').setMaster('spark://vagrant-ubuntu-trusty-64:7077')
sc = SparkContext(conf=conf)

data = [x for x in range(0, 50)]
distData = sc.parallelize(data)

result = distData.map(get_ip_wrap)
print result.collect()
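
For reference, here is a smaller diagnostic sketch (not the script above; the get_host helper and the app name are placeholders I made up) that reports the hostname and resolved IP per task instead of reading eth1 via ioctl, and uses more partitions so the tasks are not all packed onto a single executor:

import socket
from pyspark import SparkContext, SparkConf

def get_host(_):
    # hostname lookup works on any worker, without depending on an eth1 interface
    return (socket.gethostname(), socket.gethostbyname(socket.gethostname()))

conf = SparkConf().setAppName('whoRunsMe').setMaster('spark://vagrant-ubuntu-trusty-64:7077')
sc = SparkContext(conf=conf)

# 10 partitions over 50 elements, then deduplicate the (hostname, ip) pairs
hosts = sc.parallelize(range(0, 50), 10).map(get_host).distinct().collect()
print hosts
sc.stop()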

vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ ./sbin/start-master.sh

starting org.apache.spark.deploy.master.Master, logging to /home/vagrant/spark-2.1.1-bin-hadoop2.7/logs/spark-vagrant-org.apache.spark.deploy.master.Master-1-vagrant-ubuntu-trusty-64.out

vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ ./sbin/start-slave.sh spark://vagrant-ubuntu-trusty-64:7077

starting org.apache.spark.deploy.worker.Worker, logging to /home/vagrant/spark-2.1.1-bin-hadoop2.7/logs/spark-vagrant-org.apache.spark.deploy.worker.Worker-1-vagrant-ubuntu-trusty-64.out

vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ ./bin/spark-submit ip.py

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

17/05/27 17:08:09 INFO SparkContext: Running Spark version 2.1.1

17/05/27 17:08:09 WARN SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0

17/05/27 17:08:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

17/05/27 17:08:10 INFO SecurityManager: Changing view acls to: vagrant

17/05/27 17:08:10 INFO SecurityManager: Changing modify acls to: vagrant

17/05/27 17:08:10 INFO SecurityManager: Changing view acls groups to:

17/05/27 17:08:10 INFO SecurityManager: Changing modify acls groups to:

17/05/27 17:08:10 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vagrant); groups with view permissions: Set(); users with modify permissions: Set(vagrant); groups with modify permissions: Set()

17/05/27 17:08:10 INFO Utils: Successfully started service 'sparkDriver' on port 59290.

17/05/27 17:08:10 INFO SparkEnv: Registering MapOutputTracker

17/05/27 17:08:10 INFO SparkEnv: Registering BlockManagerMaster

17/05/27 17:08:10 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information

17/05/27 17:08:10 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up

17/05/27 17:08:10 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ad008702-6e92-4e60-ab27-a582b1ba9fb9

17/05/27 17:08:10 INFO MemoryStore: MemoryStore started with capacity 413.9 MB

17/05/27 17:08:11 INFO SparkEnv: Registering OutputCommitCoordinator

17/05/27 17:08:11 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

17/05/27 17:08:11 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.

17/05/27 17:08:11 INFO Utils: Successfully started service 'SparkUI' on port 4042.

17/05/27 17:08:11 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4042

17/05/27 17:08:11 INFO SparkContext: Added file file:/home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py at spark://10.0.2.15:59290/files/ip.py with timestamp 1495904891756

17/05/27 17:08:11 INFO Utils: Copying /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py to /tmp/spark-5400808c-1304-404d-ae53-dc6cdb14694f/userFiles-dc94d72e-15d3-4d84-87b9-27e87dcb0f6a/ip.py

17/05/27 17:08:11 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://vagrant-ubuntu-trusty-64:7077...

17/05/27 17:08:11 INFO TransportClientFactory: Successfully created connection to vagrant-ubuntu-trusty-64/10.0.2.15:7077 after 20 ms (0 ms spent in bootstraps)

17/05/27 17:08:12 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20170527170812-0000

17/05/27 17:08:12 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 53124.

17/05/27 17:08:12 INFO NettyBlockTransferService: Server created on 10.0.2.15:53124

17/05/27 17:08:12 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy

17/05/27 17:08:12 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 53124, None)

17/05/27 17:08:12 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20170527170812-0000/0 on worker-20170527170800-10.0.2.15-54829 (10.0.2.15:54829) with 1 cores

17/05/27 17:08:12 INFO StandaloneSchedulerBackend: Granted executor ID app-20170527170812-0000/0 on hostPort 10.0.2.15:54829 with 1 cores, 1024.0 MB RAM

17/05/27 17:08:12 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:53124 with 413.9 MB RAM, BlockManagerId(driver, 10.0.2.15, 53124, None)

17/05/27 17:08:12 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 53124, None)

17/05/27 17:08:12 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 53124, None)

17/05/27 17:08:12 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20170527170812-0000/0 is now RUNNING

17/05/27 17:08:12 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0

17/05/27 17:08:13 INFO SparkContext: Starting job: collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31

17/05/27 17:08:13 INFO DAGScheduler: Got job 0 (collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31) with 2 output partitions

17/05/27 17:08:13 INFO DAGScheduler: Final stage: ResultStage 0 (collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31)

17/05/27 17:08:13 INFO DAGScheduler: Parents of final stage: List()

17/05/27 17:08:13 INFO DAGScheduler: Missing parents: List()

17/05/27 17:08:13 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31), which has no missing parents

17/05/27 17:08:13 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.1 KB, free 413.9 MB)

17/05/27 17:08:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.8 KB, free 413.9 MB)

17/05/27 17:08:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:53124 (size: 2.8 KB, free: 413.9 MB)

17/05/27 17:08:13 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996

17/05/27 17:08:13 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[1] at collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31)

17/05/27 17:08:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks

17/05/27 17:08:15 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.0.2.15:40762) with ID 0

17/05/27 17:08:15 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.0.2.15, executor 0, partition 0, PROCESS_LOCAL, 6136 bytes)

17/05/27 17:08:15 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:33949 with 413.9 MB RAM, BlockManagerId(0, 10.0.2.15, 33949, None)

17/05/27 17:08:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:33949 (size: 2.8 KB, free: 413.9 MB)

17/05/27 17:08:16 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.0.2.15, executor 0, partition 1, PROCESS_LOCAL, 6136 bytes)

17/05/27 17:08:16 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1050 ms on 10.0.2.15 (executor 0) (1/2)

17/05/27 17:08:16 INFO DAGScheduler: ResultStage 0 (collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31) finished in 2.504 s

17/05/27 17:08:16 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 119 ms on 10.0.2.15 (executor 0) (2/2)

17/05/27 17:08:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool

17/05/27 17:08:16 INFO DAGScheduler: Job 0 finished: collect at /home/vagrant/spark-2.1.1-bin-hadoop2.7/ip.py:31, took 2.981746 s

['192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10', '192.168.33.10']

17/05/27 17:08:16 INFO SparkContext: Invoking stop() from shutdown hook

17/05/27 17:08:16 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4042

17/05/27 17:08:16 INFO StandaloneSchedulerBackend: Shutting down all executors

17/05/27 17:08:16 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down

17/05/27 17:08:16 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!

17/05/27 17:08:16 INFO MemoryStore: MemoryStore cleared

17/05/27 17:08:16 INFO BlockManager: BlockManager stopped

17/05/27 17:08:16 INFO BlockManagerMaster: BlockManagerMaster stopped

17/05/27 17:08:16 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!

17/05/27 17:08:16 INFO SparkContext: Successfully stopped SparkContext

17/05/27 17:08:16 INFO ShutdownHookManager: Shutdown hook called

17/05/27 17:08:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-5400808c-1304-404d-ae53-dc6cdb14694f/pyspark-021d6ed2-91d0-481b-b528-108581abe66c

17/05/27 17:08:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-5400808c-1304-404d-ae53-dc6cdb14694f

vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$ cat /home/vagrant/spark-2.1.1-bin-hadoop2.7/logs/spark-vagrant-org.apache.spark.deploy.master.Master-1-vagrant-ubuntu-trusty-64.out

Spark Command: /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java -cp /home/vagrant/spark-2.1.1-bin-hadoop2.7/conf/:/home/vagrant/spark-2.1.1-bin-hadoop2.7/jars/* -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --host vagrant-ubuntu-trusty-64 --port 7077 --webui-port 8080

========================================

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

17/05/27 17:07:44 INFO Master: Started daemon with process name: 9384@vagrant-ubuntu-trusty-64

17/05/27 17:07:44 INFO SignalUtils: Registered signal handler for TERM

17/05/27 17:07:44 INFO SignalUtils: Registered signal handler for HUP

17/05/27 17:07:44 INFO SignalUtils: Registered signal handler for INT

17/05/27 17:07:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

17/05/27 17:07:45 INFO SecurityManager: Changing view acls to: vagrant

17/05/27 17:07:45 INFO SecurityManager: Changing modify acls to: vagrant

17/05/27 17:07:45 INFO SecurityManager: Changing view acls groups to:

17/05/27 17:07:45 INFO SecurityManager: Changing modify acls groups to:

17/05/27 17:07:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vagrant); groups with view permissions: Set(); users with modify permissions: Set(vagrant); groups with modify permissions: Set()

17/05/27 17:07:45 INFO Utils: Successfully started service 'sparkMaster' on port 7077.

17/05/27 17:07:45 INFO Master: Starting Spark master at spark://vagrant-ubuntu-trusty-64:7077

17/05/27 17:07:45 INFO Master: Running Spark version 2.1.1

17/05/27 17:07:45 INFO Utils: Successfully started service 'MasterUI' on port 8080.

17/05/27 17:07:45 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://10.0.2.15:8080

17/05/27 17:07:45 INFO Utils: Successfully started service on port 6066.

17/05/27 17:07:45 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066

17/05/27 17:07:46 INFO Master: I have been elected leader! New state: ALIVE

17/05/27 17:08:00 INFO Master: Registering worker 10.0.2.15:54829 with 1 cores, 2.8 GB RAM

17/05/27 17:08:12 INFO Master: Registering app appName

17/05/27 17:08:12 INFO Master: Registered app appName with ID app-20170527170812-0000

17/05/27 17:08:12 INFO Master: Launching executor app-20170527170812-0000/0 on worker worker-20170527170800-10.0.2.15-54829

17/05/27 17:08:16 INFO Master: Received unregister request from application app-20170527170812-0000

17/05/27 17:08:16 INFO Master: Removing app app-20170527170812-0000

17/05/27 17:08:16 INFO Master: 10.0.2.15:51703 got disassociated, removing it.

17/05/27 17:08:16 INFO Master: 10.0.2.15:59290 got disassociated, removing it.

17/05/27 17:08:16 WARN Master: Got status update for unknown executor app-20170527170812-0000/0

vagrant@vagrant-ubuntu-trusty-64:~/spark-2.1.1-bin-hadoop2.7$