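I have a 3-node cluster with 20 clients, running inside a Spark context. Initially it works fine, but at random times, when a new node (i.e. a client) tries to connect to the cluster, the cluster stops working. When it hangs I see the logs below. If I restart any one of the Ignite servers, the hang is released and everything works again. I am using Ignite 2.4.0. The same problem also occurs with Ignite 2.5.0.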

Client-side log:

Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], node=4d885cfd-45ed-43a2-8088-f35c9469797f]. Dumping pending objects that might be the cause:

GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=[0:0:0:0:0:0:0:1%lo, 10.13.10.179, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, hdn6.mstorm.com/10.13.10.179:0], discPort=0, order=44, intOrder=0, lastExchangeTime=1527651620413, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=false]

Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], node=4d885cfd-45ed-43a2-8088-f35c9469797f]. Dumping pending objects that might be the cause:

GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=[0:0:0:0:0:0:0:1%lo, 10.13.10.179, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, hdn6.mstorm.com/10.13.10.179:0], discPort=0, order=44, intOrder=0, lastExchangeTime=1527651620413, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=false]

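Failed to wait for initial partition map exchange. Possible reasons are:
  ^-- Transactions in deadlock.
  ^-- Long running transactions (ignore if this is the case).
  ^-- Unreleased explicit locks.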

Still waiting for initial partition map exchange [fut=GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=
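To line up the client-side and server-side logs, it may help to pull the topology version, event type, and node ID out of each `GridDhtPartitionsExchangeFuture` dump. The following is only a rough sketch: the function name `parse_exchange_dump` and the regular expressions are my own assumptions based on the shape of the lines above, not anything provided by Ignite, and the sample line is a shortened copy of the dump shown earlier.

```python
import re

# Matches the head of a GridDhtPartitionsExchangeFuture dump as it appears
# in the client log above. The pattern is an assumption derived from those
# lines; other Ignite versions may format the dump differently.
EXCHANGE_RE = re.compile(
    r"GridDhtPartitionsExchangeFuture \[topVer=AffinityTopologyVersion "
    r"\[topVer=(?P<top_ver>\d+), minorTopVer=(?P<minor>\d+)\], "
    r"evt=(?P<evt>\w+), evtNode=TcpDiscoveryNode \[id=(?P<node_id>[0-9a-f-]+)"
)
CLIENT_RE = re.compile(r"isClient=(true|false)")
DONE_RE = re.compile(r"done=(true|false)\]\s*$")


def parse_exchange_dump(line):
    """Return the key fields of an exchange-future dump, or None if the
    line does not look like one."""
    m = EXCHANGE_RE.search(line)
    if m is None:
        return None
    client = CLIENT_RE.search(line)
    done = DONE_RE.search(line)
    return {
        "top_ver": int(m.group("top_ver")),
        "minor_top_ver": int(m.group("minor")),
        "event": m.group("evt"),
        "node_id": m.group("node_id"),
        # None means the field was not present on the line.
        "is_client": (client.group(1) == "true") if client else None,
        "done": (done.group(1) == "true") if done else None,
    }


# Shortened copy of the dump from the client log (address list trimmed).
sample = (
    "GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion "
    "[topVer=44, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode "
    "[id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=[10.13.10.179], "
    "order=44, isClient=true], done=false]"
)
info = parse_exchange_dump(sample)
print(info)
```

Running this over both logs and grouping by `top_ver` and `node_id` should show which join event each pending exchange belongs to.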

Server-side log:

Possible starvation in striped pool.
Thread name: sys-stripe-0-#1
Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtTxPrepareResponse [nearEvicted=null, futId=869dd4ca361-fe7e167d-4d80-4f57-b004-13359a9f2c11, miniId=1, super=GridDistributedTxPrepareResponse [txState=null, part=-1, err=null, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=139084030, order=1527604094903, nodeOrder=1], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=984, val=null, hasValBytes=true], val=BinaryObjectImpl [arr=true, ctx=false, start=0], prevVal=null, super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null, nearFutId=0, flags=]]]], o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@2735c674, Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtTxPrepareRequest [nearNodeId=628e3078-17fd-4e49-b9ae-ad94ad97a2f1, futId=6576e4ca361-6e7cdac2-d5a3-4624-9ad3-b93f25546cc3, miniId=1, topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0], invalidateNearEntries={}, nearWrites=null, owned=null, nearXidVer=GridCacheVersion [topVer=139084030, order=1527604094933, nodeOrder=2], subjId=628e3078-17fd-4e49-b9ae-ad94ad97a2f1, taskNameHash=0, preloadKeys=null, super=GridDistributedTxPrepareRequest [threadId=86, concurrency=OPTIMISTIC, isolation=READ_COMMITTED, writeVer=GridCacheVersion [topVer=139084030, order=1527604094935, nodeOrder=2], timeout=0, reads=null, writes=[IgniteTxEntry [key=BinaryObjectImpl [arr=true, ctx=false, start=0], cacheId=-1755241537, txKey=null, val=[op=UPDATE, val=BinaryObjectImpl [arr=true, ctx=false, start=0]], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=null, filtersPassed=false, filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0, plc=2, txState=null, flags=onePhase | last, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=139084030, order=1527604094933, nodeOrder=2], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=2, arr=[65774,65775]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=1016, val=null, hasValBytes=true], parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, futId=49328, topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=needRes]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1, arr=[98591]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1, arr=[114926]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=1016, val=null, hasValBytes=true], parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, futId=32946, topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0], parent=GridNear