I'm having an issue with my Ignite cluster where clients keep hanging during startup. The cluster runs in k8s and has 3 nodes.

I created a simple cache with a near cache so that I can change its settings and measure the performance impact. Here is the client startup code:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.CacheMode;
    import org.apache.ignite.cache.eviction.lru.LruEvictionPolicy;
    import org.apache.ignite.configuration.CacheConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.configuration.NearCacheConfiguration;
    import org.apache.ignite.events.EventType;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.TcpDiscoveryIpFinder;

    Ignition.setClientMode(true);
    IgniteConfiguration igniteConfiguration = new IgniteConfiguration();
    igniteConfiguration.setIncludeEventTypes(EventType.EVTS_ALL);
    igniteConfiguration.setPeerClassLoadingEnabled(true);
    TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi();
    igniteConfiguration.setDiscoverySpi(tcpDiscoverySpi);
    TcpDiscoveryIpFinder podResolver = getKubePodResolver(); // my own helper, not shown
    tcpDiscoverySpi.setIpFinder(podResolver);
    tcpDiscoverySpi.setJoinTimeout(30000);
    tcpDiscoverySpi.setAckTimeout(30000);
    tcpDiscoverySpi.setSocketTimeout(30000);
    tcpDiscoverySpi.setNetworkTimeout(30000);
    tcpDiscoverySpi.failureDetectionTimeoutEnabled(true);
    try (Ignite ignite = Ignition.start(igniteConfiguration)) {
        ignite.destroyCache("myCache");
        NearCacheConfiguration<Integer, Integer> nearCfg = new NearCacheConfiguration<>();
        nearCfg.setNearEvictionPolicy(new LruEvictionPolicy<>(5000));
        nearCfg.setNearStartSize(5000);
        CacheConfiguration<Integer, Integer> cacheConfiguration = new CacheConfiguration<Integer, Integer>("myCache");
        cacheConfiguration.setOnheapCacheEnabled(false);
        cacheConfiguration.setStatisticsEnabled(true);
        cacheConfiguration.setWriteBehindEnabled(true);
        cacheConfiguration.setCacheMode(CacheMode.PARTITIONED);
        // toggling between PARTITIONED and REPLICATED
        // cacheConfiguration.setCacheMode(CacheMode.REPLICATED);
        cacheConfiguration.setQueryParallelism(3);
        IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache(cacheConfiguration, nearCfg);
        // ... gets/puts that populate up to 10k entries (not shown)
    }

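After creating the cache, I run gets and puts to populate it with up to 10k entries. When I restart the client, it hangs. I can reproduce this reliably just by restarting the client.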
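
The near-cache settings have a direct effect on this workload. `LruEvictionPolicy(5000)` caps the client-side near cache at roughly 5000 entries, so after populating 10k entries only the most recently used half stays near. The model below is a stdlib-only sketch of that LRU behaviour using a `LinkedHashMap` in access order. It is an assumption-labelled stand-in, not Ignite's eviction code, and the class name `NearCacheLruModel` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the client-side near cache (not Ignite code):
// a LinkedHashMap in access order that drops its least-recently-used entry
// once it holds more than maxSize entries.
public class NearCacheLruModel<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public NearCacheLruModel(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true: get() refreshes recency
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict the LRU entry when over the cap.
        return size() > maxSize;
    }

    public static void main(String[] args) {
        // Mirrors the post: an LRU cap of 5000, then 10k puts.
        NearCacheLruModel<Integer, Integer> near = new NearCacheLruModel<>(5000);
        for (int i = 0; i < 10_000; i++) {
            near.put(i, i);
        }
        System.out.println("entries kept: " + near.size()); // 5000
        System.out.println("oldest key kept: " + near.keySet().iterator().next()); // 5000
    }
}
```

Access order (rather than insertion order) is what makes this LRU: a `get()` moves the entry to the "young" end, so frequently read keys survive eviction.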

Taking a thread dump on the client, I see the main thread hanging on a future, along with these related threads:

"main" #1 prio=5 os_prio=0 tid=0x00007fc03800b800 nid=0x6 waiting on condition [0x00007fc04139d000]
       java.lang.Thread.State: TIMED_WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
            at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:217)
            at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:159)
            at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:151)
            at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.onKernalStart(GridCachePartitionExchangeManager.java:595)
            at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStart(GridCacheProcessor.java:769)
            at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1060)
            at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1909)
            at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1652)
            - locked <0x0000000086b27728> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
            at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1080)
            at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:600)
            at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:525)
            at org.apache.ignite.Ignition.start(Ignition.java:322)
            ...

    "exchange-worker-#35" #60 prio=5 os_prio=0 tid=0x00007fc039093000 nid=0x42 waiting on condition [0x00007fbfe3bfc000]
       java.lang.Thread.State: TIMED_WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
            at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:217)
            at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:159)
            at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2289)
            at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
            at java.lang.Thread.run(Thread.java:748)

    "disco-event-worker-#34" #57 prio=5 os_prio=0 tid=0x00007fc0388ad000 nid=0x3f waiting on condition [0x00007fbff013b000]
       java.lang.Thread.State: WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for  <0x0000000086556ea0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
            at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
            at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2552)
            at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2534)
            at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
            at java.lang.Thread.run(Thread.java:748)
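
In the dump, `main` is parked in a timed `GridFutureAdapter.get()` called from `GridCachePartitionExchangeManager.onKernalStart`. In other words, `Ignition.start()` cannot return until the initial partition map exchange future completes. The sketch below is a rough mental model only, built from the standard library rather than Ignite's implementation. The class name, node IDs, timeouts and the bounded retry count are all hypothetical. The future completes only once every expected node has replied, and a caller doing timed waits simply keeps logging until then.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model (not Ignite code): the waiting thread does a timed get()
// on a future that completes only after every expected node has replied.
public class ExchangeWaitModel {
    private final Set<String> remaining = ConcurrentHashMap.newKeySet();
    private final CompletableFuture<Void> done = new CompletableFuture<>();

    public ExchangeWaitModel(Collection<String> expectedNodes) {
        remaining.addAll(expectedNodes);
        if (remaining.isEmpty()) {
            done.complete(null);
        }
    }

    // A reply from one node; the future completes when the last one arrives.
    public void onReply(String nodeId) {
        if (remaining.remove(nodeId) && remaining.isEmpty()) {
            done.complete(null);
        }
    }

    // Timed wait that prints a warning per timeout. Bounded here so the demo
    // ends; the client in the post keeps waiting instead.
    public boolean awaitWithWarnings(long timeoutMs, int maxWarnings) {
        for (int i = 0; i < maxWarnings; i++) {
            try {
                done.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                System.out.println("Still waiting for initial partition map exchange [remaining=" + remaining + "]");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ExchangeWaitModel fut = new ExchangeWaitModel(Arrays.asList("node-a", "node-b", "node-c"));
        fut.onReply("node-a");
        fut.onReply("node-b");
        // "node-c" never replies: three warnings, then "completed: false".
        System.out.println("completed: " + fut.awaitWithWarnings(100, 3));
    }
}
```

In this model a single silent node is enough to keep the waiting thread parked indefinitely.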

With Ignite debug logging turned on, here is sample output from the client and from the server cluster:

Client:

[02:48:29,653][WARNING][main][GridCachePartitionExchangeManager] Still waiting for initial partition map exchange [fut=GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=c26a231d-027a-49c0-8d64-7d5c92be0c7a, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.16.102.6], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, near-test-5d9699b96f-lzsbx/172.16.102.6:0], discPort=0, order=35, intOrder=0, lastExchangeTime=1514342845847, loc=true, ver=2.3.0#20171028-sha1:8add7fd5, isClient=true], topVer=35, nodeId8=c26a231d, msg=null, type=NODE_JOINED, tstamp=1514342848946], crd=TcpDiscoveryNode [id=cc754ef0-a004-40c9-985f-f43b2df66e39, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.16.83.5], sockAddrs=[/172.16.83.5:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1514342846946, loc=false, ver=2.3.0#20171028-sha1:8add7fd5, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=35, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=c26a231d-027a-49c0-8d64-7d5c92be0c7a, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.16.102.6], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, near-test-5d9699b96f-lzsbx/172.16.102.6:0], discPort=0, order=35, intOrder=0, lastExchangeTime=1514342845847, loc=true, ver=2.3.0#20171028-sha1:8add7fd5, isClient=true], topVer=35, nodeId8=c26a231d, msg=null, type=NODE_JOINED, tstamp=1514342848946], nodeId=c26a231d, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=2111815064], init=true, lastVer=null, partReleaseFut=null, exchActions=null, affChangeMsg=null, initTs=1514342849646, centralizedAff=false, changeGlobalStateE=null, done=false, state=CLIENT, evtLatch=0, remaining=[b1581d62-f72f-4a56-93b6-babd364cc695, cc754ef0-a004-40c9-985f-f43b2df66e39, 8f840e6f-c40d-46fd-8476-06793c25d329], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=353891789]]]
    [02:48:35,555][WARNING][exchange-worker-#35][diagnostic] Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=35, minorTopVer=0], node=c26a231d-027a-49c0-8d64-7d5c92be0c7a]. Dumping pending objects that might be the cause:
    [02:48:45,555][WARNING][exchange-worker-#35][diagnostic] Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=35, minorTopVer=0], node=c26a231d-027a-49c0-8d64-7d5c92be0c7a]. Dumping pending objects that might be the cause:
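
The long warning line mixes epoch-millisecond fields (`tstamp`, `initTs`, `lastExchangeTime`) with wall-clock log prefixes. Converting the epoch fields makes the timeline easier to follow. The client's NODE_JOINED event and the exchange init fall at about 02:47:28 to 02:47:29, so the first "Still waiting" warning at 02:48:29 fires roughly 60 s after the exchange started. The small `java.time` sketch below shows the conversion. It assumes the log prefix is UTC on 2017-12-27, because the client log shows neither a date nor a time zone. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Lines up epoch-millisecond fields from the client log with wall-clock times.
// Assumption: the "[02:48:29,653]" prefix is UTC on 2017-12-27 (not stated in the post).
public class LogTimestamps {
    // Epoch milliseconds (e.g. tstamp=..., initTs=...) -> UTC instant.
    static Instant toInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    // Milliseconds from an epoch-millisecond field to an ISO-8601 UTC time.
    static long millisUntil(long epochMillis, String isoUtc) {
        return Duration.between(toInstant(epochMillis), Instant.parse(isoUtc)).toMillis();
    }

    public static void main(String[] args) {
        System.out.println("NODE_JOINED tstamp: " + toInstant(1514342848946L)); // 2017-12-27T02:47:28.946Z
        System.out.println("exchange initTs:    " + toInstant(1514342849646L)); // 2017-12-27T02:47:29.646Z
        // First "Still waiting" warning, measured from initTs.
        System.out.println("waited ms: " + millisUntil(1514342849646L, "2017-12-27T02:48:29.653Z")); // 60007
    }
}
```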

One of the server nodes:

2017-12-27 02:46:52,298 ignite-8df95c79b-bbtvx ignite: [priority='INFO' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.IgniteKernal@463']
    Metrics for local node (to disable set 'metricsLogFrequency' to 0)
        ^-- Node [id=8f840e6f, uptime=04:43:04.048]
        ^-- H/N/C [hosts=5, nodes=5, CPUs=10]
        ^-- CPU [cur=0.5%, avg=3.3%, GC=0%]
        ^-- PageMemory [pages=1024]
        ^-- Heap [used=400MB, free=80.36%, comm=2041MB]
        ^-- Non heap [used=72MB, free=-1%, comm=74MB]
        ^-- Public thread pool [active=0, idle=0, qSize=0]
        ^-- System thread pool [active=0, idle=6, qSize=0]
        ^-- Outbound messages queue [size=0]
    2017-12-27 02:46:52,298 ignite-8df95c79b-bbtvx ignite: [priority='INFO' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.IgniteKernal@463'] FreeList [name=null, buckets=256, dataPages=9, reusePages=638]
    2017-12-27 02:46:52,298 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor@452'] Timeout has occurred [obj=CancelableTask [id=9b979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342812287, period=60000, cancel=false, task=org.apache.ignite.internal.IgniteKernal$4@6cb224d], process=true]
    2017-12-27 02:46:52,656 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor@452'] Timeout has occurred [obj=CancelableTask [id=a6979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342812652, period=3000, cancel=false, task=org.apache.ignite.internal.processors.query.GridQueryProcessor$2@3f625e1a], process=true]
    2017-12-27 02:46:53,887 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor@452'] Timeout has occurred [obj=CancelableTask [id=c6979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342813885, period=3000, cancel=false, task=MetricsUpdater [prevGcTime=2117, prevCpuTime=578225, super=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$MetricsUpdater@24c52bbf]], process=true]
    2017-12-27 02:46:54,481 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor@452'] Timeout has occurred [obj=CancelableTask [id=69979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342814475, period=5000, cancel=false, task=org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager$BackupCleaner@2032f1ff], process=true]
    2017-12-27 02:46:54,481 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor@452'] Timeout has occurred [obj=CancelableTask [id=dba79d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342814475, period=5000, cancel=false, task=org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager$BackupCleaner@7c995f6b], process=true]
    2017-12-27 02:46:54,776 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor@452'] Timeout has occurred [obj=CancelableTask [id=ea829359061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342814774, period=5000, cancel=false, task=org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager$BackupCleaner@76ae058], process=true]
    2017-12-27 02:46:54,845 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor@452'] Timeout has occurred [obj=CancelableTask [id=7b979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342814839, period=30000, cancel=false, task=org.apache.ignite.internal.IgniteKernal$2@1905ce7], process=true]
    2017-12-27 02:46:54,908 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor@452'] Timeout has occurred [obj=GridCommunicationMessageSet [nodeId=cc754ef0-a004-40c9-985f-f43b2df66e39, endTime=1514342814898, timeoutId=5b979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, topic=T6 [topic=TOPIC_CACHE, id1=83e8ca36-2305-3266-8e65-1463be879baa, id2=0], plc=5, msgs=[], reserved=false, timeout=10000, skipOnTimeout=false, lastTs=1514325827646], process=true]
    2017-12-27 02:46:55,664 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor@452'] Timeout has occurred [obj=CancelableTask [id=a6979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342815654, period=3000, cancel=false, task=org.apache.ignite.internal.processors.query.GridQueryProcessor$2@3f625e1a], process=true]
    2017-12-27 02:46:55,823 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='nio-acceptor-#29' class='org.apache.ignite.internal.processors.odbc.ClientListenerProcessor@452'] Balancing data [min0=0, minIdx=0, max0=-1, maxIdx=-1]
    2017-12-27 02:46:56,871 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='nio-acceptor-#33' class='org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestProtocol@452'] Balancing data [min0=0, minIdx=0, max0=-1, maxIdx=-1]
    2017-12-27 02:46:56,897 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor@452'] Timeout has occurred [obj=CancelableTask [id=c6979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342816887, period=3000, cancel=false, task=MetricsUpdater [prevGcTime=2117, prevCpuTime=578240, super=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$MetricsUpdater@24c52bbf]], process=true]
    2017-12-27 02:46:56,951 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='nio-acceptor-#24' class='org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi@452'] Balancing data [min0=0, minIdx=0, max0=-1, maxIdx=-1]
    2017-12-27 02:46:56,984 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor@452'] Timeout has occurred [obj=org.apache.ignite.internal.processors.cache.GridCacheProcessor$RemovedItemsCleanupTask@67741cd0, process=true]
    2017-12-27 02:46:56,984 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore@452'] Deployment meta for local deployment: GridDeploymentMetadata [depMode=SHARED, alias=org.apache.ignite.internal.processors.cache.GridCacheProcessor$RemovedItemsCleanupTask$1, clsName=org.apache.ignite.internal.processors.cache.GridCacheProcessor$RemovedItemsCleanupTask$1, userVer=null, sndNodeId=8f840e6f-c40d-46fd-8476-06793c25d329, clsLdrId=null, clsLdr=null, participants=null, parentLdr=null, record=true, nodeFilter=null, seqNum=n/a]
    2017-12-27 02:46:56,985 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore@452'] Acquired deployment class from local cache: GridDeployment [ts=1514325826446, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@764c12b6, clsLdrId=8a979d49061-8f840e6f-c40d-46fd-8476-06793c25d329, userVer=0, loc=true, sampleClsName=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, pendingUndeploy=false, undeployed=false, usage=0]

Any idea what's going on here?