我们有..一个GC..issue,VM将在哪里运行一个很长的GC(1秒......我们的SLA要求<200ms) .

(这是在Java 8上,我们已经从11版本到最新版本看到了这一点) .

我们的GC设置atm:

-XX:+PrintStringTableStatistics
-XX:+PrintAdaptiveSizePolicy
-XX:+ParallelRefProcEnabled
-XX:+CMSIncrementalMode
-XX:+CMSIncrementalPacing
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms3000M
-Xmx3000M
-Xmn1500M
-XX:+HeapDumpOnOutOfMemoryError
-Xss256k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseTLAB
-XX:+PerfDisableSharedMem
-XX:CompileCommandFile=./../conf/hotspot_compiler
-XX:CMSWaitDuration=10000
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:CMSWaitDuration=10000
-XX:+UseCondCardMark
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-Xloggc:/opt/openmarket/sqs/logs/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=100M

我们尝试过G1,但情况更糟,所以我们“坚持”CMS atm .

无论如何,这是GC日志中的片段 .

应该注意的是,在大约24小时的运行中,它执行了大约4个,其中2个......是'长'..(事实上我们在过去,12秒内看到更长的时间!!)

有一点我不知道..为什么我们看到 - >擦洗字符串表,0.6001941秒花了这么长时间,我们没有'实习'我们的很多字符串,启用-XX:PrintStringTableStatistics显示没有有趣的'活动,从我们可以告诉我们总共有1000个实习字符串,所以这不应该是因素 .

gc日志的高级摘要可以在这里看到:

http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMDcvMjYvLS1ub2RlLTAwMS4zLS0xNS0yMi0zNA==

2018-07-25T18:15:52.608+0000: 30.543: Total time for which application threads were stopped: 0.0046391 seconds
 CMS: abort preclean due to time 2018-07-25T18:15:53.553+0000: 31.488: [CMS-concurrent-abortable-preclean: 0.326/20.683 secs] [Times: user=25.25 sys=3.89, real=20.68 secs] 
2018-07-25T18:15:53.557+0000: 31.492: [GC (CMS Final Remark) [YG occupancy: 588025 K (1382400 K)]31.492: [Rescan (parallel) , 0.0559721 secs]31.548: [weak refs processing, 0.0035595 secs]31.552: [class unloading, 0.0409634 secs]31.593: [scrub symbol table, 0.0051572 secs]31.598: [scrub string table, 0.0016212 secs][1 CMS-remark: 12246K(1536000K)] 600272K(2918400K), 0.1150145 secs] [Times: user=0.34 sys=0.00, real=0.11 secs] 
2018-07-25T18:15:53.673+0000: 31.608: Total time for which application threads were stopped: 0.1195510 seconds
2018-07-25T18:15:53.673+0000: 31.608: [CMS-concurrent-sweep-start]
--
2018-07-25T18:18:45.477+0000: 203.412: Total time for which application threads were stopped: 0.0044451 seconds
 CMS: abort preclean due to time 2018-07-25T18:18:51.809+0000: 209.743: [CMS-concurrent-abortable-preclean: 3.023/67.120 secs] [Times: user=18.52 sys=6.47, real=67.11 secs] 
2018-07-25T18:18:51.812+0000: 209.747: [GC (CMS Final Remark) [YG occupancy: 595711 K (1382400 K)]209.747: [Rescan (parallel) , 0.1225090 secs]209.870: [weak refs processing, 0.0048343 secs]209.875: [class unloading, 0.0408876 secs]209.916: [scrub symbol table, 0.0047264 secs]209.920: [scrub string table, 0.0042791 secs][1 CMS-remark: 54877K(1536000K)] 650589K(2918400K), 0.1843075 secs] [Times: user=0.54 sys=0.00, real=0.18 secs] 
2018-07-25T18:18:51.997+0000: 209.932: Total time for which application threads were stopped: 0.1885355 seconds
2018-07-25T18:18:52.002+0000: 209.937: Total time for which application threads were stopped: 0.0041075 seconds
--
2018-07-26T00:06:31.084+0000: 21069.019: Total time for which application threads were stopped: 0.0033592 seconds
 CMS: abort preclean due to time 2018-07-26T00:06:56.329+0000: 21094.264: [CMS-concurrent-abortable-preclean: 3.400/86.919 secs] [Times: user=15.48 sys=7.48, real=86.91 secs] 
2018-07-26T00:06:56.339+0000: 21094.274: [GC (CMS Final Remark) [YG occupancy: 604677 K (1382400 K)]21094.274: [Rescan (parallel) , 0.1252722 secs]21094.399: [weak refs processing, 0.0084571 secs]21094.408: [class unloading, 0.1359008 secs]21094.544: [scrub symbol table, 0.0037596 secs]21094.548: [scrub string table, 0.5171769 secs][1 CMS-remark: 1274484K(1536000K)] 1879161K(2918400K), 0.7978841 secs] [Times: user=1.13 sys=0.00, real=0.80 secs] 
2018-07-26T00:06:57.138+0000: 21095.072: Total time for which application threads were stopped: 0.8018055 seconds
2018-07-26T00:06:57.138+0000: 21095.073: [CMS-concurrent-sweep-start]
--
2018-07-26T09:05:41.291+0000: 53419.226: Total time for which application threads were stopped: 0.0053130 seconds
 CMS: abort preclean due to time 2018-07-26T09:06:08.087+0000: 53446.021: [CMS-concurrent-abortable-preclean: 2.304/99.556 secs] [Times: user=15.49 sys=8.46, real=99.54 secs] 
2018-07-26T09:06:08.090+0000: 53446.025: [GC (CMS Final Remark) [YG occupancy: 605003 K (1382400 K)]53446.025: [Rescan (parallel) , 0.1410524 secs]53446.166: [weak refs processing, 0.0134749 secs]53446.180: [class unloading, 1.0563894 secs]53447.236: [scrub symbol table, 0.0137832 secs]53447.250: [scrub string table, 0.6001941 secs][1 CMS-remark: 1431566K(1536000K)] 2036569K(2918400K), 1.8335188 secs] [Times: user=1.28 sys=0.00, real=1.83 secs] 
2018-07-26T09:06:09.924+0000: 53447.859: Total time for which application threads were stopped: 1.8372996 seconds

更新:这里是不同运行的结果,标记较少(但基本相同或更差)(ON A侧节点..我们一直在使用parallelOld测试,看起来很有希望)

标志:

-XX:+CMSClassUnloadingEnabled
-Xms3000M
-Xmx3000M
-Xmn1500M
-XX:+HeapDumpOnOutOfMemoryError
-Xss256k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=10
-XX:TargetSurvivorRatio=90
-XX:+UseTLAB
-XX:+UseCondCardMark
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-Xloggc:/opt/openmarket/sqs/logs/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=100M

GC日志摘要:

018-07-29T09:22:01.752 0000:38922.964:[CMS-concurrent-abortable-preclean-start] CMS:因时间而中止预清洗2018-07-29T09:22:06.888 0000:38928.100:[CMS-concurrent-abortable -preclean:5.118 / 5.136 secs] [时间:用户= 5.09 sys = 0.10,实际= 5.14秒] 2018-07-29T09:22:06.890 0000:38928.102:[GC(CMS最终备注)[YG占用率:184100 K( 1382400 K)] 2018-07-29T09:22:06.890 0000:38928.102:[重新扫描(平行),0.0609614秒] 2018-07-29T09:22:06.951 0000:38928.163:[弱参考处理,0.0026624秒] 2018-07 -29T09:22:06.954 0000:38928.166:[类卸载,0.0369107秒] 2018-07-29T09:22:06.991 0000:38928.203:[擦洗符号表,0.0024052秒] 2018-07-29T09:22:06.993 0000:38928.205 :[scrub strings table,1.0810044 secs] [1 CMS-remark:1414399K(1536000K)] 1598500K(2918400K),1.1918919 secs] [次:用户= 1.17 sys = 0.00,real = 1.19 secs] 2018-07-29T09:22 :08.082 0000:38929.294:应用程序线程停止的总时间:1.1943563秒,停止线程:0.0001797秒

2018-07-29T18:52:55.303 0000:73176.515:[CMS-concurrent-abortable-preclean-start] CMS:因时间而中止预清2018-07-29T18:53:00.352 0000:73181.564:[CMS-并发 - 可中止-preclean:5.029 / 5.049秒] [时间:用户= 5.83 sys = 0.31,实际= 5.05秒] 2018-07-29T18:53:00.355 0000:73181.567:[GC(CMS最终备注)[YG占用率:185778 K( 1382400 K)] 2018-07-29T18:53:00.355 0000:73181.567:[重新扫描(平行),0.0650261秒] 2018-07-29T18:53:00.420 0000:73181.632:[弱参考处理,0.0025795秒] 2018-07 -29T18:53:00.423 0000:73181.635:[类卸载,0.0257172秒] 2018-07-29T18:53:00.448 0000:73181.660:[擦洗符号表,0.0026111秒] 2018-07-29T18:53:00.451 0000:73181.663 :[scrub string table,0.9664343 secs] [1 CMS-remark:1414468K(1536000K)] 1600246K(2918400K),1.0629217 secs] [次:用户= 1.11 sys = 0.00,real = 1.06 secs] 2018-07-29T18:53 :01.418 0000:73182.630:应用程序线程停止的总时间:1.0657259秒,停止线程:0.0002660秒

2018-07-30T12:30:12.498 0000:136613.710:[CMS-concurrent-abortable-preclean-start] CMS:因时间而中止预清洗2018-07-30T12:30:17.698 0000:136618.910:[CMS-concurrent-abortable -preclean:5.185 / 5.200 secs] [时间:用户= 5.72 sys = 0.20,实际= 5.20秒] 2018-07-30T12:30:17.701 0000:136618.913:[GC(CMS最终备注)[YG占用率:179068 K( 1382400 K)] 2018-07-30T12:30:17.701 0000:136618.913:[重新扫描(平行),0.0632593秒] 2018-07-30T12:30:17.764 0000:136618.976:[弱参考处理,0.0022151秒] 2018-07 -30T12:30:17.767 0000:136618.979:[类卸载,0.1576453秒] 2018-07-30T12:30:17.924 0000:136619.136:[擦洗符号表,0.0043250秒] 2018-07-30T12:30:17.929 0000:136619.141 :[scrub strings table,1.0465825 secs] [1 CMS-remark:1414152K(1536000K)] 1593220K(2918400K),1.2744338 secs] [次:用户= 1.21 sys = 0.00,real = 1.28 secs] 2018-07-30T12:30 :18.976 0000:136620.188:应用程序线程停止的总时间:1.2772222秒,停止线程: 0.0001826秒