Cassandra G1 GC使用,案例
参考文档: https://medium.com/@mlowicki/move-cassandra-2-1-to-g1-garbage-collector-b9fb27365509#.7ugg78nq7
#MAX_HEAP_SIZE="8192M"
#HEAP_NEWSIZE="2048M"
MAX_HEAP_SIZE="16384M"
#calculate_heap_sizes函数定义的地方可以不用删除,只要调用的地方去掉就可以
#if [ "x$MAX_HEAP_SIZE" = "x" ] && [ "x$HEAP_NEWSIZE" = "x" ]; then
# calculate_heap_sizes
#else
# if [ "x$MAX_HEAP_SIZE" = "x" ] || [ "x$HEAP_NEWSIZE" = "x" ]; then
# echo "please set or unset MAX_HEAP_SIZE and HEAP_NEWSIZE in pairs (see cassandra-env.sh)"
# exit 1
# fi
#fi
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
#JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}" #G1不需要配置New
#JVM_OPTS="$JVM_OPTS -Xms16G" #这些原本是不存在的,都去掉
#JVM_OPTS="$JVM_OPTS -Xmx16G"
#JVM_OPTS="$JVM_OPTS -Xmn4G"
#JVM_OPTS="$JVM_OPTS -XX:MaxDirectMemorySize=6G"
#JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
#JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
#JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
#JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
#JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
#JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
#JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
#JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:CompileCommandFile=$CASSANDRA_CONF/hotspot_compiler"
#JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=10000"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
## note: bash evals ‘1.7.x’ as > ‘1.7’ so this is really a >= 1.7 jvm check
#if { [ "$JVM_VERSION" /> "1.7" ] && [ "$JVM_VERSION" /< "1.8.0" ] && [ "$JVM_PATCH_VERSION" -ge "60" ]; } || [ "$JVM_VERSION" /> "1.8" ] ; then
# JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways -XX:CMSWaitDuration=10000"
#fi
cassandra.yaml需要修改的配置:
table_allocation_type: offheap_objects
memtable_offheap_space_in_mb: 2048
CMS切换为G1脚本:
cd /usr/install/cassandra/conf
mv cassandra-env.sh /usr/install/cassandra/conf/cassandra-env.sh.cms_20160621
host=`ifconfig | grep "192.168.4" | awk '/inet addr/{sub("addr:",""); print $2}'`
comp=`cat /usr/install/cassandra/conf/cassandra.yaml | grep compactors`
wget http://192.168.47.211:8000/cassandra-env.sh
sed -i -e "s/localhost/$host/g" cassandra-env.sh
sed -i -e "s/heap_buffers/offheap_objects/g" cassandra.yaml
echo "memtable_offheap_space_in_mb: 2048" >> cassandra.yaml
sed -i -e "s/$comp/concurrent_compactors: 4/g" cassandra.yaml
/usr/install/cassandra/bin/nodetool flush
/usr/install/cassandra/bin/nodetool stopdaemon
/usr/install/cassandra/bin/cassandra
163仍然使用CMS,22号:
[admin@fp-cass048163 ~]$ jstat -gcutil 30764 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 3.91 58.22 29.55 59.81 1554565 45480.263 7069 2087.903 47568.166
6.93 0.00 5.53 29.62 59.81 1554566 45480.290 7069 2087.903 47568.193
6.93 0.00 46.36 29.62 59.81 1554566 45480.290 7069 2087.903 47568.193
24号:
[admin@fp-cass048163 ~]$ jstat -gcutil 30764 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 6.79 26.05 68.01 59.81 1613189 47191.337 7335 2159.359 49350.696
0.00 6.79 63.72 68.01 59.81 1613189 47191.337 7335 2159.359 49350.696
6.86 0.00 2.59 68.13 59.81 1613190 47191.368 7335 2159.359 49350.726
6.86 0.00 38.73 68.13 59.81 1613190 47191.368 7335 2159.359 49350.726
6.86 0.00 77.89 68.13 59.81 1613190 47191.368 7335 2159.359 49350.726
统计时间 | GC次数(超过200ms) | 超过1s |
---|---|---|
2016-06-01 | 73 | 16 |
2016-06-02 | 469 | 103 |
2016-06-03 | 81 | 2 |
2016-06-04 | 228 | 28 |
2016-06-05 | 618 | 137 |
2016-06-06 | 312 | 47 |
2016-06-07 | 63 | 9 |
2016-06-08 | 273 | 60 |
2016-06-09 | 64 | 7 |
2016-06-10 | 66 | 8 |
2016-06-11 | 79 | 8 |
2016-06-12 | 1037 | 244 |
2016-06-13 | 1160 | 277 |
2016-06-14 | 65 | 9 |
2016-06-15 | 82 | 10 |
2016-06-16 | 880 | 221 |
2016-06-17 | 622 | 137 |
2016-06-18 | 1454 | 343 |
2016-06-19 | 68 | 5 |
2016-06-20 | 654 | 175 |
2016-06-21 | 338 | 87 |
最后一条数据6-21号这天数据不是很准,因为这一天从CMS切换到G1
在162上测试,YGC超过200ms的都很少了
[admin@fp-cass048162 logs]$ tail -f system.log | grep GC
2016-06-21 11:14:12,947 - G1 Young Generation GC in 246ms. G1 Eden Space: 8917090304 -> 0; G1 Old Gen: 6059796976 -> 6071479536; G1 Survivor Space: 167772160 -> 293601280
2016-06-21 11:15:36,990 - G1 Young Generation GC in 245ms. G1 Eden Space: 8506048512 -> 0; G1 Old Gen: 6200014560 -> 6216498552;
2016-06-21 13:22:22,631 - G1 Young Generation GC in 283ms. G1 Eden Space: 8849981440 -> 0; G1 Old Gen: 5812656096 -> 5823218856; G1 Survivor Space: 327155712 -> 335544320
2016-06-21 13:50:59,920 - G1 Young Generation GC in 241ms. G1 Eden Space: 10032775168 -> 0; G1 Old Gen: 4619885032 -> 4633499088; G1 Survivor Space: 268435456 -> 276824064
:white_check_mark: 162开启G1,22号到24号为止,没有发生一次FGC,下面是24号统计的信息:YGCT/YGC=2496/28780=0.086=86ms
[admin@fp-cass048162 ~]$ jstat -gcutil 47069 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 12.13 78.58 97.10 28780 2496.678 0 0.000 2496.678
0.00 100.00 24.17 78.58 97.10 28780 2496.678 0 0.000 2496.678
0.00 100.00 35.87 78.58 97.10 28780 2496.678 0 0.000 2496.678
0.00 100.00 47.74 78.58 97.10 28780 2496.678 0 0.000 2496.678
0.00 100.00 59.27 78.58 97.10 28780 2496.678 0 0.000 2496.678
:white_check_mark: 171开启G1,22号(运行一天,没有发生一次FGC):YGCT/YGC=494/6258=0.078=78ms
[admin@cass048171 ~]$ jstat -gcutil 27229 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 60.11 80.74 96.16 6258 494.761 0 0.000 494.761
0.00 100.00 74.05 80.74 96.16 6258 494.761 0 0.000 494.761
0.00 100.00 88.07 80.74 96.16 6258 494.761 0 0.000 494.761
0.00 100.00 7.06 80.81 96.16 6259 494.846 0 0.000 494.846
注:由于Cassandra日志文件会进行滚动,所以当前日志文件用grep,以前的文件用zgrep统计
统计CMS一般在压缩文件中查找:ConcurrentMarkSweep
统计G1一般在当前文件中查找:G1 Old Generation GC(当然也可以在压缩文件中找)
所有GC统计(当前文件,压缩文件):
当前文件,所有GC,按天统计
$ grep 'GC' /usr/install/cassandra/logs/system.log | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}' | tail
2016-06-25 29
2016-06-26 55
压缩文件,所有GC,按天统计
$ zgrep 'GC' /usr/install/cassandra/logs/system.log.*.zip | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}' | tail
2016-06-16 880
2016-06-17 622
2016-06-18 1454
2016-06-19 68
2016-06-20 654
2016-06-21 362
2016-06-22 22
2016-06-23 84
2016-06-24 46
2016-06-25 62
超过10s的GC(当前文件,压缩文件)
$ zgrep 'GC in [0-9][0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log.*.zip | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}' | tail
2016-06-13 280
2016-06-14 10
2016-06-15 11
2016-06-16 224
2016-06-17 139
2016-06-18 346
2016-06-19 5
2016-06-20 177
2016-06-21 88
2016-06-23 1
$ grep 'GC in [0-9][0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}' | tail
当前文件统计(所有GC,超过10s)
当前文件,所有GC,按时段统计
grep 'GC' /usr/install/cassandra/logs/system.log | awk '{print $4":"substr($5,0,2)}' | uniq -c | awk '{print $2" "$1}' | sort
当前文件,超过10s的所有GC,按天统计
grep 'GC in [0-9][0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log | awk '{print $4}' | uniq -c | awk '{print $2" "$1}' | sort
当前文件,超过10s的GC,按时段统计
grep 'GC in [0-9][0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log | awk '{print $4":"substr($5,0,2)}' | uniq -c | awk '{print $2" "$1}' | sort
压缩文件统计(所有GC,超过10s)
压缩文件,所有GC,按时段统计
zgrep 'GC' /usr/install/cassandra/logs/system.log.1.zip | awk '{print $4":"substr($5,0,2)}' | sort | uniq -c | awk '{print $2" "$1}'
压缩文件,超过10s的所有GC,按天统计
zgrep 'GC in [0-9][0-9][0-9][0-9]ms' /usr/install/cassandra/logs/system.log.1.zip | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}'
压缩文件的CMS,当前文件的G1
压缩文件,CMS GC,按天统计
$ zgrep 'ConcurrentMarkSweep' /usr/install/cassandra/logs/system.log.*.zip | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}' | tail
2016-06-12 1023
2016-06-13 1138
2016-06-14 60
2016-06-15 75
2016-06-16 868
2016-06-17 612
2016-06-18 1438
2016-06-19 62
2016-06-20 635
2016-06-21 326
当前文件,G1 GC,按天统计
$ grep 'G1 Old Generation GC' /usr/install/cassandra/logs/system.log | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}'
[admin@fp-cass048162 ~]$ grep 'GC' /usr/install/cassandra/logs/system.log | wc -l
134
[admin@fp-cass048162 ~]$ grep 'GC in [0-9][0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log | wc -l
1
咦,还是有一个超过10s的GC:
2016-06-23 08:09:25,970 - G1 Young Generation GC in 206ms. G1 Eden Space: 9646899200 -> 0; G1 Old Gen: 4639205352 -> 5004258312; G1 Survivor Space: 511705088 -> 176160768
2016-06-23 09:47:38,757 - G1 Young Generation GC in 3770ms. G1 Eden Space: 8027897856 -> 0; G1 Old Gen: 6979980640 -> 13109433016; G1 Survivor Space: 150994944 -> 1023410176
2016-06-23 09:47:39,607 - G1 Young Generation GC in 780ms. G1 Eden Space: 159383552 -> 0; G1 Old Gen: 13109433016 -> 14160738744; G1 Survivor Space: 1023410176 -> 75497472
2016-06-23 10:18:11,557 - G1 Young Generation GC in 257ms. G1 Eden Space: 771751936 -> 0; G1 Old Gen: 6197828584 -> 5781983928; G1 Survivor Space: 83886080 -> 92274688
同样的情况发生在171上:24号:
YGCT/YGC=3051/57162=0.053=53ms
FTCT/FGC=63/2=30s!!!
[admin@cass048171 ~]$ jstat -gcutil 27229 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 25.22 86.72 96.99 57162 3051.206 2 63.878 3115.084
0.00 100.00 53.48 86.72 96.99 57162 3051.206 2 63.878 3115.084
0.00 100.00 78.70 86.65 96.99 57162 3051.206 2 63.878 3115.084
[admin@cass048171 ~]$ grep 'GC in [0-9][0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log | wc -l
4
[admin@cass048171 ~]$ grep 'GC' /usr/install/cassandra/logs/system.log | wc -l
24
2016-06-23 10:33:10,706 - G1 Young Generation GC in 353ms. G1 Eden Space: 10049552384 -> 0; G1 Old Gen: 5018700376 -> 5039128024; G1 Survivor Space: 134217728 -> 914358272
2016-06-23 10:33:36,366 - G1 Young Generation GC in 21135ms. G1 Eden Space: 8438939648 -> 0; G1 Old Gen: 5039128024 -> 13030534304; G1 Survivor Space: 914358272 -> 1174405120 :no_good:
2016-06-23 10:33:37,667 - G1 Young Generation GC in 1209ms. G1 Eden Space: 8388608 -> 0; G1 Old Gen: 13030534304 -> 14225910944; G1 Survivor Space: 1174405120 -> 58720256 :no_good:
2016-06-23 10:33:38,638 - G1 Young Generation GC in 259ms. G1 Eden Space: 796917760 -> 0; G1 Old Gen: 14225910944 -> 14276242592; G1 Survivor Space: 58720256 -> 75497472
2016-06-23 10:33:48,066 - G1 Young Generation GC in 1096ms. G1 Eden Space: 771751936 -> 0; G1 Old Gen: 16117542048 -> 15689723040; G1 Survivor Space: 83886080 -> 33554432 :no_good:
2016-06-23 17:58:26,833 - G1 Young Generation GC in 261ms. G1 Eden Space: 796917760 -> 0; G1 Old Gen: 13005169216 -> 12363114936;
2016-06-23 21:01:43,202 - G1 Young Generation GC in 268ms. G1 Eden Space: 796917760 -> 0; G1 Old Gen: 14009140736 -> 13586786088; G1 Survivor Space: 58720256 -> 67108864
2016-06-23 22:54:34,508 - G1 Old Generation GC in 35275ms. G1 Old Gen: 14394600712 -> 10991951608; G1 Survivor Space: 25165824 -> 0 :no_good:(FGC)
2016-06-23 22:56:53,104 - G1 Young Generation GC in 229ms. G1 Eden Space: 796917760 -> 0; G1 Old Gen: 6362278144 -> 4860703680; G1 Survivor Space: 58720256 -> 33554432
2016-06-23 22:56:54,109 - G1 Young Generation GC in 263ms. G1 Eden Space: 822083584 -> 0; G1 Old Gen: 4860703680 -> 3774189968; G1 Survivor Space: 33554432 -> 50331648
2016-06-24 01:15:31,753 - G1 Young Generation GC in 251ms. G1 Eden Space: 10141827072 -> 0; G1 Old Gen: 4901503760 -> 4907845240; G1 Survivor Space: 134217728 -> 687865856
2016-06-24 01:15:38,930 - G1 Young Generation GC in 252ms. G1 Eden Space: 9026142208 -> 0; G1 Old Gen: 4907845240 -> 4959650376; G1 Survivor Space: 687865856 -> 645922816
2016-06-24 01:15:46,310 - G1 Young Generation GC in 263ms. G1 Eden Space: 9059696640 -> 0; G1 Old Gen: 4959650376 -> 5025814888; G1 Survivor Space: 645922816 -> 637534208
2016-06-24 01:15:53,692 - G1 Young Generation GC in 232ms. G1 Eden Space: 9009364992 -> 0; G1 Old Gen: 5025814888 -> 5035267128; G1 Survivor Space: 637534208 -> 645922816
2016-06-24 01:16:01,092 - G1 Young Generation GC in 234ms. G1 Eden Space: 8984199168 -> 0; G1 Old Gen: 5035267128 -> 5635307288; G1 Survivor Space: 645922816 -> 100663296
2016-06-24 02:40:23,736 - G1 Young Generation GC in 205ms. G1 Eden Space: 771751936 -> 0; G1 Old Gen: 5298934024 -> 4154593728;
2016-06-24 04:47:09,843 - G1 Young Generation GC in 429ms. G1 Eden Space: 8833204224 -> 0; G1 Old Gen: 6350837368 -> 6645919712; G1 Survivor Space: 75497472 -> 1115684864
2016-06-24 04:47:10,624 - G1 Young Generation GC in 476ms. G1 Eden Space: 545259520 -> 0; G1 Old Gen: 6645919712 -> 7641490784; G1 Survivor Space: 1115684864 -> 142606336
2016-06-24 04:47:16,581 - G1 Young Generation GC in 796ms. G1 Eden Space: 6660554752 -> 0; G1 Old Gen: 7779693536 -> 8886989792; G1 Survivor Space: 142606336 -> 855638016
2016-06-24 04:47:17,526 - G1 Young Generation GC in 875ms. G1 Eden Space: 8388608 -> 0; G1 Old Gen: 8886989792 -> 8136209376; G1 Survivor Space: 855638016 -> 58720256
2016-06-24 04:47:18,447 - G1 Young Generation GC in 286ms. G1 Eden Space: 796917760 -> 0; G1 Old Gen: 8136209376 -> 7452538096; G1 Survivor Space: 58720256 -> 109051904
2016-06-24 04:47:19,135 - G1 Young Generation GC in 217ms. G1 Eden Space: 746586112 -> 0; G1 Old Gen: 7452538096 -> 7683224816;
2016-06-24 04:47:24,022 - G1 Young Generation GC in 202ms. G1 Eden Space: 2147483648 -> 0; G1 Old Gen: 8652109040 -> 8836658416; G1 Survivor Space: 109051904 -> 285212672
2016-06-24 04:47:29,255 - G1 Young Generation GC in 263ms. G1 Eden Space: 536870912 -> 0; G1 Old Gen: 8853435632 -> 8970333808; G1 Survivor Space: 318767104 -> 58720256
如果将GC时间和Compaction对照,会发现时间是对应的。但是这能说明什么问题,是因为Compaction导致GC的吗?
[admin@cass048171 ~]$ cat /usr/install/cassandra/logs/system.log | grep CompactionManager
INFO [Service Thread] 2016-06-23 10:33:10,755 StatusLogger.java:75 - CompactionManager 4 17
INFO [Service Thread] 2016-06-23 10:33:37,642 StatusLogger.java:75 - CompactionManager 5 18
INFO [Service Thread] 2016-06-23 10:33:37,726 StatusLogger.java:75 - CompactionManager 5 18
INFO [Service Thread] 2016-06-23 10:33:38,680 StatusLogger.java:75 - CompactionManager 5 18
INFO [Service Thread] 2016-06-23 10:33:48,084 StatusLogger.java:75 - CompactionManager 5 18
INFO [Service Thread] 2016-06-23 17:58:26,881 StatusLogger.java:75 - CompactionManager 6 61
INFO [Service Thread] 2016-06-23 21:01:43,258 StatusLogger.java:75 - CompactionManager 6 73
INFO [Service Thread] 2016-06-23 22:54:34,576 StatusLogger.java:75 - CompactionManager 5 81
INFO [Service Thread] 2016-06-23 22:56:53,115 StatusLogger.java:75 - CompactionManager 5 71
INFO [Service Thread] 2016-06-23 22:56:54,124 StatusLogger.java:75 - CompactionManager 5 71
INFO [Service Thread] 2016-06-24 01:15:31,762 StatusLogger.java:75 - CompactionManager 5 66
INFO [Service Thread] 2016-06-24 01:15:38,940 StatusLogger.java:75 - CompactionManager 5 66
INFO [Service Thread] 2016-06-24 01:15:46,321 StatusLogger.java:75 - CompactionManager 5 66
INFO [Service Thread] 2016-06-24 01:15:53,717 StatusLogger.java:75 - CompactionManager 5 66
INFO [Service Thread] 2016-06-24 01:16:01,106 StatusLogger.java:75 - CompactionManager 5 66
INFO [Service Thread] 2016-06-24 02:40:23,752 StatusLogger.java:75 - CompactionManager 5 44
INFO [Service Thread] 2016-06-24 04:47:09,879 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:10,636 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:17,491 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:17,555 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:18,458 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:19,150 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:24,031 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:29,287 StatusLogger.java:75 - CompactionManager 6 35
业务监控系统:
OpsCenter:
Cassandra日志文件:
[admin@cass048171 ~]$ grep 'GC in [0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log
INFO [Service Thread] 2016-07-07 08:19:08,106 GCInspector.java:258 - G1 Young Generation GC in 204ms. G1 Eden Space: 696254464 -> 0; G1 Old Gen: 7281313536 -> 5909453064; G1 Survivor Space: 159383552 -> 92274688;
INFO [Service Thread] 2016-07-07 08:19:09,324 GCInspector.java:258 - G1 Young Generation GC in 313ms. G1 Eden Space: 763363328 -> 0; G1 Old Gen: 5909453064 -> 5497545648; G1 Survivor Space: 92274688 -> 100663296;
INFO [Service Thread] 2016-07-07 08:59:00,400 GCInspector.java:258 - G1 Young Generation GC in 270ms. G1 Eden Space: 763363328 -> 0; G1 Old Gen: 3972003048 -> 3531601136; G1 Survivor Space: 92274688 -> 67108864;
INFO [Service Thread] 2016-07-07 09:49:47,102 GCInspector.java:258 - G1 Young Generation GC in 17887ms.G1 Eden Space: 8875147264 -> 0; G1 Old Gen: 6224121000 -> 11796004824; G1 Survivor Space: 125829120 -> 1132462080;
INFO [Service Thread] 2016-07-07 09:49:48,795 GCInspector.java:258 - G1 Young Generation GC in 536ms. G1 Eden Space: 1241513984 -> 0; G1 Old Gen: 11796004824 -> 13020741592; G1 Survivor Space: 1132462080 -> 301989888;
INFO [Service Thread] 2016-07-07 09:49:51,681 GCInspector.java:258 - G1 Young Generation GC in 464ms. G1 Eden Space: 1619001344 -> 0; G1 Old Gen: 13020741592 -> 13700218840; G1 Survivor Space: 301989888 -> 243269632;
INFO [Service Thread] 2016-07-07 09:49:53,068 GCInspector.java:258 - G1 Young Generation GC in 311ms. G1 Eden Space: 1006632960 -> 0; G1 Old Gen: 13700218840 -> 14111260632; G1 Survivor Space: 243269632 -> 159383552;
INFO [Service Thread] 2016-07-07 09:49:54,076 GCInspector.java:258 - G1 Young Generation GC in 263ms. G1 Eden Space: 746586112 -> 0; G1 Old Gen: 14111260632 -> 14463582168; G1 Survivor Space: 159383552 -> 117440512;
INFO [Service Thread] 2016-07-07 09:49:55,003 GCInspector.java:258 - G1 Young Generation GC in 236ms. G1 Eden Space: 738197504 -> 0; G1 Old Gen: 14463582168 -> 14803320800; G1 Survivor Space: 117440512 -> 109051904;
INFO [Service Thread] 2016-07-07 09:49:55,916 GCInspector.java:258 - G1 Young Generation GC in 225ms. G1 Eden Space: 746586112 -> 0; G1 Old Gen: 14803320800 -> 15096922080;
INFO [Service Thread] 2016-07-07 09:49:57,943 GCInspector.java:258 - G1 Young Generation GC in 202ms. G1 Eden Space: 746586112 -> 0; G1 Old Gen: 15390523360 -> 15717679072;
INFO [Service Thread] 2016-07-07 09:50:02,743 GCInspector.java:258 - G1 Young Generation GC in 251ms. G1 Eden Space: 830472192 -> 0; G1 Old Gen: 10585068072 -> 9267343240; G1 Survivor Space: 25165824 -> 33554432;
INFO [Service Thread] 2016-07-07 09:50:04,138 GCInspector.java:258 - G1 Young Generation GC in 326ms. G1 Eden Space: 822083584 -> 0; G1 Old Gen: 9267343240 -> 8527731712;
INFO [Service Thread] 2016-07-07 09:50:28,182 GCInspector.java:258 - G1 Young Generation GC in 204ms. G1 Eden Space: 796917760 -> 0; G1 Old Gen: 8527855792 -> 8195194872; G1 Survivor Space: 58720256 -> 100663296;
2016-07-07 09:49:47
发生17s的YGC,堆中各个代的大小:
在/var/log/cassandra/gc.log中是:2016-07-07T09:49:47
G1 Young Generation GC in 17887ms. YGC花费了17s
G1 Eden Space: 8,875,147,264 -> 0; 8G -> 0 新生代
G1 Old Gen: 6,224,121,000 -> 11,796,004,824; 6G -> 11G
G1 Survivor Space: 125,829,120 -> 1,132,462,080; 125M -> 1G
Eden从8G减少到0,Survivor从125M增加到1G,其实很好理解:
对象从Eden转移到了Survivor,所以Eden减少,Survivor增加。
Old也会增加的原因是:在GC的时候,Survivor的对象到达年龄计数器后,也会转移到Old。
所以Survivor增加的空间并不一定都是从Eden转移过来的,因为Survivor有一部分会转移到Old。
同时并不是Eden的所有对象都会转移到Survivor,只有需要的对象才会转移到Survivor。
按照正常转移方式,Old增加的大小,最多不会超过Survivor的大小。但是上面Old从6G增加到11G,而Survivor在GC前只有125M,所以正常来说,就算Survivor的所有对象都转移到Old最多也不会超过7G!
那么唯一的解释是:Eden的对象直接被转移到了Old!Eden总共有8G,转移了1G到Survivor,并且有4G跳过Survivor直接转移到Old!
新生代8G看起来有点大,对比其他日志,最大的也才1.6G。
年轻代的GC:存活对象会被拷贝/移动到一个或多个“Survivor”区域,存活时间够长的直接移到“老年代”区域。这里会stop the world,但young GC的过程是多线程执行的。
结论:YGC会STW,如果YGC时间很长,应用程序暂停时间就很长!
总体GC时间
上gc.log:
下面这两条只是作为对比:
2016-07-07T09:49:22.341+0800: 942859.604: Total time for which application threads were stopped: 0.0814731 seconds, Stopping threads took: 0.0002340 seconds
2016-07-07T09:49:29.213+0800: 942866.476: Total time for which application threads were stopped: 0.0012369 seconds, Stopping threads took: 0.0002430 seconds
这里才开始比较严重的YGC:
{Heap before GC invocations=159966 (full 2):
garbage-first heap total 16777216K, used 14876451K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
region size 8192K, 1074 young (8798208K), 15 survivors (122880K)
Metaspace used 35401K, capacity 35916K, committed 36224K, reserved 1081344K
class space used 3719K, capacity 3839K, committed 3968K, reserved 1048576K
2016-07-07T09:49:29.213+0800: 942866.477: [GC pause (GCLocker Initiated GC) (young) →YGC,暂停在GCLocker Initiated GC阶段
Desired survivor size 566231040 bytes, new threshold 15 (max 15) 中间这一段打印的是存活区的信息
- age 1: 16845608 bytes, 16845608 total
- age 2: 6895480 bytes, 23741088 total
- age 3: 7210040 bytes, 30951128 total
- age 4: 6712376 bytes, 37663504 total
- age 5: 6050320 bytes, 43713824 total
- age 6: 5356392 bytes, 49070216 total
- age 7: 5514552 bytes, 54584768 total
- age 8: 5607616 bytes, 60192384 total
- age 9: 5430416 bytes, 65622800 total
- age 10: 5450352 bytes, 71073152 total
- age 11: 5550480 bytes, 76623632 total
- age 12: 5110560 bytes, 81734192 total
- age 13: 5440256 bytes, 87174448 total
- age 14: 5565200 bytes, 92739648 total
- age 15: 5550672 bytes, 98290320 total
(to-space exhausted), 17.8874292 secs] → YGC花费了17s
关于G1日志文件的分析参考: https://blogs.oracle.com/g1gc/entry/g1gc_logs_how_to_print
2014-07-16T11:14:35.792-0700: 11.871: [GC pause (G1 Evacuation Pause) (young),0.0178446 secs]
Date-and-time: timestamp [GC (gc-cause-is-evacuation-pause) (gc-type-is-young-gc), total-gc-pause-in-sec]
对应我们这里的日志文件:2016-07-07T09:49:29.213+0800: 942866.477: [GC pause (GCLocker Initiated GC) (young) (to-space exhausted), 17.8874292 secs]
在GC类型(这里是young,具体阶段是:GCLocker Initiated GC)之后的是GC的原因:to空间耗尽了。
问题:为什么to空间不够?
http://bboniao.com/jvm/2014-03/g1garbage-first.html
https://www.infoq.com/articles/tuning-tips-G1-GC
解决方案:
参考线上API的GC设置:
-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1ReservePercent=30 -XX:InitiatingHeapOccupancyPercent=30 -XX:ConcGCThreads=4
GC各个阶段
接下来打印了各个阶段的GC时间:
[Parallel Time: 16666.1 ms, GC Workers: 13] ==》并行收集花了16s
[GC Worker Start (ms): Min: 942866476.9, Avg: 942866477.0, Max: 942866477.1, Diff: 0.2]
[Ext Root Scanning (ms): Min: 2.9, Avg: 3.0, Max: 3.1, Diff: 0.2, Sum: 39.0]
[Update RS (ms): Min: 71.8, Avg: 72.2, Max: 72.7, Diff: 0.9, Sum: 938.2]
[Processed Buffers: Min: 52, Avg: 75.5, Max: 102, Diff: 50, Sum: 982]
[Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 2.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 16589.9, Avg: 16590.4, Max: 16590.6, Diff: 0.7, Sum: 215675.3] ==>主要在这里!
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.3]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 13]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.7]
[GC Worker Total (ms): Min: 16665.8, Avg: 16665.9, Max: 16666.0, Diff: 0.2, Sum: 216656.7]
[GC Worker End (ms): Min: 942883142.9, Avg: 942883142.9, Max: 942883142.9, Diff: 0.1]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 2.0 ms]
[Other: 1219.2 ms]
[Evacuation Failure: 1212.5 ms] ==>疏散失败!
[Choose CSet: 0.0 ms]
[Ref Proc: 0.7 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 4.4 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 0.7 ms]
[Eden: 8472.0M(8464.0M)->0.0B(1184.0M) Survivors: 120.0M->1080.0M Heap: 14.2G(16.0G)->12.0G(16.0G)]
堆内存变化
上面最后一行是本次GC的堆信息,区域:GC前大小->GC后大小,括号里是Capacity(也有可能变化)
Heap的Capacity等于-Xms和-Xmx的值=16G,这个是不会变的,但是Eden的大小是会变化,比如GC前Capacity=8G,GC后变成了1G。
Young中分成Eden和Survivor,其中Eden和Survivor是成比例的。
[Eden: used-enden-before-gc(eden-capacity)->used-eden-after-gc(eden-capacity-for-next-gc)
Survivors: used-survivors-before-gc->used-survivors-before-gc
Heap: used-heap-before-gc(heap-capacity)->used-heap-after-gc(heap-capacity-for-next-gc)]
这里打印的堆信息其实和Cassandra的system.log是一致的,不过上面的日志并没有打印Old!
G1 Young Generation GC in 17887ms. YGC花费了17s
G1 Eden Space: 8,875,147,264 -> 0; 8G -> 0 新生代GC前8G,GC后清空
G1 Old Gen: 6,224,121,000 -> 11,796,004,824; 6G -> 11G 老年代从6G增加到11G
G1 Survivor Space: 125,829,120 -> 1,132,462,080; 125M -> 1G 存活区从125M增加到1G
查看其它正常的堆变化情况,会发现基本上Survivor不会增大到1G,而且Heap一般
[admin@cass048171 ~]$ cat /var/log/cassandra/gc.log.4.current | grep Eden | tail
[Eden: 4952.0M(4952.0M)->0.0B(696.0M) Survivors: 120.0M->120.0M Heap: 14.1G(16.0G)->9511.6M(16.0G)]
[Eden: 696.0M(696.0M)->0.0B(5056.0M) Survivors: 120.0M->80.0M Heap: 10207.6M(16.0G)->9455.0M(16.0G)]
[Eden: 5056.0M(5056.0M)->0.0B(5040.0M) Survivors: 80.0M->88.0M Heap: 14.2G(16.0G)->9463.0M(16.0G)]
[Eden: 5040.0M(5040.0M)->0.0B(4992.0M) Survivors: 88.0M->112.0M Heap: 14.2G(16.0G)->9483.0M(16.0G)]
[Eden: 4992.0M(4992.0M)->0.0B(4976.0M) Survivors: 112.0M->120.0M Heap: 14.1G(16.0G)->9491.0M(16.0G)]
[Eden: 4976.0M(4976.0M)->0.0B(696.0M) Survivors: 120.0M->120.0M Heap: 14.1G(16.0G)->9495.0M(16.0G)]
[Eden: 696.0M(696.0M)->0.0B(5024.0M) Survivors: 120.0M->80.0M Heap: 10191.0M(16.0G)->9488.6M(16.0G)]
[Eden: 5024.0M(5024.0M)->0.0B(5008.0M) Survivors: 80.0M->88.0M Heap: 14.2G(16.0G)->9496.6M(16.0G)]
[Eden: 5008.0M(5008.0M)->0.0B(4960.0M) Survivors: 88.0M->112.0M Heap: 14.2G(16.0G)->9524.6M(16.0G)]
[Eden: 4968.0M(4960.0M)->0.0B(4944.0M) Survivors: 112.0M->120.0M Heap: 14.1G(16.0G)->9532.6M(16.0G)]
GC前后统计
最后是Heap在GC前后的统计,这里把GC前的也放在一起进行比较:
{Heap before GC invocations=159966 (full 2):
garbage-first heap total 16777216K, used 14876451K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
region size 8192K, 1074 young (8798208K), 15 survivors (122880K)
Metaspace used 35401K, capacity 35916K, committed 36224K, reserved 1081344K
class space used 3719K, capacity 3839K, committed 3968K, reserved 1048576K
Heap after GC invocations=159967 (full 2):
garbage-first heap total 16777216K, used 12625455K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
region size 8192K, 135 young (1105920K), 135 survivors (1105920K)
Metaspace used 35401K, capacity 35916K, committed 36224K, reserved 1081344K
class space used 3719K, capacity 3839K, committed 3968K, reserved 1048576K
}
[Times: user=41.47 sys=3.94, real=17.88 secs]
2016-07-07T09:49:47.101+0800: 942884.364: Total time for which application threads were stopped: 17.8884392 seconds, Stopping threads took: 0.0001048 seconds
The ratio user/real should be close to the number of parallel gc threads.user和real的比值是并行的GC线程,这里41.47/17.88=2.3