Java Garbage Collectors – Moving to Java7 Garbage-First (G1) Collector Gurpreet S. Sachdeva Aricent Group
Agenda
Memory Management • Performance Tuning • Garbage Collector • JIT Compiler • Heap size
GC Goals • Minimal Footprint • High Throughput • Responsiveness / Low Latency
Generational Hypothesis • Most objects die young • Only a few live very long • Longer they live, more likely they live longer • Old objects rarely reference young objects
Generational Garbage Collector
GC Choices
CMS operations in Young Generation (i) • Young Generation • 1 Eden and 2 Survivor Spaces • Old Generation • Compacted only at Full GC
CMS operations in Young Generation (ii) • Young Generation Collection • Stop the World Pause • Live objects from young generation moved to • Other survivor space • Old Generation
CMS operations in Young Generation (iii) • After Young Generation GC • Eden and 1 Survivor Space are empty • Objects promoted to old generation
CMS operations in Old Generation (i) • Mark Phases • Initial Mark (STW) • Concurrent Mark • Remark (STW)
CMS operations in Old Generation (ii) • Concurrent Sweeping Phase • Collects objects identified as unreachable during marking phases • In-place de-allocation of unreachable objects
CMS operations in Old Generation (iii) • Resetting • All unmarked objects de-allocated • Prepare for next concurrent collection by clearing data structures
CMS Challenges • Stop the World Pause (Remark phase) • Very Large Heaps • Fragmentation • Hard to tune
Introducing G1 • Concurrent • Refinement, Marking, Cleanup • Parallel • STW Pauses • Full GC is single threaded • Compacting
G1 Goals • Low Latency • Better Predictability • Easy to use & tune • Move away from current situation of 3 different GC frameworks
G1 Heap Overview • Single large contiguous space divided into fixed size regions (~ 2000) • No physical separation between young and old generation • Objects moved between regions during collections • Humongous Regions for large objects
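The region layout above can be sketched in a few lines of Java. This is a simplified illustration, not HotSpot's actual sizing code: the class and method names are hypothetical, the target of 2048 regions is an assumed reading of "~2000", and the 1 MB to 32 MB bounds and the "more than 50% of a region" humongous rule are taken from the slide notes.

```java
// Simplified sketch of G1 region sizing; all names here are illustrative.
final class G1RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION_BYTES = 1 * MB;   // lower bound from the slide notes
    static final long MAX_REGION_BYTES = 32 * MB;  // upper bound from the slide notes
    static final long TARGET_REGION_COUNT = 2048;  // assumption: "~2000" read as 2048

    /** Largest power of two <= heapBytes / TARGET_REGION_COUNT, clamped to [1 MB, 32 MB]. */
    static long regionSizeFor(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGION_COUNT, 1);
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(MAX_REGION_BYTES, Math.max(MIN_REGION_BYTES, powerOfTwo));
    }

    /** Objects larger than half a region are placed in humongous regions. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long region = regionSizeFor(4096 * MB);
        System.out.println("Region size (MB): " + region / MB);
        System.out.println("4 MB array humongous? " + isHumongous(4 * MB, region));
    }
}
```

Under these assumptions a 4 GB heap gets 2 MB regions, so a single 4 MB array would already count as humongous.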
G1 - Young Generation GC • Live objects evacuated (copied/moved) to • One or more survivor regions • Old regions • STW Pause • Done in parallel with multiple threads • Eden size and survivor size calculated for next young GC cycle
G1 - Old Generation GC • Initial Marking Phase • Piggybacked on Young Generation GC • STW Pause
G1 - Old Generation GC • Concurrent Marking Phase • Calculates liveness information per region • Empty regions can be reclaimed easily (denoted as X)
G1 - Old Generation GC • Remark Phase • Completes marking of live objects in heap • Empty regions removed and reclaimed • STW Pause • Region liveness known for all other old generation regions
G1 - Old Generation GC • Copying/Cleanup Phase • Select regions with low liveness • Collect (some) during next Young GC
G1 Old Generation GC • After Copying/Cleanup Phase • Selected regions collected and compacted • Some garbage objects may be left in old generation regions
Summary - G1 Old Generation GC • Concurrent Marking Phase • Calculates liveness information per region, concurrently while the application is running • Identifies best regions for subsequent evacuation phases • No corresponding sweeping phase • Remark Phase • Different marking algorithm than CMS • Uses Snapshot-at-the-beginning (SATB) which is much faster than what was being used in CMS • Completely empty regions are reclaimed • Copying/Cleanup Phase • Young generation and Old generation reclaimed at the same time • Old generation regions selected based on their liveness
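The liveness-based selection summarized above can be illustrated with a small model. This is a hedged sketch, not G1's real policy code: the `Region` class, the per-region evacuation time estimate, and the greedy budget loop are all assumptions made for illustration. It only shows the idea that low-liveness old regions are preferred and that the pause time target limits how many are taken.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative model of picking old regions for a mixed collection.
final class MixedCSetSketch {
    static final class Region {
        final int id;
        final double livePercent;      // liveness from concurrent marking
        final double predictedEvacMs;  // hypothetical cost estimate

        Region(int id, double livePercent, double predictedEvacMs) {
            this.id = id;
            this.livePercent = livePercent;
            this.predictedEvacMs = predictedEvacMs;
        }
    }

    /** Take regions below the liveness threshold, emptiest first, until the pause budget is used up. */
    static List<Region> chooseOldRegions(List<Region> oldRegions,
                                         double liveThresholdPercent,
                                         double pauseBudgetMs) {
        List<Region> candidates = new ArrayList<>();
        for (Region r : oldRegions) {
            if (r.livePercent < liveThresholdPercent) {
                candidates.add(r);
            }
        }
        candidates.sort(Comparator.comparingDouble((Region r) -> r.livePercent));

        List<Region> chosen = new ArrayList<>();
        double spentMs = 0;
        for (Region r : candidates) {
            if (spentMs + r.predictedEvacMs > pauseBudgetMs) {
                break; // remaining regions wait for a later mixed GC
            }
            chosen.add(r);
            spentMs += r.predictedEvacMs;
        }
        return chosen;
    }
}
```

Regions that are skipped here correspond to the "some garbage objects may be left in old generation regions" case on the earlier slide.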
G1 and CMS Comparison
Features | G1 GC | CMS GC
Concurrent and Generational | Yes | Yes
Releases Max Heap memory after usage | Yes | No
Low Latency | Yes | Yes
Throughput | Higher | Lower
Compaction | Yes | No
Predictability | More | Less
Physical separation between Young and Old | No | Yes
Footprint Overhead • For the same application size, as compared to CMS, the heap size is likely to be larger in G1 due to additional accounting data structures • Remembered Sets (RSets / RSet) • Track object references into a given region • Footprint overhead less than 5% • Caution • More inter-region references => Bigger Remembered Set • Large Remembered Set => Slow GC • Collection Sets (CSets / CSet) • Set of regions that will be collected in a GC • Footprint overhead less than 1%
Command Line Options • -XX:+UseG1GC • Tells the JVM to use G1 Garbage Collector • -XX:MaxGCPauseMillis=200 • Sets target for the maximum GC pause time
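One way to confirm that the flag took effect is to ask the running JVM which collectors it registered. The sketch below uses the standard `java.lang.management` API; the class name and the "starts with G1" check are illustrative assumptions, based on the bean names HotSpot reports for G1 ("G1 Young Generation" and "G1 Old Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Prints the collectors registered in this JVM; run with -XX:+UseG1GC to compare.
final class ActiveCollectors {
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    /** Heuristic: HotSpot's G1 beans have names beginning with "G1". */
    static boolean looksLikeG1(List<String> collectorNames) {
        for (String name : collectorNames) {
            if (name.startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = names();
        System.out.println("Collectors: " + names);
        System.out.println("G1 active? " + looksLikeG1(names));
    }
}
```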
G1 GC Tuning Options (i) • Main goal is latency • If latency not a problem, then use Parallel GC • Related goal is simplified tuning • Most important tuning option • -XX:MaxGCPauseMillis=200 (default value = 200ms) • Influences maximum amount of work per collection • Best effort only
G1 GC Tuning Options (ii) • -XX:InitiatingHeapOccupancyPercent=n • Trigger to start a concurrent GC cycle • Percent of entire heap not just old generation • -XX:G1OldCSetRegionLiveThresholdPercent=n • Threshold for region to be included in a Collection Set
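The occupancy trigger described above amounts to a percentage check against the whole heap. The following is a minimal sketch of that comparison only; the class and method names are hypothetical and G1's real decision involves more state than this.

```java
// Illustrative check: has heap occupancy reached the initiating threshold?
final class IhopSketch {
    /**
     * @param usedHeapBytes  bytes in use across the entire heap (not just old regions)
     * @param totalHeapBytes current heap capacity in bytes
     * @param ihopPercent    value passed as the initiating occupancy percent
     */
    static boolean shouldStartMarking(long usedHeapBytes, long totalHeapBytes, int ihopPercent) {
        // Cross-multiplied to stay in integer arithmetic.
        return usedHeapBytes * 100 >= (long) ihopPercent * totalHeapBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 45 is used here as an example threshold.
        System.out.println(shouldStartMarking(450 * mb, 1000 * mb, 45));
    }
}
```

Setting the threshold too low makes this check fire often, which matches the "unnecessary GC overhead" caution in the notes; setting it too high delays marking until the heap is nearly full.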
G1 GC Tuning Options (iii) • -XX:G1MixedGCCountTarget=n • How many Mixed GC / Concurrent Cycle • Precaution • Fixing young generation size (-Xmn) can cause PauseTimeTarget to be ignored • G1 no longer respects the pause time target • Even if heap expands, the young generation size is fixed
G1 Logging (i) • Three different log levels • Log level as fine – Use -verbose:gc (equivalent to -XX:+PrintGC) • Sample Output [GC pause (G1 Humongous Allocation) (young) (initial-mark) 24M->21M(64M), 0.2349730 secs] [GC pause (G1 Evacuation Pause) (mixed) 66M->21M(236M), 0.1625268 secs] • Log level as finer – Use -XX:+PrintGCDetails • Average, Min, and Max time displayed for each phase • Root Scan, RSet Updating (with processed buffers information), RSet Scan, Object Copy, Termination (with number of attempts) • Also shows “other” time such as time spent choosing CSet, reference processing, reference enqueuing and freeing CSet • Shows the Eden, Survivors and Total Heap occupancies. • Sample Output [Ext Root Scanning (ms): Avg: 1.7 Min: 0.0 Max: 3.7 Diff: 3.7] [Eden: 818M(818M)->0B(714M) Survivors: 0B->104M Heap: 836M(4096M)->409M(4096M)]
G1 Logging (ii) • Log level as finest – Use -XX:+UnlockExperimentalVMOptions -XX:G1LogLevel=finest • Like finer but includes individual worker thread information. • Sample Output [Ext Root Scanning (ms): 2.1 2.4 2.0 0.0 Avg: 1.6 Min: 0.0 Max: 2.4 Diff: 2.3] [Update RS (ms): 0.4 0.2 0.4 0.0 Avg: 0.2 Min: 0.0 Max: 0.4 Diff: 0.4] [Processed Buffers : 5 1 10 0 Sum: 16, Avg: 4, Min: 0, Max: 10, Diff: 10] • Determine Time – How time is displayed in GC logs -XX:+PrintGCTimeStamps - Shows the elapsed time since the JVM started 1.729: [GC pause (young) 46M->35M(1332M), 0.0310029 secs] -XX:+PrintGCDateStamps - Adds a time of day prefix to each entry 2012-05-02T11:16:32.057+0200: [GC pause (young) 46M->35M(1332M), 0.0317225 secs]
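Summary lines like the samples above are easy to post-process. The sketch below pulls the before/after occupancy, heap capacity and pause time out of one such line with a regular expression. The class name is hypothetical, and the pattern is an assumption that only covers lines whose sizes are all printed in megabytes (M), as in the samples on these slides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses the "46M->35M(1332M), 0.0310029 secs" part of a G1 pause summary line.
final class GcPauseLine {
    private static final Pattern HEAP_CHANGE =
            Pattern.compile("(\\d+)M->(\\d+)M\\((\\d+)M\\), (\\d+\\.\\d+) secs");

    final long beforeMb;
    final long afterMb;
    final long capacityMb;
    final double pauseSecs;

    private GcPauseLine(long beforeMb, long afterMb, long capacityMb, double pauseSecs) {
        this.beforeMb = beforeMb;
        this.afterMb = afterMb;
        this.capacityMb = capacityMb;
        this.pauseSecs = pauseSecs;
    }

    /** Returns null when the line does not contain a megabyte-based heap summary. */
    static GcPauseLine parse(String line) {
        Matcher m = HEAP_CHANGE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new GcPauseLine(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)), Double.parseDouble(m.group(4)));
    }

    long reclaimedMb() {
        return beforeMb - afterMb;
    }

    public static void main(String[] args) {
        GcPauseLine p = parse("1.729: [GC pause (young) 46M->35M(1332M), 0.0310029 secs]");
        System.out.println("Reclaimed " + p.reclaimedMb() + " MB in " + p.pauseSecs + " s");
    }
}
```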
G1 Logging Keywords (i) • Parallel Time - Overall elapsed time of the main parallel part of the pause • Worker Start – Timestamp at which the workers start • Note: The logs are ordered on thread id and are consistent on each entry 414.557: [GC pause (young), 0.03039600 secs] [Parallel Time: 22.9 ms] [GC Worker Start (ms): 7096.0 7096.0 7096.1 7096.1 7096.1 7096.1 7096.1 7096.1 7096.2 7096.2 7096.2 7096.2 Avg: 7096.1, Min: 7096.0, Max: 7096.2, Diff: 0.2] • External Root Scanning - The time taken to scan the external roots (e.g., things like the system dictionary that point into the heap) [Ext Root Scanning (ms): 3.1 3.4 3.4 3.0 4.2 2.0 3.6 3.2 3.4 7.7 3.7 4.4 Avg: 3.8, Min: 2.0, Max: 7.7, Diff: 5.7] • Update Remembered Set - Buffers that are completed but have not yet been processed by the concurrent refinement thread before the start of the pause have to be updated. • Time depends on density of the cards. The more cards, the longer it will take. [Update RS (ms): 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] [Processed Buffers : 26 0 0 0 0 0 0 0 0 0 0 0 Sum: 26, Avg: 2, Min: 0, Max: 26, Diff: 26]
G1 Logging Keywords (ii) • Scanning Remembered Sets - Look for pointers that point into the Collection Set [Scan RS (ms): 0.4 0.2 0.1 0.3 0.0 0.0 0.1 0.2 0.0 0.1 0.0 0.0 Avg: 0.1, Min: 0.0, Max: 0.4, Diff: 0.3] • Object Copy - The time that each individual thread spent copying and evacuating objects [Object Copy (ms): 16.7 16.7 16.7 16.9 16.0 18.1 16.5 16.8 16.7 12.3 16.4 15.7 Avg: 16.3, Min: 12.3, Max: 18.1, Diff: 5.8] • Termination Time - When a worker thread is finished with its particular set of objects to copy and scan, it enters the termination protocol. It looks for work to steal and once it's done with that work it again enters the termination protocol. Termination attempt counts all the attempts to steal work. [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] [Termination Attempts : 1 1 1 1 1 1 1 1 1 1 1 1 Sum: 12, Avg: 1, Min: 1, Max: 1, Diff: 0] • GC Worker End [GC Worker End (ms): 7116.4 7116.3 7116.4 7116.3 7116.4 7116.3 7116.4 7116.4 7116.4 7116.4 7116.3 7116.3 Avg: 7116.4, Min: 7116.3, Max: 7116.4, Diff: 0.1] • GC worker end time – Timestamp when the individual GC worker stops. • GC worker time – Time taken by individual GC worker thread.
G1 Logging Keywords (iii) • GC Worker Other - The time (for each GC thread) that can't be attributed to the worker phases listed previously. Should be quite low. [GC Worker Other (ms): 2.6 2.6 2.7 2.7 2.7 2.7 2.7 2.8 2.8 2.8 2.8 2.8 Avg: 2.7, Min: 2.6, Max: 2.8, Diff: 0.2] • Clear CT - Time taken to clear the card table of RSet scanning meta-data [Clear CT: 0.6 ms] • Other - Time taken for various other sequential phases of the GC pause. [Other: 6.8 ms] • CSet - Time taken finalizing the set of regions to collect. Usually very small; slightly longer when having to select old [Choose CSet: 0.1 ms] • Ref Proc - Time spent processing soft, weak, etc. references deferred from the prior phases of the GC. [Ref Proc: 4.4 ms] • Ref Enq - Time spent placing soft, weak, etc. references on to the pending list. [Ref Enq: 0.1 ms] • Free CSet - Time spent freeing the set of regions that have just been collected, including their remembered sets [Free CSet: 2.0 ms]
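The Avg, Min, Max and Diff fields that follow the per-worker values in these log lines can be recomputed from the values themselves, which is a handy sanity check when reading a log. The sketch below does that for the Ext Root Scanning sample; the class name is hypothetical, and one-decimal rounding is assumed to match the way the log prints the summary.

```java
import java.util.Locale;

// Recomputes the summary fields of a per-worker G1 log line.
final class WorkerPhaseStats {
    static String summarize(double[] workerMs) {
        if (workerMs.length == 0) {
            throw new IllegalArgumentException("need at least one worker value");
        }
        double sum = 0;
        double min = Double.MAX_VALUE;
        double max = -Double.MAX_VALUE;
        for (double ms : workerMs) {
            sum += ms;
            min = Math.min(min, ms);
            max = Math.max(max, ms);
        }
        double avg = sum / workerMs.length;
        // Diff is the spread between the slowest and fastest worker.
        return String.format(Locale.ROOT, "Avg: %.1f, Min: %.1f, Max: %.1f, Diff: %.1f",
                avg, min, max, max - min);
    }

    public static void main(String[] args) {
        double[] extRootScanning = {3.1, 3.4, 3.4, 3.0, 4.2, 2.0, 3.6, 3.2, 3.4, 7.7, 3.7, 4.4};
        System.out.println(summarize(extRootScanning)); // Avg: 3.8, Min: 2.0, Max: 7.7, Diff: 5.7
    }
}
```

A large Diff, as with the 7.7 ms worker here, suggests that work was unevenly distributed across GC threads during that phase.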
G1 Evacuation Failure • Promotion Failure when JVM runs out of heap regions during the GC • Indicated by “to-space overflow” in PrintGCDetails log • Very expensive operation
Sample Application Test • Sample Application • Create and add 190 float arrays to an ArrayList • Each float array reserves 4 MB of memory, i.e. 1 x 1024 x 1024 floats x 4 bytes = 4 MB • 4 MB x 190 = 760 MB • After each iteration the arrays are released and the application sleeps for some time • The same steps are repeated a certain number of times
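The slides do not include the source of the test program, so the following is a hypothetical reconstruction from the description above, not the author's code. The class name, the iteration count and the sleep interval are assumptions; only the array count (190) and the array size (1024 x 1024 floats) come from the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the allocation test described on the slide.
final class GCTestSketch {
    static final int FLOATS_PER_ARRAY = 1024 * 1024; // 1M floats x 4 bytes = 4 MB

    /** Payload size in bytes for the given number of arrays. */
    static long payloadBytes(int arrayCount, int floatsPerArray) {
        return (long) arrayCount * floatsPerArray * Float.BYTES;
    }

    /** Allocates the arrays, then drops every reference so they become garbage. */
    static long runIteration(int arrayCount, int floatsPerArray) {
        List<float[]> arrays = new ArrayList<>(arrayCount);
        for (int i = 0; i < arrayCount; i++) {
            arrays.add(new float[floatsPerArray]);
        }
        long allocated = payloadBytes(arrays.size(), floatsPerArray);
        arrays.clear(); // release the arrays before sleeping
        return allocated;
    }

    public static void main(String[] args) throws InterruptedException {
        int arrayCount = args.length > 0 ? Integer.parseInt(args[0]) : 190;
        int iterations = 10;   // assumption: the slide only says "certain number of times"
        long sleepMs = 5_000;  // assumption: the slide only says "sleeps for some time"
        for (int i = 0; i < iterations; i++) {
            long bytes = runIteration(arrayCount, FLOATS_PER_ARRAY);
            System.out.println("Iteration " + i + ": allocated " + bytes / (1024 * 1024) + " MB");
            Thread.sleep(sleepMs);
        }
    }
}
```

With the default argument of 190 the program needs a heap of well over 760 MB, so it should be run with a suitably large `-Xmx`.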
Observations for CMS • Command Line Arguments java -server -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:CMS.log -Dcom.sun.management.jmxremote.port=3333 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -classpath C:\Users\gusachde\workspace\Memory\bin GCTest 190 • Observations with VisualVM
Observations for G1 • Command Line Arguments java -server -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:G1GC.log -Dcom.sun.management.jmxremote.port=3333 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -classpath C:\Users\gusachde\workspace\Memory\bin GCTest 190 • Observations with VisualVM
Results Comparison • G1 GC is able to reclaim max heap size • CMS is not able to do so • Lower CPU utilization for G1 collection • G1 heap goes to max size in three distinct jumps • CMS seems to reach max heap size in the initial jump
Parameters | G1 GC | CMS GC
Time taken for execution | 7 min 5 sec | 7 min 56 sec
Max CPU Usage | 27.3% | 70.2%
Max GC Activity | 2% | 24%
Max Heap Size | 974 MB | 974 MB
Max Used Heap Size | 763 MB | 779 MB
Is G1 For You • Evaluate all other options before moving to G1 • Don’t need Low Latency • Use Parallel GC • Don’t need big heap • Use small heap and Parallel GC • Need big heap • Try CMS • If CMS not performing well => Tune it • If tuned CMS not performing well => Tune it further • If problem still persists => Check whether you require such a big heap and low pauses • Start using G1 • Test before deploying in production
References • JavaOne 2012 G1 Talk, Charlie Hunt, Monica Beckwith • http://www.oracle.com/webfolder/technetwork/tutoria • Poonam Bajaj’s blog • https://blogs.oracle.com/poonam/ • hotspot-gc-use mailing list
Thank You

Editor's Notes

  • #7 Split the heap into regions Create new objects in Young Generation Move mature objects to Old Generation Different strategies for different regions
  • #8 Serial Collector Both Young and Old collections done serially Parallel Young generation collection done in parallel using multiple CPUs CMS Most of the garbage collection work done concurrently with the application threads. G1 Supported since 7u4 Server style garbage collector, targeted for multiprocessor machines with large heap size To replace CMS in the long term
  • #18 Single large contiguous space divided into fixed size regions (~ 2000) Region size chosen at startup (size 1 MB to 32 MB) No physical separation between young and old generation Not required to be contiguous A region may act as either eden, survivor(s) or old generation Objects moved between regions during collections Humongous Regions for large objects Multiple contiguous regions for large objects (> 50% region size) Collection not optimized!
  • #23 Collect (some) during next Young GC Number of old regions collected depends on liveness information, predicted time to evacuate the space and pause time target
  • #24 Some garbage objects may be left in old generation regions Regions with high liveness They may be collected later based on future liveness, pause time target and number of unused regions
  • #27 Remembered Sets (RSets / RSet) Track object references into a given region One per region Enables parallel and independent collection of a region No need to track whole heap to find references Footprint overhead less than 5% Caution More inter-region references => Bigger Remembered Set Large Remembered Set => Slow GC Collection Sets (CSets / CSet) Set of regions that will be collected in a GC Regions can be eden and survivor, and optionally after (concurrent) marking some old generation regions All live data in a CSet is evacuated (copied/moved) during the GC Footprint overhead less than 1%
  • #30 -XX:InitiatingHeapOccupancyPercent=n Trigger to start a concurrent GC cycle Percent of entire heap not just old generation Automatic resizing of young generation has lower and upper bound of 20% and 80% of java heap, respectively Caution Too Low => Unnecessary GC overhead Too High => “Space Overflow” => Full GC -XX:G1OldCSetRegionLiveThresholdPercent=n Threshold for region to be included in a Collection Set Caution Too high => More aggressive collecting => More live objects to copy Too low => Wasting some heap
  • #31 -XX:G1MixedGCCountTarget=n How many Mixed GC / Concurrent Cycle Caution Too high => Unnecessary overhead Too low => Longer pauses
  • #37 Promotion Failure when JVM runs out of heap regions during the GC For either survivors or promoted objects Heap is already at maximum Indicated by “to-space overflow” in PrintGCDetails log Very expensive operation GC still has to continue Unsuccessfully copied objects have to be tenured in place Any updates to RSets of regions in CSet have to be regenerated Prevention Increase heap size More marking threads