A DIGITAL COMMERCE CONSULTANCY
Basics of JVM Tuning ... because out-of-the-box is often not enough Vladislav Gangan Vice President of Engineering Tacit Knowledge, Moldova
AGENDA • Basics of JVM memory management • Optimal starting settings for tuning • Garbage collection algorithms • Debugging the garbage collection process • Putting theory in practice
RATIONALE BEHIND THE NEED OF JVM TUNING
TWO AREAS OF MEMORY - STACK • scratch space for thread execution • easy to track internally • any method call results in block allocation • local vars • bookkeeping data • always LIFO allocation
[Diagram: stack frames over successive method calls: m1 vars; m2 vars on top of m1 vars; m2's block freed; m3 vars on top of m1 vars; m4 vars on top of m3 vars on top of m1 vars]
TWO AREAS OF MEMORY - HEAP • dynamic & random memory allocation • much more complex to handle • can result in memory leaks if objects not destroyed properly • shielded from the developer by the JVM
[Diagram: heap with objects o1, o2 and o3 allocated one after another]
HEAP STRUCTURE Eden S0 S1 Tenured Permanent
GENERATIONAL OBJECT FLOW [Diagram shown in several animation steps; the last step is labeled "Minor collection"]
GARBAGE COLLECTION ELIGIBILITY Reachability test - can an object be reached from any live pointer in the application?
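The reachability test can be illustrated with a toy model. The sketch below is not how the JVM is implemented; it only assumes a hypothetical `Node` class standing in for heap objects and walks the references from a set of roots, the way a "mark" step would. Anything not visited is eligible for collection.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model only: a real JVM traces raw object references, not a Node graph.
final class ReachabilityDemo {

    /** Hypothetical stand-in for a heap object that references other objects. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Returns every node reachable from the given roots (the "mark" step). */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (live.add(n)) {           // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Node o1 = new Node("o1");
        Node o2 = new Node("o2");
        Node o3 = new Node("o3");
        o1.refs.add(o2);                 // o1 -> o2; nothing points at o3

        Set<Node> live = mark(List.of(o1));
        for (Node n : List.of(o1, o2, o3)) {
            String verdict = live.contains(n) ? "reachable" : "eligible for collection";
            System.out.println(n.name + ": " + verdict);
        }
    }
}
```

Because the walk tracks visited nodes, objects that only reference each other in a cycle are still reported as unreachable when no root leads to them.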
GARBAGE COLLECTION TYPES • Minor collection: operates on the young space; low impact on performance • Major collection: operates on the entire heap; very costly performance-wise; with some algorithms it is a “stop-the-world” activity
JVM TUNING PROCESS
while (iAmNotSatisfied) {
    size = defineMinMaxHeapSize();
    ratios = fineTuneGenerationsRatios();
    alg = selectAppropriateGcAlgorithm();
    loadTestTheApplication(size, ratios, alg);
    iAmNotSatisfied = analyzeStatistics();
}
HEAP SIZE CONFIG OPTIONS -Xms - initial heap size -Xmx - max/final heap size java -Xms123m -Xmx456m MyApp
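One way to check which heap limits the JVM actually picked up is to ask `java.lang.Runtime` from inside the application. This is a minimal sketch; the class name `HeapSizeReport` is made up for illustration, and the reported maximum is usually close to, but not always exactly, the `-Xmx` value.

```java
// Prints the heap limits as the running JVM sees them.
final class HeapSizeReport {
    static final long MB = 1024L * 1024L;

    /** Formats max, committed and used heap for the given runtime, in MB. */
    static String describe(Runtime rt) {
        long max = rt.maxMemory();            // upper bound, roughly -Xmx
        long committed = rt.totalMemory();    // currently reserved; starts near -Xms
        long used = committed - rt.freeMemory();
        return String.format("max=%d MB, committed=%d MB, used=%d MB",
                max / MB, committed / MB, used / MB);
    }

    public static void main(String[] args) {
        System.out.println(describe(Runtime.getRuntime()));
    }
}
```

Running it as `java -Xms123m -Xmx456m HeapSizeReport` and again without the flags shows the difference between the configured values and the defaults.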
HEAP SIZE DEFAULTS
Heap setting | Non-server class machine (or 32-bit Windows) or prior to J2SE 5.0 | Server class machine
-Xms         | 4 MB                                                              | 1/64 of physical memory (up to 1 GB)
-Xmx         | 64 MB                                                             | 1/4 of physical memory (up to 1 GB)
HEAP SIZE DEFAULTS [Same table, with an overlay reading: "Oftentimes inadequate for enterprise level apps"]
FINDING MAX HEAP SIZE • observe application under consistent load • then add supplementary 25-30% to peak value • do not exceed 2 GB value (so say the experts)
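The rule of thumb above is simple arithmetic, sketched here under stated assumptions: the peak value comes from your own load test, the helper name `recommendMaxHeapMb` is hypothetical, and the 2 GB ceiling is the one quoted on the slide rather than a JVM limit.

```java
// Worked arithmetic for the "peak + 25-30%, capped at 2 GB" rule of thumb.
final class MaxHeapEstimate {
    static final long CAP_MB = 2048;   // the 2 GB ceiling from the slide

    /**
     * @param peakMb   peak heap usage observed under consistent load, in MB
     * @param headroom extra fraction to add, e.g. 0.25 for 25%
     */
    static long recommendMaxHeapMb(long peakMb, double headroom) {
        if (peakMb <= 0 || headroom < 0) {
            throw new IllegalArgumentException("peakMb must be > 0 and headroom >= 0");
        }
        long padded = Math.round(peakMb * (1.0 + headroom));
        return Math.min(padded, CAP_MB);
    }

    public static void main(String[] args) {
        long peakMb = 1000;            // made-up measurement for illustration
        System.out.println("-Xmx" + recommendMaxHeapMb(peakMb, 0.25) + "m");
    }
}
```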
FINDING INITIAL HEAP SIZE assign it equal to the max size, and here’s why: • the heap will grow in the long run anyway • baking in the overhead of heap growth/resizing is viewed as irresponsible by the experts
CAVEATS ON 32-BIT SYSTEMS • requires contiguous unfragmented chunk of memory • 32-bit systems may not be able to allocate the desired size • 2-3 GB per process (Windows) • 3 GB per process (Linux) • some amount of memory is eaten up by OS and background processes
WHAT ARE THE OPTIONS?
SIZING HEAP GENERATIONS -XX:NewSize=123m -XX:MaxNewSize=123m -XX:SurvivorRatio=6
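To see how the generation sizing flags were applied, the heap memory pools can be listed through the standard `java.lang.management` API. This is a sketch only: the class name `MemoryPoolReport` is made up, and the pool names printed (eden, survivor, tenured/old) depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Lists the heap pools (generations) of the running JVM with their sizes.
final class MemoryPoolReport {

    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;                         // skip code cache, metaspace, etc.
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue;                         // pool is no longer valid
            }
            // getMax() returns -1 when the pool has no defined maximum.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + "K";
            lines.add(String.format("%-25s committed=%dK max=%s",
                    pool.getName(), usage.getCommitted() / 1024, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```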
APPLICATION CONSIDERATIONS FOR HEAP GENERATIONS SIZES • reserve plenty of memory for young generation if creating lots of short-lived objects • favor tenured generation if making use of lots of long-lived objects
OPTIMAL SIZE FOR YOUNG GENERATION [⅓; ½) of the total heap
WHAT ABOUT THAT SURVIVORRATIO FLAG? Eden S0 S1 Tenured Permanent
WHAT ABOUT THAT SURVIVORRATIO FLAG? • each survivor space defaults to 1/34 of the young generation • high risk that short-lived objects migrate to the tenured generation very fast • best if each survivor space is kept between 1/12 and 1/6 of the new space • -XX:SurvivorRatio=6 => each survivor space is 1/8 of the young generation
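The 1/8 figure follows from how the flag is defined: `-XX:SurvivorRatio=N` makes eden N times the size of one survivor space, and the young generation holds eden plus two survivor spaces, so each survivor space is 1/(N+2) of the young generation. A small worked sketch (class and method names are made up for illustration; real JVMs also round sizes to alignment boundaries):

```java
// Worked arithmetic for -XX:SurvivorRatio: young = eden + 2 * survivor, eden = ratio * survivor.
final class SurvivorRatioMath {

    /** Size of ONE survivor space for a given young generation size and SurvivorRatio. */
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Eden gets whatever is left after the two survivor spaces. */
    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        long youngMb = 128;            // e.g. -XX:NewSize=128m -XX:MaxNewSize=128m
        int ratio = 6;                 // -XX:SurvivorRatio=6
        System.out.println("each survivor: " + survivorMb(youngMb, ratio) + " MB");
        System.out.println("eden:          " + edenMb(youngMb, ratio) + " MB");
    }
}
```

With a 128 MB young generation and a ratio of 6, each survivor space comes out at 16 MB (1/8 of 128) and eden at 96 MB.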
GARBAGE COLLECTION ALGORITHMS • serial • parallel • concurrent
SERIAL COLLECTOR • suitable only for single processor machines • relatively efficient • default on non-server class machines • -XX:+UseSerialGC
SERIAL COLLECTOR [Diagram: Application Threads -> GC (Stop) -> Application Threads]
PARALLEL COLLECTOR • takes advantage of multiple CPUs/cores • performs minor collections in parallel • significantly improves performance in systems with lots of minor collections • default on server class machines • -XX:+UseParallelGC
PARALLEL COLLECTOR • with -XX:+UseParallelGC alone, major collections are still single threaded • -XX:+UseParallelOldGC • as of J2SE 5.0 update 6 • allows parallel compaction which reduces heap fragmentation • allows major collections in parallel
PARALLEL COLLECTOR [Diagram: Application Threads -> GC (Stop) -> Application Threads]
CONCURRENT COLLECTOR • performs most of its work concurrently • the goal is to keep GC pauses short • single GC thread that runs simultaneously with application threads • -XX:+UseConcMarkSweepGC
CONCURRENT COLLECTOR [Diagram phases: App Threads; Initial Mark; App Threads + Concurrent Mark; Remark; App Threads + Concurrent Sweep]
WHICH COLLECTOR WORKS WELL IN MY CASE?
Collector  | Best for:
Serial     | Single processor machines + small heaps
Parallel   | Multiprocessor machines + high throughput (batch processing apps)
Concurrent | Fast processor machines + minimized response times (web apps)
GATHERING HEAP BEHAVIOR STATISTICS • -verbose:gc • -XX:+PrintGCDetails • -XX:+PrintHeapAtGC • -Xloggc:/path/to/gc/log/file
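Besides the logging flags above, cumulative collector statistics can also be read from inside the JVM through the standard `GarbageCollectorMXBean` interface. This is a complementary sketch, not something from the slides; the class name `GcStatsReport` is made up, and the collector names it prints depend on which algorithm is active.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Summarizes how often each collector has run and how long it took in total.
final class GcStatsReport {

    static List<String> collectorSummaries() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since JVM start; -1 means "undefined".
            lines.add(String.format("%s: collections=%d, total time=%d ms",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        collectorSummaries().forEach(System.out::println);
    }
}
```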
EXAMPLE
java -verbose:gc MyApp
33.357: [GC 25394K->18238K(130176K), 0.0148471 secs]
33.811: [Full GC 22646K->18501K(130176K), 0.1954419 secs]
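Each line of this output carries the JVM uptime, the collection type, heap occupancy before and after, total heap capacity, and the pause time. A custom analysis script can pull those fields out with a regular expression. The sketch below assumes exactly the line format shown in the example above; the class name `GcLogLine` is made up, and lines produced with -XX:+PrintGCDetails or by newer JVMs with a different log format will simply not match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses one plain -verbose:gc line, e.g.
// "33.357: [GC 25394K->18238K(130176K), 0.0148471 secs]"
final class GcLogLine {
    private static final Pattern SIMPLE = Pattern.compile(
            "^(\\d+\\.\\d+): \\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs\\]$");

    final double uptimeSecs;   // seconds since JVM start
    final boolean fullGc;      // true for a major ("Full GC") collection
    final long beforeKb;       // heap occupancy before the collection
    final long afterKb;        // heap occupancy after the collection
    final long capacityKb;     // total heap capacity at that moment
    final double pauseSecs;    // duration of the collection

    private GcLogLine(Matcher m) {
        uptimeSecs = Double.parseDouble(m.group(1));
        fullGc = m.group(2).equals("Full GC");
        beforeKb = Long.parseLong(m.group(3));
        afterKb = Long.parseLong(m.group(4));
        capacityKb = Long.parseLong(m.group(5));
        pauseSecs = Double.parseDouble(m.group(6));
    }

    /** Returns the parsed line, or empty if the line is not in the simple format. */
    static Optional<GcLogLine> parse(String line) {
        Matcher m = SIMPLE.matcher(line.trim());
        return m.matches() ? Optional.of(new GcLogLine(m)) : Optional.empty();
    }

    long reclaimedKb() {
        return beforeKb - afterKb;
    }

    public static void main(String[] args) {
        String[] sample = {
            "33.357: [GC 25394K->18238K(130176K), 0.0148471 secs]",
            "33.811: [Full GC 22646K->18501K(130176K), 0.1954419 secs]"
        };
        for (String s : sample) {
            parse(s).ifPresent(g -> System.out.printf("%s reclaimed %dK in %.4f s%n",
                    g.fullGc ? "major" : "minor", g.reclaimedKb(), g.pauseSecs));
        }
    }
}
```

From the two sample lines, the minor collection reclaimed 7156K and the full collection 4145K, while the full collection paused more than ten times longer.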
EXAMPLE
java -verbose:gc -XX:+PrintGCDetails MyApp
19.834: [GC 19.834: [DefNew: 9088K->960K(9088K), 0.0126103 secs] 16709K->9495K(130112K), 0.0126960 secs]
20.424: [Full GC 20.424: [Tenured: 8535K->10032K(121024K), 0.1342573 secs] 13847K->10032K(130112K), [Perm : 12287K->12287K(12288K)], 0.1343551 secs]
EXAMPLE
java -verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC MyApp
18.645: [GC {Heap before GC invocations=16:
Heap
 def new generation   total 9088K, used 9088K [0x02a20000, 0x033f0000, 0x05180000)
  eden space 8128K, 100% used [0x02a20000, 0x03210000, 0x03210000)
  from space 960K, 100% used [0x03210000, 0x03300000, 0x03300000)
  to   space 960K,   0% used [0x03300000, 0x03300000, 0x033f0000)
 tenured generation   total 121024K, used 7646K [0x05180000, 0x0c7b0000, 0x22a20000)
   the space 121024K,   6% used [0x05180000, 0x058f7870, 0x058f7a00, 0x0c7b0000)
 compacting perm gen  total 11264K, used 11202K [0x22a20000, 0x23520000, 0x26a20000)
   the space 11264K,  99% used [0x22a20000, 0x23510938, 0x23510a00, 0x23520000)
No shared spaces configured.
ANALYSIS TOOLS • custom scripts • feed the output to spreadsheet processor & build charts • GCViewer - http://www.tagtraum.com/gcviewer.html • Gchisto - http://java.net/projects/gchisto/ • VisualVM - http://visualvm.java.net • a host of other tools (commercial & freeware)
Let’s practice
Q&A
THANK YOU
