Java Performance Tweaks

29 June 2016 Java Performance Tweaks How I brought TopicViewer’s analysis runtime from 2:10 down to 18 seconds Repo: https://github.com/jimbethancourt/topic-viewer

1 Use a Faster Library 1 Avoid the Unnecessary 3 Looping 9 Parallelization 13 Finally 17 Contents

2 Use a Faster Library • Swapping out off-the-shelf components is the first step you’ll want to take and will likely provide the most immediate gain. • Replacing Colt with ParallelColt was relatively straightforward • No migration guides, but the fact that both libraries were open source and classes had relatively similar names helped. • Reduced the runtime from 2:10 to 0:55 Migrating from Colt to ParallelColt

3 Use a Faster Library 1 Avoid the Unnecessary 3 Looping 9 Parallelization 13 Finally 17

4 Avoid the Unnecessary Don’t use Objects when primitives will provide the same functionality Old clustersTree.findSet(new Vertex(i)).index == rootIndex New clustersTree .findSet(i).index == rootIndex Avoid Unnecessary Object Creation

5 Avoid the Unnecessary Use Object arrays instead of Maps when you primarily read values and can index via numbers: In DisjointTree Original Map<Vertex, Vertex> vertexMapping = new HashMap<Vertex, Vertex>(); findSet(Vertex v) { Vertex mapped = vertexMapping.get(v); …} New Vertex[] vertexMapping = new Vertex[numDocuments]; findSet (int v) { vertexMapping[v.index] = v; …} Brought runtime down to 42 seconds (if I remember correctly) between using arrays and avoiding object creation. Use Object Arrays instead of Maps

6 Avoid the Unnecessary • Avoid Unnecessary calls when there are collections that already perform the operations for you • Constantly calling collection.contains() added up – I was surprised • Avoid complex, object-heavy equals() operations when possible • Splitting work into producer + consumer and leveraging natural properties of collections cut 15 seconds off of runtime • See updateCorrelationMatrix() method and Pair.equals() Avoid Unnecessary Operations

7 Avoid the Unnecessary • Use method parameter values when looping heavily • Method parameters live on the processor stack, not the heap • As a result, accessing them is much faster • Shaved off a second or two Original: private int[] getLeastDissimilarPair(DoubleMatrix2D correlationMatrix, boolean force) { for (int i = 0; i < this.numDocuments; i++) for (int j = 0; j < this.numDocuments; j++) New: private int[] getLeastDissimilarPair(DoubleMatrix2D correlationMatrix, int numDocuments, boolean force) { for (int i = 0; i < numDocuments; i++) for (int j = 0; j < numDocuments; j++) Use Method Parameter Values

8 Avoid the Unnecessary Original: for (int i = 0; i < this.numDocuments; i++) for (int j = 0; j < this.numDocuments; j++) { double similarity = correlationMatrix.get(i, j); if (i < j && similarity != Double.NEGATIVE_INFINITY && similarity > bestSimilarity) New: for (int i = 0; i < numDocuments; i++) for (int j = 0; j < numDocuments; j++) { if (i < j) { double similarity = correlationMatrix.get(i, j); if (similarity != Double.NEGATIVE_INFINITY && similarity > bestSimilarity) Make Absolutely Sure you Need I/O

10 Looping for (int i = 0; i < numDocuments; i++) for (int j = 0; j < numDocuments; j++) if (i < j) Is faster than for (int j = 0; j < numDocuments; j++) for (int i = 0; i < j; i++) Likely faster due to reuse of register values and JITting Don’t Get Fancy With Inner Loops

11 Looping It was cheaper to re-run the operation instead of caching in a map: private Set<Integer> getClusterSet(int rootIndex, int numDocuments) { Set<Integer> clusterSet = new LinkedHashSet<>(); for (int i = 0; i < numDocuments; i++) if (this.clustersTree.findSet(i).index == rootIndex) clusterSet.add(i); return clusterSet; } Don't Cache Where it Won't Help

12 Looping https://dzone.com/articles/java-collection-performance was pure gold! Use the Fastest Collection for your Dominant Operation

14 Parallelization What do you do when you need to parallelize something like this? for (int i = 0; i < correlationMatrix2D.rows()-1; i++) {…} for (int i = 0; i < correlationMatrix2D.rows()-1; i++) { rowsForUpdating.add(i); } rowsForUpdating.parallelStream().forEach(i -> {…}); Shaved 10 seconds off of runtime. If possible, populate rowsForUpdating only once if quantity is fixed Need to Parallelize an Indexed For Loop?

15 Parallelization When organizing parallelization of processing, perform the parallelization at the highest / outermost level possible. This will reduce the cost of context-switching for the CPU Nesting parallelization was detrimental to runtime when I attempted to do so. Processing time took longer and CPU utilization was higher. Parallelize (only) at the Highest Level Possible

16 Parallelization There is no ConcurrentHashSet data structure in java.util.concurrent  Leverage the fact that ConcurrentHashMap’s key is a HashSet: ConcurrentHashMap<Pair, Integer> calculatedClusters = new ConcurrentHashMap<>(); … calculatedClusters.put(newPair, 0); A ConcurrentHashMap has 16 or more segments and each can be written to at the same time. Use a ConcurrentHashMap for the Key

17 • When performing a large volume of writes, use a non-blocking data structure. • br.ufmg.aserg.topicviewer.util.Double2DMatrix is pretty amazing once I realize how it worked • Each matrix row has its own channel • Each cell in the row has a calculated offset • As a result, there is no contention Non-Blocking Data Structures for Heavy I/O

19 Finally -XX:+UseG1GC Saved 2 seconds of runtime! Use the G1 Garbage Collector

20 Finally Java Mission Control is your new best friend -XX:+UnlockCommercialFeatures -XX:+FlightRecorder Profile Profile Profile

21 • 18 seconds • Over 8X performance improvement Final Runtime

Java Performance Tweaks

More Related Content

What's hot

Viewers also liked

Similar to Java Performance Tweaks

More from Jim Bethancourt

Recently uploaded

Java Performance Tweaks