DEV Community

Cover image for Measure twice, write twice
Benjamin Trent
Benjamin Trent

Posted on

Measure twice, write twice

Continually measure

When writing code, it's tempting to get complex. Especially when you are concerned about performance. But, you should write it simply first. Measure its performance. With that data, then write your improvements. Using your performance measurements as guides.

My failure to measure

I was writing a statistic gathering service Java. The statistics were very simple counts of items seen overtime. The constraints of the system were:

  • high throughput
  • Thread safe

Java has a great class for thread-safe incremental statistics LongAdder. It's fantastic for fast writes but LongAdder#reset is not thread-safe. I needed to be able to grab the latest full count and then reset. In comes ReadWriteLock! ReadWriteLock#readLock can be used for all the increment actions and then ReadWriteLock#writeLock for grabbing the latest total. The resulting service ended up looking like this:

public static class Accumulator { private final LongAdder statsAccumulator = new LongAdder(); private final ReadWriteLock readWriteLock = new ReentrantReadWriteLock(true); public Accumulator inc() { readWriteLock.readLock().lock(); try { this.statsAccumulator.increment(); return this; } finally { readWriteLock.readLock().unlock(); } } public InferenceStats currentStatsAndReset() { readWriteLock.writeLock().lock(); try { Stats stats = currentStats(Instant.now()); this.statsAccumulator.reset(); return stats; } finally { readWriteLock.writeLock().unlock(); } } public InferenceStats currentStats(Instant timeStamp) { return new Stats(statsAccumulator.longValue(), timeStamp); } } 

I thought I had the perfect high throughput, thread-safe, statistics gathering class. I mean, it doesn't ever block the writes unless we grab the currentStatsAndReset. The common hot-path of inc() is normally not blocking.

Perfect.

But I didn't do one thing. I never measured performance of this implementation against a dead-simple synchronized version. 🤦

My overconfidence is my weakness

Start simple, then measure

Here is the simple version:

public static class Accumulator { private long statsAccumulator = 0L; public synchronized Accumulator inc() { this.statsAccumulator++; return this; } public synchronized InferenceStats currentStatsAndReset() { Stats stats = currentStats(Instant.now()); this.statsAccumulator = 0L; return stats; } public InferenceStats currentStats(Instant timeStamp) { return new Stats(statsAccumulator, timeStamp); } } 

It doesn't use any of those classes designed for low contentioning locking. Just a plain 'ol synchronized methods. Surely, all that use of synchronized would increase contention on write.

Another developer on my team called me out on this. He was curious to see if my version was truly faster. I knew I was right, so I wrote a JMH benchmark to prove him wrong.

The results were not on my side:

 Benchmark Mode Cnt Score Error Units MultiThreadedStatsAccumulatorBenchmark.rwAccumulator_1 avgt 20 5957.399 ± 112.892 us/op MultiThreadedStatsAccumulatorBenchmark.rwAccumulator_128 avgt 20 7480921.908 ± 255364.820 us/op MultiThreadedStatsAccumulatorBenchmark.syncAccumulator_1 avgt 20 421.662 ± 2.616 us/op MultiThreadedStatsAccumulatorBenchmark.syncAccumulator_128 avgt 20 792910.927 ± 52219.577 us/op 

My complex version (rwAccumulator) was almost 10x SLOWER. The simple, fully synchronized version (syncAccumulator) kicked my butt. Both with 1, and 128 separate threads!

more facepalm

Measure, Measure, Measure

My lessons were humbling reminders.

  • Always write the simple way first.
  • If you think it should be faster, change it and measure it.
  • Measure, measure, measure

Top comments (1)

Collapse
 
benwtrent profile image
Benjamin Trent