Performance Modeling and Simulation for Accumulo Applications Adam Fuchs CTO, Sqrrl Data, Inc. October 16, 2017
© 2017 Sqrrl Data, Inc. All rights reserved. 2 Modeling and Simulation with Accumulo Goals: Optimize performance in code and configuration Discover unexpected factors affecting performance and scalability Reduce risk when extrapolating performance to large customer sizes (10+PB) over long periods of time (years) Types used by Sqrrl: Micro-benchmarking Drive scoped-down tests with actual code Iterate with code changes to optimize Analog Simulator Track scheduling and side-effects of operations without actually running them Simulate large clusters and long periods of time quickly on a single machine Predictive model validation Predict performance and scalability mathematically (e.g. spreadsheet model) Full-scope tests instrumented with measurements to validate model
© 2017 Sqrrl Data, Inc. All rights reserved. 3 Technique #1: Micro-Benchmarking Optimize the crap out of operations in tight loops Isolate components to simplify interpretation of results Example: RFileSortingOutputFormat Produce RFiles from randomly ordered Key/Value Pairs Input Data (Split) Record Reader Mapper Context (Collector) In-Memory Buffer (~1GB) Sorted RFile1 Sorted RFileN Sort / Spill ... Micro-Benchmarking Candidates
© 2017 Sqrrl Data, Inc. All rights reserved. 4 Micro Benchmarking Setup Process: 1. Benchmark 2. Compare 3. Rewrite 4. Repeat UsefulTools: • JMH (Java Microbenchmarking Harness) • Sampling profiler, like JVisualVM main() { startTime = now() for(i = 0; i < many; i++) { do.thing() } endTime = now() print(endTime - startTime) }
© 2017 Sqrrl Data, Inc. All rights reserved. 5 Insights GainedThrough Micro-Benchmarking Serialization/deserialization of Keys can be very expensive Holding onto objects makes Java sad Sort boils down to compare and swap minimize the number of bytes to compare for sorting via hierarchical organization minimize the cost of swapping Other attempted optimization: Quicksort vs. merge sort Merge sort has tighter worst-case bound Quicksort faster in practice (probably dominated by cache fetches) Quicker to merge in ideas like this with micro-benchmarking Next steps: 1. Code with best known algorithms and design patterns 2. Measure again
© 2017 Sqrrl Data, Inc. All rights reserved. 6 RFileSortingOutputFormat Sort Buffer Design Key/Value Buffer tuple(CQ,Vis) value length value Metadata Buffers ... tuple(CQ,Vis) value length value ... Row ColFam Length Offset Length Offset Length Offset Length Offset Length Offset Length Offset ... ... Row ColFam Length Offset Length Offset Length Offset Length Offset Length Offset Length Offset ... ...
© 2017 Sqrrl Data, Inc. All rights reserved. 7 Micro-Benchmarking: RFileSortingOutputFormat Performance 10 million random keys sorted and written in 17-20 seconds per thread Up to ~588,000 key/value pairs per second per thread On one 20-core node, scaled up to ~7,000,000 key/value pairs per second generated, sorted, and written to disk
© 2017 Sqrrl Data, Inc. All rights reserved. 8 Technique #2: Analog Simulation Test bigger than you can afford to test! Example: Optimize Query and Compaction I/O Explore performance of various key designs and configuration Optimize Write-Amplification Factor (WAF) due to long-term compactions Optimize query latency, tied to number of RFiles per tablet at any time Input Organization and Maintenance Output
© 2017 Sqrrl Data, Inc. All rights reserved. 9 Document-Distributed IndexTable Design Document Query Ingest	Process Query	Process Document StoreTable Forward Index Inverted Index Forward Index Inverted Index Forward Index Inverted Index Forward Index Inverted Index Forward Index Inverted Index
© 2017 Sqrrl Data, Inc. All rights reserved. 10 Tablet Files Compaction Algorithm File Size 1. Pick the maximal set of files F that satisfies: sum(f) / max(f) > Ratio 2. Compact F into a new file f’ Candidate Set F 4.92 1.52 = 3.24 > 3.0 þ
© 2017 Sqrrl Data, Inc. All rights reserved. 11 LSMTree RFile RFile RFile RFile RFile RFile Compact! Compact! Time
© 2017 Sqrrl Data, Inc. All rights reserved. 12 Simulator Loop main() { ... while (queue not empty) { event = queue.pop(); event.process(); } } Event { final long eventTime; process() { modState(); genEvents(); ... } } PriorityQueue<Event> (ordered by time)
© 2017 Sqrrl Data, Inc. All rights reserved. 13 Simulator Design Simulate: Ingest Triggered compactions Timed events for: Bulk load of new files Compaction start, end Measure: Files to tablet counts over time Total input Total bytes Designs to explore: Key design, partitioning Compaction strategies Ingest Event Generator Compaction Simulator Ingest Simulator Tablet State Model Measurement Apparatus Event Queue
© 2017 Sqrrl Data, Inc. All rights reserved. 14 Example Simulator Execution • Inputs from default configs, micro-benchmarking, and other simulations • Intermediate measurements give hints about possible issues • Simplified optimization scores focus on small number of dimensions • 7 days and 20GB simulated in under a second Input Parameters: Compaction ratio: 4.0 Compaction rate: 2643.0 bytes/ms Max compacted file size: 1073741824000 bytes Last input time: 604800000 ms Num files per drop: 1.1 Bytes per drop: 10485760 File size stddev: 1048576 File drop period: 300000 ms Results: Last Input Time: 604800000 ms End time: 604800000 ms Time compacting: 35179342 ms Compaction portion: 0.058166901455026454 Active tablets supported per compaction thread: 17 Max concurrent files: 32 Max files per compaction: 20 Max compaction size: 16211724213 bytes Max compaction time: 6133834 ms Final file count: 7 Total Events: 2486 Total Compactions: 469 Total MajC Writes: 92979631364 bytes Total Input Writes: 21204489777 bytes Final Size (GB): 19.748220012523234 Write Amplification Factor: 5.384903024870363 Average files over time: 8.666971808862433
© 2017 Sqrrl Data, Inc. All rights reserved. 15 Simulator Insights Different compression codec for small vs. large compactions Limit maximum file size for compaction WAF optimized with roughly 1-week time partitioning in our application Other recent uses of simulators: Query readahead threadpool improvements Secondary compaction threadpool Bulk index creation partitioning optimization
© 2017 Sqrrl Data, Inc. All rights reserved. 16 Technique #3: Predictive ModelValidation Use knowledge of application to build predictive performance model Avoid “explaining” observations Minimize model fitting parameters Instrument whole-stack application with key performance indicators Run performance tests with enough variety to fit parameter and validate model Extra credit: include some factors that aren’t model parameters Validates non-relevance Prioritize by “most likely relevant” Analyze the difference between the prediction and the observations Refine model Iterate
© 2017 Sqrrl Data, Inc. All rights reserved. 17 Predictive ModelValidation: Example Query: filtered graph search implemented in Accumulo iterators Accumulo’s iterators support both next() and seek() Time to call next() and seek() is relatively uniform (big hand-wave here) Prediction: I/O-bound query runtime proportional to total next and seek calls Modeling performance ó accounting for next() and seek() calls
SPREADSHEETS FOR BIG DATA!
© 2017 Sqrrl Data, Inc. All rights reserved. 19 Predictive ModelValidation: Spreadsheets! Model Version 7:
© 2017 Sqrrl Data, Inc. All rights reserved. 20 Predictive ModelValidation: Insights Some iterators are seeking/scanning more than optimal Logic tweak cuts seeks in half Total of all scans and seeks is way more than expected Re-ordering filter hierarchy allows for fewer, bigger seeks Next steps: 1. Improve the code 2. Re-validate with new code
© 2017 Sqrrl Data, Inc. All rights reserved. 21 Predictive ModelValidation: Improvements 25X Predicted Improvement New Iterator Model Version 12:
© 2017 Sqrrl Data, Inc. All rights reserved. 22 Takeaways Performance is not good until you prove it! Don’t expect performance to be optimal the first time Measure early Leave time for optimization Measure often Micro-benchmarking, simulation, and predictive model validation: Learn ’em Use ’em Love ’em Ping me if you want help getting organized or getting started with code and spreadsheets

Performance modeling and simulation for accumulo applications

  • 1.
    Performance Modeling and Simulationfor Accumulo Applications Adam Fuchs CTO, Sqrrl Data, Inc. October 16, 2017
  • 2.
    © 2017 SqrrlData, Inc. All rights reserved. 2 Modeling and Simulation with Accumulo Goals: Optimize performance in code and configuration Discover unexpected factors affecting performance and scalability Reduce risk when extrapolating performance to large customer sizes (10+PB) over long periods of time (years) Types used by Sqrrl: Micro-benchmarking Drive scoped-down tests with actual code Iterate with code changes to optimize Analog Simulator Track scheduling and side-effects of operations without actually running them Simulate large clusters and long periods of time quickly on a single machine Predictive model validation Predict performance and scalability mathematically (e.g. spreadsheet model) Full-scope tests instrumented with measurements to validate model
  • 3.
    © 2017 SqrrlData, Inc. All rights reserved. 3 Technique #1: Micro-Benchmarking Optimize the crap out of operations in tight loops Isolate components to simplify interpretation of results Example: RFileSortingOutputFormat Produce RFiles from randomly ordered Key/Value Pairs Input Data (Split) Record Reader Mapper Context (Collector) In-Memory Buffer (~1GB) Sorted RFile1 Sorted RFileN Sort / Spill ... Micro-Benchmarking Candidates
  • 4.
    © 2017 SqrrlData, Inc. All rights reserved. 4 Micro Benchmarking Setup Process: 1. Benchmark 2. Compare 3. Rewrite 4. Repeat UsefulTools: • JMH (Java Microbenchmarking Harness) • Sampling profiler, like JVisualVM main() { startTime = now() for(i = 0; i < many; i++) { do.thing() } endTime = now() print(endTime - startTime) }
  • 5.
    © 2017 SqrrlData, Inc. All rights reserved. 5 Insights GainedThrough Micro-Benchmarking Serialization/deserialization of Keys can be very expensive Holding onto objects makes Java sad Sort boils down to compare and swap minimize the number of bytes to compare for sorting via hierarchical organization minimize the cost of swapping Other attempted optimization: Quicksort vs. merge sort Merge sort has tighter worst-case bound Quicksort faster in practice (probably dominated by cache fetches) Quicker to merge in ideas like this with micro-benchmarking Next steps: 1. Code with best known algorithms and design patterns 2. Measure again
  • 6.
    © 2017 SqrrlData, Inc. All rights reserved. 6 RFileSortingOutputFormat Sort Buffer Design Key/Value Buffer tuple(CQ,Vis) value length value Metadata Buffers ... tuple(CQ,Vis) value length value ... Row ColFam Length Offset Length Offset Length Offset Length Offset Length Offset Length Offset ... ... Row ColFam Length Offset Length Offset Length Offset Length Offset Length Offset Length Offset ... ...
  • 7.
    © 2017 SqrrlData, Inc. All rights reserved. 7 Micro-Benchmarking: RFileSortingOutputFormat Performance 10 million random keys sorted and written in 17-20 seconds per thread Up to ~588,000 key/value pairs per second per thread On one 20-core node, scaled up to ~7,000,000 key/value pairs per second generated, sorted, and written to disk
  • 8.
    © 2017 SqrrlData, Inc. All rights reserved. 8 Technique #2: Analog Simulation Test bigger than you can afford to test! Example: Optimize Query and Compaction I/O Explore performance of various key designs and configuration Optimize Write-Amplification Factor (WAF) due to long-term compactions Optimize query latency, tied to number of RFiles per tablet at any time Input Organization and Maintenance Output
  • 9.
    © 2017 SqrrlData, Inc. All rights reserved. 9 Document-Distributed IndexTable Design Document Query Ingest Process Query Process Document StoreTable Forward Index Inverted Index Forward Index Inverted Index Forward Index Inverted Index Forward Index Inverted Index Forward Index Inverted Index
  • 10.
    © 2017 SqrrlData, Inc. All rights reserved. 10 Tablet Files Compaction Algorithm File Size 1. Pick the maximal set of files F that satisfies: sum(f) / max(f) > Ratio 2. Compact F into a new file f’ Candidate Set F 4.92 1.52 = 3.24 > 3.0 þ
  • 11.
    © 2017 SqrrlData, Inc. All rights reserved. 11 LSMTree RFile RFile RFile RFile RFile RFile Compact! Compact! Time
  • 12.
    © 2017 SqrrlData, Inc. All rights reserved. 12 Simulator Loop main() { ... while (queue not empty) { event = queue.pop(); event.process(); } } Event { final long eventTime; process() { modState(); genEvents(); ... } } PriorityQueue<Event> (ordered by time)
  • 13.
    © 2017 SqrrlData, Inc. All rights reserved. 13 Simulator Design Simulate: Ingest Triggered compactions Timed events for: Bulk load of new files Compaction start, end Measure: Files to tablet counts over time Total input Total bytes Designs to explore: Key design, partitioning Compaction strategies Ingest Event Generator Compaction Simulator Ingest Simulator Tablet State Model Measurement Apparatus Event Queue
  • 14.
    © 2017 SqrrlData, Inc. All rights reserved. 14 Example Simulator Execution • Inputs from default configs, micro-benchmarking, and other simulations • Intermediate measurements give hints about possible issues • Simplified optimization scores focus on small number of dimensions • 7 days and 20GB simulated in under a second Input Parameters: Compaction ratio: 4.0 Compaction rate: 2643.0 bytes/ms Max compacted file size: 1073741824000 bytes Last input time: 604800000 ms Num files per drop: 1.1 Bytes per drop: 10485760 File size stddev: 1048576 File drop period: 300000 ms Results: Last Input Time: 604800000 ms End time: 604800000 ms Time compacting: 35179342 ms Compaction portion: 0.058166901455026454 Active tablets supported per compaction thread: 17 Max concurrent files: 32 Max files per compaction: 20 Max compaction size: 16211724213 bytes Max compaction time: 6133834 ms Final file count: 7 Total Events: 2486 Total Compactions: 469 Total MajC Writes: 92979631364 bytes Total Input Writes: 21204489777 bytes Final Size (GB): 19.748220012523234 Write Amplification Factor: 5.384903024870363 Average files over time: 8.666971808862433
  • 15.
    © 2017 SqrrlData, Inc. All rights reserved. 15 Simulator Insights Different compression codec for small vs. large compactions Limit maximum file size for compaction WAF optimized with roughly 1-week time partitioning in our application Other recent uses of simulators: Query readahead threadpool improvements Secondary compaction threadpool Bulk index creation partitioning optimization
  • 16.
    © 2017 SqrrlData, Inc. All rights reserved. 16 Technique #3: Predictive ModelValidation Use knowledge of application to build predictive performance model Avoid “explaining” observations Minimize model fitting parameters Instrument whole-stack application with key performance indicators Run performance tests with enough variety to fit parameter and validate model Extra credit: include some factors that aren’t model parameters Validates non-relevance Prioritize by “most likely relevant” Analyze the difference between the prediction and the observations Refine model Iterate
  • 17.
    © 2017 SqrrlData, Inc. All rights reserved. 17 Predictive ModelValidation: Example Query: filtered graph search implemented in Accumulo iterators Accumulo’s iterators support both next() and seek() Time to call next() and seek() is relatively uniform (big hand-wave here) Prediction: I/O-bound query runtime proportional to total next and seek calls Modeling performance ó accounting for next() and seek() calls
  • 18.
  • 19.
    © 2017 SqrrlData, Inc. All rights reserved. 19 Predictive ModelValidation: Spreadsheets! Model Version 7:
  • 20.
    © 2017 SqrrlData, Inc. All rights reserved. 20 Predictive ModelValidation: Insights Some iterators are seeking/scanning more than optimal Logic tweak cuts seeks in half Total of all scans and seeks is way more than expected Re-ordering filter hierarchy allows for fewer, bigger seeks Next steps: 1. Improve the code 2. Re-validate with new code
  • 21.
    © 2017 SqrrlData, Inc. All rights reserved. 21 Predictive ModelValidation: Improvements 25X Predicted Improvement New Iterator Model Version 12:
  • 22.
    © 2017 SqrrlData, Inc. All rights reserved. 22 Takeaways Performance is not good until you prove it! Don’t expect performance to be optimal the first time Measure early Leave time for optimization Measure often Micro-benchmarking, simulation, and predictive model validation: Learn ’em Use ’em Love ’em Ping me if you want help getting organized or getting started with code and spreadsheets