1 Micro-architectural Characterization of Apache Spark on Batch and Stream Processing Workloads
Ahsan Javed Awan, EMJD-DC (KTH-UPC) (https://www.kth.se/profile/ajawan/)
Mats Brorsson (KTH), Eduard Ayguade (UPC and BSC), Vladimir Vlassov (KTH)
2 Motivation: Why should we care about architecture support? Data is growing faster than technology. (*Figure taken from Babak's slides)
3 Motivation (cont.): Our goal is to improve node-level performance through architecture support. Scale-up frameworks: Phoenix++, Metis, Ostrich, etc. Scale-out frameworks: Hadoop, Spark, Flink, etc. (*Figure source: http://navcode.info/2012/12/24/cloud-scaling-schemes/)
4 Our Approach: What are the major performance bottlenecks?
● "Performance characterization of in-memory data analytics on a modern cloud server", in 5th International IEEE Conference on Big Data and Cloud Computing, 2015 (Best Paper Award).
● "How Data Volume Affects Spark Based Data Analytics on a Scale-up Server", in 6th International Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BpoE), held in conjunction with VLDB 2015, Hawaii, USA.
Limitations of these studies:
– Limited to batch processing workloads only.
– Do not consider the velocity aspect of big data.
– Experiments are based on an older version of Spark.
5 Our Approach: What are the remaining questions?
● Does micro-architectural performance remain consistent across batch and stream processing workloads?
● How do DataFrames compare to RDDs micro-architecturally?
● How does data velocity affect micro-architectural performance?
6 Which Scale-out Framework? (Progress Meeting 12-12-14) [Picture Courtesy: Amir H. Payberah]
● Tuning of Spark internal parameters
● Tuning of JVM parameters (heap size, etc.)
● Micro-architecture-level analysis using hardware performance counters
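The slide does not spell out how raw hardware counters become the stall categories reported later in this deck (front-end bound, back-end bound, retiring, DRAM bound). Those names match Intel's Top-Down Microarchitecture Analysis Method, so the following is a minimal sketch, assuming that method and its published level-1 formulas for Sandy Bridge/Ivy Bridge cores with a 4-wide pipeline. The counter names are Intel event names; the sample values are invented for illustration and are not measurements from this study.

```python
from dataclasses import dataclass

# Assumption: 4 issue slots per cycle, as on Sandy Bridge / Ivy Bridge cores.
PIPELINE_WIDTH = 4


@dataclass
class Counters:
    """Raw event counts, named after the Intel events used by Top-Down level 1."""
    cpu_clk_unhalted_thread: int      # CPU_CLK_UNHALTED.THREAD
    idq_uops_not_delivered_core: int  # IDQ_UOPS_NOT_DELIVERED.CORE
    uops_issued_any: int              # UOPS_ISSUED.ANY
    uops_retired_retire_slots: int    # UOPS_RETIRED.RETIRE_SLOTS
    int_misc_recovery_cycles: int     # INT_MISC.RECOVERY_CYCLES


def top_down_level1(c: Counters) -> dict[str, float]:
    """Split all issue slots into the four Top-Down level-1 categories."""
    slots = PIPELINE_WIDTH * c.cpu_clk_unhalted_thread
    frontend_bound = c.idq_uops_not_delivered_core / slots
    bad_speculation = (
        c.uops_issued_any
        - c.uops_retired_retire_slots
        + PIPELINE_WIDTH * c.int_misc_recovery_cycles
    ) / slots
    retiring = c.uops_retired_retire_slots / slots
    # Back-end bound is the remaining fraction of slots.
    backend_bound = 1.0 - (frontend_bound + bad_speculation + retiring)
    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    sample = Counters(1000, 800, 1800, 1600, 50)
    print({k: round(v, 3) for k, v in top_down_level1(sample).items()})
```

Back-end bound is computed as a remainder rather than measured directly, which is why a drop in back-end-bound stalls (as reported for DataFrames) must show up as a gain in one of the other three categories.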
7 Our Approach: Which Benchmarks?
8 Our Hardware Configuration: Which Machine? An Intel Ivy Bridge server, with Hyper-Threading and Turbo Boost disabled.
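The slide only states that Hyper-Threading and Turbo Boost were disabled. On Linux, one way to sanity-check the Hyper-Threading part is to compare the `siblings` (logical CPUs per package) and `cpu cores` (physical cores per package) fields of `/proc/cpuinfo`. The sketch below assumes a Linux host and that file format; it is not the authors' procedure, and the function name is hypothetical. Turbo Boost state is exposed separately (for example through `/sys/devices/system/cpu/intel_pstate/no_turbo` when the `intel_pstate` driver is active) and is not checked here.

```python
from pathlib import Path


def hyperthreading_enabled(cpuinfo_text: str) -> bool:
    """Return True if any processor entry reports more logical CPUs than cores.

    Assumes the /proc/cpuinfo layout, where each processor block lists
    'siblings' before 'cpu cores'.
    """
    siblings = None
    cores = None
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "siblings":
            siblings = int(value)
        elif key == "cpu cores":
            cores = int(value)
        if siblings is not None and cores is not None:
            if siblings > cores:
                return True
            siblings = cores = None  # reset for the next processor block
    return False


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("Hyper-Threading enabled:", hyperthreading_enabled(path.read_text()))
```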
9 Does micro-architectural performance remain consistent? Stream processing is micro-architecturally similar to batch processing in Spark.
10 (cont.) Stream processing is micro-architecturally similar to batch processing in Spark.
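A conceptual way to see why the two behave alike: Spark Streaming (DStreams) discretizes the input stream into micro-batches and processes each one as an ordinary batch job, so a streaming job that differs from its batch counterpart only by micro-batching runs the same per-record work. The plain-Python sketch below models that idea with a word count. It does not use Spark's API, all function names are hypothetical, and fixed-size batches stand in for Spark's time-based batch intervals.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def word_count(lines: Iterable[str]) -> Counter:
    """Batch job: split lines into words (flatMap), emit (word, 1), reduce by key."""
    counts: Counter = Counter()
    for line in lines:
        for word in line.split():
            counts[word] += 1
    return counts


def micro_batches(stream: List[str], batch_size: int) -> Iterator[List[str]]:
    """Discretize a stream into fixed-size micro-batches."""
    for start in range(0, len(stream), batch_size):
        yield stream[start:start + batch_size]


def streaming_word_count(stream: List[str], batch_size: int) -> Counter:
    """Streaming job: run the same batch function on each micro-batch and merge."""
    total: Counter = Counter()
    for batch in micro_batches(stream, batch_size):
        total.update(word_count(batch))
    return total


if __name__ == "__main__":
    data = ["spark streams data", "spark batches data", "data"]
    print(streaming_word_count(data, batch_size=2) == word_count(data))  # True
```

The inner loop of `word_count` is identical in both paths; only the outer batching loop differs, which mirrors the condition stated in the conclusion.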
11 (cont.) Streaming workloads with similar Spark transformations have different micro-architectural behavior.
12 (cont.) Streaming workloads with similar Spark transformations have different micro-architectural behavior.
13 (cont.) Streaming workloads with similar Spark transformations have different micro-architectural behavior.
14 (cont.) Micro-batch size determines the micro-architectural behavior of stream processing workloads with similar Spark transformations.
| Workload | Spark transformations | Input data rate | Window size (s) | Working set (2 s sampling interval) |
|---|---|---|---|---|
| WWc | FlatMap, Map, ReduceByKeyAndWindow | 10^4 | 30 | 15 × 10^4 |
| CSpc | FlatMap, Map, CountByValueAndWindow | 10^4 | 10 | 5 × 10^4 |
| CErpz | FlatMap, Map, Window, GroupByKey | 10^4 | 30 | 15 × 10^4 |
| CAuC | FlatMap, Map, Window, GroupByKey, Count | 10^4 | 10 | 5 × 10^4 |
| Tpt | FlatMap, ReduceByKeyAndWindow, Transform | 10^1 | 60 | 30 × 10^1 |
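Every working-set value in the table equals the input data rate multiplied by the number of 2 s sampling intervals that fit in the window, i.e. working set = input data rate × (window size / sampling interval). This relation is inferred from the numbers, not stated on the slide, and the function name below is hypothetical; the sketch simply checks the inferred formula against each row.

```python
SAMPLING_INTERVAL_S = 2  # sampling interval given in the table header

# (workload, input data rate, window size in seconds, working set from the table)
ROWS = [
    ("WWc", 10**4, 30, 15 * 10**4),
    ("CSpc", 10**4, 10, 5 * 10**4),
    ("CErpz", 10**4, 30, 15 * 10**4),
    ("CAuC", 10**4, 10, 5 * 10**4),
    ("Tpt", 10**1, 60, 30 * 10**1),
]


def working_set(rate: int, window_s: int, interval_s: int = SAMPLING_INTERVAL_S) -> int:
    """Inferred relation: input rate times the number of intervals per window."""
    return rate * (window_s // interval_s)


if __name__ == "__main__":
    for name, rate, window_s, expected in ROWS:
        computed = working_set(rate, window_s)
        print(name, computed, computed == expected)
```

Read this way, WWc and CErpz (30 s windows) hold three times as many records as CSpc and CAuC (10 s windows) at the same input rate, which is one way the window size changes the effective amount of data processed per step.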
15 Do DataFrames perform better than RDDs at the micro-architectural level? Compared with RDDs, DataFrames exhibit:
● 25% fewer back-end-bound stalls
● 64% fewer DRAM-bound stalled cycles
● 25% less bandwidth (BW) consumption
● 10% less starvation of execution resources
DataFrames have better micro-architectural performance than RDDs.
16 How does data velocity affect micro-architectural performance? CPU utilization is better at higher data velocity.
17 (cont.) At higher data velocity:
● Higher instruction retirement
● Higher L1-bound stalls
● Less starvation
● Higher bandwidth (BW) consumption
18 Conclusion
● Batch processing and stream processing have the same micro-architectural behavior in Spark if the only difference between the two implementations is micro-batching.
● Spark workloads using DataFrames have better instruction retirement than workloads using RDDs.
● At small input data rates, stream processing workloads are front-end bound. At larger input data rates, front-end-bound stalls are reduced and instruction retirement improves.
19 THANK YOU
20 List of Papers
● "Performance characterization of in-memory data analytics on a modern cloud server", in 5th International IEEE Conference on Big Data and Cloud Computing, 2015 (Best Paper Award).
● "How Data Volume Affects Spark Based Data Analytics on a Scale-up Server", in 6th International Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BpoE), held in conjunction with VLDB 2015, Hawaii, USA.
● "Micro-architectural Characterization of Apache Spark on Batch and Stream Processing Workloads" (accepted to BDCloud 2016).
● "Node Architecture Implications for In-Memory Data Analytics in Scale-in Clusters" (accepted to IEEE BDCAT 2016).
● "Implications of In-Memory Data Analytics with Apache Spark on Near Data Computing Architectures" (under submission).
