Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulzar and Matteo Interlandi

Debugging Big Data Analytics in Apache Spark with BigDebug
MATTEO INTERLANDI - MUHAMMAD ALI GULZAR

Debugging Cloud Computing Programs
Open Source Data-Intensive Scalable Computing (DISC) Platforms: Hadoop MapReduce and Spark
◦ Severe lack of debugging support in these systems
◦ Programs (i.e., queries, jobs) are batch executed / black boxes
So what to do?
◦ Trial and error debugging on subsample
◦ Post-mortem analysis of error logs
◦ Analyze physical view of the execution (a job id, failed node, etc.)

BigDebug Project Overview
◦ Titian: Data Provenance for Fine-Grained Tracing [PVLDB 2016]
◦ Vega: Incremental Computation for Interactive Debugging [SoCC 2016]
◦ BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark [ICSE 2016]
◦ Automated Debugging in Data Intensive Scalable Computing Systems [Under Submission]
Collaboration with Tyson Condie, Miryung Kim, and Todd Millstein

BigDebug: Interactive Debugger Features
1. Simulated Breakpoint
2. Guarded Watchpoint
3. Crash Culprit Identification
4. Backward Tracing
5. Automated Fault Localization

$> Crash inducing input records:
9K23 Cruz TX 1440023645
2FSD Cruz KS 1440026456
9909 Cruz KS 1440023768

[Figure: fault localization tree. Test Fails → Split → Test Passes / Test Fails → …]

Feature 1: Simulated Breakpoint
[Figure: job with map and groupByKey operators across Stage 0 and Stage 1, with a simulated breakpoint placed in the pipeline]
A simulated breakpoint enables the user to inspect intermediate program state without pausing the computation
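
To make the idea of inspecting state without pausing more concrete, here is a minimal sketch in plain Scala collections. It is not BigDebug's API: `simulatedBreakpoint`, `snapshot`, and the sample input are hypothetical names invented for illustration. The helper copies each record into a side buffer and forwards it unchanged, so downstream operators keep running.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper (an assumption, not BigDebug's API): copies every record
// that flows through into `snapshot` and forwards it unchanged.
def simulatedBreakpoint[A](records: Iterator[A], snapshot: ArrayBuffer[A]): Iterator[A] =
  records.map { r =>
    snapshot += r
    r
  }

val snapshot = ArrayBuffer.empty[(String, Int)]

// Made-up input standing in for the output of an upstream stage.
val input = Iterator("a 1", "b 2", "a 3")
val parsed = input.map { line =>
  val fields = line.split(" ")
  (fields(0), fields(1).toInt)
}

// Observe the state between the map step and the grouping step.
val observed = simulatedBreakpoint(parsed, snapshot)

// The downstream computation runs to completion; nothing is paused.
val result = observed.toList.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }
```

After the job finishes, `snapshot` holds the intermediate records for inspection while `result` holds the normal output.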

Feature 2: On Demand Guarded Watchpoint
[Figure: job with map and groupByKey operators across Stage 0 and Stage 1, with a simulated breakpoint and a watchpoint]
A user can inspect intermediate data using a guard and also update it on the fly
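
A guarded watchpoint can be sketched as a predicate that decides which intermediate records to capture. The class below is a hypothetical illustration in plain Scala, not BigDebug's API; it reads "update it on the fly" as replacing the guard while records are still flowing, which is an assumption. The sample ages are made up.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical watchpoint (an assumption, not BigDebug's API): captures only
// the records that satisfy the current guard and passes every record through.
final class GuardedWatchpoint[A](initialGuard: A => Boolean) {
  @volatile private var guard: A => Boolean = initialGuard
  val captured: ArrayBuffer[A] = ArrayBuffer.empty[A]

  // Swap the guard without restarting the computation.
  def updateGuard(newGuard: A => Boolean): Unit = {
    guard = newGuard
  }

  def watch(record: A): A = {
    if (guard(record)) captured += record
    record
  }
}

// Made-up (class year, age) pairs; two of them are suspicious.
val ages = List(("Sophomore", 20), ("Freshman", 18), ("Junior", 210), ("Senior", -3))
val watchpoint = new GuardedWatchpoint[(String, Int)](_._2 > 120)

val firstHalf = ages.take(2).map(watchpoint.watch)
// Midway through, widen the guard to also catch negative ages.
watchpoint.updateGuard(r => r._2 > 120 || r._2 < 0)
val secondHalf = ages.drop(2).map(watchpoint.watch)
```

Only the two suspicious records end up in `watchpoint.captured`, which keeps the amount of data shipped to the user small.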

Feature 3: Crash Culprit Identification and Remediation
[Figure: job with map and groupByKey operators across Stage 0 and Stage 1, with a simulated breakpoint and a watchpoint]
A user can use BigDebug to identify the crashing records and remediate the failure
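
One way to picture crash culprit identification is to wrap the per-record function so that a failing record is set aside together with its exception instead of killing the whole job. The sketch below uses plain Scala and `scala.util.Try`; `runWithCulprits`, the sample timestamps, and the repair policy are hypothetical, and this is not a description of how BigDebug implements the feature.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical helper (an assumption): applies `f` to every record and
// separates successful results from the records that crashed.
def runWithCulprits[A, B](records: Seq[A])(f: A => B): (Seq[B], Seq[(A, Throwable)]) = {
  val attempts = records.map(r => (r, Try(f(r))))
  val results = attempts.collect { case (_, Success(v)) => v }
  val culprits = attempts.collect { case (r, Failure(e)) => (r, e) }
  (results, culprits)
}

// Made-up input: the second timestamp contains a stray character.
val raw = Seq("1440023645", "14400x6456", "1440023768")
val (parsed, culprits) = runWithCulprits(raw)(_.toLong)

// Remediation (assumed policy): repair the culprit records and retry only those.
val (repaired, stillFailing) =
  runWithCulprits(culprits.map(_._1))(s => s.filter(_.isDigit).toLong)
```

The healthy records are processed once, and only the culprits are revisited after the user decides how to fix them.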

Feature 4: Backward Tracing
Data provenance enables users to identify crash-inducing input records
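
Backward tracing can be illustrated by carrying, with every intermediate value, the ids of the input records it was derived from. The sketch below is a toy model in plain Scala, not Titian's or BigDebug's implementation; `Traced`, `traceBack`, and the third input record are hypothetical.

```scala
// Each value carries the ids of the input records it was derived from.
final case class Traced[A](value: A, lineage: Set[Long])

val input = Vector(
  "1 Michael Sophomore 03/12/1996",
  "2 Justin Freshman 05/01/1998",
  "3 Dana Sophomore 11/30/1995" // hypothetical record added for illustration
)

// Tag every input record with its position as a provenance id.
val tagged = input.zipWithIndex.map { case (line, id) => Traced(line, Set(id.toLong)) }

// A one-to-one transformation keeps the lineage unchanged.
val pairs = tagged.map { t =>
  val fields = t.value.split(" ")
  Traced((fields(2), fields(3)), t.lineage)
}

// Grouping unions the lineage of all records that share a key.
val grouped: Map[String, Traced[Vector[String]]] =
  pairs.groupBy(_.value._1).map { case (key, ts) =>
    key -> Traced(ts.map(_.value._2), ts.flatMap(_.lineage).toSet)
  }

// Backward trace: from an output key to the input records that produced it.
def traceBack(key: String): Vector[String] =
  grouped(key).lineage.toVector.sorted.map(id => input(id.toInt))
```

Calling `traceBack("Sophomore")` returns the first and third input lines, which is the set a user would inspect if the Sophomore result looked wrong.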

Feature 5: Automated Fault Localization
Goal: Given a test function and a set of failing results
◦ Identify the minimum set of input records that can reproduce the failure
◦ We apply data provenance and delta debugging [Zeller et al.] in tandem

[Figure: word-count example on the input "After all is said and done more is said than done", run as TextFile → FlatMap → Map → ReduceByKey → Collect over Tasks 1-3 in Stage 1 and Stage 2. Map emits (After,1) (all,1) (is,1) (said,1) (and,1) (done,1) (more,1) (is,1) (said,1) (than,1) (done,1); ReduceByKey yields (After,1) (all,1) (is,2) (and,1) (more,1) (said,2) (than,1) (done,2).]

[Figure: delta debugging alternates Test and Split steps on the input: Test → Split → …]

On average, BigDebug is able to localize faults within 63% of the original job running time
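
The Test and Split loop of delta debugging can be sketched as a simplified ddmin-style search. This is a minimal illustration in plain Scala, not BigDebug's implementation; `ddmin`, `fails`, and the sample records are hypothetical. It also leaves out the provenance step; one would presumably pass in the records returned by backward tracing rather than the full input, but that is an assumption.

```scala
import scala.annotation.tailrec
import scala.util.Try

// Simplified ddmin-style search (a sketch). `fails` returns true when the
// test fails on the given subset of the input.
def ddmin[A](input: Vector[A], fails: Vector[A] => Boolean): Vector[A] = {
  require(fails(input), "the full input must reproduce the failure")

  @tailrec
  def loop(current: Vector[A], granularity: Int): Vector[A] =
    if (current.size <= 1) current
    else {
      // Split: cut the current input into roughly `granularity` chunks.
      val chunkSize = math.max(1, math.ceil(current.size.toDouble / granularity).toInt)
      val chunks = current.grouped(chunkSize).toVector
      val complements = chunks.indices.toVector.map(i => chunks.patch(i, Nil, 1).flatten)

      // Test: first try each chunk alone, then each complement.
      chunks.find(fails) match {
        case Some(chunk) => loop(chunk, 2)
        case None =>
          complements.find(c => c.nonEmpty && fails(c)) match {
            case Some(rest) => loop(rest, math.max(granularity - 1, 2))
            case None if granularity < current.size =>
              loop(current, math.min(current.size, granularity * 2))
            case None => current // cannot be reduced any further
          }
      }
    }

  loop(input, 2)
}

// Made-up input: one record cannot be parsed as an integer.
val records = Vector("10", "20", "x3", "40", "50", "60", "70", "80")
val failsToParse = (subset: Vector[String]) => subset.exists(s => Try(s.toInt).isFailure)
val minimal = ddmin(records, failsToParse)
```

Each call to `fails` corresponds to rerunning the job on a subset, so in a real setting the number of test runs dominates the cost.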

Running Example

val log = "s3n://xcr:wJY@ws/logs/enroll.log"
val text_file = sc.textFile(log)
text_file
  // Fields are space-separated: id, name, class year, date of birth
  .map { line => val fields = line.split(" "); (fields(2), fields(3)) }
  .map { t => (t._1, getYears(t._2)) }
  .groupByKey()
  .map(v => (v._1, average(v._2)))
  .collect()

Input (enroll.log):
1 Michael Sophomore 03/12/1996
2 Justin Freshman 05/01/1998
.. .. ..
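
The running example relies on `getYears` and `average`, which the slides do not define. The sketch below fills them in with hypothetical implementations and runs the same logic on plain Scala collections instead of an RDD. The `MM/dd/yyyy` date format, the fixed `referenceDate`, and the helper bodies are all assumptions made for illustration.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

// Assumption: dates of birth are written as month/day/year.
val dateFormat = DateTimeFormatter.ofPattern("MM/dd/yyyy")
// Assumption: a fixed reference date keeps the example deterministic.
val referenceDate = LocalDate.of(2017, 1, 1)

// Hypothetical helper: age in whole years at the reference date.
def getYears(dateOfBirth: String): Int =
  ChronoUnit.YEARS.between(LocalDate.parse(dateOfBirth, dateFormat), referenceDate).toInt

// Hypothetical helper: arithmetic mean, 0.0 for an empty group.
def average(values: Iterable[Int]): Double =
  if (values.isEmpty) 0.0 else values.sum.toDouble / values.size

val lines = Seq(
  "1 Michael Sophomore 03/12/1996",
  "2 Justin Freshman 05/01/1998"
)

// Same steps as the Spark job, on local collections: parse, group by class year, average the ages.
val averageAgeByClass: Map[String, Double] =
  lines
    .map { line =>
      val fields = line.split(" ")
      (fields(2), getYears(fields(3)))
    }
    .groupBy(_._1)
    .map { case (level, pairs) => (level, average(pairs.map(_._2))) }
```

A malformed date in the log would make `LocalDate.parse` throw, which is the kind of record-level crash the debugger features above are meant to surface.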
