Debugging Big Data Analytics in Apache Spark with BigDebug
Matteo Interlandi - Muhammad Ali Gulzar
Debugging Cloud Computing Programs
Open Source Data-Intensive Scalable Computing (DISC) Platforms: Hadoop MapReduce and Spark
◦ Severe lack of debugging support in these systems
◦ Programs (i.e., queries, jobs) are batch executed / black boxes
So what to do?
◦ Trial and error debugging on subsample
◦ Post-mortem analysis of error logs
◦ Analyze physical view of the execution (a job id, failed node, etc.)
BigDebug Project Overview
◦ Titian: Data Provenance for Fine-Grained Tracing [PVLDB 2016]
◦ Vega: Incremental Computation for Interactive Debugging [SoCC 2016]
◦ BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark [ICSE 2016]
◦ Automated Debugging in Data Intensive Scalable Computing Systems [Under Submission]
Collaboration with Tyson Condie, Miryung Kim, and Todd Millstein
BigDebug: Interactive Debugger Features
1. Simulated Breakpoint
2. Guarded Watchpoint
3. Crash Culprit Identification
4. Backward Tracing
5. Automated Fault Localization
Example output:
$> Crash inducing input records:
9K23 Cruz TX 1440023645
2FSD Cruz KS 1440026456
9909 Cruz KS 1440023768
[Figure: fault localization tree with steps labeled Test Fails, Split, Test Passes]
Feature 1: Simulated Breakpoint
[Figure: two-stage dataflow (Stage 0, Stage 1) of map and groupByKey operators with a simulated breakpoint]
A simulated breakpoint enables the user to inspect intermediate program state without pausing the computation.
Feature 2: On-Demand Guarded Watchpoint
[Figure: two-stage dataflow (Stage 0, Stage 1) of map and groupByKey operators with a simulated breakpoint and a watchpoint]
A user can inspect intermediate data using a guard, and can also update the guard on the fly.
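The idea of a guarded watchpoint can be sketched on plain Scala collections. This is only an illustration, not BigDebug's API: the `Watchpoint` class, its method names, and the sample records are all hypothetical. Every record passes through unchanged, and only records that satisfy the current guard are captured for inspection.

```scala
object WatchpointSketch {
  // Hypothetical helper (not part of BigDebug or Spark): forwards every
  // record unchanged and keeps a copy of those matching the guard.
  final class Watchpoint[A](initialGuard: A => Boolean) {
    @volatile private var guard: A => Boolean = initialGuard
    private val buffer = scala.collection.mutable.ArrayBuffer.empty[A]

    // Replace the guard while the pipeline keeps running.
    def updateGuard(g: A => Boolean): Unit = { guard = g }

    // Called once per record from inside a map step.
    def observe(record: A): A = {
      if (guard(record)) buffer.synchronized { buffer += record; () }
      record
    }

    def captured: List[A] = buffer.synchronized { buffer.toList }
  }

  // Made-up intermediate records: (class standing, age in years).
  val ages: List[(String, Int)] =
    List(("Sophomore", 20), ("Freshman", 18), ("Junior", -3))

  // Watch only suspicious records, here a negative age.
  val wp = new Watchpoint[(String, Int)](_._2 < 0)
  val passedThrough: List[(String, Int)] = ages.map(wp.observe)
}
```

Keeping the guard in a mutable field is what lets it be swapped without restarting the computation; in a distributed setting the new guard would additionally have to be shipped to the workers, which this sketch does not attempt.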
Feature 3: Crash Culprit Identification and Remediation
[Figure: two-stage dataflow (Stage 0, Stage 1) of map and groupByKey operators with a simulated breakpoint and a watchpoint]
A user can use BigDebug to identify the records that cause a crash and to recover from the failure.
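A minimal sketch of crash culprit identification, assuming the user-defined function is wrapped so that an exception is recorded together with the offending record instead of aborting the whole job. The helper `mapCapturing` and the sample input are hypothetical and use plain Scala collections rather than Spark.

```scala
import scala.util.{Failure, Success, Try}

object CrashCulpritSketch {
  // Hypothetical helper: applies `f` to every record and returns the
  // successful results plus each failing record with its exception.
  def mapCapturing[A, B](records: Seq[A])(f: A => B): (Seq[B], Seq[(A, Throwable)]) = {
    val attempts = records.map(r => (r, Try(f(r))))
    val succeeded = attempts.collect { case (_, Success(v)) => v }
    val failed = attempts.collect { case (r, Failure(e)) => (r, e) }
    (succeeded, failed)
  }

  // Made-up input: the last record cannot be parsed as a number.
  val raw: Seq[String] = Seq("1996", "1998", "n/a")
  val (years, culprits) = mapCapturing(raw)(_.toInt)
}
```

Once the culprits are known, the user could skip them, correct them and resubmit them, or fix the function, while the results for the healthy records are kept.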
Feature 4: Backward Tracing
Data provenance enables users to identify crash-inducing input records.
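Backward tracing can be illustrated with a word count that carries lineage along with the data. The sketch below is an assumption about the general technique, not Titian's implementation: every intermediate pair is tagged with the index of the input line it came from, so any output can be mapped back to its inputs. The two input lines are an illustrative split of the sentence used in the word count example.

```scala
object BackwardTraceSketch {
  // Illustrative input lines; their index serves as the record id.
  val lines: Vector[String] =
    Vector("After all is said and done", "more is said than done")

  // flatMap + map: emit (word, 1, id of the originating line).
  val tagged: Vector[(String, Int, Int)] =
    lines.zipWithIndex.flatMap { case (line, id) =>
      line.split(" ").toVector.map(w => (w, 1, id))
    }

  // reduceByKey analogue: sum the counts and union the line ids per word.
  val counts: Map[String, (Int, Set[Int])] =
    tagged.groupBy(_._1).map { case (word, rows) =>
      (word, (rows.map(_._2).sum, rows.map(_._3).toSet))
    }

  // Backward trace: from an output word to the input lines that produced it.
  def traceBack(word: String): Vector[String] =
    counts.get(word) match {
      case Some((_, ids)) => ids.toVector.sorted.map(i => lines(i))
      case None           => Vector.empty
    }
}
```

For example, `BackwardTraceSketch.traceBack("more")` returns only the second line, while `traceBack("said")` returns both lines.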
Feature 5: Automated Fault Localization
Goal: Given a test function and a set of failing results
◦ Identify the minimum set of input records that can reproduce the failure
◦ We apply data provenance and delta debugging [Zeller et al.] in tandem
[Figure: word count example on the input "After all is said and done more is said than done", shown as a pipeline of TextFile, FlatMap, Map, ReduceByKey and Collect over Stage 1 and Stage 2, with Tasks 1 to 3 in each stage]
[Figure: repeated Test and Split steps over the input]
On average, BigDebug is able to localize faults within 63% of the original job running time.
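The delta debugging half of this approach can be sketched as follows. This is a simplified version of the ddmin idea on an in-memory vector, not BigDebug's implementation, and it leaves out the data provenance step. The name `ddmin` and the test functions are assumptions made for illustration; `fails` returns true when the test fails on the given subset of records.

```scala
object DeltaDebugSketch {
  // Simplified ddmin: repeatedly split the input, keep a failing part,
  // and refine the split when no single part or complement fails.
  def ddmin[A](input: Vector[A])(fails: Vector[A] => Boolean): Vector[A] = {
    def loop(current: Vector[A], n: Int): Vector[A] =
      if (current.size < 2) current
      else {
        val chunkSize = math.ceil(current.size.toDouble / n).toInt
        val chunks = current.grouped(chunkSize).toVector
        chunks.find(fails) match {
          // A single chunk reproduces the failure: continue inside it.
          case Some(chunk) => loop(chunk, 2)
          case None =>
            // Otherwise try removing one chunk at a time.
            val complements =
              chunks.indices.toVector.map(i => chunks.patch(i, Nil, 1).flatten)
            complements.find(fails) match {
              case Some(rest) => loop(rest, math.max(n - 1, 2))
              // Nothing smaller fails yet: split more finely if possible.
              case None if n < current.size =>
                loop(current, math.min(current.size, 2 * n))
              case None => current
            }
        }
      }
    if (fails(input)) loop(input, 2) else Vector.empty
  }
}
```

As a usage example, `DeltaDebugSketch.ddmin(Vector(1, 2, 3, 4, 5, 6, 7, 8))(s => s.contains(3) && s.contains(6))` narrows the input down to `Vector(3, 6)`. Each call to `fails` stands for re-running the job on a subset, which is why shrinking the candidate input first is attractive.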
Demo
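Running Example
Input records (enroll.log):
1 Michael Sophomore 03/12/1996
2 Justin Freshman 05/01/1998
.. .. ..
Program:
val log = "s3n://xcr:wJY@ws/logs/enroll.log"
val text_file = sc.textFile(log)
// getYears and average are user-defined helpers not shown on the slide
text_file
  // keep field 2 (class standing) and field 3 (date) of each record
  .map { line => val f = line.split(" "); (f(2), f(3)) }
  .map { t => (t._1, getYears(t._2)) }
  .groupByKey()
  .map(v => (v._1, average(v._2)))
  .collect()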
