Scaling stream data pipelines with Pravega and Apache Flink
Flavio Junqueira, Pravega - Dell EMC
Till Rohrmann, Data Artisans
Motivation
Streams ahoy!
• Social networks, online shopping: stream of user events
  - Status updates
  - Online transactions
• Server monitoring: stream of server events
  - CPU, memory, disk utilization
• Sensors (IoT): stream of sensor events
  - Temperature samples
  - Samples from radar and image sensors in cars
Workload cycles and seasonal spikes
• Daily cycles: NYC Yellow Taxi Trip Records, March 2015
  http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
• Seasonal spikes:
  https://www.slideshare.net/iwmw/building-highly-scalable-web-applications/7-Seasonal_Spikes
Workload cycles and spikes
[Charts: seasonal spikes (month by month), daily cycles (hour by hour), weekly cycles, and unplanned spikes.]
Overprovisioning… what if we don’t want to overprovision?
Event processing
(Events are appended to an append-only log; colors represent event keys.)
• The source emits 2 events/second; Processor 1 processes 3 events/second and keeps up.
• The source rate increases to 4 events/second; Processor 1 still processes 3 events/second and can’t keep up with the source rate.
• Add a second processor: each processor processes 3 events/second, so together they can keep up. Problem: per-key order.
• Split the input and add processors: the processors can keep up. Problem: per-key order.
• Fix: Processor 2 only starts once earlier events have been processed.
What about the order of events? What happens if the rate increases again? What if it drops?
Scaling in Pravega
Pravega
• Storing data streams
• Young project, under active development
• Open source: http://pravega.io, http://github.com/pravega/pravega
Anatomy of a stream
• A stream spans time: from the distant past through the recent past to the present.
• Different parts of this range are traditionally served by different systems: messaging, pub-sub, bulk store.
• Pravega covers the whole range.
• Unbounded amount of data; the ingestion rate might vary.
Pravega aims to be a stream store able to:
• Store stream data permanently
• Preserve order
• Accommodate unbounded streams
• Adapt to varying workloads automatically
• Deliver low latency from append to read
Pravega and streams
• Event writers append to a Pravega stream (ingest stream data); event readers read from it (process stream data).
• Readers are organized into reader groups, which load balance across readers and can grow and shrink.
Segments in Pravega
• A stream is a composition of segments.
• Segment: the unit of a stream; append-only; a sequence of bytes.
Parallelism
Segments in Pravega
• Segments are sequences of bytes.
• Writers attach a routing key to each event; the routing key determines which segment an event is appended to (see the writer sketch below).
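A minimal sketch of appending events with a routing key using the Pravega Java client; the scope, stream name, controller URI, and event contents are placeholders, and the exact client API may differ across versions:

import io.pravega.client.ClientFactory;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.impl.JavaSerializer;
import java.net.URI;

public class WriterSketch {
    public static void main(String[] args) {
        // Placeholder scope, stream, and controller endpoint.
        ClientFactory clientFactory =
                ClientFactory.withScope("examples", URI.create("tcp://localhost:9090"));

        EventStreamWriter<String> writer = clientFactory.createEventWriter(
                "sensor-readings",
                new JavaSerializer<>(),
                EventWriterConfig.builder().build());

        // The routing key ("sensor-42") determines the segment the event lands in,
        // so all events with the same key are kept in order.
        writer.writeEvent("sensor-42", "temperature=21.5");
        writer.flush();

        writer.close();
        clientFactory.close();
    }
}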
Segments can be sealed
Segments in Pravega
• Once sealed, a segment can’t be appended to any longer (e.g., ad clicks).
How is sealing segments useful?
Segments in Pravega
• Segments compose to form a stream.
• Each segment can live on a different server.
• A stream is therefore not limited to the capacity of a single server, which enables unbounded streams.
Stream scaling
Scaling a stream
• Say the input load has increased and we need more parallelism (auto or manual scaling).
1. The stream has one segment.
2. Seal the current segment and create new ones.
Routing key space
[Figure: the routing key space [0.0, 1.0] mapped to segments over time (t0, t1, t2), with splits at keys 0.5 and 0.75 and a later merge, producing Segments 1 through 6.]
• Key ranges are not statically assigned to segments.
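To make the key-range mapping concrete, here is a small illustrative sketch (not Pravega code; the hash function and the key ranges shown are hypothetical) of resolving a routing key to the segment that owns its key range:

import java.util.NavigableMap;
import java.util.TreeMap;

public class KeyRangeSketch {
    // Map from the lower bound of a key range to a segment id.
    // Hypothetical layout after splits at 0.5 and 0.75:
    //   [0.0, 0.5) -> segment 2, [0.5, 0.75) -> segment 3, [0.75, 1.0] -> segment 4
    static final NavigableMap<Double, Integer> RANGES = new TreeMap<>();
    static {
        RANGES.put(0.0, 2);
        RANGES.put(0.5, 3);
        RANGES.put(0.75, 4);
    }

    // Hash a routing key into [0.0, 1.0].
    static double hashToUnitInterval(String routingKey) {
        return (routingKey.hashCode() & 0x7fffffff) / (double) Integer.MAX_VALUE;
    }

    // The segment owning the key range that contains the hashed key.
    static int segmentFor(String routingKey) {
        return RANGES.floorEntry(hashToUnitInterval(routingKey)).getValue();
    }

    public static void main(String[] args) {
        System.out.println("sensor-42 -> segment " + segmentFor("sensor-42"));
    }
}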
Daily cycles
• The peak rate is 10x higher than the lowest rate (chart annotated at 4:00 AM and 9:00 AM).
• Data: NYC Yellow Taxi Trip Records, March 2015, http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
Pravega auto scaling
[Figure: segments split and merge over time.]
[Figures: auto-scaling in action, with scale-up and scale-down events marked. Source: Virtual cluster - Nautilus Platform]
How do I control scaling?
Scaling policies
• Configured on a per-stream basis; each stream specifies one policy (a configuration sketch follows below).
• Policies:
  - Fixed: the set of segments is fixed.
  - Bytes per second: scales up and down according to the volume of data, toward a target data rate.
  - Events per second: scales up and down according to the volume of events, toward a target event rate.
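A minimal sketch of creating streams with each kind of policy using the Pravega Java client; the scope, stream names, controller URI, and numeric targets are placeholders, and method names may vary slightly by client version:

import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.ScalingPolicy;
import io.pravega.client.stream.StreamConfiguration;
import java.net.URI;

public class ScalingPolicySketch {
    public static void main(String[] args) {
        StreamManager streamManager = StreamManager.create(URI.create("tcp://localhost:9090"));
        streamManager.createScope("examples");

        // Fixed policy: the stream always has 4 segments.
        streamManager.createStream("examples", "fixed-stream",
                StreamConfiguration.builder()
                        .scalingPolicy(ScalingPolicy.fixed(4))
                        .build());

        // Event-rate policy: target 100 events/s per segment, scale factor 2, at least 2 segments.
        streamManager.createStream("examples", "event-rate-stream",
                StreamConfiguration.builder()
                        .scalingPolicy(ScalingPolicy.byEventRate(100, 2, 2))
                        .build());

        // Data-rate policy: target data rate per segment (units per the client docs), scale factor 2, at least 2 segments.
        streamManager.createStream("examples", "data-rate-stream",
                StreamConfiguration.builder()
                        .scalingPolicy(ScalingPolicy.byDataRate(1024, 2, 2))
                        .build());

        streamManager.close();
    }
}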
Auto-scaling: triggering a scaling event
• Driven by byte and event rates, with a target rate T per segment.
• Rates are reported every 2 minutes: 2-minute rate (2M), 5-minute rate (5M), 10-minute rate (10M), 20-minute rate (20M).
• Scale up when: 2M > 5 x T, or 5M > 2 x T, or 10M > T.
  Example (T = 50): 2M = 60, 5M rises to 60, and 10M climbs 46, 48, 50, 52 over successive reports; once 10M > T, a scale-up is triggered.
• Scale down when: 2M, 5M, and 10M < T, and 20M < T / 2.
  Example (T = 50): 2M = 5M = 10M = 20, and 20M drops 27, 26, 25, 24 over successive reports; once 20M < T / 2, a scale-down is triggered.
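A small illustrative sketch of the trigger rules above (not Pravega's actual implementation; the class and field names are made up for the example):

public class ScaleTriggerSketch {
    // Rolling rates reported every 2 minutes.
    static class SegmentRates {
        double twoMin, fiveMin, tenMin, twentyMin;
        SegmentRates(double twoMin, double fiveMin, double tenMin, double twentyMin) {
            this.twoMin = twoMin; this.fiveMin = fiveMin;
            this.tenMin = tenMin; this.twentyMin = twentyMin;
        }
    }

    // Scale up if any rolling rate exceeds its threshold relative to the target T.
    static boolean shouldScaleUp(SegmentRates r, double target) {
        return r.twoMin > 5 * target || r.fiveMin > 2 * target || r.tenMin > target;
    }

    // Scale down only if all rolling rates are comfortably below the target.
    static boolean shouldScaleDown(SegmentRates r, double target) {
        return r.twoMin < target && r.fiveMin < target && r.tenMin < target
                && r.twentyMin < target / 2;
    }

    public static void main(String[] args) {
        double target = 50;
        System.out.println(shouldScaleUp(new SegmentRates(60, 60, 52, 55), target));   // true: 10-min rate > T
        System.out.println(shouldScaleDown(new SegmentRates(20, 20, 20, 24), target)); // true: 20-min rate < T/2
    }
}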
Read order
Reader groups + scaling
• A reader group with two readers reads a stream with Segments 1 and 2.
• The stream scales up: new segments (Segments 3 and 4) are created.
• When a reader hits the end of a sealed segment, it gets the segment’s successors and updates the shared reader group state.
• The successor segments are redistributed across the group, e.g., one reader ends up with Segment 3 and the other with Segments 2 and 4 (a reader sketch follows below).
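A minimal sketch of a reader in a reader group consuming events; the scope, stream, group, and reader id are placeholders, and the API shown follows the Pravega Java client of this era and may differ in later versions. Segment reassignment after scaling happens inside the client, so the read loop itself does not change:

import io.pravega.client.ClientFactory;
import io.pravega.client.admin.ReaderGroupManager;
import io.pravega.client.stream.EventRead;
import io.pravega.client.stream.EventStreamReader;
import io.pravega.client.stream.ReaderConfig;
import io.pravega.client.stream.ReaderGroupConfig;
import io.pravega.client.stream.impl.JavaSerializer;
import java.net.URI;
import java.util.Collections;

public class ReaderSketch {
    public static void main(String[] args) throws Exception {
        URI controller = URI.create("tcp://localhost:9090");

        // Create (or join) a reader group over the stream.
        ReaderGroupManager groupManager = ReaderGroupManager.withScope("examples", controller);
        groupManager.createReaderGroup("example-group",
                ReaderGroupConfig.builder().build(),
                Collections.singleton("sensor-readings"));

        ClientFactory clientFactory = ClientFactory.withScope("examples", controller);
        EventStreamReader<String> reader = clientFactory.createReader(
                "reader-1", "example-group", new JavaSerializer<>(), ReaderConfig.builder().build());

        // Read until no event arrives within the timeout.
        EventRead<String> event;
        while ((event = reader.readNextEvent(2000)) != null && event.getEvent() != null) {
            System.out.println("read: " + event.getEvent());
        }

        reader.close();
        clientFactory.close();
        groupManager.close();
    }
}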
Building pipelines – Scaling downstream
Scaling pipelines
• Source → Stage 1 → Stage 2: initially, all stages can handle the load induced by the source.
• The load coming from the source increases; Stage 1 scales and adapts to the load change, but Stage 2 can’t cope with it.
Scaling signals
• Pravega won’t scale the application downstream…
• … but it can signal the application, e.g.:
  - more segments
  - the number of unread bytes is growing
Reader group notifier
• Listener API: register a listener to react to changes, e.g., changes to the number of segments.

ReaderGroupManager groupManager =
    new ReaderGroupManagerImpl(SCOPE, controller, clientFactory, connectionFactory);

ReaderGroup readerGroup = groupManager.createReaderGroup(GROUP_NAME,
    ReaderGroupConfig.builder().build(),
    Collections.singleton(STREAM));

readerGroup.getSegmentNotifier(executor).registerListener(segmentNotification -> {
    int numOfReaders = segmentNotification.getNumOfReaders();
    int segments = segmentNotification.getNumOfSegments();
    if (numOfReaders < segments) {
        // Scale up the number of readers based on application capacity
    } else {
        // More readers available; time to shut down some
    }
});
Reader group: listener and metrics
• Listener API: register a listener to react to changes (e.g., changes to the number of segments).
• Metrics: report specific values of interest (e.g., the number of unread bytes in a stream); a short sketch follows below.
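A short, hedged sketch of reading that metric, reusing the readerGroup from the previous slide's snippet; the getMetrics()/unreadBytes() calls and the threshold are assumptions about the client API and should be checked against the version in use:

// Assumption: the reader group exposes metrics including an unreadBytes() gauge.
long unreadBytes = readerGroup.getMetrics().unreadBytes();
if (unreadBytes > 100 * 1024 * 1024) {
    // A growing backlog of unread bytes is a signal to scale the application downstream.
}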
Consuming Pravega streams with Apache Flink
How to read Pravega streams with Flink?
• FlinkPravegaReader: each parallel reader instance runs in a Task Manager and joins a Pravega ReaderGroup.
• The reader group handles the assignment of segments to readers and rebalances them.
• This is the key to automatic scaling.
• Connector: https://github.com/pravega/flink-connectors (a wiring sketch follows below).
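A minimal sketch of wiring the connector into a Flink job; the scope, stream name, and controller URI are placeholders, and the builder-style methods shown follow later versions of the pravega/flink-connectors project, so treat them as an assumption rather than the API of the version current at the time of the talk:

import io.pravega.client.stream.impl.JavaSerializer;
import io.pravega.connectors.flink.FlinkPravegaReader;
import io.pravega.connectors.flink.PravegaConfig;
import io.pravega.connectors.flink.serialization.PravegaDeserializationSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.net.URI;

public class FlinkReaderSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Connection settings for the Pravega controller and scope (placeholders).
        PravegaConfig pravegaConfig = PravegaConfig.fromDefaults()
                .withControllerURI(URI.create("tcp://localhost:9090"))
                .withDefaultScope("examples");

        // The reader joins a Pravega reader group; segment assignment and
        // rebalancing across parallel subtasks happen through that group.
        FlinkPravegaReader<String> source = FlinkPravegaReader.<String>builder()
                .withPravegaConfig(pravegaConfig)
                .forStream("sensor-readings")
                .withDeserializationSchema(
                        new PravegaDeserializationSchema<>(String.class, new JavaSerializer<String>()))
                .build();

        DataStream<String> events = env.addSource(source).name("pravega-source");
        events.print();

        env.execute("Read from Pravega");
    }
}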
How to react to segment changes?
• A reader rescaling policy runs in the Job Manager:
1. Register a segment listener with the Pravega stream.
2. Receive a segment-change notification.
3. Rescale the job.
4. Take a savepoint.
5. Redeploy and resume the tasks on the Task Managers.
Scaling signals
• Latency
• Throughput
• Resource utilization
• Connector signals
Rescaling Flink applications
Scaling stateless jobs
(Example pipeline: Source → Mapper → Sink)
• Scale up: deploy new tasks.
• Scale down: cancel running tasks.
Scaling stateful jobs
• Problem: which state to assign to a new task?
Different state types in Flink
Keyed vs. operator state
• Keyed state: state bound to a key (e.g., keyed UDF and window state).
• Operator state: state bound to a subtask (e.g., source state).
Repartitioning keyed state
• Similar to consistent hashing.
• Split the key space into key groups (figure: key groups #1 through #4).
• Assign key groups to tasks.
Repartitioning keyed state (contd.)
• Rescaling changes the key group assignment (see the sketch below).
• The maximum parallelism is defined by the number of key groups.
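A small sketch of how keys map to key groups and key groups map to operator subtasks, using Flink's internal KeyGroupRangeAssignment utility (shown only to illustrate the mapping; the key and parallelism values are arbitrary examples):

import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeyGroupSketch {
    public static void main(String[] args) {
        int maxParallelism = 128; // number of key groups, i.e., the maximum parallelism
        String key = "sensor-42";

        // A key always maps to the same key group, regardless of the current parallelism.
        int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(key, maxParallelism);

        // The same key group maps to different subtasks as the job is rescaled.
        int subtaskAt2 = KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(maxParallelism, 2, keyGroup);
        int subtaskAt4 = KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(maxParallelism, 4, keyGroup);

        System.out.println("key group: " + keyGroup);
        System.out.println("subtask at parallelism 2: " + subtaskAt2);
        System.out.println("subtask at parallelism 4: " + subtaskAt4);
    }
}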
Repartitioning operator state
• Break operator state up into a finer granularity: the state has to contain multiple entries.
• Flink automatically repartitions the entries at that granularity (see the sketch below).
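A minimal sketch of a source whose operator state is a list of entries, so Flink can redistribute the entries across subtasks on rescaling; the "offset per partition" state and the class name are made up for the example:

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
import java.util.ArrayList;
import java.util.List;

// Hypothetical source that tracks one offset per assigned partition.
public class OffsetTrackingSource extends RichParallelSourceFunction<String>
        implements CheckpointedFunction {

    private transient ListState<Long> checkpointedOffsets;
    private final List<Long> localOffsets = new ArrayList<>();
    private volatile boolean running = true;

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Store each offset as a separate entry so it can be redistributed on rescaling.
        checkpointedOffsets.clear();
        for (Long offset : localOffsets) {
            checkpointedOffsets.add(offset);
        }
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        checkpointedOffsets = context.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("offsets", Long.class));
        // After a rescale, this subtask receives a (possibly different) subset of the entries.
        for (Long offset : checkpointedOffsets.get()) {
            localOffsets.add(offset);
        }
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            Thread.sleep(1000); // this sketch emits nothing
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}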
Acquiring New Resources – Resource Elasticity
Flink’s Revamped Distributed Architecture
• Motivation:
  - Resource elasticity
  - Support for different deployments
  - REST interface for client-cluster communication
• Approach: introduce generic building blocks and compose them for different scenarios.
The Building Blocks
• ResourceManager
  - ClusterManager-specific; may live across jobs
  - Manages available Containers/TaskManagers
  - Used to acquire / release resources
• TaskManager
  - Registers at the ResourceManager
  - Gets tasks from one or more JobManagers
• JobManager
  - Single job only, started per job
  - Thinks in terms of "task slots"
  - Deploys and monitors job/task execution
• Dispatcher
  - Lives across jobs
  - Touch-point for job submissions
  - Spawns JobManagers
The Building Blocks: job submission flow
1. The client submits a job to the Dispatcher.
2. The Dispatcher starts a JobManager.
3. The JobManager requests slots from the ResourceManager.
4. The ResourceManager starts TaskManagers.
5. The TaskManagers register at the ResourceManager.
6. The TaskManagers offer slots to the JobManager.
7. The JobManager deploys tasks.
Building Flink-on-YARN
1. The client submits a YARN application (JobGraph / JARs) to the YARN ResourceManager.
2. YARN spawns the Application Master, which hosts the Flink-YARN ResourceManager and the JobManager.
3. The JobManager requests slots from the Flink-YARN ResourceManager.
4. The Flink-YARN ResourceManager starts TaskManagers.
5. The TaskManagers register.
6. The JobManager deploys tasks.
Does It Actually Work?
Demo Topology
• Pipeline: Pravega → Source → Sink → FILE.out
• Executed on YARN to support dynamic resource allocation.
• [Chart: event rate over time.]
Wrap Up
Wrap up
• Pravega
  - Stream store
  - Scalable ingestion of continuously generated data
  - Stream scaling
• Apache Flink
  - Stateful job scaling
  - Full resource elasticity
  - Operator rescaling policies: work in progress
• Pravega + Apache Flink
  - End-to-end scalable data pipelines
Questions?

http://pravega.io
http://github.com/pravega/pravega
http://flink.apache.org
http://github.com/pravega/flink-connectors
https://github.com/tillrohrmann/flink/tree/rescalingPolicy

E-mail: fpj@apache.org, trohrmann@apache.org
Twitter: @fpjunqueira, @stsffap
