Stateful distributed stream processing Gyula Fóra gyfora@apache.org @GyulaFora
This talk § Stateful processing by example § Definition and challenges § State in current open-source systems § State in Apache Flink § Closing 2Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
Stateful processing by example § Window aggregations • Total number of customers in the last 10 minutes • State: Current aggregate § Machine learning • Fitting trends to the evolving stream • State: Model 3Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
Stateful processing by example § Pattern recognition • Detect suspicious financial activity • State: Matched prefix § Stream-stream joins • Match ad views and impressions • State: Elements in the window 4Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
Stateful operators § All these examples use a common processing pattern § Stateful operator (in essence): 𝒇:   𝒊𝒏, 𝒔𝒕𝒂𝒕𝒆 ⟶ 𝒐𝒖𝒕, 𝒔𝒕𝒂𝒕𝒆. § State hangs around and can be read and modified as the stream evolves § Goal: Get as close as possible while maintaining scalability and fault-tolerance 5Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
State-of-the-art systems § Most systems allow developers to implement stateful programs § Trick is to limit the scope of 𝒇 (state access) while maintaining expressivity § Issues to tackle: • Expressivity • Exactly-once semantics • Scalability to large inputs • Scalability to large states 6Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
§ States available only in Trident API § Dedicated operators for state updates and queries § State access methods • stateQuery(…) • partitionPersist(…) • persistentAggregate(…) § It’s very difficult to implement transactional states Exactly-­‐‑once  guarantee 7Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
Storm Word Count 8Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
§ Stateless runtime by design • No continuous operators • UDFs are assumed to be stateless § State can be generated as a stream of RDDs: updateStateByKey(…) 𝒇:   𝑺𝒆𝒒[𝒊𝒏 𝒌], 𝒔𝒕𝒂𝒕𝒆 𝒌 ⟶ 𝒔𝒕𝒂𝒕𝒆. 𝒌 § 𝒇 is scoped to a specific key § Exactly-once semantics 9Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
val stateDstream = wordDstream.updateStateByKey[Int]( newUpdateFunc, new HashPartitioner(ssc.sparkContext.defaultParallelism), true, initialRDD) val updateFunc = (values: Seq[Int], state: Option[Int]) => { val currentCount = values.sum val previousCount = state.getOrElse(0) Some(currentCount + previousCount) } Spark Streaming Word Count 10Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
§ Stateful dataflow operators (Any task can hold state) § State changes are stored as a log by Kafka § Custom storage engines can be plugged in to the log § 𝒇 is scoped to a specific task § At-least-once processing semantics 11Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
Samza Word Count public class WordCounter implements StreamTask, InitableTask { //Some omitted details… private KeyValueStore<String, Integer> store; public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) { //Get the current count String word = (String) envelope.getKey(); Integer count = store.get(word); if (count == null) count = 0; //Increment, store and send count += 1; store.put(word, count); collector.send( new OutgoingMessageEnvelope(OUTPUT_STREAM, word ,count)); } } 12Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
What can we say so far? § Trident + Consistent state accessible from outside – Only works well with idempotent states – States are not part of the operators § Spark + Integrates well with the system guarantees – Limited expressivity – Immutability increases update complexity § Samza + Efficient log based state updates + States are well integrated with the operators – Lack of exactly-once semantics – State access is not fully transparent 13Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
§ Take what’s good, make it work + add some more § Clean and powerful abstractions • Local (Task) state • Partitioned (Key) state § Proper API integration • Java: OperatorState interface • Scala: mapWithState, flatMapWithState… § Exactly-once semantics by checkpointing 14Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
Flink Word Count words.keyBy(x => x).mapWithState { (word, count: Option[Int]) => { val newCount = count.getOrElse(0) + 1 val output = (word, newCount) (output, Some(newCount)) } } 15Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
Local State § Task scoped state access § Can be used to implement custom access patterns § Typical usage: • Source operators (offset) • Machine learning models • Use cyclic flows to simulate global state access 16Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
Local State Example (Java) public class MySource extends RichParallelSourceFunction { // Omitted details private OperatorState<Long> offset; @Override public void run(SourceContext ctx) { Object checkpointLock = ctx.getCheckpointLock(); isRunning = true; while (isRunning) { synchronized (checkpointLock) { offset.update(offset.value() + 1); // ctx.collect(next); } } } } 17Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
Partitioned State § Key scoped state access § Highly scalable § Allows for incremental backup/restore § Typical usage: • Any per-key operation • Grouped aggregations • Window buffers 18Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
Partitioned State Example (Scala) // Compute the current average of each city's temperature temps.keyBy("city").mapWithState { (in: Temp, state: Option[(Double, Long)]) => { val current = state.getOrElse((0.0, 0L)) val updated = (current._1 + in.temp, current._2 + 1) val avg = Temp(in.city, updated._1 / updated._2) (avg, Some(updated)) } } case class Temp(city: String, temp: Double) 19Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
Exactly-once semantics § Based on consistent global snapshots § Algorithm designed for stateful dataflows 20Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27 Detailed  mechanism
Exactly-once semantics § Low runtime overhead § Checkpointing logic is separated from application logic 21Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27 Blogpost  on  streaming  fault-­‐‑tolerance
Summary § State is essential to many applications § Fault-tolerant streaming state is a hard problem § There is a trade-off between expressivity vs scalability/fault-tolerance § Flink tries to hit the sweet spot with… • Providing very flexible abstractions • Keeping good scalability and exactly-once semantics 22Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
Thank you!

Stateful Distributed Stream Processing

  • 1.
    Stateful distributed stream processing GyulaFóra gyfora@apache.org @GyulaFora
  • 2.
    This talk § Statefulprocessing by example § Definition and challenges § State in current open-source systems § State in Apache Flink § Closing 2Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 3.
    Stateful processing byexample § Window aggregations • Total number of customers in the last 10 minutes • State: Current aggregate § Machine learning • Fitting trends to the evolving stream • State: Model 3Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 4.
    Stateful processing byexample § Pattern recognition • Detect suspicious financial activity • State: Matched prefix § Stream-stream joins • Match ad views and impressions • State: Elements in the window 4Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 5.
    Stateful operators § Allthese examples use a common processing pattern § Stateful operator (in essence): 𝒇:   𝒊𝒏, 𝒔𝒕𝒂𝒕𝒆 ⟶ 𝒐𝒖𝒕, 𝒔𝒕𝒂𝒕𝒆. § State hangs around and can be read and modified as the stream evolves § Goal: Get as close as possible while maintaining scalability and fault-tolerance 5Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 6.
    State-of-the-art systems § Mostsystems allow developers to implement stateful programs § Trick is to limit the scope of 𝒇 (state access) while maintaining expressivity § Issues to tackle: • Expressivity • Exactly-once semantics • Scalability to large inputs • Scalability to large states 6Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 7.
    § States availableonly in Trident API § Dedicated operators for state updates and queries § State access methods • stateQuery(…) • partitionPersist(…) • persistentAggregate(…) § It’s very difficult to implement transactional states Exactly-­‐‑once  guarantee 7Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 8.
    Storm Word Count 8Apache Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 9.
    § Stateless runtimeby design • No continuous operators • UDFs are assumed to be stateless § State can be generated as a stream of RDDs: updateStateByKey(…) 𝒇:   𝑺𝒆𝒒[𝒊𝒏 𝒌], 𝒔𝒕𝒂𝒕𝒆 𝒌 ⟶ 𝒔𝒕𝒂𝒕𝒆. 𝒌 § 𝒇 is scoped to a specific key § Exactly-once semantics 9Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 10.
    val stateDstream =wordDstream.updateStateByKey[Int]( newUpdateFunc, new HashPartitioner(ssc.sparkContext.defaultParallelism), true, initialRDD) val updateFunc = (values: Seq[Int], state: Option[Int]) => { val currentCount = values.sum val previousCount = state.getOrElse(0) Some(currentCount + previousCount) } Spark Streaming Word Count 10Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 11.
    § Stateful dataflowoperators (Any task can hold state) § State changes are stored as a log by Kafka § Custom storage engines can be plugged in to the log § 𝒇 is scoped to a specific task § At-least-once processing semantics 11Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 12.
    Samza Word Count publicclass WordCounter implements StreamTask, InitableTask { //Some omitted details… private KeyValueStore<String, Integer> store; public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) { //Get the current count String word = (String) envelope.getKey(); Integer count = store.get(word); if (count == null) count = 0; //Increment, store and send count += 1; store.put(word, count); collector.send( new OutgoingMessageEnvelope(OUTPUT_STREAM, word ,count)); } } 12Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 13.
    What can wesay so far? § Trident + Consistent state accessible from outside – Only works well with idempotent states – States are not part of the operators § Spark + Integrates well with the system guarantees – Limited expressivity – Immutability increases update complexity § Samza + Efficient log based state updates + States are well integrated with the operators – Lack of exactly-once semantics – State access is not fully transparent 13Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 14.
    § Take what’sgood, make it work + add some more § Clean and powerful abstractions • Local (Task) state • Partitioned (Key) state § Proper API integration • Java: OperatorState interface • Scala: mapWithState, flatMapWithState… § Exactly-once semantics by checkpointing 14Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 15.
    Flink Word Count words.keyBy(x=> x).mapWithState { (word, count: Option[Int]) => { val newCount = count.getOrElse(0) + 1 val output = (word, newCount) (output, Some(newCount)) } } 15Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 16.
    Local State § Taskscoped state access § Can be used to implement custom access patterns § Typical usage: • Source operators (offset) • Machine learning models • Use cyclic flows to simulate global state access 16Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 17.
    Local State Example(Java) public class MySource extends RichParallelSourceFunction { // Omitted details private OperatorState<Long> offset; @Override public void run(SourceContext ctx) { Object checkpointLock = ctx.getCheckpointLock(); isRunning = true; while (isRunning) { synchronized (checkpointLock) { offset.update(offset.value() + 1); // ctx.collect(next); } } } } 17Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 18.
    Partitioned State § Keyscoped state access § Highly scalable § Allows for incremental backup/restore § Typical usage: • Any per-key operation • Grouped aggregations • Window buffers 18Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 19.
    Partitioned State Example(Scala) // Compute the current average of each city's temperature temps.keyBy("city").mapWithState { (in: Temp, state: Option[(Double, Long)]) => { val current = state.getOrElse((0.0, 0L)) val updated = (current._1 + in.temp, current._2 + 1) val avg = Temp(in.city, updated._1 / updated._2) (avg, Some(updated)) } } case class Temp(city: String, temp: Double) 19Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 20.
    Exactly-once semantics § Basedon consistent global snapshots § Algorithm designed for stateful dataflows 20Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27 Detailed  mechanism
  • 21.
    Exactly-once semantics § Lowruntime overhead § Checkpointing logic is separated from application logic 21Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27 Blogpost  on  streaming  fault-­‐‑tolerance
  • 22.
    Summary § State isessential to many applications § Fault-tolerant streaming state is a hard problem § There is a trade-off between expressivity vs scalability/fault-tolerance § Flink tries to hit the sweet spot with… • Providing very flexible abstractions • Keeping good scalability and exactly-once semantics 22Apache  Flink  Meetup  @  MapR2015-­‐‑08-­‐‑27
  • 23.