Introducing Apache Flink™ @StephanEwen
Flink’s Recent History April 2014 April 2015Dec 2014 Top Level Project Graduation 0.70.60.5 0.90.9-m1
What is Apache Flink? 3 Gelly Table ML SAMOA DataSet (Java/Scala) DataStream (Java/Scala) HadoopM/R Local Remote YARN Tez Embedded Dataflow Dataflow(WiP) MRQL Table Cascading (WiP) Streaming dataflow runtime Zeppelin A Top-Level project of the Apache Software Foundation
Program compilation 4 case class Path (from: Long, to: Long) val tc = edges.iterate(10) { paths: DataSet[Path] => val next = paths .join(edges) .where("to") .equalTo("from") { (path, edge) => Path(path.from, edge.to) } .union(paths) .distinct() next } Optimizer Type extraction stack Task scheduling Dataflow metadata Pre-flight (Client) Master Data Source orders.tbl Filter Map DataSource lineitem.tbl Join Hybrid Hash buildHT probe hash-part [0] hash-part [0] GroupRed sort forward Program Dataflow Graph Independent of batch or streaming job deploy operators track intermediate results
Native workload support 5 Flink Streaming topologies Long batch pipelines Machine Learning at scale How can an engine natively support all these workloads? And what does "native" mean? Graph Analysis  Low latency  resource utilization  iterative algorithms  Mutable state
E.g.: Non-native iterations 6 Step Step Step Step Step Client for (int i = 0; i < maxIterations; i++) { // Execute MapReduce job }
E.g.: Non-native streaming 7 stream discretizer Job Job Job Job while (true) { // get next few records // issue batch job } Data Stream
Native workload support 8 Flink Streaming topologies Long batch pipelines Machine Learning at scale How can an engine natively support all these workloads? And what does "native" mean? Graph Analysis  Low latency  resource utilization  iterative algorithms  Mutable state
Ingredients for “native” support 1. Execute everything as streams Pipelined execution, backpressure or buffered, push/pull model 2. Special code paths for batch Automatic job optimization, fault tolerance 3. Allow some iterative (cyclic) dataflows 4. Allow some mutable state 5. Operate on managed memory Make data processing on the JVM robust 9
Stream processing in Flink 10
Stream platform architecture 11 - Gather and backup streams - Offer streams for consumption - Provide stream recovery - Analyze and correlate streams - Create derived streams and state - Provide these to downstream systems Server logs Trxn logs Sensor logs Downstream systems
What is a stream processor? 1. Pipelining 2. Stream replay 3. Operator state 4. Backup and restore 5. High-level APIs 6. Integration with batch 7. High availability 8. Scale-in and scale-out 12 Basics State App development Large deployments See http://data-artisans.com/stream-processing-with-flink.html
Pipelining 13 Basic building block to “keep the data moving” Note: pipelined systems do not usually transfer individual tuples, but buffers that batch several tuples!
Operator state  User-defined state • Flink transformations (map/reduce/etc) are long-running operators, feel free to keep around objects • Hooks to include in system's checkpoint  Windowed streams • Time, count, data-driven windows • Managed by the system (currently WiP) 14
Streaming fault tolerance  Ensure that operators see all events • “At least once” • Solved by replaying a stream from a checkpoint, e.g., from a past Kafka offset  Ensure that operators do not perform duplicate updates to their state • “Exactly once” • Several solutions 15
Exactly once approaches  Discretized streams (Spark Streaming) • Treat streaming as a series of small atomic computations • “Fast track” to fault tolerance, but does not separate application logic (semantics) from recovery  MillWheel (Google Cloud Dataflow) • State update and derived events committed as atomic transaction to a high-throughput transactional store • Needs a very high-throughput transactional store   Chandy-Lamport distributed snapshots (Flink) 16
Distributed snapshots in Flink Super-impose checkpointing mechanism on execution instead of using execution as the checkpointing mechanism 17
18 JobManager Register checkpoint barrier on master Replay will start from here
19 JobManagerBarriers “push” prior events (assumes in-order delivery in individual channels) Operator checkpointing starting Operator checkpointing finished Operator checkpointing in progress
20 JobManager Operator checkpointing takes snapshot of state after data prior to barrier have updated the state. Checkpoints currently synchronous, WiP for incremental and asynchronous State backup Pluggable mechanism. Currently either JobManager (for small state) or file system (HDFS/Tachyon). WiP for in-memory grids
21 JobManager Operators with many inputs need to wait for all barriers to pass before they checkpoint their state
22 JobManager State snapshots at sinks signal successful end of this checkpoint At failure, recover last checkpointed state and restart sources from last barrier guarantees at least once State backup
Benefits of Flink’s approach  Data processing does not block • Can checkpoint at any interval you like to balance overhead/recovery time  Separates business logic from recovery • Checkpointing interval is a config parameter, not a variable in the program (as in discretization)  Can support richer windows • Session windows, event time, etc  Best of all worlds: true streaming latency, exactly-once semantics, and low overhead for recovery 23
DataStream API 24 case class Word (word: String, frequency: Int) val lines: DataStream[String] = env.fromSocketStream(...) lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .window(Time.of(5,SECONDS)).every(Time.of(1,SECONDS)) .groupBy("word").sum("frequency") .print() val lines: DataSet[String] = env.readTextFile(...) lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .groupBy("word").sum("frequency") .print() DataSet API (batch): DataStream API (streaming):
Roadmap  Short-term (3-6 months) • Graduate DataStream API from beta • Fully managed window and user-defined state with pluggable backends • Table API for streams (towards StreamSQL)  Long-term (6+ months) • Highly available master • Dynamic scale in/out • FlinkML and Gelly for streams • Full batch + stream unification 25
Batch processing Batch on Streaming 26
Batch Pipelines 27
Batch on Streaming  Batch programs are a special kind of streaming program 28 Infinite Streams Finite Streams Stream Windows Global View Pipelined Data Exchange Pipelined or Blocking Exchange Streaming Programs Batch Programs
Batch Pipelines 29 Data exchange (shuffle / broadcast) is mostly streamed Some operators block (e.g. sorts / hash tables)
Operators Execution Overlaps 30
Memory Management 31
Memory Management 32
Smooth out-of-core performance 33 More at: http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html Blue bars are in-memory, orange bars (partially) out-of-core
Other features of Flink There is more… 34
More Engine Features 35 Automatic Optimization / Static Code Analysis Closed Loop Iterations Stateful Iterations DataSourc e orders.tbl Filter Map DataSourc e lineitem.tbl Join Hybrid Hash build HT prob e broadc ast forward Combine GroupRed sort DataSourc e orders.tbl Filter Map DataSourc e lineitem.tbl Join Hybrid Hash build HT prob e hash-part [0] hash-part [0] hash-part [0,1] GroupRed sort forward
Closing 36
Apache Flink: community 37
I Flink, do you?  38 If you find this exciting, get involved and start a discussion on Flink‘s mailing list, or stay tuned by subscribing to news@flink.apache.org, following flink.apache.org/blog, and @ApacheFlink on Twitter
39 flink-forward.org Bay Area Flink meetup Tomorrow

Apache Flink Overview at SF Spark and Friends

  • 1.
  • 2.
    Flink’s Recent History April2014 April 2015Dec 2014 Top Level Project Graduation 0.70.60.5 0.90.9-m1
  • 3.
    What is ApacheFlink? 3 Gelly Table ML SAMOA DataSet (Java/Scala) DataStream (Java/Scala) HadoopM/R Local Remote YARN Tez Embedded Dataflow Dataflow(WiP) MRQL Table Cascading (WiP) Streaming dataflow runtime Zeppelin A Top-Level project of the Apache Software Foundation
  • 4.
    Program compilation 4 case classPath (from: Long, to: Long) val tc = edges.iterate(10) { paths: DataSet[Path] => val next = paths .join(edges) .where("to") .equalTo("from") { (path, edge) => Path(path.from, edge.to) } .union(paths) .distinct() next } Optimizer Type extraction stack Task scheduling Dataflow metadata Pre-flight (Client) Master Data Source orders.tbl Filter Map DataSource lineitem.tbl Join Hybrid Hash buildHT probe hash-part [0] hash-part [0] GroupRed sort forward Program Dataflow Graph Independent of batch or streaming job deploy operators track intermediate results
  • 5.
    Native workload support 5 Flink Streaming topologies Longbatch pipelines Machine Learning at scale How can an engine natively support all these workloads? And what does "native" mean? Graph Analysis  Low latency  resource utilization  iterative algorithms  Mutable state
  • 6.
    E.g.: Non-native iterations 6 StepStep Step Step Step Client for (int i = 0; i < maxIterations; i++) { // Execute MapReduce job }
  • 7.
    E.g.: Non-native streaming 7 stream discretizer JobJob Job Job while (true) { // get next few records // issue batch job } Data Stream
  • 8.
    Native workload support 8 Flink Streaming topologies Longbatch pipelines Machine Learning at scale How can an engine natively support all these workloads? And what does "native" mean? Graph Analysis  Low latency  resource utilization  iterative algorithms  Mutable state
  • 9.
    Ingredients for “native”support 1. Execute everything as streams Pipelined execution, backpressure or buffered, push/pull model 2. Special code paths for batch Automatic job optimization, fault tolerance 3. Allow some iterative (cyclic) dataflows 4. Allow some mutable state 5. Operate on managed memory Make data processing on the JVM robust 9
  • 10.
  • 11.
    Stream platform architecture 11 -Gather and backup streams - Offer streams for consumption - Provide stream recovery - Analyze and correlate streams - Create derived streams and state - Provide these to downstream systems Server logs Trxn logs Sensor logs Downstream systems
  • 12.
    What is astream processor? 1. Pipelining 2. Stream replay 3. Operator state 4. Backup and restore 5. High-level APIs 6. Integration with batch 7. High availability 8. Scale-in and scale-out 12 Basics State App development Large deployments See http://data-artisans.com/stream-processing-with-flink.html
  • 13.
    Pipelining 13 Basic building blockto “keep the data moving” Note: pipelined systems do not usually transfer individual tuples, but buffers that batch several tuples!
  • 14.
    Operator state  User-definedstate • Flink transformations (map/reduce/etc) are long-running operators, feel free to keep around objects • Hooks to include in system's checkpoint  Windowed streams • Time, count, data-driven windows • Managed by the system (currently WiP) 14
  • 15.
    Streaming fault tolerance Ensure that operators see all events • “At least once” • Solved by replaying a stream from a checkpoint, e.g., from a past Kafka offset  Ensure that operators do not perform duplicate updates to their state • “Exactly once” • Several solutions 15
  • 16.
    Exactly once approaches Discretized streams (Spark Streaming) • Treat streaming as a series of small atomic computations • “Fast track” to fault tolerance, but does not separate application logic (semantics) from recovery  MillWheel (Google Cloud Dataflow) • State update and derived events committed as atomic transaction to a high-throughput transactional store • Needs a very high-throughput transactional store   Chandy-Lamport distributed snapshots (Flink) 16
  • 17.
    Distributed snapshots inFlink Super-impose checkpointing mechanism on execution instead of using execution as the checkpointing mechanism 17
  • 18.
    18 JobManager Register checkpoint barrier onmaster Replay will start from here
  • 19.
    19 JobManagerBarriers “push” priorevents (assumes in-order delivery in individual channels) Operator checkpointing starting Operator checkpointing finished Operator checkpointing in progress
  • 20.
    20 JobManager Operator checkpointingtakes snapshot of state after data prior to barrier have updated the state. Checkpoints currently synchronous, WiP for incremental and asynchronous State backup Pluggable mechanism. Currently either JobManager (for small state) or file system (HDFS/Tachyon). WiP for in-memory grids
  • 21.
    21 JobManager Operators with manyinputs need to wait for all barriers to pass before they checkpoint their state
  • 22.
    22 JobManager State snapshots atsinks signal successful end of this checkpoint At failure, recover last checkpointed state and restart sources from last barrier guarantees at least once State backup
  • 23.
    Benefits of Flink’sapproach  Data processing does not block • Can checkpoint at any interval you like to balance overhead/recovery time  Separates business logic from recovery • Checkpointing interval is a config parameter, not a variable in the program (as in discretization)  Can support richer windows • Session windows, event time, etc  Best of all worlds: true streaming latency, exactly-once semantics, and low overhead for recovery 23
  • 24.
    DataStream API 24 case classWord (word: String, frequency: Int) val lines: DataStream[String] = env.fromSocketStream(...) lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .window(Time.of(5,SECONDS)).every(Time.of(1,SECONDS)) .groupBy("word").sum("frequency") .print() val lines: DataSet[String] = env.readTextFile(...) lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .groupBy("word").sum("frequency") .print() DataSet API (batch): DataStream API (streaming):
  • 25.
    Roadmap  Short-term (3-6months) • Graduate DataStream API from beta • Fully managed window and user-defined state with pluggable backends • Table API for streams (towards StreamSQL)  Long-term (6+ months) • Highly available master • Dynamic scale in/out • FlinkML and Gelly for streams • Full batch + stream unification 25
  • 26.
  • 27.
  • 28.
    Batch on Streaming Batch programs are a special kind of streaming program 28 Infinite Streams Finite Streams Stream Windows Global View Pipelined Data Exchange Pipelined or Blocking Exchange Streaming Programs Batch Programs
  • 29.
    Batch Pipelines 29 Data exchange(shuffle / broadcast) is mostly streamed Some operators block (e.g. sorts / hash tables)
  • 30.
  • 31.
  • 32.
  • 33.
    Smooth out-of-core performance 33 Moreat: http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html Blue bars are in-memory, orange bars (partially) out-of-core
  • 34.
    Other features ofFlink There is more… 34
  • 35.
    More Engine Features 35 AutomaticOptimization / Static Code Analysis Closed Loop Iterations Stateful Iterations DataSourc e orders.tbl Filter Map DataSourc e lineitem.tbl Join Hybrid Hash build HT prob e broadc ast forward Combine GroupRed sort DataSourc e orders.tbl Filter Map DataSourc e lineitem.tbl Join Hybrid Hash build HT prob e hash-part [0] hash-part [0] hash-part [0,1] GroupRed sort forward
  • 36.
  • 37.
  • 38.
    I Flink, doyou?  38 If you find this exciting, get involved and start a discussion on Flink‘s mailing list, or stay tuned by subscribing to news@flink.apache.org, following flink.apache.org/blog, and @ApacheFlink on Twitter
  • 39.

Editor's Notes

  • #4 Flink is an entire software stack the heart: streaming dataflow engine: think of programs as operators and data flows Kappa architecture: run batch programs on a streaming system Table API: logical representation, sql-style Samoa “on-line learners”
  • #5 toy program: native transitive closure type extraction: types that go in and out of each operator
  • #6 Flink is an analytical system streaming topology: real-time; low latency “native”: build-in support in the system, no working around, no black-box next slide: define native by some “non-native” examples
  • #7 Used for Machine Learning run the same job over the data multiple times to come up with parameters for a ml model this is how you do it when treating the engine as a black box
  • #8 If you only have a batch processor: do a lot of small batch jobs LIMITATION: state across the small jobs (batches)
  • #9 Flink is an analytical system streaming topology: real-time; low latency “native”: build-in support in the system, no working around, no black-box next slide: define native by some “non-native” examples
  • #10 Corner points / requirements for flink keep data in motion, avoid materialization even though it’s a streaming runtime, have special paths for batch: OPTIMIZER, CHECKPOINTING make the system aware of cyclic data flows, in a controlled way allow operators to have some state, in a controlled way (DELTA-ITERATIONS). relax “traditional” batch assumption flink runs in the jvm, but we want control over memory, not rely on GC
  • #12 What are the technologies that enable streaming? The open source leaders in this space is Apache Kafka (that solves the integration problem), and Apache Flink (that solves the analytics problem, removing the final barrier). Kafka and Flink combined can remove the batch barriers from the infrastructure, creating a truly real-time analytics platform.
  • #27 structure, different title
  • #38 Other data points Google (cloud dataflow) Hortonworks Cloudera Adatao Concurrent Confluent We have been part of this open source movement with Apache Flink. Flink is a streaming dataflow engine that can run in Hadoop clusters. Flink has grown a lot over the past year both in terms of code and community. We have added domain-specific libraries, a streaming API with streaming backend support, etc, etc. Tremendous growth. Flink has also grown in community. The project is by now a very established Apache project, it has more than 140 contributors (placing it at the top 5 of Apache big data projects), and several companies are starting to experiment with it. At data Artisans we are supporting two production installations (ResearchGate and Bouygues Telecom), and are helping a number of companies that are testing Flink (e.g., Spotify, King.com, Amadeus, and a group at Yahoo). Huawei and Intel have started contributing to Flink, and interest in vendors is picking up (e.g., Adatao, Huawei, Hadoop vendors). All of this is the result of purely organic growth with very little marketing investment from data Artisans.