Making Sense of Apache Flink: A Fearless Introduction
David Anderson
Software Practice Lead, Confluent
Apache Flink Committer
Real-time Data: A Sale, A Shipment, A Trade, A Customer Experience
Real-time Stream Processing: Rich Customer Experiences, Real-Time Backend Operations
Innovative companies have adopted both Kafka & Flink
Digital natives leverage Kafka and Flink to disrupt markets and gain competitive advantage
● UBER: Real-time Pricing
● NETFLIX: Personalized Recs
● STRIPE: Real-time Fraud Detection
Driving business value with Apache Flink

Real-time analytics
Continuously produce and update results which are displayed and delivered to users as real-time data streams are consumed
● Ad/campaign performance
● Content performance
● Quality monitoring of Telco networks
● Usage metering and billing

Event-driven applications
Recognize patterns and react to incoming events by triggering computations, state updates, or external actions
● Fraud detection
● Anomaly detection
● Business process monitoring
● Geo-fencing

Streaming data pipelines
Real-time data pipelines that continuously ingest, enrich, and transform data streams, loading them into destination systems for timely action (vs. batch processing)
● Continuous ETL
● Real-time search index building
● ML pipelines
● Data lake ingestion
Developers are choosing Flink because of its performance and rich feature set
● Scalability & high performance: Flink supports stream processing workloads at tremendous scale
● Language flexibility: Flink supports Java, Python, & SQL, enabling developers to work in their language of choice
● Unified processing: Flink supports stream processing, batch processing, and ad-hoc analytics through one technology
● Fault tolerance & high availability: Flink's checkpointing mechanism provides exactly-once guarantees automatically
Flink is a top 5 Apache project and has a very active community
The four cornerstones on which Flink is built
● Streaming
● State
● Time
● Snapshots
Streaming
● A stream is a sequence of events
● Business data is always a stream: bounded or unbounded
● For Flink, batch processing is just a special case in the runtime
[Figure: a timeline from past through now to future, showing a bounded stream and an unbounded stream]
Real-time Stream Processing
[Figure: Sources and Sinks (Kafka, Databases, Key/Value Stores, Files, Apps) on either side of Real-time Stream Processing]
The Job Graph (or Topology)
[Figure labels: OPERATOR, CONNECTION]
Stream processing
• Parallel
• Forward
• Repartition
[Figure: SOURCE (events grouped by shape), FILTER, repartition (group by color), COUNT (running counts 1 2 3 and 1 2 3 4)]
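The repartition step can be illustrated with a small Python sketch. This is not how Flink implements it internally; the helper name `route`, the parallelism of 3, and the choice of CRC32 as the hash are all assumptions made for illustration. The only point is that every event with the same key is sent to the same parallel instance, so each instance can count its own colors independently.

```python
import zlib
from collections import defaultdict

PARALLELISM = 3  # assumed number of parallel COUNT instances


def route(key: str, parallelism: int = PARALLELISM) -> int:
    """Pick the downstream instance for a key (hypothetical helper)."""
    # A stable hash keeps the key-to-instance mapping identical across runs.
    return zlib.crc32(key.encode("utf-8")) % parallelism


events = ["red", "blue", "red", "green", "blue", "red"]

# Simulate the repartition: collect events by the instance that receives them.
instances = defaultdict(list)
for color in events:
    instances[route(color)].append(color)

for index in sorted(instances):
    print(index, instances[index])
```

A forward connection, by contrast, would simply hand each event to the next operator in the same parallel instance without any routing.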
Stream processing with SQL
    INSERT INTO results
    SELECT color, COUNT(*)
    FROM events
    WHERE color <> 'orange'
    GROUP BY color;
[Figure: events, WHERE color <> 'orange', GROUP BY color, COUNT, results]
Flink’s APIs
Level of abstraction, from lowest to highest: Process Functions, DataStream API, Table API, Flink SQL
How the code is organized: Apache Flink Runtime; Low-Level Stream Operator API; Optimizer / Planner; Table / SQL API; DataStream API; Process Functions
Runtime Architecture
Stateful stream processing with Flink SQL
    INSERT INTO results
    SELECT color, COUNT(*)
    FROM events
    WHERE color <> 'orange'
    GROUP BY color;
● Counting requires state
[Figure: events, WHERE color <> 'orange', GROUP BY color, COUNT, results]
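To see why counting requires state, here is a minimal Python sketch of what the query above computes over an unbounded stream. It is not Flink code; the function name `count_by_color` and the sample events are hypothetical. The dictionary plays the role of the state: without it, there would be no way to know the previous count when the next event of the same color arrives.

```python
from typing import Dict, Iterable, Iterator, Tuple


def count_by_color(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield an updated (color, count) result for each event that passes the filter."""
    counts: Dict[str, int] = {}  # the state: one counter per key
    for color in events:
        if color == "orange":  # WHERE color <> 'orange'
            continue
        counts[color] = counts.get(color, 0) + 1  # GROUP BY color, COUNT(*)
        yield color, counts[color]


updates = list(count_by_color(["red", "orange", "blue", "red"]))
print(updates)  # [('red', 1), ('blue', 1), ('red', 2)]
```

The filter, in contrast, looks at each event in isolation and needs no state at all.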
State
• Local
• Fast
• Fault tolerant
Time
• Synchronize
• Wait
• Timeout
Event time (e.g., 09:05:44): when the event was created at its original source.
Processing time (e.g., 09:08:01): when the event is being processed. This time varies between applications.
Out-of-order event streams
● Streams are (roughly) ordered by time
[Figure: events with timestamps such as 10:10 and 10:14 arriving out of order]
Coping with out-of-order events
[Figure callouts: "This event will be read next"; "These events follow"]
Imagine a window counting events for the hour ending at 2:00. How long should this window wait before producing its results?
Watermarks measure progress of event time
● This watermark has been generated by assuming that the stream is at most 5 minutes out-of-order
● The watermark is the max timestamp seen so far, minus this out-of-orderness estimate: 1:50 - 5 minutes = 1:45
● A watermark is an assertion about the completeness of the stream: now this stream is complete up to 1:45
Imagine a window counting events for the hour ending at 2:00. How long should this window wait before producing its results? It should wait for a watermark with a timestamp of at least 2:00.
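The watermark arithmetic above can be sketched in a few lines of Python. This is a simulation under stated assumptions, not Flink's API: the helper names `minutes` and `watermarks`, the sample timestamps, and the representation of time as minutes are all hypothetical. It computes the watermark as the max timestamp seen so far minus the 5-minute out-of-orderness estimate, and reports which event finally lets the window ending at 2:00 produce its result.

```python
from typing import Iterable, Iterator, Optional

MAX_OUT_OF_ORDERNESS = 5  # minutes: the out-of-orderness estimate from the slide
WINDOW_END = 2 * 60       # the window ends at 2:00, expressed in minutes


def minutes(hhmm: str) -> int:
    """Convert an 'h:mm' string to minutes (hypothetical helper)."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def watermarks(timestamps: Iterable[int]) -> Iterator[int]:
    """After each event, yield max timestamp seen so far minus the estimate."""
    max_seen: Optional[int] = None
    for ts in timestamps:
        max_seen = ts if max_seen is None else max(max_seen, ts)
        yield max_seen - MAX_OUT_OF_ORDERNESS


stream = [minutes(t) for t in ["1:43", "1:50", "1:47", "2:01", "2:05"]]

fired_at: Optional[int] = None
for ts, wm in zip(stream, watermarks(stream)):
    # The window may only fire once the watermark reaches the window's end.
    if fired_at is None and wm >= WINDOW_END:
        fired_at = ts

print(fired_at == minutes("2:05"))  # True
```

Note that the event stamped 2:01 is not enough: it only moves the watermark to 1:56, so the window keeps waiting in case events from before 2:00 are still in flight.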
What are watermarks for? They make things happen when the time is right.
Why isn’t my Flink job producing any results? There’s probably something wrong with the watermarking.
The idle stream problem
● Streams that are idle do not advance the watermark
● This prevents windows from producing results
Solutions
● Balance the partitions so none are empty or idle, or
● Send keep-alive events, or
● Configure the watermarking to use idleness detection
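The reason one idle partition can stall everything is that an operator reading several inputs advances its watermark to the minimum of its inputs' watermarks. The Python sketch below models only that rule; the function name `combined_watermark`, the partition names, and the use of `None` for "no watermark yet" are assumptions for illustration, not Flink's implementation.

```python
from typing import AbstractSet, Dict, Optional


def combined_watermark(
    partition_watermarks: Dict[str, Optional[int]],
    idle: AbstractSet[str] = frozenset(),
) -> Optional[int]:
    """Minimum watermark across partitions not marked idle; None if stalled."""
    active = {p: wm for p, wm in partition_watermarks.items() if p not in idle}
    if not active or any(wm is None for wm in active.values()):
        return None  # an active partition has produced no watermark yet
    return min(active.values())


# Watermarks in minutes; partition p2 has never received an event.
partitions = {"p0": 125, "p1": 118, "p2": None}

print(combined_watermark(partitions))               # None
print(combined_watermark(partitions, idle={"p2"}))  # 118
```

With idleness detection, the empty partition is left out of the minimum, so the watermark can advance and windows can produce results again.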
A checkpoint is an automatic snapshot created by Flink, for the purpose of failure recovery.
A savepoint is a manual snapshot created for some operational purpose (e.g., an upgrade).
Snapshots
[Figure: events, FILTER, GROUP BY color, COUNT, results]
What each stage of the job contributes to a snapshot:
● Read from source: source offsets
● Filter: nothing (stateless)
● Count by color: counters
● Write to sink: transaction ID
Taking a snapshot does NOT stop the world
Checkpoints and savepoints are created asynchronously, while the job continues to process events and produce results.
Recovery
Because these are self-consistent, global snapshots:
● Flink provides (effectively) exactly-once guarantees
● Recovery involves restarting the entire job from the most recent checkpoint
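A toy Python model can show why restoring a self-consistent snapshot gives effectively exactly-once results. Everything here is a simplification and an assumption: the class `CountingJob` is hypothetical, the job has a single instance, the snapshot is taken synchronously (unlike Flink's asynchronous checkpoints), and the sink's transaction ID is left out. The key idea it does capture is that the source offset and the counters are saved together, so replaying from the saved offset never counts an event twice.

```python
import copy
from typing import Dict, List


class CountingJob:
    """Toy job: read from a replayable source, filter, and count by color."""

    def __init__(self, source: List[str]) -> None:
        self.source = source
        self.offset = 0                   # source offset
        self.counts: Dict[str, int] = {}  # counters

    def process_one(self) -> None:
        color = self.source[self.offset]
        self.offset += 1
        if color != "orange":  # the filter
            self.counts[color] = self.counts.get(color, 0) + 1

    def snapshot(self) -> dict:
        # Offset and counters are captured together, so they are consistent.
        return {"offset": self.offset, "counts": copy.deepcopy(self.counts)}

    def restore(self, snap: dict) -> None:
        self.offset = snap["offset"]
        self.counts = copy.deepcopy(snap["counts"])


source = ["red", "blue", "orange", "red", "blue", "red"]

job = CountingJob(source)
for _ in range(3):
    job.process_one()
checkpoint = job.snapshot()  # taken after the first three events
job.process_one()            # work done after the checkpoint...
job.process_one()            # ...is lost when the job fails here

# Recovery: restart the whole job from the most recent checkpoint and replay.
recovered = CountingJob(source)
recovered.restore(checkpoint)
while recovered.offset < len(source):
    recovered.process_one()

print(recovered.counts)  # {'red': 3, 'blue': 2}
```

The recovered counts match what a failure-free run would have produced, even though two events were processed twice overall.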
Wrap-up
Streaming
Unfamiliar to many developers, but ultimately straightforward

State
Delightfully simple
● local
● key/value
● single-threaded

Event time and watermarks
Watermarks encapsulate something complex in one place: the sources
● how out-of-order?
● can it be idle?

State snapshots for recovery
Transparent to application developers
Your Apache Flink® journey begins here developer.confluent.io
