Real-time Stream Processing using Apache Apex Bhupesh Chawda bhupesh@apache.org DataTorrent Software
Apache Apex - Stream Processing
● YARN-native - Uses the Hadoop YARN framework for resource negotiation
● Highly Scalable - Scales statically as well as dynamically
● Highly Performant - Can reach single-digit millisecond end-to-end latency
● Fault Tolerant - Automatically recovers from failures, without manual intervention
● Stateful - Guarantees that no state will be lost
● Easily Operable - Exposes an easy API for developing Operators (the parts of an application) and Applications
Project History
● Project development started in 2012 at DataTorrent
● Open-sourced in July 2015
● Apache Apex started incubation in August 2015
● 50+ committers from Apple, GE, Capital One, DirecTV, Silver Spring Networks, Barclays, Ampool and DataTorrent
● Mentors from Class Software, MapR and Hortonworks
● Soon to be a top-level Apache project
Apex Platform Overview
An Apex Application is a DAG (Directed Acyclic Graph)
● A DAG is composed of vertices (Operators) and edges (Streams)
● A Stream is a sequence of data tuples which connects operators at end-points called Ports
● An Operator takes one or more input streams, performs computations and emits one or more output streams
● Each operator holds the user's business logic, or is a built-in operator from our open source library
● An operator may have multiple instances that run in parallel
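For illustration, a minimal sketch of what an operator looks like in Java using the Apex engine API; DoubleOperator and its doubling logic are hypothetical stand-ins for real business logic:

```java
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.common.util.BaseOperator;

// Hypothetical operator: reads integers from its input port and
// emits each value doubled on its output port.
public class DoubleOperator extends BaseOperator
{
  public final transient DefaultOutputPort<Integer> output = new DefaultOutputPort<>();

  public final transient DefaultInputPort<Integer> input = new DefaultInputPort<Integer>()
  {
    @Override
    public void process(Integer tuple)
    {
      // Per-tuple business logic; emitted tuples flow over the output stream.
      output.emit(tuple * 2);
    }
  };
}
```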
Hadoop 1.0 vs 2.0 - YARN
Apex as a YARN Application
● YARN (Hadoop 2.0) replaces MapReduce with a more generic resource management framework
● Apex uses YARN for resource management and HDFS for storing any persistent state, such as checkpoints
Support for Windowing
● Apex splits incoming tuples into finite time slices - Streaming Windows
  ○ Transparent to the user
  ○ Apex default = 500 ms
● Checkpointing and book-keeping are done at streaming window boundaries
● Applications may need to perform computations in windows - Application Windows
  ○ Specified as a multiple of the Streaming Window size
  ○ Callbacks to user operator logic
    ■ beginWindow(long windowId)
    ■ endWindow()
  ○ Example - An application which computes some aggregates and emits them every minute. Here the application window size = 60 secs = 120 Streaming Windows (at the 500 ms default)
● Sliding and Tumbling Application Windows are supported natively
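As a sketch of the callbacks above, an operator that sums values over each application window might look like this; MinuteSum and its port names are hypothetical:

```java
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.common.util.BaseOperator;

public class MinuteSum extends BaseOperator
{
  private long sum;  // non-transient field, so it is checkpointed with the operator

  public final transient DefaultOutputPort<Long> output = new DefaultOutputPort<>();

  public final transient DefaultInputPort<Long> input = new DefaultInputPort<Long>()
  {
    @Override
    public void process(Long tuple)
    {
      sum += tuple;  // accumulate within the current application window
    }
  };

  @Override
  public void beginWindow(long windowId)
  {
    sum = 0;  // reset at each application window boundary
  }

  @Override
  public void endWindow()
  {
    output.emit(sum);  // emit one aggregate per application window
  }
}
```

With the 500 ms default streaming window, a one-minute application window would be requested via an operator attribute, e.g. dag.setAttribute(op, OperatorContext.APPLICATION_WINDOW_COUNT, 120).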
Buffer Server
● Staging area for outgoing tuples
● Downstream operators connect to the upstream Buffer Server to subscribe for tuples
● Plays a role in recovery by replaying data to the downstream operator from a particular checkpoint
● Spooling to disk is also supported
Fault Tolerance - Checkpointing
● During checkpointing, all operator state is written to HDFS asynchronously
● This is decentralized and happens independently for each operator
● If all operators in the DAG have checkpointed a particular window, then that window is said to be committed and all previous checkpoints are purged
[Figure: DAG O1 → O2 → O3 → O4 with latest checkpoints #3, #3, #3, #2 taken at windows 180, 180, 180, 120; committed checkpoint # = 2, committed window # = 120; checkpoint interval = 60 streaming windows]
Recovery
● The Apex Application Master detects the failure of an operator based on missing heartbeats from the operator, or on windows that are not progressing
● The failed operator and all operators downstream of it are restarted from the last committed checkpoint to recover their state
● Data is replayed from the same checkpoint by the Buffer Server
● Recovery is automatic and does not require manual intervention
Scalability - Partitioning
● Operators can be "replicated" (partitioned) into multiple instances to keep up with high-speed input streams
● Partitioning can be specified at application launch time (see the sketch below)
● The user can control the distribution of tuples to downstream partitions
● An automatic Unifier merges the tuples from the partitions
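A minimal sketch of specifying a static partition count at launch time; "parser" and ParserOperator are hypothetical names from an imagined populateDAG() body:

```java
import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.common.partitioner.StatelessPartitioner;

// Inside populateDAG(): run the (hypothetical) parser operator as four
// parallel partitions; the platform adds a Unifier in front of the
// downstream operator to merge their output.
dag.setAttribute(parser, OperatorContext.PARTITIONER,
    new StatelessPartitioner<ParserOperator>(4));
```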
Scalability - Dynamic Scaling
● Auto-scaling is also supported: the number of partitions may automatically increase or decrease based on the incoming load, and this behavior can be customized
● The user has to define the trigger for auto-scaling (see the sketch below)
  ○ Example - Increase partitions if latency goes above 100 ms
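A hedged sketch of such a trigger via the engine's StatsListener interface, assuming BatchedOperatorStats exposes a moving-average latency through getLatencyMA(); the threshold logic is illustrative only:

```java
import com.datatorrent.api.StatsListener;

// Illustrative scaling trigger: ask the engine to repartition an operator
// whenever its moving-average latency exceeds 100 ms.
public class LatencyTrigger implements StatsListener
{
  @Override
  public Response processStats(BatchedOperatorStats stats)
  {
    Response response = new Response();
    response.repartitionRequired = stats.getLatencyMA() > 100;  // assumed accessor
    return response;
  }
}
```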
Apex Processing Semantics
● AT_LEAST_ONCE (default): Windows are processed at least once
● AT_MOST_ONCE: Windows are processed at most once
  ○ During recovery, all downstream operators are fast-forwarded to the window of the latest checkpoint
● EXACTLY_ONCE: Windows are processed exactly once
  ○ Requires checkpointing every window
  ○ Checkpointing becomes blocking
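The mode is chosen per operator through a DAG attribute; a sketch, where "op" is a hypothetical operator reference inside populateDAG():

```java
import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.Operator.ProcessingMode;

// Checkpoint this operator every streaming window so each window is
// processed exactly once (at the cost of blocking checkpoints).
dag.setAttribute(op, OperatorContext.PROCESSING_MODE, ProcessingMode.EXACTLY_ONCE);
```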
Apex Guarantees
● Apex guarantees no loss of data or computational state - both are checkpointed periodically
● Automatic recovery ensures that processing resumes from where it left off
● The order of incoming data is guaranteed to be maintained
  ○ Not applicable when operators are partitioned
● Events in a window are always replayed in the same window in case of failure
Application Specification
1. Add Operators
2. Add Streams
(A sketch of both steps follows.)
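A minimal sketch of an application specification; LineReader, WordSplitter, and WordCounter are hypothetical operator classes with the usual input/output ports:

```java
import org.apache.hadoop.conf.Configuration;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.api.annotation.ApplicationAnnotation;

@ApplicationAnnotation(name = "WordCountDemo")
public class Application implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    // 1. Add Operators (hypothetical classes, for illustration)
    LineReader reader = dag.addOperator("reader", new LineReader());
    WordSplitter splitter = dag.addOperator("splitter", new WordSplitter());
    WordCounter counter = dag.addOperator("counter", new WordCounter());

    // 2. Add Streams, connecting output ports to input ports
    dag.addStream("lines", reader.output, splitter.input);
    dag.addStream("words", splitter.output, counter.input);
  }
}
```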
Logical and Physical DAGs
Apex Malhar Library
Decision Making in < 2 ms
1. Performance requirements
   a. A system which can provide very low latency for decision making (40 ms)
   b. Ability to handle large volumes of data and ever-changing rules (1,000 events per 20 ms burst)
   c. 99.5% uptime, which is about 1.5 days of downtime in a year
   ➔ Apex achieved:
   ◆ 2 ms latency against the requirement of 40 ms
   ◆ Handled 2,000-event bursts against the requirement of 1,000-event bursts, at a net rate of 70,000 events/s
   ◆ 99.9995% uptime against the requirement of 99.5%
2. Relevant roadmap
3. Enterprise grade
4. A healthy and diverse community of committers, i.e. not controlled by one vendor
Talk slides: http://www.slideshare.net/ilganeli/nextgen-decision-making-in-under-2ms
DataTorrent blog: https://www.datatorrent.com/blog/next-gen-decision-making-in-under-2-milliseconds/
Decision Making in < 2 ms (contd.)
● The comparison finally boiled down to
  ○ Apache Storm
  ○ Apache Flink
  ○ Apache Apex
● Some problems with Storm and Flink, among others:
  ○ Nimbus is a single point of failure
  ○ Bolts / Spouts / Operators share a JVM - hard to debug
  ○ No dynamic topologies
  ○ Entire topologies must be restarted in case of failures
Resources
● Mailing lists
  ○ Developers: dev@apex.incubator.apache.org
  ○ Users: users@apex.incubator.apache.org
● Apache Apex: http://apex.apache.org/
● GitHub
  ○ Apex Core: http://github.com/apache/incubator-apex-core
  ○ Apex Malhar: http://github.com/apache/incubator-apex-malhar
● DataTorrent: http://www.datatorrent.com
● Twitter: @ApacheApex - https://twitter.com/apacheapex
● Facebook: https://www.facebook.com/ApacheApex/
● Meetup: http://www.meetup.com/topics/apache-apex
● Startup Program: Free Enterprise License for Startups, Universities, Non-Profits
Thank you! Please send your questions to bhupesh@apache.org