Building your first Apache Apex Application
Yogi Devendra (yogidevendra@apache.org)

Outline
● Key concepts: DAG, Operators, Ports
● APIs for defining Applications, Operators
● “Word Count” example DAG
● Building Apache Apex from source code
● Creating a sample application
● Demo
● Questions

Application as a DAG
● An Application is defined as a Directed Acyclic Graph : DAG
● Vertices of the DAG are computational units : Operators
● Edges of the DAG are data tuples in motion : Streams
● Operator end-points for input, output : Ports
● An Operator takes one or more input streams, performs computations & emits one or more output streams
○ Each operator is the user's business logic, or a built-in operator from our open source library
○ An Operator may have multiple instances that run in parallel

Typical application example

APIs : Application, Operator
● MyApplication implements StreamingApplication
○ Provide implementation for populateDAG
○ Stitch the DAG
● SampleOperator extends BaseOperator
○ Define input ports, output ports
○ Define process methods
○ Optional : Define beginWindow, endWindow, setup, teardown

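The operator callbacks named on this slide can be illustrated without the Apex libraries. What follows is a minimal sketch in plain Java, not Apex code: `MiniOperator`, `LineLengthOperator`, the `Consumer`-based output port and the `run` driver are hypothetical stand-ins invented for this illustration. Only the callback names (setup, beginWindow, process, endWindow, teardown) come from the slide, and in a real application the engine, not a driver method, would invoke them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class OperatorLifecycleSketch {

    // Hypothetical stand-in for an operator's callbacks; NOT the Apex BaseOperator.
    interface MiniOperator<I> {
        void setup();
        void beginWindow(long windowId);
        void process(I tuple);
        void endWindow();
        void teardown();
    }

    // Sums the lengths of the lines seen in a window and emits one total per window.
    static class LineLengthOperator implements MiniOperator<String> {
        private final Consumer<Integer> output; // stand-in for an output port
        private int total;

        LineLengthOperator(Consumer<Integer> output) { this.output = output; }

        @Override public void setup() { total = 0; }                       // one-time initialization
        @Override public void beginWindow(long windowId) { total = 0; }    // reset per-window state
        @Override public void process(String line) { total += line.length(); }
        @Override public void endWindow() { output.accept(total); }        // emit the window result
        @Override public void teardown() { }                               // nothing to release here
    }

    // Assumed driver: plays the role of the engine by calling the callbacks in order.
    static List<Integer> run(List<List<String>> windows) {
        List<Integer> emitted = new ArrayList<>();
        LineLengthOperator op = new LineLengthOperator(emitted::add);
        op.setup();
        long windowId = 0;
        for (List<String> window : windows) {
            op.beginWindow(windowId++);
            for (String line : window) {
                op.process(line);
            }
            op.endWindow();
        }
        op.teardown();
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(
                Arrays.asList("ab", "cde"),
                Arrays.asList("f"))));
    }
}
```

The point of the sketch is the ordering: setup once, then beginWindow, any number of process calls, and endWindow for each window, then teardown once.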
Operator workflow

Sample application
● Data at Rest - Count occurrences of words in a file
● Data in Motion - Emit counts at the end of the window
● Another variation - Emit cumulative counts at the end of every window
[Figure: Apex Application DAG, with labels HDFS, LOGS, Lines, Counts]
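
The difference between per-window counts and cumulative counts can be shown with a small plain-Java sketch. It does not use the Apex API: the class name, the `parse` and `count` helpers, and the representation of a window as a list of lines are assumptions made for illustration. The only behavior taken from the slide is that counts are emitted at the end of each window, either reset per window or carried over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class WindowedWordCountSketch {

    // Splits a line into lower-cased words on runs of non-letter characters.
    static List<String> parse(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Returns one count map per window.
    // cumulative == false: counts are reset when a window begins.
    // cumulative == true:  counts carry over from earlier windows.
    static List<Map<String, Integer>> count(List<List<String>> windows, boolean cumulative) {
        List<Map<String, Integer>> emitted = new ArrayList<>();
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> window : windows) {
            if (!cumulative) {
                counts.clear();                         // what beginWindow would do
            }
            for (String line : window) {
                for (String word : parse(line)) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
            emitted.add(new TreeMap<>(counts));         // what endWindow would emit
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<List<String>> windows = Arrays.asList(
                Arrays.asList("to be or not to be"),
                Arrays.asList("to do"));
        System.out.println("per window: " + count(windows, false));
        System.out.println("cumulative: " + count(windows, true));
    }
}
```

With the two sample windows in `main`, the word "to" is counted once in the second window in per-window mode and three times in cumulative mode.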

Defining DAG
[Figure: DAG of four operators: Reader, Parser, Counter, Output. Labels: Input Operator (Adapter), Output Operator (Adapter), Generic Operators, HDFS, LOGS]
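
To make the Reader, Parser, Counter, Output chain concrete, here is a rough plain-Java sketch of how four operators might be wired together through ports. It is an illustration under stated assumptions, not the Apex API: `OutPort`, `connect`, `emit`, the `input` methods and the `run` driver are all hypothetical names, the Reader reads from an in-memory list rather than HDFS, and the Output collects results in memory rather than writing logs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

class WordCountPipelineSketch {

    // Hypothetical output port: forwards each emitted tuple to the connected input.
    static class OutPort<T> {
        private Consumer<T> sink = t -> { };
        void connect(Consumer<T> inputPort) { sink = inputPort; }
        void emit(T tuple) { sink.accept(tuple); }
    }

    // Input operator (adapter): brings lines into the DAG.
    static class Reader {
        final OutPort<String> lines = new OutPort<>();
        void read(List<String> source) {
            for (String line : source) {
                lines.emit(line);
            }
        }
    }

    // Generic operator: splits each line into whitespace-separated words.
    static class Parser {
        final OutPort<String> words = new OutPort<>();
        void input(String line) {
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) {
                    words.emit(w);
                }
            }
        }
    }

    // Generic operator: counts words and emits the counts when the window ends.
    static class Counter {
        final OutPort<Map<String, Integer>> counts = new OutPort<>();
        private final Map<String, Integer> seen = new TreeMap<>();
        void input(String word) { seen.merge(word, 1, Integer::sum); }
        void endWindow() {
            counts.emit(new TreeMap<>(seen));
            seen.clear();
        }
    }

    // Output operator (adapter): takes results out of the DAG.
    static class Output {
        final List<Map<String, Integer>> written = new ArrayList<>();
        void input(Map<String, Integer> counts) { written.add(counts); }
    }

    // Stitches the four operators together and pushes one window of lines through.
    static List<Map<String, Integer>> run(List<String> source) {
        Reader reader = new Reader();
        Parser parser = new Parser();
        Counter counter = new Counter();
        Output output = new Output();

        reader.lines.connect(parser::input);     // stream: lines
        parser.words.connect(counter::input);    // stream: words
        counter.counts.connect(output::input);   // stream: counts

        reader.read(source);
        counter.endWindow();
        return output.written;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c")));
    }
}
```

Each `connect` call plays the role of one edge (stream) of the DAG; the operators themselves know nothing about their neighbours.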

Building Apache Apex
• Java : 1.7.x
• mvn : 3.0 +
• git : 1.7 +
• Apache Hadoop : How to : Single node cluster
• Apache Apex Core
  • git clone git@github.com:apache/apex-core.git
  • cd apex-core/
  • git checkout master
  • mvn clean install -DskipTests
• Apache Apex Malhar
  • git clone git@github.com:apache/apex-malhar.git
  • cd apex-malhar/
  • git checkout master
  • mvn clean install -DskipTests
• DataTorrent RTS community edition

Questions
Image ref [2]

Resources
● Apache Apex website - http://apex.apache.org/
● Subscribe - http://apex.apache.org/community.html
● Download - http://apex.apache.org/downloads.html
● YouTube : subscribe DataTorrent
● Meetup - http://www.meetup.com/topics/apache-apex
● Twitter : follow @ApacheApex
● Startup Program – Free Enterprise License for Startups, Educational Institutions, Non-Profits