Fast Data Processing with RFX: Simplify Fast Data Processing
trieunt@fpt.com.vn, tantrieuf31@gmail.com
Today's topic: we will talk about everything inside this red circle.
Demo first: https://github.com/rfxlab/pageview-analytics-with-rfx
Contents at a glance
1. BEAM✲ methodology for agile data warehousing
2. Introduction to Fast Data
3. Problem: “Fast Data in web analytics”
4. Examples of the fast data design pattern (RFX, or Reactive Function X)
4.1. Event data actor
4.2. Event data agent
4.3. Event data collector
4.4. Event data router
4.5. Event data processor
4.6. Event data storage
4.7. Event data query
4.8. Event data reactor
5. Demo: “Fast Data in web analytics” with source code explanation
1 - BEAM✲ methodology
1 - BEAM✲ methodology for Agile Data Warehousing
BEAM✲ stands for Business Event Analysis & Modelling. It is a methodology for gathering business requirements for agile data warehouses and for building those warehouses. It was developed by Lawrence Corr (@LawrenceCorr) and Jim Stagnitto (@JimStag) and published in their book Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema.
An example using BEAM✲
Goal: model all business events and store them in a database in an agile way
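To make the goal concrete, here is a minimal Java sketch of one business event, "a web user views a page", modeled the BEAM✲ way: each field answers one of the 7W questions the book uses (who, what, when, where, how many, why, how). The class name, field names and sample values are hypothetical illustrations, not classes from RFX or from the book.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical BEAM✲-style business event: "a web user views a page".
 * Each field answers one of the 7Ws; names are illustrative, not RFX classes.
 */
final class PageviewEvent {
    final String who;    // Who: visitor id (e.g. a cookie id)
    final String what;   // What: the page viewed
    final Instant when;  // When: event timestamp
    final String where;  // Where: e.g. country or client IP
    final long howMany;  // How many: the measure, 1 per pageview
    final String why;    // Why: e.g. referrer or campaign
    final String how;    // How: e.g. device or browser

    PageviewEvent(String who, String what, Instant when, String where,
                  long howMany, String why, String how) {
        this.who = who;
        this.what = what;
        this.when = when;
        this.where = where;
        this.howMany = howMany;
        this.why = why;
        this.how = how;
    }

    /** Flattens the event into an ordered column -> value row for a fact table. */
    Map<String, Object> toRow() {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("who", who);
        row.put("what", what);
        row.put("when", when);
        row.put("where", where);
        row.put("how_many", howMany);
        row.put("why", why);
        row.put("how", how);
        return row;
    }
}
```

Each call to toRow() yields one row for a pageview fact table; in a star schema the who/what/when/where/why/how columns could become dimension keys and how_many the additive measure.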
2 - Fast Data
Introduction to Fast Data
3 - Problems in Practice
Problems: “Fast Data in web analytics”
1. Counting the pageviews of a website
2. Counting the unique users of a website
3. Sending an email when pageviews are abnormal (simple DDoS attack detection)
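A minimal in-memory sketch of problems 1 and 2, assuming events arrive as (URL, user id, timestamp) and are bucketed per URL per minute. PageviewCounter and its methods are hypothetical. In a Redis-backed setup the same idea would map onto INCR for counters and SADD (or PFADD, for an approximate HyperLogLog count) for unique users, but the slides do not say which commands RFX uses.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory counter: pageviews and unique users per URL per minute. */
final class PageviewCounter {
    private final Map<String, Long> pageviews = new HashMap<>();
    private final Map<String, Set<String>> uniqueUsers = new HashMap<>();

    /** Bucket key: URL plus the minute the event happened in (epoch millis / 60,000). */
    private static String bucket(String url, long epochMillis) {
        return url + "|" + (epochMillis / 60_000L);
    }

    /** Records one pageview: increments the counter and remembers the user id. */
    void record(String url, String userId, long epochMillis) {
        String key = bucket(url, epochMillis);
        pageviews.merge(key, 1L, Long::sum);
        uniqueUsers.computeIfAbsent(key, k -> new HashSet<>()).add(userId);
    }

    /** Total pageviews for the URL in the minute containing epochMillis. */
    long pageviews(String url, long epochMillis) {
        return pageviews.getOrDefault(bucket(url, epochMillis), 0L);
    }

    /** Distinct user ids seen for the URL in the minute containing epochMillis. */
    int uniqueUsers(String url, long epochMillis) {
        Set<String> users = uniqueUsers.get(bucket(url, epochMillis));
        return users == null ? 0 : users.size();
    }
}
```

A HashSet gives exact unique counts but grows with traffic, which is why an approximate structure such as HyperLogLog is a common substitute at scale.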
4 - Thinking with RFX
What is RFX, or Reactive Function X?
● A design pattern for solving big, fast data problems
● A collection of open source tools
● The mission of RFX:
1. Build data products quickly with design patterns
2. Apply BEAM✲ to build an agile data pipeline
3. React to critical events in near real time
RFX framework
What?
● A Java framework built from open source projects:
○ Based on the Akka actor core (http://akka.io)
○ Lightweight DAO with Spring JDBC (https://spring.io)
○ Netty (http://netty.io) and Vert.x (http://vertx.io/)
○ Common utility classes for Apache Kafka, Hadoop and Spark
○ Common utility classes for NoSQL (Redis (http://redis.io), MongoDB)
● An R&D project for fast data processing, started in 11/2013
Why?
● Divide Java code into modules:
○ common infrastructure code (rfx-stream)
○ business logic code (e.g. checking that the data stream is valid)
○ machine learning code (automation & optimization)
● Focus on best practices and reusability
● A foundation for scalability (of both the system and the business)
● Test-driven development for real-time analytics
● Continuous integration & improvement
Your business logic here
Reactive Function (X) Philosophy
Core elements of rfx-stream
Core backend modules
rfx-track:
● collecting all events from log agents
rfx-stream:
● processing stream data (PipelineProcessing pattern)
● processing real-time analytics
● processing business logic (via reactive functions)
rfx-cronjob:
● synchronizing real-time data to the report database (by parsing data in Redis and updating the report database)
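The slides name a PipelineProcessing pattern for rfx-stream without showing its API. Below is a hedged, generic sketch of that idea: an event passes through an ordered list of stages, and any stage may transform it or drop it by returning Optional.empty(). The Pipeline class and its then/process methods are hypothetical, not RFX's actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical generic pipeline: each stage receives an event and returns either
 * a (possibly transformed) event or Optional.empty() to drop it.
 */
final class Pipeline<T> {
    private final List<Function<T, Optional<T>>> stages = new ArrayList<>();

    /** Appends a stage; returns this pipeline so stages can be chained. */
    Pipeline<T> then(Function<T, Optional<T>> stage) {
        stages.add(stage);
        return this;
    }

    /** Runs the event through the stages in order, stopping at the first stage that drops it. */
    Optional<T> process(T event) {
        Optional<T> current = Optional.of(event);
        for (Function<T, Optional<T>> stage : stages) {
            current = current.flatMap(stage);
            if (!current.isPresent()) {
                break; // dropped: later stages never see this event
            }
        }
        return current;
    }
}
```

For example, a first stage could trim a raw log line, a second could drop lines without the expected number of fields, and a later stage could update counters. Returning Optional rather than null makes "this event was filtered out" explicit at every step.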
Core frontend modules
rfx-report:
● visualizing data in real time
● monitoring real-time events
rfx-agent:
● tracking user activity: heatmap data, pageviews, ...
● logging user activity to rfx-track (via a network protocol: HTTP, TCP or UDP)
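As an illustration of the agent-to-collector hop over HTTP, the sketch below encodes a pageview as a GET query string on the agent side and decodes it on the collector side. The endpoint URL, field names and the TrackingBeacon class are hypothetical; the actual RFX-track-js wire format is not described in the slides, and the real agent runs as JavaScript in the browser rather than Java.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical wire format between a tracking agent and a collector (HTTP GET query string). */
final class TrackingBeacon {
    private static final String UTF8 = "UTF-8";

    /** Agent side: append ?k1=v1&k2=v2 (URL-encoded) to the collector endpoint. */
    static String toUrl(String endpoint, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder(endpoint).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append('&');
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            first = false;
        }
        return sb.toString();
    }

    /** Collector side: decode the query string back into a field map. */
    static Map<String, String> parseQuery(String url) {
        Map<String, String> fields = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0 || q == url.length() - 1) return fields;
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue; // skip malformed pairs
            fields.put(decode(pair.substring(0, eq)), decode(pair.substring(eq + 1)));
        }
        return fields;
    }

    private static String encode(String s) {
        try { return URLEncoder.encode(s, UTF8); }
        catch (UnsupportedEncodingException e) { throw new IllegalStateException(e); }
    }

    private static String decode(String s) {
        try { return URLDecoder.decode(s, UTF8); }
        catch (UnsupportedEncodingException e) { throw new IllegalStateException(e); }
    }
}
```

URL-encoding every key and value is what keeps a tracked page URL that itself contains ? or & from corrupting the query string.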
How to solve problems with RFX?
Use cases in “Fast Data in web analytics”
1. Counting the pageviews of a website
2. Counting the unique users of a website
3. Sending an email when pageviews are abnormal (simple DDoS attack detection)
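For use case 3, here is a minimal sketch of a threshold-based spike detector, assuming per-minute pageview counts are already produced by the processing stage. It raises an alert when a minute's count is at least minPageviews and exceeds factor times the average of the previous windowSize minutes. SpikeDetector, its parameters and the alert callback are hypothetical; the callback stands in for sending an email.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** Hypothetical spike detector: alert when a minute's pageviews exceed factor x the recent average. */
final class SpikeDetector {
    private final int windowSize;         // how many past minutes form the baseline
    private final double factor;          // e.g. 5.0 = alert above 5x the baseline
    private final long minPageviews;      // ignore spikes below this absolute count
    private final Consumer<String> alert; // stand-in for "send an email"
    private final Deque<Long> history = new ArrayDeque<>();

    SpikeDetector(int windowSize, double factor, long minPageviews, Consumer<String> alert) {
        if (windowSize <= 0) throw new IllegalArgumentException("windowSize must be positive");
        this.windowSize = windowSize;
        this.factor = factor;
        this.minPageviews = minPageviews;
        this.alert = alert;
    }

    /** Feeds the pageview count of one completed minute; returns true if an alert was raised. */
    boolean onMinute(long minute, long pageviews) {
        boolean spike = false;
        if (history.size() == windowSize) { // only judge once a full baseline exists
            double avg = history.stream().mapToLong(Long::longValue).average().orElse(0);
            spike = pageviews >= minPageviews && pageviews > factor * avg;
            history.removeFirst(); // slide the window forward
        }
        history.addLast(pageviews);
        if (spike) {
            alert.accept("Abnormal traffic at minute " + minute + ": " + pageviews + " pageviews");
        }
        return spike;
    }
}
```

Because every minute, including a spike, is added to the baseline, a sustained flood will eventually stop triggering alerts once it fills the window; excluding flagged minutes from the history is one simple alternative.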
Applying RFX to pageview analytics
1.1. Event data actor: a web user
1.2. Event data agent: RFX-track-js
1.3. Event data collector: RFX-track-server
1.4. Event data queue: Apache Kafka
1.5. Event data processor: RFX-stream
1.6. Event data storage: Redis, MySQL
1.7. Event data query: RFX-data-api
1.8. Event data reactor: RFX-reactor
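The steps above can be modeled in a single process to show why a queue sits between the collector and the processor: the collector only enqueues and returns, while a separate processor thread drains the queue and updates storage. In this hypothetical sketch a LinkedBlockingQueue stands in for Apache Kafka and a ConcurrentHashMap stands in for Redis; MiniPageviewFlow illustrates the decoupling only and is not RFX's actual code.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical in-process model of the pageview flow:
 * collector -> queue (stand-in for Kafka) -> processor -> storage (stand-in for Redis) -> query.
 */
final class MiniPageviewFlow {
    private static final String POISON = "__STOP__"; // tells the processor to shut down

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Map<String, Long> storage = new ConcurrentHashMap<>();
    private final Thread processor = new Thread(this::processLoop, "processor");

    void start() {
        processor.start();
    }

    /** Collector: accept a pageview URL from an agent and enqueue it; never waits on processing. */
    void collect(String url) {
        queue.add(url);
    }

    /** Processor: take URLs from the queue and increment a per-URL counter in storage. */
    private void processLoop() {
        try {
            while (true) {
                String url = queue.take();
                if (POISON.equals(url)) return;
                storage.merge(url, 1L, Long::sum);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the processor after everything enqueued before this call has been handled. */
    void stopAndWait() throws InterruptedException {
        queue.add(POISON);
        processor.join();
    }

    /** Query side: read the current count for a URL. */
    long pageviews(String url) {
        return storage.getOrDefault(url, 0L);
    }
}
```

The poison-pill value lets stopAndWait() return only after every earlier event has been counted, because the queue is FIFO and a single processor thread handles items in order. In a real deployment the queue would be durable and shared across machines, which is the role the slides assign to Kafka.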
Demo and explanation of the code and concepts
Readings
● http://www.decisionone.co.uk/press/agile-data-warehouse-design-sampler.pdf
● http://www.slideshare.net/votrongdao/agile-data-warehouse-34427798
● Apache Kafka Installation Video | How To Setup Apache Kafka: https://youtu.be/Fg8cTsEk7Gc
● https://www.tutorialspoint.com/apache_kafka/
● https://kafka.apache.org/quickstart
● http://xyu.io/2015/07/13/building-a-faster-etl-pipeline-with-flume-kafka-and-hive/
● http://blog.cloudera.com/blog/2015/06/architectural-patterns-for-near-real-time-data-processing-with-apache-hadoop/
● https://www.oreilly.com/ideas/drivetrain-approach-data-products