Apache Phoenix with the Actor Model (Akka.io) for a Real-time Big Data Programming Stack
Why do we still need SQL for Big Data? How can we make Big Data more responsive and faster?
By http://nguyentantrieu.info, Tech Lead at the eClick team, FPT Online
Contents
1. What is Big Data, and why?
2. When a standard relational database (Oracle, MySQL, ...) is not good enough
3. Common problems in Big Data systems
4. Introducing open-source tools in the Big Data system
   a. Apache Phoenix for ad-hoc queries
   b. Actor Model and Akka.io for reactive data processing
What Does Big Data Actually Mean?
“Big data means data that cannot fit easily into a standard relational database.”
Hal Varian, Chief Economist, Google
http://www.brookings.edu/blogs/techtank/posts/2014/09/11-big-data-definition
When a standard relational database (Oracle, MySQL, ...) is not good enough
The “analytic system”: a MySQL database at a startup, tracking every user action in mobile games on iOS, Android, ...
Complex analytic system and the “scale” pain
Definition from the crowd
“Big data is a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.”
Jonathan Stuart Ward and Adam Barker
Source: http://arxiv.org/abs/1309.5821
http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/
The “chaotic” fact and the demand
About 80% of the world's data is unstructured or “chaotic”: photos, videos and social media posts, data that says so much about us but cannot be analyzed via traditional methods.
The demand: “finding order among chaos”
3 common problems in Big Data systems
1. Size: the volume of the dataset is a critical factor.
2. Complexity: the structure, behaviour and permutations of the dataset are a critical factor.
3. Technologies: the tools and techniques used to process a sizable or complex dataset are a critical factor.
Introducing open-source tools in the Big Data system
● Apache Phoenix as a SQL ad-hoc query engine
● The Actor Model as a nano-service for reactive data computation in the dawn of “Fast Data”
Some innovative tools were born in the dawn of the Big Data age.
But could an elephant fly without wings?
But a phoenix can fly!
What is Apache Phoenix?
Apache Phoenix is a SQL skin over HBase: it compiles SQL into native HBase calls, so scaling Phoenix means scaling HBase itself, both up and out.
Phoenix SQL Engine
Interesting features of Apache Phoenix
● Embedded JDBC driver implements the majority of java.sql interfaces, including the metadata APIs.
● Allows columns to be modeled as a multi-part row key or key/value cells.
● Full query support with predicate push-down and optimal scan key formation.
● DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns.
● Versioned schema repository: snapshot queries use the schema that was in place when the data was written.
● DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows.
● Limited transaction support through client-side batching.
● Single table only: no joins yet, and secondary indexes are a work in progress.
● Follows ANSI SQL standards whenever possible.
● Requires HBase v0.94.2 or above.
● 100% Java.
The Phoenix table schema
Setting up the Phoenix JDBC driver
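As a sketch only: the snippet below assumes the Apache Phoenix driver class org.apache.phoenix.jdbc.PhoenixDriver is on the classpath and that HBase's ZooKeeper quorum runs on localhost:2181; the EVENT table and its columns are made up for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;

public class PhoenixJdbcExample {
    public static void main(String[] args) throws Exception {
        // Load the Phoenix JDBC driver explicitly (the classic, safe idiom).
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

        // The JDBC URL names the ZooKeeper quorum of the HBase cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            conn.setAutoCommit(true);

            // DDL: a hypothetical event table. GAME_ID + EVENT_TS form a
            // multi-part row key; ACTION is stored as a key/value cell.
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE TABLE IF NOT EXISTS EVENT ("
                        + " GAME_ID  VARCHAR NOT NULL,"
                        + " EVENT_TS TIMESTAMP NOT NULL,"
                        + " ACTION   VARCHAR,"
                        + " CONSTRAINT PK PRIMARY KEY (GAME_ID, EVENT_TS))");
            }

            // DML: Phoenix uses UPSERT instead of INSERT/UPDATE.
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO EVENT (GAME_ID, EVENT_TS, ACTION) VALUES (?, ?, ?)")) {
                ps.setString(1, "game-42");
                ps.setTimestamp(2, new Timestamp(System.currentTimeMillis()));
                ps.setString(3, "level_up");
                ps.executeUpdate();
            }

            // Ad-hoc query: plain SQL, turned into HBase scans by Phoenix.
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT GAME_ID, COUNT(*) FROM EVENT GROUP BY GAME_ID")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                }
            }
        }
    }
}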
Phoenix and a SQL tool in Eclipse 4
Phoenix vs Hive (running over HDFS and HBase) http://phoenix.apache.org/performance.html
Actor Model in the dawn of “Fast Data”
http://youtu.be/TnLiEWglqHk - Google I/O 2014 - The dawn of "Fast Data"
The paper: MillWheel: Fault-Tolerant Stream Processing at Internet Scale
What is the Actor Model?
● Carl Hewitt defined the Actor Model in 1973 as a mathematical theory that treats “Actors” as the universal primitives of concurrent digital computation.
● It is a fitting model for heavily parallel processing in a cloud environment.
Akka is the framework for implementing Actor computation.
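A minimal sketch using the classic Akka Java API (UntypedActor); the PageViewCounter actor and its messages are invented for illustration. Each actor keeps its state private and handles one message at a time, so no locks are needed.

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.actor.UntypedActor;

// A page-view counter as an actor: mutable state, but confined to the actor.
public class PageViewCounter extends UntypedActor {
    private long count = 0; // never touched by more than one thread at a time

    @Override
    public void onReceive(Object message) {
        if (message instanceof String) {
            count++;
            System.out.println("page view #" + count + ": " + message);
        } else {
            unhandled(message);
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        ActorRef counter = system.actorOf(Props.create(PageViewCounter.class), "counter");

        // tell() is fire-and-forget: the message goes into the actor's
        // mailbox and the caller continues immediately.
        counter.tell("/home", ActorRef.noSender());
        counter.tell("/games/level-1", ActorRef.noSender());
    }
}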
Inspired by Google's MillWheel and Twitter's Storm, I have developed my own framework, “Rfx” (Reactive Functor Extension), with Akka at its core.
The pipeline for finding social trends in real-time analytics
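Not the Rfx implementation, just a sketch of the pipeline idea with the classic Akka Java API: two chained actors, one extracting hashtags from raw posts, one counting them. Actor names and logic are hypothetical.

import java.util.HashMap;
import java.util.Map;

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.actor.UntypedActor;

public class TrendPipeline {

    // Stage 1: extract hashtags from a raw post and forward them downstream.
    public static class HashtagExtractor extends UntypedActor {
        private final ActorRef next;
        public HashtagExtractor(ActorRef next) { this.next = next; }

        @Override
        public void onReceive(Object message) {
            if (message instanceof String) {
                for (String word : ((String) message).split("\\s+")) {
                    if (word.startsWith("#")) {
                        next.tell(word.toLowerCase(), getSelf());
                    }
                }
            } else {
                unhandled(message);
            }
        }
    }

    // Stage 2: count hashtag frequencies; the map stays inside the actor.
    public static class TrendCounter extends UntypedActor {
        private final Map<String, Long> counts = new HashMap<String, Long>();

        @Override
        public void onReceive(Object message) {
            if (message instanceof String) {
                String tag = (String) message;
                Long old = counts.get(tag);
                long n = (old == null) ? 1 : old + 1;
                counts.put(tag, n);
                System.out.println(tag + " -> " + n);
            } else {
                unhandled(message);
            }
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("trends");
        ActorRef counter = system.actorOf(Props.create(TrendCounter.class), "counter");
        ActorRef extractor = system.actorOf(
                Props.create(HashtagExtractor.class, counter), "extractor");

        extractor.tell("Loving the new #phoenix release #hbase", ActorRef.noSender());
        extractor.tell("Stream processing with #akka is fun", ActorRef.noSender());
    }
}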
Facebook Social Trending from a website
Quick demo
Using Akka (Rfx) and Apache Phoenix for social media real-time analytics
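This is not the actual demo code, only a minimal sketch of how the two tools could meet: an actor that upserts each incoming event into Phoenix over JDBC. The EVENT table and connection URL are the same hypothetical ones used earlier.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.actor.UntypedActor;

// An actor that persists each incoming event into Phoenix over JDBC.
// An actor processes one message at a time, so the connection it owns
// needs no extra synchronization.
public class PhoenixWriter extends UntypedActor {
    private Connection conn;

    @Override
    public void preStart() throws Exception {
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
        conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
        conn.setAutoCommit(true);
    }

    @Override
    public void postStop() throws Exception {
        if (conn != null) conn.close();
    }

    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof String) {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO EVENT (GAME_ID, EVENT_TS, ACTION) VALUES (?, ?, ?)")) {
                ps.setString(1, "demo");
                ps.setTimestamp(2, new Timestamp(System.currentTimeMillis()));
                ps.setString(3, (String) message);
                ps.executeUpdate();
            }
        } else {
            unhandled(message);
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        ActorRef writer = system.actorOf(Props.create(PhoenixWriter.class), "phoenix-writer");
        writer.tell("click", ActorRef.noSender());
        writer.tell("level_up", ActorRef.noSender());
    }
}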
Links for self-study and research

Actor Model and Programming:
● http://nguyentantrieu.info/blog/the-architecture-for-real-time-event-processing-with-reactive-actor-model
● http://www.slideshare.net/drorbr/the-actor-model-towards-better-concurrency
● http://www.infoq.com/articles/reactive-cloud-actors
● http://www.mc2ads.com/p/rfx-for-big-data-developer.html

Apache Phoenix:
● http://java.dzone.com/articles/apache-phoenix-sql-driver
● http://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html

Big Data and Data Science:
● http://www.mc2ads.com and http://www.mc2ads.org
● http://datascience101.wordpress.com
● http://lambda-architecture.net
● http://www.bigdata-startups.com
● https://www.coursera.org/course/datasci
