NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World

NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World Ajaykumar Gupte IBM 1

2 Agenda Basic overview of JSON data management. Overview of IBM in-memory accelerator Performance using in-memory accelerator with JSON data

3 Explosion of mobile devices – gaming and social apps Advertising: serving ads and real-time bidding Social networking, online communities E-commerce, social commerce Machine data and real-time operational decisions Smart Devices Internet of Data Internet of Things SQL SQL, {JSON}, Spatial {JSON}, TimeSeries SQL, {JSON} Simple, {JSON}, Timeseries SQL, {JSON}

4  Schema-less app development lowers costs – Simplicity and agility to develop apps quickly – Inter operate with modern applications, especially mobile – Applications address schema migrations – Sometimes referred to a “flexible schema” development  While fixed schema is required for traditional relational – Transaction volumes growing – Operational data, analytics, static records – Creates greater value through “360 degree” view of the business Demand is growing in both areas These approaches are complementary NoSQL Compared to “Traditional” DBMS

5 Real Time Analytics • Customer Issues – Several different models of data (SQL, NoSQL, TimeSeries/Sensor) – NoSQL is not strong building relations between collections – Most valuable analytics combine the results of all data models – Most prominent analytic system written using standard SQL

6 NoSQL SQL Relational Fixed schema Joins, DML Optimizer Transactions(multi-stmt) Analytics Replication Triggers, Stored Procedures Document store Schema less Indexing Aggregation Sharding Replication Map/reduce Scalability Emerging Trends

7 IBM Informix database 12.1 Relational, Embeddable Real-time Analytics Fast, Always-on Transactions NoSQL capability Multi-Tenancy Sensor data management High availability Easy to Use

8 Tier 2 Smart Gateway / Aggregator Tier 3 Sensor Operational Analytics Zone Tier 1 Devices/Sensors AnalyticsTier 4 Deep Analytics Zone Simple Customer Portal REST (https) Time Series REST (https) MAC Addr. MAC Addr. MAC Addr. MAC Addr. MAC Addr. MAC Addr. Informix TS IWA Streams Logic dist. engine Cloudant MessageSight Lightweight Analytics Context The IoT Architecture IBM has opportunity in multiple Tiers of IoT

9 • Relational and non-relational data in one system • NoSQL/MongoDB Apps can access Informix Relational Tables • Distributed Queries • Multi-statement Transactions • Enterprise Proven Reliability • Enterprise Scalability • Enterprise Level Availability Informix provides the capability to leverage the abilities of both relational DBMS and document store systems. The Hybrid Solution Informix has the Best of Both Worlds

10 Informix 12.1 & MongoDB Clients • New Wire Protocol Listener supports existing MongoDB drivers • Connect to MongoDB or Informix with same application! MongoDB native Client MongoDB web browser Mobile Applications MongoDB Wire Protocol Informix 12.1 MongoDB driver

11 Informix JSON Store Benefits • Row locking on the individual JSON document • Large documents, up to 2GB maximum size • Ability to compress documents • Ability to intelligently cache commonly used documents • Use existing storage options and management tools.

12 Two New Data Types JSON and BSON • Native JSON and BSON data types • Index support for NoSQL data types • Native operators and comparator functions allow for direct manipulation of the BSON data type • Database Server seamlessly converts to and from • JSON BSON • Character data JSON

13 Informix: All Together Now! SQL Tables JSON Collections TimeSeries MQ Series SQL APIs JDBC, ODBC Informix IWA – BLU ACCELERATION GENBSON: SQL to {BSON} MongoDB Drivers TEXT SEARCH SPATIAL TIME SERIES {BSON} REST API

14 RelationalTable JSON Timeseries SpatialText SQL API StandardODBC, JDBC,.NET, OData,etc.Language SQL. Direct SQLAccess.Dynamic ViewsRow types StandardSQL/extJDBC/ODBCJSON Support Standard SQLJDBC/ODBCJSON Support Mongo API(NoSQL) Mongo APIs forJava, Javascript, C++, C#,... Mongo APIs forJava,Javascript, C++, C#,... Virtual TableJSON support JSON Support Hybrid Access:SQL, JSON, Timeseries & Spatial

15 Benefits of Hybrid Power Access consistent data from its source Avoid ETL, continuous data sync and conflicts. Exploit the power of SQL, MongoAPI seamlessly Exploit the power of RDBMS technologies in MongoAPI: – Informix Warehouse accelerator, – Cost based Optimizer & power of SQL – R-tree indices for spatial, Lucene text indexes, and more. Access all your data thru any interface: MongoAPI & SQL Store data in one place and efficiently transform and use them on demand. Existing SQL based tools and APIs can access new data in JSON

16 How to Convert Relational Data as JSON Documents • Relational data can be treated as structured JSON documents; column name-value becomes key-value pair. • select partner, pnum, country from partners; partner pnum Country Acme 1748 Australia Vernco 1746 USA Baker 1472 Spain Contrex 1742 France {parnter: “Acme”, pnum:1748, Country: “Australia”} {parnter: “Vernco”, pnum:1746, Country: “USA”} {parnter: “Backer”, pnum:1472, Country: “Spain”} {parnter: “Contrex”, pnum:1742, Country: “France”} • GENBSON function Method for transforming existing SQL data into a JSON or BSON document store format • select GENBSON( partners )::JSON from partners; 16

17 • Supports B-Tree indexes on any key-value pairs. • Typed indices could be on simple basic type (int, decimal,) • Type-less indices could be created on BSON and use BSON type comparison • Informix translates ensureIndex() to CREATE INDEX • Informix translates dropIndex() to DROP INDEX Indexing Mongo Operation SQL Operation db.customers.ensureIndex( {orderDate:1, zip:-1}) CREATE INDEX IF NOT EXISTS v_customer_2 ONcustomer (bson_get(data,‘orderDate') ASC,bson_get(data,‘zip') DESC) USING BSON CREATE UNIQUE INDEX IF NOT EXISTS v_customer_3 ON customer (bson_get(data,'orderDate')ASC) USING BSON

Flexible Grid + Sharding Informix Shard 1 Informix/1 Secondary Disk or Diskless Informix/1 Secondary Disk or Diskless Informix Secondary 1 Disk or Diskless Informix Shard 2 Informix/1 Secondary Disk or Diskless Informix/1 Secondary Disk or Diskless Informix Secondary 2 Disk or Diskless Informix Shard 3 Informix/1 Secondary Disk or Diskless Informix/1 Secondary Disk or Diskless Informix Secondary 3 Disk or Diskless Informix Shard 4 Informix/1 Secondary Disk or Diskless Informix/1 Secondary Disk or Diskless Informix Secondary 4 Disk or Diskless Informix NoSQL Cluster Architecture Overview  Scaling in both directions  Shard Disk Secondary  Secondary server(s) provide HA and scaling  Allow write on Secondary

19 Mongo Application IBM Wire Listener IDXs Logs Enterprise replication + Flexible Grid + Sharding Distributed Queries Informix Dynamic Server Tables Tables IDXs Relational Tables JSON Collections SELECT bson_new(bson, ‘{}’) FROM customer WHERE data.state::varchar(128) =“MO” db.customer.find({state:”MO”}) db.partners.find({state:”CA”}) SELECT * FROM partners WHERE state=“CA” Customer partners JSON JSON Access RelationalAccess JSON MongoAPI Accessing Both NoSQL and Relational Tables

20 MongoAPI Accessing Both NoSQL and Relational Tables • Typically NoSQL does not involve transactions – In many cases, a document update is atomic, but not the application statement (Example :7 targeted for deletion, but only 4 are removed ) • Informix-NoSQL provides transactions on all application statements – Each server operation INSERT, UPDATE, DELETE, SELECT will automatically be committed after each operation. • Default isolation level is DIRTY READ • All standard isolation level support • $sql operator – execute SQL commands within Informix database db.getCollection("system.sql").find({ "$sql": "select c.customer_num, p.customer_num as p_cust from customer c left outer join partners p on c.customer_num = p.customer_num order by 1" })

21 IBM Informix Warehouse Accelerator

22 IBM Informix Warehouse Accelerator (IWA) Results Analytic query Linux on Intel / AMD 64-bit TCP/IP Query Optimizer In-Memory Compressed Columnar Database Partition Bulk Loader Query Processor Yes Analytic query Results Accelerate Query? Most Unix/Linux 64-bit platforms In-Disk [Compressed] Relational / Row-based Database Informix database server Informix Warehouse Accelerator No POWERFUL HYBRID DATABASE PLATFORMPOWERFUL HYBRID DATABASE PLATFORM Extreme Performance Transactions Extreme Performance Analytics

23 You can use IWA's In-Memory Analytics to Speed Up queries on… Local or remote views HA Clusters

IWA Overview and Seamless Integration with Informix/IDS • Before IWA…  Informix  Receives analytic query from client  Spends some time doing intensive I/O  Returns results back to the client Informix 12.1 Results SQL

Informix/IWA Setup and Workflow • Using IWA: Process is transparent to Informix client Results SQL  Informix  Receives analytic query from client  If query uses data matching an IWA datamart and can be accelerated, route/offload it to IWA  Returns results back to the client  If query is not based on an IWA datamart or cannot be accelerated, Informix will resolve it Informix 12.1  The Accelerator  Processes the routed SQL query extremely fast and returns answer back to Informix Linux on Intel/AMD 64-bit Bulk Loader Compressed Database Partition TCP/IP Compression In-Memory Columnar Storage Frequency Partitioning Parallelism Predicate evaluation on compressed data Multi-core and Vector optimized algorithms SIMD Query Router Query Processor

26 Informix Dynamic Server Tables Tables Relational Tables and views JSON Collections {Customer} partners SQL & BI Applications {Orders} Inventory Tables Timeseries Tables {Orders} Text index (BTS) spatial indices Informix Warehouse Accelerator (In-Memory Query Engine) ODBC, JDBC connections SQL Apps/Tools MongoDB Drivers NoSQL Apps/Tools

27 Informix Database Server Informix warehouse Accelerator BI Applications Step 1. Install, configure, start Informix Step 2. Install, configure, start Accelerator Step 3. Connect Studio to Informix & add accelerator Step 4. Design, validate, Deploy Data mart Step 5. Load data to accelerator Ready for Queries IBM Smart Analytics Studio Step 1 Step 2 Step 3 Step 4 Step 5 Ready IWA – BLU ACCELERATION

28  All NOSQL Marts (all views based on JSON collections )  Hybrid Marts (subset of views based on JSON collections)  TPCDS 10GB workload  web_returns fact table along with 13 dimension tables  Total Memory on the machine: 250GB  Total nodes: 5  Coordinator nodes: 1  Coordinator node Memory: 20000 MB  Worker nodes : 4  Worker nodes memory : 100000 MB  IWA DRDA Interface: eth1 (IWA running on a different machine than IDS server) JSON Data Acceleration

29  Using genbson to create web_returns, we can literally accomplish this with just two statements and with good performance create table json_web_returns_coll(c1 serial, c2 bson); insert into json_web_returns_coll select 0, genbson( web_returns_ext ) from web_returns_ext; 719964 row(s) inserted. create table json_customer_address_coll(c1 serial, c2 bson); insert into json_customer_address_coll select 0, genbson( customer_address_ext ) from customer_address_ext; 250000 row(s) inserted. create table json_date_dim_coll(c1 serial, c2 bson); insert into json_date_dim_coll select 0, genbson( date_dim_ext ) from date_dim_ext; 73049 row(s) inserted. create table json_time_dim_coll(c1 serial, c2 bson); insert into json_time_dim_coll select 0, genbson( time_dim_ext ) from time_dim_ext; 86400 row(s) inserted. JSON Data Acceleration

30 Run time comparision of queries on JSON Collection with IDS and IWA Queries against TPCDS (web_returns) SQL tables JSON Collections (IDS) JSON Collections (IWA) Query_1.sql 5m 8.09s 0m 1.28s Query_2.sql 1m 12.49s 0m 0.92s Query_3.sql 0m 3.80s 0m 0.14s Query_4.sql 0m 47.40s 0m 0.59s Query_5.sql 3m 59.92s 0m 2.50s Query_6.sql 0m 24.80s 0m 0.34s Query_7.sql 5m 55.08s 0m 3.96s Query_8.sql 8m 49.87s 0m 1.48s Query_9.sql 5m 20.83s 0m 0.67s Query_10.sql 18m 1.37s 0m 6.77s Query_11.sql 15m 8.29s 0m 3.97s Query_12.sql 0m 21.51s 0m 0.25s

31 Sqexplain - Query_1 Select {+FACT(web_returns)} first 100 wr_returning_customer_sk as ctr_customer_sk ,ca_state as ctr_state,sum(wr_return_amt) as ctr_total_return from web_returns ,date_dim ,customer_address where wr_returned_date_sk = d_date_sk and d_year =1999 and wr_returning_addr_sk = ca_address_sk group by wr_returning_customer_sk ,ca_state order by 1,2,3 …... DYNAMIC HASH JOIN Dynamic Hash Filters: BSON_VALUE_INT (informix.json_web_returns_coll.data , 'wr_returned_date_sk' ) = BSON_VALUE_INT (informix.json_date_dim_coll.data , 'd_date_sk' ) 3) informix.json_customer_address_coll: SEQUENTIAL SCAN DYNAMIC HASH JOIN Dynamic Hash Filters: BSON_VALUE_INT (informix.json_web_returns_coll.data , 'wr_returning_addr_sk' ) = BSON_VALUE_INT (informix.json_customer_address_coll.data , 'ca_address_sk' )

32 Create demo database from the mongo shell $ mongo demo_database MongoDB shell version: 2.4.9 connecting to: demo_database mongos> Create / Load the "comments" fact table mongos> db.comments.insert( [ { uid:12345, pid:444, comment:"first" }, { uid:12345, pid:888, comment:"second" }, { uid:99999, pid:444, comment:"third" } ] ) Create /Load the "users" dimension table mongos> db.users.insert( [ { uid:12345, name:"john" }, { uid:99999, name:"mia" } ] ) JSON Data Acceleration

33 demo_database – create view vcomments(uid,pid,comment) AS SELECT data.uid::INT, data.pid::INT, data.comment::VARCHAR(128) from comments; create view vusers(uid,name) AS SELECT data.uid::int, data.name::varchar(128) from users; Create the SQL Views & analyze workload

34 set environment use_dwa 'probe cleanup'; set environment use_dwa 'probe start'; select {+ avoid_execute} * from vcomments c,vusers u where c.uid=u.uid; set environment use_dwa 'probe stop'; execute procedure ifx_probe2mart('demo_database','noSQL_mart'); execute function ifx_createmart('demo_dwa','noSQL_mart'); execute function ifx_loadmart('demo_dwa','noSQL_mart','NONE'); Create the SQL Views & analyze workload

35 set environment use_dwa 'accelerate on'; select c.uid,name,comment from vcomments c,vusers u where c.uid=u.uid and pid=444; uid 12345 name john comment first uid 99999 name mia comment third Deploy NoSQL data mart & Issue queries

NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World

More Related Content

What's hot

Viewers also liked

Similar to NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World

More from Ajay Gupte

Recently uploaded

NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World

Editor's Notes