DEMYSTIFYING IN-MEMORY DATA GRID, IN-MEMORY DATA FABRIC AND NOSQL DB
PRADEEP NAIK
See all the presentations from the In-Memory Computing Summit at http://imcsummit.org
SPEAKER INTRODUCTION
Pradeep Naik, Principal Consultant, CTO Office, Wipro Technologies
- Responsible for incubating next-generation technology at Wipro
- 20+ years of experience in database management
- Cross-domain IT solution architecture and management of database solutions in the telecom, healthcare and financial industries
ABOUT WIPRO
Attracts the Best Talent, Sustained Growth, Partner to Industry Leaders, Global Leaders, Global Presence
- $7.6 Bn revenue in FY 2014-15 (IT Services: $7.06 Bn; IT Products: $0.55 Bn)
- 1071* active global clients
- 161,789* workforce
- No.1** leader in the Software & Services industry category
- Serving clients in 100+ countries
- Ranked 8th in the Best Companies for Leaders 2015 list in a study conducted by Chally Group
- Honored as one of the World's Most Ethical Companies by Ethisphere Institute for the fourth successive year, 2015
* Figures based on Q1 results 2015–16 for the Global IT Services business
** Global leader in the Software & Services category; member of the Dow Jones Sustainability World Indices, 5th year in a row
IN-MEMORY TECHNOLOGY OVERVIEW, CONCEPTS AND KEY DIFFERENCES
WHAT DOES THE BUSINESS DEMAND?
- High throughput
- Low latency
Big data use case (diagram): data ingestion, data storage, data processing, data access, data analytics and data visualization, with high-throughput and low-latency requirements annotated across the stages.
TRADE-OFF BETWEEN HIGH THROUGHPUT AND LOW LATENCY
Diagram contrasting high throughput and low latency. Techniques shown: in-memory computing, stream processing, eventual consistency and auto scaling, with parallel computing and localized processing appearing on both sides.
IN-MEMORY WORLD
- In-Memory Database (IMDB)
- NoSQL
- In-Memory Data Grid (IMDG)
- In-Memory Data Fabric (IMDF)
Workload spectrum (diagram): Operational, Analytical, HTAP (hybrid transactional/analytical processing)
IN-MEMORY DATABASE (IMDB)
Architecture of In-Memory Database (diagram; source: Oracle TimesTen)
- Good ANSI SQL support
- Strong support for ACID transactions
- Lacks co-located processing
- Based on a vertically scalable symmetric processing architecture
- Does not support distributed computing
- Minimal application changes when upgrading to an IMDB
- Cannot work directly with domain objects; users need to perform object-relational mapping, which typically adds significant performance overhead
- The unit of movement is the data, not the process
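To make "ANSI SQL support" and "strong ACID transactions" concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database (`":memory:"`). SQLite is only a stand-in for an IMDB such as TimesTen. The table, `make_db`, `transfer` and `balances` are hypothetical names for illustration. The transfer either applies both updates or neither.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    # ":memory:" keeps the whole database in RAM for the life of the connection.
    # SQLite here is an illustrative stand-in, not the IMDB from the slide.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` between accounts atomically; return False if rolled back."""
    try:
        # The connection context manager commits on success and
        # rolls back the whole transaction if an exception escapes.
        with conn:
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint fired; the earlier credit to `dst` was undone too.
        return False

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

For example, `transfer(conn, 1, 2, 30)` succeeds. A follow-up `transfer(conn, 2, 1, 500)` would overdraw account 2, so it fails and leaves both balances exactly as they were. That all-or-nothing behavior is the ACID guarantee the slide refers to.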
NOSQL
Architecture of NoSQL (diagram: NoSQL cluster)
- Distributed data storage with an in-memory option
- Most commonly used for high-throughput requirements
- Read latency in the range of milliseconds to seconds
- Low-latency data access is achieved by caching the table/document in memory
- Data is always stored on disk and can be configured to be cached in memory
- Achieves high availability via replication
- Tunable consistency (eventual and immediate)
- Limitation on the size of table that can be cached
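Tunable consistency is often exposed as quorum replication. With N replicas, a write waits for W acknowledgements and a read consults R replicas. Choosing R + W > N forces every read to overlap the latest write, while smaller values trade consistency for latency.

The sketch below is a simplified single-process simulation. `Replica`, `QuorumStore` and the global version counter are hypothetical simplifications, not any product's API.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    store: dict = field(default_factory=dict)  # key -> (version, value)
    up: bool = True

class QuorumStore:
    """Hypothetical toy N-replica store: writes need W acks, reads query R replicas."""

    def __init__(self, n: int, w: int, r: int):
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0  # simplified global version; real systems use per-key timestamps

    def write(self, key, value) -> bool:
        self.version += 1
        acks = 0
        for rep in self.replicas:
            if acks == self.w:
                break  # stop once W replicas acknowledged; the rest stay stale
            if rep.up:
                rep.store[key] = (self.version, value)
                acks += 1
        return acks >= self.w

    def read(self, key):
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < self.r:
            raise RuntimeError("not enough live replicas for the read quorum")
        # Query the last R live replicas (the ones a write reaches last)
        # and return the value carrying the highest version.
        answers = [rep.store.get(key, (0, None)) for rep in live[-self.r:]]
        return max(answers, key=lambda a: a[0])[1]
```

With `QuorumStore(n=3, w=1, r=1)` a read can miss a just-written value (eventual consistency). With `n=3, w=2, r=2` the read and write sets always share a replica, so the latest value is returned (immediate consistency).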
IN-MEMORY DATA GRID (IMDG)
Architecture of In-Memory Data Grid (diagram: application servers on top of Node1 to Node4, which pool memory from each server and hold entries K1,V1 to K4,V4; the grid scales horizontally and persists to a database server via write-through or write-behind, with read-through on load)
- An IMDG is a data structure that resides completely in memory and is distributed across multiple servers
- Fault tolerance: uses a master-master or master-slave topology
- A wide variety of data structures is supported to map domain objects
- Distributed computing: processing is co-located on the cluster node where the data is cached
- Distributed concurrency: supports distributed transaction locking
- Persistence: supports seamless synchronous read-through and write-through, or asynchronous write-behind, to other data sources
- Supports applications with low-latency requirements
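The partitioning, persistence and co-location ideas on this slide can be sketched in a few lines. The sketch assumes a hypothetical `PartitionedMap` that hashes each key to one of several in-process "nodes", with a plain dict standing in for the database server. It is not Hazelcast's or any other grid's API.

```python
import zlib
from collections import deque

class PartitionedMap:
    """Hypothetical toy IMDG map: keys spread over nodes by hash; a dict plays the database."""

    def __init__(self, node_count, database, write_behind=False):
        self.nodes = [dict() for _ in range(node_count)]  # each dict = one node's memory
        self.db = database
        self.write_behind = write_behind
        self.pending = deque()  # queued writes awaiting write-behind flush

    def owner(self, key) -> int:
        # Stable hash (crc32) so a key always maps to the same node.
        return zlib.crc32(str(key).encode()) % len(self.nodes)

    def get(self, key):
        node = self.nodes[self.owner(key)]
        if key not in node and key in self.db:
            node[key] = self.db[key]  # read-through: load from the database on a miss
        return node.get(key)

    def put(self, key, value):
        self.nodes[self.owner(key)][key] = value
        if self.write_behind:
            self.pending.append((key, value))  # persisted later, asynchronously
        else:
            self.db[key] = value  # write-through: persisted before put returns

    def flush(self):
        # A real grid would run this on a background timer.
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value

    def execute_on_owner(self, key, fn):
        # Co-located processing: conceptually `fn` runs on the owning node, so
        # only the function travels, not the data. In-process stand-in here.
        result = fn(self.get(key))
        self.put(key, result)
        return result
```

Read-through and write-through keep the grid and the database in lockstep at the cost of a synchronous database call. Write-behind returns immediately and batches persistence, accepting a short window in which the database lags the grid.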
IN-MEMORY DATA FABRIC (IMDF)
Architecture of In-Memory Data Fabric (diagram: application servers send data, tasks and queries to a fabric made up of Data Grid, Compute Grid, Streaming, CEP, Map Reduce and Hadoop Acceleration, built on Distributed Cluster, Messaging and File System layers over NoSQL/RDBMS)
- An IMDF is a comprehensive in-memory data platform that includes a data grid, clustering, a compute grid, complex event processing (CEP) and real-time streaming
- It is a superset of an IMDG
- Supports standard SQL for querying in-memory data, including distributed SQL joins
- Distributes computations and data processing across multiple computers in a cluster to achieve high performance and low latency
- Supports multiple execution paths for the same events, executing in parallel on one or more nodes
- Built on the concept of a massively parallel processing (MPP) architecture
- A converged platform that supports multiple use cases
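A distributed SQL join is cheapest when related rows are partitioned on the same key. Each node can then join its own slice in parallel, and a coordinator only merges the partial results (MPP-style scatter-gather).

The sketch below hand-codes roughly `SELECT c.name, SUM(o.amount) FROM customer c JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.name`, with in-process lists standing in for nodes. The data layout and function names are hypothetical. A real fabric would plan this from the SQL text rather than from code like this.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition(rows, key, nodes):
    """Place each row on a node by hashing its join (affinity) key."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[hash(row[key]) % nodes].append(row)
    return parts

def local_join(customers, orders):
    """Runs on one node: join its customer and order slices, sum amounts per name."""
    names = {c["customer_id"]: c["name"] for c in customers}
    totals = defaultdict(int)
    for o in orders:
        # Both tables were partitioned on customer_id, so any match is local.
        if o["customer_id"] in names:
            totals[names[o["customer_id"]]] += o["amount"]
    return totals

def distributed_join(customers, orders, nodes=3):
    """Scatter the join to every partition in parallel, then gather and merge."""
    cust_parts = partition(customers, "customer_id", nodes)
    order_parts = partition(orders, "customer_id", nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(local_join, cust_parts, order_parts))
    result = defaultdict(int)
    for partial in partials:
        for name, amount in partial.items():
            result[name] += amount
    return dict(result)
```

Because both tables use the same partitioning function, no rows have to be shuffled between nodes during the join. Only the small per-node aggregates travel back to the coordinator.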
TECHNOLOGY EVALUATION CRITERIA
- High throughput vs. low latency
- High availability
- Scalability (vertical vs. horizontal)
- Distributed disk-based data storage vs. distributed in-memory storage
- Co-location of data processing
- Distributed ACID transaction support
- Eventual consistency vs. strong consistency
- Impact of application changes
- Reuse of the existing database technology stack vs. migration to new databases
- Platform support for in-memory computation and storage
- Support for flexible data structures
SUMMARIZING THE DIFFERENCES
In-Memory Database (IMDB)
- Strong ACID support
- Structured data
- ANSI SQL support
- Low-latency performance
- Mixed workload
- Leverages the existing database stack
NoSQL
- Flexible data structures (document/columnar/key-value)
- Distributed data storage for high volumes of data
- Data caching for low read latency (ms)
In-Memory Data Grid (IMDG)
- Wide variety of data structures
- Distributed in-memory data store
- Scalable and fault-tolerant
- Low-latency performance (ns to ms)
- High-performance distributed processing
- Data persistence for high availability
In-Memory Data Fabric (IMDF)
- Converged platform supporting multiple use cases
- High-performance distributed parallel processing
- Distributed in-memory data store
- Co-located data processing
- Accelerates the Hadoop ecosystem
IN-MEMORY SOLUTIONS: KEY PLAYERS
Technology vendors by category (shown as logos on the slide): In-Memory Database, In-Memory Data Grid, In-Memory Data Fabric, NoSQL
CASE STUDY
SEARCH ENGINE OPTIMIZATION
Problem Statement
- For an e-commerce online retail store, SEO across different browsers, countries and languages is usually implemented with an XML sitemap. Some search engines, such as Baidu and Yandex, do not support the XML sitemap implementation of "hreflang", which limits the use of sitemap-based tags to address multiple countries/languages
- Localization is therefore not taken into account, causing incorrect search results
- Yandex only supports on-page "hreflang" tags
ECOMMERCE SEARCH ENGINE OPTIMIZATION
Design Decision
- The data in the sitemap XML needs to be populated in the head of the pages, for both canonical URLs and alternate URLs
- For low-latency access, cache the data from the DB. The data in the DB is not structured and needs to be transformed and aggregated to minimize the size of the cache
- Use a MapReduce framework for the data transformation and aggregation. The challenge was refreshing the cache within 4 hours to avoid stale data
Example on-page tags:
  <link rel="alternate" hreflang="en-us" href="https://webstore.online.com/us">
  <link rel="alternate" hreflang="en-mx" href="https://webstore.online.com/mx">
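The on-page tags in this design decision can be rendered from a simple mapping of locale to URL. The sketch below assumes a hypothetical `hreflang_tags` helper that takes one canonical URL plus an `alternates` dict. The URLs are the slide's own examples. It illustrates the tag format and is not the project's actual template code. `html.escape` keeps attribute values safe.

```python
from html import escape

def hreflang_tags(canonical: str, alternates: dict) -> str:
    """Hypothetical helper: render <head> link tags for a canonical URL and its locales."""
    lines = [f'<link rel="canonical" href="{escape(canonical, quote=True)}">']
    for lang in sorted(alternates):  # sorted so the output is stable and cache-friendly
        url = escape(alternates[lang], quote=True)
        lines.append(
            f'<link rel="alternate" hreflang="{escape(lang, quote=True)}" href="{url}">'
        )
    return "\n".join(lines)
```

Calling `hreflang_tags("https://webstore.online.com/us", {"en-us": "https://webstore.online.com/us", "en-mx": "https://webstore.online.com/mx"})` yields the canonical tag followed by the `en-mx` and `en-us` alternates. Because the output is deterministic, it can be cached per page.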
ECOMMERCE SEARCH ENGINE OPTIMIZATION
A. Store the data in a NoSQL database such as MongoDB and run periodic MapReduce jobs for transformation and aggregation before the data is refreshed in the database. The challenge was to complete the MapReduce jobs within 4 hours to avoid stale data.
B. Supplement the data store with a caching layer that reads the data from native memory rather than from disk. The candidate solution was an IMDB, but that would have required upgrading the tech stack to implement the MapReduce jobs on the underlying IMDB.
C. Store the data in an IMDG, which also provides a MapReduce framework, so that the MapReduce jobs can complete within the stipulated time.
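Options A and C both depend on a MapReduce pass that turns unstructured rows into one compact cache entry per page. Below is a minimal in-process sketch of the map, shuffle and reduce flow.

The raw record format (`"page_id|locale|url"` strings) and the function names are assumptions for illustration. They are neither the project's schema nor Hazelcast's MapReduce API.

```python
from collections import defaultdict

def map_phase(raw_line: str):
    """Map: parse one raw record into (page_id, (locale, url)); skip malformed rows."""
    parts = [p.strip() for p in raw_line.split("|")]
    if len(parts) != 3 or not all(parts):
        return []
    page_id, locale, url = parts
    return [(page_id, (locale.lower(), url))]

def shuffle(pairs):
    """Shuffle: group mapped values by key, as the framework would across nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(page_id, values):
    """Reduce: collapse duplicates into one {locale: url} entry per page."""
    return page_id, dict(sorted(values))

def run_job(raw_lines):
    pairs = [pair for line in raw_lines for pair in map_phase(line)]
    return dict(reduce_phase(key, vals) for key, vals in shuffle(pairs).items())
```

The reduce output is already in the shape the tag renderer needs, which is what keeps the cached entries small. In an IMDG the map and reduce steps would run on the nodes that own the data instead of in one process.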
ECOMMERCE SEARCH ENGINE OPTIMIZATION
- Implemented the solution using an IMDG (Hazelcast)
- Two data centers with 5 nodes each
- Data is stored and reduced in native memory using the MapReduce feature set/APIs
- Configured the near-cache option so that data is fetched locally on the server where the URLs are served
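A near cache keeps a local copy of hot entries on the serving node, so repeated lookups skip the network hop to the grid member that owns the key. Below is a hypothetical sketch: a `NearCache` class with a time-to-live that wraps any `get` callable standing in for the remote IMDG map.

The class name, the TTL default and the injectable clock are assumptions. Hazelcast provides its own near-cache configuration, and this code does not use it.

```python
import time
from typing import Callable, Dict, Hashable, Tuple

class NearCache:
    """Hypothetical local TTL cache in front of a remote lookup (e.g. a grid map's get)."""

    def __init__(self, remote_get: Callable, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.remote_get = remote_get
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[Hashable, Tuple[float, object]] = {}  # key -> (loaded_at, value)
        self.remote_calls = 0

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # served from local memory, no network hop
        self.remote_calls += 1
        value = self.remote_get(key)  # fetch from the owning grid member
        self.entries[key] = (now, value)
        return value

    def invalidate(self, key):
        # Intended to be called when the grid signals that the entry changed.
        self.entries.pop(key, None)
```

The TTL bounds how stale a local copy can get, and `invalidate` lets a change notification evict an entry early. This matters in the case study because the underlying data is refreshed by the MapReduce job within the 4-hour window.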
THANK YOU
Presenter: Pradeep Naik (pradeep.naik@wipro.com)
Team members: Viresh Kumar, Chandra Sekar K.R.

IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid, In-Memory Data Fabric and NoSQL DB
