© 2017 GridGain Systems, Inc. In-Memory Performance Durability of Disk
Apache Ignite and Apache Spark: Where Fast Data Meets the IoT. Denis Magda, Apache Ignite PMC Chair, GridGain Director of Product Management
Agenda • IoT Demands to Software • IoT Software Stack • Device OS/RTOS • Data Collection and Enrichment • NewSQL Database • Application APIs • Demo
IoT Demands to Software: Real-time Processing; SQL, Geo-Spatial; Analytics (BI, ML); High-Availability; Simple Scalability
IoT Software Stack: Device OS/Real-Time OS; Data Collection and Enrichment; NewSQL Database; Application APIs
Apache IoT Software Stack: Device OS/Real-Time OS; Data Collection and Enrichment; NewSQL Database; Application APIs
Apache Mynewt: Open Source RTOS; Cortex M, MIPS; Bluetooth, Wi-Fi, TCP/IP; Secured Bootloader; Remote Firmware Upgrade
Data Collection and Enrichment [diagram: Ignite Cluster nodes, each labeled DURABLE MEMORY]
Apache Ignite: Database and Caching Platform. Memory-Centric Storage; Ignite Native Persistence (Flash, SSD, Intel 3D XPoint); Third-Party Persistence (RDBMS, HDFS, NoSQL). SQL, Transactions, Compute, Services, ML, Streaming, Key/Value. IoT, Financial Services, Pharma & Healthcare, E-Commerce, Travel & Logistics, Telco
Distributed Storage: Distributed Key-Value Store; Dynamic Scaling; Distributed partitioned hash map; ACID Transaction; JCache & SQL; 3rd party storage caching [diagram: JCache, Transactions, Compute, SQL; three Server Nodes, each labeled DURABLE MEMORY; RDBMS, NoSQL, HDFS]
Distributed SQL: DDL, DML Support (SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER); Cross-platform Compatibility; Indexes in RAM or Disk; Dynamic Scaling [diagram: JDBC, ODBC, SQL API (Java, .NET, C++), BI Tools; Apache Ignite Cluster of three Server Nodes, each labeled DURABLE MEMORY]
Ignite and Spark Integration: Share RDD across jobs on the host; Share RDD Globally; In-Memory Indexes; SQL on top of RDDs [diagram: Spark Application; Spark Workers running Spark Jobs; Yarn, Mesos, Docker, HDFS; Ignite Nodes]
Large IoT Provider - Smart Metering and Utilities. The company develops IoT solutions that transmit energy consumption data between meters, consumers and utilities in real time. Problem • Could not meet latency and throughput SLAs • Missing scalability and elasticity. GridGain Solution • 50 million meters stream the data back in real time • Collocated in-memory processing • Advanced security and multi-tenancy [diagram: Smart Meters; Company’s Platform; GridGain Ignite Cluster with IN-MEMORY nodes; SQL, Compute, Transactions; DB; GridGain Advanced Security]
DEMO
Any Questions? Thank you for joining us. Follow the conversation. http://ignite.apache.org #apacheignite #denismagda

Apache Spark and Apache Ignite: Where Fast Data Meets the IoT

Editor's Notes

  • #10 The Apache Ignite Platform. Apache Ignite is a memory-centric data platform used to build fast, scalable and resilient solutions. At its heart lies a distributed memory-centric data store with ACID semantics and powerful processing APIs including SQL, Compute, Key/Value and transactions. The memory-centric approach lets Apache Ignite leverage memory for high throughput and low latency while using local disk or SSD to provide durability and fast recovery. The main difference between the memory-centric approach and the traditional disk-centric approach is that memory is treated as fully functional storage, not just as a caching layer, as it is in most databases. For example, Apache Ignite can function in a pure in-memory mode, in which case it can be treated as an In-Memory Database (IMDB) and an In-Memory Data Grid (IMDG) in one. When persistence is turned on, Ignite functions as a memory-centric system where most of the processing happens in memory, but the data and indexes are persisted to disk. The main difference from a traditional disk-centric RDBMS or NoSQL system is that Ignite is strongly consistent, horizontally scalable, and supports both SQL and key-value processing APIs. The Apache Ignite platform can be integrated with third-party databases and external storage media and can be deployed on any infrastructure. It provides linear scalability, built-in fault tolerance, comprehensive security and auditing, and advanced monitoring and management. The platform caters for a range of use cases including core banking services, real-time product pricing, reconciliation and risk calculation engines, analytics and machine learning.
  • #11 Ignite Data Grid is a distributed key-value store that enables storing data both in memory and on disk within distributed clusters and provides extensive APIs. Ignite Data Grid can be viewed as a distributed partitioned hash map with every cluster node owning a portion of the overall data. This way the more cluster nodes we add, the more data we can store.
  • #12 Apache Ignite incorporates distributed SQL database capabilities as part of its platform. The database is horizontally scalable, fault tolerant and ANSI-99 SQL compliant. It supports DML commands including SELECT, UPDATE, INSERT, MERGE, and DELETE queries, as well as a subset of DDL commands relevant for distributed databases. Data sets and indexes can be stored both in RAM and on disk thanks to the durable memory architecture. This allows distributed SQL operations to execute across different memory layers, achieving in-memory performance with the durability of disk. You can interact with Apache Ignite in SQL through the native APIs for Java, .NET and C++, or through the Ignite JDBC or ODBC drivers, which provide true cross-platform connectivity from languages such as PHP, Ruby and more.
  • #13 Apache Ignite provides an implementation of the Spark RDD abstraction that makes it easy to share state in memory across Spark jobs. The main difference between a native Spark RDD and IgniteRDD is that IgniteRDD provides a shared in-memory view of data across different Spark jobs, workers, or applications, while a native Spark RDD cannot be seen by other Spark jobs or applications. IgniteRDD is implemented as a view over a distributed Ignite cache, which may be deployed within the Spark job executing process, on a Spark worker, or in its own cluster. Depending on the chosen deployment mode, the shared state may exist only during the lifespan of a Spark application (embedded mode), or it may outlive the Spark application (standalone mode), in which case the state can be shared across multiple Spark applications.
  • #14 50 million meters are deployed worldwide. The meters stream data back to the provider's platform. The data is analyzed to control power coverage and usage optimally across diverse territories.
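
Note #11 describes the data grid as a distributed partitioned hash map in which every node owns a portion of the data. The following is a minimal toy sketch of that idea in plain Python, not Ignite's actual partitioning algorithm: the partition count, the modulo assignment of partitions to nodes, and all names (`ToyCluster`, `partition_of`, `build`) are assumptions made for illustration.

```python
import zlib

PARTITIONS = 16  # hypothetical partition count for this toy model


def partition_of(key: str) -> int:
    """Map a key to a partition using a stable hash (crc32, not Python's salted hash())."""
    return zlib.crc32(key.encode("utf-8")) % PARTITIONS


class ToyCluster:
    """Toy partitioned key-value store: each node owns a subset of the partitions."""

    def __init__(self, node_count: int):
        self.node_count = node_count
        # One dict per node; node i owns every partition p with p % node_count == i.
        self.nodes = [dict() for _ in range(node_count)]

    def owner_of(self, key: str) -> int:
        return partition_of(key) % self.node_count

    def put(self, key: str, value) -> None:
        self.nodes[self.owner_of(key)][key] = value

    def get(self, key: str):
        return self.nodes[self.owner_of(key)].get(key)

    def sizes(self):
        """Number of entries held by each node."""
        return [len(node) for node in self.nodes]


def build(node_count: int, n_keys: int = 1000) -> ToyCluster:
    """Load the same hypothetical meter keys into a cluster of the given size."""
    cluster = ToyCluster(node_count)
    for i in range(n_keys):
        cluster.put(f"meter-{i}", i)
    return cluster


two_nodes = build(2)
four_nodes = build(4)
```

Comparing `two_nodes.sizes()` with `four_nodes.sizes()` shows the same keys spread over more nodes, so each node holds a smaller share. A real cluster also has to rebalance partitions when nodes join or leave, which this sketch leaves out.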
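
Note #12 describes SQL executed across the nodes of a cluster. One way to picture this is a scatter-gather query: each node runs the same aggregate over its local slice of the table and the partial results are merged. The sketch below illustrates that pattern with in-memory SQLite databases standing in for nodes. It is not Ignite code, and the table, column names, and sample rows are hypothetical.

```python
import sqlite3


def make_node(rows):
    """Create one 'node': an in-memory SQLite database holding a slice of the table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE readings (meter_id INTEGER, region TEXT, kwh REAL)")
    db.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return db


def distributed_totals(nodes):
    """Run the same aggregate on every node, then merge the partial sums."""
    query = "SELECT region, SUM(kwh) FROM readings GROUP BY region"
    totals = {}
    for db in nodes:
        for region, partial in db.execute(query):
            totals[region] = totals.get(region, 0.0) + partial
    return totals


# Hypothetical sample data, split across two nodes.
nodes = [
    make_node([(1, "north", 1.5), (2, "south", 2.0)]),
    make_node([(3, "north", 2.5), (4, "south", 1.0), (5, "east", 4.0)]),
]
totals = distributed_totals(nodes)
```

Sums merge cleanly because partial sums can simply be added. Aggregates such as averages need extra bookkeeping (a sum and a count per node), which is one reason distributed SQL engines split queries into per-node and merge stages.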
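
Note #13 distinguishes embedded mode, where shared state lives only as long as the Spark application, from standalone mode, where it outlives the application. The toy model below sketches that lifetime difference in plain Python. It uses neither Spark nor Ignite, and `ToyCache` and `ToySparkApp` are hypothetical stand-ins invented for this illustration.

```python
class ToyCache:
    """Stand-in for a distributed cache that holds shared state."""

    def __init__(self):
        self.data = {}


class ToySparkApp:
    """Stand-in for a Spark application whose jobs read and write shared state."""

    def __init__(self, name, external_cache=None):
        self.name = name
        # Embedded mode: the application creates and owns the cache.
        # Standalone mode: the cache lives outside the application and is passed in.
        self.embedded = external_cache is None
        self.cache = ToyCache() if self.embedded else external_cache

    def run_job(self, key, value=None):
        """A 'job' writes a value if one is given, then returns what the cache holds."""
        if value is not None:
            self.cache.data[key] = value
        return self.cache.data.get(key)

    def stop(self):
        # In embedded mode the shared state disappears with the application.
        if self.embedded:
            self.cache.data.clear()


# Embedded mode: jobs inside one application share state, but it is gone after stop().
app = ToySparkApp("embedded-app")
app.run_job("total", 10)
seen_by_second_job = app.run_job("total")
app.stop()
after_stop = app.cache.data.get("total")

# Standalone mode: the cache outlives the writer, so another application can read it.
cluster = ToyCache()
writer = ToySparkApp("writer", cluster)
writer.run_job("total", 10)
writer.stop()
reader = ToySparkApp("reader", cluster)
seen_by_other_app = reader.run_job("total")
```

The trade-off the sketch points at is operational: embedded mode needs no separate cluster, while standalone mode requires running one but lets several applications share the same data.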