Continuous Machine and Deep Learning with Apache Ignite

  • 1.
    Continuous Machine and Deep Learning at Scale With Apache Ignite
    Denis Magda, Apache Ignite Committer & PMC Chair, @denismagda
  • 2.
    2019 © GridGain Systems @denismagda @ApacheIgnite
    Agenda
    • Why Machine Learning at Scale?
    • Ignite Machine Learning Intro
    • TensorFlow Integration
    • Ignite Machine Learning Internals
    • Q&A
  • 3.
    5-Minute Guide to Ignite: Overview and Why Support ML?
  • 4.
    Why Machine Learning at Scale?
    • Scalability
      – Data exceeds the capacity of a single server
      – Burden for dev and business
    • Models trained and deployed in different systems
      – Move data out for training
      – Wait for training to complete
      – Redeploy models in production
  • 5.
    Continuous Learning Approach Without ETL
    [Diagram] Before: App; storing and processing working set; periodic ETL of terabytes of data; loading data for training; model training & testing; periodic update of models.
    [Diagram] After (with CL): App and ML/DL engine; storing and processing working set; model training & testing; instant updates of models; no ETL.
  • 6.
    Apache Ignite Overview
    [Diagram, top to bottom]
    • Application Layer: Web, SaaS, Social, Mobile, IoT
    • APIs: Key-Value, SQL, Transactions, Compute Grid, Service Grid, Messaging, Streaming, Events, Machine and Deep Learning
    • In-Memory Data Store
    • Persistent Layer: Ignite Persistence, RDBMS, NoSQL, Hadoop, Mainframe
  • 7.
    Ignite Deployment Modes
    • Enhance Legacy Architecture (IMDG)
      – Application Layer: Web-Scale Apps, Mobile Apps, IoT, Social Media
      – Ignite In-Memory Storage
      – External Database: RDBMS, NoSQL, Hadoop
    • Simplified Modern Architecture (IMDB)
      – Application Layer: Web-Scale Apps, Mobile Apps, IoT, Social Media
      – Ignite In-Memory Storage
      – Ignite Persistence
  • 8.
    Ignite Machine Learning: Slightly More Details
  • 9.
    Ignite Machine and Deep Learning
    [Diagram, bottom to top]
    • Ignite Persistence
    • In-Memory Data Store
    • Compute and Service Grid
    • Distributed Machine Learning Datasets
    • Ignite Machine and Deep Learning: Regressions, K-Means, Decision Trees, TensorFlow
    • Clients: Java, .NET, C++, Python, Binary Protocol (thin client)
    Highlights:
    • Distributed Algorithms
    • Large Scale Parallelization
    • Multi-language Support
    • No ETL
    • Distributed Dataset based on partitioned caches
  • 10.
    Distributed Classification
    • Logistic Regression
    • SVM, KNN, ANN
    • Decision trees
    • Random Forest
    • Naive Bayes
  • 11.
    Distributed Regression
    • KNN Regression
    • Linear Regression
    • Decision tree regression
    • Random forest regression
    • Gradient-boosted tree regression
  • 12.
    Distributed Clustering
    • K-means
    • GMM
  • 13.
    Multilayer Perceptron Neural Network
  • 14.
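    Ignite ML API Usage

        // Load the Titanic passengers into an Ignite cache.
        IgniteCache<Integer, Vector> dataCache = TitanicUtils.readPassengers(ignite);

        // Use vector coordinates 0, 5 and 6 as features and coordinate 1 as the label.
        Vectorizer vectorizer = new SampleVectorizer(0, 5, 6).labeled(1);

        // Decision tree with maximum depth 5 and minimum impurity decrease 0.
        DecisionTreeClassificationTrainer trainer = new DecisionTreeClassificationTrainer(5, 0);

        // Train on the data stored in the cluster.
        DecisionTreeNode mdl = trainer.fit(ignite, dataCache, vectorizer);

        // Evaluate the trained model with the accuracy metric.
        double accuracy = Evaluator.evaluate(dataCache, mdl, vectorizer, new Accuracy<>());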
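The accuracy metric used in the evaluation call on this slide is the share of rows whose predicted label equals the true label. The following is a minimal sketch in plain Python, independent of Ignite; the function name `accuracy` and the toy labels are illustrative assumptions, not part of the Ignite API.

```python
def accuracy(truth, predicted):
    """Fraction of predictions that match the ground-truth labels."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("cannot compute accuracy of an empty sample")
    correct = sum(1 for t, p in zip(truth, predicted) if t == p)
    return correct / len(truth)


# Toy labels (hypothetical): 1 = survived, 0 = did not survive.
truth = [1, 0, 0, 1, 1, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 1, 0]
print(accuracy(truth, predicted))
```

Six of the eight toy predictions match, so the sketch reports 0.75.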
  • 15.
    Machine Learning Pipelines
  • 16.
    Pipelining with Apache Ignite

        IgniteCache<Integer, Vector> dataCache = TitanicUtils.readPassengers(ignite);

        // Extracts "pclass", "sibsp", "parch", "sex", "embarked", "age", "fare".
        Vectorizer<Integer, Vector, Integer, Double> vectorizer =
            new DummyVectorizer<Integer>(0, 3, 4, 5, 6, 8, 10).labeled(1);

        PipelineMdl<Integer, Vector> mdl = new Pipeline<Integer, Vector, Integer, Double>()
            .addVectorizer(vectorizer)
            .addPreprocessingTrainer(new EncoderTrainer<Integer, Vector>()
                .withEncoderType(EncoderType.STRING_ENCODER)
                .withEncodedFeature(1)
                .withEncodedFeature(6))
            .addPreprocessingTrainer(new ImputerTrainer<Integer, Vector>())
            .addPreprocessingTrainer(new MinMaxScalerTrainer<Integer, Vector>())
            .addPreprocessingTrainer(new NormalizationTrainer<Integer, Vector>()
                .withP(1))
            .addTrainer(new DecisionTreeClassificationTrainer(5, 0))
            .fit(ignite, dataCache);
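To make the preprocessing stages of the pipeline above more concrete, here is a small single-machine sketch of what each stage does to the feature rows: string encoding, imputing missing values, min-max scaling, and normalization with p = 1. It is not Ignite code. The function names, the mean-based imputation, and the three-column toy rows are assumptions chosen for illustration.

```python
def encode_strings(rows, col):
    """Replace each distinct string in column `col` with a numeric code."""
    codes = {}
    out = []
    for row in rows:
        row = list(row)
        if row[col] is not None:
            row[col] = float(codes.setdefault(row[col], len(codes)))
        out.append(row)
    return out


def impute_mean(rows):
    """Fill missing values (None) with the mean of the known values in the column."""
    means = []
    for c in range(len(rows[0])):
        known = [r[c] for r in rows if r[c] is not None]
        means.append(sum(known) / len(known))
    return [[means[c] if v is None else v for c, v in enumerate(r)] for r in rows]


def min_max_scale(rows):
    """Rescale every column to the [0, 1] range."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(r, lows, highs)] for r in rows]


def normalize_l1(rows):
    """Divide every row by its L1 norm (the p = 1 case)."""
    out = []
    for r in rows:
        norm = sum(abs(v) for v in r)
        out.append([v / norm if norm else 0.0 for v in r])
    return out


def preprocess(rows, string_cols):
    """Apply the stages in the same order as the pipeline on the slide."""
    for col in string_cols:
        rows = encode_strings(rows, col)
    rows = impute_mean(rows)
    rows = min_max_scale(rows)
    return normalize_l1(rows)


# Hypothetical rows: [pclass, sex, age]; the second passenger has no age.
rows = [
    [1.0, "female", 30.0],
    [3.0, "male", None],
    [2.0, "male", 50.0],
]
features = preprocess(rows, string_cols=[1])
for row in features:
    print(row)
```

In the distributed setting each stage would first have to gather its statistics (codes, means, minima and maxima) across all partitions before transforming rows; the sketch skips that step by working on one in-memory list.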
  • 17.
    Continuous Learning With Apache Ignite

        SVMLinearClassificationTrainer trainer = new SVMLinearClassificationTrainer();

        SVMLinearClassificationModel mdl1 = trainer.fit(ignite, dataCache1, vectorizer);

        SVMLinearClassificationModel mdl2 = trainer.update(mdl1, ignite, dataCache2, vectorizer);
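The fit-then-update pattern on this slide can be illustrated with a toy linear SVM trained by sub-gradient descent on the hinge loss: the first call starts from zero weights, the second call continues from the existing model using only the new batch. This is a simplified single-machine sketch, not Ignite's implementation; `train_linear_svm`, its hyperparameters, and both batches are hypothetical.

```python
def train_linear_svm(data, weights=None, epochs=50, lr=0.1, reg=0.01):
    """Sub-gradient descent on the hinge loss; labels must be -1 or +1.

    Passing `weights` continues training from an existing model
    instead of starting from zeros. The last weight is the bias.
    """
    dim = len(data[0][0])
    w = list(weights) if weights is not None else [0.0] * (dim + 1)
    for _ in range(epochs):
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            for i in range(dim):
                # Regularization always shrinks the weight; the hinge term
                # contributes only when the margin is violated.
                grad = reg * w[i] - (y * x[i] if margin < 1 else 0.0)
                w[i] -= lr * grad
            if margin < 1:
                w[-1] += lr * y
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1 if score >= 0 else -1


# Two hypothetical batches that arrive one after another.
batch1 = [([2.0, 1.0], 1), ([1.5, 2.0], 1), ([-1.0, -1.5], -1), ([-2.0, -1.0], -1)]
batch2 = [([3.0, 0.5], 1), ([-0.5, -3.0], -1)]

mdl1 = train_linear_svm(batch1)                # plays the role of trainer.fit(...)
mdl2 = train_linear_svm(batch2, weights=mdl1)  # plays the role of trainer.update(mdl1, ...)
print(mdl1, mdl2)
```

The point of the design is that the second call never revisits `batch1`: the knowledge from the first batch is carried only by the weights passed in.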
  • 18.
    Demo: Payments Fraud Detection
  • 19.
    Ignite and TensorFlow
  • 20.
    TensorFlow Integration: Benefits
    • Ignite as distributed data source
      – Perfect fit for distributed TF training
    • Less ETL
      – TF nodes deployed together with Ignite nodes
      – In-machine data movement only
  • 21.
    TensorFlow Integration: Main Features
    • Distribution of user tasks written in Python
    • Automatic creation and maintenance of TF cluster
    • Minimization of ETL costs
    • Fault tolerance for both Ignite and TF instances

        >>> import tensorflow as tf
        >>> from tensorflow.contrib.ignite import IgniteDataset
        >>>
        >>> dataset = IgniteDataset(cache_name="SQL_PUBLIC_KITTEN_CACHE")
        >>> iterator = dataset.make_one_shot_iterator()
        >>> next_obj = iterator.get_next()
        >>>
        >>> with tf.Session() as sess:
        ...     for _ in range(3):
        ...         print(sess.run(next_obj))
        ...
        {'key': 1, 'val': {'NAME': b'WARM KITTY'}}
        {'key': 2, 'val': {'NAME': b'SOFT KITTY'}}
        {'key': 3, 'val': {'NAME': b'LITTLE BALL OF FUR'}}
  • 22.
    Ignite Machine Learning: Internals
  • 23.
    Distributed In-Memory Data Store
    Ignite Memory-Centric Storage
    [Diagram: an Ignite cluster of three server nodes, each with an In-Memory Data Store and a Persistent Store]
    • In-Memory Data Store
      – Off-heap
      – Removes Noticeable GC Pauses
      – Automatic Defragmentation
      – Predictable Memory Consumption
    • Distributed Persistent Store
      – Stores Superset of Data
      – Fully Transactional WAL (Write Ahead Log)
      – Instantaneous Restarts
  • 24.
    Record to Node Mapping
    [Diagram: Key, Partition, Server Node, on-disk]
  • 25.
    Caches and Partitions
    [Diagram: one cache split into two partitions]
    • Partition 1: (K1, V1), (K2, V2), (K3, V3), (K4, V4)
    • Partition 2: (K5, V5), (K6, V6), (K7, V7), (K8, V8), (K9, V9)
  • 26.
    Partitions Distribution
    [Diagram: Node 1, Node 2, Node 3, Node 4; partitions 0, 1, 2, 3 shown once as Primary and once as Backup]
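The key to partition to node mapping shown on these slides can be sketched in a few lines of Python. The sketch assumes a hash-modulo step from key to partition and rendezvous (highest-random-weight) hashing from partition to nodes; Ignite's actual affinity function is more elaborate, and `stable_hash`, `partition_for`, `nodes_for`, and the node names are hypothetical.

```python
import hashlib

PARTITIONS = 4  # the slide shows partitions 0..3


def stable_hash(value):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition_for(key, partitions=PARTITIONS):
    """Key -> partition: the same key always lands in the same partition."""
    return stable_hash(key) % partitions


def nodes_for(partition, nodes, backups=1):
    """Partition -> [primary, backup, ...] using rendezvous hashing.

    Every node gets a score for the partition; the best score is the
    primary, the following ones hold the backup copies.
    """
    ranked = sorted(nodes, key=lambda node: stable_hash(f"{partition}:{node}"), reverse=True)
    return ranked[: backups + 1]


nodes = ["node1", "node2", "node3", "node4"]
for key in ["K1", "K2", "K3"]:
    part = partition_for(key)
    primary, backup = nodes_for(part, nodes)
    print(key, "-> partition", part, "-> primary", primary, "backup", backup)
```

A useful property of rendezvous hashing is that removing a node that does not own a partition leaves that partition's primary and backup unchanged, so only the partitions of the departed node have to move.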
  • 27.
    Partition-Based Dataset
    [Diagram]
    • Node 1: P1, C, D, local training
    • Node 2: P2, C, D, local training
    • Client: sends the initial solution and receives the REDUCE result
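The training flow on this slide (local computation on each partition, followed by a reduce step on the client) can be sketched with a one-variable least-squares fit. Each partition is summarized by a small tuple of sums, the tuples are added together, and the client solves for the line. This is an illustrative stand-in, not Ignite's API; the function names and the toy partitions are assumptions.

```python
from functools import reduce


def map_partition(rows):
    """Runs next to the data: summarize one partition as sufficient statistics."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return (n, sx, sy, sxx, sxy)


def reduce_stats(a, b):
    """Combine two partial results; addition is associative, so order does not matter."""
    return tuple(u + v for u, v in zip(a, b))


def solve(stats):
    """Runs on the client: closed-form least squares for y = slope * x + intercept."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical partitions P1 and P2 holding (x, y) rows of the line y = 2x + 1.
partitions = {
    "P1": [(0.0, 1.0), (1.0, 3.0)],
    "P2": [(2.0, 5.0), (3.0, 7.0)],
}
partials = [map_partition(rows) for rows in partitions.values()]
slope, intercept = solve(reduce(reduce_stats, partials))
print(slope, intercept)
```

Only the five-number summaries cross the network, never the rows themselves, which is the sense in which training happens in place.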
  • 28.
    Training Failover
    [Diagram: Node 1 holds P, C, D; Node 3 holds P, C, D*]
    • P = Partition
    • C = Partition Context
    • D = Partition Data
    • D* = Local ETL
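The failover idea in the legend above can be sketched as follows: the partition (P) and its context (C) survive a node failure, while the training-ready data (D) is node-local and is rebuilt on the new node by a local ETL step (D*). The class below is a toy model under those assumptions; `PartitionBasedDataset`, `data_for`, `fail_node`, and the node names are hypothetical and do not mirror Ignite's classes.

```python
class PartitionBasedDataset:
    """Toy model of a dataset whose per-partition data can be rebuilt after failover."""

    def __init__(self, upstream, build_data):
        self.upstream = upstream      # P: partition id -> raw rows (assumed replicated)
        self.build_data = build_data  # local ETL: raw rows -> training-ready data
        self.context = {}             # C: partition id -> context (assumed reliably stored)
        self.local_data = {}          # D: (node, partition id) -> data, lost with the node
        self.etl_runs = 0

    def data_for(self, node, part):
        """Return D for a partition on a node, rebuilding it locally when it is missing."""
        key = (node, part)
        if key not in self.local_data:
            self.local_data[key] = self.build_data(self.upstream[part])
            self.etl_runs += 1
        return self.local_data[key]

    def fail_node(self, node):
        """Simulate a node failure: everything held only on that node disappears."""
        self.local_data = {k: v for k, v in self.local_data.items() if k[0] != node}


dataset = PartitionBasedDataset(
    upstream={"P": [("a", 1.0), ("b", 2.0), ("c", 3.0)]},
    build_data=lambda rows: [value for _, value in rows],
)

dataset.data_for("node1", "P")            # D is built on node 1
dataset.context["P"] = {"iteration": 7}   # training progress kept in the context
dataset.fail_node("node1")                # node 1 goes down, D is lost
restored = dataset.data_for("node3", "P") # D* is rebuilt on node 3 by local ETL
print(restored, dataset.context["P"], dataset.etl_runs)
```

Because the context is kept apart from the rebuilt data, an iterative algorithm can resume from its last recorded iteration rather than start over.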
  • 29.
    To be released soon
  • 30.
    Full Python Support and Model Importing
    • Model Importing from Spark, XGBoost, etc.
    • Full Python support
      – https://github.com/gridgain/ml-python-api
  • 31.
    Wrapping Up
  • 32.
    Apache Ignite Benefits for ML Use Cases
    • Massive scalability
      – Horizontal + Vertical
      – RAM + Disk
    • Minimal ETL
      – Train models and run algorithms in place
    • Fault tolerance and continuous learning
      – Partition-based dataset
  • 33.
    Resources
    • Documentation:
      – https://apacheignite.readme.io/docs
    • Examples and Tutorials:
      – https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml
    • Details on TensorFlow:
      – https://medium.com/tensorflow/tensorflow-on-apache-ignite-99f1fc60efeb
  • 34.
    Apache Ignite – We’re Hiring ;)
    • Rapidly Growing Community
    • Great Way to Learn Distributed Storages, Computing, SQL, ML, Transactions
    • How To Contribute:
      – https://ignite.apache.org/
  • 35.
    Apache Ignite Is a Top 5 Apache Project
    • Over 2M downloads per year and 4M total downloads
    • A Top 5 Apache Software Foundation Project: Top 5 Dev Mailing Lists and Top 5 User Mailing Lists (rankings not recoverable from the slide)
    • Source: January 1, 2019 Apache Software Foundation blog post “Apache in 2018 – By The Digits”
    [Chart: Monthly Ignite/GridGain Downloads, Apr-14 to Dec-18, vertical axis from 0 to 200,000]
  • 36.
    Apache Ignite Users
    [Logo wall grouped by industry; the only legible logo text is "Reliance"]
    • Financial Services
    • FinTech
    • Software/Cloud
    • Telecom & Mobile
    • IoT
    • AdTech / Media / Entertainment
    • Logistics & Transportation
    • eCommerce & Retail
    • Pharma & Healthcare
  • 37.
    Any Questions?
    @apacheignite @denismagda

Editor's Notes

  • #6 Fraud prevention. A bank has developed a historical model of the signals that indicate a loan application is likely fraudulent. As the system ingests new credit applications, it continually updates the machine learning model with the new data to identify, in real time, any emerging trends that might indicate a new concerted effort to acquire credit fraudulently. Any related fraudulent activity can then be identified immediately.
    Ecommerce recommendations. Online shopping recommendation engines are based on historical data such as web page visits and purchase patterns, but they are far more powerful, and deliver an increased ROI, if they incorporate real-time continuous learning. Incorporating the latest web page information, referral information, and purchase patterns into the machine learning model improves the recommendation engine in real time, so recommendations reflect the latest data available.
  • #7 The GridGain Platform. GridGain is a memory-centric data platform used to build fast, scalable, and resilient solutions. At the heart of the GridGain platform lies a distributed memory-centric data store with ACID semantics and powerful processing APIs including SQL, Compute, Key/Value, and transactions. The memory-centric approach lets GridGain leverage memory for high throughput and low latency whilst utilising local disk or SSD to provide durability and fast recovery. The GridGain platform can be integrated with third-party databases and external storage media and can be deployed on any infrastructure. It provides linear scalability, built-in fault tolerance, comprehensive security and auditing, and advanced monitoring and management. The GridGain platform caters for a range of use cases including core banking services, real-time product pricing, reconciliation and risk calculation engines, analytics, and machine learning.
  • #8 * Architectural simplification
  • #10 Apache Ignite incorporates distributed SQL database capabilities as a part of its platform. The database is horizontally scalable, fault tolerant, and ANSI-99 SQL compliant. It supports all SQL DML commands, including SELECT, UPDATE, INSERT, MERGE, and DELETE queries. It also provides support for a subset of DDL commands relevant for distributed databases. Data sets as well as indexes can be stored both in RAM and on disk thanks to the durable memory architecture. This allows executing distributed SQL operations across different memory layers, achieving in-memory performance with the durability of disk. You can interact with Apache Ignite using SQL via natively developed APIs for Java, .NET, and C++, or via the Ignite JDBC or ODBC drivers. This provides true cross-platform connectivity from languages such as PHP, Ruby, and more.
  • #15 You might expect your model to be perfect. To check, calculate a classification metric, accuracy for example, to evaluate the quality of the model.
  • #24 The Apache Ignite memory-centric platform is based on an in-memory architecture that allows storing and processing data and indexes both in memory and on disk when the Ignite Persistent Store feature is enabled. The memory architecture helps achieve in-memory performance with the durability of disk using all the available resources of the cluster. The GridGain in-memory data store is built and operates in a way similar to the Virtual Memory of operating systems such as Linux. However, one significant difference between these two types of architectures is that Durable Memory always keeps the whole data set and indexes on disk if the Ignite Persistent Store is used, while Virtual Memory uses the disk for swapping purposes only.
    In-Memory
    • Off-Heap memory
    • Removes noticeable GC pauses
    • Automatic Defragmentation
    • Predictable memory consumption
    • Boosts SQL performance
    On Disk
    • Optional Persistence
    • Support of flash, SSD, Intel 3D XPoint
    • Stores superset of data
    • Fully Transactional
      ◦ Write-Ahead-Log (WAL)
    • Instantaneous Cluster Restarts
  • #28 Partition-based dataset:
    • Abstraction layer on top of Ignite storage and computation
    • MapReduce using Compute Grid
    • Partition data: can be recovered from another node
    • Partition context: ML algorithms are iterative and require context
  • #36 Part of the reason behind our growth is the growth of Apache Ignite. HAVE YOU HEARD OF APACHE IGNITE? GridGain Systems donated the code to the Apache Ignite project in late 2014. It became a top-level project of the Apache Software Foundation (ASF) in mid 2015, the second fastest to do so. Apache Ignite is now one of the top 5 Apache Software Foundation projects, and has been for the last 2 years. We continue to be the leading contributor, though there are several others. With over 4 million total downloads, Ignite has reached a 2 million download-a-year run rate.
    [1] http://globenewswire.com/news-release/2019/07/09/1534470/0/en/The-Apache-Software-Foundation-Announces-Annual-Report-for-2019-Fiscal-Year.html
    [2] 2018 numbers: https://blogs.apache.org/foundation/entry/apache-in-2018-by-the
    [3] 2017 numbers: https://blogs.apache.org/foundation/entry/apache-in-2017-by-the
  • #37 Today there are hundreds of leading companies that rely on GridGain to support their mission-critical applications. While GridGain started in Financial Services, today that is about 25% of its total business … USE THIS OPPORTUNITY TO TELL SOME OF THE RELEVANT STORIES. It is used by FinTech and SaaS companies to add speed and scale, usually to support their larger customers as they adopt the FinTech/SaaS technologies.
    • In FinTech, Finastra, which supports 48 out of the 50 top banks worldwide, adopted GridGain for their Cloud platform to add the speed and scale needed for their offerings and to support FRTB real-time regulatory requirements.
    • In SaaS, Microsoft Azure uses GridGain for real-time attack prevention as part of their identity services for all customer applications on Azure.
    • In telco, all of RingCentral’s VOIP relies on GridGain for storing all call/service sessions and making sure connections continue even as calls connect through different datacenters.
    • In IoT, Itron supports hundreds of millions of smart meters globally and relies on GridGain for real-time data ingestion at scale. They adopted GridGain at first to support their larger customers.
    • American Airlines uses GridGain for real-time rerouting of customers and their luggage as they land.
    • Multiplan uses GridGain to better manage healthcare costs at scale.