Apache®, Apache Ignite, Ignite®, and the Apache Ignite logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

Apache Ignite™ - In-Memory Data Fabric
Fast Data Meets Open Source

DMITRIY SETRAKYAN
GridGain Founder & Chief Product Officer, Apache Ignite PMC
http://ignite.apache.org
@apacheignite @dsetrakyan

Agenda
• Apache Ignite™ Overview
• Data Grid
• Partitioning Schemes
• SQL
• Shared Memory Layer
• Share Spark RDDs
• In-Memory File System
• DevOps: Yarn and Mesos
• Faster MapReduce & Hive
• Ignite MapReduce
• Demo using Apache Zeppelin
• Q & A

• Very Active Community
• Great Way to Learn Distributed Computing
• How To Contribute:
  – https://ignite.apache.org/community/contribute.html#contribute
  – https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
We Are Hiring!

Apache Ignite™ In-Memory Data Fabric: Strategic Approach to IMC
• Supports Applications of various types and languages
• Open Source – Apache 2.0
• Simple Java APIs
• 1 JAR Dependency
• High Performance & Scale
• Automatic Fault Tolerance
• Management/Monitoring
• Runs on Commodity Hardware
• Supports existing & new data sources
• No need to rip & replace

Apache Ignite In-Memory Data Fabric

Why Share State in Spark?
• Long Running Applications
  – Passing State Between Jobs
• Disk File System (HDFS?)
  – Convert RDDs to Disk Files and Back
  – Argh#$%
• Share RDDs In-Memory
  – Native Spark API
  – Native Spark Transformations

Why Data Grid?
• In-Memory Key-Value Store
  – Good for Caching Tuples
• Foundation for Shared Memory State
  – IgniteRDD is based on Data Grid
  – Ignite File System is based on Data Grid
• On-Heap & Off-Heap Memory
• In-Memory Indexes
  – Fast SQL
• Built for High Throughput and Low Latencies

Data Grid: JCache (JSR 107)
• JCache (JSR 107)
  – In-Memory Key-Value Store
  – Basic Cache Operations
  – ConcurrentMap APIs
  – Collocated Processing (EntryProcessor)
  – Events and Metrics
  – Pluggable Persistence
• Ignite Data Grid
  – ACID Transactions
  – SQL Queries (ANSI 99)
  – In-Memory Indexes
  – On-Heap & Off-Heap Memory
  – Automatic RDBMS Integration

Data Grid: Distributed Caching
• Partitioned Cache
• Replicated Cache

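The two cache modes named on this slide differ in where each entry lives. The sketch below is a toy model only, not Ignite code: the class `CachePlacementSketch`, the methods `ownerOf` and `place`, and the modulo placement rule are all assumptions made for illustration, and Ignite's real affinity function is more involved. It shows that a partitioned cache gives each key to one node, while a replicated cache copies every key to every node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of partitioned vs replicated placement; not Ignite's actual affinity logic. */
public class CachePlacementSketch {

    /** Partitioned mode: each key is owned by exactly one node (hypothetical modulo rule). */
    static int ownerOf(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    /** Builds the per-node contents for the given data set in either mode. */
    static List<Map<Integer, String>> place(Map<Integer, String> data, int nodeCount, boolean replicated) {
        List<Map<Integer, String>> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
        for (Map.Entry<Integer, String> e : data.entrySet()) {
            if (replicated) {
                // Replicated mode: every node holds a full copy of the data.
                for (Map<Integer, String> node : nodes) {
                    node.put(e.getKey(), e.getValue());
                }
            } else {
                // Partitioned mode: only the owning node holds the entry.
                nodes.get(ownerOf(e.getKey(), nodeCount)).put(e.getKey(), e.getValue());
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        Map<Integer, String> data = new HashMap<>();
        for (int k = 0; k < 6; k++) {
            data.put(k, "v" + k);
        }
        System.out.println("partitioned: " + place(data, 3, false));
        System.out.println("replicated:  " + place(data, 3, true));
    }
}
```

In this model the partitioned layout spreads memory use across nodes, while the replicated layout lets any node answer any read locally at the cost of storing everything everywhere.
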
Data Grid: Ad-Hoc SQL (ANSI 99)
• ANSI-99 SQL
• Always Consistent
• Fault Tolerant
• In-Memory Indexes (On-Heap and Off-Heap)
• Automatic Group By, Aggregations, Sorting
• Cross-Cache Joins, Unions, etc.
• Ad-Hoc SQL Support

SQL Cross-Cache GROUP BY Example

Apache Ignite for Spark and Hadoop

DevOps: Integration with Yarn and Mesos
• Automatic Resource Management
• Easy Data Center Installation
• Easy Data Center Configuration
• On-Demand Elasticity

Share RDDs Across Spark Jobs
• IgniteRDD Deployment Modes
  – Share RDD across tasks on the host
  – Share RDD across tasks in the application
  – Share RDD globally
  – Embedded vs External Deployments
• Faster SQL
  – In-Memory Indexes
  – SQL on top of Shared RDD

IgniteContext
• Main Entry Point from Spark to Ignite
• Specify Different Ignite Configurations
• Embedded vs External Deployments
  – Client vs Server Modes

IgniteRDD
• Implementation of Spark RDD
• Mutable (unlike native RDDs)
• Partitioned over Ignite Partitioned Caches
• Indexed SQL
  – Spark only does Full Scans
  – Indexes are 1000x faster

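The "Indexed SQL" point contrasts a full scan with an index lookup. The following is a minimal sketch of that difference in plain Java, assuming a hypothetical in-memory table of salaries; the class `IndexVsScanSketch` and its methods are illustrative names and do not reproduce Ignite's or Spark's query engines. A sorted map stands in for the index, and no speedup figure is measured or implied here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy comparison of a full scan and a sorted-index lookup; not Ignite's SQL engine. */
public class IndexVsScanSketch {

    /** Full scan: every row is examined, so the cost grows with the table size. */
    static List<Integer> scan(int[] salaries, int min) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < salaries.length; id++) {
            if (salaries[id] >= min) {
                ids.add(id);
            }
        }
        return ids;
    }

    /** Builds a sorted index from salary to the row ids holding that salary. */
    static TreeMap<Integer, List<Integer>> buildIndex(int[] salaries) {
        TreeMap<Integer, List<Integer>> index = new TreeMap<>();
        for (int id = 0; id < salaries.length; id++) {
            index.computeIfAbsent(salaries[id], s -> new ArrayList<>()).add(id);
        }
        return index;
    }

    /** Indexed lookup: tailMap jumps to the first qualifying key and visits only matches. */
    static List<Integer> lookup(TreeMap<Integer, List<Integer>> index, int min) {
        List<Integer> ids = new ArrayList<>();
        for (List<Integer> bucket : index.tailMap(min, true).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] salaries = {50, 120, 80, 200, 120}; // row id = array position
        System.out.println("scan:   " + scan(salaries, 100));
        System.out.println("lookup: " + lookup(buildIndex(salaries), 100));
    }
}
```

Both calls return the same set of rows; the scan returns them in row order and the index lookup in salary order. The index pays a build cost once and then skips non-matching rows on each query.
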
Ignite In-Memory File System
• Ignite In-Memory File System (IGFS)
  – Hadoop-compliant
  – Easy to Install
  – On-Heap and Off-Heap
  – Caching Layer for HDFS
  – Write-through and Read-through HDFS
  – Performance Boost

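The "Caching Layer for HDFS" and "Write-through and Read-through" bullets describe a general caching pattern. Here is a minimal sketch of that pattern in plain Java, assuming a `Map` as a stand-in for the slower backing store; the class `ReadWriteThroughSketch`, its fields, and its methods are hypothetical and are not the IGFS API.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy read-through / write-through cache; a Map stands in for the slower file system. */
public class ReadWriteThroughSketch {
    private final Map<String, byte[]> backingStore;              // stand-in for HDFS
    private final Map<String, byte[]> memory = new HashMap<>();  // in-memory layer
    int storeReads = 0;                                          // counts trips to the backing store

    ReadWriteThroughSketch(Map<String, byte[]> backingStore) {
        this.backingStore = backingStore;
    }

    /** Read-through: serve from memory, and load from the backing store only on a miss. */
    byte[] read(String path) {
        byte[] cached = memory.get(path);
        if (cached != null) {
            return cached;
        }
        storeReads++;
        byte[] loaded = backingStore.get(path);
        if (loaded != null) {
            memory.put(path, loaded);
        }
        return loaded;
    }

    /** Write-through: update memory and the backing store in the same call. */
    void write(String path, byte[] data) {
        memory.put(path, data);
        backingStore.put(path, data);
    }

    public static void main(String[] args) {
        Map<String, byte[]> store = new HashMap<>();
        store.put("/logs/a", new byte[] {1, 2, 3});
        ReadWriteThroughSketch fs = new ReadWriteThroughSketch(store);
        fs.read("/logs/a");
        fs.read("/logs/a"); // second read is served from memory
        fs.write("/logs/b", new byte[] {4});
        System.out.println("backing-store reads: " + fs.storeReads);
        System.out.println("store has /logs/b: " + store.containsKey("/logs/b"));
    }
}
```

Repeated reads of the same path reach the backing store once, and writes stay durable there, which is the trade the slide's "Performance Boost" bullet points at.
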
Ignite In-Memory Map Reduce
• In-Memory Native Performance
• Zero Code Change
• Use existing MR code
• Use existing Hive queries
• No Name Node
• No Network Noise
• In-Process Data Colocation
• Eager Push Scheduling

Apache Ignite Roadmap
• More SQL
  – Non-Collocated Joins
  – Data Manipulation Language (DML)
  – Data Definition Language (DDL)
• More Drivers
  – JDBC (already in Ignite 1.5)
  – ODBC (Ignite 1.6)

Interactive SQL with Apache Zeppelin

ANY QUESTIONS?
Thank you for joining us. Follow the conversation.
http://www.ignite.apache.org
@apacheignite @dsetrakyan

August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ignite™
