Apache Ignite | Surinder | 2nd March 2022
Agenda  Setting up context  Cache Evolution  Apache Ignite  Data Queries  Compute  Data Partitioning  Eviction policies  Performance Comparison
Stream Consuming Application  Too many reads, writes and updates to the database  Limited DB connections  Can slow down the stream under load
Stream Consuming Application: option 1  Cache serves as the first data layer  Manages persisting data to the database  Processing much faster due to no direct DB access
Stream Consuming Application cont…  Cache serves as a first-class in-memory database  Manages persisting data to native storage  No DB connection or mechanism overhead
Cache Evolution
Cache Evolution  Distributed caches  Shared cache across app instances  Beyond local RAM capacity  Ease of maintenance  No auto sync with DB (yes/no)?  In-app caches  Cached results  More responsive application  Reduced load on DB  Limited to local RAM size
Cache Evolution : Data grids  Benefits  Distributed caches with brains  Compute capabilities  DB Read/Write through  Collocated processing  Better scalability
Cache Evolution : In-memory computing  Memory-centric storage  Scalable to store data in TBs  SQL and transaction support  Collocate related data  DB read/write through  Pluggable external databases  Native storage on disk  No RAM warm-up needed  Compute capabilities  MapReduce  Collocated processing  Better scalability
What is Apache Ignite?  A distributed cache  A distributed in-memory data grid  A distributed in-memory database  High-performance in-memory computing  ANSI SQL-99 compliant  Transactional operations  SQL transactions in beta
Ignite cluster  Group of nodes  Types:  Server : stores data, baseline node  Thick client node : doesn’t store data  Thin client node : not part of the cluster  Attribute-based grouping possible  Scalable  Fault tolerant  Data consistency  Demo
Data Grid  Distributed In-Memory Caching  Read/Write through  Data Consistency  Off-Heap Storage  Distributed SQL  ACID Support  Transactions
Cache Modes  Partitioned : keep the required number of backups  Replicated : everyone knows everything
Cache Queries…  Scan Query : returns data matching a BiPredicate  Predicate is sent to each node  Each node scans its cache  Data consolidated by the requesting node  SQL Query : loads data based on the given SQL  Needs indexing to be enabled  Register indexing in config  Annotations control field visibility  Other queries:  Text Query  Index Query  Continuous Query
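The scan-query flow above can be sketched in plain Java. This is a toy model, not the Ignite API: each "node" is just a local map holding a slice of the cache, the predicate is applied on every node, and the caller consolidates the partial results. All names and data are illustrative.

```java
import java.util.*;
import java.util.function.BiPredicate;

// Toy model of a scan query: the predicate is "sent" to each node,
// each node scans only its local entries, and the requesting side
// consolidates the partial results.
public class ScanQuerySketch {
    // Each "node" holds a slice of the overall cache.
    static final List<Map<Integer, String>> NODES = List.of(
        new HashMap<>(Map.of(1, "alice", 2, "bob")),
        new HashMap<>(Map.of(3, "carol", 4, "dave")));

    static List<String> scan(BiPredicate<Integer, String> pred) {
        List<String> result = new ArrayList<>();
        for (Map<Integer, String> node : NODES)          // predicate goes to every node
            node.forEach((k, v) -> { if (pred.test(k, v)) result.add(v); });
        return result;                                   // consolidated on the caller
    }

    public static void main(String[] args) {
        System.out.println(scan((k, v) -> k % 2 == 1));  // entries with odd keys
    }
}
```

The real Ignite `ScanQuery` works the same way conceptually, which is why an unindexed scan touches every node while an indexed SQL query can prune work up front.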
Data Partitioning  Partitioned caches  Backups  Ensure data availability on node failures  Read from a backup node when the primary node leaves  Demo
Demo Queries  Scan Query  Sql Query  Data collocation  Next week : this slide onwards
Data collocation  Collocate related data for performance  All employees of a dept. can be stored together  Affinity on the dept. attribute  Only a key attribute can be used as the affinity key  Performant CRUD operations  Avoids network trips  Reduced latency  Can cause hot nodes if used inappropriately
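A minimal sketch of why affinity collocation works, assuming the standard hash-mod partitioning described later in the deck (1024 partitions is Ignite's default; the dept numbers are made up). Partitioning an employee record by its department id instead of its own id sends all employees of one department to the same partition, and hence the same node:

```java
// Sketch of affinity collocation: partition by the affinity attribute
// (dept id) rather than the record's own key, so related records land
// in the same partition and therefore on the same node.
public class AffinitySketch {
    static final int PARTITIONS = 1024;   // Ignite's default partition count

    static int partitionFor(Object affinityKey) {
        // floorMod keeps the result non-negative for negative hash codes
        return Math.floorMod(affinityKey.hashCode(), PARTITIONS);
    }

    public static void main(String[] args) {
        int p1 = partitionFor(7);          // employee A, affinity key = dept 7
        int p2 = partitionFor(7);          // employee B, same dept
        System.out.println(p1 == p2);      // true: same partition, same node
    }
}
```

This is also the source of the "hot nodes" warning: if one department is far larger than the rest, one partition (and its node) absorbs a disproportionate share of the data and load.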
Compute Tasks  Run distributed computations on the grid  Tasks can be run on selected nodes  Ignite manages the task distribution  E.g. node-specific aggregates  List each dept.’s students stored on each node  Can be parallelized
Continuous Queries  Exactly-once processing semantics  3 basic components  Cache to monitor for updates  Remote filter to look for data changes  Local listener to act upon data changes  Optional initial query to process pre-existing data  Used to capture data changes on a cache  Use case: reacting to a cache entry change  Listen for a particular state of a cache value  Process the state  Move to the next state
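The filter/listener split above can be illustrated with a toy sketch in plain Java (not the Ignite `ContinuousQuery` API; all names are invented). The remote filter decides on the "server" side which updates are shipped at all, and the local listener reacts only to the ones that pass:

```java
import java.util.*;
import java.util.function.*;

// Toy continuous query: a remote filter decides which updates are
// delivered, a local listener reacts to the ones that pass.
public class ContinuousQuerySketch {
    static Predicate<Integer> remoteFilter;
    static Consumer<Integer> localListener;

    static void put(int value) {               // a "cache update"
        if (remoteFilter.test(value))          // filtered where the data lives
            localListener.accept(value);       // delivered to the subscriber
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        remoteFilter = v -> v > 10;            // only ship "large" values
        localListener = seen::add;
        put(5); put(42); put(7); put(99);
        System.out.println(seen);              // [42, 99]
    }
}
```

Pushing the filter to where the data lives is the key design choice: uninteresting updates never cross the network, which is what makes continuous queries cheap enough to leave running.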
Eviction Policies  On-heap [cache level]  LRU : recommended when in doubt  FIFO : ignores the element access order  Sorted : entries ordered by key  Off-heap [data region level]  Random-LRU  Random-2-LRU  Persistence on [page replacement]  Random-LRU  Segmented-LRU  Clock
Persistent Store  CacheStoreAdapter is extendable  Read through  Write through  Write behind  Works behind the cache APIs
Data Distribution  Why distribute data?  Data size can grow beyond a single node’s limits  Load can exceed a single node’s processing limits  Solutions:  Partition the dataset  Migrate to a distributed database  Both involve a set of nodes : the topology
Data Distribution Soln.  Distribution Requirements:  Algorithm  Distribution Uniformity  Minimal disruption  Approaches:  Mod N  Consistent Hashing  Rendezvous(HRW)
Data Distribution in Ignite  Mapping partition to node  Rendezvous hashing  Cluster changes move partitions  Mapping key to partition  Mod N  Number of partitions is fixed  1024 by default
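The two-level scheme above can be sketched in plain Java: a key maps to one of a fixed number of partitions (mod N), and each partition maps to a node via rendezvous (highest-random-weight) hashing. The hash mix below is a stand-in for a real hash function, and the node names are invented; the point is the minimal-disruption property from the previous slide: when a node leaves, only the partitions it owned move.

```java
import java.util.*;

// Sketch of Ignite-style distribution: key -> partition by mod N,
// partition -> node by rendezvous hashing (each node "bids" a weight
// per partition; the highest bidder owns it).
public class RendezvousSketch {
    static final int PARTITIONS = 1024;        // fixed, as in Ignite's default

    static int partition(Object key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    static String nodeFor(int partition, List<String> nodes) {
        String best = null;
        long bestWeight = Long.MIN_VALUE;
        for (String node : nodes) {
            long w = Objects.hash(partition, node);   // stand-in for a real hash mix
            if (w > bestWeight) { bestWeight = w; best = node; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> before = List.of("n1", "n2", "n3");
        List<String> after  = List.of("n1", "n2");    // n3 left the cluster
        int moved = 0;
        for (int p = 0; p < PARTITIONS; p++)
            if (!nodeFor(p, before).equals(nodeFor(p, after))) moved++;
        // Only partitions that lived on n3 move; the rest keep their owner.
        System.out.println(moved + " of " + PARTITIONS + " partitions moved");
    }
}
```

With plain mod-N over the node count instead, removing one node would reshuffle almost every key, which is why rendezvous hashing is used for the partition-to-node step.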
Data Rebalancing  Triggered when a new node joins the grid  In-memory grids start rebalancing immediately  Enabled manually when persistence is enabled  Possibly more backups than configured in such scenarios  Rebalance Modes  SYNC : cache calls are blocked until rebalancing completes  ASYNC : rebalancing happens in the background; the cache responds immediately  NONE : no rebalancing; the cache is loaded on demand or by explicit loading
Partition Map Exchange  Triggered when partitions need to be moved across nodes  A node joins/leaves the cluster  A new cache is created/stopped  An index is created, etc.  Cluster waits for ongoing operations  Oldest/youngest node acts as coordinator
Native Storage Architecture  Work directory  Binary data : internal metadata  Marshaller : marshaller info  DB  Lock file : used to ensure the node lock  Node dir.(s) : cache partitions  cp dir. : checkpoint start/end markers  WAL dir.  Node dir.(s) : WAL segments  Archive dir.  Node dir.(s) : WAL segments
Dirty Pages  Pages always live on disk, optionally in RAM  Each cache update is written to RAM and appended to the WAL  Cache operations cause dirty pages  Dirty pages accumulate in RAM  Checkpoint : a batch of dirty pages written to disk  WAL file cleared after a checkpoint  Updates between checkpoints are logged  Node crashes between checkpoints?  WAL to the rescue
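The crash-recovery logic above can be modeled in a few lines of plain Java (a conceptual sketch, not Ignite's page store): every update goes to RAM and the WAL, a checkpoint flushes state to "disk" and truncates the WAL, and recovery after a crash replays the WAL over the last checkpoint.

```java
import java.util.*;

// Toy WAL + checkpoint model: recovery = last checkpoint + WAL replay.
public class WalSketch {
    Map<String, Integer> ram = new HashMap<>();            // page cache
    Map<String, Integer> disk = new HashMap<>();           // last checkpoint
    List<Map.Entry<String, Integer>> wal = new ArrayList<>();

    void update(String key, int value) {
        wal.add(Map.entry(key, value));                    // WAL append first
        ram.put(key, value);                               // then the dirty page
    }

    void checkpoint() {                                    // flush dirty pages
        disk = new HashMap<>(ram);
        wal.clear();                                       // WAL can now be truncated
    }

    Map<String, Integer> recover() {                       // after a crash
        Map<String, Integer> state = new HashMap<>(disk);
        for (Map.Entry<String, Integer> e : wal)           // replay logged updates
            state.put(e.getKey(), e.getValue());
        return state;
    }

    public static void main(String[] args) {
        WalSketch w = new WalSketch();
        w.update("a", 1);
        w.checkpoint();
        w.update("a", 2); w.update("b", 3);                // "crash" happens here
        System.out.println(w.recover());                   // pre-crash state rebuilt
    }
}
```

Writing to the WAL before the page is the essential ordering: if the node dies at any point, everything acknowledged is either in the last checkpoint or replayable from the log.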
Apache Ignite ~ Cassandra  Insert and update performance is comparable  Read and mixed (read + update) workloads are 2x+ better in Ignite  Cassandra UPDATE outperforms under high load  Cassandra demands upfront query patterns  Major model changes/new tables if  Query changes are required  New queries with different requirements are needed  Ignite supports collocated/non-collocated joins, hence  Queries can be written just like old-school SQL  No major changes required except creating a few indexes if needed  Check the reference slide for more
Next steps  Read the docs  Get hands dirty with Ignite  Explore queries  Ignite compute tasks  Native persistence  Third-party persistence
References  https://ignite.apache.org/docs/latest/  https://www.youtube.com/watch?v=eMs_2vEsbBk  https://dzone.com/articles/apache-ignite-client-connectors-variety  https://apacheignite.readme.io/docs/leader-election  https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood  https://data-science-blog.com/blog/2020/09/25/in-memory-data-grid-vs-distributed-cache-which-is-best/  https://hazelcast.com/blog/imdg-vs-imdb-a-business-level-perspective/  https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing
Questions

Editor's Notes

  • #21 https://ignite.apache.org/docs/latest/memory-configuration/replacement-policies