Deep dive into the native multi-model database ArangoDB
Frank Celler
Percona Live 2016, Santa Clara, 20 April 2016
www.arangodb.com
Overview
ArangoDB is a multi-model database
Features: ArangoDB
- is a document store, a key/value store and a graph database,
- offers convenient queries (via HTTP/REST and AQL), including joins between different collections, and graph queries,
- with configurable consistency guarantees using transactions.
=> Allows polyglot persistence with multiple instances of a single technology.
ArangoDB is extensible by JavaScript code
The Foxx Microservice Framework allows you to extend the HTTP/REST API with your own routes, which you implement in JavaScript running on the database server, with direct access to the C++ DB engine.
Unprecedented possibilities for data-centric services:
- custom-made complex queries or authorizations
- schema validation
- push feeds, etc.
ArangoDB is a Data Center Operating System app
These days, computing clusters run Data Center Operating Systems.
Idea: distributed applications can be deployed as easily as one installs a mobile app on a phone.
- Cluster resource management is automatic.
- This leads to significantly better resource utilization.
- Fault tolerance, self-healing and automatic failover are guaranteed.
Details
The Multi-Model Approach
Multi-model database: a multi-model database combines a document store with a graph database and is at the same time a key/value store, with a common query language for all three data models.
Important:
- it is able to compete with specialised products on their turf,
- it allows for polyglot persistence using a single database technology.
In a microservice architecture, there will be several different deployments.
Performance
https://www.arangodb.com/2015/10/benchmark-postgresql-mongodb-arangodb/
Why is multi-model possible at all?
Document stores and key/value stores
- Document stores have a primary key, so they are key/value stores.
- Without using secondary indexes, performance is nearly as good as with opaque data instead of JSON.
- Good horizontal scalability can be achieved for key lookups.
Horizontal scalability
Experiment: single document writes (1 kB per document) on clusters of 8 to 80 machines (64 to 640 vCPUs), with another 4 to 40 load servers, running on AWS.
https://mesosphere.com/blog/2015/11/30/arangodb-benchmark-dcos/
Why is multi-model possible at all?
Document stores and graph databases
A graph database:
- would like to associate arbitrary data with vertices and edges, so JSON documents are a good choice;
- needs a good edge index, giving fast access to neighbours (this can be a secondary index);
- needs graph support in the query language;
- needs implementations of graph algorithms in the DB engine.
https://www.arangodb.com/2016/04/index-free-adjacency-hybrid-indexes-graph-databases/
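The edge index idea above can be sketched in a few lines of Python. This is only an illustration of "fast access to neighbours via a secondary index", not ArangoDB's implementation: the class `EdgeIndex`, its method names and the sample documents are hypothetical, and the only thing taken from ArangoDB is that edge documents carry `_from` and `_to` attributes.

```python
from collections import defaultdict


class EdgeIndex:
    """Toy edge index (hypothetical): hash lookups by source and by target vertex."""

    def __init__(self):
        self._by_from = defaultdict(list)
        self._by_to = defaultdict(list)

    def insert(self, edge):
        # An edge is an ordinary JSON-like document with _from and _to.
        self._by_from[edge["_from"]].append(edge)
        self._by_to[edge["_to"]].append(edge)

    def outbound(self, vertex):
        # Neighbours reachable by following edges that start at `vertex`.
        return [e["_to"] for e in self._by_from.get(vertex, [])]

    def inbound(self, vertex):
        # Neighbours that have an edge pointing at `vertex`.
        return [e["_from"] for e in self._by_to.get(vertex, [])]


index = EdgeIndex()
index.insert({"_from": "cities/berlin", "_to": "countries/germany", "type": "is-in"})
index.insert({"_from": "cities/cologne", "_to": "countries/germany", "type": "is-in"})
index.insert({"_from": "countries/germany", "_to": "continents/europe", "type": "is-in"})
```

Both lookup directions cost one hash access plus the size of the answer, independent of the total number of edges, which is the property a graph traversal relies on.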
Replication and Sharding
ArangoDB provides (Version 2.8, January 2016):
- sharding with automatic data distribution,
- easy setup of (asynchronous) replication (cluster and single),
- fault tolerance by automatic failover,
- full integration with Apache Mesos and Mesosphere DC/OS.
Work in progress (Version 3.0, RC in April 2016):
- synchronous replication in cluster mode,
- zero administration by a self-repairing and self-balancing cluster architecture.
Data-Center Operating Systems
Resource management:
- installation should be as easy as possible,
- integration into the resource management of the data center gives better resource utilisation,
- full integration with Apache Mesos and Mesosphere DC/OS.
Work in progress:
- Mesosphere DC/OS is a very mature, open-source solution,
- later this year, integration also for Kubernetes and Docker Swarm.
About Mesosphere’s DC/OS https://dcos.io
Installing Mesosphere’s DC/OS https://dcos.io
Powerful query language
AQL, the built-in Arango Query Language, allows complex, powerful and convenient queries:
- with transaction semantics,
- with support for joins,
- independent of the driver used,
- with protection against injections by design.
Extensible through JavaScript
The Foxx Microservice Framework allows you to extend the HTTP/REST API with your own routes, which you implement in JavaScript running on the database server, with direct access to the C++ DB engine.
Unprecedented possibilities for data-centric services:
- complex queries or authorizations, schema validation, push feeds, etc.,
- easy deployment via web interface or REST API,
- automatic API description through Swagger => discoverability of services.
Use Cases
Use case: Aircraft fleet management
One of our customers uses ArangoDB to
- store each part, component, unit or aircraft as a document,
- model containment as a graph,
- thus easily find all parts of some component,
- keep track of maintenance intervals,
- perform queries orthogonal to the graph structure,
thereby getting good efficiency for all needed queries.
http://radar.oreilly.com/2015/07/data-modeling-with-multi-model-databases.html
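As a rough illustration of "model containment as a graph, thus easily find all parts of some component", here is a breadth-first traversal over a containment graph in Python. The identifiers (`aircraft/A1`, `unit/engine-1`, and so on) and the adjacency dictionary are made up for this sketch; in the database the same question would be a graph traversal over an edge collection.

```python
from collections import deque

# Hypothetical containment edges: parent -> directly contained children.
contains = {
    "aircraft/A1": ["unit/engine-1", "unit/engine-2"],
    "unit/engine-1": ["component/turbine-1"],
    "unit/engine-2": [],
    "component/turbine-1": ["part/blade-1", "part/blade-2"],
}


def all_parts(root):
    """Return everything contained in `root`, directly or indirectly."""
    seen = set()
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in contains.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

Calling `all_parts("unit/engine-1")` walks down from the engine and collects the turbine and its blades without the caller needing to know how deep the containment hierarchy is.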
Use case: rights management
Rights management in a relational model is hard:
- it looks like a forest at first,
- then exceptions pop up:
  - one company sub-contracts another for a special station,
  - an engineer works for two companies,
  - someone needs special permissions when acting as a proxy.
This is much easier to express as a graph structure.
Use case: e-commerce
AboutYou uses ArangoDB to
- create channels showing new products
- allow recommendation to friends
- celebrities presenting new fashion
- blog about fashion products
- nightly business analysis
- news stream
https://www.arangodb.com/case-studies/aboutyou-data-driven-personalization-with-arangodb/
Action
First deployment: a simple key/value store
One collection “data”, with indexes on “value” (sorted) and “name” (hash).
- Single document requests
- Indexes possible
- Range queries possible
Second deployment: a microservice as a Foxx app
Simple TODO app, deployed from the app store with the web UI.
- REST/JSON API available
- Swagger generates the API description automatically
Third deployment: a single-server graph database
Graph “worldCountry” with vertex collection “worldVertex” and edge collection “worldEdges”, with links from cities to countries to continents to world.
- Show some graph traversals.
- Show graph viewer.
Fourth deployment: a multi-model application
Some data from a web shop.
- Show some queries.
AQL Internals
Life of a query
1. Text and query parameters come from the user.
2. Parse the text, produce an abstract syntax tree (AST).
3. Substitute query parameters.
4. First optimisation: constant expressions, etc.
5. Translate the AST into an execution plan (EXP).
6. Optimise one EXP, produce many, potentially better EXPs.
7. Reason about distribution in the cluster.
8. Optimise distributed EXPs.
9. Estimate costs for all EXPs, and sort by ascending cost.
10. Instantiate the “cheapest” plan, i.e. set up the execution engine.
11. Distribute and link up engines on different servers.
12. Execute the plan, provide a cursor API.
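The cost-based selection step ("estimate costs for all EXPs, and sort by ascending cost", then instantiate the cheapest) can be sketched as follows. The `Plan` class, the cost model (sum of the estimated item counts per node) and the numbers are assumptions made for this example; the slides do not describe the real cost function.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    # Assumed cost model: estimated number of items each node has to process.
    node_item_counts: list

    def estimated_cost(self):
        return sum(self.node_item_counts)


def choose_plan(plans):
    """Sort candidate plans by ascending estimated cost and pick the first."""
    ranked = sorted(plans, key=lambda p: p.estimated_cost())
    return ranked[0], ranked


candidates = [
    Plan("full scan, then filter, then sort", [1000, 1000, 50]),
    Plan("index range scan", [50, 50]),
    Plan("full scan with early filter", [1000, 50, 50]),
]
best, ranked = choose_plan(candidates)
```

The optimiser only needs a consistent estimate to rank alternatives; the absolute values matter less than their order.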
Execution plans
Query → EXP

    FOR a IN collA
      LET xx = a.x
      FOR b IN collB
        FILTER xx == b.y
        RETURN {x: a.x, z: b.z}

Plan nodes, in pipeline order (each node depends on the one before it):
Singleton → EnumerateCollection a → Calculation xx → EnumerateCollection b → Calculation xx == b.y → Filter xx == b.y → Calc {x: a.x, z: b.z} → Return {x: a.x, z: b.z}

- Black arrows are dependencies.
- Think of a pipeline.
- Each node provides a cursor API.
- Blocks of “items” travel through the pipeline.
- What is an “item”?
Pipeline and items

    FOR a IN collA
      LET xx = a.x
      FOR b IN collB

Plan: Singleton → EnumerateCollection a → Calculation xx → EnumerateCollection b
After Singleton, items have no vars. After Calculation xx, items have vars a, xx.

- Items are the things travelling through the pipeline.
- An item holds the values of those variables in the current frame.
- Thus: items look different in different parts of the plan.
- We always deal with blocks of items for performance reasons.
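One way to picture items and blocks is a chain of Python generators, where each stage pulls blocks from the stage it depends on. This is a sketch under assumptions: the function names, the block size of 2 and the sample collections are invented for the example, and a generator is only a stand-in for the cursor API mentioned above.

```python
def singleton():
    # One block holding one item that has no variables yet.
    yield [{}]


def enumerate_collection(upstream, var, collection, block_size=2):
    # For every incoming item, emit one item per document with `var` bound.
    block = []
    for in_block in upstream:
        for item in in_block:
            for doc in collection:
                block.append({**item, var: doc})
                if len(block) == block_size:
                    yield block
                    block = []
    if block:
        yield block


def calculation(upstream, var, fn):
    # Add one computed variable to every item, block by block.
    for in_block in upstream:
        yield [{**item, var: fn(item)} for item in in_block]


coll_a = [{"x": 1}, {"x": 2}, {"x": 3}]
coll_b = [{"y": 1, "z": "p"}, {"y": 3, "z": "q"}]


def pipeline_up_to_calc():
    stage = enumerate_collection(singleton(), "a", coll_a)
    return calculation(stage, "xx", lambda item: item["a"]["x"])


after_calc = list(pipeline_up_to_calc())
after_b = list(enumerate_collection(pipeline_up_to_calc(), "b", coll_b))
```

Items in `after_calc` carry the variables `a` and `xx`; after the second enumeration they also carry `b`, which is the sense in which items look different in different parts of the plan.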
Move filters up
Before:

    FOR a IN collA
      FOR b IN collB
        FILTER a.x == 10
        FILTER a.u == b.v
        RETURN {u:a.u,w:b.w}

Plan: Singleton → EnumColl a → EnumColl b → Calc a.x == 10 → Filter a.x == 10 → Calc a.u == b.v → Filter a.u == b.v → Return {u:a.u,w:b.w}

After:

    FOR a IN collA
      FILTER a.x == 10
      FOR b IN collB
        FILTER a.u == b.v
        RETURN {u:a.u,w:b.w}

Plan: Singleton → EnumColl a → Calc a.x == 10 → Filter a.x == 10 → EnumColl b → Calc a.u == b.v → Filter a.u == b.v → Return {u:a.u,w:b.w}

- The result and behaviour do not change if the first FILTER is pulled out of the inner FOR.
- However, the number of items travelling in the pipeline is decreased.
- Note that the two FOR statements could be interchanged!
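The effect of this rule can be checked with a small Python experiment that mirrors the two query shapes as nested loops. The sample data and the `travelling` counter (the number of (a, b) items that reach the inner part of the pipeline) are assumptions made for this illustration.

```python
coll_a = [{"x": 10, "u": 1}, {"x": 5, "u": 2}, {"x": 10, "u": 3}, {"x": 7, "u": 1}]
coll_b = [{"v": 1, "w": "a"}, {"v": 3, "w": "b"}, {"v": 4, "w": "c"}]


def filter_inside():
    """Both filters sit inside the inner FOR."""
    travelling = 0
    result = []
    for a in coll_a:
        for b in coll_b:
            travelling += 1  # every (a, b) pair reaches the filters
            if a["x"] == 10 and a["u"] == b["v"]:
                result.append({"u": a["u"], "w": b["w"]})
    return result, travelling


def filter_moved_up():
    """The filter on a.x is applied before enumerating collB."""
    travelling = 0
    result = []
    for a in coll_a:
        if a["x"] != 10:
            continue  # this item never enters the inner FOR
        for b in coll_b:
            travelling += 1
            if a["u"] == b["v"]:
                result.append({"u": a["u"], "w": b["w"]})
    return result, travelling


before = filter_inside()
after = filter_moved_up()
```

Both variants return the same documents, but with the filter moved up only half as many items reach the inner loop for this data set.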
Remove unnecessary calculations
Before:

    FOR a IN collA
      LET L = LENGTH(a.hobbies)
      FOR b IN collB
        FILTER a.u == b.v
        RETURN {h:a.hobbies,w:b.w}

Plan: Singleton → EnumColl a → Calc L = ... → EnumColl b → Calc a.u == b.v → Filter a.u == b.v → Return {...}

After:

    FOR a IN collA
      FOR b IN collB
        FILTER a.u == b.v
        RETURN {h:a.hobbies,w:b.w}

Plan: Singleton → EnumColl a → EnumColl b → Calc a.u == b.v → Filter a.u == b.v → Return {...}

- The calculation of L is unnecessary, because L is never used.
- Since the calculation cannot throw an exception, we can just leave it out.
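A minimal sketch of such a rule, assuming a plan is a flat list of node dictionaries with hypothetical `out`, `uses` and `can_throw` fields (none of these names come from ArangoDB): a calculation is dropped when nothing reads its output and it cannot throw.

```python
def remove_unused_calculations(plan):
    """Drop Calc nodes whose output is never read and that cannot throw."""
    used = set()
    for node in plan:
        used.update(node.get("uses", ()))
    kept = []
    for node in plan:
        is_dead_calc = (
            node["type"] == "Calc"
            and node["out"] not in used
            and not node.get("can_throw", False)
        )
        if not is_dead_calc:
            kept.append(node)
    return kept


plan = [
    {"type": "Singleton"},
    {"type": "EnumColl", "out": "a"},
    {"type": "Calc", "out": "L", "uses": ["a"]},  # LET L = LENGTH(a.hobbies)
    {"type": "EnumColl", "out": "b"},
    {"type": "Calc", "out": "cond", "uses": ["a", "b"]},  # a.u == b.v
    {"type": "Filter", "uses": ["cond"]},
    {"type": "Calc", "out": "result", "uses": ["a", "b"]},
    {"type": "Return", "uses": ["result"]},
]
optimised = remove_unused_calculations(plan)
```

This sketch makes a single pass; removing one calculation can make another one unused, so a real rule would presumably repeat until nothing changes.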
Use index for FILTER and SORT

    FOR a IN collA
      FILTER a.x > 17 && a.x <= 23 && a.y == 10
      SORT a.y, a.x
      RETURN a

Original plan: Singleton → EnumColl a → Calc ... → Filter ... → Sort a.y, a.x → Return a

- Assume collA has a skiplist index on “y” and “x” (in this order). Then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index.
  Plan: Singleton → IndexRange a → Sort a.y, a.x → Return a
- The result will automatically be sorted by y and then by x, so the Sort node is no longer needed.
  Plan: Singleton → IndexRange a → Return a
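The range read can be imitated in Python with a sorted list and the standard `bisect` module standing in for the skiplist. The sample documents and the helper name `index_range` are assumptions for this sketch; the point is only that a sorted compound index on (y, x) yields the half-open interval directly and in sorted order.

```python
import bisect

docs = [
    {"x": 17, "y": 10},
    {"x": 23, "y": 10},
    {"x": 20, "y": 10},
    {"x": 18, "y": 9},
    {"x": 19, "y": 11},
    {"x": 24, "y": 10},
    {"x": 18, "y": 10},
]

# Stand-in for a skiplist index on "y" and "x" (in this order).
index = sorted(docs, key=lambda d: (d["y"], d["x"]))
keys = [(d["y"], d["x"]) for d in index]


def index_range(y, x_low_exclusive, x_high_inclusive):
    """Documents with d.y == y and x_low_exclusive < d.x <= x_high_inclusive."""
    lo = bisect.bisect_right(keys, (y, x_low_exclusive))   # skips x == low
    hi = bisect.bisect_right(keys, (y, x_high_inclusive))  # keeps x == high
    return index[lo:hi]


result = index_range(10, 17, 23)
```

Because the slice comes straight out of the sorted index, no separate filter or sort step is needed afterwards.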
Data distribution in a cluster
[Figure: requests arrive at the coordinators, which talk to the DB servers holding the numbered shards]
- The shards of a collection are distributed across the DB servers.
- The coordinators receive queries and organise their execution.
Scatter/gather
[Figure: a single EnumerateCollection node is replaced by a Scatter node, one Remote → EnumShard → Remote branch per shard, and a Concat/Merge node]
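The scatter/gather shape can be sketched in Python: the same sub-plan runs against every shard, and the partial results are merged on the gathering side. The shard contents, server names and function names are hypothetical, and network transport (the Remote nodes) is left out entirely.

```python
import heapq

# Hypothetical shards of one collection, keyed by the DB server holding them.
shards = {
    "DBserver1": [{"k": 5}, {"k": 1}],
    "DBserver2": [{"k": 4}, {"k": 2}, {"k": 9}],
    "DBserver3": [{"k": 3}],
}


def enum_shard(docs, predicate, sort_key):
    # Runs on the DB server: filter and sort only the local shard.
    return sorted((d for d in docs if predicate(d)), key=sort_key)


def scatter_gather(shards, predicate, sort_key):
    # Scatter: send the same sub-plan to every shard.
    partials = [enum_shard(docs, predicate, sort_key) for docs in shards.values()]
    # Gather: merge the already sorted partial results into one sorted stream.
    return list(heapq.merge(*partials, key=sort_key))


merged = scatter_gather(shards, lambda d: d["k"] < 9, lambda d: d["k"])
```

If no ordering is required, the gather step can simply concatenate the partial results; merging is only needed to preserve a sort order across shards.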
Links
https://www.arangodb.com
https://docs.arangodb.com/cookbook/index.html
https://github.com/ArangoDB/guesser
http://mesos.apache.org/
https://mesosphere.com/
https://mesosphere.github.io/marathon/
https://dcos.io

More Related Content

What's hot (20)

PDF
Introduction and overview ArangoDB query language AQL
ArangoDB Database
 
PDF
Introduction to ArangoDB (nosql matters Barcelona 2012)
ArangoDB Database
 
PDF
Introduction to Foxx by our community member Iskandar Soesman @ikandars
ArangoDB Database
 
PDF
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
ArangoDB Database
 
PDF
Multi-model databases and node.js
Max Neunhöffer
 
PDF
FOXX - a Javascript application framework on top of ArangoDB
ArangoDB Database
 
PDF
Scaling ArangoDB on Mesosphere DCOS
Max Neunhöffer
 
PDF
AvocadoDB query language (DRAFT!)
avocadodb
 
PDF
guacamole: an Object Document Mapper for ArangoDB
Max Neunhöffer
 
PDF
ArangoDB – A different approach to NoSQL
ArangoDB Database
 
PDF
Introduction to column oriented databases
ArangoDB Database
 
PDF
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...
ArangoDB Database
 
PDF
An introduction to multi-model databases
Berta Hermida Plaza
 
PPTX
ReactJS
Ram Murat Sharma
 
PPTX
MongoDB - A next-generation database that lets you create applications never ...
Ram Murat Sharma
 
PDF
Complex queries in a distributed multi-model database
Max Neunhöffer
 
PDF
Experience with C++11 in ArangoDB
Max Neunhöffer
 
PDF
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB Database
 
PDF
ArangoDB
ArangoDB Database
 
Introduction and overview ArangoDB query language AQL
ArangoDB Database
 
Introduction to ArangoDB (nosql matters Barcelona 2012)
ArangoDB Database
 
Introduction to Foxx by our community member Iskandar Soesman @ikandars
ArangoDB Database
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
ArangoDB Database
 
Multi-model databases and node.js
Max Neunhöffer
 
FOXX - a Javascript application framework on top of ArangoDB
ArangoDB Database
 
Scaling ArangoDB on Mesosphere DCOS
Max Neunhöffer
 
AvocadoDB query language (DRAFT!)
avocadodb
 
guacamole: an Object Document Mapper for ArangoDB
Max Neunhöffer
 
ArangoDB – A different approach to NoSQL
ArangoDB Database
 
Introduction to column oriented databases
ArangoDB Database
 
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...
ArangoDB Database
 
An introduction to multi-model databases
Berta Hermida Plaza
 
MongoDB - A next-generation database that lets you create applications never ...
Ram Murat Sharma
 
Complex queries in a distributed multi-model database
Max Neunhöffer
 
Experience with C++11 in ArangoDB
Max Neunhöffer
 
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB Database
 

Viewers also liked (20)

PDF
Why Plone Will Die
Andreas Jung
 
PDF
Why we love ArangoDB. The hunt for the right NosQL Database
Andreas Jung
 
PDF
Polyglot Persistence & Multi-Model Databases
ArangoDB Database
 
PDF
Creating Fault Tolerant Services on Mesos
ArangoDB Database
 
PDF
NoSQL-Datenbanken am Beispiel CouchDB
Kerstin Puschke
 
PDF
Software + Babies
ArangoDB Database
 
PDF
Domain driven design @FrOSCon
ArangoDB Database
 
PDF
Polyglot Persistence & Multi Model-Databases at JMaghreb3.0
ArangoDB Database
 
PDF
Extensibility of a database api with js
ArangoDB Database
 
PDF
Guacamole
ArangoDB Database
 
PDF
Creating data centric microservices
ArangoDB Database
 
PDF
Microservice-based software architecture
ArangoDB Database
 
PDF
Polyglot Persistence & Multi-Model Databases (FullStack Toronto)
ArangoDB Database
 
PDF
Processing large-scale graphs with Google(TM) Pregel
ArangoDB Database
 
PDF
Performance comparison: Multi-Model vs. MongoDB and Neo4j
ArangoDB Database
 
PDF
Handling Billions of Edges in a Graph Database
ArangoDB Database
 
PDF
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Big Data Spain
 
PPT
Optimal Algorithm
guest628caa
 
PPT
Lru Counter
guest628caa
 
PPT
Lru Stack
guest628caa
 
Why Plone Will Die
Andreas Jung
 
Why we love ArangoDB. The hunt for the right NosQL Database
Andreas Jung
 
Polyglot Persistence & Multi-Model Databases
ArangoDB Database
 
Creating Fault Tolerant Services on Mesos
ArangoDB Database
 
NoSQL-Datenbanken am Beispiel CouchDB
Kerstin Puschke
 
Software + Babies
ArangoDB Database
 
Domain driven design @FrOSCon
ArangoDB Database
 
Polyglot Persistence & Multi Model-Databases at JMaghreb3.0
ArangoDB Database
 
Extensibility of a database api with js
ArangoDB Database
 
Creating data centric microservices
ArangoDB Database
 
Microservice-based software architecture
ArangoDB Database
 
Polyglot Persistence & Multi-Model Databases (FullStack Toronto)
ArangoDB Database
 
Processing large-scale graphs with Google(TM) Pregel
ArangoDB Database
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
ArangoDB Database
 
Handling Billions of Edges in a Graph Database
ArangoDB Database
 
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Big Data Spain
 
Optimal Algorithm
guest628caa
 
Lru Counter
guest628caa
 
Lru Stack
guest628caa
 
Ad

Similar to Deep dive into the native multi model database ArangoDB (20)

PDF
Oslo bekk2014
Max Neunhöffer
 
PDF
Is multi-model the future of NoSQL?
Max Neunhöffer
 
PDF
Fishing Graphs in a Hadoop Data Lake
ArangoDB Database
 
PDF
Multi model-databases
Michael Hackstein
 
PDF
Oslo baksia2014
Max Neunhöffer
 
PDF
An introduction to multi-model databases
ArangoDB Database
 
PDF
Guacamole Fiesta: What do avocados and databases have in common?
ArangoDB Database
 
PDF
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Big Data Spain
 
PDF
Fishing Graphs in a Hadoop Data Lake
DataWorks Summit/Hadoop Summit
 
PDF
Backbone using Extensible Database APIs over HTTP
Max Neunhöffer
 
PDF
An E-commerce App in action built on top of a Multi-model Database
ArangoDB Database
 
PDF
Oracle Week 2016 - Modern Data Architecture
Arthur Gimpel
 
PPTX
Python Ireland Conference 2016 - Python and MongoDB Workshop
Joe Drumgoole
 
PPTX
MongoDB: An Introduction - june-2011
Chris Westin
 
PDF
Building powerful apps with ArangoDB & KeyLines
Cambridge Intelligence
 
PPTX
MongoDB: An Introduction - July 2011
Chris Westin
 
PDF
Mongo db transcript
foliba
 
PPTX
NOSQL and MongoDB Database
Tariqul islam
 
PPT
Wmware NoSQL
Murat Çakal
 
PDF
Implementing data center to data center replication for a distributed database
J On The Beach
 
Oslo bekk2014
Max Neunhöffer
 
Is multi-model the future of NoSQL?
Max Neunhöffer
 
Fishing Graphs in a Hadoop Data Lake
ArangoDB Database
 
Multi model-databases
Michael Hackstein
 
Oslo baksia2014
Max Neunhöffer
 
An introduction to multi-model databases
ArangoDB Database
 
Guacamole Fiesta: What do avocados and databases have in common?
ArangoDB Database
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Big Data Spain
 
Fishing Graphs in a Hadoop Data Lake
DataWorks Summit/Hadoop Summit
 
Backbone using Extensible Database APIs over HTTP
Max Neunhöffer
 
An E-commerce App in action built on top of a Multi-model Database
ArangoDB Database
 
Oracle Week 2016 - Modern Data Architecture
Arthur Gimpel
 
Python Ireland Conference 2016 - Python and MongoDB Workshop
Joe Drumgoole
 
MongoDB: An Introduction - june-2011
Chris Westin
 
Building powerful apps with ArangoDB & KeyLines
Cambridge Intelligence
 
MongoDB: An Introduction - July 2011
Chris Westin
 
Mongo db transcript
foliba
 
NOSQL and MongoDB Database
Tariqul islam
 
Wmware NoSQL
Murat Çakal
 
Implementing data center to data center replication for a distributed database
J On The Beach
 
Ad

More from ArangoDB Database (18)

PPTX
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ArangoDB Database
 
PPTX
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
ArangoDB Database
 
PPTX
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
ArangoDB Database
 
PPTX
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB Database
 
PDF
GraphSage vs Pinsage #InsideArangoDB
ArangoDB Database
 
PDF
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
ArangoDB Database
 
PDF
Graph Analytics with ArangoDB
ArangoDB Database
 
PDF
Getting Started with ArangoDB Oasis
ArangoDB Database
 
PDF
Custom Pregel Algorithms in ArangoDB
ArangoDB Database
 
PPTX
Hacktoberfest 2020 - Intro to Knowledge Graphs
ArangoDB Database
 
PDF
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
ArangoDB Database
 
PDF
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoDB Database
 
PDF
Webinar: What to expect from ArangoDB Oasis
ArangoDB Database
 
PDF
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB Database
 
PDF
3.5 webinar
ArangoDB Database
 
PDF
Running complex data queries in a distributed system
ArangoDB Database
 
PPTX
Are you a Tortoise or a Hare?
ArangoDB Database
 
PDF
The Computer Science Behind a modern Distributed Database
ArangoDB Database
 
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
ArangoDB Database
 
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB Database
 
GraphSage vs Pinsage #InsideArangoDB
ArangoDB Database
 
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
ArangoDB Database
 
Graph Analytics with ArangoDB
ArangoDB Database
 
Getting Started with ArangoDB Oasis
ArangoDB Database
 
Custom Pregel Algorithms in ArangoDB
ArangoDB Database
 
Hacktoberfest 2020 - Intro to Knowledge Graphs
ArangoDB Database
 
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
ArangoDB Database
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoDB Database
 
Webinar: What to expect from ArangoDB Oasis
ArangoDB Database
 
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB Database
 
3.5 webinar
ArangoDB Database
 
Running complex data queries in a distributed system
ArangoDB Database
 
Are you a Tortoise or a Hare?
ArangoDB Database
 
The Computer Science Behind a modern Distributed Database
ArangoDB Database
 

Recently uploaded (20)

PDF
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
PDF
Staying Human in a Machine- Accelerated World
Catalin Jora
 
PPTX

Deep dive into the native multi-model database ArangoDB

  • 1. Deep dive into the native multi-model database ArangoDB. Frank Celler, Percona Live 2016, Santa Clara, 20 April 2016. www.arangodb.com
  • 4. ArangoDB is a multi-model database. Features: it is a document store, a key/value store and a graph database; it offers convenient queries (via HTTP/REST and AQL), including joins between different collections and graph queries, with configurable consistency guarantees using transactions. ⇒ This allows polyglot persistence with multiple instances of a single technology.
  • 6. ArangoDB is extensible by JavaScript code. The Foxx Microservice Framework allows you to extend the HTTP/REST API with your own routes, which you implement in JavaScript running on the database server, with direct access to the C++ DB engine. This opens up unprecedented possibilities for data-centric services: custom-made complex queries or authorizations, schema validation, push feeds, etc.
  • 9. ArangoDB is a Data Center Operating System app. These days, computing clusters run Data Center Operating Systems. Idea: distributed applications can be deployed as easily as one installs a mobile app on a phone. Cluster resource management is automatic, which leads to significantly better resource utilization. Fault tolerance, self-healing and automatic failover are guaranteed.
  • 12. The Multi-Model Approach. A multi-model database combines a document store with a graph database and is at the same time a key/value store, with a common query language for all three data models. Important: it is able to compete with specialised products on their turf, and it allows for polyglot persistence using a single database technology. In a microservice architecture, there will be several different deployments.
  • 14. Why is multi-model possible at all? Document stores and key/value stores. Document stores have a primary key, so they are key/value stores. Without using secondary indexes, performance is nearly as good as with opaque data instead of JSON. Good horizontal scalability can be achieved for key lookups.
  • 15. Horizontal scalability. Experiment: single-document writes (1 kB per document) on clusters of 8 to 80 machines (64 to 640 vCPUs), with another 4 to 40 load servers, running on AWS. https://mesosphere.com/blog/2015/11/30/arangodb-benchmark-dcos/
  • 16. Why is multi-model possible at all? Document stores and graph databases. A graph database would like to associate arbitrary data with vertices and edges, so JSON documents are a good choice. It also needs: a good edge index, giving fast access to neighbours (this can be a secondary index); graph support in the query language; and implementations of graph algorithms in the DB engine. https://www.arangodb.com/2016/04/index-free-adjacency-hybrid-indexes-graph-databases/
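To make the "edge index" idea concrete, here is a minimal sketch of one as two hash maps, from a vertex id to the edges leaving and entering it. It is a toy model under stated assumptions, not ArangoDB's implementation: the class `EdgeIndex`, its methods and the sample documents are hypothetical; only the `_from` and `_to` attribute names follow ArangoDB's edge documents.

```python
from collections import defaultdict


class EdgeIndex:
    """Toy edge index: hash lookups from a vertex id to its incident edges."""

    def __init__(self):
        self._out = defaultdict(list)  # _from -> edges leaving that vertex
        self._in = defaultdict(list)   # _to   -> edges entering that vertex

    def insert(self, edge):
        # An edge is an ordinary JSON-like document with _from and _to.
        self._out[edge["_from"]].append(edge)
        self._in[edge["_to"]].append(edge)

    def neighbours(self, vertex, direction="outbound"):
        # .get() avoids creating empty entries for unknown vertices.
        if direction == "outbound":
            return [e["_to"] for e in self._out.get(vertex, [])]
        if direction == "inbound":
            return [e["_from"] for e in self._in.get(vertex, [])]
        raise ValueError("direction must be 'outbound' or 'inbound'")


index = EdgeIndex()
index.insert({"_from": "cities/berlin", "_to": "countries/germany", "type": "is-in"})
index.insert({"_from": "cities/cologne", "_to": "countries/germany", "type": "is-in"})
index.insert({"_from": "countries/germany", "_to": "continents/europe", "type": "is-in"})

cities_in_germany = index.neighbours("countries/germany", direction="inbound")
```

Because both directions are indexed, finding the neighbours of a vertex costs one hash lookup plus the size of the answer, independent of the total number of edges.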
  • 18. Replication and Sharding. ArangoDB provides (version 2.8, January 2016): sharding with automatic data distribution; easy setup of (asynchronous) replication (cluster and single); fault tolerance by automatic failover; full integration with Apache Mesos and Mesosphere DC/OS. Work in progress (version 3.0, RC in April 2016): synchronous replication in cluster mode; zero administration by a self-repairing and self-balancing cluster architecture.
  • 20. Data-Center Operating Systems. Resource management: installation should be as easy as possible; integration into the resource management of the data center gives better resource utilisation; full integration with Apache Mesos and Mesosphere DC/OS. Work in progress: Mesosphere DC/OS is a very mature, open-source solution; later this year, integration also for Kubernetes and Docker Swarm.
  • 24. Powerful query language: AQL. The built-in Arango Query Language allows complex, powerful and convenient queries with transaction semantics, including joins. AQL is independent of the driver used and offers protection against injections by design.
  • 26. Extensible through JavaScript. The Foxx Microservice Framework allows you to extend the HTTP/REST API with your own routes, which you implement in JavaScript running on the database server, with direct access to the C++ DB engine. Unprecedented possibilities for data-centric services: complex queries or authorizations, schema validation, push feeds, etc.; easy deployment via web interface or REST API; automatic API description through Swagger ⇒ discoverability of services.
  • 29. Use case: Aircraft fleet management. One of our customers uses ArangoDB to: store each part, component, unit or aircraft as a document; model containment as a graph; thus easily find all parts of some component; keep track of maintenance intervals; perform queries orthogonal to the graph structure; thereby getting good efficiency for all needed queries. http://radar.oreilly.com/2015/07/data-modeling-with-multi-model-databases.html
  • 31. Use case: rights management. Rights management in a relational model is hard: it looks like a forest at first, then exceptions pop up: one company sub-contracts another for a special station; an engineer works for two companies; someone needs special permissions when acting as a proxy. This is much more easily expressed as a graph structure.
  • 33. Use case: e-commerce. AboutYou uses ArangoDB to: create channels showing new products; allow recommendations to friends; celebrities presenting new fashion; blog about fashion products; nightly business analysis; news stream. https://www.arangodb.com/case-studies/aboutyou-data-driven-personalization-with-arangodb/
  • 35. First deployment: a simple key/value store. One collection “data”, with indexes on “value” (sorted) and “name” (hash). Single-document requests. Indexes possible. Range queries possible.
  • 36. Second deployment: a microservice as a Foxx app. A Foxx microservice: a simple TODO app, deployed from the app store with the web UI. A REST/JSON API is available. Swagger generates the API description automatically.
  • 37. Third deployment: a single-server graph database. Graph “worldCountry” with vertex collection “worldVertex” and edge collection “worldEdges”; links from cities to countries to continents to world. Show some graph traversals. Show graph viewer.
  • 38. Fourth deployment: a multi-model application. Some data from a web shop. Show some queries.
  • 51. Life of a query
      ◦ Text and query parameters come from the user.
      ◦ Parse the text, produce an abstract syntax tree (AST).
      ◦ Substitute query parameters.
      ◦ First optimisation: constant expressions, etc.
      ◦ Translate the AST into an execution plan (EXP).
      ◦ Optimise one EXP, produce many, potentially better EXPs.
      ◦ Reason about distribution in the cluster.
      ◦ Optimise distributed EXPs.
      ◦ Estimate costs for all EXPs, and sort by ascending cost.
      ◦ Instantiate the “cheapest” plan, i.e. set up the execution engine.
      ◦ Distribute and link up engines on different servers.
      ◦ Execute the plan, provide a cursor API.
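The "estimate costs, sort by ascending cost, take the cheapest" steps can be sketched as follows. This is only an illustration under stated assumptions: the `Plan` class, the per-item cost factors in `COST_PER_ITEM` and the item counts are all hypothetical and are not taken from ArangoDB's optimiser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    nodes: tuple  # (node_type, estimated_input_items) pairs, illustrative only


# Hypothetical per-item cost factors; a real optimiser has its own cost model.
COST_PER_ITEM = {
    "EnumerateCollection": 1.0,
    "IndexRange": 0.1,
    "Filter": 0.2,
    "Sort": 2.0,
    "Return": 0.05,
}


def estimate_cost(plan):
    """Sum the estimated cost of every node in the plan."""
    return sum(COST_PER_ITEM[node_type] * items for node_type, items in plan.nodes)


def choose_plan(plans):
    """Sort candidate plans by ascending estimated cost; the first is the cheapest."""
    ranked = sorted(plans, key=estimate_cost)
    return ranked[0], ranked


full_scan = Plan(
    "full scan",
    (("EnumerateCollection", 10000), ("Filter", 10000), ("Sort", 50), ("Return", 50)),
)
index_scan = Plan("index scan", (("IndexRange", 50), ("Return", 50)))

cheapest, ranked = choose_plan([full_scan, index_scan])
```

Only the cheapest plan would then be instantiated as an execution engine; the other candidates are discarded.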
  • 57. Execution plans. Query → EXP.
      Query:
        FOR a IN collA
          LET xx = a.x
          FOR b IN collB
            FILTER xx == b.y
            RETURN {x: a.x, z: b.z}
      Execution plan (in pipeline order): Singleton → EnumerateCollection a → Calculation xx → EnumerateCollection b → Calculation xx == b.y → Filter xx == b.y → Calc {x: a.x, z: b.z} → Return {x: a.x, z: b.z}
      ◦ Black arrows are dependencies.
      ◦ Think of a pipeline.
      ◦ Each node provides a cursor API.
      ◦ Blocks of “items” travel through the pipeline.
      ◦ What is an “item”?
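One way to picture this pipeline is with Python generators, where each node pulls items from the node it depends on. This is a simplified sketch, not the engine's code: the function names, the helper variables `cond` and `out`, and the sample collections are assumptions, and items are passed one at a time here rather than in blocks.

```python
def singleton():
    yield {}  # exactly one item, carrying no variables


def enumerate_collection(upstream, var, docs):
    # For every incoming item, emit one item per document, bound to `var`.
    for item in upstream:
        for doc in docs:
            yield {**item, var: doc}


def calculation(upstream, var, fn):
    # Evaluate an expression on each item and store the result in `var`.
    for item in upstream:
        yield {**item, var: fn(item)}


def filter_node(upstream, var):
    # Let an item pass only if the previously calculated `var` is true.
    for item in upstream:
        if item[var]:
            yield item


def return_node(upstream, var):
    for item in upstream:
        yield item[var]


coll_a = [{"x": 1}, {"x": 2}]
coll_b = [{"y": 1, "z": "one"}, {"y": 2, "z": "two"}, {"y": 2, "z": "zwei"}]

# Build the plan in the order of the nodes on the slide.
plan = singleton()
plan = enumerate_collection(plan, "a", coll_a)
plan = calculation(plan, "xx", lambda it: it["a"]["x"])
plan = enumerate_collection(plan, "b", coll_b)
plan = calculation(plan, "cond", lambda it: it["xx"] == it["b"]["y"])
plan = filter_node(plan, "cond")
plan = calculation(plan, "out", lambda it: {"x": it["a"]["x"], "z": it["b"]["z"]})
plan = return_node(plan, "out")

result = list(plan)
```

Nothing runs until the last node is asked for data, which mirrors the idea that every node offers a cursor to the node downstream of it.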
  • 61. Pipeline and items.
      Query fragment:
        FOR a IN collA
          LET xx = a.x
          FOR b IN collB
      Plan fragment: Singleton → EnumerateCollection a → Calculation xx → EnumerateCollection b
      Diagram labels: “Items have no vars” (leaving Singleton) and “Items have vars a, xx” (leaving Calculation xx).
      ◦ Items are the thingies traveling through the pipeline.
      ◦ An item holds the values of those variables in the current frame.
      ◦ Thus: items look different in different parts of the plan.
      ◦ We always deal with blocks of items for performance reasons.
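The two points "an item holds the variables of the current frame" and "we deal with blocks of items" can be sketched like this. Items are modelled as plain dictionaries and blocks as lists; the helper names `in_blocks` and `calculation_block` and the block size are hypothetical, chosen only for illustration.

```python
from itertools import islice


def in_blocks(items, block_size):
    """Group a stream of items into lists of at most block_size items."""
    it = iter(items)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def calculation_block(blocks, var, fn):
    """Apply a calculation to a whole block at once, adding `var` to every item."""
    for block in blocks:
        yield [{**item, var: fn(item)} for item in block]


# Items as they look after "EnumerateCollection a": only the variable a is set.
items_after_enum = [{"a": {"x": n}} for n in range(5)]

# After "Calculation xx" each item carries both a and xx.
blocks = list(
    calculation_block(in_blocks(items_after_enum, 2), "xx", lambda it: it["a"]["x"])
)
```

Handing over a list per call instead of a single item amortises the per-call overhead between nodes, which is the performance argument made on the slide.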
  • 64. Move filters up.
      Query:
        FOR a IN collA
          FOR b IN collB
            FILTER a.x == 10
            FILTER a.u == b.v
            RETURN {u:a.u,w:b.w}
      Plan: Singleton → EnumColl a → EnumColl b → Calc a.x == 10 → Filter a.x == 10 → Calc a.u == b.v → Filter a.u == b.v → Return {u:a.u,w:b.w}
      The result and behaviour do not change if the first FILTER is pulled out of the inner FOR.
  • 66. Move filters up.
      Query:
        FOR a IN collA
          FILTER a.x == 10
          FOR b IN collB
            FILTER a.u == b.v
            RETURN {u:a.u,w:b.w}
      Plan: Singleton → EnumColl a → Calc a.x == 10 → Filter a.x == 10 → EnumColl b → Calc a.u == b.v → Filter a.u == b.v → Return {u:a.u,w:b.w}
      The result and behaviour do not change if the first FILTER is pulled out of the inner FOR. However, the number of items traveling in the pipeline is decreased. Note that the two FOR statements could be interchanged!
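The effect of this rule can be checked with a small experiment that counts how many items reach the inner part of the pipeline before and after the filter is moved. The two functions and the sample data below are made up for illustration; they mimic the two query shapes with nested loops rather than a real execution engine.

```python
def run_filter_inside(coll_a, coll_b):
    """Both filters sit inside the inner loop, as in the original query."""
    items_seen = 0
    out = []
    for a in coll_a:
        for b in coll_b:
            items_seen += 1  # every (a, b) pair reaches the filters
            if a["x"] == 10 and a["u"] == b["v"]:
                out.append({"u": a["u"], "w": b["w"]})
    return out, items_seen


def run_filter_moved_up(coll_a, coll_b):
    """The filter on a.x is applied before collB is enumerated."""
    items_seen = 0
    out = []
    for a in coll_a:
        if a["x"] != 10:
            continue  # this a never enters the inner loop
        for b in coll_b:
            items_seen += 1
            if a["u"] == b["v"]:
                out.append({"u": a["u"], "w": b["w"]})
    return out, items_seen


coll_a = [{"x": 10, "u": 1}, {"x": 3, "u": 2}, {"x": 10, "u": 2}, {"x": 7, "u": 1}]
coll_b = [{"v": 1, "w": "p"}, {"v": 2, "w": "q"}, {"v": 3, "w": "r"}]

result_inside, seen_inside = run_filter_inside(coll_a, coll_b)
result_moved, seen_moved = run_filter_moved_up(coll_a, coll_b)
```

With this data both variants return the same two documents, but the moved-up variant lets 6 items into the inner loop instead of 12.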
  • 68. Remove unnecessary calculations.
      Query:
        FOR a IN collA
          LET L = LENGTH(a.hobbies)
          FOR b IN collB
            FILTER a.u == b.v
            RETURN {h:a.hobbies,w:b.w}
      Plan: Singleton → EnumColl a → Calc L = ... → EnumColl b → Calc a.u == b.v → Filter a.u == b.v → Return {...}
      The calculation of L is unnecessary!
  • 70. Remove unnecessary calculations.
      Query:
        FOR a IN collA
          FOR b IN collB
            FILTER a.u == b.v
            RETURN {h:a.hobbies,w:b.w}
      Plan: Singleton → EnumColl a → EnumColl b → Calc a.u == b.v → Filter a.u == b.v → Return {...}
      The calculation of L is unnecessary: L is never used, and the calculation cannot throw an exception. Therefore we can just leave it out.
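A rule like this can be sketched as a backwards walk over the plan that tracks which variables are still needed. The plan representation (a list of dictionaries with `type`, `sets`, `uses` and `can_throw` keys) and the function name are assumptions made for this example, not ArangoDB data structures.

```python
def remove_unused_calculations(plan):
    """Drop Calculation nodes whose result is never used and that cannot throw.

    `plan` is a list of nodes in pipeline order.
    """
    needed = set()
    kept = []
    for node in reversed(plan):  # walk from Return back towards Singleton
        is_dead_calc = (
            node["type"] == "Calculation"
            and node["sets"] not in needed
            and not node.get("can_throw", False)
        )
        if is_dead_calc:
            continue  # nobody downstream reads this variable
        needed.update(node.get("uses", ()))
        kept.append(node)
    kept.reverse()
    return kept


plan = [
    {"type": "Singleton"},
    {"type": "EnumerateCollection", "sets": "a"},
    {"type": "Calculation", "sets": "L", "uses": ["a"]},
    {"type": "EnumerateCollection", "sets": "b"},
    {"type": "Calculation", "sets": "cond", "uses": ["a", "b"]},
    {"type": "Filter", "uses": ["cond"]},
    {"type": "Return", "uses": ["a", "b"]},
]

optimised = remove_unused_calculations(plan)

# If the calculation could throw, removing it would change behaviour, so it stays.
risky = [dict(n, can_throw=True) if n.get("sets") == "L" else n for n in plan]
kept_risky = remove_unused_calculations(risky)
```

The `can_throw` check reflects the caveat on the slide: an unused calculation may only be dropped if evaluating it has no observable effect.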
  • 74. Use index for FILTER and SORT.
      Query:
        FOR a IN collA
          FILTER a.x > 17 && a.x <= 23 && a.y == 10
          SORT a.y, a.x
          RETURN a
      Original plan: Singleton → EnumColl a → Calc ... → Filter ... → Sort a.y, a.x → Return a
      Assume collA has a skiplist index on “y” and “x” (in this order). Then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index.
      Plan using the index: Singleton → IndexRange a → Sort a.y, a.x → Return a
      The result will automatically be sorted by y and then by x, so the Sort node can be dropped as well.
      Final plan: Singleton → IndexRange a → Return a
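The index range read can be imitated with a sorted list and binary search. This sketch swaps a sorted Python list in for the skiplist, which is an assumption for illustration; the sample documents are invented. The point is only that a structure sorted by (y, x) yields the interval, already in order, without a separate filter or sort.

```python
from bisect import bisect_right

docs = [
    {"y": 10, "x": 17},
    {"y": 10, "x": 23},
    {"y": 9, "x": 20},
    {"y": 10, "x": 18},
    {"y": 11, "x": 19},
    {"y": 10, "x": 24},
    {"y": 10, "x": 20},
]

# Stand-in for the skiplist: documents kept sorted by (y, x).
index = sorted(docs, key=lambda d: (d["y"], d["x"]))
keys = [(d["y"], d["x"]) for d in index]

# Half-open interval: strictly after (10, 17), up to and including (10, 23).
lo = bisect_right(keys, (10, 17))
hi = bisect_right(keys, (10, 23))
from_index = index[lo:hi]

# Reference result: full scan, then filter, then sort.
brute_force = sorted(
    (d for d in docs if d["x"] > 17 and d["x"] <= 23 and d["y"] == 10),
    key=lambda d: (d["y"], d["x"]),
)
```

Both lookups are logarithmic in the index size, and the slice between them is exactly the set of matching documents in sorted order.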
  • 76. Data distribution in a cluster. [Diagram: requests go to the coordinators, which sit in front of the DB servers holding the numbered shards.] The shards of a collection are distributed across the DB servers. The coordinators receive queries and organise their execution.
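A rough sketch of this division of labour follows: documents are assigned to shards by hashing their key, shards are placed on DB servers, and a coordinator routes single-document requests to one shard and fans out scans to all of them. Everything here is a simplified assumption for illustration (the MD5-based `shard_for`, the round-robin `server_for`, the in-memory `Coordinator` class and the shard and server counts); it does not reproduce ArangoDB's actual hashing or placement.

```python
import hashlib

NUM_SHARDS = 5
DB_SERVERS = ["DBserver1", "DBserver2", "DBserver3"]


def shard_for(key):
    """Map a document key to a shard number via a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


def server_for(shard):
    """Hypothetical placement: shards are spread round-robin over the servers."""
    return DB_SERVERS[shard % len(DB_SERVERS)]


class Coordinator:
    """Receives requests and forwards them to the responsible shard(s)."""

    def __init__(self):
        self.shards = {s: {} for s in range(NUM_SHARDS)}

    def insert(self, doc):
        self.shards[shard_for(doc["_key"])][doc["_key"]] = doc

    def lookup(self, key):
        # A key lookup touches exactly one shard.
        return self.shards[shard_for(key)].get(key)

    def scan(self, predicate):
        # A query without a key has to ask every shard and merge the answers.
        return [
            d for shard in self.shards.values() for d in shard.values() if predicate(d)
        ]


coordinator = Coordinator()
for n in range(100):
    coordinator.insert({"_key": f"k{n}", "n": n})

found = coordinator.lookup("k42")
even = coordinator.scan(lambda d: d["n"] % 2 == 0)
```

Key lookups scale well because each one is served by a single shard, whereas scans cost work on every DB server.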