Mongo DB schema design patterns

MongoDB PRESENTED BY Jörg Reichert Licensed under cc-by v3.0 (any jurisdiction)

Introduction
● Name derived from humongous (= gigantic)
● NoSQL (= not only SQL) database
● Document oriented database
  – documents stored as binary JSON (BSON)
● Ad-hoc queries
● Server side JavaScript execution
● Aggregation / MapReduce
● High performance, availability, scalability

Relational vs. document based: concepts

SQL

  Person
  Id  Name     AddressId
  1   Mueller  1
  2   <null>   2

  Address
  Id  City     Street
  1   Leipzig  Burgstr. 1
  2   Dresden  <null>

MongoDB

  Person
  {
    _id: ObjectId("..."),
    Name: "Mueller",
    Address: {
      City: "Leipzig",
      Street: "Burgstr. 1",
    },
  },
  {
    _id: ObjectId("..."),
    Address: {
      City: "Leipzig",
    },
  }

Terminology

  SQL           MongoDB
  DB            DB
  Table         Collection
  Row           Document
  Column        Field (Key: Value)
  PK            PK
  FK relation   Embedded document

PK: primary key, FK: foreign key

Relational vs. document based: syntax (1/3)

  SQL:     SELECT * FROM Person;
  MongoDB: db.getCollection("Person").find();

  SQL:     SELECT * FROM Person WHERE name = 'Mueller';
  MongoDB: db.Person.find({ "name": "Mueller" });

  SQL:     SELECT * FROM Person WHERE name LIKE 'M%';
  MongoDB: db.Person.find({ "name": /^M/ });

  SQL:     SELECT name FROM Person;
  MongoDB: db.Person.find({}, { name: 1, _id: 0 });

  SQL:     SELECT DISTINCT(name) FROM Person WHERE name = 'Mueller';
  MongoDB: db.Person.distinct("name", { "name": "Mueller" });

Relational vs. document based: syntax (2/3)

  SQL:     SELECT * FROM Person WHERE id > 10 AND name <> 'Mueller';
  MongoDB: db.Person.find({ $and: [
             { _id: { $gt: ObjectId("...") } },
             { name: { $ne: "Mueller" } } ] });

  SQL:     SELECT p.name FROM Person p JOIN Address a ON p.address = a.id
           WHERE a.city = 'Leipzig' ORDER BY p.name DESC;
  MongoDB: db.Person.find(
             { "Address.city": "Leipzig" },
             { name: 1, _id: 0 }
           ).sort({ name: -1 });

  SQL:     SELECT * FROM Person WHERE name IS NOT NULL;
  MongoDB: db.Person.find({ name: { $not: { $type: 10 }, $exists: true } });

  SQL:     SELECT COUNT(*) FROM Person WHERE name = 'Mueller';
  MongoDB: db.Person.count({ name: "Mueller" });
           db.Person.find({ name: "Mueller" }).count();

Relational vs. document based: syntax (3/3)

  SQL:     UPDATE Person SET name = 'Müller' WHERE name = 'Mueller';
  MongoDB: db.Person.updateMany({ name: "Mueller" }, { $set: { name: "Müller" } });

  SQL:     DELETE FROM Person WHERE name = 'Mueller';
  MongoDB: db.Person.remove({ name: "Mueller" });

  SQL:     INSERT INTO Person (name, address) VALUES ('Mueller', 3);
  MongoDB: db.Person.insert({ name: "Mueller", Address: { … } });

  SQL:     ALTER TABLE Person DROP COLUMN name;
  MongoDB: db.Person.updateMany({}, { $unset: { name: 1 } });

  SQL:     DROP TABLE Person;
  MongoDB: db.Person.drop();

Schema design principles
● Principle of least cardinality
● Store what you query for

Schema design: embedded document
● Applicable for 1:1 and 1:n when n can't get too large
● Embedded document cannot get too large
● Embedded document not very likely to change
● Arrays that grow without bound should never be embedded

  Address
  {
    _id: ObjectId("..."),
    City: "Leipzig",
    Street: "Burgstr. 1",
    Person: [
      { Name: "Mueller", },
      { Name: "Schneider", },
    ]
  }

Schema design: referencing
● Applicable for to-many (:n) relations when n can't get too large
● Referenced document is likely to change often in the future
● Many referenced documents are expected, so storing only the reference is cheaper
● Large referenced documents are expected, so storing only the reference is cheaper
● Arrays that grow without bound should never be embedded
● Address should be accessible on its own

  Address
  {
    _id: ObjectId("..."),
    City: "Leipzig",
    Street: "Burgstr. 1",
    Person: [
      ObjectId("..."),
      ObjectId("..."),
    ]
  }

  Person
  {
    _id: ObjectId("..."),
    Name: "Mueller",
  }

Schema design: parent-referencing
● Applicable for to-many (:n) relations when n can get very large (note: a MongoDB document isn't allowed to exceed 16 MB)
● Joins are done at application level

  Address
  {
    _id: ObjectId("..."),
    City: "Dubai",
    Street: "1 Sheikh Mohammed bin Rashid Blvd",
  }

  Person
  {
    _id: ObjectId("..."),
    Name: "Mueller",
    Address: ObjectId("..."),
  }
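Because the join happens in the application, reading an address together with its residents takes two lookups: one for the address and one for every person whose Address field points at it. The following is a minimal sketch of that idea using plain in-memory arrays instead of a live database. The string ids, the sample persons and the helper name findPersonsAtAddress are assumptions made for illustration; with a real driver the second step would roughly correspond to a query such as db.Person.find({ Address: address._id }).

```javascript
// In-memory stand-ins for the Address and Person collections.
// Plain strings replace ObjectId values to keep the sketch self-contained.
const addresses = [
  { _id: "a1", City: "Dubai", Street: "1 Sheikh Mohammed bin Rashid Blvd" },
  { _id: "a2", City: "Leipzig", Street: "Burgstr. 1" },
];
const persons = [
  { _id: "p1", Name: "Mueller", Address: "a1" },
  { _id: "p2", Name: "Schneider", Address: "a1" },
  { _id: "p3", Name: "Fischer", Address: "a2" },
];

// Application-level join: first look up the parent document,
// then collect all child documents that reference it.
function findPersonsAtAddress(city) {
  const address = addresses.find((a) => a.City === city);
  if (!address) return [];
  return persons.filter((p) => p.Address === address._id);
}

const residents = findPersonsAtAddress("Dubai").map((p) => p.Name);
console.log(residents);
```

The Address document never grows, no matter how many persons reference it, which is what makes this layout suitable for very large n.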

Schema design: two way referencing
● Applicable for m:n when n and m can't get too large and the application needs to navigate from both ends
● Disadvantage: two update operations are needed when changing references

  Address
  {
    _id: ObjectId("..."),
    City: "Leipzig",
    Street: "Burgstr. 1",
    Person: [
      ObjectId("..."),
      ObjectId("..."),
    ]
  }

  Person
  {
    _id: ObjectId("..."),
    Name: "Mueller",
    Address: [
      ObjectId("..."),
      ObjectId("..."),
    ]
  }
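The cost of two way referencing shows up when a link is created or removed: both documents have to be written. The sketch below models this with two in-memory objects; the string ids and the helpers addToSet, link and unlink are hypothetical names for illustration. Against a real database each half would be a separate update (for example with the $addToSet and $pull update operators), and the two writes are not applied as one atomic unit.

```javascript
// In-memory stand-ins for one Address and one Person document.
const address = { _id: "a1", City: "Leipzig", Street: "Burgstr. 1", Person: [] };
const person = { _id: "p1", Name: "Mueller", Address: [] };

// Add a value only if it is not present yet (similar in spirit to $addToSet).
function addToSet(array, value) {
  if (!array.includes(value)) array.push(value);
}

// Creating a link needs two writes, one per side of the relation.
function link(person, address) {
  addToSet(address.Person, person._id); // write 1: Address side
  addToSet(person.Address, address._id); // write 2: Person side
}

// Removing a link needs two writes as well.
function unlink(person, address) {
  address.Person = address.Person.filter((id) => id !== person._id);
  person.Address = person.Address.filter((id) => id !== address._id);
}

link(person, address);
link(person, address); // linking twice does not create duplicates
console.log(address.Person, person.Address);
```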

Schema design: denormalization
● Queries are expected to filter by certain fields of the referenced document, so copying these fields into the host document saves an additional query at application level
● Disadvantage: two update operations for each duplicated field
● Disadvantage: additional memory consumption

  Address
  {
    _id: ObjectId("..."),
    City: "Leipzig",
    Street: "Burgstr. 1",
  }

  Person
  {
    _id: ObjectId("..."),
    Name: "Mueller",
    Address: [
      { id: ObjectId("..."), city: "Leipzig", },
      ...
    ]
  }
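The trade-off can be sketched in a few lines: filtering persons by city no longer needs the Address collection, but changing a city has to touch the original and every copy. This is an in-memory illustration only; the string ids, the second person and the helpers personsInCity and renameCity are assumptions, not part of the slides.

```javascript
// In-memory stand-ins; the city is duplicated into each Person document.
const addresses = [{ _id: "a1", City: "Leipzig", Street: "Burgstr. 1" }];
const persons = [
  { _id: "p1", Name: "Mueller", Address: [{ id: "a1", city: "Leipzig" }] },
  { _id: "p2", Name: "Schmidt", Address: [{ id: "a1", city: "Leipzig" }] },
];

// Benefit: filtering by city reads only the Person documents.
function personsInCity(city) {
  return persons.filter((p) => p.Address.some((a) => a.city === city));
}

// Cost: a change must be applied to the source document and to all copies.
// Returns the number of places that had to be written.
function renameCity(addressId, newCity) {
  let writes = 0;
  for (const a of addresses) {
    if (a._id === addressId) {
      a.City = newCity;
      writes++;
    }
  }
  for (const p of persons) {
    for (const copy of p.Address) {
      if (copy.id === addressId) {
        copy.city = newCity;
        writes++;
      }
    }
  }
  return writes;
}

console.log(personsInCity("Leipzig").map((p) => p.Name));
```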

Schema design: bucketing
● Applicable for to-many (:n) relations when n can get very large and the application is expected to use pagination anyway
● The DB schema already stores the data in the chunks (pages) that the application will later query for

  Address
  {
    _id: ObjectId("..."),
    City: "Leipzig",
    Street: "Burgstr. 1",
  }

  Person
  {
    _id: ObjectId("..."),
    Address: ObjectId("..."),
    Page: 13,
    Count: 50,
    Persons: [
      { Name: "Mueller" },
      ...
    ]
  }
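How such bucket documents could be produced is sketched below: a long list of persons is cut into pages of a fixed size, and each page becomes one document carrying its page number and entry count. The helper name toBuckets, the 1-based page numbering and the generated sample names are assumptions for illustration; the slides only show the resulting document shape.

```javascript
// Split a list of persons into bucket documents of at most bucketSize entries.
function toBuckets(addressId, persons, bucketSize) {
  const buckets = [];
  for (let i = 0; i < persons.length; i += bucketSize) {
    const slice = persons.slice(i, i + bucketSize);
    buckets.push({
      Address: addressId,
      Page: buckets.length + 1, // 1-based page number (an assumption)
      Count: slice.length,
      Persons: slice,
    });
  }
  return buckets;
}

// 120 generated persons, 50 per bucket.
const people = Array.from({ length: 120 }, (_, i) => ({ Name: "Person " + i }));
const buckets = toBuckets("a1", people, 50);

// Showing page 3 means reading exactly one bucket document.
const page3 = buckets.find((b) => b.Page === 3);
console.log(buckets.length, page3.Count);
```

A paginated view then maps one page request to one document read, instead of a skip and limit over many small documents.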

Aggregation Framework
● Aggregation pipeline consisting of (processing) stages
  – $match, $group, $project, $redact, $unwind, $lookup, $sort, ...
● Aggregation operators
  – Boolean: $and, $or, $not
  – Comparison: $eq, $lt, $lte, $gt, $gte, $ne, $cmp
  – Arithmetic: $add, $subtract, $multiply, $divide, ...
  – String: $concat, $substr, …
  – Array: $size, $arrayElemAt, ...
  – Variable: $map, $let
  – Group accumulator: $sum, $avg, $addToSet, $push, $min, $max, $first, $last, …
  – ...

Aggregation Framework

  db.Person.aggregate([
    { $match: { name: { $ne: "Fischer" } } },
    { $group: { _id: "$name", city_occurs: { $addToSet: "$Address.city" } } },
    { $project: { _id: "$_id", city_count: { $size: "$city_occurs" } } },
    { $sort: { _id: 1 } },
    { $match: { city_count: { $gt: 1 } } },
    { $out: "PersonCityCount" }
  ]);

  PersonCityCount
  {
    _id: "Mueller",
    city_count: 2,
  },
  {
    _id: "Schmidt",
    city_count: 3,
  },
  ...
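To make the stages easier to follow, here is a rough in-memory equivalent of the pipeline in plain JavaScript: filter out one name, collect the distinct cities per name, count them, sort by the group key and keep only names that occur in more than one city. The sample documents and the function name personCityCount are made up for illustration, and the sketch does not model $out or any server behaviour.

```javascript
// Sample input documents (hypothetical data).
const persons = [
  { name: "Mueller", Address: { city: "Leipzig" } },
  { name: "Mueller", Address: { city: "Dresden" } },
  { name: "Mueller", Address: { city: "Leipzig" } },
  { name: "Schmidt", Address: { city: "Berlin" } },
  { name: "Fischer", Address: { city: "Leipzig" } },
  { name: "Fischer", Address: { city: "Halle" } },
];

function personCityCount(docs) {
  const citiesByName = new Map();
  for (const doc of docs) {
    if (doc.name === "Fischer") continue; // first $match
    if (!citiesByName.has(doc.name)) citiesByName.set(doc.name, new Set());
    citiesByName.get(doc.name).add(doc.Address.city); // $group with $addToSet
  }
  return [...citiesByName]
    .map(([name, cities]) => ({ _id: name, city_count: cities.size })) // $project with $size
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0)) // $sort by group key
    .filter((row) => row.city_count > 1); // second $match
}

console.log(personCityCount(persons));
```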

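Map-Reduce
● More control than the aggregation framework, but slower

  // map emits one single-element city list per person
  var map = function() {
    if (this.name != "Fischer")
      emit(this.name, { cities: [this.Address.city] });
  };

  // reduce merges city lists; its result has the same shape as the emitted
  // values, because MongoDB may feed reduce results back into reduce
  var reduce = function(key, values) {
    var distinct = [];
    values.forEach(function(value) {
      value.cities.forEach(function(city) {
        if (distinct.indexOf(city) == -1)
          distinct.push(city);
      });
    });
    return { cities: distinct };
  };

  // finalize turns the merged list into the count
  var finalize = function(key, reduced) {
    return reduced.cities.length;
  };

  db.Person.mapReduce(map, reduce,
    { out: "PersonCityCount2", finalize: finalize });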
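One detail of map-reduce is easy to get wrong: MongoDB may call reduce more than once for the same key and pass earlier reduce results back in, and it does not call reduce at all for keys with a single emitted value. The reduce function therefore has to return the same shape as the emitted values, and the final count is best derived in a separate finalize step. The sketch below imitates that contract in plain JavaScript. The runner runMapReduce, the sample data and the callback-style emit are assumptions for illustration and do not reproduce the server's actual scheduling.

```javascript
// Sample input documents (hypothetical data).
const persons = [
  { name: "Mueller", Address: { city: "Leipzig" } },
  { name: "Mueller", Address: { city: "Dresden" } },
  { name: "Mueller", Address: { city: "Leipzig" } },
  { name: "Schmidt", Address: { city: "Berlin" } },
  { name: "Fischer", Address: { city: "Halle" } },
];

// Emit a one-element city list per person.
function map(doc, emit) {
  if (doc.name !== "Fischer") emit(doc.name, { cities: [doc.Address.city] });
}

// Merge city lists; the result has the same shape as the emitted values.
function reduce(key, values) {
  const distinct = [];
  for (const value of values) {
    for (const city of value.cities) {
      if (!distinct.includes(city)) distinct.push(city);
    }
  }
  return { cities: distinct };
}

// Turn the merged list into the final count.
function finalize(key, reduced) {
  return reduced.cities.length;
}

// Hypothetical runner: folds each new value into the previous reduce result,
// which exercises the "reduce output as reduce input" case on purpose.
function runMapReduce(docs) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      const previous = groups.get(key);
      groups.set(key, previous ? reduce(key, [previous, value]) : value);
    });
  }
  const result = {};
  for (const [key, reduced] of groups) result[key] = finalize(key, reduced);
  return result;
}

console.log(runMapReduce(persons));
```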

Indexes
● Default _id index, assuring uniqueness
● Single field index: db.Person.createIndex( { name: 1 } );
● Compound index: db.Address.createIndex( { city: 1, street: -1 } );
  – index sorts first ascending by city, then descending by street
  – index will also be used when a query filters only by a leading part (prefix) of the indexed fields, here city
● Multikey index: db.Person.createIndex( { "Address.city": 1 } )
  – indexes content stored in arrays; an index entry is created for each array element
● Geospatial index
● Text index
● Hashed index
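The ordering of a compound index such as { city: 1, street: -1 } can be pictured as a sorted list of key pairs. The sketch below sorts a few made-up addresses with a comparator that imitates that order: entries for one city end up next to each other, while entries for one street are scattered across the list, which is why a filter on street alone cannot use the ordering in the same way. The comparator name compareIndexKeys and the sample data are assumptions, and JavaScript string comparison only approximates MongoDB's default string ordering.

```javascript
// Sample address documents (hypothetical data).
const addresses = [
  { city: "Leipzig", street: "Burgstr. 1" },
  { city: "Dresden", street: "Hauptstr. 5" },
  { city: "Leipzig", street: "Zeitzer Str. 2" },
  { city: "Dresden", street: "Burgstr. 1" },
];

// Imitates the key order of { city: 1, street: -1 }:
// city ascending first, then street descending within the same city.
function compareIndexKeys(a, b) {
  if (a.city !== b.city) return a.city < b.city ? -1 : 1;
  if (a.street !== b.street) return a.street > b.street ? -1 : 1;
  return 0;
}

const indexOrder = [...addresses].sort(compareIndexKeys);

// Positions of all entries for one street within the index order.
const burgstrPositions = indexOrder
  .map((entry, position) => (entry.street === "Burgstr. 1" ? position : -1))
  .filter((position) => position !== -1);

console.log(indexOrder, burgstrPositions);
```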

Index properties
● Unique index: insertion of a duplicate field value will be rejected
● Partial index: indexes only documents matching certain filter criteria
● Sparse index: indexes only documents having the indexed field
● TTL index: automatically removes documents after a certain time
● Query optimization: use db.MyCollection.find({ … }).explain() to check whether a query is answered using an index, and how many documents still had to be scanned
● Covered queries: if a query only contains indexed fields, the results will be delivered directly from the index without scanning or materializing any documents
● Index intersection: different indexes can be applied to cover different parts of a query

Storage engines
● WiredTiger is available since MongoDB 3.0 and is the default storage engine since MongoDB 3.2
  – locking at document level enables concurrent writes on a collection
  – durability ensured via write-ahead transaction log (journaling) and checkpoints
  – supports compression of collections and indexes (via snappy or zlib)
● MMAPv1 was the default storage engine before MongoDB 3.2
  – since MongoDB 3.0 supports locking at collection level, before only at database level
  – useful for selective updates, as WiredTiger always replaces the whole document in an update operation

Clustering, Sharding, Replication
[Diagram showing the following components and connections]
● Client app (driver)
● App server (mongos)
● Config server (replica set)
● Shard 1: Primary (mongod), Secondary (mongod), Secondary (mongod)
● Connections labeled: writes, reads, Replication, Heartbeat

Shard key selection
[Diagram: sharded collection distributed across three shards by key range (hash function)]
● Shard 1: min <= key < 15, e.g. { key: 12, ... }
● Shard 2: 15 <= key < 30, e.g. { key: 21, ... }
● Shard 3: 30 <= key < max, e.g. { key: 35, ... }
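The range boundaries from the diagram can be read as a small routing table: each shard owns a half-open key interval, and a document goes to the shard whose interval contains its key (or, with hashed sharding, the hash of its key). A minimal sketch of that lookup follows; the chunks table mirrors the ranges shown above, while the function name shardFor and the use of -Infinity and Infinity for min and max are assumptions for illustration. No hash function is modelled here.

```javascript
// Routing table mirroring the three key ranges from the diagram.
const chunks = [
  { shard: "Shard 1", min: -Infinity, max: 15 },
  { shard: "Shard 2", min: 15, max: 30 },
  { shard: "Shard 3", min: 30, max: Infinity },
];

// Find the shard whose half-open range [min, max) contains the key.
function shardFor(key) {
  const chunk = chunks.find((c) => c.min <= key && key < c.max);
  return chunk.shard;
}

for (const key of [12, 21, 35]) {
  console.log(key, shardFor(key));
}
```

A key that matches a lower boundary exactly, such as 15, belongs to the range that starts there, because the upper bound of each range is exclusive.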

Transactions
● ACID → MongoDB complies with this only at document level
  – Atomicity
  – Consistency
  – Isolation
  – Durability
● CAP → MongoDB assures CP
  – Consistency
  – Availability
  – Partition tolerance
● BASE: Basically Available, Soft state, Eventual consistency
● MongoDB doesn't support multi-document transactions (true for the 3.x releases; they were added in MongoDB 4.0)
● Multi-document updates can be performed via two-phase commit

Driver
● JavaScript: Mongo Node.js driver
● Java: Java MongoDB Driver
● Python: PyMongo, Motor (async)
● Ruby: MongoDB Ruby Driver
● C#: Mongo Csharp Driver
● ...

Object-document mappers
● JavaScript: mongoose, Camo, MEAN.JS
● Java: Morphia, SpringData MongoDB
● Python: Django MongoDB engine
● Ruby: MongoMapper, Mongoid
● C#: LinQ
● ...

Extensions and connectors
● CKAN
● MongoDB-Hadoop connector
● MongoDB Spark connector
● MongoDB ElasticSearch/Solr connector
● ...

Tool support
● Robomongo
● MongoExpress
● ...

Real world examples
● Who uses MongoDB
● Case studies
● Arctic TimeSeries and Tick store
● uptime

MongoDB in Code For Germany projects
● Politik bei uns (open council information system): scraped city council data is stored in a MongoDB according to the OParl format; see also data, web API and OParl client

When to choose, when not to choose
● Choose
  – mass data processing, like event data
  – dynamic schema
● Not to choose
  – static schema with a lot of relations
  – strict transaction requirements

Links
● MongoDB Schema Simulation
● 6 Rules of Thumb for MongoDB Schema Design
● MongoDB Aggregation
● MongoDB Indexes
● Sharding
● MongoDB University
● Why Relational Databases are not the Cure-All
