Schema Design with MongoDB

open-source, high-performance, document-oriented database

Schema Design Basics roger@10Gen.com

This talk ‣Intro -Terms / Deﬁnitions

This talk ‣Intro -Terms / Deﬁnitions ‣Getting a ﬂavor -Creating a Schema -Indexes -Evolving the Schema

This talk ‣Intro -Terms / Deﬁnitions ‣Getting a ﬂavor -Creating a Schema -Indexes -Evolving the Schema ‣Data modeling -DBRef -Single Table Inheritance -Many - Many -Trees -Lists / Queues / Stacks

Document Oriented Basic unit of data: JSON Documents Not Relational, Key Value Not OODB - Associations implied by Document Structure - but your database schema != your program schema

Terms Table -> Collection Row(s) -> JSON Document Index -> Index Join -> Embedding and Linking across documents Partition -> Shard Partition Key -> Shard Key

Considerations What are the requirements ? - Functionality to be supported - Access Patterns ? - Data Life Cycle (insert, update, deletes) - Expected Performance / Workload ? Capabilities of the database ?

DB Considerations How can we manipulate this data ? Dynamic Queries Secondary Indexes Atomic Updates Map Reduce Access Patterns ? Read / Write Ratio Types of updates Types of queries Considerations No Joins Single Document Transactions only

Design Session Use Rich Design Documents post = {author: “kyle”, date: new Date(), text: “my blog post...”, tags: [“mongodb”, “intro”]} >db.post.save(post)

>db.posts.ﬁnd() { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "My ﬁrst blog", tags : [ "mongodb", "intro" ] } Notes: - ID is unique, but can be anything you’d like

Secondary index for “author” // 1 means ascending, -1 means descending >db.posts.ensureIndex({author: 1}) >db.posts.ﬁnd({author: 'kyle'}) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", ... }

Verifying indexes exist >db.system.indexes.ﬁnd() // Index on ID { name : "_id_", ns : "test.posts", key : { "_id" : 1 } }

Verifying indexes exist >db.system.indexes.ﬁnd() // Index on ID { name : "_id_", ns : "test.posts", key : { "_id" : 1 } } // Index on author { _id : ObjectId("4c4ba6c5672c685e5e8aabf4"), ns : "test.posts", key : { "author" : 1 }, name : "author_1" }

Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // ﬁnd posts with any tags >db.posts.ﬁnd({tags: {$exists: true}})

Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}}) Regular expressions: // posts where author starts with k >db.posts.find({author: /^k*/i })

Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte, $ne, // find posts with any tags >db.posts.find({tags: {$exists: true}}) Regular expressions: // posts where author starts with k >db.posts.find({author: /^k*/i }) Counting: // posts written by mike >db.posts.find({author: “mike”}).count()

Extending the Schema comment = {author: “fred”, date: new Date(), text: “super duper”} update = { ‘$push’: {comments: comment}, ‘$inc’: {comments_count: 1}} >db.posts.update({_id: “...” }, update)

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "My ﬁrst blog", tags : [ "mongodb", "intro" ], comments_count: 1, comments : [ { author : "Fred", date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)", text : "Super Duper" } ]}

// create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.ﬁnd({comments.author:”Fred”})

// create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”kyle”}) // find last 5 posts: >db.posts.find().sort({date:-1}).limit(5)

// create index on nested documents: >db.posts.ensureIndex({"comments.author": 1}) >db.posts.find({comments.author:”kyle”}) // find last 5 posts: >db.posts.find().sort({date:-1}).limit(5) // most commented post: >db.posts.find().sort({comments_count:-1}).limit(1) When sorting, check if you need an index

Map Reduce Aggregation and batch manipulation Collection in, Collection out Parallel in sharded environments

Map reduce mapFunc = function () { this.tags.forEach(function (z) {emit(z, {count:1});}); } reduceFunc = function (k, v) { var total = 0; for (var i = 0; i < v.length; i++) { total += v[i].count; } return {count:total}; } res = db.posts.mapReduce(mapFunc, reduceFunc) >db[res.result].ﬁnd() { _id : "intro", value : { count : 1 } } { _id : "mongodb", value : { count : 1 } }

Review So Far: - Started out with a simple schema - Queried Data - Evolved the schema - Queried / Updated the data some more

Wordnik 9B records, 100M queries / week, 1,2TB { entry : { header: { id: 0, headword: "m", sourceDictionary: "GCide", textProns : [ {text: "(em)", seq:0} ], syllables: [ {id: 0, text: "m"} ], sourceDictionary: "1913 Webster", headWord: "m", id: 1, deﬁnitions: : [ {text: "M, the thirteenth letter..."}, {text: "As a numeral, M stands for 1000"}] } } }

Review So Far: - Started out with a simple schema - Queried Data - Evolved the schema - Queried / Updated the data some more Observations: - Using Rich Documents works well - Simplify relations by embedding them - Iterative development is easy with MongoDB

Single Table Inheritance >db.shapes.find() { _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1} { _id: ObjectId("..."), type: "square", area: 4, d: 2} { _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2} // find shapes where radius > 0 >db.shapes.find({radius: {$gt: 0}}) // create index >db.shapes.ensureIndex({radius: 1})

One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to ﬁnd latest comments across all documents

One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to ﬁnd latest comments across all documents - Embedded tree - Single document - Natural - Hard to query

One to Many - Embedded Array / Array Keys - slice operator to return subset of array - hard to ﬁnd latest comments across all documents - Embedded tree - Single document - Natural - Hard to query - Normalized (2 collections) - most ﬂexible - more queries

Many - Many Example: - Product can be in many categories - Category can have many products Products id | product_id | category_id Category

products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]}

products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]}

products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]} //All categories for a given product >db.categories.ﬁnd({product_ids: ObjectId("4c4ca23933fb5941681b912e")})

products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia", product_ids: [ ObjectId("4c4ca23933fb5941681b912e"), ObjectId("4c4ca30433fb5941681b9130"), ObjectId("4c4ca30433fb5941681b913a"]} //All categories for a given product >db.categories.ﬁnd({product_ids: ObjectId("4c4ca23933fb5941681b912e")}) //All products for a given category >db.products.ﬁnd({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"}

Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"} // All products for a given category >db.products.ﬁnd({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

Alternative products: { _id: ObjectId("4c4ca23933fb5941681b912e"), name: "Sumatra Dark Roast", category_ids: [ ObjectId("4c4ca25433fb5941681b912f"), ObjectId("4c4ca25433fb5941681b92af”]} categories: { _id: ObjectId("4c4ca25433fb5941681b912f"), name: "Indonesia"} // All products for a given category >db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")}) // All categories for a given product product = db.products.find(_id : some_id) >db.categories.find({_id : {$in : product.category_ids}})

Trees Full Tree in Document { comments: [ { author: “rpb”, text: “...”, replies: [ {author: “Fred”, text: “...”, replies: []} ]} ]} Pros: Single Document, Performance, Intuitive Cons: Hard to search, Partial Results, 4MB limit

Trees Parent Links - Each node is stored as a document - Contains the id of the parent Child Links - Each node contains the id’s of the children - Can support graphs (multiple parents / child)

Array of Ancestors - Store Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } { _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }

Array of Ancestors - Store Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } { _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" } //ﬁnd all descendants of b: >db.tree2.ﬁnd({ancestors: ‘b’})

Array of Ancestors - Store Ancestors of a node { _id: "a" } { _id: "b", ancestors: [ "a" ], parent: "a" } { _id: "c", ancestors: [ "a", "b" ], parent: "b" } { _id: "d", ancestors: [ "a", "b" ], parent: "b" } { _id: "e", ancestors: [ "a" ], parent: "a" } { _id: "f", ancestors: [ "a", "e" ], parent: "e" } { _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" } //find all descendants of b: >db.tree2.find({ancestors: ‘b’}) //find all ancestors of f: >ancestors = db.tree2.findOne({_id:’f’}).ancestors >db.tree2.find({_id: { $in : ancestors})

ﬁndAndModify Queue example //Example: grab highest priority job and mark job = db.jobs.ﬁndAndModify({ query: {inprogress: false}, sort: {priority: -1), update: {$set: {inprogress: true, started: new Date()}}, new: true})

More Cool Stuff • Aggregation • Capped collections • GridFS • Geo

Learn More Kyle’s presentation + video: http://www.slideshare.net/kbanker/mongodb-schema-design http://www.blip.tv/file/3704083 Dwight’s presentation http://www.slideshare.net/mongosf/schema-design-with-mongodb-dwight- merriman Documentation Trees: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB Queues: http://www.mongodb.org/display/DOCS/findandmodify+Command Aggregration: http://www.mongodb.org/display/DOCS/Aggregation Capped Col. : http://www.mongodb.org/display/DOCS/Capped+Collections Geo: http://www.mongodb.org/display/DOCS/Geospatial+Indexing GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification

Download MongoDB and let us know what you think @mongodb http://www.mongodb.org

DBRef DBRef {$ref: collection, $id: id_value} - Think URL - YDSMV: your driver support may vary Sample Schema: nr = {note_refs: [{"$ref" : "notes", "$id" : 5}, ... ]} Dereferencing: nr.forEach(function(r) { printjson(db[r.$ref].ﬁndOne({_id: r.$id})); }

BSON Mongodb stores data in BSON internally Lightweight, Traversable, Efficient encoding Typed boolean, integer, ﬂoat, date, string, binary, array...

Schema Design with MongoDB

More Related Content

What's hot

Viewers also liked

Similar to Schema Design with MongoDB

Recently uploaded

Schema Design with MongoDB

Editor's Notes