MongoDB Basic Concepts Norberto Leite Senior Solutions Architect, 10gen
Agenda • Overview • Replication • Scalability • Consistency & Durability • Flexibility / Developer Experience
But first ...
Happy Hanukkah!!!
Who’s this guy?
Norberto Leite • Senior Solutions Architect • @nleite / norberto@10gen.com • Barcelona • Love MongoDB and others ...
Your Data
Fundamentals • Document Oriented • High Performance • Fully Consistent • Horizontal Scalability
Application document: { name: 'Norberto Leite', position: 'SA', nick: 'WingMan', based: ['Barcelona', 'London'] }
[Diagram: the Application on top of four mongoDB nodes]
Replication
Why do we need Replication? • Failover • Backups • Secondary Batch Jobs • High Availability
Outages • Planned – Hardware upgrade – OS or file-system tuning – Software upgrade – Relocation of data to new file-system / storage • Unplanned – Human Error – Hardware Failure – Data Center / Region Outage – Application Corruption
Replica Sets • Data Protection – Multiple copies of data – Data spread across data centers, AZs etc. • High Availability – Automated Failover – Automated Recovery
Asynchronous Replication [Diagram: the App writes to the Primary and reads from it by default; reads from the two Secondaries are optional]
Failover [Diagram: App, Primary (write, default read) and two Secondaries (optional read)]
Automatic Failover [Diagram: Primary Election; the App writes to and, by default, reads from the Primary; reads from the Secondary are optional]
Automatic Recovery [Diagram: a node in Recovery comes back as a Secondary (optional read); the App writes to the Primary and reads from it by default; the other Secondary serves optional reads]
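The failover slides can be made concrete with a toy model. The sketch below is not MongoDB's election protocol; it only illustrates the majority rule, and the member fields (`up`, `optime`) and the "most recent data wins" tie-break are assumptions made for the example.

```javascript
// Toy model of a replica set election (illustrative only, not MongoDB's protocol).
// Each member is { name, up, optime }; optime stands in for how current its data is.
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  // A primary needs a strict majority of ALL members, so a minority
  // partition can never elect a primary of its own.
  const majority = Math.floor(members.length / 2) + 1;
  if (reachable.length < majority) {
    return null; // no primary: the set accepts no writes
  }
  // Assumption for this sketch: the reachable member with the newest data wins.
  return reachable.reduce((best, m) => (m.optime > best.optime ? m : best)).name;
}

const replicaSet = [
  { name: "node1", up: true, optime: 100 }, // current primary
  { name: "node2", up: true, optime: 100 },
  { name: "node3", up: true, optime: 98 }, // lagging a little: replication is asynchronous
];

// The primary fails; the two remaining members still form a majority.
replicaSet[0].up = false;
const newPrimary = electPrimary(replicaSet);
console.log(newPrimary); // prints node2

// A second failure leaves one member out of three: no majority, no primary.
replicaSet[1].up = false;
const noPrimary = electPrimary(replicaSet);
console.log(noPrimary); // prints null
```

The majority rule is why replica sets are usually deployed with an odd number of voting members.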
Sharding
Sharding • Data Location is Transparent to Code • Data Distribution is Automatic – as well as re-distribution • Aggregates System Resources Horizontally • No CODE Changes!!!
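Range Distribution sh.shardCollection("test.tweets", {_id: 1}, false) [Diagram: shard01 holds a-i, shard02 holds j-m, shard03 holds n-z]
Chunk Split [Diagram: shard01, shard02, shard03; chunk j-m is split into ja-jz and k-m, and k-m is split again into ka-kj and ki-m; chunks a-i and n-z stay whole]
Auto Balancing [Diagram: the chunks a-i, ja-jz, ka-kj, ki-m and n-z are redistributed across shard01, shard02 and shard03]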
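Chunk splitting and balancing can be sketched in a few lines. This is a toy model under stated assumptions, not MongoDB's balancer: the chunk shape (`min`, `max`, `shard`, `docs`), the split keys and the "move one chunk until counts differ by at most one" policy are all hypothetical.

```javascript
// Toy model of chunk splitting and balancing (not MongoDB's actual balancer).
// A chunk covers keys in [min, max) and lives on one shard.
function splitChunk(chunk, splitKey) {
  // Split one chunk into two adjacent ranges that stay on the same shard.
  const half = Math.floor(chunk.docs / 2);
  return [
    { min: chunk.min, max: splitKey, shard: chunk.shard, docs: half },
    { min: splitKey, max: chunk.max, shard: chunk.shard, docs: chunk.docs - half },
  ];
}

function chunkCounts(chunks, shards) {
  const counts = Object.fromEntries(shards.map((s) => [s, 0]));
  for (const c of chunks) counts[c.shard] += 1;
  return counts;
}

function balance(chunks, shards) {
  // Move one chunk at a time from the fullest shard to the emptiest
  // until their chunk counts differ by at most one.
  for (;;) {
    const counts = chunkCounts(chunks, shards);
    const sorted = [...shards].sort((a, b) => counts[a] - counts[b]);
    const emptiest = sorted[0];
    const fullest = sorted[sorted.length - 1];
    if (counts[fullest] - counts[emptiest] <= 1) return chunks;
    chunks.find((c) => c.shard === fullest).shard = emptiest;
  }
}

const shards = ["shard01", "shard02", "shard03"];
const chunks = [
  { min: "a", max: "j", shard: "shard01", docs: 10 }, // keys a-i
  { min: "j", max: "n", shard: "shard02", docs: 40 }, // keys j-m, the busy range
  { min: "n", max: "{", shard: "shard03", docs: 10 }, // keys n-z ('{' sorts right after 'z')
];

// The busy j-m chunk is split twice, leaving shard02 with three chunks.
const [jChunk, kToM] = splitChunk(chunks[1], "k");
chunks.splice(1, 1, jChunk, ...splitChunk(kToM, "kk"));

balance(chunks, shards);
const finalCounts = chunkCounts(chunks, shards);
console.log(finalCounts); // shard01: 2, shard02: 2, shard03: 1
```

The application never takes part in either step, which is the point of the "No CODE Changes" claim.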
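Routed Queries db.tweets.find({_id: 'norberto'}) [Diagram: the query goes only to the shard that owns the matching range; shard01, shard02, shard03 hold chunks a-i, ja-jz, ka-kj, ki-m, n-z]
Scatter Gather db.tweets.find({email: 'norberto@10gen'}) [Diagram: the query goes to shard01, shard02 and shard03 and the results are merged]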
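The difference between a routed query and a scatter-gather query comes down to whether the query contains the shard key. A minimal sketch of that decision follows; the `ranges` table and `targetShards` function are hypothetical stand-ins for the chunk metadata and routing logic of a real router.

```javascript
// Toy query router (illustrative; a real router works from cluster chunk metadata).
const SHARD_KEY = "_id";
const ranges = [
  { shard: "shard01", min: "a", max: "j" }, // keys a-i
  { shard: "shard02", min: "j", max: "n" }, // keys j-m
  { shard: "shard03", min: "n", max: "{" }, // keys n-z ('{' sorts right after 'z')
];

function targetShards(query) {
  if (!(SHARD_KEY in query)) {
    // No shard key in the query: every shard must be asked (scatter-gather).
    return ranges.map((r) => r.shard);
  }
  // Shard key present: only the shard owning that key range is asked (routed).
  const key = query[SHARD_KEY];
  return ranges.filter((r) => key >= r.min && key < r.max).map((r) => r.shard);
}

const routed = targetShards({ _id: "norberto" });
const scattered = targetShards({ email: "norberto@10gen" });
console.log(routed); // [ 'shard03' ]
console.log(scattered); // [ 'shard01', 'shard02', 'shard03' ]
```

Scatter-gather queries still return correct results, but their cost grows with the number of shards, so the shard key should match the most common query pattern.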
Caching [Diagram: a single shard01 with 96 GB of memory holds all 300 GB of data (ranges a-i, j-r, n-z), a 3:1 data-to-memory ratio]
Horizontal Distribution [Diagram: the 300 GB of data is spread over shard01, shard02 and shard03 (ranges a-i, j-r, n-z), 100 GB per shard with 96 GB of memory each, roughly a 1:1 data-to-memory ratio]
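The ratios on the two slides follow from simple arithmetic. The helper below is a sketch that assumes data is spread evenly across shards; the function name is made up for the example.

```javascript
// Data-to-memory ratio per shard, assuming an even spread of data across shards.
function dataToMemoryRatio(totalDataGB, memPerShardGB, shardCount) {
  const dataPerShardGB = totalDataGB / shardCount;
  return dataPerShardGB / memPerShardGB;
}

const singleShard = dataToMemoryRatio(300, 96, 1); // 300 / 96 = 3.125, the "3:1" slide
const threeShards = dataToMemoryRatio(300, 96, 3); // 100 / 96, the "1:1" slide
console.log(singleShard.toFixed(2), threeShards.toFixed(2)); // 3.13 1.04
```

At roughly 1:1 the working set can be served mostly from memory, which is where the performance gain of adding shards comes from.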
Consistency and Durability
Consistency • Eventual Consistency – Allow updates when a system has been partitioned – Resolve conflicts later – Example: Cassandra, CouchDB • Immediate Consistency – Single Master – Avoids conflicts – Example: MongoDB
Durability • For how long is my data available? • When do I know my data is safe? • Where is it safe? • MongoDB style: – Fire and Forget – Get Last Error – Journal Sync – Replica Safe
Durability [Table: durability levels Memory, Journal, Secondary Nodes, Multiple Data Centers; an RDBMS row (Async) and the MongoDB options w=1 (default), j=true, w=majority, w="tag"]
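One way to read the durability options is as a predicate the server must satisfy before it acknowledges a write. The sketch below is a toy model, not driver or server code: the `write` state object, its field names and the `multiDC` tag are hypothetical, and the pairing of each option with a durability level is an interpretation of the slide.

```javascript
// Toy model: has a write met the requested write concern?
// write = { journaled, acks, members, satisfiedTags }
//   acks          : members (primary included) that have applied the write
//   satisfiedTags : names of tag rules already met (hypothetical representation)
function isAcknowledged(concern, write) {
  // j=true: the write must be in the on-disk journal.
  if (concern.j && !write.journaled) return false;
  const w = concern.w === undefined ? 1 : concern.w;
  if (w === "majority") {
    // w=majority: more than half of the members must have the write.
    return write.acks >= Math.floor(write.members / 2) + 1;
  }
  if (typeof w === "number") {
    // w=0 is fire and forget, w=1 is the primary only, w=N is N members.
    return write.acks >= w;
  }
  // Any other string is treated as a named tag rule, e.g. "one copy per data center".
  return write.satisfiedTags.includes(w);
}

// A write that has only reached the primary's memory so far.
const write = { journaled: false, acks: 1, members: 3, satisfiedTags: [] };
console.log(isAcknowledged({ w: 1 }, write)); // true
console.log(isAcknowledged({ w: 1, j: true }, write)); // false
console.log(isAcknowledged({ w: "majority" }, write)); // false

// One secondary catches up: two of three members now have the write.
write.acks = 2;
console.log(isAcknowledged({ w: "majority" }, write)); // true
console.log(isAcknowledged({ w: "multiDC" }, write)); // false ("multiDC" is a made-up tag)
```

Stronger concerns trade write latency for safety, so they are usually chosen per operation rather than globally.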
Flexibility
Data Model • Why JSON? – Well understood data format – Maps simply to objects – Linking & Embedding to describe relationships
JSON place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "tech" ] }
Relational Way
MongoDB Way • Embedding • Linking
JSON & Scale Out • Embedding removes the need for: – Distributed Joins – Two Phase Commit • Enables data to be distributed across many nodes without penalty
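Embedding and linking can be compared side by side with plain objects. The sketch below uses in-memory data rather than a database, and the `checkins` field, the `place_id` reference and the lookup counter are hypothetical names added for illustration.

```javascript
// Embedding: related data lives inside the parent document,
// so a single read of a single document returns everything.
const placeEmbedded = {
  _id: "place1",
  name: "10gen HQ",
  city: "New York",
  tags: ["business", "tech"],
  checkins: [{ user: "norberto" }], // hypothetical embedded sub-documents
};

// Linking: related data lives in its own collection and points back by _id.
const places = new Map([["place1", { _id: "place1", name: "10gen HQ", city: "New York" }]]);
const checkins = [{ _id: 1, place_id: "place1", user: "norberto" }];

let lookups = 0; // counts simulated round trips to the data store

function readEmbedded() {
  lookups += 1; // one read returns the place and its check-ins together
  return placeEmbedded;
}

function readLinked(checkinId) {
  lookups += 1; // first read: the check-in itself
  const checkin = checkins.find((c) => c._id === checkinId);
  lookups += 1; // second read: the place it links to, joined in the application
  return { ...checkin, place: places.get(checkin.place_id) };
}

readEmbedded();
const embeddedLookups = lookups; // 1
lookups = 0;
const linked = readLinked(1);
const linkedLookups = lookups; // 2
console.log(embeddedLookups, linkedLookups, linked.place.name); // 1 2 10gen HQ
```

With sharding, the two linked reads may land on different nodes, which is the distributed join that embedding avoids.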