An introduction to MongoDB Rácz Gábor ELTE IK, 2013. febr. 10.
2 In Production http://www.mongodb.org/about/production-deployments/
3 NoSQL • Key-value • Graph database • Document-oriented • Column family
4 Document store RDBMS MongoDB Database Database Table, View Collection Row Document (JSON, BSON) Column Field Index Index Join Embedded Document Foreign Key Reference Partition Shard
5 Document store RDBMS MongoDB Database Database Table, View Collection Row Document (JSON, BSON) Column Field Index Index Join Embedded Document Foreign Key Reference Partition Shard > db.user.findOne({age:39}) { "_id" : ObjectId("5114e0bd42…"), "first" : "John", "last" : "Doe", "age" : 39, "interests" : [ "Reading", "Mountain Biking ] "favorites": { "color": "Blue", "sport": "Soccer"} }
6 CRUD • Create • db.collection.insert( <document> ) • db.collection.save( <document> ) • db.collection.update( <query>, <update>, { upsert: true } ) • Read • db.collection.find( <query>, <projection> ) • db.collection.findOne( <query>, <projection> ) • Update • db.collection.update( <query>, <update>, <options> ) • Delete • db.collection.remove( <query>, <justOne> )
7 CRUD example > db.user.insert({ first: "John", last : "Doe", age: 39 }) > db.user.find () { "_id" : ObjectId("51…"), "first" : "John", "last" : "Doe", "age" : 39 } > db.user.update( {"_id" : ObjectId("51…")}, { $set: { age: 40, salary: 7000} } ) > db.user.remove({ "first": /^J/ })
8 Features • Document-Oriented storege • Full Index Support • Replication & High Availability • Auto-Sharding • Querying • Fast In-Place Updates • Map/Reduce Agile Scalable
9 Memory Mapped Files • „A memory-mapped file is a segment of virtual memory which has been assigned a direct byte-for-byte correlation with some portion of a file or file-like resource.”1 • mmap() 1: http://en.wikipedia.org/wiki/Memory-mapped_file
10 Replica Sets • Redundancy and Failover • Zero downtime for upgrades and maintaince • Master-slave replication • Strong Consistency • Delayed Consistency • Geospatial features Host1:10000 Host2:10001 Host3:10002 replica1 Client
11 Sharding • Partition your data • Scale write throughput • Increase capacity • Auto-balancing Host1:10000 Host2:10010 Host3:20000 shard1 shard2 Host4:30000 configdb Client
12 Mixed Host4:10010 Host5:20000 shard1 shardn Host6:30000 configdb Client Host1:10000 Host2:10001 Host3:10002 replica1 Host7:30000 ...
13 Map/Reduce db.collection.mapReduce( <mapfunction>, <reducefunction>, { out: <collection>, query: <>, sort: <>, limit: <number>, finalize: <function>, scope: <>, jsMode: <boolean>, verbose: <boolean> } ) var mapFunction1 = function() { emit(this.cust_id, this.price); }; var reduceFunction1 = function(keyCustId, valuesPrices) { return sum(valuesPrices); };
14 Other features • Easy to install and use • Detailed documentation • Various APIs • JavaScript, Python, Ruby, Perl, Java, Java, Scala, C#, C++, Haskell, Erlang • Community • Open source
15 Theory of noSQL: CAP CAP Theorem: satisfying all three at the same time is impossible A P • Many nodes • Nodes contain replicas of partitions of data • Consistency • all replicas contain the same version of data • Availability • system remains operational on failing nodes • Partition tolarence • multiple entry points • system remains operational on system split C
16 Theory of noSQL: CAP CAP Theorem: satisfying all three at the same time is impossible A P • Many nodes • Nodes contain replicas of partitions of data • Consistency • all replicas contain the same version of data • Availability • system remains operational on failing nodes • Partition tolarence • multiple entry points • system remains operational on system split C
17 ACID - BASE Pritchett, D.: BASE: An Acid Alternative (queue.acm.org/detail.cfm?id=1394128) • Atomicity • Consistency • Isolation • Durability • Basically Available (CP) • Soft-state • Eventually consistent (AP)
18 Thank you for your attention!

MongoDB Basics Introduction along with MapReduce

  • 1.
    An introduction toMongoDB Rácz Gábor ELTE IK, 2013. febr. 10.
  • 2.
  • 3.
    3 NoSQL • Key-value • Graphdatabase • Document-oriented • Column family
  • 4.
    4 Document store RDBMS MongoDB DatabaseDatabase Table, View Collection Row Document (JSON, BSON) Column Field Index Index Join Embedded Document Foreign Key Reference Partition Shard
  • 5.
    5 Document store RDBMS MongoDB DatabaseDatabase Table, View Collection Row Document (JSON, BSON) Column Field Index Index Join Embedded Document Foreign Key Reference Partition Shard > db.user.findOne({age:39}) { "_id" : ObjectId("5114e0bd42…"), "first" : "John", "last" : "Doe", "age" : 39, "interests" : [ "Reading", "Mountain Biking ] "favorites": { "color": "Blue", "sport": "Soccer"} }
  • 6.
    6 CRUD • Create • db.collection.insert(<document> ) • db.collection.save( <document> ) • db.collection.update( <query>, <update>, { upsert: true } ) • Read • db.collection.find( <query>, <projection> ) • db.collection.findOne( <query>, <projection> ) • Update • db.collection.update( <query>, <update>, <options> ) • Delete • db.collection.remove( <query>, <justOne> )
  • 7.
    7 CRUD example > db.user.insert({ first:"John", last : "Doe", age: 39 }) > db.user.find () { "_id" : ObjectId("51…"), "first" : "John", "last" : "Doe", "age" : 39 } > db.user.update( {"_id" : ObjectId("51…")}, { $set: { age: 40, salary: 7000} } ) > db.user.remove({ "first": /^J/ })
  • 8.
    8 Features • Document-Oriented storege •Full Index Support • Replication & High Availability • Auto-Sharding • Querying • Fast In-Place Updates • Map/Reduce Agile Scalable
  • 9.
    9 Memory Mapped Files •„A memory-mapped file is a segment of virtual memory which has been assigned a direct byte-for-byte correlation with some portion of a file or file-like resource.”1 • mmap() 1: http://en.wikipedia.org/wiki/Memory-mapped_file
  • 10.
    10 Replica Sets • Redundancyand Failover • Zero downtime for upgrades and maintaince • Master-slave replication • Strong Consistency • Delayed Consistency • Geospatial features Host1:10000 Host2:10001 Host3:10002 replica1 Client
  • 11.
    11 Sharding • Partition yourdata • Scale write throughput • Increase capacity • Auto-balancing Host1:10000 Host2:10010 Host3:20000 shard1 shard2 Host4:30000 configdb Client
  • 12.
  • 13.
    13 Map/Reduce db.collection.mapReduce( <mapfunction>, <reducefunction>, { out: <collection>, query: <>, sort:<>, limit: <number>, finalize: <function>, scope: <>, jsMode: <boolean>, verbose: <boolean> } ) var mapFunction1 = function() { emit(this.cust_id, this.price); }; var reduceFunction1 = function(keyCustId, valuesPrices) { return sum(valuesPrices); };
  • 14.
    14 Other features • Easyto install and use • Detailed documentation • Various APIs • JavaScript, Python, Ruby, Perl, Java, Java, Scala, C#, C++, Haskell, Erlang • Community • Open source
  • 15.
    15 Theory of noSQL:CAP CAP Theorem: satisfying all three at the same time is impossible A P • Many nodes • Nodes contain replicas of partitions of data • Consistency • all replicas contain the same version of data • Availability • system remains operational on failing nodes • Partition tolarence • multiple entry points • system remains operational on system split C
  • 16.
    16 Theory of noSQL:CAP CAP Theorem: satisfying all three at the same time is impossible A P • Many nodes • Nodes contain replicas of partitions of data • Consistency • all replicas contain the same version of data • Availability • system remains operational on failing nodes • Partition tolarence • multiple entry points • system remains operational on system split C
  • 17.
    17 ACID - BASE Pritchett,D.: BASE: An Acid Alternative (queue.acm.org/detail.cfm?id=1394128) • Atomicity • Consistency • Isolation • Durability • Basically Available (CP) • Soft-state • Eventually consistent (AP)
  • 18.
    18 Thank you foryour attention!

Editor's Notes

  • #3 2009: Initial release At now: version 2.2.3
  • #4 Huge quantity of data => Distributed systems => expensive joins => New fields, new demands (graphs) => Different data strucutres: Simplier or more specific
  • #5 Javascript
  • #6 Flexible schema Javascript
  • #7 Create The field name _id is reserved for use as a primary key; its value must be unique in the collection, is immutable, and may be of any type other than an array. The field names cannot start with the $ character. The field names cannot contain the . character. Create with save If the <document> argument does not contain the _id field or contains an _id field with a value not in the collection, the save() method performs an insert of the document. Otherwise, the save() method performs an update. sds
  • #10 http://docs.mongodb.org/manual/faq/storage/
  • #11 DC – Data center
  • #12 1970 – 2000: Vertical Scalability (scale-up) Google, ~2000: Horizontal Scalability (scale-out)
  • #14 http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#db.collection.mapReduce
  • #17 MongoDB A: What if a primary node is down?
  • #18 ACID Atomicity. All of the operations in the transaction will complete, or none will. Consistency. The database will be in a consistent state when the transaction begins and ends. Isolation. The transaction will behave as if it is the only operation being performed upon the database. Durability. Upon completion of the transaction, the operation will not be reversed. BASE Basically Available: some parts of system remain availabe on failure Soft-state: (the information will expire unless it is refreshed ) system will change state without user intervention due to eventual consistency Eventually consistency: asynchron propagation consistancy window