open-source, high-performance, schema-free, document-oriented database
RDBMS • Great for many applications • Shortcomings • Scalability • Flexibility
CAP Theorem • Consistency • Availability • Tolerance to network Partitions • Pick two • http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
ACID vs BASE • ACID: Atomicity, Consistency, Isolation, Durability • BASE: Basically Available, Soft state, Eventually consistent
Schema-free • Loosening constraints - added flexibility • Dynamically typed languages • Migrations
BigTable • Single master node • Row / Column hybrid • Versioned
BigTable • Open-source clones: • HBase • Hypertable
Dynamo • Simple Key/Value store • No master node • Write to any (many) nodes • Read from one or more nodes (balance speed vs. consistency) • Read repair
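The write-to-many, read-from-some idea and read repair can be sketched with a toy in-memory cluster. This is only an illustration under simplifying assumptions: `ToyCluster` and `Versioned` are hypothetical names, and a single integer version stands in for the richer versioning a real Dynamo-style store would use.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    version: int  # simplification: one counter instead of real version tracking


class ToyCluster:
    """Hypothetical in-memory stand-in for a masterless key/value cluster."""

    def __init__(self, node_count):
        # Each node is just a dict of key -> Versioned.
        self.nodes = [{} for _ in range(node_count)]

    def write(self, key, value, version, node_ids):
        # Write to whichever nodes are reachable (any subset of the cluster).
        for i in node_ids:
            self.nodes[i][key] = Versioned(value, version)

    def read(self, key, node_ids):
        # Ask the chosen nodes and keep the newest version seen.
        replies = [(i, self.nodes[i].get(key)) for i in node_ids]
        found = [v for _, v in replies if v is not None]
        if not found:
            return None
        newest = max(found, key=lambda v: v.version)
        # Read repair: push the newest version back to stale or missing replicas.
        for i, v in replies:
            if v is None or v.version < newest.version:
                self.nodes[i][key] = newest
        return newest.value


cluster = ToyCluster(node_count=3)
cluster.write("user:1", "mike", version=1, node_ids=[0, 1, 2])
cluster.write("user:1", "michael", version=2, node_ids=[0])  # nodes 1 and 2 miss this

fast_but_stale = cluster.read("user:1", node_ids=[1])   # one node only: quick, may be old
consistent = cluster.read("user:1", node_ids=[0, 1])    # sees the newer copy, repairs node 1
```

Reading from a single node is the fast option and can return the old value; reading from more nodes costs more but finds the newer write and repairs the stale replica along the way.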
Dynamo • Open-source clones • Project Voldemort • Cassandra - data model more like BigTable • Dynomite
memcached • Used as a caching layer • Essentially a key/value store • RAM only - fast • Does away with ACID
Redis • Like memcached • Different: • Values can be strings, lists, sets • Non-volatile
Tokyo Cabinet + Tyrant • Key/value store with focus on speed • Some more advanced queries • Sorting, range or prefix matching • Multiple storage engines • Hash, B-Tree, Fixed length and Table
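Range and prefix matching depend on keys being stored in order, which is what a B-Tree style engine provides. The sketch below uses a sorted Python list and `bisect` to show the idea; it is not Tokyo Cabinet's API, and the key names are made up for illustration.

```python
import bisect

# A sorted list stands in for an ordered (B-Tree style) key index.
keys = sorted(["post:001", "post:002", "post:010", "user:eliot", "user:mike"])


def range_keys(sorted_keys, low, high):
    """Return keys k with low <= k < high using two binary searches."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_left(sorted_keys, high)
    return sorted_keys[start:end]


def prefix_keys(sorted_keys, prefix):
    """Return keys starting with prefix; in sorted order they form one contiguous run."""
    start = bisect.bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


users = prefix_keys(keys, "user:")
some_posts = range_keys(keys, "post:002", "post:011")
```

A plain hash engine cannot answer these queries without scanning every key, which is one reason to offer several storage engines.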
CouchDB • A lot in common with MongoDB: • Document-oriented • Schema-free • JSON-style documents
CouchDB • Differences from MongoDB: • MVCC based • Replication as path to scalability • Query through predefined views • ACID • REST
MongoDB • Focus on performance • Rich dynamic queries • Secondary indexes • Replication / failover • Auto-sharding • Many platforms / languages supported
Good at • The web • Caching • High volume / low value • Scalability
Less good at • Highly transactional • Ad-hoc business intelligence • Problems that require SQL
PyMongo • Python driver for MongoDB • Pure Python, with optional C extension • Installation (setuptools): easy_install pymongo
Document • Unit of storage (think row) • Just a dictionary • Can store many Python types: • None, bool, int, float, string / unicode, dict, datetime.datetime, compiled re • Some special types: • SON, Binary, ObjectId, DBRef
Collection • Schema-free equivalent of a table • Logical groups of documents • Indexes are per-collection
_id • Special key • Present in all documents • Unique across a Collection • Any type you want
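The `_id` rules can be illustrated with a toy in-memory collection. `ToyCollection` is a hypothetical class, not PyMongo's `Collection`. The sketch assumes an `_id` is filled in when the document lacks one, and uses a random hex string as a stand-in for a generated `ObjectId`.

```python
import uuid


class ToyCollection:
    """Hypothetical in-memory collection; not PyMongo's Collection class."""

    def __init__(self):
        self._docs = {}  # maps _id -> document

    def insert(self, doc):
        doc = dict(doc)  # copy so the caller's dict is left untouched
        # Assumption: a random hex string stands in for a generated ObjectId.
        doc.setdefault("_id", uuid.uuid4().hex)
        if doc["_id"] in self._docs:
            raise KeyError("duplicate _id: %r" % (doc["_id"],))
        self._docs[doc["_id"]] = doc
        return doc["_id"]


posts = ToyCollection()
generated_id = posts.insert({"author": "mike"})            # _id filled in automatically
chosen_id = posts.insert({"_id": 42, "author": "eliot"})   # caller-chosen _id
```

Because this toy keys a dict by `_id`, it only accepts hashable values; that restriction belongs to the sketch, not to the slide's "any type you want".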
Blog back-end
Post
    {"author": "mike",
     "date": datetime.datetime.utcnow(),
     "text": "my blog post...",
     "tags": ["mongodb", "python"]}
Comment
    {"author": "eliot",
     "date": datetime.datetime.utcnow(),
     "text": "great post!"}
New post
    import datetime

    # db is assumed to be an already-open PyMongo database handle
    post = {"author": "mike",
            "date": datetime.datetime.utcnow(),
            "text": "my blog post...",
            "tags": ["mongodb", "python"]}
    post_id = db.posts.save(post)
Embedding a comment
    c = {"author": "eliot",
         "date": datetime.datetime.utcnow(),
         "text": "great post!"}
    db.posts.update({"_id": post_id}, {"$push": {"comments": c}})
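To see what the `$push` update leaves behind, here is a small pure-Python sketch that mimics its effect on one document held in memory. `apply_push` is a hypothetical helper, not part of PyMongo, and the fixed date is chosen only to keep the example deterministic.

```python
import datetime


def apply_push(document, push_spec):
    """Mimic the effect of {"$push": {...}} on one in-memory document."""
    for field, value in push_spec.items():
        # Append to the array, creating it first if the field is absent.
        document.setdefault(field, []).append(value)
    return document


post = {"author": "mike", "text": "my blog post...", "tags": ["mongodb", "python"]}
comment = {"author": "eliot",
           "date": datetime.datetime(2009, 7, 1, 12, 0),
           "text": "great post!"}

apply_push(post, {"comments": comment})
# post now carries its comments inline:
# {"author": "mike", ..., "comments": [{"author": "eliot", ...}]}
```

The comment ends up embedded in the post document itself, so loading a post with its comments takes one lookup rather than a join.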
Last 10 posts
    from pymongo import DESCENDING

    query = db.posts.find().sort("date", DESCENDING).limit(10)
    for post in query:
        print(post["text"])
Posts by author
    db.posts.find({"author": "mike"})
Posts in the last week
    last_week = datetime.datetime.utcnow() - datetime.timedelta(days=7)
    db.posts.find({"date": {"$gt": last_week}})
Posts ending with 'Python'
    import re

    db.posts.find({"text": re.compile("Python$")})
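The pattern itself can be tried locally before it goes into a query. This sketch assumes the match behaves like Python's `re.search`, that is, unanchored except where the pattern says otherwise; the sample texts are made up.

```python
import re

pattern = re.compile("Python$")
texts = ["I love Python", "Python is great", "python"]

# search() looks anywhere in the string; the trailing $ pins the match to the end.
ending_with_python = [t for t in texts if pattern.search(t)]
```

The match is case-sensitive, so the lowercase "python" entry is not selected.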
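Posts with a tag
    # the field is "tags" (a list); a query matches if any element equals the value
    db.posts.find({"tags": "mongodb"})
... and fast
    from pymongo import ASCENDING

    db.posts.create_index("tags", ASCENDING)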
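Why the index makes tag lookups fast can be sketched in plain Python. A dict here stands in for the server's index structure, which is an assumption made for brevity; the point is only that each array element gets its own entry pointing back at the document.

```python
from collections import defaultdict

posts = [
    {"_id": 1, "tags": ["mongodb", "python"]},
    {"_id": 2, "tags": ["meetup", "python"]},
    {"_id": 3, "tags": ["mongodb"]},
]


def scan(posts, tag):
    # Without an index: every document has to be inspected.
    return [p["_id"] for p in posts if tag in p.get("tags", [])]


def build_tag_index(posts):
    # With an index: one entry per array element, pointing back at the document.
    index = defaultdict(list)
    for p in posts:
        for tag in p.get("tags", []):
            index[tag].append(p["_id"])
    return index


tag_index = build_tag_index(posts)
mongodb_posts = tag_index["mongodb"]  # direct lookup, no scan over all posts
```

Both approaches return the same documents; the index trades extra work at write time for a lookup that does not grow with the number of posts.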
Counting posts
    db.posts.count()
    db.posts.find({"author": "mike"}).count()
Basic paging
    page = 2  # zero-based, so this is the third page
    page_size = 15
    db.posts.find().limit(page_size).skip(page * page_size)
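The skip/limit arithmetic is easy to check against an ordinary list. `page_of` is a hypothetical helper written for this illustration, and the 40 fake posts are placeholder data.

```python
from itertools import islice


def page_of(results, page, page_size):
    """Return one zero-based page: skip page * page_size items, then take page_size."""
    start = page * page_size
    return list(islice(results, start, start + page_size))


posts = ["post %d" % i for i in range(40)]
third_page = page_of(posts, page=2, page_size=15)  # items 30..39; the last page is short
```

With zero-based page numbers, `page = 2` skips 30 results, and a final page may contain fewer than `page_size` items.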
Migration: adding titles
• Easy - just start adding them:
    post = {"author": "mike",
            "date": datetime.datetime.utcnow(),
            "text": "another blog post...",
            "tags": ["meetup", "python"],
            "title": "Document Oriented Dbs"}
    post_id = db.posts.save(post)
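The other half of this kind of migration is the reading code, which now sees documents with and without a title. One possible way to cope is a default at read time; `display_title` and its "(untitled)" fallback are illustrative choices, not something the slides prescribe.

```python
old_post = {"author": "mike",
            "text": "my blog post...",
            "tags": ["mongodb", "python"]}
new_post = {"author": "mike",
            "text": "another blog post...",
            "tags": ["meetup", "python"],
            "title": "Document Oriented Dbs"}


def display_title(post, default="(untitled)"):
    # Older documents simply lack the key, so fall back instead of failing.
    return post.get("title", default)


titles = [display_title(p) for p in (old_post, new_post)]
```

Old documents never need to be rewritten; they can be backfilled later or left as they are.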
Advanced queries
• $gt, $lt, $gte, $lte, $ne, $all, $in, $nin
• where()
    db.posts.find().where("this.author == 'mike'")
• group()
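The comparison and set operators listed above can be read as small predicates over a document. The matcher below is a rough pure-Python approximation for building intuition, not the server's implementation: `matches` and `field_matches` are hypothetical helpers, nested documents and `where()`/`group()` are not handled, and the fixed dates are placeholder data.

```python
import datetime
import operator

ORDERING = {"$gt": operator.gt, "$lt": operator.lt,
            "$gte": operator.ge, "$lte": operator.le}


def field_matches(value, condition):
    """Check one field value against a literal or an operator dict."""
    values = value if isinstance(value, list) else [value]
    if not isinstance(condition, dict):
        # Literal match; for list fields, matching any element counts.
        return condition in values
    for op, operand in condition.items():
        if op in ORDERING:
            ok = any(ORDERING[op](v, operand) for v in values if v is not None)
        elif op == "$ne":
            ok = operand not in values
        elif op == "$in":
            ok = any(v in operand for v in values)
        elif op == "$nin":
            ok = not any(v in operand for v in values)
        elif op == "$all":
            ok = all(item in values for item in operand)
        else:
            raise ValueError("unsupported operator: %s" % op)
        if not ok:
            return False
    return True


def matches(document, query):
    """True if every field condition in the query holds for the document."""
    return all(field_matches(document.get(k), cond) for k, cond in query.items())


posts = [
    {"author": "mike", "date": datetime.datetime(2009, 6, 30),
     "tags": ["mongodb", "python"]},
    {"author": "eliot", "date": datetime.datetime(2009, 6, 1),
     "tags": ["mongodb"]},
]
cutoff = datetime.datetime(2009, 6, 23)
recent = [p["author"] for p in posts if matches(p, {"date": {"$gt": cutoff}})]
both_tags = [p["author"] for p in posts
             if matches(p, {"tags": {"$all": ["mongodb", "python"]}})]
```

Several operators on one field combine with AND, so `{"date": {"$gte": a, "$lt": b}}` expresses a half-open range.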
Other cool stuff • Capped collections • Unique indexes • Mongo shell • GridFS • MongoKit (on pypi)
• Download MongoDB http://www.mongodb.org • Install PyMongo • Try it out!
• http://www.mongodb.org • irc.freenode.net#mongodb • mongodb-user on google groups • @mongodb, @mdirolf • mike@10gen.com • http://www.slideshare.net/mdirolf

MongoDB EuroPython 2009