Migrating from MySQL to MongoDB Based on a true story
Starting Point Existing System
Starting Point Requirements for replacement: • Store data • High performance o worldwide access o multiple physical locations o read heavy application • Simple to use o No DBA... • MI o real-time reporting without affecting production
• Availability and Partition Tolerance o from CAP o by default Consistency not Availability... • Document-Orientated o called from OOPHP • Simple querying using JSON o benefit over CouchDB which needs MapReduce • Simple multi-node setup • MongoLab.com o web interface for prototyping So why MongoDB?
Read Preference Defaults to primary (strongly consistent) Other options allow eventual consistency primaryPreferred secondary secondaryPreferred
New Architecture Single primary, multiple read members
What are the other members? One Delayed member provides a rolling backup by maintaining a dataset, delay is limited by the size of the oplog One Hidden member provides access for MI (and also used for dumping in this scenario)
Automatic Failover Primary is FUBAR, new primary is elected
Automatic Failover Had experience of this in production. It's magic. Remaining replica set members will vote and elect a new primary, which will start taking writes. (PHP extension issue) Replica set members that have been out of the cluster will recover automatically on rejoining o If you've gone past the oplog window then it's a manual process
Sharding, and why we didn't • Sharding divides the data set and distributes the data over multiple servers, or shards • Necessary when o data exceeds max for single instance (based on disk space) o the Working Set exceeds available RAM o single instance can’t deal with writes (would have to be a lot) • We had none of these and Sharding adds complexity
Sharding Architecture http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/
Sharding Availability Each Shard would need a Replica Set containing at least 3 members to achieve the same level of availability as our non-sharded architecture. In addition if one or two of the Config Servers are unavailable chunk migration and chunk splitting are suspended
DataBase Design • Schemaless means that the structure and the content of the “record” are held for each “row” o long key names add up o we used single chars and a lookup table • Collections o Can think of them as “buckets” rather than tables o Contain similar records by design but it’s not enforced
Management Information • Different conceptually in NoSQL databases due to the way they are queried, returning JSON • Traditional working through a query tool is less intuitive (although MongoVUE is quite good) • This lead us to replace our SQL script -> Excel -> Graph process (yuck) • MapReduce, powerful but a steep learning curve
Conclusions MongoDB is ace Questions?

Migrating from MySQL to MongoDB

  • 1.
    Migrating from MySQL toMongoDB Based on a true story
  • 2.
  • 3.
    Starting Point Requirements forreplacement: • Store data • High performance o worldwide access o multiple physical locations o read heavy application • Simple to use o No DBA... • MI o real-time reporting without affecting production
  • 4.
    • Availability andPartition Tolerance o from CAP o by default Consistency not Availability... • Document-Orientated o called from OOPHP • Simple querying using JSON o benefit over CouchDB which needs MapReduce • Simple multi-node setup • MongoLab.com o web interface for prototyping So why MongoDB?
  • 5.
    Read Preference Defaults toprimary (strongly consistent) Other options allow eventual consistency primaryPreferred secondary secondaryPreferred
  • 6.
    New Architecture Single primary,multiple read members
  • 7.
    What are theother members? One Delayed member provides a rolling backup by maintaining a dataset, delay is limited by the size of the oplog One Hidden member provides access for MI (and also used for dumping in this scenario)
  • 8.
    Automatic Failover Primary isFUBAR, new primary is elected
  • 9.
    Automatic Failover Had experienceof this in production. It's magic. Remaining replica set members will vote and elect a new primary, which will start taking writes. (PHP extension issue) Replica set members that have been out of the cluster will recover automatically on rejoining o If you've gone past the oplog window then it's a manual process
  • 10.
    Sharding, and whywe didn't • Sharding divides the data set and distributes the data over multiple servers, or shards • Necessary when o data exceeds max for single instance (based on disk space) o the Working Set exceeds available RAM o single instance can’t deal with writes (would have to be a lot) • We had none of these and Sharding adds complexity
  • 11.
  • 12.
    Sharding Availability Each Shardwould need a Replica Set containing at least 3 members to achieve the same level of availability as our non-sharded architecture. In addition if one or two of the Config Servers are unavailable chunk migration and chunk splitting are suspended
  • 13.
    DataBase Design • Schemalessmeans that the structure and the content of the “record” are held for each “row” o long key names add up o we used single chars and a lookup table • Collections o Can think of them as “buckets” rather than tables o Contain similar records by design but it’s not enforced
  • 14.
    Management Information • Differentconceptually in NoSQL databases due to the way they are queried, returning JSON • Traditional working through a query tool is less intuitive (although MongoVUE is quite good) • This lead us to replace our SQL script -> Excel -> Graph process (yuck) • MapReduce, powerful but a steep learning curve
  • 15.