Running MongoDB on AWS
Mark Yalenti, Senior Solutions Architect, MongoDB Inc.

Agenda
• MongoDB Basics
• Deployment Configurations
• AWS EC2 Instances
• Configuring Instances
• Storage Considerations
• Backup Considerations
MONGODB BASICS
MongoDB Basics
• Open source
• Document database
• High performance
• Horizontally scalable
• Full featured
• Built to match agile development and deployment

MongoDB Features
• Flexible document data model
• Rich ad-hoc queries
• Real-time aggregation
• Geospatial support (Within, Intersects and Near operators)
• Text search
• Pluggable storage engine architecture
• Built-in support for
  – Redundancy, failover, auto-partitioning
7x-10x Performance, 50%-80% Less Storage
How: WiredTiger Storage Engine
• Same data model, same query language, same ops
• Write performance gains driven by document-level concurrency control
• Storage savings driven by native compression
• Non-disruptive upgrade
[Chart: Performance, MongoDB 3.0 vs. MongoDB 2.6]
MMAPv1 Storage Engine
• History
  – MMAPv0 was the initial storage engine of MongoDB
  – Delegates memory management to the operating system
• New Capabilities
  – Collection-level concurrency control
  – Multiple performance enhancements
  – Windows performance now equivalent to Linux
• Advantages
  – Read-intensive applications
  – Cache survives MongoDB restarts and upgrades
  – Drop-in upgrade
Accessing MongoDB
• Shell
  – Command-line shell for interacting directly with the database
• Drivers
  – Drivers for most popular programming languages and frameworks
  – Java, Python, Perl, Ruby, Haskell, JavaScript

    > db.collection.insert({product: "MongoDB", type: "Document Database"})
    > db.collection.findOne()
    {
        "_id" : ObjectId("5106c1c2fc629bfe52792e86"),
        "product" : "MongoDB",
        "type" : "Document Database"
    }
DEPLOYMENT CONFIGURATION
Deploying MongoDB
• Single node
  – Development: prototyping, testing
• Replica Set
  – Production: high availability, disaster recovery
• Shard Cluster
  – Production: auto-partitioning, linear read/write scale
MongoDB: Single Node
[Diagram: one App and a single MongoDB node]
MongoDB: Replica Sets
[Diagram: one App, a MongoDB Primary and two MongoDB Secondaries]
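
A replica set like the one on this slide is described by a configuration document that is passed to rs.initiate() in the shell. The following is a minimal sketch of building such a document. The set name "rs0", the host names, and the helper buildReplicaSetConfig are hypothetical placeholders, not part of MongoDB; the sketch also rejects even member counts, following the usual recommendation to deploy in odd numbers.

```javascript
// Build a replica set configuration document of the shape accepted by
// rs.initiate() in the mongo shell. The helper name, set name and host
// names are hypothetical and used for illustration only.
function buildReplicaSetConfig(setName, hosts) {
  if (hosts.length === 0 || hosts.length % 2 === 0) {
    throw new Error("use an odd, non-zero number of members");
  }
  return {
    _id: setName,
    members: hosts.map(function (host, index) {
      // Each member needs a unique numeric _id and a host:port string.
      return { _id: index, host: host };
    })
  };
}

var config = buildReplicaSetConfig("rs0", [
  "mongo-a.example.internal:27017",
  "mongo-b.example.internal:27017",
  "mongo-c.example.internal:27017"
]);
console.log(JSON.stringify(config, null, 2));
```

In the shell, the resulting object would be passed as rs.initiate(config) on one of the members.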
MongoDB: Shard Cluster
[Diagram: three Apps, each with its own mongos; three config servers; three shards, each a replica set with one MongoDB Primary and two MongoDB Secondaries]
AMAZON WEB SERVICES
EC2 Instance Types
• General Purpose
• Compute-optimized
• GPU
• Memory-optimized
• Storage-optimized
• Micro
EC2 Instance Types
• General Purpose
• Compute-optimized
• GPU (GPU compute resources are not needed by MongoDB)
• Memory-optimized
• Storage-optimized
• Micro (burstable, no sustained CPU)
EC2 Instance Types
• General Purpose
  – M3, M4
  – (Instance Store vs EBS)
• Compute-optimized
  – C3, C4
  – (Instance Store vs EBS)
• Memory-optimized
  – R3
• Storage-optimized
  – I2, D2
Additional Considerations
• Memory-optimized instances for larger working sets
• More CPUs are suggested for WiredTiger-based instances
• Placement groups can be used for high-bandwidth needs
Components and Sizing
• mongod
  – Core database process
  – High performance: memory, CPU, storage, network
• config
  – Shard metadata
  – Smaller: m4.medium or better
• mongos
  – Shard query router
  – Deploy on app server
Replica Sets: Availability Zones
[Diagram: App, MongoDB Primary and two MongoDB Secondaries spread across Zone 1, Zone 2 and Zone 3]

Replica Sets: Regions
[Diagram: App, MongoDB Primary and two MongoDB Secondaries spread across Region 1 and Region 2]

Replica Sets: Regions and Zones
[Diagram: App, MongoDB Primary and two MongoDB Secondaries spread across zones in Region 1 and Region 2]
Shard Cluster: Regions
[Diagram: three Apps, each with its own mongos; three config servers; three shards, each with one MongoDB Primary and two MongoDB Secondaries; all spread across Region 1 and Region 2]
High Availability
• Use Replica Sets
  – Deploy in odd numbers
  – Maintain majority
• Withstand the loss of
  – Any single zone?
  – Any single region?
  – Deploy in 3 places
• Scale
  – Replica Sets for HA
  – Shards for scale
  – Combine for both
[Diagram: MongoDB Primary (1), MongoDB Secondary (2), MongoDB Secondary (3)]
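
The "maintain majority" and "deploy in 3 places" points can be checked with simple arithmetic: a replica set needs more than half of its voting members reachable to keep a primary. The sketch below is illustrative only; the function names and the zone and region labels are hypothetical.

```javascript
// Votes needed for a majority in a replica set of the given size.
function majority(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// placement maps a location name (zone or region) to the number of
// voting members deployed there. Returns the locations whose loss would
// leave the remaining members without a majority.
function fatalLocations(placement) {
  var names = Object.keys(placement);
  var total = names.reduce(function (sum, name) {
    return sum + placement[name];
  }, 0);
  var needed = majority(total);
  return names.filter(function (name) {
    return total - placement[name] < needed;
  });
}

// One member in each of three places: no single loss is fatal.
var threePlaces = fatalLocations({ "zone-1": 1, "zone-2": 1, "zone-3": 1 });
// Two members in one region, one in another: the larger region is a
// single point of failure for write availability.
var twoPlaces = fatalLocations({ "region-1": 2, "region-2": 1 });
console.log(threePlaces, twoPlaces);
```

With three members in two places, losing the place that holds two members leaves a single member, which is less than the majority of two, so no primary can be elected. Spreading the members over three places avoids this.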
BEST PRACTICES
Sensible Instance Defaults
• Best practices are meant to be a sensible starting point
• Strive for smooth and consistent performance
• Tune -> Scale Vertically -> Scale Horizontally
• Amazon Linux is optimized for EC2
• EBS provides persistent storage
• EBS-optimized allocates an additional NIC for storage
• Provisioned IOPS (PIOPS) provides consistent EBS performance
• Use separate PIOPS volumes for data, log, journal
Instance Configuration Best Practices
• Install via yum for flexibility and simplicity
  – See mongodb.org for details
• Update system settings (don't forget about NTP!)
• Use EXT4 or XFS (WiredTiger runs best on XFS)
• Set read ahead (default is too high)
• Update ulimits (default is too low)
• Update TCP KeepAlive
https://docs.mongodb.org/manual/administration/production-notes/
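
The read ahead, ulimit and KeepAlive items above can be turned into a simple checklist. The sketch below compares measured values against target values; the targets shown are assumptions chosen for illustration and should be confirmed against the production notes for the MongoDB version and storage engine in use. The function and property names are hypothetical.

```javascript
// Illustrative targets only (assumptions, not official values).
var assumedTargets = {
  readAheadSectors: { max: 32 },      // slide: default read ahead is too high
  openFiles: { min: 64000 },          // slide: default ulimits are too low
  tcpKeepAliveSeconds: { max: 300 }   // Linux default is 7200 seconds
};

// Return a list of human-readable findings for settings outside target.
function checkSettings(actual, targets) {
  var problems = [];
  Object.keys(targets).forEach(function (name) {
    var value = actual[name];
    var target = targets[name];
    if (value === undefined) {
      problems.push(name + ": not measured");
    } else if (target.max !== undefined && value > target.max) {
      problems.push(name + ": " + value + " is above " + target.max);
    } else if (target.min !== undefined && value < target.min) {
      problems.push(name + ": " + value + " is below " + target.min);
    }
  });
  return problems;
}

// Hypothetical sample resembling an untuned Linux host.
var untuned = { readAheadSectors: 256, openFiles: 1024, tcpKeepAliveSeconds: 7200 };
var findings = checkSettings(untuned, assumedTargets);
console.log(findings.join("\n"));
```

On a real instance, the measured values would typically come from blockdev --getra, ulimit -n and /proc/sys/net/ipv4/tcp_keepalive_time.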
Data Safety
• What's your backup plan?
• Have you tested restoring?
• Is your data highly available?
• How do you recover from disaster?

Protecting Your Data
• Replica Sets
  – Proper deployments provide HA and DR
• Manual backup/restore
  – Scriptable, tunable
• Cloud Manager Backup
  – Continuous, secure backup
Manual Backup Considerations
• Consider journaling (write-ahead log), which is on by default
  – Provides database durability in case of a fault
• With journaling, a snapshot technology can be used with MMAPv1
• MMAPv1 does in-place updates
  – fsync is required if you don't use journaling
• WiredTiger does not require fsync, as it effectively does write-ahead natively
• Journaling with WiredTiger is still a good idea
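
Assuming the fsync step above refers to flushing and locking writes with db.fsyncLock() before taking a storage-level snapshot, the order of operations can be sketched as follows. The shell's db object provides fsyncLock() and fsyncUnlock(); the wrapper snapshotWithLock, the takeSnapshot callback, the stand-in db object and the snapshot id are hypothetical and exist only so the sketch can run outside the shell.

```javascript
// Flush and block writes, take the snapshot, then always unlock, even
// if the snapshot step throws. `db` must expose fsyncLock() and
// fsyncUnlock(), as the mongo shell's db object does. takeSnapshot is a
// hypothetical callback that triggers the storage-level snapshot.
function snapshotWithLock(db, takeSnapshot) {
  db.fsyncLock();
  try {
    return takeSnapshot();
  } finally {
    db.fsyncUnlock();
  }
}

// Stand-in for the shell's db object that records the call order.
var calls = [];
var fakeDb = {
  fsyncLock: function () { calls.push("lock"); },
  fsyncUnlock: function () { calls.push("unlock"); }
};

var snapshotId = snapshotWithLock(fakeDb, function () {
  calls.push("snapshot");
  return "snap-hypothetical";
});
console.log(calls.join(" -> "), snapshotId);
```

The try/finally matters because a node left locked keeps blocking writes until it is explicitly unlocked.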
MongoDB Cloud Manager
The Best Way to Manage MongoDB In Your Data Center
• Single-click provisioning, scaling & upgrades, admin tasks – including instance deployment on EC2
• Monitoring, with charts, dashboards and alerts on 100+ metrics
• Backup and restore, with point-in-time recovery, support for shard clusters
• Up to 95% reduction in operational overhead
Resources
• MongoDB on AWS best practices
  – http://docs.mongodb.org/ecosystem/platforms/amazon-ec2/
• MongoDB production notes
  – http://docs.mongodb.org/manual/administration/production-notes/
• MongoDB docs
  – http://docs.mongodb.org
QUESTIONS?
Running MongoDB 3.0 on AWS
