Compare – DynamoDB vs. MongoDB Higher Ed
2 Requirements
• Unstructured data storage
• ACID compliance not necessary
• Fast read/write
• Ability to index data and search
• Full text search (?)
• Java/Spring support
• JavaScript support
• REST API
• Community support
• Scaling up and maintenance
3 Shard – when the database grows large
• Horizontal partitioning of a database, where rows are held in separate database servers
• Compare that to normalization or vertical partitioning, where data is split into columns
• Advantages
  • Reduces index size in each table in each database (performance +)
  • Load can be spread out over multiple machines (performance ++)
• Disadvantages
  • Increased reliance on interconnected servers
  • Query latency when more than one shard is searched
  • Issues with consistency and durability
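As a rough illustration of horizontal partitioning, the sketch below assigns rows to shards by hashing a key. The shard names, the hash function, and the helper names (`hashKey`, `shardFor`, `partitionRows`) are assumptions made for this example; neither DynamoDB nor MongoDB is claimed to place data this way.

```javascript
// Hypothetical shard list; a real deployment would hold connection handles.
const SHARDS = ["shard-0", "shard-1", "shard-2"];

// Simple deterministic string hash (djb2-style); illustrative only.
function hashKey(key) {
  let h = 5381;
  for (const ch of String(key)) {
    h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit value
  }
  return h;
}

// Pick the shard that owns a given key.
function shardFor(key, shards = SHARDS) {
  return shards[hashKey(key) % shards.length];
}

// Group whole rows by the shard that would hold them.
function partitionRows(rows, keyField, shards = SHARDS) {
  const placement = new Map(shards.map((s) => [s, []]));
  for (const row of rows) {
    placement.get(shardFor(row[keyField], shards)).push(row);
  }
  return placement;
}
```

Each row lives on exactly one shard, so a lookup by key touches one server, while a search on any other field has to visit every shard.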
4 DynamoDB Internals
• Key/Value Pair
  • Uses JSON only as a transport protocol
  • Data is not stored "on-disk" in the JSON data format
  • Applications that use DynamoDB must either implement their own JSON parsing or use a library, such as one of the AWS SDKs, to do this parsing for them
• Data Types
  • Scalar – string, number and binary (BLOB and CLOB)
  • Multivalued – string set, number set and binary set
5 DynamoDB Internals
• Data Model
  • Table – no fixed schema (columns, data types, etc.)
    • Needs a fixed primary key, its data type and a secondary index (if necessary)
    • Limit of 256 tables per region per account
  • Items – individual records in a table
    • Limited to 400 KB
  • Attributes
  • Supports one-to-one, one-to-many and many-to-many relationships
6 DynamoDB Internals
• Keys – must be defined at table creation time
  • Primary keys – hash, or hash and range
  • Local secondary indexes – can access only a single partition
    • Limit – 5 indexes per table, 20 attributes max
  • Global secondary indexes – can access any partition
    • Limit – 5 indexes per table
• When creating a secondary index, you define the alternate key for the index, along with any other attributes that you want projected into the index. DynamoDB copies these attributes into the index, along with the primary key attributes from the table
• Add, update and delete actions on the table are automatically reflected in the index
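The projection idea can be pictured with a small in-memory sketch: each index entry carries the alternate key, the table's primary key, and any projected attributes. The function name `buildSecondaryIndex` and the sample attribute names are hypothetical, and this one-shot rebuild is only a stand-in for the automatic maintenance the service performs.

```javascript
// Build an in-memory "secondary index" over an array of items.
// primaryKey, indexKey and projected are attribute names chosen by the caller.
function buildSecondaryIndex(items, primaryKey, indexKey, projected = []) {
  const index = new Map();
  for (const item of items) {
    // Assumption for this sketch: items without the index key are left out.
    if (!(indexKey in item)) continue;

    // Always copy the table's primary key plus the alternate key.
    const entry = {
      [primaryKey]: item[primaryKey],
      [indexKey]: item[indexKey],
    };
    // Copy only the attributes that were asked to be projected.
    for (const attr of projected) {
      if (attr in item) entry[attr] = item[attr];
    }

    const bucket = index.get(item[indexKey]) || [];
    bucket.push(entry);
    index.set(item[indexKey], bucket);
  }
  return index;
}
```

For example, indexing users by a hypothetical `email` attribute while projecting only `name` yields entries that answer "who has this email?" without reading the full item.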
7 DynamoDB Internals
• Throughput
  • A read capacity unit covers up to 4 KB
  • A write capacity unit covers up to 1 KB
  • Reading a 5 KB item requires 2 read capacity units (5 KB rounded up to the next 4 KB multiple)
  • These units are defined when the table is created
  • AWS sends alerts when these limits are exceeded
  • AWS also throttles further requests beyond the defined capacity
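The capacity arithmetic is a round-up division, sketched below with the unit sizes from the slide. The function names are made up for this example, and `provisionedReadUnits` assumes strongly consistent reads at one read per second per unit.

```javascript
const READ_UNIT_KB = 4;  // one read capacity unit covers up to 4 KB
const WRITE_UNIT_KB = 1; // one write capacity unit covers up to 1 KB

// Units consumed by reading one item of the given size.
function readUnits(itemKb) {
  return Math.ceil(itemKb / READ_UNIT_KB);
}

// Units consumed by writing one item of the given size.
function writeUnits(itemKb) {
  return Math.ceil(itemKb / WRITE_UNIT_KB);
}

// Read capacity to provision for a steady rate of strongly consistent reads.
function provisionedReadUnits(itemKb, readsPerSecond) {
  return readUnits(itemKb) * readsPerSecond;
}

console.log(readUnits(5));  // 2, matching the slide's example
console.log(writeUnits(5)); // 5
```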
8 DynamoDB Operations
• Table level – create, update, delete, list, describe
• Item/attribute level – add, update, delete
• Query – queries a table by hash key and range key. Results are limited to 1 MB
• Scan – reads all items from a table. Slower than a query
• Parallel scan is also available to speed things up
• Supports pagination
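The 1 MB result limit and pagination work together: a caller keeps requesting pages until no continuation marker comes back. The sketch below simulates that loop in memory. `fetchPage`, `fetchAll`, the `sizeBytes` field and the numeric cursor are all hypothetical stand-ins; no SDK calls are involved.

```javascript
const PAGE_LIMIT_BYTES = 1024 * 1024; // the 1 MB result limit from the slide

// Hypothetical in-memory stand-in for a single Query or Scan call.
// Each item is assumed to carry its own size in a sizeBytes field.
function fetchPage(items, startIndex = 0, limitBytes = PAGE_LIMIT_BYTES) {
  const page = [];
  let used = 0;
  let i = startIndex;
  while (i < items.length) {
    const size = items[i].sizeBytes;
    // Stop once the next item would push the page over the limit.
    if (page.length > 0 && used + size > limitBytes) break;
    page.push(items[i]);
    used += size;
    i += 1;
  }
  // nextIndex plays the role of a continuation marker; null means "done".
  return { items: page, nextIndex: i < items.length ? i : null };
}

// Keep fetching pages until the continuation marker runs out.
function fetchAll(items, limitBytes = PAGE_LIMIT_BYTES) {
  const all = [];
  let pages = 0;
  let cursor = 0;
  while (cursor !== null) {
    const page = fetchPage(items, cursor, limitBytes);
    all.push(...page.items);
    pages += 1;
    cursor = page.nextIndex;
  }
  return { items: all, pages };
}
```

With five 400 KB items, for instance, only two fit under the limit per call, so the loop needs three calls to collect everything.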
9 DynamoDB Features
• Fully managed NoSQL database service – handles scaling, partitioning, upgrades
• Durable – automatically replicates to different availability zones
• Scalable – automatically distributes data to multiple servers as size grows
• Fast – single-digit millisecond latency from an EC2 instance for 1 KB items
  • 5 ms for read, 10 ms for write
• Simple administration – Amazon Web Console
• Fault tolerant – automatically replicates data
• Flexible – each item in a table can have a different number of attributes
• Indexing – primary key of each item. Global and local secondary indexes allow users to query non-primary-key attributes
• Secure – authentication, use of the latest cryptographic techniques, ability to integrate with IAM (AWS Identity and Access Management)
10 DynamoDB Features
• Could be cost-effective – pricing per 1 KB item:
  • $0.01/hour for every 10 writes/sec
  • $0.01/hour for every 50 strongly consistent reads/sec
  • $0.28 per million writes
  • $0.056 per million strongly consistent reads
  • $1.00 per GB/month for indexed storage
• SDK – AWS SDK for Java/.NET/PHP etc.
  • Supports all table operations, queries and scans
• Service-oriented architecture – REST support – simple API, only 12 operations
  • Data transfers as simple GET/POST/DELETE
• Large items can be stored in S3 buckets, thereby reducing cost
• Monitoring – AWS Management Console, CloudWatch, command line tool
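To see how these figures combine, here is a back-of-the-envelope estimate using the prices exactly as quoted on the slide, which are likely out of date. The function names, the rounding up to whole pricing blocks, and the 730-hour month are assumptions of this sketch.

```javascript
// Prices as quoted on the slide, kept in cents to avoid floating-point drift.
const CENTS_PER_WRITE_BLOCK_HOUR = 1; // $0.01/hour per 10 writes/sec
const WRITES_PER_BLOCK = 10;
const CENTS_PER_READ_BLOCK_HOUR = 1;  // $0.01/hour per 50 strongly consistent reads/sec
const READS_PER_BLOCK = 50;
const CENTS_PER_GB_MONTH = 100;       // $1.00 per GB/month of indexed storage

// Hourly throughput cost; partial blocks are rounded up (an assumption).
function hourlyThroughputCents(writesPerSec, readsPerSec) {
  const writeBlocks = Math.ceil(writesPerSec / WRITES_PER_BLOCK);
  const readBlocks = Math.ceil(readsPerSec / READS_PER_BLOCK);
  return (
    writeBlocks * CENTS_PER_WRITE_BLOCK_HOUR +
    readBlocks * CENTS_PER_READ_BLOCK_HOUR
  );
}

// Monthly total: throughput for every hour plus storage.
// hoursPerMonth = 730 is an assumed average month length.
function monthlyCents(writesPerSec, readsPerSec, storageGb, hoursPerMonth = 730) {
  return (
    hourlyThroughputCents(writesPerSec, readsPerSec) * hoursPerMonth +
    storageGb * CENTS_PER_GB_MONTH
  );
}

// 100 writes/sec, 500 reads/sec, 50 GB stored:
console.log(monthlyCents(100, 500, 50) / 100); // 196 (dollars per month)
```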
11 DynamoDB Features
• Can be integrated with Redshift – a data warehousing tool
• DynamoDB Local – a small client-side database and server that mimics the DynamoDB service. Available as a .jar file
12 MongoDB Internals (derived from humongous)
• Document-oriented database
  • Data is stored in BSON format (Binary JSON)
  • Supports up to 100 levels of nesting
• Data Types – BSON
  • String, Integer, Boolean, Double, Arrays, Date*, Timestamp, Binary*, Null
  • Min/Max keys – compare against the lowest and highest BSON elements
  • Object – embedded documents
  • ObjectId* – stores the document's ID
  • Regular Expression*
  • JavaScript code*
  • Symbol – reserved for languages that use a specific symbol type
  • * Indicates non-JSON types
13 MongoDB Internals (derived from humongous)
• Data Model
  • Collections – documents that share a similar structure
  • Document – similar to rows in an RDBMS
    • Maximum BSON document size is 16 MB
  • Field – similar to columns in an RDBMS
14 MongoDB Query
• Query
  • Key/value – the key can be any field in the document, including the primary key
  • Range – greater than, less than or equal to, between
  • Geospatial – proximity criteria, intersection and inclusion
  • Text search – results are shown in relevance order
  • Aggregation – count, min, max, average, etc.
  • Map reduce
• Covered queries – queries that return only indexed fields
• Query optimization – MongoDB performs automatic optimization
• When necessary, developers can utilize more indexes through index intersection
15 MongoDB Index
• Index
  • Unique
  • Compound
  • Array
  • Time-to-live (TTL)
  • Geospatial
  • Sparse
  • Text search
• The size of an index entry must be less than 1024 bytes
• A single collection can have no more than 64 indexes
16 MongoDB – Sample Query
• Return states with a population above 10 million

    db.zipcodes.aggregate( [
       { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
       { $match: { totalPop: { $gte: 10*1000*1000 } } }
    ] )
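The pipeline above first sums `pop` per `state` and then keeps only the groups whose total reaches the threshold. A plain JavaScript sketch of the same two stages, run over an in-memory array, may make the order of operations clearer. The function name `statesAbove` is made up for this example; MongoDB does the same work server-side.

```javascript
// Mirror of the two pipeline stages over an in-memory array of
// { state, pop } records.
function statesAbove(zipcodes, threshold) {
  // Stage 1, like $group: sum pop per state.
  const totals = new Map();
  for (const { state, pop } of zipcodes) {
    totals.set(state, (totals.get(state) || 0) + pop);
  }
  // Stage 2, like $match: keep groups at or above the threshold.
  return [...totals]
    .filter(([, totalPop]) => totalPop >= threshold)
    .map(([state, totalPop]) => ({ _id: state, totalPop }));
}
```

Filtering has to come after grouping here, because the threshold applies to the per-state total rather than to any single zip code.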
17 MongoDB Features
• Mongo Shell – JavaScript shell that supports nearly all MongoDB commands
• Auto shard – automatically balances data in the cluster
• Automatic replica failover
• Query router – for queries that don't use the shard key, the query router broadcasts the query to all shards, then aggregates and sorts the results
• ACID compliant at the document level
• Security – MongoDB Enterprise Advanced provides extensive support for authentication, authorization, auditing and encryption
• MongoDB Ops Manager – deploy, upgrade (no downtime), monitor, back up and scale MongoDB instances
  • Hosted MongoDB Management Service also provides many of these capabilities
• Provides in-memory caching
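The query router's behavior can be sketched as a choice between a targeted query and a scatter-gather. In this simplified model each shard is just an array of documents, and a document lives on shard `key % shardCount`. That placement rule and the names `matches` and `routeQuery` are assumptions of this example, not how MongoDB assigns chunks.

```javascript
// True when every field in the filter equals the document's value.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, value]) => doc[field] === value);
}

// shards: array of document arrays. shardKey: name of a numeric field.
function routeQuery(shards, shardKey, filter, sortField) {
  const hasShardKey = Object.prototype.hasOwnProperty.call(filter, shardKey);

  // With the shard key, go to one shard; without it, broadcast to all.
  const targets = hasShardKey
    ? [shards[filter[shardKey] % shards.length]]
    : shards;

  // Gather the partial results, then sort the merged set.
  const merged = targets.flatMap((docs) => docs.filter((d) => matches(d, filter)));
  merged.sort((a, b) =>
    a[sortField] < b[sortField] ? -1 : a[sortField] > b[sortField] ? 1 : 0
  );
  return { shardsContacted: targets.length, results: merged };
}
```

The `shardsContacted` count makes the cost visible: a filter on the shard key reaches one shard, while any other filter fans out to every shard in the cluster.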
18 MongoDB Features
• Large community support, 4th largest database in use, right after the SQL databases
• Spring Data project for MongoDB
• Pluggable storage engine
  • For low latency and high performance – WiredTiger or in-memory
  • Analytical processing – HDFS storage engine
  • Replica sets automatically migrate data, independent of storage format – no complex ETL
• Both Java and JavaScript APIs are available and documented
• MongoDB University provides free education
  • https://university.mongodb.com/
• Third-party hosted support exists for MongoDB with various price plans
  • https://mongolab.com/
  • http://mongodirector.com/
19 References
• http://aws.amazon.com/dynamodb/
• http://www.mongodb.org/
• http://docs.aws.amazon.com/amazondynamodb/latest/developerguide
• http://db-engines.com/en/system/Amazon+DynamoDB%3BMongoDB (somewhat dated)
• http://blog.cloudthat.in/5-reasons-why-dynamodb-is-better-than-mongodb/
• http://www.masonzhang.com/2013/08/7-reasons-you-should-use-mongodb-over.html
• http://www.mongodb.com/presentations/automate-mongodb-mongodb-management-service-0
• http://www.mongodb.com/presentations/webinar-enterprise-architects-view-mongodb-0
