MongoDB .local Chicago 2019: A Complete Methodology to Data Modeling for MongoDB

#MDBlocal A Complete Methodology of Data Modeling for MongoDB Daniel Coupal Education, MongoDB CHICAGO

#MDBlocal Daniel Coupal, Senior Curriculum Engineer, Education, MongoDB, @danielcoupal

Goals of the Presentation
• Document vs Tabular: Recognize the differences
• Methodology: Summarize the steps when modeling for MongoDB
• Patterns: Recognize when to apply

Document versus Tabular Recognize the differences when modeling for a Document Database versus a Relational/Tabular Database

#MDBLocal Thinking in Documents
• Polymorphism
  • different documents may contain different fields
• Array
  • represent a "one-to-many" relation
  • index entry separately
• Sub Document
  • grouping some fields together
• JSON/BSON
  • documents shown as JSON
  • BSON is the physical format
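A minimal sketch of these four ideas, using plain Python dictionaries in place of real MongoDB documents. The "products" collection, the field names and the values are hypothetical and chosen only for illustration. The standard json module stands in for the JSON view of a document; BSON itself is not shown.

```python
import json

# Two documents that could live in the same hypothetical "products" collection.
# Polymorphism: they share some fields, but not all of them.
book = {
    "_id": 1,
    "type": "book",
    "title": "Data Modeling Notes",
    "authors": ["Ann", "Bob"],  # Array: a one-to-many relation
    "publisher": {"name": "Acme Press", "city": "Chicago"},  # Sub document
}
mug = {
    "_id": 2,
    "type": "mug",
    "title": "Coffee Mug",
    "capacity_ml": 200,
}


def fields_only_in(a, b):
    """Return the field names present in document a but missing from b."""
    return sorted(set(a) - set(b))


def as_json(doc):
    """Render a document the way it is usually shown to people: as JSON."""
    return json.dumps(doc, indent=2, sort_keys=True)
```

Calling fields_only_in(book, mug) lists the fields that make the two shapes differ, and as_json(book) gives the human-readable form of the same document.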

#MDBLocal Example: Modeling a blog

#MDBLocal CRDs: Collection-Relationship-Diagrams for two solutions
Solution A OR Solution B
• Queries by articles or users
• Queries by articles
• Duplication of users information
• Simpler

#MDBLocal Example: Modeling a Social Network Solution A Solution B

#MDBLocal Example: Modeling a Social Network
Solution A (Fan Out on writes) vs Solution B (Fan Out on reads)
Pre-aggregated Data:
✓ Slower writes
✓ More storage space
✓ Duplication
✓ Faster reads
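The two strategies can be sketched in memory as below. This is an illustration under assumptions, not the schema from the slides: the follower graph, the class names and the post shape are all hypothetical, and Python dictionaries stand in for collections.

```python
from collections import defaultdict

# Hypothetical follower graph: author -> list of followers.
FOLLOWERS = {"alice": ["bob", "carol"], "bob": ["carol"]}


class FanOutOnWrite:
    """Copy each post into every follower's feed at write time."""

    def __init__(self):
        self.feeds = defaultdict(list)  # user -> pre-aggregated feed

    def post(self, author, text):
        # One write per follower: slower writes, duplicated data.
        for follower in FOLLOWERS.get(author, []):
            self.feeds[follower].append({"author": author, "text": text})

    def read_feed(self, user):
        # The feed is already assembled: a single lookup.
        return list(self.feeds.get(user, []))


class FanOutOnRead:
    """Store each post once and assemble the feed at read time."""

    def __init__(self):
        self.posts = defaultdict(list)  # author -> that author's posts

    def post(self, author, text):
        self.posts[author].append({"author": author, "text": text})

    def read_feed(self, user):
        # One lookup per followed author: more work on every read.
        authors = [a for a, fans in FOLLOWERS.items() if user in fans]
        return [p for a in authors for p in self.posts.get(a, [])]


def stored_copies(store):
    """Count how many post documents a strategy keeps in total."""
    return sum(len(items) for items in store.values())
```

Both classes return the same feed for a user; the difference is where the cost lands. Fan out on writes stores one copy per follower, fan out on reads stores one copy per post.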

#MDBLocal Differences: Tabular vs Document
Steps to create the model
• Tabular: 1 – define schema, 2 – develop app and queries
• MongoDB: 1 – identifying the queries, 2 – define schema
Initial schema
• Tabular: 3rd normal form, one possible solution
• MongoDB: many possible solutions
Final schema
• Tabular: likely denormalized
• MongoDB: few changes
Schema evolution
• Tabular: difficult and not optimal, likely downtime
• MongoDB: easy, no downtime
Performance
• Tabular: mediocre
• MongoDB: optimized

Methodology Summarize the steps of a methodology when modeling for MongoDB

#MDBLocal Main Tradeoff in Modeling

Methodology 1. Describe the Workload

Methodology 1. Describe the Workload 2. Identify and Model the Relationships

#MDBLocal Actors, Movies and Reviews
Attributes: actor_name, date_of_birth, movie_title, revenues, reviewer_name, rating

#MDBLocal Actors, Movies and Reviews
• actors: name, date_of_birth
• movies: title, revenues
• reviews: name, rating
Attributes: actor_name, date_of_birth, movie_title, revenues, reviewer_name, rating

#MDBLocal Actors, Movies and Reviews
• actors: name, date_of_birth, movies: [ .. ]
• movies: title, revenues, actors: [ .. ]
• name, rating
Attributes: actor_name, date_of_birth, movie_title, revenues, reviewer_name, rating

Methodology 1. Describe the Workload 2. Identify and Model the Relationships 3. Apply Patterns

#MDBLocal Flexible Methodology

Use Case Let's start a franchise of coffee shops…

#MDBLocal Case Study: Coffee Shop Franchises
Name: Beyond the Stars Coffee
Objective:
• 10 000 stores in the United States
• … then we expand to the rest of the World
Keys to success:
1. Best coffee in the world
2. Best Technology

#MDBLocal Make the Best Coffee in the World
23g of ground coffee in, 20g of extracted coffee out, in approximately 20 seconds
1. Fill a small or regular cup with 80% hot water (not boiling but pretty hot). Your cup should be 150ml to 200ml in total volume, 80% of which will be hot water.
2. Grind 23g of coffee into your portafilter using the double basket. We use a scale that you can get here.
3. Draw 20g of coffee over the hot water by placing your cup on a scale, press tare and extract your shot.

#MDBLocal Key to Success 2: Best Technology
a) Intelligent Shelves
• Measure inventory in real time
b) Intelligent Coffee Machines
• Weighings, temperature, time to produce, …
• Coffee perfection
c) Intelligent Data Storage
• MongoDB

#MDBLocal 1 – Workload: List Queries
Query | Operation | Description
1. Coffee weight on the shelves | write | A shelf sends information when coffee bags are added or removed
2. Coffee to deliver to stores | read | How much coffee do we have to ship to the store in the next days
3. Anomalies in the inventory | read | Analytics
4. Making a cup of coffee | write | A coffee machine reporting on the production of a coffee cup
5. Analysis of cups of coffee | read | Analytics
6. Technical Support | read | Helping our franchisees

#MDBLocal 1 – Workload: quantify/qualify the queries
Query | Quantification | Qualification
1. Coffee weight on the shelves | 10/day*shelf*store => 1/sec | <1s, critical write
2. Coffee to deliver to stores | 1/day*store => 0.1/sec | <60s
3. Anomalies in the inventory | 24 reads/day | <5mins, "collection scan"
4. Making a cup of coffee | 10 000 000 writes/day, 115 writes/sec | <100ms, non-critical write
… cups of coffee at rush hour | 3 000 000 writes/hr, 833 writes/sec | <100ms, non-critical write
5. Analysis of cups of coffee | 24 reads/day | stale data is fine, "collection scan"
6. Technical Support | 1000 reads/day | <1s
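The per-second figures can be re-derived with a few lines of arithmetic. The sketch below assumes the 10 000 stores from the case study and, as a simplification not stated on the slide, one shelf per store; the helper name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60
STORES = 10_000  # objective stated in the case study


def per_second(events, period_seconds):
    """Convert a count of events over a period into an average rate."""
    return events / period_seconds


# 1. Shelf weighings: 10 per day per shelf per store (assuming one shelf per store).
shelf_writes = per_second(10 * STORES, SECONDS_PER_DAY)

# 2. Delivery reads: 1 per day per store.
delivery_reads = per_second(1 * STORES, SECONDS_PER_DAY)

# 4. Cups of coffee: daily average, then the rush-hour peak.
cup_writes = per_second(10_000_000, SECONDS_PER_DAY)
rush_hour_writes = per_second(3_000_000, SECONDS_PER_HOUR)
```

The results come out at roughly 1.2, 0.12, 115.7 and 833.3 operations per second, in line with the rounded values in the table. Sizing for the rush-hour peak rather than the daily average is what matters for the write path.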

#MDBLocal Disk Space
Cups of coffee
• one year of data
• 10000 x 1000/day x 365
• 3.7 billion/year
• 370 GB (100 bytes/cup of coffee)
Weighings
• one year of data
• 10000 x 10/day x 365
• 36.5 million/year
• 3.7 GB (100 bytes/weighing)
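The same estimate as a small calculation, using only the inputs given on the slide (stores, events per store per day, bytes per event). The function name is hypothetical, and 1 GB is taken as 10^9 bytes.

```python
def yearly_storage(stores, events_per_store_per_day, bytes_per_event, days=365):
    """Return (number of events per year, raw size in GB) for one data set."""
    events = stores * events_per_store_per_day * days
    gigabytes = events * bytes_per_event / 1e9
    return events, gigabytes


cups_events, cups_gb = yearly_storage(10_000, 1_000, 100)
weighings_events, weighings_gb = yearly_storage(10_000, 10, 100)
```

With these inputs, cups of coffee come to 3.65 billion documents and about 365 GB per year, and weighings to 36.5 million documents and about 3.65 GB. Indexes, replication and compression are not included in this raw figure.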

#MDBLocal 2 - Relations are still important
Type of Relation | one-to-one/1-1 | one-to-many/1-N | many-to-many/N-N
Document embedded in the parent document | • one read • no joins | • one read • no joins | • one read • no joins • duplication of information
Document referenced in the parent document | • smaller reads • many reads | • smaller reads • many reads | • smaller reads • many reads
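A minimal sketch of the read-count difference between embedding and referencing, for a one-to-many relation between a store and its coffee machines. CountingCollection is a hypothetical dictionary-backed stand-in for a collection, not a MongoDB driver class, and the field names are assumptions.

```python
class CountingCollection:
    """A dict-backed stand-in for a collection that counts its reads."""

    def __init__(self, docs):
        self._docs = {doc["_id"]: doc for doc in docs}
        self.reads = 0

    def find_one(self, _id):
        self.reads += 1
        return self._docs[_id]


# Embedded: the machines live inside the store document.
stores_embedded = CountingCollection([
    {"_id": "store-1", "machines": [{"serial": "m-1"}, {"serial": "m-2"}]},
])

# Referenced: the store only keeps the identifiers of its machines.
stores_referenced = CountingCollection([
    {"_id": "store-1", "machine_ids": ["m-1", "m-2"]},
])
machines = CountingCollection([
    {"_id": "m-1", "serial": "m-1"},
    {"_id": "m-2", "serial": "m-2"},
])


def machines_embedded(store_id):
    """One read returns the store together with its machines."""
    return stores_embedded.find_one(store_id)["machines"]


def machines_referenced(store_id):
    """One read for the store, then one read per referenced machine."""
    store = stores_referenced.find_one(store_id)
    return [machines.find_one(machine_id) for machine_id in store["machine_ids"]]
```

For a store with two machines, the embedded form needs one read and the referenced form needs three, while each referenced read returns a smaller document.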

#MDBLocal 2 - Entities for Beyond the Stars Coffee
Entities:
• Coffee cups
• Stores
• Coffee machines
• Shelves
• Weighings
• Coffee bags

Patterns Recognize the need and when to apply Schema Design Patterns

#MDBLocal Schema Design Patterns Resources
A. Advanced Schema Design Patterns
• MongoDB World 2017
B. Blogs on Patterns, Ken Alger & Daniel Coupal
• https://www.mongodb.com/blog/post/building-with-patterns-a-summary
C. MongoDB University: M320 – Data Modeling
• https://university.mongodb.com/courses/M320/about

#MDBLocal Bucket Pattern
Bucket per Day
{
  "device_id": 000123456,
  "type": "2A",
  "date": ISODate("2018-03-02"),
  "temp": [ [ 20.0, 20.1, 20.2, ... ], [ 22.1, 22.1, 22.0, ... ], ... ]
}
{
  "device_id": 000123456,
  "type": "2A",
  "date": ISODate("2018-03-03"),
  "temp": [ [ 20.1, 20.2, 20.3, ... ], [ 22.4, 22.4, 22.3, ... ], ... ]
}
Bucket per Hour
{
  "device_id": 000123456,
  "type": "2A",
  "date": ISODate("2018-03-02T13"),
  "temp": { 1: 20.0, 2: 20.1, 3: 20.2, ... }
}
{
  "device_id": 000123456,
  "type": "2A",
  "date": ISODate("2018-03-02T14"),
  "temp": { 1: 22.1, 2: 22.1, 3: 22.0, ... }
}
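One way to build the per-hour buckets from individual readings, sketched in plain Python. The input shape (device id, timestamp, temperature) and the choice of the minute as the key inside "temp" are assumptions made for illustration; keys are strings because document field names are strings.

```python
from datetime import datetime


def bucket_per_hour(readings):
    """Group (device_id, timestamp, temp) readings into one document per device and hour."""
    buckets = {}
    for device_id, timestamp, temp in readings:
        # Truncate the timestamp to the start of its hour to get the bucket key.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        bucket = buckets.setdefault(
            (device_id, hour),
            {"device_id": device_id, "date": hour, "temp": {}},
        )
        # Inside the bucket, each reading is stored under its minute.
        bucket["temp"][str(timestamp.minute)] = temp
    return list(buckets.values())


readings = [
    ("123456", datetime(2018, 3, 2, 13, 1), 20.0),
    ("123456", datetime(2018, 3, 2, 13, 2), 20.1),
    ("123456", datetime(2018, 3, 2, 14, 1), 22.1),
]
hourly = bucket_per_hour(readings)
```

Three readings become two documents here. A per-day bucket follows the same idea with a coarser key, trading fewer documents for larger ones.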

#MDBLocal Solution with Patterns
• Schema Versioning
• Computed
• Subset
• Bucket

#MDBLocal https://university.mongodb.com/courses/M320/about Data Modeling Patterns Use Cases

Takeaways from the Presentation
• Document vs Tabular: Recognize the differences
• Methodology: Summarize the steps when modeling for MongoDB
• Patterns: Recognize when to apply

Thank you for taking our FREE MongoDB classes at university.mongodb.com

Register Now! https://university.mongodb.com/courses/M320/about

#MDBlocal A Complete Methodology of Data Modeling for MongoDB [DEV] Daniel Coupal

Appendix A Schema Versioning Pattern

#MDBLocal Nightmare: Alter Table

#MDBLocal This is what your dreams should be like when thinking about a schema upgrade!

#MDBLocal Schema Revision
 | Relational | MongoDB
Versioned Unit | Schema | Document
Migration Procedure | Difficult | Easy
Service Uptime | Interrupted | No interruption
Rollback | Difficult to nightmare-ish | Easy

#MDBLocal Application Lifecycle
Modify Application
• Can read/process all versions of documents
• Have different handler per version
• Reshape the document before processing it
Update all Application servers
• Install updated application
• Remove old processes
Once migration completed
• remove the code to process old versions.

#MDBLocal Document Lifecycle
New Documents:
• Application writes them in latest version
Existing Documents
A) Use updates to documents
• to transform to latest version
• keep forever documents that never need an update
B) or transform all documents in batch
• no worry even if process takes days

#MDBLocal Timeline of the migration

#MDBLocal Schema Versioning Pattern
Problem
● Avoid downtime while doing schema upgrades
● Upgrading all documents can take hours, days or even weeks when dealing with big data
● Don't want to update all documents
Solution
● Each document gets a "schema_version" field
● Application can handle all versions
● Choose your strategy to migrate the documents
Use Cases Examples
● Every application that uses a database, deployed in production and heavily used
● System with a lot of legacy data
Benefits and Trade-Offs
✓ No downtime needed
✓ Feel in control of the migration
✓ Less future technical debt
! May need 2 indexes for same field while in migration period
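A minimal sketch of an application that can read every version of a document and reshape it before processing. Only the "schema_version" field comes from the slide. The version 1 and version 2 shapes (a single "phone" string becoming a "contacts" list) and the function names are hypothetical.

```python
LATEST_VERSION = 2


def upgrade_v1_to_v2(doc):
    """Hypothetical change: v1 stored one "phone" string, v2 stores a "contacts" list."""
    upgraded = {key: value for key, value in doc.items() if key != "phone"}
    if "phone" in doc:
        upgraded["contacts"] = [{"type": "phone", "value": doc["phone"]}]
    else:
        upgraded["contacts"] = []
    upgraded["schema_version"] = 2
    return upgraded


# One handler per old version; each one lifts a document by exactly one version.
UPGRADES = {1: upgrade_v1_to_v2}


def to_latest(doc):
    """Return a copy of doc reshaped to the latest schema version."""
    current = dict(doc)
    # Documents written before versioning was introduced are treated as version 1.
    current.setdefault("schema_version", 1)
    while current["schema_version"] < LATEST_VERSION:
        current = UPGRADES[current["schema_version"]](current)
    return current
```

The rest of the application only ever sees the latest shape. Writing the result of to_latest back on the next update migrates documents gradually, and a batch job can call the same function for the remainder.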

#MDBLocal Mathematical Operations

#MDBLocal "Fan Out" Operations

#MDBLocal "Roll Up" Operations

#MDBLocal Computed Pattern
Problem
● Costly computation or manipulation of data
● Executed frequently on the same data, producing the same result
Solution
● Perform the operation and store the result in the appropriate document and collection
● If the operations may need to be redone, keep their source data
Use Cases Examples
● Internet Of Things (IOT)
● Event Sourcing
● Time Series Data
● Frequent Aggregation Framework queries
Benefits and Trade-Offs
✓ Read queries are faster
✓ Saving on resources like CPU and Disk
! May be difficult to identify the need
! Avoid applying or overusing it unless needed
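A small sketch of the idea applied to the coffee case study: the totals are updated when a cup is written, so reads do not recompute them. The summary document shape, the field names and the function names are hypothetical, and a dictionary stands in for the collection holding the computed values.

```python
def record_cup(summaries, cup):
    """Apply one cup of coffee to the pre-computed summary of its store."""
    summary = summaries.setdefault(
        cup["store_id"],
        {"store_id": cup["store_id"], "cups": 0, "total_grams": 0.0},
    )
    # The computation happens once, at write time.
    summary["cups"] += 1
    summary["total_grams"] += cup["grams"]


def average_grams(summaries, store_id):
    """Read the stored totals; no scan over the individual cups is needed."""
    summary = summaries[store_id]
    return summary["total_grams"] / summary["cups"]


summaries = {}
record_cup(summaries, {"store_id": "store-1", "grams": 20.0})
record_cup(summaries, {"store_id": "store-1", "grams": 22.0})
```

Keeping the individual cup documents as well allows the totals to be rebuilt if the computation ever changes.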
