MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
The document outlines a comprehensive methodology for data modeling in MongoDB, contrasting document-based with tabular modeling approaches. It details various modeling patterns, such as the use of fields, arrays, and sub-documents, through specific use cases like a coffee shop franchise and social networks. The document also emphasizes the importance of schema design patterns and practical considerations for schema evolution and workload management.
Introduces the methodology for data modeling in MongoDB, covering the differences between document and tabular databases, the methodology steps, and a use case.
Explains the distinctions when modeling for Document databases versus Relational/Tabular databases.
Details on document model components: fields, arrays for one-to-many relations, and sub-documents for one-to-one relationships.
Contrasts data storage in relational and document databases and provides examples of modeling a blog and a social network.
Highlights differences in modeling steps for tabular and MongoDB, detailing schema creation and evolution.
Outlines a methodology for MongoDB: describing workloads, modeling relationships, and applying patterns.
Introduces a case study on a coffee shop franchise aiming for 10,000 stores, detailing objectives and success factors.
Lists queries used in operations and quantification of workloads like coffee weights and inventory anomalies.
Analyzes critical queries’ characteristics and estimated disk space utilization for coffee-related data.
Discusses the importance of relationships in document modeling and identifies key entities pertinent to the coffee shop case.
Introduces and demonstrates various schema design patterns utilized in MongoDB to enhance data modeling.
Summarizes key insights from the presentation regarding differences in data models, methodologies, and patterns.
Details schema versioning and computed patterns as part of advanced data modeling techniques in MongoDB.
Goals of the Presentation
Introduction
Document vs Tabular: Recognize the differences
Methodology: Summarize the steps when modeling for MongoDB
Patterns: Recognize when to apply them
Use Case: Franchise of coffee shops
Conclusion and Questions
#MDBLocal CRDs: A Few Collection-Relationship Diagrams
Solutions: Solution A, Solution B, Solution C
Diagram labels: Queries by articles; Queries by users; Duplication of users information; Simple
Example 2: Modeling a Social Network
Solution C (diagram labels: writes, reads, Pre-aggregated Data, Follower Profiles)
✓ Slower writes
✓ More storage space
✓ Duplication
✓ Faster reads
Differences: Tabular vs Document
Aspect | Tabular | MongoDB
Steps to create the model | 1 – define schema; 2 – develop app and queries | 1 – identifying the queries; 2 – define schema
Initial schema | 3rd normal form; one possible solution | many possible solutions
Final schema | likely denormalized | few changes
Schema evolution | difficult and not optimal; likely downtime | easy; no downtime
Performance | mediocre | optimized
Case Study: Coffee Shop Franchises
Name: Beyond the Stars Coffee
Objective:
• 10 000 stores in North America
• … then we expand to the rest of the World
Keys to success:
1. Best coffee in the world
2. Best Technology
First Key to Success: Make the Best Coffee in the World
23g of ground coffee in, 20g of extracted coffee out, in approximately 20 seconds
1. Fill a small or regular cup with 80% hot water (not boiling but pretty hot). Your cup should be 150ml to 200ml in total volume, 80% of which will be hot water.
2. Grind 23g of coffee into your portafilter using the double basket. We use a scale.
3. Draw 20g of coffee over the hot water by placing your cup on a scale, press tare and extract your shot.
Second Key to Success: Use the Best Technology
a) Intelligent Coffee Machines
• Weighings, temperature, time to produce, …
• Coffee perfection
b) Intelligent Shelves
• Measure inventory in real time
c) Intelligent Data Storage
• MongoDB
1 – Workload: List Queries
Query | Operation | Description
1. Coffee weight on the shelves | write | A shelf sends information when coffee bags are added or removed
2. Coffee to deliver to stores | read | How much coffee do we have to ship to the store in the next days
3. Anomalies in the inventory | read | Analytics
4. Making a cup of coffee | write | A coffee machine reporting on the production of a coffee cup
5. Analysis of cups of coffee | read | Analytics
6. Technical Support | read | Helping our franchisees
1 – Workload: quantify/qualify the queries
Query | Quantification | Qualification
1. Coffee weight on the shelves | 10/day*shelf*store => 1/sec | <1s; critical write
2. Coffee to deliver to stores | 1/day*store => 0.1/sec | <60s
3. Anomalies in the inventory | 24 reads/day | <5mins; "collection scan"
4. Making a cup of coffee | 10 000 000 writes/day; 115 writes/sec | <100ms; non-critical write
… cups of coffee at rush hour | 3 000 000 writes/hr; 833 writes/sec | <100ms; non-critical write
5. Analysis of cups of coffee | 24 reads/day | stale data is fine; "collection scan"
6. Technical Support | 1000 reads/day | <1s
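The per-second rates in the quantification column follow from dividing each volume by the length of its period. A minimal Python sketch of that arithmetic is shown here; the function name `per_second` and the figure of one shelf per store are assumptions made for illustration, since the slide does not state how many shelves a store has.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60

STORES = 10_000          # franchise objective from the case study
SHELVES_PER_STORE = 1    # assumption: not stated on the slide


def per_second(events: float, period_seconds: int) -> float:
    """Convert a volume of events over a period into an average rate per second."""
    return events / period_seconds


# Query 4: 10 000 000 cup writes per day, averaged over the whole day
cup_writes_avg = per_second(10_000_000, SECONDS_PER_DAY)

# Query 4 at rush hour: 3 000 000 cup writes in one hour
cup_writes_rush = per_second(3_000_000, SECONDS_PER_HOUR)

# Query 1: 10 weighings per day, per shelf, per store
shelf_writes = per_second(10 * SHELVES_PER_STORE * STORES, SECONDS_PER_DAY)

# Query 2: 1 delivery read per day, per store
delivery_reads = per_second(1 * STORES, SECONDS_PER_DAY)

print(f"cups (average):   {cup_writes_avg:.1f} writes/sec")
print(f"cups (rush hour): {cup_writes_rush:.1f} writes/sec")
print(f"shelf weighings:  {shelf_writes:.2f} writes/sec")
print(f"delivery reads:   {delivery_reads:.2f} reads/sec")
```

Under these assumptions the results land on the order of magnitude the slide reports (roughly 115, 833, 1 and 0.1 operations per second), which is the level of precision the sizing exercise needs.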
1 – Workload: details of the most important queries
Attribute | Value
Description | Making a cup of coffee at rush hour
Type | Write
Frequency | 3 000 000 writes/hr; 833 writes/sec
Size | 100 bytes
Consistency/Integrity | weak
Latency | < 10 sec
Durability | weak
Life/Duration | 1 year
Security | None
Disk Space
Cups of coffee
• one year of data
• 10000 x 1000/day x 365
• 3.7 billion/year
• 370 GB (100 bytes/cup of coffee)
Weighings
• one year of data
• 10000 x 10/day x 365
• 36.5 million/year
• 3.7 GB (100 bytes/weighing)
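The disk space estimates can be checked by multiplying stores, documents per store per day, and days, then applying the 100-byte document size. The sketch below does this in Python; the helper name `yearly_volume` is illustrative, and sizes use decimal gigabytes (10^9 bytes). Note that 10000 x 10 x 365 comes to 36.5 million weighings per year, which is what the 3.7 GB figure corresponds to.

```python
STORES = 10_000
DAYS_PER_YEAR = 365
BYTES_PER_DOCUMENT = 100  # size given on the slide for both document types


def yearly_volume(docs_per_store_per_day: int) -> tuple[int, int]:
    """Return (document count, total bytes) for one year across all stores."""
    count = STORES * docs_per_store_per_day * DAYS_PER_YEAR
    return count, count * BYTES_PER_DOCUMENT


cups, cup_bytes = yearly_volume(1000)           # 1000 cups per store per day
weighings, weighing_bytes = yearly_volume(10)   # 10 weighings per store per day

print(f"cups of coffee: {cups:,} docs/year, {cup_bytes / 1e9:.0f} GB")
print(f"weighings:      {weighings:,} docs/year, {weighing_bytes / 1e9:.2f} GB")
```

The exact products are 3.65 billion cups (365 GB) and 36.5 million weighings (3.65 GB); the slide rounds these to 3.7 billion, 370 GB and 3.7 GB.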
2 – Relations are still important
Type of Relation | one-to-one/1-1 | one-to-many/1-N | many-to-many/N-N
Document embedded in the parent document | one read; no joins | one read; no joins | one read; no joins; duplication of information
Document referenced in the parent document | smaller reads; many reads | smaller reads; many reads | smaller reads; many reads
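To make the embed-versus-reference trade-off concrete, here is a small Python sketch that uses plain dictionaries as stand-ins for collections and counts document reads. The store and shelf entities come from the case study, but every field name (`shelves`, `shelf_ids`, `coffee_kg`) and both helper functions are hypothetical, and no MongoDB driver is involved.

```python
# Embedded: the shelves live inside the parent store document.
store_embedded = {
    "_id": "store-1",
    "name": "Downtown",
    "shelves": [
        {"shelf_id": "shelf-1", "coffee_kg": 12.5},
        {"shelf_id": "shelf-2", "coffee_kg": 8.0},
    ],
}

# Referenced: the store only keeps the ids; shelves are separate documents.
stores = {"store-1": {"_id": "store-1", "name": "Downtown",
                      "shelf_ids": ["shelf-1", "shelf-2"]}}
shelves = {
    "shelf-1": {"_id": "shelf-1", "coffee_kg": 12.5},
    "shelf-2": {"_id": "shelf-2", "coffee_kg": 8.0},
}


def read_embedded(store_doc: dict) -> tuple[list, int]:
    """One document read returns the store together with all its shelves."""
    return store_doc["shelves"], 1


def read_referenced(store_id: str) -> tuple[list, int]:
    """One read for the store, then one smaller read per referenced shelf."""
    reads = 1
    store = stores[store_id]
    found = []
    for shelf_id in store["shelf_ids"]:
        found.append(shelves[shelf_id])
        reads += 1
    return found, reads


embedded_shelves, embedded_reads = read_embedded(store_embedded)
referenced_shelves, referenced_reads = read_referenced("store-1")
print(embedded_reads, referenced_reads)
```

The embedded form answers the query in a single read of a larger document, while the referenced form needs one read per related document but each read is smaller, matching the two rows of the table.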
Schema Design Patterns Resources
A. Advanced Schema Design Patterns, Daniel Coupal
• MongoDB World 2017
B. Blogs on Patterns, Ken Alger & Daniel Coupal
• https://www.mongodb.com/blog/post/building-with-patterns-a-summary
C. MongoDB University: M320 – Data Modeling
• https://university.mongodb.com/courses/M320/about
Takeaways from the Presentation
Document vs Tabular: Recognize the differences
Methodology: Summarize the steps when modeling for MongoDB
Patterns: Recognize when to apply them
Thank you for taking our FREE MongoDB classes at university.mongodb.com
Every session you rate enters you into a drawing for a gift card and TWO passes to MongoDB World 2020!
A Complete Methodology of Data Modeling with MongoDB
https://www.surveymonkey.com/r/W8N6DLY
Application Lifecycle
Modify Application
• Can read/process all versions of documents
• Have different handler per version
• Reshape the document before processing it
Update all Application servers
• Install updated application
• Remove old processes
Once migration completed
• Remove the code to process old versions
Document Lifecycle
New Documents:
• Application writes them in latest version
Existing Documents:
A) Use updates to documents
• to transform to latest version
• keep forever documents that never need an update
B) or transform all documents in batch
• no worry even if process takes days
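The lifecycle above can be sketched in Python as a chain of per-version upgrade functions applied either when a document is touched (option A) or over a whole batch (option B). This is a minimal illustration, not the presenter's implementation: the `schema_version` field name comes from the Schema Versioning Pattern slide, while the `weight_g` to `weight` change, the `UPGRADERS` table, and the rule that a document without the field counts as version 1 are assumptions.

```python
LATEST_VERSION = 2


def upgrade_v1_to_v2(doc: dict) -> dict:
    """Hypothetical change: v1 stores 'weight_g'; v2 nests value and unit."""
    upgraded = dict(doc)
    upgraded["weight"] = {"value": upgraded.pop("weight_g"), "unit": "g"}
    upgraded["schema_version"] = 2
    return upgraded


# One upgrader per source version; add entries as new versions appear.
UPGRADERS = {1: upgrade_v1_to_v2}


def to_latest(doc: dict) -> dict:
    """Reshape a document of any known version into the latest version."""
    current = dict(doc)
    # Assumption: documents written before versioning lack the field.
    version = current.get("schema_version", 1)
    while version < LATEST_VERSION:
        current = UPGRADERS[version](current)
        version = current["schema_version"]
    return current


# Option A: reshape a single document when the application updates it.
old_doc = {"_id": 1, "weight_g": 23}
print(to_latest(old_doc))

# Option B: transform all documents in batch.
batch = [{"_id": 2, "weight_g": 20},
         {"_id": 3, "schema_version": 2, "weight": {"value": 21, "unit": "g"}}]
migrated = [to_latest(d) for d in batch]
print(migrated)
```

Because `to_latest` leaves documents already at the latest version unchanged, the same function can serve the read path, the update path, and the batch job while the migration is in progress.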
Schema Versioning Pattern
Problem
• Avoid downtime while doing schema upgrades
• Upgrading all documents can take hours, days or even weeks when dealing with big data
• Don't want to update all documents
Solution
• Each document gets a "schema_version" field
• Application can handle all versions
• Choose your strategy to migrate the documents
Use Cases Examples
• Every application that uses a database, deployed in production and heavily used
• System with a lot of legacy data
Benefits and Trade-Offs
✓ No downtime needed
✓ Feel in control of the migration
✓ Less future technical debt
🆇 May need 2 indexes for same field while in migration period
Computed Pattern
Problem
• Costly computation or manipulation of data
• Executed frequently on the same data, producing the same result
Solution
• Perform the operation and store the result in the appropriate document and collection
• If you need to redo the operations, keep their source
Use Cases Examples
• Internet of Things (IoT)
• Event Sourcing
• Time Series Data
• Frequent Aggregation Framework queries
Benefits and Trade-Offs
✓ Read queries are faster
✓ Saving on resources like CPU and Disk
🆇 May be difficult to identify the need
🆇 Avoid applying or overusing it unless needed
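As an illustration of the Computed Pattern, the Python sketch below updates a stored summary each time a cup of coffee is recorded, so that reads return the precomputed values instead of rescanning every cup. It uses plain dictionaries and lists rather than a database, and the field names (`cup_count`, `total_extracted_g`, `avg_extracted_g`) and function names are hypothetical.

```python
def new_summary(store_id: str) -> dict:
    """Create an empty per-store summary document."""
    return {"store_id": store_id, "cup_count": 0,
            "total_extracted_g": 0.0, "avg_extracted_g": None}


def record_cup(summary: dict, cup_log: list, cup: dict) -> None:
    """Store the source event, then update the computed values on write."""
    cup_log.append(cup)  # keep the source so the computation can be redone
    summary["cup_count"] += 1
    summary["total_extracted_g"] += cup["extracted_g"]
    summary["avg_extracted_g"] = (
        summary["total_extracted_g"] / summary["cup_count"]
    )


def recompute(store_id: str, cup_log: list) -> dict:
    """Rebuild the summary from the source events, e.g. after a logic change."""
    rebuilt = new_summary(store_id)
    for cup in cup_log:
        record_cup(rebuilt, [], cup)
    return rebuilt


summary = new_summary("store-1")
cup_log: list = []
record_cup(summary, cup_log, {"extracted_g": 20.0})
record_cup(summary, cup_log, {"extracted_g": 22.0})
print(summary)
```

Each write does a little more work so that the frequent read becomes a single lookup; keeping `cup_log` follows the slide's advice to retain the source in case the operation has to be redone.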