Data Modelling for MongoDB Norberto Leite MongoDB May 14th, 2019 Tel Aviv, Israel
Norberto Leite Lead Engineer - Curriculum @ MongoDB norberto@mongodb.com New York @nleite
https://university.mongodb.com
Goals of the Presentation
• Recognize the differences when modelling for a Document Database versus a Relational Database
• Summarize the steps of a methodology when modelling for MongoDB
• Recognize the need for Schema Design Patterns and when to apply them
Differences when Modelling for a Document Database versus a Relational Database
Thinking in Documents
1. Polymorphism
   • different documents may contain different fields
2. Array
   • represents a "one-to-many" relation
   • an index on an array field covers all of its entries
3. Sub Document
   • groups some fields together
4. JSON/BSON
   • documents are often shown as JSON
   • BSON is the physical format
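A minimal sketch of the ideas on this slide, written as plain Python dictionaries rather than real MongoDB documents. The "products" collection, the field names, and the helper `shared_and_distinct_fields` are hypothetical and only serve the illustration.

```python
# Two documents that could live in the same (hypothetical) "products" collection.
book = {
    "_id": 1,
    "type": "book",
    "title": "Espresso Basics",
    # Array: one book has many authors (a "one-to-many" relation).
    "authors": ["A. Barista", "B. Roaster"],
    # Sub document: related fields grouped together.
    "publisher": {"name": "Crema Press", "city": "Tel Aviv"},
}

grinder = {
    "_id": 2,
    "type": "grinder",
    "title": "Burr Grinder",
    # Polymorphism: this field exists only on grinder documents.
    "burr_size_mm": 64,
}


def shared_and_distinct_fields(a, b):
    """Return the field names both documents have, and those only one has."""
    shared = sorted(set(a) & set(b))
    distinct = sorted(set(a) ^ set(b))
    return shared, distinct


shared, distinct = shared_and_distinct_fields(book, grinder)
```

Here `shared` holds the fields every document carries, while `distinct` holds the fields that make the two shapes differ, which is what polymorphism permits within one collection.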
Example: modelling a blog
… 5 tables become 1 or 2 collections
Example: Modelling a Social Network
Differences: Relational/Tabular vs Document

                           | Tabular                      | MongoDB
Steps to create the model  | 1 – define schema            | 1 – identify the queries
                           | 2 – develop app and queries  | 2 – define schema
Initial schema             | 3rd normal form              | many solutions possible
                           | one solution                 |
Final schema               | likely denormalized          | few changes
Schema evolution           | difficult and not optimal    | easy and no downtime
                           | likely downtime              |
Performance                | mediocre                     | optimized
Other Considerations for the Model
1. one-to-many relationships where "many" is a humongous number
2. Embed or Reference
   • Joins via $lookup
   • Transactions for multi document writes
3. Transactions available for Replica set, and soon for Sharded Clusters
4. Sharding Key
5. Indexes
6. Simple queries, or more complex ones with the Aggregation Framework
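As an illustration of "Joins via $lookup", the sketch below builds an aggregation pipeline as plain Python data. The collection name "shelves", the field "store_id", and the function name are assumptions made for the example, not part of the presentation. Only the `$match` and `$lookup` stage shapes come from MongoDB itself.

```python
def shelves_for_store_pipeline(store_id):
    """Build a pipeline that joins one store with its shelf documents.

    Assumes a hypothetical "shelves" collection whose documents carry a
    "store_id" field pointing back at the store's "_id".
    """
    return [
        # Keep only the store we are interested in.
        {"$match": {"_id": store_id}},
        # Left outer join: matching shelves land in the "shelves" array.
        {
            "$lookup": {
                "from": "shelves",
                "localField": "_id",
                "foreignField": "store_id",
                "as": "shelves",
            }
        },
    ]


pipeline = shelves_for_store_pipeline("store-1")
```

With a driver such as PyMongo, a list like this would typically be passed to the `aggregate` method of the stores collection.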
Flexible Modelling Methodology for MongoDB
Methodology
1. Describe the Workload
2. Identify and Model the Relationships
3. Apply Patterns
Flexible Methodology
Case Study: אספרסו ארומטי (Aromatic Espresso)
A. Business: coffee shop franchises
B. Name: Cuppa Coffee
   • also considered: Coffee Mate, Crocodile Coffee
C. Objective:
   • 10 000 stores in Israel, Kazakhstan, Romania, Ukraine ...
   • … then we invade America
D. Keys to success:
   • Best coffee in the world
   • Technology
Make the Best Coffee in the World
23g of ground coffee in, 20g of extracted coffee out, in approximately 20 seconds
1. Fill a small or regular cup with 80% hot water (not boiling but pretty hot). Your cup should be 150ml to 200ml in total volume, 80% of which will be hot water.
2. Grind 23g of coffee into your portafilter using the double basket. We use a scale.
3. Draw 20g of coffee over the hot water: place your cup on a scale, press tare and extract your shot.
Technology
1. Measure inventory in real time
   • Shelves with scales
2. Big Data collection on cups of coffee
   • weighings, temperature, time to produce, …
3. Data Analysis
   • Coffee perfection
   • Rush hours -> staffing needs
4. MongoDB
1 – Workload: List Queries

Query                            | Operation | Description
1. Coffee weight on the shelves  | write     | A shelf sends information when coffee bags are added or removed
2. Coffee to deliver to stores   | read      | How much coffee we have to ship to the store in the next few days
3. Anomalies in the inventory    | read      | Analytics
4. Making a cup of coffee        | write     | A coffee machine reports on the production of a cup of coffee
5. Analysis of cups of coffee    | read      | Analytics
6. Technical Support             | read      | Helping our franchisees
1 – Workload: quantify/qualify

Query                            | Quantification                           | Qualification
1. Coffee weight on the shelves  | 10/day*shelf*store => 1/sec              | <1s, critical write
2. Coffee to deliver to stores   | 1/day*store => 0.1/sec                   | <60s
3. Anomalies in the inventory    | 24 reads/day                             | <5mins, "collection scan"
4. Making a cup of coffee        | 10 000 000 writes/day => 115 writes/sec  | <100ms, non-critical write
   … cups of coffee at rush hour | 3 000 000 writes/hr => 833 writes/sec    | <100ms, non-critical write
5. Analysis of cups of coffee    | 24 reads/day                             | stale data is fine, "collection scan"
6. Technical Support             | 1000 reads/day                           | <1s
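The per-second rates on this slide can be rechecked with a few lines of arithmetic. This is only a sketch: the store count comes from the case study, while one shelf per store and 1000 cups per store per day are assumptions chosen to match the slide's totals.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60


def per_second(count, period_seconds):
    """Convert a count over a period into an average rate per second."""
    return count / period_seconds


stores = 10_000                   # from the case study objective
shelves_per_store = 1             # assumption: the slide does not give this number
cups_per_store_per_day = 1_000    # assumption: matches 10 000 000 writes/day overall

# Query 1: 10 weighings per day, per shelf, per store.
shelf_writes = per_second(10 * shelves_per_store * stores, SECONDS_PER_DAY)
# Query 2: one delivery calculation per day, per store.
delivery_reads = per_second(1 * stores, SECONDS_PER_DAY)
# Query 4: cups of coffee, averaged over the day and at rush hour.
avg_cup_writes = per_second(stores * cups_per_store_per_day, SECONDS_PER_DAY)
rush_cup_writes = per_second(3_000_000, SECONDS_PER_HOUR)
```

The gap between the daily average and the rush-hour rate is the reason to size the system for the peak rather than for the mean.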
Disk Space
Cups of coffee (one year of data)
• 10000 x 1000/day x 365
• 3.7 billion/year
• 370 GB (100 bytes/cup of coffee)
Weighings
• 10000 x 10/day x 365
• 36.5 million/year
• 3.7 GB (100 bytes/weighing)
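The sizing can be reproduced as a short calculation. A sketch under the slide's own inputs (10000 stores, 100 bytes per document, 365 days); the function name and the use of decimal gigabytes are choices made for this example.

```python
BYTES_PER_GB = 10**9  # decimal gigabytes, an assumption for this estimate


def yearly_volume(stores, events_per_store_per_day, bytes_per_event, days=365):
    """Return (number of documents per year, raw size in GB)."""
    events = stores * events_per_store_per_day * days
    size_gb = events * bytes_per_event / BYTES_PER_GB
    return events, size_gb


# Cups of coffee: 10000 stores x 1000 cups/day.
cups, cups_gb = yearly_volume(10_000, 1_000, 100)
# Weighings: 10000 stores x 10 weighings/day.
weighings, weighings_gb = yearly_volume(10_000, 10, 100)
```

Run this way, the cups come to about 3.65 billion documents and 365 GB per year, and the weighings to 36.5 million documents and about 3.65 GB, before indexes, compression, or replication are taken into account.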
2 - Relations are still important

Type of Relation ->                        | one-to-one/1-1  | one-to-many/1-N | many-to-many/N-N
Document embedded in the parent document   | • one read      | • one read      | • one read
                                           | • no joins      | • no joins      | • no joins
                                           |                 |                 | • duplication of information
Document referenced in the parent document | • smaller reads | • smaller reads | • smaller reads
                                           | • many reads    | • many reads    | • many reads
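The trade-off in this table can be made concrete with in-memory stand-ins for collections. This is a sketch only: the dictionaries below imitate documents, the read counter imitates round trips to the database, and all names and values are hypothetical.

```python
# Embedded: the store document carries its shelves (one-to-many, embedded).
store_embedded = {
    "_id": "store-1",
    "city": "Tel Aviv",
    "shelves": [
        {"shelf_id": "s1", "coffee_kg": 12.5},
        {"shelf_id": "s2", "coffee_kg": 8.0},
    ],
}

# Referenced: shelves live in their own "collection"; the store keeps only ids.
stores = {"store-1": {"_id": "store-1", "city": "Tel Aviv", "shelf_ids": ["s1", "s2"]}}
shelves = {
    "s1": {"_id": "s1", "coffee_kg": 12.5},
    "s2": {"_id": "s2", "coffee_kg": 8.0},
}


def total_coffee_embedded(store):
    """One read returns everything; no join is needed."""
    reads = 1
    total = sum(shelf["coffee_kg"] for shelf in store["shelves"])
    return total, reads


def total_coffee_referenced(store_id):
    """One small read for the store, then one more per referenced shelf."""
    store = stores[store_id]
    reads = 1
    total = 0.0
    for shelf_id in store["shelf_ids"]:
        total += shelves[shelf_id]["coffee_kg"]
        reads += 1
    return total, reads


embedded_result = total_coffee_embedded(store_embedded)
referenced_result = total_coffee_referenced("store-1")
```

Both variants return the same total, but the embedded one needs a single larger read while the referenced one needs several smaller reads.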
2 - Entities for אספרסו ארומטי (Aromatic Espresso)
- Coffee cups
- Stores
- Coffee machines
- Shelves
- Weighings
- Coffee bags
Schema Design Patterns
Schema Design Patterns Resources
A. Advanced Schema Design Patterns
   • MongoDB World 2017
   • Webinar
B. MongoDB University
   • university.mongodb.com
   • M320 – Data Modeling (2019)
C. Blogs on Schema Design Patterns
   • https://www.mongodb.com/blog/post/building-with-patterns-a-summary
Schema Versioning
Computed Pattern
Subset Pattern
Bucket Pattern
Bucket Pattern

Bucket per Day

{
  "device_id": 000123456,
  "type": "2A",
  "date": ISODate("2018-03-02"),
  "temp": [ [ 20.0, 20.1, 20.2, ... ], [ 22.1, 22.1, 22.0, ... ], ... ]
}
{
  "device_id": 000123456,
  "type": "2A",
  "date": ISODate("2018-03-03"),
  "temp": [ [ 20.1, 20.2, 20.3, ... ], [ 22.4, 22.4, 22.3, ... ], ... ]
}

Bucket per Hour

{
  "device_id": 000123456,
  "type": "2A",
  "date": ISODate("2018-03-02T13"),
  "temp": { 1: 20.0, 2: 20.1, 3: 20.2, ... }
}
{
  "device_id": 000123456,
  "type": "2A",
  "date": ISODate("2018-03-02T14"),
  "temp": { 1: 22.1, 2: 22.1, 3: 22.0, ... }
}
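One way to picture how individual readings end up in hourly buckets is the in-memory sketch below. It is not how the presentation builds its buckets: the input shape (`ts`, `temp`), the use of the minute as the key inside `temp`, and storing `device_id` as a string are all assumptions made for the example.

```python
from datetime import datetime


def bucket_per_hour(readings):
    """Group single readings into one document per device and hour."""
    buckets = {}
    for reading in readings:
        # Truncate the timestamp to the start of its hour.
        hour = reading["ts"].replace(minute=0, second=0, microsecond=0)
        key = (reading["device_id"], hour)
        bucket = buckets.setdefault(
            key,
            {
                "device_id": reading["device_id"],
                "type": reading["type"],
                "date": hour,
                "temp": {},
            },
        )
        # Key each value by its minute within the hour (keys kept as strings).
        bucket["temp"][str(reading["ts"].minute)] = reading["temp"]
    return list(buckets.values())


readings = [
    {"device_id": "000123456", "type": "2A", "ts": datetime(2018, 3, 2, 13, 0), "temp": 20.0},
    {"device_id": "000123456", "type": "2A", "ts": datetime(2018, 3, 2, 13, 1), "temp": 20.1},
    {"device_id": "000123456", "type": "2A", "ts": datetime(2018, 3, 2, 14, 0), "temp": 22.1},
]
hourly = bucket_per_hour(readings)
```

Three readings collapse into two documents here, which is the point of the pattern: fewer, larger documents and fewer index entries than one document per measurement.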
External Reference Pattern
Cuppa Coffee - Solution with Patterns
• Schema Versioning
• Subset
• Computed
• Bucket
• External Reference
Conclusion
Takeaways from the Presentation
• Recognize the differences when modelling for a Document Database versus a Relational Database
• Summarize the steps of a methodology when modelling for MongoDB
   • Workload
   • Relationships
   • Patterns
• Recognize the need for Schema Design Patterns and when to apply them
Coming Soon … • "Data Modelling" course at: university.mongodb.com
Norberto Leite Lead Engineer norberto@mongodb.com @nleite