Aggregation Framework MongoDB Days Munich

#mongodbdays @nleite #aggfwk Aggregation Framework Norberto Leite Senior Solutions Architect, MongoDB

Agenda • What is the Aggregation Framework? • The Aggregation Pipeline • Usage and Limitations • Aggregation and Sharding • Summary

What is the Aggregation Framework?

Aggregation in Nutshell • We're storing our data in MongoDB • Our applications need to run ad-hoc queries for grouping, summarizations, reporting, etc. • We must have a way to reshape data easily to support these access patterns • You can use Aggregation Framework for this!

MapReduce is great, but… • Extremely versatile, powerful • Overkill for simple aggregation tasks • Averages • Summation • Grouping • Reshaping • High level of complexity • Difficult to program and debug

Aggregation Framework • Executes in native code – Written in C++ – JSON parameters • Plays nice with sharding • Flexible, functional, and simple – Operation pipeline – Computational expressions

What is an Aggregation Pipeline? • A Series of Document Transformations – Executed in stages – Original input is a collection – Output as a cursor or a collection $match $project $group $sort • Rich Library of Functions – Filter, compute, group, and summarize data – Output of one stage sent to input of next – Operations executed in sequential order

Pipeline Operators • $sort • Order documents • $limit / $skip • Paginate documents • $redact • Restrict documents • $geoNear • Proximity sort documents • $let, $map • Define variables • $match • Filter documents • $project • Reshape documents • $group • Summarize documents • $unwind • Expand documents

Our Example Data { "_id" : ObjectId("54523d2d25784427c6fabce1"), "From" : "norberto@mongodb.com", "To" : "mongodb-user@googlegroups.com", "Date" : ISODate("2012-08-15T22:32:34Z"), "body" : { "text/plain" : ”Hello Munich, nice to see yalll!" }, "Subject" : ”Live From MongoDB World" }

$match • Filter documents – Uses existing query syntax – No $where (server side Javascript)

Matching Field Values { subject: "Hello There", words: 218, from: "norberto@mongodb.com" } { $match: { from: "hipster@somemail.com" }} { subject: "I love Hofbrauhaus", words: 90, from: "norberto@mongodb.com" } { subject: "MongoDB Rules!", words: 100, from: "hipster@somemail.com" } { subject: "MongoDB Rules!", words: 100, from: "hipster@somemail.com" }

Matching with Query Operators { subject: "Hello There", words: 218, from: "norberto@mongodb.com" } { $match: { words: {$gt: 100} }} { subject: "I love Hofbrauhaus", words: 90, from: "norberto@mongodb.com" } { subject: "MongoDB Rules!", words: 100, from: "hipster@somemail.com" } { subject: "MongoDB Rules!", words: 100, from: "hipster@somemail.com" } { subject: "Hello There", words: 218, from: "norberto@mongodb.com" }

$project • Reshape Documents – Include, exclude or rename fields – Inject computed fields – Create sub-document fields

Including and Excluding Fields { _id: 12345, subject: "Hello There", words: 218, from:"norberto@mongodb.com" to: [ "marc@mongodb.com", "sam@mongodb.com" ], account: "mongodb mail", date: ISODate("2012-08-05"), replies: 3, folder: "Inbox", ... } { $project: { _id: 0, subject: 1, from: 1 }} { subject: "Hello There", from:"norberto@mongodb.com" }

Renaming and Computing Fields { $project: { spamIndex: { $mul: ["$words", "$replies"] }, user: "$from" }} { _id: 12345, spamIndex: 72.6666 , user: "norberto@mongodb.com" } { _id: 12345, subject: "Hello There", words: 218, from:"norberto@mongodb.com" to: [ "marc@mongodb.com", "sam@mongodb.com" ], account: "mongodb mail", date: ISODate("2012-08-05"), replies: 3, folder: "Inbox", ... }

Creating Sub-Document Fields { $project: { subject: 1, stats: { replies: "$replies", from: "$from", date: "$date" }}} { _id: 375, subject: "Hello There", stats: { replies: 3, from: "norberto@mongodb.com", date: ISODate("2012-08-05") }} { _id: 12345, subject: "Hello There", words: 218, from:"norberto@mongodb.com" to: [ "marc@mongodb.com", "sam@mongodb.com" ], account: "mongodb mail", date: ISODate("2012-08-05"), replies: 3, folder: "Inbox", ... }

$group • Group documents by value – Field reference, object, constant – Other output fields are computed • $max, $min, $avg, $sum • $addToSet, $push • $first, $last – Processes all data in memory by default

Calculating An Average { $group: { _id: "$from", avgWords: { $avg: "$words" } }} { _id: "norberto@mongodb.com", avgPages: 154 } { _id: "hipster@somemail.com", avgPages: 100 } { subject: "Hello There", words: 218, from: "norberto@mongodb.com" } { subject: "I love Hofbrauhaus", words: 90, from: "norberto@mongodb.com" } { subject: "MongoDB Rules!", words: 100, from: "hipster@somemail.com" }

Summing Fields and Counting { $group: { _id: "$from", words: { $sum: "$words" }, mails: { $sum: 1 } }} { _id: "norberto@mongodb.com", words: 308, mails: 2 } { _id: "hipster@somemail.com", words: 100, mails: 1 } { subject: "Hello There", words: 218, from: "norberto@mongodb.com" } { subject: "I love Hofbrauhaus", words: 90, from: "norberto@mongodb.com" } { subject: "MongoDB Rules!", words: 100, from: "hipster@somemail.com" }

$unwind • Operate on an array field – Create documents from array elements • Array replaced by element value • Missing/empty fields → no output • Non-array fields → error – Pipe to $group to aggregate

Collecting Distinct Values { $unwind: "$to" } { subject: "2.8 will be great!", to: "marc@mongodb.com", account : "mongodb mail” } { _id: 2222, subject: "2.8 will be great!", to: [ "marc@mongodb.com", "eliot@mongodb.com", "asya@mongodb.com", ], account: "mongodb mail" } { subject: "2.8 will be great!", to: "eliot@mongodb.com", account : "mongodb mail” } { subject: "2.8 will be great!", to: "asya@mongodb.com", account : "mongodb mail” }

$sort, $limit, $skip • Sort documents by one or more fields – Same order syntax as cursors – Waits for earlier pipeline operator to return – In-memory unless early and indexed • Limit and skip follow cursor behavior

$redact • Restrict access to Documents – Use document fields to define privileges – Apply conditional queries to validate users • Field Level Access Control – $$DESCEND, $$PRUNE, $$KEEP – Applies to root and subdocument fields

$redact Example Data { _id: 375, item: "Sony XBR55X900A 55Inch 4K Ultra High Definition TV", Manufacturer: "Sony", security: 0, quantity: 12, list: 4999, pricing: { security: 1, sale: 2698, wholesale: { security: 2, amount: 2300 } } }

Query by Security Level security = 0 db.catalog.aggregate([ { $match: {item: /^.*XBR55X900A*/} }, { $redact: { $cond: { if: { $lte: [ "$security", ?? ] }, then: "$$DESCEND", else: "$$PRUNE" } } }]) { "_id" : 375, "item" : "Sony XBR55X900A 55Inch 4K Ultra High Definition TV", "Manufacturer" : "Sony”, "security" : 0, "quantity" : 12, "list" : 4999 } { "_id" : 375, "item" : "Sony XBR55X900A 55Inch 4K Ultra High Definition TV", "Manufacturer" : "Sony", "security" : 0, "quantity" : 12, "list" : 4999, "pricing" : { "security" : 1, "sale" : 2698, "wholesale" : { "security" : 2, "amount" : 2300 } } } security = 2

$geoNear • Order/Filter Documents by Location – Requires a geospatial index – Output includes physical distance – Must be first aggregation stage

$geonear Example Data { "_id" : 35089, "city" : “Sony”, "loc" : [ -86.048397, 32.979068 ], "pop" : 1584, "state" : "AL” }

Query by Proximity db.catalog.aggregate([ { $geoNear : { near: [ -86.000, 33.000 ], distanceField: "dist", maxDistance: .050, spherical: true, num: 3 } }]) { "_id" : "35089", "city" : "KELLYTON", "loc" : [ -86.048397, 32.979068 ], "pop" : 1584, "state" : "AL", "dist" : 0.0007971432165364155 }, { "_id" : "35010", "city" : "NEW SITE", "loc" : [ -85.951086, 32.941445 ], "pop" : 19942, "state" : "AL", "dist" : 0.0012479615347306806 }, { "_id" : "35072", "city" : "GOODWATER", "loc" : [ -86.078149, 33.074642 ], "pop" : 3813, "state" : "AL", "dist" : 0.0017333719627032555 }

Usage • collection.aggregate([…], {<options>}) – Returns a cursor – Takes an optional document to specify aggregation options • allowDiskUse, explain – Use $out to send results to a Collection • db.runCommand({aggregate:<collection>, pipeline:[…]}) – Returns a document, limited to 16 MB

Collection db.books.aggregate([ { $project: { language: 1 }}, { $group: { _id: "$language", numTitles: { $sum: 1 }}} ]) { _id: "Russian", numTitles: 1 }, { _id: "English", numTitles: 2 }

Database Command db.runCommand({ aggregate: "books", pipeline: [ { $project: { language: 1 }}, 1 } {} }$ group: { _id: "$language", numTitles: { $sum: ] }) { result : [ { _id: "Russian", numTitles: 1 }, { _id: "English", numTitles: 2 } ], “ok” : 1 }

Limitations • Pipeline operator memory limits – Stages limited to 100 MB – Use “allowDiskUse” option to use disk for larger data sets • Some BSON types unsupported – Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope

Sharding Result mongos Shard 1 (Primary) $match, $project, $group Shard 2 $match, $project, $group Shard 3 excluded Shard 4 $match, $project, $group • Workload split between shards – Shards execute pipeline up to a point – Primary shard merges cursors and continues processing* – Use explain to analyze pipeline split – Early $match may excuse shards – Potential CPU and memory implications for primary shard host * Prior to v2.6 second stage pipeline processing was done by mongos

Framework Use Cases • Basic aggregation queries • Ad-hoc reporting • Real-time analytics • Visualizing and reshaping data

Extending the Framework • Adding new pipeline operators, expressions • $out and $tee for output control – https://jira.mongodb.org/browse/SERVER-3253

Future Enhancements • Automatically move $match earlier if possible • Pipeline explain facility • Memory usage improvements – Grouping input sorted by _id – Sorting with limited output

Enabling Developers • Doing more within MongoDB, faster • Refactoring MapReduce and groupings – Replace pages of JavaScript – Longer aggregation pipelines • Quick aggregations from the shell

#mongodbdays #aggfwk #devs @mongodb Obrigado! Norberto Leite SA | Eng – norberto@mongodb.com

Aggregation Framework MongoDB Days Munich

More Related Content

What's hot

Viewers also liked

Similar to Aggregation Framework MongoDB Days Munich

More from Norberto Leite

Recently uploaded

Aggregation Framework MongoDB Days Munich