MongoDB Queries and Aggregation Valeri Karpov Kernel Tools Engineer, MongoDB www.thecodebarbarian.com github.com/vkarpov15 @code_barbarian
Introducing an Awesome Data Set •Scraped basketball-reference.com •Mad props to NPM module Cheerio •Box scores for all 31,686 NBA games since 1985 •Download: http://bit.ly/1jlgs9u via S3 •Untar and run mongorestore *
Data Set Structure •Contains final score •Contains box score for teams and players *
Data Set Structure - High Level •Contains _id, date •Info on winning team and losing team *
Data Set Structure - Box •Box score contains detailed stats by team *
Data Set Structure - Box •And also for individual players: *
Queries and Aggregation •MongoDB has a rich query framework •Aggregation framework is like SQL’s group by *
Query Basics - findOne() •When was Kobe Bryant’s 81 point game? *
Query Basics - find() •Which teams have lost despite scoring more than 150 points? *
Query Basics - count() •How many games did the Lakers win in the 19992000 season? *
Query Basics - distinct() •Which teams have lost a game despite having a player make at least 10 3 pointers? *
Query Basics - $elemMatch operator •When did Michael Jordan score 60 points in a losing effort? *
Query Basics - $elemMatch operator *
Query Basics - .sort() and .limit() •What are the 5 highest point totals for a losing team? *
Query Basics - .sort() and .limit() •What are the 5 highest point totals for a losing team? *
Aggregation •Similar to SQL group by •Filters and transforms data in pipeline stages •Stages are chainable •Accessible via the .aggregate() function in shell *
Aggregation - Lakers Season PPG •How many points did the Lakers average in games they won in the 2008-2009 season? *
Aggregation - Lakers Season PPG •How many points did the Lakers average in games they won in the 2008-2009 season? *
Aggregation - $sort and $limit •Compute the teams with the 5 best records in the 1999-2000 season *
Aggregation - $sort and $limit *
Aggregation - $sort and $limit *
Aggregation - $unwind •Random statistic: player with highest scoring average in games their team lost *
Aggregation - $unwind •Random statistic: player with highest scoring average in games their team lost *
Aggregation - Fun With Steals •How often does a team win when they record more steals than the other team? *
Aggregation - Fun With Steals *
Aggregation - Fun With Steals *
Thanks for Listening! Slides on Twitter, @code_barbarian *

MongoDB: Queries and Aggregation Framework with NBA Game Data