rogue: a scala dsl for mongodb CS442 - 5/24/2011 Jorge Ortiz (@jorgeortiz85)
what is foursquare?
  location-based social network: “check-in” to bars, restaurants, museums, parks, etc.
  friend-finder (where are my friends right now?)
  virtual game (badges, points, mayorships)
  city guide (local, personalized recommendations)
  location diary + stats engine (where was I a year ago?)
  specials (get rewards at your favorite restaurant)
foursquare: the numbers
  >9M users
  ~3M checkins/day
  >15M venues
  >300k merchants
  >60 employees
foursquare: the tech
  Nginx, HAProxy
  Scala, Lift
  MongoDB, PostgreSQL (legacy)
  (Kestrel, Munin, Ganglia, Python, Memcache, ...)
  All on EC2
what is mongodb?
  fast, schema-less document store
  indexes & rich queries on any attribute
  sharding, auto-balancing
  replication
  geo-indexes
mongodb: our numbers
  8 clusters
  some sharded, some not
  some master/slave, some replica set
  ~40 machines (68.4GB, m2.4xl on EC2)
  2.3 billion records
  ~15k QPS
mongodb: bson
  json++
  binary wire format
  more built-in types: objectid date regex bina
mongodb: bson example
  {
    "_id" : { "$oid" : "4ddbd194686148110d5c1ccc" },
    "venuename" : "Starbucks",
    "mayorid" : 464,
    "tags" : ["coffee", "wifi", "snacks"],
    "latlng" : [39.0, -74.0]
  }
mongodb: query example
  {
    "mayorid" : { "$lte" : 100 },
    "venuename" : { "$eq" : "Starbucks" },
    "tags" : { "$contains" : "wifi" },
    "latlng" : { "$near" : [39.0, -74.0] }
  }
  { "_id" : -1 }
mongodb: query example
  val query = (BasicDBObjectBuilder
    .start
    .push("mayorid")
      .add("$lte", 100)
    .pop
    .push("venuename")
      .add("$eq", "Starbucks")
    .pop
    .get)
rogue: a scala dsl for mongo
  type-safe
  all mongo query features
  logging & validation hooks
  pagination
  index-aware
  cursors
  http://github.com/foursquare/rogue
rogue: schema example
  class Venue extends MongoRecord[Venue] {
    object _id extends ObjectIdField(this)
    object venuename extends StringField(this)
    object mayorid extends LongField(this)
    object tags extends ListField[String](this)
    object latlng extends LatLngField(this)
  }
rogue: code example
  val vs: List[Venue] =
    (Venue where (_.mayorid <= 100)
           and (_.venuename eqs "Starbucks")
           and (_.tags contains "wifi")
           and (_.latlng near (39.0, -74.0, Degrees(0.2)))
           orderDesc (_._id)
           fetch (5))
rogue: BaseQuery
  class BaseQuery[M <: MongoRecord[M], R, Ord, Sel, Lim, Sk](...) {
    def where[F](clause: M => QueryClause[F]): ...
    ...
  }
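A minimal sketch of how a `where` that takes a function from the record to a clause might accumulate clauses. This is not Rogue's implementation: `WhereSketch`, the simplified `Field`, `QueryClause`, and `Query` types, and the `"eqs"` operator label are all hypothetical stand-ins, and the `Venue` object here is a plain object rather than a Lift `MongoRecord`.

```scala
object WhereSketch {
  // Hypothetical, simplified clause: field name, operator label, value.
  final case class QueryClause[+F](field: String, op: String, value: F)

  // Hypothetical typed field exposing a single query operator.
  final class Field[F](val name: String) {
    def eqs(v: F): QueryClause[F] = QueryClause(name, "eqs", v)
  }

  // Stand-in for a record's schema (not Lift's MongoRecord).
  object Venue {
    val venuename = new Field[String]("venuename")
    val mayorid = new Field[Long]("mayorid")
  }

  // Each where/and call applies the function to the schema and appends the clause.
  final case class Query[M](meta: M, clauses: List[QueryClause[Any]]) {
    def where[F](clause: M => QueryClause[F]): Query[M] =
      copy(clauses = clauses :+ clause(meta))
    def and[F](clause: M => QueryClause[F]): Query[M] = where(clause)
  }

  val q = Query(Venue, Nil).where(_.venuename eqs "Starbucks").and(_.mayorid eqs 464L)
  // Venue.mayorid eqs "Starbucks"  // does not compile: a String is not a Long
}
```

Because the clause is built by calling a method on a typed field, a misspelled field name or a value of the wrong type is rejected by the compiler instead of surfacing when the query runs.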
rogue: QueryField
  class QueryField[V, M <: MongoRecord[M]](val field: Field[V, M]) {
    def eqs(v: V) = new EqClause(...)
    def neqs(v: V) = ...
    def in[L <% List[V]](vs: L) = ...
    def nin[L <% List[V]](vs: L) = ...
    def exists(b: Boolean) = ...
  }
rogue: implicits
  Field[F, M] => QueryField[F, M]
  Field[LatLong, M] => GeoQueryField[M]
  Field[List[F], M] => ListQueryField[F, M]
  Field[String, M] => StringQueryField[M]
  Field[Int, M] => NumericQueryField[Int, M]
  Field[Double, M] => NumericQueryField[Double, M]
  ...
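A rough sketch of the idea behind these conversions: an implicit conversion chosen by the field's value type decides which operators are available. The names `ImplicitsSketch`, `Clause`, and the one-parameter `Field[V]` are assumptions made for illustration (the real types also carry the record type `M`), and the operator strings are plain labels, not MongoDB wire operators.

```scala
import scala.language.implicitConversions

object ImplicitsSketch {
  // Hypothetical stand-ins for the record field and clause types.
  final case class Field[V](name: String)
  final case class Clause(field: String, op: String, value: Any)

  // Operators available on every field.
  class QueryField[V](field: Field[V]) {
    def eqs(v: V): Clause = Clause(field.name, "eqs", v)
    def neqs(v: V): Clause = Clause(field.name, "neqs", v)
  }

  // Extra operator that only makes sense for list-valued fields.
  class ListQueryField[V](field: Field[List[V]]) {
    def contains(v: V): Clause = Clause(field.name, "contains", v)
  }

  implicit def fieldToQueryField[V](f: Field[V]): QueryField[V] =
    new QueryField(f)
  implicit def listFieldToListQueryField[V](f: Field[List[V]]): ListQueryField[V] =
    new ListQueryField(f)

  val venuename = Field[String]("venuename")
  val tags = Field[List[String]]("tags")

  val byName: Clause = venuename eqs "Starbucks" // via fieldToQueryField
  val byTag: Clause = tags contains "wifi"       // via listFieldToListQueryField
  // venuename contains "wifi"  // does not compile: no `contains` for a String field
  // venuename eqs 42           // does not compile: an Int is not a String
}
```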
rogue: select
  val vs: List[(Long, Int)] =
    (Venue where (_.venuename eqs "Starbucks")
           select (_.mayorid, _.mayorCheckins)
           fetch ())
rogue: select
  class BaseQuery[M <: MongoRecord[M], R, Ord, Sel, Lim, Sk](...) {
    def select[F1, F2](
        f1: M => SelectField[F1, M],
        f2: M => SelectField[F2, M]): BaseQuery[M, (F1, F2), ...] = ...
    ...
  }
rogue: select problems
  val vs: List[???] =
    (Venue where (_.venuename eqs "Starbucks")
           select (_.mayorid, _.mayorCheckins)
           select (_.venuename)
           fetch ())
rogue: order problems
  val vs: List[Venue] =
    (Venue where (_.venuename eqs "Starbucks")
           orderDesc (_._id)
           orderAsc (_.mayorid) // (???)
           fetch ())
rogue: limit problems
  val vs: List[Venue] =
    (Venue where (_.venuename eqs "Starbucks")
           limit (5)
           limit (100) // (???)
           fetch ())
rogue: phantom types
  Sel =:= Selected, Unselected
  Ord =:= Ordered, Unordered
  Lim =:= Limited, Unlimited
  Sk =:= Skipped, Unskipped
rogue: select phantom
  class BaseQuery[M <: MongoRecord[M], R, Ord, Sel, Lim, Sk](...) {
    def select[F1, F2](
        f1: M => SelectField[F1, M],
        f2: M => SelectField[F2, M])
        (implicit ev: Sel =:= Unselected): BaseQuery[M, (F1, F2), Ord, Selected, ...] = ...
    ...
  }
rogue: select problems
  val vs: List[???] =
    (Venue where (_.venuename eqs "Starbucks")
           select (_.mayorid, _.mayorCheckins)
           select (_.venuename)
           fetch ()) // won’t compile!
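The phantom-type trick can be tried in isolation. The sketch below is a simplified, self-contained version under stated assumptions: `PhantomSketch` and its `Query` class are hypothetical, fields are named by plain strings instead of record fields, and only the `Sel` and `Lim` markers are modeled. What it shares with the slides is the mechanism: an implicit `=:=` evidence parameter that exists only while the query is still in the required state.

```scala
object PhantomSketch {
  // Phantom types: never instantiated, used only as compile-time markers.
  sealed trait Unselected
  sealed trait Selected
  sealed trait Unlimited
  sealed trait Limited

  // Hypothetical, simplified query. R is the result type; Sel and Lim track state.
  final case class Query[R, Sel, Lim](fields: List[String], lim: Option[Int]) {
    // Callable only while Sel is still Unselected; the result is marked Selected.
    def select[F1, F2](f1: String, f2: String)(
        implicit ev: Sel =:= Unselected): Query[(F1, F2), Selected, Lim] =
      Query[(F1, F2), Selected, Lim](List(f1, f2), lim)

    // Callable only while Lim is still Unlimited; the result is marked Limited.
    def limit(n: Int)(implicit ev: Lim =:= Unlimited): Query[R, Sel, Limited] =
      Query[R, Sel, Limited](fields, Some(n))
  }

  val base = Query[Map[String, Any], Unselected, Unlimited](Nil, None)

  val ok: Query[(Long, Int), Selected, Limited] =
    base.select[Long, Int]("mayorid", "mayorCheckins").limit(5)

  // ok.limit(100)                   // does not compile: no evidence that Limited =:= Unlimited
  // ok.select[Long, Int]("a", "b")  // does not compile: no evidence that Selected =:= Unselected
}
```

The markers cost nothing at runtime: no value of type `Selected` or `Limited` is ever created, and the evidence parameter is never used in the method body.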
rogue: logging & validation
  logging: slf4j Tracer
  validation: radius, $in size
  index checks
rogue: pagination
  val query: Query[Venue] = ...
  val vs: List[Venue] =
    (query
      .countPerPage(20)
      .setPage(5)
      .fetch())
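Pagination of this kind usually reduces to a skip and a limit. The sketch below shows that arithmetic on an in-memory list. It assumes pages are numbered from 1, which the slides do not state, and `PaginationSketch`, `Page`, and `fetchPage` are hypothetical names rather than Rogue's API.

```scala
object PaginationSketch {
  // Hypothetical page descriptor; pages are assumed to be numbered from 1.
  final case class Page(countPerPage: Int, page: Int) {
    require(countPerPage > 0 && page >= 1, "countPerPage must be positive and page at least 1")
    def skip: Int = (page - 1) * countPerPage
    def limit: Int = countPerPage
  }

  // Apply a page to an in-memory sequence standing in for a query's result set.
  def fetchPage[A](all: Seq[A], p: Page): List[A] =
    all.slice(p.skip, p.skip + p.limit).toList

  val venueIds: List[Int] = (1 to 110).toList

  // Page 5 at 20 per page skips the first 80 items and returns items 81 to 100.
  val page5: List[Int] = fetchPage(venueIds, Page(countPerPage = 20, page = 5))
}
```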
rogue: cursors
  val query: Query[Checkin] = ...
  for (checkin <- query) {
    ... f(checkin) ...
  }
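The loop syntax works for any type that defines `foreach`, because Scala rewrites `for (x <- q) body` into `q.foreach(x => body)`. A minimal sketch, assuming a hypothetical `Query` class whose cursor is simulated with an `Iterator` (a real cursor would stream documents from the database and would need closing):

```scala
object CursorSketch {
  // Hypothetical query backed by an iterator factory standing in for a database cursor.
  final class Query[A](openCursor: () => Iterator[A]) {
    // `for (x <- query) body` desugars to `query.foreach(x => body)`,
    // so defining foreach is enough to support the loop syntax.
    def foreach[U](f: A => U): Unit = {
      val cursor = openCursor()
      while (cursor.hasNext) f(cursor.next())
    }
  }

  val query = new Query[Int](() => Iterator(1, 2, 3))

  // Results are consumed one at a time; no intermediate List is built.
  var total = 0
  for (checkin <- query) { total += checkin }
}
```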
rogue: index-aware
  val vs: List[Checkin] =
    (Checkin where (_.userid eqs 646)
             and (_.venueid eqs vid)
             fetch ())
rogue: index-aware
  val vs: List[Checkin] =
    (Checkin where (_.userid eqs 646)
             and (_.venueid eqs vid) // hidden scan!
             fetch ())
rogue: index-aware
  val vs: List[Checkin] =
    (Checkin where (_.userid eqs 646) // known index scan
             (_.venueid eqs vid) // known scan
             fetch ())
rogue: future directions
  iteratees for cursors
  compile-time index checking
  select partial objects
  generate full javascript for mapreduce
we’re hiring (nyc & sf)
  http://foursquare.com/jobs
  jorge@foursquare.com
  @jorgeortiz85
