Easy to Start, Easy to Develop, Easy to Scale
MongoDB in FS
10gen, Inc.
October 2012
@dmroberts
daniel.roberts@10gen.com
Solution Architect, based in London
http://www.10gen.com/
10gen is the company behind MongoDB.
• Founded in 2007
  • Dwight Merriman, Eliot Horowitz
• $73M+ in funding
  • Flybridge, Sequoia, Union Square, NEA
• Worldwide expanding team
  • 170+ employees
  • Locations:
    • New York & CA
    • London & Dublin
    • Sydney
[Diagram: Set the direction & contribute code to MongoDB; Foster community & ecosystem; Provide MongoDB cloud services; Provide MongoDB support services]
MongoDB is a...
• Document Oriented
• High Performance
• Highly Available
• Horizontally Scalable
...Operational Datastore
Changes impacting the traditional RDBMS.
Agile Development
• Iterative & continuous
• New and emerging apps
Volume and Type of Data
• Trillions of records
• 10's of millions of queries per second
• Volume of data
• Semi-structured and unstructured data
New Architectures
• Systems scaling horizontally, not vertically
• Commodity servers
• Cloud computing
Technology stack adds significant complexity
• custom sharding
• caching
• vertical scaling
Technology stack reduces productivity
• denormalize
• remove joins
• remove transactions
What MongoDB solves
[Chart: scalability & performance plotted against depth of functionality, positioning memcached, key/value stores and RDBMS]
Complex Tables to Documents
{
  title: "MongoDB",
  contributors: [
    { name: "Eliot Horowitz", email: "eliot@10gen.com" },
    { name: "Dwight Merriman", email: "dwight@10gen.com" }
  ],
  model: { relational: false, awesome: true }
}
bsonspec.org
HiAv & Scale Out - Shards
[Diagram: reads and writes routed across three shards (shard1, shard2, shard3); each shard N holds three nodes, node_aN, node_bN and node_cN]
Key Features
• Indexes on any attribute
• Dynamic Query Language
• Aggregation Framework
• Dynamic & Flexible Schemas
• Atomic Updates to documents
• Impedance Mismatch removed
• High Availability & Failover
• Strong consistency of data
• Horizontal Scale Out
• Hadoop Integration
Use Cases
• Content Management
• Operational Intelligence
• E-Commerce
• User Data Management
• High Volume Data Feeds
• Mobile
Financial Services Use Cases
• High Volume Data Feeds
• Tick Data capture
• Risk Analytics & Reporting
• Product Catalogs & Trade Capture
• P&L Reporting
• Reference Data Management
• Portfolio Management
• Quantitative Analysis
• Automated Trading
High Volume Data Feeds
Use Case:
• Ingesting data from different feed sources - internal / external
  • Risk data, market data, order data etc.
• Any format: FIX / FpML / SWIFT or own
• Formats vary over time
  • E.g. FIX 4.2 to 5.0
• Tick data
• Aggregate data from feeds
Why MongoDB?
• High ingestion rates of data
• Flexible schema - no db change when messages change
  • E.g. a single collection can maintain multiple FIX formats
• Query:
  • Query Language / Aggregation Framework
  • Or use internal MR or Hadoop to batch process - quantitative analysis
Example, FIX to JSON:
"NewOrder-Single" : {
  "Header" : {
    "BeginString" : "FIX.4.2",
    "BodyLength" : 190,
    "MsgType" : "D",
    "HeaderFields" : {
      "SenderCompID" : "Client",
      "TargetCompID" : "TradingGateway",
      "MsgSeqNum" : 4,
      "SendingTime" : { "UTCFormat_2051-100" : "Fri Jun 01 09:36:26 BST 2012" }.....
    }
  }
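The FIX to JSON mapping on this slide can be sketched in plain JavaScript. This is a minimal illustration, not the converter used in the talk: the function name `fixToDocument`, the flat `Header`/`Body` layout and the sample message are assumptions made here. The tag numbers (8, 9, 35, 49, 56, 34, 52) are standard FIX header tags, and the sample is not a complete, checksum-valid FIX message.

```javascript
// Standard FIX field delimiter (SOH, 0x01).
const SOH = "\x01";

// Subset of standard FIX header tags mapped to their field names.
const HEADER_TAGS = new Map([
  ["8", "BeginString"],
  ["9", "BodyLength"],
  ["35", "MsgType"],
  ["49", "SenderCompID"],
  ["56", "TargetCompID"],
  ["34", "MsgSeqNum"],
  ["52", "SendingTime"],
]);

// Header fields that are stored as numbers rather than strings.
const NUMERIC_FIELDS = new Set(["BodyLength", "MsgSeqNum"]);

// Hypothetical helper: turn a raw tag=value FIX string into a nested document.
function fixToDocument(raw) {
  const header = {};
  const body = {};
  for (const field of raw.split(SOH)) {
    const eq = field.indexOf("=");
    if (eq < 0) continue; // skip empty or malformed fields
    const tag = field.slice(0, eq);
    const value = field.slice(eq + 1);
    const name = HEADER_TAGS.get(tag);
    if (name !== undefined) {
      header[name] = NUMERIC_FIELDS.has(name) ? Number(value) : value;
    } else {
      body[tag] = value; // unknown tags are kept under their raw tag number
    }
  }
  return { Header: header, Body: body };
}

// Illustrative sample only; tag 55 (Symbol) is left unmapped on purpose.
const sample =
  ["8=FIX.4.2", "9=190", "35=D", "49=Client", "56=TradingGateway", "34=4", "55=EURUSD"].join(SOH) + SOH;
const doc = fixToDocument(sample);
console.log(JSON.stringify(doc, null, 2));
```

Because unmapped tags are stored as-is, messages from a newer FIX version can land in the same collection without any change to the database, which is the point the slide makes about flexible schemas.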
Risk Analytics & Reporting
Use Case:
• Collect and aggregate risk data
• Calculate risk / exposures
• Potentially real time
Why MongoDB?
• Collect data from a single source or multiple sources
  • Different formats
• Documents used to create 'pre-aggregated' reports
  • Real time
• Aggregation Framework for reporting
  • E.g. exposure for a counterparty
• Internal MR or Hadoop connector
  • Batch process risk data
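One common way to build the 'pre-aggregated' report documents mentioned above is an upsert with the `$inc` update operator, so each incoming trade bumps running totals in a single atomic write. The sketch below only builds the update specification and runs without a database. The function name `preAggregatedUpdate`, the `_id` scheme and the field names (`total`, `hourly`, `count`) are assumptions for illustration, not taken from the talk.

```javascript
// Hypothetical helper: build the filter/update pair for one trade.
// One report document per counterparty per UTC day is assumed.
function preAggregatedUpdate(trade) {
  const day = trade.ts.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  const hour = trade.ts.getUTCHours();
  return {
    filter: { _id: `${trade.counterparty}:${day}` },
    update: {
      $inc: {
        total: trade.exposure,              // running daily exposure
        [`hourly.${hour}`]: trade.exposure, // per-hour breakdown
        count: 1,                           // number of trades seen
      },
    },
    options: { upsert: true }, // create the report document on first trade
  };
}

// Illustrative trade; counterparty name and amount are made up.
const spec = preAggregatedUpdate({
  counterparty: "ACME",
  exposure: 250000,
  ts: new Date("2012-06-01T09:36:26Z"),
});
console.log(JSON.stringify(spec, null, 2));

// In the mongo shell this would be applied along the lines of:
//   db.exposure_daily.update(spec.filter, spec.update, spec.options)
// where "exposure_daily" is a hypothetical collection name.
```

Reading a counterparty's exposure for a day then becomes a single lookup by `_id` rather than a scan over individual trades.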
Product Catalogs and Trade Capture
Use Case:
• Catalogs of complex financial products
  • 'Exotics' are difficult to model in a relational db.
  • 'On-boarding' new products in hours.
  • RDBMS less flexible for complex products that may require >~50 tables. What's the impact from technology?
• Once created, how do we capture the details of a new trade?
Why MongoDB?
• Flexible schema means we don't need to go back to the database when we have a product to sell.
• Single collection for all products... even if they vary greatly
• Trades potentially exist for long periods
  • Newer trades can have different data with no impact on the db.
Portfolio / Position Reporting
Use Case:
• Store positions or portfolio information
• Query to find current positions/portfolios
• Query by client or trader
Why MongoDB?
• Customer/client may have many different products
• Aggregation Framework to calculate values and views
• Work on extremely large data sets
• Current and historic data
Reference Data Management
Use Case:
• Globally distribute reference data across the organisation
• Manage replication across data centres
• Provide fast read access
Why MongoDB?
• Sharding and replication to distribute data
  • Access locally for high performance reads
• Fast replication of data
• Unstructured reference data easily replicated.
  • New items/formats replicated without schema migrations
[Diagram: data centres NYC, LON and HK. Row 1: Primary, Secondary, Secondary. Row 2: Secondary, Primary, Secondary.]
Quantitative Analysis / Automated Trading
Use Case:
• Real-time and historic tick data
• Strategy testing
• Automated signals
Example, BID/OFFER to Candlesticks:
{
  "_id" : ObjectId("4f4b8916fb1c80e141ea6201"),
  "ask" : 1.30028,
  "bid" : 1.3002,
  "ts" : ISODate("2012-02-16T12:48:00Z")
}
Why MongoDB?
• Aggregation Framework to shape data
  • Bid/Offer -> Candlesticks
• MR for batch processing data
  • Internal MR or Hadoop
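The Bid/Offer -> Candlesticks step can be illustrated in plain JavaScript, independent of any database. This is a sketch under stated assumptions: candles are built from the bid price only, the bucket is one minute, and the function name `toCandles` is hypothetical. The first tick uses the values from the slide; the other three are made up for the example.

```javascript
// Hypothetical helper: roll ticks up into open/high/low/close candles.
function toCandles(ticks, intervalMs = 60000) {
  // Process ticks in time order so open and close are well defined.
  const sorted = [...ticks].sort((a, b) => a.ts - b.ts);
  const candles = new Map(); // bucket start (ms) -> candle
  for (const { ts, bid } of sorted) {
    const bucket = Math.floor(ts.getTime() / intervalMs) * intervalMs;
    const candle = candles.get(bucket);
    if (candle === undefined) {
      // First tick in the bucket sets all four prices.
      candles.set(bucket, { ts: new Date(bucket), open: bid, high: bid, low: bid, close: bid });
    } else {
      candle.high = Math.max(candle.high, bid);
      candle.low = Math.min(candle.low, bid);
      candle.close = bid; // latest tick so far in this bucket
    }
  }
  return [...candles.values()];
}

const ticks = [
  { ts: new Date("2012-02-16T12:48:00Z"), bid: 1.3002, ask: 1.30028 }, // from the slide
  { ts: new Date("2012-02-16T12:48:20Z"), bid: 1.3005, ask: 1.30058 }, // illustrative
  { ts: new Date("2012-02-16T12:48:40Z"), bid: 1.3001, ask: 1.30018 }, // illustrative
  { ts: new Date("2012-02-16T12:49:05Z"), bid: 1.3004, ask: 1.30048 }, // illustrative
];
const candles = toCandles(ticks);
console.log(candles);
```

Inside MongoDB the same shape would typically come from a `$group` stage keyed on the truncated timestamp, with the open and close taken from the first and last tick of each sorted group.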
Aggregations
• Number of choices
  • MongoDB Map Reduce
  • Pre-Compute - Schema Design
  • Hadoop Connector
  • Aggregation Framework
Aggregation Framework
• Much simpler and faster than MongoDB map reduce
• Replaces common MR use cases in MongoDB
• Native operators in the MongoDB core
// Total value per position for one user.
db.portfolio.aggregate([
  { $match : { userid : "roberts123" } },
  { $group : { _id : "$position", total : { $sum : "$val" } } }
])
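To make clear what the `$match` and `$group` stages compute, here is an in-memory equivalent in plain JavaScript. It is only an illustration of the semantics, not how the server executes the pipeline. The function name `totalsByPosition` and the sample documents are assumptions, and `val` is assumed to be numeric in every document.

```javascript
// Hypothetical in-memory equivalent of:
//   $match { userid }  then  $group { _id: "$position", total: { $sum: "$val" } }
function totalsByPosition(docs, userid) {
  const totals = new Map(); // position -> running sum of val
  for (const d of docs) {
    if (d.userid !== userid) continue; // $match: keep one user's documents
    totals.set(d.position, (totals.get(d.position) || 0) + d.val); // $group + $sum
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
}

// Illustrative documents; positions and values are made up.
const portfolio = [
  { userid: "roberts123", position: "EURUSD", val: 100 },
  { userid: "roberts123", position: "GBPUSD", val: 40 },
  { userid: "someoneelse", position: "EURUSD", val: 999 },
  { userid: "roberts123", position: "EURUSD", val: 25 },
];
const totals = totalsByPosition(portfolio, "roberts123");
console.log(totals);
```

The sketch returns groups in first-seen order; the real `$group` stage makes no ordering promise, so a `$sort` stage is needed when order matters.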
Sharded MongoDB + Hadoop
[Diagram: five shards (Shard 1 to Shard 5) holding chunks labelled with letters from a to z, with Hadoop nodes reading from the shards]
Summary
• Document-Oriented
• Dynamic schema
• Agile
• Flexible
• High Performance
• Highly Available
• Horizontal Scale Out

• High Volume Data Feeds
• Tick Data capture
• Risk Analytics
• Product Catalogs & Trade Capture
• P&L Reporting
• Reference Data Management
• Portfolio Management
• Quantitative Analysis
• Automated Trading
download at mongodb.org
@dmroberts
daniel.roberts@10gen.com
Free online training - http://education.10gen.com/
www.meetup.com/London-MongoDB-User-Group/
Facebook: http://bit.ly/mongodb
Twitter: @dmroberts
LinkedIn: http://linkd.in/joinmongo
