Sunil Sayyaparaju, Citrusleaf Inc
Agenda
  • Evolution of SQL RDBMS
  • Need to break out
  • Fresh Thinking
  • Spectrum of databases
  • Future
Evolution of SQL RDBMS
  • Data management started with flat files
  • 1960s: Navigational DBMS
    • Iterate over the entire file on tape; no search
  • 1970s: Relational DBMS
    • God sent Codd
    • Then came tables, keys, normalization
    • Adopted tuple calculus as the basis for SQL
    • System R and Ingres were born
      ○ gave birth to DB2, Sybase, Informix, Oracle
  • 1980s: Object-oriented databases
  • 2000s: In-memory, XML databases
  • 2000s: Distributed shared-disk databases
Need to break out
  • More and more data continued to pour in
  • Storage costs went up
    • Offset by cheaper and larger disks
  • Speed went down
    • Offset by more powerful machines
    • Offset by several optimizations
  • Cost went up
    • Large businesses could bear it
    • But small businesses?
  • 24x7 uptime became necessary
    • Uh-oh
  • Flexibility of DB schema
    • Uh-oh
Distributed Shared-disk Model
  • Multiple machines sharing a disk
  • Data copies in cache, single copy on disk
  • Advantages
    • Could scale well in reads
    • Add/remove individual nodes
  • Hauntings
    • Write scalability, locking
    • Maintaining transaction semantics
    • Communication between nodes
    • Invalidating old replicated data on write
  • Workaround: Redesign applications
    • To exploit this model
    • Called well-partitioned applications
  • $M question: If I redesign my application, why not a totally new model?
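To make "well-partitioned" concrete, here is a minimal Python sketch, assuming hash-based partitioning (one common choice, not something the slides specify). The node names, `owner`, and `PartitionedStore` are hypothetical and exist only for illustration. Each key is owned by exactly one node, so a write never has to lock or invalidate copies on other nodes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def owner(key, nodes=NODES):
    """Map a key to exactly one node using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


class PartitionedStore:
    """Each node holds only the keys it owns, so a write touches one node."""

    def __init__(self, nodes=NODES):
        self.nodes = list(nodes)
        self.data = {n: {} for n in self.nodes}  # one dict per node

    def put(self, key, value):
        self.data[owner(key, self.nodes)][key] = value

    def get(self, key):
        return self.data[owner(key, self.nodes)].get(key)


store = PartitionedStore()
store.put("user:42", {"name": "Ada"})
print(owner("user:42"), store.get("user:42"))
```

Plain modulo hashing reshuffles most keys when a node is added or removed; real systems usually layer something like consistent hashing on top, which this sketch leaves out.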
Evils of 24x7 uptime
  • Evils:
    • s/w or h/w upgrades
    • Failures
    • Routine maintenance
    • DB schema changes
  • Workaround:
    • Replicate data and switch
    • Problem: Needs manual intervention
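As a rough illustration of "replicate data and switch" with the switch made automatic, the Python sketch below writes to every live replica and reads from the first live one. The `Replica` and `ReplicatedStore` classes are hypothetical, and the sketch assumes failures are already detected (the `up` flag); it does not describe how any particular product does this.

```python
class Replica:
    """One copy of the data; `up` stands in for a real health check."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class ReplicatedStore:
    """Writes go to every live replica; reads use the first live one."""

    def __init__(self, replicas):
        self.replicas = replicas

    def _live(self):
        live = [r for r in self.replicas if r.up]
        if not live:
            raise RuntimeError("no live replica")
        return live

    def put(self, key, value):
        for replica in self._live():
            replica.data[key] = value

    def get(self, key):
        # The "switch" happens here: a dead replica is simply skipped.
        return self._live()[0].data.get(key)


a, b = Replica("a"), Replica("b")
replicated = ReplicatedStore([a, b])
replicated.put("k", 1)
a.up = False  # simulate a failure or a maintenance window
print(replicated.get("k"))
```

The sketch deliberately omits the hard part: bringing a returning replica back in sync with the writes it missed.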
Fresh Thinking
  • I want
    • 24x7 uptime without manual intervention
    • Flexibility in my database schema
    • Speed and predictability
    • Vertical and horizontal scalability
  • I don't want
    • To splurge money on software and hardware
    • Overheads unrelated to my use case
  • I can lose (most important)
    • Attitude: I know how to manage my data
      ○ Several applications already do that, e.g. SAP R/3
    • Joins, multi-record transactions
    • Complex query functionality
    • SQL altogether
Let us do some housecleaning
  Full-blown RDBMS      Cut-down RDBMS
  Query Compilation     Query Compilation
  Query Optimization    -
  Query Execution       Query Execution
  Transaction Engine    Transaction Engine
  Storage & Access      Storage & Access
Who does not want features?
                                        Formula 1 car   Sedan
  Fuel efficient?                       No              Yes
  Can it carry my family?               No              Yes
  Does it have a 6-disc audio player?   No              Yes
  Does it have airbags?                 No              Yes
  • Then why would someone buy an F1 car?
    • Because it goes amazingly fast
    • It does best what it is designed for
  Trivia: Why don't F1 cars have airbags?
Let there be NoSQL
  • Started as No-SQL
    • Some evolved into Not-Only-SQL
  • Horizontal scalability is assumed
  • Supports the latest hardware, such as SSDs
  • Different flavors of NoSQL
    • Targeted at different use cases
    • Key-value stores
    • Ordered key-value stores
    • Document stores with text search
    • Graph databases
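The difference between the first two flavors can be sketched in a few lines of Python. This is an illustrative in-memory toy, not any product's API; the class and method names are assumptions. A plain key-value store only answers "what is the value for this exact key?", while an ordered key-value store also keeps keys sorted so it can serve range scans.

```python
import bisect


class KeyValueStore:
    """Exact-key lookups only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class OrderedKeyValueStore(KeyValueStore):
    """Also keeps a sorted list of keys so range scans are possible."""

    def __init__(self):
        super().__init__()
        self._keys = []

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # keep keys sorted on insert
        super().put(key, value)

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


kv = OrderedKeyValueStore()
for k in ["user:3", "user:1", "order:7", "user:2"]:
    kv.put(k, k.upper())
print(kv.get("user:2"))
print(kv.scan("user:", "user:~"))  # every key that starts with "user:"
```

The price of ordering is paid on writes: each new key costs a sorted insert, which is one reason the two flavors target different use cases.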
Spectrum of Databases
  [Figure: products plotted by query model (NoSQL, SQL/NoSQL, SQL) against distributedness (Monolithic, Distributed Shared-disk, Distributed Shared-nothing). Products shown: Lotus Notes, Citrusleaf, ObjectDB, Mongo, Versant, Cassandra, Zope, Redis, MySQL NDB, Oracle, Oracle RAC, HP NonStop, DB2, Sybase SDC, VoltDB, MS-SQL, IBM PureScale, Sybase ASE, ScaleDB, MySQL.]
NoSQL Datamodels
Future: Fortunate/Unfortunate?
  [Figure: products plotted by query model (NoSQL, SQL/NoSQL, SQL) against distributedness (Monolithic, Distributed Shared-disk, Distributed Shared-nothing). Products shown: Citrusleaf, Mongo, Cassandra, Redis, MySQL NDB, Oracle, DB2, MS-SQL, Sybase ASE, MySQL.]
Future: More Storage roles
  [Figure labels: Application; Hadoop Job (x4); HDFS (x2); Mongo; Citrusleaf.]
Conclusion
  • You cannot just replace SQL with NoSQL
  • You lose some features when you go to NoSQL
  • You have to put in extra effort to use NoSQL
  • Make sure that NoSQL is not as fat as SQL
  • NoSQL solves a specific subset of problems, but solves them well
  • NoSQL is lean and mean
  • NoSQL is designed to be highly available
  • NoSQL does not demand powerful hardware

How big data moved the needle from monolithic SQL RDBMS to distributed NoSQL
