introduction to cassandra eben hewitt september 29. 2010 web 2.0 expo new york city
• director, application architecture at a global corp • focus on SOA, SaaS, Events • i wrote this @ebenhewitt
agenda • context • features • data model • api
“nosql”  “big data” • mongodb • couchdb • tokyo cabinet • redis • riak • what about? – Poet, Lotus, Xindice – they’ve been around forever… – rdbms was once the new kid…
innovation at scale • google bigtable (2006) – consistency model: strong – data model: sparse map – clones: hbase, hypertable • amazon dynamo (2007) – O(1) dht – consistency model: client tune-able – clones: riak, voldemort cassandra ~= bigtable + dynamo
proven • The Facebook stores 150TB of data on 150 nodes web 2.0 • used at Twitter, Rackspace, Mahalo, Reddit, Cloudkick, Cisco, Digg, SimpleGeo, Ooyala, OpenX, others
cap theorem •consistency – all clients have same view of data •availability – writeable in the face of node failure •partition tolerance – processing can continue in the face of network failure (crashed router, broken network)
daniel abadi: pacelc
write consistency Level Description ZERO Good luck with that ANY 1 replica (hints count) ONE 1 replica. read repair in bkgnd QUORUM (DCQ for RackAware) (N /2) + 1 ALL N = replication factor Level Description ZERO Ummm… ANY Try ONE instead ONE 1 replica QUORUM (DCQ for RackAware) Return most recent TS after (N /2) + 1 report ALL N = replication factor read consistency
agenda • context • features • data model • api
cassandra properties • tuneably consistent • very fast writes • highly available • fault tolerant • linear, elastic scalability • decentralized/symmetric • ~12 client languages – Thrift RPC API • ~automatic provisioning of new nodes • 0(1) dht • big data
write op
Staged Event-Driven Architecture • A general-purpose framework for high concurrency & load conditioning • Decomposes applications into stages separated by queues • Adopt a structured approach to event-driven concurrency
instrumentation
data replication
partitioner smack-down Random Preserving • system will use MD5(key) to distribute data across nodes • even distribution of keys from one CF across ranges/nodes Order Preserving • key distribution determined by token • lexicographical ordering • required for range queries – scan over rows like cursor in index • can specify the token for this node to use • ‘scrabble’ distribution
agenda • context • features • data model • api
structure
keyspace • ~= database • typically one per application • some settings are configurable only per keyspace
column family • group records of similar kind • not same kind, because CFs are sparse tables • ex: – User – Address – Tweet – PointOfInterest – HotelRoom
think of cassandra as row-oriented • each row is uniquely identifiable by key • rows group columns and super columns
column family n= 42 user=eben key 123 key 456 user=alison icon= nickname= The Situation
json-like notation User { 123 : { email: alison@foo.com, icon: }, 456 : { email: eben@bar.com, location: The Danger Zone} }
0.6 example $cassandra –f $bin/cassandra-cli cassandra> connect localhost/9160 cassandra> set Keyspace1.Standard1[‘eben’] [‘age’]=‘29’ cassandra> set Keyspace1.Standard1[‘eben’] [‘email’]=‘e@e.com’ cassandra> get Keyspace1.Standard1[‘eben'][‘age'] => (column=6e616d65, value=39, timestamp=1282170655390000)
a column has 3 parts 1. name – byte[] – determines sort order – used in queries – indexed 2. value – byte[] – you don’t query on column values 3. timestamp – long (clock) – last write wins conflict resolution
column comparators • byte • utf8 • long • timeuuid • lexicaluuid • <pluggable> – ex: lat/long
super column super columns group columns under a common name
<<SCF>>PointOfInterest super column family <<SC>>Central Park 10017 <<SC>> Empire State Bldg <<SC>> Phoenix Zoo 85255 desc=Fun to walk in. phone=212. 555.11212 desc=Great view from 102nd floor!
PointOfInterest { key: 85255 { Phoenix Zoo { phone: 480-555-5555, desc: They have animals here. }, Spring Training { phone: 623-333-3333, desc: Fun for baseball fans. }, }, //end phx key: 10019 { Central Park { desc: Walk around. It's pretty.} , Empire State Building { phone: 212-777-7777, desc: Great view from 102nd floor. } } //end nyc } s super column super column family flexible schema key column super column family
about super column families • sub-column names in a SCF are not indexed – top level columns (SCF Name) are always indexed • often used for denormalizing data from standard CFs
agenda • context • features • data model • api
slice predicate • data structure describing columns to return – SliceRange • start column name • finish column name (can be empty to stop on count) • reverse • count (like LIMIT)
read api • get() : Column – get the Col or SC at given ColPath COSC cosc = client.get(key, path, CL); • get_slice() : List<ColumnOrSuperColumn> – get Cols in one row, specified by SlicePredicate: List<ColumnOrSuperColumn> results = client.get_slice(key, parent, predicate, CL); • multiget_slice() : Map<key, List<CoSC>> – get slices for list of keys, based on SlicePredicate Map<byte[],List<ColumnOrSuperColumn>> results = client.multiget_slice(rowKeys, parent, predicate, CL); • get_range_slices() : List<KeySlice> – returns multiple Cols according to a range – range is startkey, endkey, starttoken, endtoken: List<KeySlice> slices = client.get_range_slices( parent, predicate, keyRange, CL);
write api client.insert(userKeyBytes, parent, new Column(“band".getBytes(UTF8), “Funkadelic".getBytes(), clock), CL); batch_mutate – void batch_mutate( map<byte[], map<String, List<Mutation>>> , CL) remove – void remove(byte[], ColumnPath column_path, Clock, CL)
batch_mutate //create param Map<byte[], Map<String, List<Mutation>>> mutationMap = new HashMap<byte[], Map<String, List<Mutation>>>(); //create Cols for Muts Column nameCol = new Column("name".getBytes(UTF8), “Funkadelic”.getBytes("UTF-8"), new Clock(System.nanoTime());); Mutation nameMut = new Mutation(); nameMut.column_or_supercolumn = nameCosc; //also phone, etc Map<String, List<Mutation>> muts = new HashMap<String, List<Mutation>>(); List<Mutation> cols = new ArrayList<Mutation>(); cols.add(nameMut); cols.add(phoneMut); muts.put(CF, cols); //outer map key is a row key; inner map key is the CF name mutationMap.put(rowKey.getBytes(), muts); //send to server client.batch_mutate(mutationMap, CL);
raw thrift: for masochists only • pycassa (python) • fauna (ruby) • hector (java) • pelops (java) • kundera (JPA) • hectorSharp (C#)
what about… SELECT WHERE ORDER BY JOIN ON GROUP ?
rdbms: domain-based model what answers do I have? cassandra: query-based model what questions do I have?
SELECT WHERE cassandra is an index factory <<cf>>USER Key: UserID Cols: username, email, birth date, city, state How to support this query? SELECT * FROM User WHERE city = ‘Scottsdale’ Create a new CF called UserCity: <<cf>>USERCITY Key: city Cols: IDs of the users in that city. Also uses the Valueless Column pattern
• Use an aggregate key state:city: { user1, user2} • Get rows between AZ: & AZ; for all Arizona users • Get rows between AZ:Scottsdale & AZ:Scottsdale1 for all Scottsdale users SELECT WHERE pt 2
ORDER BY Rows are placed according to their Partitioner: •Random: MD5 of key •Order-Preserving: actual key are sorted by key, regardless of partitioner Columns are sorted according to CompareWith or CompareSubcolumnsWith
is cassandra a good fit? • you need really fast writes • you need durability • you have lots of data > GBs >= three servers • your app is evolving – startup mode, fluid data structure • loose domain data – “points of interest” • your programmers can deal – documentation – complexity – consistency model – change – visibility tools • your operations can deal – hardware considerations – can move data – JMX monitoring
thank you! @ebenhewitt

Scaling Web Applications with Cassandra Presentation (1).ppt

  • 1.
    introduction to cassandra ebenhewitt september 29. 2010 web 2.0 expo new york city
  • 2.
    • director, applicationarchitecture at a global corp • focus on SOA, SaaS, Events • i wrote this @ebenhewitt
  • 3.
  • 4.
    “nosql”  “bigdata” • mongodb • couchdb • tokyo cabinet • redis • riak • what about? – Poet, Lotus, Xindice – they’ve been around forever… – rdbms was once the new kid…
  • 5.
    innovation at scale •google bigtable (2006) – consistency model: strong – data model: sparse map – clones: hbase, hypertable • amazon dynamo (2007) – O(1) dht – consistency model: client tune-able – clones: riak, voldemort cassandra ~= bigtable + dynamo
  • 6.
    proven • The Facebookstores 150TB of data on 150 nodes web 2.0 • used at Twitter, Rackspace, Mahalo, Reddit, Cloudkick, Cisco, Digg, SimpleGeo, Ooyala, OpenX, others
  • 7.
    cap theorem •consistency – allclients have same view of data •availability – writeable in the face of node failure •partition tolerance – processing can continue in the face of network failure (crashed router, broken network)
  • 8.
  • 9.
    write consistency Level Description ZEROGood luck with that ANY 1 replica (hints count) ONE 1 replica. read repair in bkgnd QUORUM (DCQ for RackAware) (N /2) + 1 ALL N = replication factor Level Description ZERO Ummm… ANY Try ONE instead ONE 1 replica QUORUM (DCQ for RackAware) Return most recent TS after (N /2) + 1 report ALL N = replication factor read consistency
  • 10.
  • 11.
    cassandra properties • tuneablyconsistent • very fast writes • highly available • fault tolerant • linear, elastic scalability • decentralized/symmetric • ~12 client languages – Thrift RPC API • ~automatic provisioning of new nodes • 0(1) dht • big data
  • 12.
  • 13.
    Staged Event-Driven Architecture •A general-purpose framework for high concurrency & load conditioning • Decomposes applications into stages separated by queues • Adopt a structured approach to event-driven concurrency
  • 14.
  • 15.
  • 16.
    partitioner smack-down Random Preserving •system will use MD5(key) to distribute data across nodes • even distribution of keys from one CF across ranges/nodes Order Preserving • key distribution determined by token • lexicographical ordering • required for range queries – scan over rows like cursor in index • can specify the token for this node to use • ‘scrabble’ distribution
  • 17.
  • 18.
  • 19.
    keyspace • ~= database •typically one per application • some settings are configurable only per keyspace
  • 20.
    column family • grouprecords of similar kind • not same kind, because CFs are sparse tables • ex: – User – Address – Tweet – PointOfInterest – HotelRoom
  • 21.
    think of cassandraas row-oriented • each row is uniquely identifiable by key • rows group columns and super columns
  • 22.
  • 23.
    json-like notation User { 123: { email: alison@foo.com, icon: }, 456 : { email: eben@bar.com, location: The Danger Zone} }
  • 24.
    0.6 example $cassandra –f $bin/cassandra-cli cassandra>connect localhost/9160 cassandra> set Keyspace1.Standard1[‘eben’] [‘age’]=‘29’ cassandra> set Keyspace1.Standard1[‘eben’] [‘email’]=‘e@e.com’ cassandra> get Keyspace1.Standard1[‘eben'][‘age'] => (column=6e616d65, value=39, timestamp=1282170655390000)
  • 25.
    a column has3 parts 1. name – byte[] – determines sort order – used in queries – indexed 2. value – byte[] – you don’t query on column values 3. timestamp – long (clock) – last write wins conflict resolution
  • 26.
    column comparators • byte •utf8 • long • timeuuid • lexicaluuid • <pluggable> – ex: lat/long
  • 27.
    super column super columnsgroup columns under a common name
  • 28.
    <<SCF>>PointOfInterest super column family <<SC>>Central Park 10017 <<SC>> EmpireState Bldg <<SC>> Phoenix Zoo 85255 desc=Fun to walk in. phone=212. 555.11212 desc=Great view from 102nd floor!
  • 29.
    PointOfInterest { key: 85255{ Phoenix Zoo { phone: 480-555-5555, desc: They have animals here. }, Spring Training { phone: 623-333-3333, desc: Fun for baseball fans. }, }, //end phx key: 10019 { Central Park { desc: Walk around. It's pretty.} , Empire State Building { phone: 212-777-7777, desc: Great view from 102nd floor. } } //end nyc } s super column super column family flexible schema key column super column family
  • 30.
    about super columnfamilies • sub-column names in a SCF are not indexed – top level columns (SCF Name) are always indexed • often used for denormalizing data from standard CFs
  • 31.
  • 32.
    slice predicate • datastructure describing columns to return – SliceRange • start column name • finish column name (can be empty to stop on count) • reverse • count (like LIMIT)
  • 33.
    read api • get(): Column – get the Col or SC at given ColPath COSC cosc = client.get(key, path, CL); • get_slice() : List<ColumnOrSuperColumn> – get Cols in one row, specified by SlicePredicate: List<ColumnOrSuperColumn> results = client.get_slice(key, parent, predicate, CL); • multiget_slice() : Map<key, List<CoSC>> – get slices for list of keys, based on SlicePredicate Map<byte[],List<ColumnOrSuperColumn>> results = client.multiget_slice(rowKeys, parent, predicate, CL); • get_range_slices() : List<KeySlice> – returns multiple Cols according to a range – range is startkey, endkey, starttoken, endtoken: List<KeySlice> slices = client.get_range_slices( parent, predicate, keyRange, CL);
  • 34.
    write api client.insert(userKeyBytes, parent, newColumn(“band".getBytes(UTF8), “Funkadelic".getBytes(), clock), CL); batch_mutate – void batch_mutate( map<byte[], map<String, List<Mutation>>> , CL) remove – void remove(byte[], ColumnPath column_path, Clock, CL)
  • 35.
    batch_mutate //create param Map<byte[], Map<String,List<Mutation>>> mutationMap = new HashMap<byte[], Map<String, List<Mutation>>>(); //create Cols for Muts Column nameCol = new Column("name".getBytes(UTF8), “Funkadelic”.getBytes("UTF-8"), new Clock(System.nanoTime());); Mutation nameMut = new Mutation(); nameMut.column_or_supercolumn = nameCosc; //also phone, etc Map<String, List<Mutation>> muts = new HashMap<String, List<Mutation>>(); List<Mutation> cols = new ArrayList<Mutation>(); cols.add(nameMut); cols.add(phoneMut); muts.put(CF, cols); //outer map key is a row key; inner map key is the CF name mutationMap.put(rowKey.getBytes(), muts); //send to server client.batch_mutate(mutationMap, CL);
  • 36.
    raw thrift: formasochists only • pycassa (python) • fauna (ruby) • hector (java) • pelops (java) • kundera (JPA) • hectorSharp (C#)
  • 37.
  • 38.
    rdbms: domain-based model whatanswers do I have? cassandra: query-based model what questions do I have?
  • 39.
    SELECT WHERE cassandra isan index factory <<cf>>USER Key: UserID Cols: username, email, birth date, city, state How to support this query? SELECT * FROM User WHERE city = ‘Scottsdale’ Create a new CF called UserCity: <<cf>>USERCITY Key: city Cols: IDs of the users in that city. Also uses the Valueless Column pattern
  • 40.
    • Use anaggregate key state:city: { user1, user2} • Get rows between AZ: & AZ; for all Arizona users • Get rows between AZ:Scottsdale & AZ:Scottsdale1 for all Scottsdale users SELECT WHERE pt 2
  • 41.
    ORDER BY Rows are placedaccording to their Partitioner: •Random: MD5 of key •Order-Preserving: actual key are sorted by key, regardless of partitioner Columns are sorted according to CompareWith or CompareSubcolumnsWith
  • 44.
    is cassandra agood fit? • you need really fast writes • you need durability • you have lots of data > GBs >= three servers • your app is evolving – startup mode, fluid data structure • loose domain data – “points of interest” • your programmers can deal – documentation – complexity – consistency model – change – visibility tools • your operations can deal – hardware considerations – can move data – JMX monitoring
  • 45.