The document discusses Apache Cassandra, highlighting its evolution from earlier data models like Dynamo and Bigtable to its unique architecture that employs a distributed row-column store. It also outlines Cassandra's key features including consistent hashing, partition keys, and data replication strategies across multiple data centers, with an emphasis on its capabilities for handling time series data. Additionally, it includes examples of how to create and interact with tables using Cassandra Query Language (CQL).
Dynamo Paper (2007) • How do we build a data store that is: • Reliable • Performant • “Always On” Nothing new and shiny. Evolutionary. Real. Computer Science. Also the basis for Riak and Voldemort.
Token Server • Each partition key hashes to a 64-bit token (default Murmur3Partitioner) • Consistent hash between -2^63 and 2^63 - 1 • Each node owns a range of those values • The token is the beginning of that range to the next node’s token value • Virtual Nodes break these down further Data Token Range 0 …
Consistency level
 Consistency Level | Number of Nodes Acknowledged
-------------------+-------------------------------
 One               | One - Read repair triggered
 Local One         | One - Read repair in local DC
 Quorum            | 51%
 Local Quorum      | 51% in local DC
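The 51% figure above is shorthand for a strict majority of replicas. A minimal sketch of that arithmetic, assuming the usual rule of floor(RF / 2) + 1, where RF is the replication factor; the function name `quorum` is illustrative and not part of any Cassandra API:

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


# Print the quorum size for a few common replication factors.
for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}")
```

With RF=3 a quorum is 2 nodes, so one replica can be down and quorum reads and writes still succeed.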
Table
CREATE TABLE weather_station (
    id text,
    name text,
    country_code text,
    state_code text,
    call_sign text,
    lat double,
    long double,
    elevation double,
    PRIMARY KEY (id)
);
Table Name Column Name Column CQL Type Primary Key Designation Partition Key
Table
CREATE TABLE daily_aggregate_precip (
    wsid text,
    year int,
    month int,
    day int,
    precipitation counter,
    PRIMARY KEY ((wsid), year, month, day)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC);
Partition Key Clustering Columns Order Override
Insert
INSERT INTO weather_station (id, call_sign, country_code, elevation, lat, long, name, state_code)
VALUES ('727930:24233', 'KSEA', 'US', 121.9, 47.467, -122.32, 'SEATTLE SEATTLE-TACOMA INTL A', 'WA');
Table Name Fields Values Partition Key: Required
Select
SELECT id, call_sign, country_code, elevation, lat, long, name, state_code
FROM weather_station
WHERE id = '727930:24233';
 id           | call_sign | country_code | elevation | lat    | long    | name                          | state_code
--------------+-----------+--------------+-----------+--------+---------+-------------------------------+------------
 727930:24233 | KSEA      | US           | 121.9     | 47.467 | -122.32 | SEATTLE SEATTLE-TACOMA INTL A | WA
Fields Table Name Primary Key: Partition Key Required
Update
UPDATE weather_station
SET name = 'SeaTac International Airport'
WHERE id = '727930:24233';
 id           | call_sign | country_code | elevation | lat    | long    | name                         | state_code
--------------+-----------+--------------+-----------+--------+---------+------------------------------+------------
 727930:24233 | KSEA      | US           | 121.9     | 47.467 | -122.32 | SeaTac International Airport | WA
Table Name Fields to Update: Not in Primary Key Primary Key
Collections Set
CREATE TABLE weather_station (
    id text,
    name text,
    country_code text,
    state_code text,
    call_sign text,
    lat double,
    long double,
    elevation double,
    equipment set<text>,
    PRIMARY KEY (id)
);
equipment set<text> CQL Type: For Ordering Column Name
Collections Set List
CREATE TABLE weather_station (
    id text,
    name text,
    country_code text,
    state_code text,
    call_sign text,
    lat double,
    long double,
    elevation double,
    equipment set<text>,
    service_dates list<timestamp>,
    PRIMARY KEY (id)
);
equipment set<text> service_dates list<timestamp> CQL Type Column Name CQL Type: For Ordering Column Name
Collections Set List Map
CREATE TABLE weather_station (
    id text,
    name text,
    country_code text,
    state_code text,
    call_sign text,
    lat double,
    long double,
    elevation double,
    equipment set<text>,
    service_dates list<timestamp>,
    service_notes map<timestamp,text>,
    PRIMARY KEY (id)
);
equipment set<text> service_dates list<timestamp> service_notes map<timestamp,text> CQL Type Column Name Column Name CQL Key Type CQL Value Type CQL Type: For Ordering Column Name
UDF and UDA
User Defined Function (must exist before the aggregate that references it)
CREATE FUNCTION state_group_and_count(state map<text, int>, type text)
CALLED ON NULL INPUT
RETURNS map<text, int>
LANGUAGE java
AS '
    Integer count = (Integer) state.get(type);
    if (count == null) count = 1;
    else count++;
    state.put(type, count);
    return state;
';
User Defined Aggregate
CREATE OR REPLACE AGGREGATE group_and_count(text)
SFUNC state_group_and_count
STYPE map<text, int>
INITCOND {};
As of Cassandra 2.2
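The aggregate above is a fold: it starts from INITCOND (an empty map) and applies the state function once per row. A rough Python analogue of that evaluation model, offered as a sketch and not as how Cassandra executes it; the sample `rows` list is made up for illustration:

```python
from functools import reduce


def state_group_and_count(state: dict, type_: str) -> dict:
    """Mirror of the Java state function: bump the counter for type_."""
    count = state.get(type_)
    state[type_] = 1 if count is None else count + 1
    return state


def group_and_count(values):
    """Fold the state function over all values, starting from an empty map."""
    return reduce(state_group_and_count, values, {})


# Hypothetical column values, one per row.
rows = ["WA", "CA", "WA", "OR", "WA"]
print(group_and_count(rows))  # {'WA': 3, 'CA': 1, 'OR': 1}
```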
Example: Weather Station •Weather station collects data • Cassandra stores in sequence • Application reads in sequence
Queries supported
CREATE TABLE raw_weather_data (
    wsid text,
    year int,
    month int,
    day int,
    hour int,
    temperature double,
    dewpoint double,
    pressure double,
    wind_direction int,
    wind_speed double,
    sky_condition int,
    sky_condition_text text,
    one_hour_precip double,
    six_hour_precip double,
    PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
Get weather data given • Weather Station ID • Weather Station ID and Time • Weather Station ID and Range of Time
Primary Key
CREATE TABLE raw_weather_data (
    wsid text,
    year int,
    month int,
    day int,
    hour int,
    temperature double,
    dewpoint double,
    pressure double,
    wind_direction int,
    wind_speed double,
    sky_condition int,
    sky_condition_text text,
    one_hour_precip double,
    six_hour_precip double,
    PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
Partition keys
INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 7, -5.6);
INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('722266:13850', 2005, 12, 1, 7, -5.6);
10010:99999 -> Murmur3 Hash -> Token = 7224631062609997448
722266:13850 -> Murmur3 Hash -> Token = -6804302034103043898
Consistent hash. 64-bit number between -2^63 and 2^63 - 1
Partition keys
INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 7, -5.6);
INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('722266:13850', 2005, 12, 1, 7, -5.6);
For this example, let’s make it a reasonable number
10010:99999 -> Murmur3 Hash -> Token = 15
722266:13850 -> Murmur3 Hash -> Token = 77
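Using the simplified tokens 15 and 77, the lookup from token to owning node can be sketched as a binary search over the sorted node tokens. This follows the deck's description that a node's token marks the beginning of its range; the four-node ring, the node names, and the 0..99 token space are all assumptions made for illustration:

```python
from bisect import bisect_right

# Hypothetical four-node ring over a toy token space of 0..99.
# Each entry is (token where the node's range begins, node name).
RING = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
TOKENS = [token for token, _ in RING]


def owner(token: int) -> str:
    """Return the node whose range [its token, next node's token) holds token."""
    # Index of the last node token that is <= token. For a token below the
    # first node token this yields -1, which wraps around to the last node.
    idx = bisect_right(TOKENS, token) - 1
    return RING[idx][1]


print(owner(15))  # node-a
print(owner(77))  # node-d
```

Adding a node only changes ownership of the range it splits, which is the property that makes consistent hashing attractive for a growing cluster.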
DateTieredCompactionStrategy • Group similar time blocks • Never compact again • Used for high density
SSTable: T=2015-01-01 -> 2015-01-05
SSTable: T=2015-01-06 -> 2015-01-10
SSTable: T=2015-01-11 -> 2015-01-15
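The idea of grouping similar time blocks can be illustrated with fixed five-day windows that match the three SSTables shown. This is a deliberately simplified sketch of time-window bucketing, not the strategy's actual algorithm; `WINDOW_DAYS`, `BASE`, and both function names are assumptions chosen to reproduce the slide's date ranges:

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW_DAYS = 5          # assumed window size, taken from the slide's ranges
BASE = date(2015, 1, 1)  # assumed start of the first window


def window_start(day: date) -> date:
    """Return the first day of the fixed-size window containing day."""
    offset = (day - BASE).days
    return BASE + timedelta(days=(offset // WINDOW_DAYS) * WINDOW_DAYS)


def group_by_window(days):
    """Group write dates by window; each group would land in one SSTable."""
    groups = defaultdict(list)
    for day in days:
        groups[window_start(day)].append(day)
    return dict(groups)


writes = [date(2015, 1, 2), date(2015, 1, 7), date(2015, 1, 9), date(2015, 1, 14)]
for start, members in sorted(group_by_window(writes).items()):
    print(start, "->", start + timedelta(days=WINDOW_DAYS - 1), members)
```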
Storage Model - Logical View
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid = '10010:99999' AND year = 2005 AND month = 12 AND day = 1;
 wsid        | hour         | temperature
-------------+--------------+-------------
 10010:99999 | 2005:12:1:10 | -5.6
 10010:99999 | 2005:12:1:9  | -5.1
 10010:99999 | 2005:12:1:8  | -4.9
 10010:99999 | 2005:12:1:7  | -5.3
Storage Model - Disk Layout
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid = '10010:99999' AND year = 2005 AND month = 12 AND day = 1;
 10010:99999 | 2005:12:1:10 | 2005:12:1:9 | 2005:12:1:8 | 2005:12:1:7
             | -5.6         | -5.1        | -4.9        | -5.3
Merged, Sorted and Stored Sequentially
Storage Model - Disk Layout
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid = '10010:99999' AND year = 2005 AND month = 12 AND day = 1;
 10010:99999 | 2005:12:1:11 | 2005:12:1:10 | 2005:12:1:9 | 2005:12:1:8 | 2005:12:1:7
             | -4.9         | -5.6         | -5.1        | -4.9        | -5.3
Merged, Sorted and Stored Sequentially
Storage Model - Disk Layout
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid = '10010:99999' AND year = 2005 AND month = 12 AND day = 1;
 10010:99999 | 2005:12:1:12 | 2005:12:1:11 | 2005:12:1:10 | 2005:12:1:9 | 2005:12:1:8 | 2005:12:1:7
             | -5.4         | -4.9         | -5.6         | -5.1        | -4.9        | -5.3
Merged, Sorted and Stored Sequentially
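"Merged, Sorted and Stored Sequentially" can be pictured as a merge of runs that are each already sorted by the clustering columns. A small sketch, assuming two runs (named `older_run` and `newer_run` here purely for illustration) holding the cells shown on these slides, ordered newest first to match the DESC clustering order:

```python
from heapq import merge

# Each cell is ((year, month, day, hour), temperature); both runs are
# already sorted newest-first, as the clustering order requires.
older_run = [
    ((2005, 12, 1, 10), -5.6),
    ((2005, 12, 1, 9), -5.1),
    ((2005, 12, 1, 8), -4.9),
    ((2005, 12, 1, 7), -5.3),
]
newer_run = [
    ((2005, 12, 1, 12), -5.4),
    ((2005, 12, 1, 11), -4.9),
]


def merge_runs(*runs):
    """Merge descending-sorted runs into one descending-sorted list."""
    return list(merge(*runs, key=lambda cell: cell[0], reverse=True))


merged = merge_runs(older_run, newer_run)
for clustering_key, temperature in merged:
    print(clustering_key, temperature)
```

Because every input is pre-sorted, the merge is a single sequential pass and the result can be written out contiguously.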
Query patterns • Range queries • “Slice” operation on disk
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid = '10010:99999' AND year = 2005 AND month = 12 AND day = 1
  AND hour >= 7 AND hour <= 10;
 10010:99999 | 2005:12:1:10 | 2005:12:1:9 | 2005:12:1:8 | 2005:12:1:7
             | -5.6         | -5.1        | -4.9        | -5.3
Single seek on disk
Partition key for locality
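Because the cells of a partition are kept sorted, a range query reduces to locating two boundaries and reading everything between them. A sketch of that slice using binary search over an in-memory list; the list stands in for one partition on disk and `slice_hours` is a hypothetical helper:

```python
from bisect import bisect_left, bisect_right

# One partition (wsid '10010:99999', 2005-12-01): (hour, temperature),
# sorted by hour DESC to match the table's clustering order.
partition = [(12, -5.4), (11, -4.9), (10, -5.6), (9, -5.1), (8, -4.9), (7, -5.3)]

# Negated hours form an ascending list, which the bisect module can search.
neg_hours = [-hour for hour, _ in partition]


def slice_hours(lo: int, hi: int):
    """Return the contiguous run of cells with lo <= hour <= hi."""
    start = bisect_left(neg_hours, -hi)   # first cell with hour <= hi
    end = bisect_right(neg_hours, -lo)    # one past the last cell with hour >= lo
    return partition[start:end]


print(slice_hours(7, 10))  # [(10, -5.6), (9, -5.1), (8, -4.9), (7, -5.3)]
```

The result is a single contiguous slice, which is why this access pattern maps to one seek followed by a sequential read.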
Query patterns • Range queries • “Slice” operation on disk
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid = '10010:99999' AND year = 2005 AND month = 12 AND day = 1
  AND hour >= 7 AND hour <= 10;
 wsid        | hour         | temperature
-------------+--------------+-------------
 10010:99999 | 2005:12:1:10 | -5.6
 10010:99999 | 2005:12:1:9  | -5.1
 10010:99999 | 2005:12:1:8  | -4.9
 10010:99999 | 2005:12:1:7  | -5.3
Sorted by event_time
Programmers like this