Matteo Baglini
Software Developer, Freelance
www.dotnettoscana.org
matteo.baglini@gmail.com
http://it.linkedin.com/in/matteobaglini
http://github.com/bmatte
«Advanced key-value store. It is often referred to as a data structure server»
Key                  Value
page:index           <html><head>[...]
user:123:session     xDrSdEwd4dSlZkEkj+
user:123:avatar      77u/PD94bWwgdm+

Everything is a «blob». Commands, primarily, can GET and SET the values.

Key                  Value                                        Type
page:index           <html><head>[...]                            String
events:timeline      { «Joe logged», «File X Uploaded», … }       List
logged:today         { 1, 2, 3, 4, 5 }                            Set
user:123:profile     time => 10927353, username => bmatte         Hash
game:leaderboard     joe ~ 1.3483, smith ~ 293.45, fred ~ 83.22   Sorted Set

Different «data type/structure». Rich set of specialized commands.
• Everything is stored in memory
• Screamingly fast performance
• Persistent via snapshot or append-only log file
• Replication (master/slave only)
• Extensible via embedded scripting engine (Lua)
• Rich set of client libraries
• High availability (in progress)
  ◦ Cluster (fault tolerance, multi-node consistency)
  ◦ Sentinel (monitoring, notification, automatic failover)

• Created by Salvatore Sanfilippo (@antirez)
• First «public release» in March 2009
• Since 2010 sponsored by VMware
• Initially written to improve the performance of LLOOGG, the web analytics product of his startup

• Written in ANSI C
• No external dependencies
• Single thread (asynchronous evented I/O)
• Works on all POSIX-like systems
• An unofficial build exists for Windows
• Open source, BSD licensed
• Community (list, IRC & wiki)
1. A DSL for Abstract Data Types.
2. Memory storage is #1.
3. Fundamental data structures for a fundamental API.
4. Code is like a poem.
5. We're against complexity.
6. Two levels of API.
7. We optimize for joy.
Getting Started
Latest stable version (2.6.*)
Latest unstable version (2.9.7)
Data Types
Strings
Any blob will do (a value can be at most 512 MB)
Operations on strings holding an integer
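To make the idea of integer operations on strings concrete, here is a minimal sketch in plain Python. It is an in-memory model of the behaviour of INCR/INCRBY, not a Redis client; the `store` dictionary and the key name `page:index:views` are assumptions made for illustration.

```python
store = {}  # hypothetical stand-in for Redis string keys: key -> string value


def incr(key, amount=1):
    """Model of INCR/INCRBY: read the string as an integer, add, store it back as a string."""
    current = int(store.get(key, "0"))  # a missing key counts as 0; a non-integer string raises ValueError
    updated = current + amount
    store[key] = str(updated)           # the value stays a string, as in Redis
    return updated


incr("page:index:views")
incr("page:index:views")
total = incr("page:index:views", 10)  # total is 12, stored as the string "12"
```

The value is still a string; only the command interprets it as a number, which is why the same key can also be read back with a plain GET.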
• Sharing state across processes
  ◦ Distributed lock, incremental ID, time series, user session.
• Web Analytics
  ◦ User visits (day, week, month), feature tracking.
• Caching
  ◦ String values can hold arbitrary data.
• Rate limiting
  ◦ Limit the number of API calls per minute.
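The rate-limiting use case can be sketched as a fixed-window counter: one counter per client per minute, incremented on every call. The sketch below is a pure-Python model under stated assumptions; the `counters` dictionary stands in for Redis string keys, and the key layout `rate:<client>:<minute>` and the function name `allow_request` are hypothetical. Against a real server the increment would be an INCR, with an EXPIRE on the key so old windows disappear.

```python
import time

counters = {}  # hypothetical in-memory stand-in for Redis counters


def allow_request(client_id, limit, now=None):
    """Return True while the client is within `limit` calls for the current minute."""
    now = time.time() if now is None else now
    window = int(now // 60)                       # index of the current one-minute window
    key = f"rate:{client_id}:{window}"            # one counter per client per window
    counters[key] = counters.get(key, 0) + 1      # the step an INCR would perform
    return counters[key] <= limit


# Four calls in the same minute with a limit of 3: the fourth is rejected.
decisions = [allow_request("user:123", limit=3, now=120.0) for _ in range(4)]
# A call in the next minute starts from a fresh counter.
next_minute = allow_request("user:123", limit=3, now=180.0)
```

A fixed window is the simplest variant; it allows short bursts across a window boundary, which may or may not matter for a given API.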
Keys
Any item in Redis can be made to expire after a given interval or at a certain time.
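As an illustration of key expiry, the following sketch models a store where a key can be given a relative time to live (as EXPIRE does) or an absolute deadline (as EXPIREAT does). It is an in-memory Python model, not a Redis client; the class name `ExpiringStore` and the injectable `clock` parameter are assumptions introduced so the behaviour can be exercised without sleeping.

```python
import time


class ExpiringStore:
    """In-memory model of keys with an optional time to live."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable clock, hypothetical, for deterministic examples
        self._data = {}
        self._deadline = {}      # key -> absolute expiry time in seconds

    def set(self, key, value):
        self._data[key] = value
        self._deadline.pop(key, None)  # a plain SET clears any previous expiry

    def expire(self, key, seconds):
        """Like EXPIRE: the key disappears `seconds` from now."""
        return self.expire_at(key, self._clock() + seconds)

    def expire_at(self, key, timestamp):
        """Like EXPIREAT: the key disappears at an absolute time."""
        if key not in self._data:
            return False
        self._deadline[key] = timestamp
        return True

    def get(self, key):
        deadline = self._deadline.get(key)
        if deadline is not None and self._clock() >= deadline:
            # Lazy expiry: drop the key the first time it is read after its deadline.
            del self._data[key]
            del self._deadline[key]
        return self._data.get(key)
```

A session key is the typical candidate: set it at login, give it a time to live, and it cleans itself up when the user stops coming back.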
Lists
Sequence of string values (max length is 2^32 - 1 elements)
Prevent indefinite growth
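One common way to prevent indefinite growth is to push new items at the head of the list and then trim it to a fixed length (LPUSH followed by LTRIM). The sketch below models that pattern in plain Python with a bounded `deque`; it is not a Redis client, and the event strings beyond the two shown on the earlier slide are made up for illustration.

```python
from collections import deque

MAX_EVENTS = 3  # hypothetical cap, the equivalent of LTRIM events:timeline 0 2

# A deque with maxlen drops items from the opposite end once it is full,
# which mirrors "push at the head, trim the tail".
timeline = deque(maxlen=MAX_EVENTS)

for event in ["Joe logged", "File X Uploaded", "Joe logged out", "File Y Uploaded"]:
    timeline.appendleft(event)  # newest item goes to the head, as with LPUSH

latest = list(timeline)
# latest == ["File Y Uploaded", "Joe logged out", "File X Uploaded"]
```

The oldest entry falls off as soon as the cap is exceeded, so the list always holds the last N activities.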
• Events Store or Notification
  ◦ Logs, social network timelines, notifications.
• Fixed Data
  ◦ Last N activities.
• Message Passing
  ◦ Durable MQ, job queue.
• Circular list
Sets
Unordered set of unique values (max number of members is 2^32 - 1)
You can compute unions, intersections, and differences of sets very quickly.
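The three set operations map directly onto Python's built-in sets, which makes them easy to try out. This is only a sketch of the semantics behind SUNION, SINTER, and SDIFF; `logged_today` reuses the values from the earlier slide, while `logged_yesterday` is a made-up second set.

```python
logged_today = {1, 2, 3, 4, 5}
logged_yesterday = {4, 5, 6}  # hypothetical data for the example

either_day = logged_today | logged_yesterday   # union, as SUNION would return
both_days = logged_today & logged_yesterday    # intersection, as SINTER would return
only_today = logged_today - logged_yesterday   # difference, as SDIFF would return

# Members are unique: adding an existing member changes nothing.
logged_today.add(3)
```

Questions such as "which users came back today?" become a single intersection instead of a join.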
• Web Analytics
  ◦ Unique page views, visiting IP addresses.
• Relations
  ◦ Friends, followers, tags.
• Caching Result
  ◦ Store the result of an expensive intersection of data.
Sorted Sets
Ordered set of unique values
Access by rank
Access by score
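Access by rank and access by score can be illustrated with the leaderboard values from the earlier slide. The sketch below is a plain-Python model that re-sorts on every query, which a real sorted set does not need to do; the helper names `ranked`, `range_by_rank`, and `range_by_score` are hypothetical and only loosely mirror ZRANGE, ZRANGEBYSCORE, and ZREVRANGE.

```python
scores = {"joe": 1.3483, "smith": 293.45, "fred": 83.22}  # member -> score


def ranked(scores):
    """Members ordered by ascending score, ties broken by member name."""
    return sorted(scores, key=lambda member: (scores[member], member))


def range_by_rank(scores, start, stop):
    """Members whose rank is in [start, stop], inclusive (non-negative indexes only)."""
    return ranked(scores)[start:stop + 1]


def range_by_score(scores, low, high):
    """Members whose score is in [low, high], inclusive."""
    return [member for member in ranked(scores) if low <= scores[member] <= high]


lowest_two = range_by_rank(scores, 0, 1)     # ["joe", "fred"]
mid_range = range_by_score(scores, 50, 300)  # ["fred", "smith"]
top_two = ranked(scores)[::-1][:2]           # ["smith", "fred"], a "show top N" query
```

Each member appears once; setting a new score for an existing member moves it rather than duplicating it.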
• Web Analytics
  ◦ Online users, most visited pages.
• Leaderboard
  ◦ Show top N.
• Order by data
  ◦ Maintain a set of ordered data, such as users by age.
Hashes
Key → Value map (as value)
Set attributes (store up to 2^32 - 1 field-value pairs)
Get attributes
• Storing Objects
  ◦ Hashes are maps between string fields and string values, so they are the perfect data type to represent objects.
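To show how an object maps onto a hash, here is a small in-memory model using the `user:123:profile` example from the earlier slide. It is a sketch of the semantics of HSET, HGET, and HGETALL in plain Python, not a Redis client, and the `hashes` dictionary is an assumption standing in for the key space.

```python
hashes = {}  # hypothetical stand-in for the key space: key -> {field: value}


def hset(key, field, value):
    """Set one attribute; fields and values are stored as strings."""
    hashes.setdefault(key, {})[field] = str(value)


def hget(key, field):
    """Get one attribute, or None if the key or field does not exist."""
    return hashes.get(key, {}).get(field)


def hgetall(key):
    """Get the whole object as a field -> value mapping."""
    return dict(hashes.get(key, {}))


hset("user:123:profile", "username", "bmatte")
hset("user:123:profile", "time", 10927353)

username = hget("user:123:profile", "username")  # "bmatte"
profile = hgetall("user:123:profile")            # both fields, values as strings
```

Updating a single attribute touches only that field, which is the main advantage over serializing the whole object into one string value.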
Persistence
Dump data to disk after certain conditions are met
• Pros:
  ◦ RDB is a very compact single file.
  ◦ RDB files are perfect for backups.
  ◦ RDB is very good for disaster recovery.
  ◦ RDB allows faster restarts with big datasets.
  ◦ RDB maximizes performance (background I/O via fork(2)).
• Cons:
  ◦ RDB is NOT good if you need to minimize the chance of data loss in case Redis stops working (for example after a power outage).
  ◦ Fork can be time consuming if the dataset is very big.
Append all write operations to a log
Durability depends on fsync(2) policy
• Pros:
  ◦ AOF is much more durable.
  ◦ AOF is an append-only log: no seeks and no corruption problems (for example after a power outage).
  ◦ AOF contains a log of all the operations, one after the other, in a format that is easy to understand and parse.
• Cons:
  ◦ AOF files are usually bigger than the equivalent RDB.
  ◦ AOF can be slower than RDB depending on the exact fsync policy.
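The append-only idea can be sketched in a few lines: every write is appended to a log before being applied, and the dataset is rebuilt on restart by replaying the log from the top. The code below is a simplified Python model under stated assumptions: the one-command-per-line text format, the `SET`/`DEL` subset, and the function names are all hypothetical, and the real AOF file uses Redis's own protocol format rather than this one. The `fsync` flag stands in for the fsync policy mentioned above.

```python
import os
import tempfile


def apply_command(state, command):
    """Apply one logged write to the in-memory dataset (values without spaces only)."""
    op, key, *args = command.split(" ")
    if op == "SET":
        state[key] = args[0]
    elif op == "DEL":
        state.pop(key, None)


def write(state, log, command, fsync=True):
    """Append the command to the log, optionally force it to disk, then apply it."""
    log.write(command + "\n")
    log.flush()
    if fsync:
        os.fsync(log.fileno())  # durable once this returns; skipping it is faster but riskier
    apply_command(state, command)


def replay(path):
    """Rebuild the dataset by re-running every logged command in order."""
    state = {}
    with open(path) as log:
        for line in log:
            apply_command(state, line.rstrip("\n"))
    return state


path = os.path.join(tempfile.mkdtemp(), "appendonly.log")
live = {}
with open(path, "a") as log:
    for command in ["SET a 1", "SET b 2", "DEL a", "SET b 3"]:
        write(live, log, command)

recovered = replay(path)  # same content as `live`
```

The log keeps every intermediate write, which is why it tends to be larger than a snapshot of the final state.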
• Use both persistence methods if you want a degree of data safety comparable to what any RDBMS can provide.
• If you care a lot about your data but can still live with a few minutes of data loss in case of disasters, you can simply use RDB alone.
• Many users use AOF alone, but we discourage it: having an RDB snapshot from time to time is a great idea for database backups and for faster restarts.
C# Clients
Rich set of clients
Code
Transactions
Multiple commands (ACID)
• Classic scenario
  ◦ Multiple commands executed atomically.
• Optimistic locking
  ◦ Check-and-set (CAS pattern): write only if the value has not changed.
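The check-and-set pattern can be sketched as: read a value, remember what you saw, and write only if nobody changed it in the meantime; otherwise re-read and retry. The Python model below uses an explicit version counter to detect changes, which is an assumption made for illustration. Redis itself does not expose versions; it gets the same effect with WATCH before MULTI/EXEC. The class and method names are hypothetical.

```python
class VersionedStore:
    """In-memory model where every write to a key bumps a version number."""

    def __init__(self):
        self._values = {}
        self._versions = {}

    def get(self, key):
        """Return (value, version); the version plays the role of 'what I watched'."""
        return self._values.get(key), self._versions.get(key, 0)

    def set(self, key, value):
        self._values[key] = value
        self._versions[key] = self._versions.get(key, 0) + 1

    def set_if_unchanged(self, key, value, seen_version):
        """Write only if the key still has the version that was read earlier."""
        if self._versions.get(key, 0) != seen_version:
            return False  # someone else wrote in between: the caller should retry
        self.set(key, value)
        return True


store = VersionedStore()
store.set("stock", 10)

value, version = store.get("stock")
store.set("stock", 9)                                            # a concurrent writer gets in first
first_try = store.set_if_unchanged("stock", value - 1, version)  # False, the value changed

value, version = store.get("stock")                              # re-read and retry
second_try = store.set_if_unchanged("stock", value - 1, version) # True, stock is now 8
```

Nothing is locked while the client thinks; conflicts are detected at write time, which works well when contention is rare.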
Publish Subscribe
Provides 1-to-N messaging
Subscribe to multiple channels, decoupled from the key space
Publish on a channel
Subscribers get notified
• Message Passing
  ◦ Distributed message-oriented systems, Event-Driven Architecture, Service Bus.
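The publish/subscribe flow described above can be modelled with a tiny in-process broker: subscribers register a callback on a channel, and a publish delivers the message to whoever is subscribed at that moment. This is a plain-Python sketch of the semantics of SUBSCRIBE and PUBLISH, not a Redis client; the `Broker` class and the channel names are hypothetical.

```python
from collections import defaultdict


class Broker:
    """In-process model of 1-to-N messaging over named channels."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to current subscribers and return how many received the message."""
        receivers = self._subscribers.get(channel, [])
        for callback in receivers:
            callback(channel, message)
        return len(receivers)  # messages are not stored: with no subscribers they are lost


broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe("events", lambda channel, message: inbox_a.append(message))
broker.subscribe("events", lambda channel, message: inbox_b.append(message))

lost = broker.publish("news", "nobody is listening")  # 0 receivers
delivered = broker.publish("events", "Joe logged")    # 2 receivers
```

Channels live outside the key space: publishing creates no key, and a subscriber that connects later never sees earlier messages.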
Code
Replication
One master replicates to multiple slaves
A slave sends the SYNC command and the master transfers the database file to the slave
Slaves can perform only read operations
• Scalability
  ◦ Multiple slaves for read-only queries.
• Redundancy
  ◦ Data replication.
• Slave of Slave
  ◦ Graph-like structure for more scalability and redundancy.
Performance
Screamingly fast performance
• ~50K read/write operations per second.
• ~100K read/write operations per second on a regular EC2 instance.
redis-benchmark tool on an Ubuntu virtual machine: ~36K requests per second
Application Architecture
[Diagram: Application Server, SQL, Redis Server]
Finally
«I see Redis definitely more as a flexible tool than as a solution specialized to solve a specific problem: his mixed soul of cache, store, and messaging server shows this very well»
Salvatore Sanfilippo

• http://redis.io/
• http://github.com/antirez/redis
• http://groups.google.com/group/redis-db
Key-value databases in practice Redis @ DotNetToscana
