YouTube (this covers a very basic implementation; check Ketama for a real implementation).
Assume we have 3 memcached servers and want to add a fourth.
The continuum approach will invalidate roughly 1/4 (25%) of your keys.
The modulo approach will invalidate roughly 3/4 (75%) of your keys.
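The 75% figure is easy to check empirically. A quick sketch of the modulo approach (the key names and counts here are made up for illustration):

```python
import hashlib

def h32(s: str) -> int:
    # Stable 32-bit hash (truncated md5) so the experiment is reproducible.
    return int(hashlib.md5(s.encode()).hexdigest()[:8], 16)

keys = [f"key-{i}" for i in range(100_000)]

# Modulo approach: server index = hash(key) % number_of_servers.
before = [h32(k) % 3 for k in keys]
after = [h32(k) % 4 for k in keys]
moved = sum(b != a for b, a in zip(before, after))
print(f"modulo, 3 -> 4 servers: {moved / len(keys):.0%} of keys moved")  # ~75%
```

A key keeps its server only when hash % 3 == hash % 4, which for a uniform hash happens about a quarter of the time. The ~25% continuum figure can be checked with the ring sketch further down.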
Article 1 (Consistent Hashing in memcache-client) – gives a clean and concise explanation of why to use consistent hashing instead of the naive modulo approach.
The goal is to rebalance the minimum number of keys when a node is added to or removed from the cluster. We use consistent hashing to minimize the re-hashing/invalidation when a new server/machine is added or a machine goes down.
Read this article (DZone) for more clarity.
A real-world library that implements consistent hashing is Ketama.
Read about the Flickr architecture, which uses memcached with Ketama and consistent hashing.

This is the basic principle of Ketama and how consistent hashing is applied in the real world.
Ketama solves this problem in the following way (a minimal code sketch follows this list):
- Take your list of servers (e.g. 1.2.3.4:11211, 5.6.7.8:11211, 9.8.7.6:11211)
- Hash each server string to several (100-200) unsigned ints
- Conceptually, these numbers are placed on a circle called the continuum. (imagine a clock face that goes from 0 to 2^32)
- Each number links to the server it was hashed from, so servers appear at several points on the continuum, by each of the numbers they hashed to.
- To map a key->server, hash your key to a single unsigned int, and find the next biggest number on the continuum. The server linked to that number is the correct server for that key.
- If you hash your key to a value near 2^32 and there are no points on the continuum greater than your hash, return the first server in the continuum.
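Here is a minimal sketch of that continuum in Python, using the made-up server strings from the list above. This is only the idea, not the real library: libketama differs in details (it weights servers and derives four points from each md5 digest).

```python
import bisect
import hashlib

class Ring:
    """Toy Ketama-style continuum (a sketch, not the real libketama)."""

    def __init__(self, servers, points_per_server=150):
        self._points = []  # sorted (point, server) pairs on the circle
        for server in servers:
            for i in range(points_per_server):
                # Hash "server-i" to an unsigned 32-bit point on the continuum.
                digest = hashlib.md5(f"{server}-{i}".encode()).hexdigest()
                self._points.append((int(digest[:8], 16), server))
        self._points.sort()
        self._hashes = [p for p, _ in self._points]

    def server_for(self, key: str) -> str:
        h = int(hashlib.md5(key.encode()).hexdigest()[:8], 16)
        # Next biggest point on the continuum; wrap past 2^32 to the first.
        i = bisect.bisect(self._hashes, h) % len(self._points)
        return self._points[i][1]

ring = Ring(["1.2.3.4:11211", "5.6.7.8:11211", "9.8.7.6:11211"])
print(ring.server_for("some-key"))

# Adding a fourth server remaps only about a quarter of the keys:
keys = [f"key-{i}" for i in range(100_000)]
bigger = Ring(["1.2.3.4:11211", "5.6.7.8:11211",
               "9.8.7.6:11211", "9.9.9.9:11211"])
moved = sum(ring.server_for(k) != bigger.server_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 25%
```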
Random Points –
- How to avoid hotspots in consistent hashing – When the location in the cluster is determined solely by the hash of the key, two keys that are lexicographically close to each other are very likely to end up on different nodes. Thus, the load is shared more evenly (see the first sketch after this list).
- How to sync up data across multiple nodes – As in the Flickr article above, they use a Redis queue. Whenever a key is updated we need to invalidate its copies on the other memcached servers, so on every update an event is pushed to the Redis queue; the queue is processed and the key's copies on the other servers are invalidated.
- Consistent hashing makes it simple to store redundant data – we can easily find the node that stores the key, then use the next one or two machines on the continuum to store the same key (as backup). Since the machines are arranged in a virtual circle, this works (the second sketch after this list covers both the queue and the backup placement).
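On the hotspot point: reusing the Ring sketch from above, lexicographically adjacent keys scatter across servers (the key names are made up):

```python
ring = Ring(["1.2.3.4:11211", "5.6.7.8:11211", "9.8.7.6:11211"])
for key in ["user:1000", "user:1001", "user:1002", "user:1003"]:
    print(key, "->", ring.server_for(key))
# Neighbouring key names typically land on different servers, so a
# burst of traffic over a contiguous key range is spread across nodes.
```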
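And a sketch of the last two points together, assuming the Ring class from above and the redis-py client. The queue name, the helper names, and the invalidation scheme are assumptions for illustration; the Flickr article does not publish its exact setup.

```python
import bisect
import hashlib
import json

import redis  # assumption: redis-py is installed

r = redis.Redis()

def replicas_for(ring, key, n=2):
    """Primary server plus the next n-1 distinct servers on the continuum.

    Assumes n <= number of distinct servers; peeks at the Ring internals.
    """
    h = int(hashlib.md5(key.encode()).hexdigest()[:8], 16)
    i = bisect.bisect(ring._hashes, h)
    servers = []
    while len(servers) < n:
        server = ring._points[i % len(ring._points)][1]
        if server not in servers:
            servers.append(server)
        i += 1
    return servers

def on_key_updated(key):
    # Producer: push an invalidation event whenever a key is updated.
    r.rpush("invalidations", json.dumps({"key": key}))

def invalidation_worker(ring):
    # Consumer: pop events and invalidate the key's copies elsewhere.
    while True:
        _, raw = r.blpop("invalidations")
        key = json.loads(raw)["key"]
        for server in replicas_for(ring, key, n=3)[1:]:
            print(f"would delete {key!r} on {server}")  # e.g. via a memcached client
```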