Hashing Algorithms and its applications Introduction to data sorting and Key searching
What is Hash table? • A faster way to access data keeping table • Also known as bucket • It can be an array of objects (that has all the data) • Requires a function to convert the real data into placing position deciding data • Can use part of the data or time/space stamp to decide the position • The position is termed as HASH
Randomness of Data • Data can be of defined pattern and length for which we can have static length of hash table entries • Each hash table entry place is a position that is access by the key • The key is calculated by hash function • If we know the data pattern, we can easily solve all the key calculations and also know the number of rows in the has table • If the data is random, we don’t know the number of entries in the table
Hashing Function • Calculates the key for address assignment • Key = position Hasher (Type entry) • position StoringPosition = HashTable[Key] • Here the Hasher is “to add the ASCII values of Each letter and then take modulo 11” • Folding: • A method that divides data into chunks for processing in a Hasher
Normal allocation • The data is processed using has function • The hash that is created by the function is searched in the table • The data is placed there
Collision • The Hasher gives a key that is already occupied • Normally, the old value is discarded and the new incoming data is stored • It is probable when the table space is lesser than the incoming data • Even if table data space is greater than the incoming data, this problem is probable to occur
Solution to Collision • Open Addressing • Same level address is used but on a different location which is calculated by the user defined rule • Linear Probing (the entry is saved in the next free space) - It can create clustering • Plus 3 rehash (the key is are-calculated by adding 3 to the previous entry value) • Quadratic Probing (the entry is saved by leaving 1, 2, 4, 16 spaces on the HT) • Double Hashing (Finding the hash of the key obtained) • PreviousHashKey = Hasher(entry) • NewHash = Hasher(PreviousHashKey) • Closed Addressing • Creating chains
Data Searching and Retrieval • The data that we want to search uses the same Hash Function • After the Hasher calculates the Key, it is given directly to the table Variable and that position has the data • Search is usually O(n) for linear search and O(log(n)) for binary search but this method makes the search operation reduced to O(1) • O(1) means that we calculated the key in a fixed time process and the we go to the Array[key] to retrieve data using fixed time again • In case of chaining, we traverse on the node to reach our data so that makes it an O(n), still faster than complete search strategy.
Linear Probing
Chaining using Linked List
A good Hash Function! • Minimum Collisions • Uniform distribution of hash values/keys (avoid clustering) • Easy to calculate Hash • Resolve the collision swiftly
Summary • Used to index large data sets • Address of each key is calculated by the data itself • Collision resolved with open or closed addressing • Hashing is widely used in database indexing, compilers, caching, password authentication, Blockchain, etc. • Insert, Delete and retrieve occurs on a fixed time (mostly)

Hashing algorithms and its uses

  • 1.
    Hashing Algorithms andits applications Introduction to data sorting and Key searching
  • 2.
    What is Hashtable? • A faster way to access data keeping table • Also known as bucket • It can be an array of objects (that has all the data) • Requires a function to convert the real data into placing position deciding data • Can use part of the data or time/space stamp to decide the position • The position is termed as HASH
  • 3.
    Randomness of Data •Data can be of defined pattern and length for which we can have static length of hash table entries • Each hash table entry place is a position that is access by the key • The key is calculated by hash function • If we know the data pattern, we can easily solve all the key calculations and also know the number of rows in the has table • If the data is random, we don’t know the number of entries in the table
  • 4.
    Hashing Function • Calculatesthe key for address assignment • Key = position Hasher (Type entry) • position StoringPosition = HashTable[Key] • Here the Hasher is “to add the ASCII values of Each letter and then take modulo 11” • Folding: • A method that divides data into chunks for processing in a Hasher
  • 5.
    Normal allocation • Thedata is processed using has function • The hash that is created by the function is searched in the table • The data is placed there
  • 7.
    Collision • The Hashergives a key that is already occupied • Normally, the old value is discarded and the new incoming data is stored • It is probable when the table space is lesser than the incoming data • Even if table data space is greater than the incoming data, this problem is probable to occur
  • 8.
    Solution to Collision •Open Addressing • Same level address is used but on a different location which is calculated by the user defined rule • Linear Probing (the entry is saved in the next free space) - It can create clustering • Plus 3 rehash (the key is are-calculated by adding 3 to the previous entry value) • Quadratic Probing (the entry is saved by leaving 1, 2, 4, 16 spaces on the HT) • Double Hashing (Finding the hash of the key obtained) • PreviousHashKey = Hasher(entry) • NewHash = Hasher(PreviousHashKey) • Closed Addressing • Creating chains
  • 9.
    Data Searching andRetrieval • The data that we want to search uses the same Hash Function • After the Hasher calculates the Key, it is given directly to the table Variable and that position has the data • Search is usually O(n) for linear search and O(log(n)) for binary search but this method makes the search operation reduced to O(1) • O(1) means that we calculated the key in a fixed time process and the we go to the Array[key] to retrieve data using fixed time again • In case of chaining, we traverse on the node to reach our data so that makes it an O(n), still faster than complete search strategy.
  • 10.
  • 11.
  • 12.
    A good HashFunction! • Minimum Collisions • Uniform distribution of hash values/keys (avoid clustering) • Easy to calculate Hash • Resolve the collision swiftly
  • 13.
    Summary • Used toindex large data sets • Address of each key is calculated by the data itself • Collision resolved with open or closed addressing • Hashing is widely used in database indexing, compilers, caching, password authentication, Blockchain, etc. • Insert, Delete and retrieve occurs on a fixed time (mostly)