DATA STRUCTURES AND ALGORITHMS LAB 11 Bianca Tesila FILS, May 2014
OBJECTIVES  Dictionaries  Hash tables
DICTIONARIES: WHAT ARE THEY?  An ADT made of a collection of keys and a collection of values, in which each key has a value associated to it  A dictionary is also called associative array  Useful for searching
DICTIONARIES: OPTIMAL SEARCH  The keys must be unique  The range of the key must be severly bounded Otherwise… if the keys are not unique:  construct a set of m(keys count) lists and store the heads of these lists in the associative array(the keys)
DICTIONARIES: DUPLICATE KEYS  If we have a high number of duplicates (a lot of elements with the same key), the search time will severely increase  Solution: make a function to optimize the search criterion, h => solve collisions of keys  We will search for T[h(k)] rather than T[k] , where: T is our associative array, k is an index and h(k) is a mapping function
DICTIONARIES: IMPLEMENTATION  Hash-tables  Self-balancing binary search trees  Radix- tree  Prefix-tree  Judy arrays
DICTIONARIES: BASIC OPERATIONS  put(key, value)  Inserts the pair (key, value) in the hash table  If a pair (key, value’) (with the same key) already exists, then value’ is replaced by value  We say that the value value is associated to the key key  get(key)  Returns the value associated to the key key  If no value is associated to key, then an error occurs  hasKey(key)  Returns 1 if the key key exists in the hash table, and 0 otherwise
HASH-TABLES: INTRODUCTION  Data structure with an optimized lookup function (average search time is constant, O (1)).  How? By turning the key in a hash (code), using a hash function  The hash function must be wisely chosen in order to minimize the number of collisions (Risk: different values ​​produce the same hashes).  We cannot avoid all the collisions - they occur inherently as hash length is fixed, and storage objects can have arbitrary length and content.  In the event of a collision, the values ​​stored in the same position (the same bucket). In this case, the search is reduced to comparing the actual values ​​in the bucket.
HASH-TABLES: EXAMPLE
HASH TABLE: HASH FUNCTIONS  Deterministic: if called twice, they should return the same value  Low collision rate: buckets with small dimensions  Good dispersion between “buckets”
HASH TABLE: IMPLEMENTATION WITH LINKED LISTS  A hash implementation which solves the collisions is called direct chaining  For each bucket, we use a linked list: every list is associated to a key(hash-coded)  Inserting in hash table means finding the correct index(key) and adding the element to the list that corresponds to the found key  Deleting means searching and removing of that element from the list
HASH TABLE: ADVANTAGES AND DISADVANTAGES  Advantage: the delete operation is simple and the table resizing can be postponed a lot because (even when all positions of hash are used), performance is still good.  Disadvantage: for small amount of data, the overhead is quite large and “browsing” the data can be time consuming (the same disadvantage as in linked lists)
HASH TABLE: EXAMPLE • hmax is the maximum number of linked lists in our hash-table • the function hash will be passed as an argument (actually, a pointer to the function will be passed) • the key is not mandatory to be a number (think of a real dictionary!!!): that is why we use templates
HASH TABLE: ASSIGNMENT !!Exercise: Using the previous header, implement the hash tables data structure and test it, for a custom hash-function
HASH TABLE: ASSIGNMENT Hint:  Maintain an array H[HMAX] of linked lists  The info field of each element of a list consists of a struct containing a key and a value  Each key is mapped to a value hkey=hash(key), such that 0≤hkey≤HMAX-1  hash(key) is called the hash function and hkey is the index in a linked list  put(k, v)  Searches for the key k in the list H[hkey=hash(k)]  If the key is found, then we replace the value by v  If the key is not found, then we insert the pair (k,v) in H[hkey]  get(k)  Search for the key k in H[hkey=hash(k)]  If it finds the key, then it returns its associated value; otherwise, an error occurs  hasKey(k)  Search for the key k in H[hkey=hash(k)]  If it finds the key, then it returns 1; otherwise, it returns 0

Data structures and algorithms lab11

  • 1.
    DATA STRUCTURES ANDALGORITHMS LAB 11 Bianca Tesila FILS, May 2014
  • 2.
  • 3.
    DICTIONARIES: WHAT ARETHEY?  An ADT made of a collection of keys and a collection of values, in which each key has a value associated to it  A dictionary is also called associative array  Useful for searching
  • 4.
    DICTIONARIES: OPTIMAL SEARCH The keys must be unique  The range of the key must be severly bounded Otherwise… if the keys are not unique:  construct a set of m(keys count) lists and store the heads of these lists in the associative array(the keys)
  • 5.
    DICTIONARIES: DUPLICATE KEYS If we have a high number of duplicates (a lot of elements with the same key), the search time will severely increase  Solution: make a function to optimize the search criterion, h => solve collisions of keys  We will search for T[h(k)] rather than T[k] , where: T is our associative array, k is an index and h(k) is a mapping function
  • 6.
    DICTIONARIES: IMPLEMENTATION  Hash-tables Self-balancing binary search trees  Radix- tree  Prefix-tree  Judy arrays
  • 7.
    DICTIONARIES: BASIC OPERATIONS put(key, value)  Inserts the pair (key, value) in the hash table  If a pair (key, value’) (with the same key) already exists, then value’ is replaced by value  We say that the value value is associated to the key key  get(key)  Returns the value associated to the key key  If no value is associated to key, then an error occurs  hasKey(key)  Returns 1 if the key key exists in the hash table, and 0 otherwise
  • 8.
    HASH-TABLES: INTRODUCTION  Datastructure with an optimized lookup function (average search time is constant, O (1)).  How? By turning the key in a hash (code), using a hash function  The hash function must be wisely chosen in order to minimize the number of collisions (Risk: different values ​​produce the same hashes).  We cannot avoid all the collisions - they occur inherently as hash length is fixed, and storage objects can have arbitrary length and content.  In the event of a collision, the values ​​stored in the same position (the same bucket). In this case, the search is reduced to comparing the actual values ​​in the bucket.
  • 9.
  • 10.
    HASH TABLE: HASHFUNCTIONS  Deterministic: if called twice, they should return the same value  Low collision rate: buckets with small dimensions  Good dispersion between “buckets”
  • 11.
    HASH TABLE: IMPLEMENTATIONWITH LINKED LISTS  A hash implementation which solves the collisions is called direct chaining  For each bucket, we use a linked list: every list is associated to a key(hash-coded)  Inserting in hash table means finding the correct index(key) and adding the element to the list that corresponds to the found key  Deleting means searching and removing of that element from the list
  • 12.
    HASH TABLE: ADVANTAGESAND DISADVANTAGES  Advantage: the delete operation is simple and the table resizing can be postponed a lot because (even when all positions of hash are used), performance is still good.  Disadvantage: for small amount of data, the overhead is quite large and “browsing” the data can be time consuming (the same disadvantage as in linked lists)
  • 13.
    HASH TABLE: EXAMPLE •hmax is the maximum number of linked lists in our hash-table • the function hash will be passed as an argument (actually, a pointer to the function will be passed) • the key is not mandatory to be a number (think of a real dictionary!!!): that is why we use templates
  • 14.
    HASH TABLE: ASSIGNMENT !!Exercise:Using the previous header, implement the hash tables data structure and test it, for a custom hash-function
  • 15.
    HASH TABLE: ASSIGNMENT Hint: Maintain an array H[HMAX] of linked lists  The info field of each element of a list consists of a struct containing a key and a value  Each key is mapped to a value hkey=hash(key), such that 0≤hkey≤HMAX-1  hash(key) is called the hash function and hkey is the index in a linked list  put(k, v)  Searches for the key k in the list H[hkey=hash(k)]  If the key is found, then we replace the value by v  If the key is not found, then we insert the pair (k,v) in H[hkey]  get(k)  Search for the key k in H[hkey=hash(k)]  If it finds the key, then it returns its associated value; otherwise, an error occurs  hasKey(k)  Search for the key k in H[hkey=hash(k)]  If it finds the key, then it returns 1; otherwise, it returns 0