0% found this document useful (0 votes)
147 views4 pages

CS20A - Assignment 5

This document provides instructions for an assignment to compare the performance of different hash functions. Students are asked to: 1. Calculate the number of collisions for three given hash functions (djb2, sdbm, loselose) when hashing over 350,000 words from a dictionary file. 2. Create their own original hash function that takes an unsigned character string as input and returns an unsigned long. Test it for collisions on the dictionary to evaluate its performance.

Uploaded by

Aditi Datta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
147 views4 pages

CS20A - Assignment 5

This document provides instructions for an assignment to compare the performance of different hash functions. Students are asked to: 1. Calculate the number of collisions for three given hash functions (djb2, sdbm, loselose) when hashing over 350,000 words from a dictionary file. 2. Create their own original hash function that takes an unsigned character string as input and returns an unsigned long. Test it for collisions on the dictionary to evaluate its performance.

Uploaded by

Aditi Datta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

Santa Monica College

CS 20A Summer 2015


Assignment #5
1. Comparing Hash Functions
A good hash function distributes keys uniformly across its output range. The simplest way to compare
hashing algorithms is to compare the number of collisions resulting when hashing a set of keys.
In part 1 of this assignment, you are given three well-known hash functions. You are also given a
dictionary with over 350,000 words.
For each hash function, compute the number of collisions occurring when calculating hash codes for
each word in the dictionary. Print out a list showing each colliding hash-key and the words that collided.
// http://www.cse.yorku.ca/~oz/hash.html
// First reported by Dan Berstein in the news group comp.lang.c.
unsigned long djb2(unsigned char *str)
{
unsigned long hash = 5381;
int c;
while (c = *str++)
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash;
}
// Created for sdbm (a public-domain reimplementation of ndbm) database library.
// It was found to do well in scrambling bits causing better distribution of
// the keys and fewer splits.
unsigned long sdbm(unsigned char *str)
{
unsigned long hash = 0;
int c;
while (c = *str++)
hash = c + (hash << 6) + (hash << 16) - hash;
return hash;
}
// First appeared in K&R (1st edition). Terrible hashing algorithm because it
// produces many collisions.
unsigned long loselose(unsigned char *str)
{
unsigned long hash = 0;
int c;
while (c = *str++)
hash += c;
return hash;
}

Santa Monica College


CS 20A Summer 2015
Assignment #5
Here is some code that opens the dictionary, reads each word, calculates a hash key, and then closes a
dictionary. To calculate the total number of collisions, you must keep a running count of the number of
collisions for each hash key. After reading the entire dictionary, display the total collision count; then
display one row for each colliding hash key and the words that collided.
#include
#include
#include
#include
#include
#include

<iostream>
<stdlib.h>
<string>
<cstring>
<sstream>
<fstream>

// http://www.cse.yorku.ca/~oz/hash.html
using namespace std;
unsigned long djb2(unsigned char *str);
unsigned long sdbm(unsigned char *str);
unsigned long loselose(unsigned char *str);
int main()
{
const int MAX_CHARS_PER_LINE = 512;
ifstream fin;
char buf[MAX_CHARS_PER_LINE];
fin.open("dictionary.txt");
if (!fin.good())
{
cerr << "Error Opening File" << endl;
return 1;
}
while (!fin.eof())
{
fin.getline(buf, MAX_CHARS_PER_LINE);
unsigned long hkey = djb2((unsigned char *)buf);
string word = (char *)buf;
cout << word << " " << hkey << endl;
}
fin.close();
return 0;
}

Santa Monica College


CS 20A Summer 2015
Assignment #5
Sample output:
djb2 - Collisions = 59
5863248: bi d'
5863413: gi i'
5863611: mi o'
121541278: joyful synaphea
153103337: broadened kilohm
193486210: ais c's
193487299: bis d's
193488388: cis e's
193489477: dis f's
193492729: gid i'd
193492738: gim i'm
193492744: gis i's
193493833: his j's
193498189: lis n's
193499278: mis o's
193500367: nis p's
193502545: pis r's
193505812: sis u's
193506901: tis v's
193509079: vis x's
193510168: wis y's
193511257: xis z's
226706708: mensis menu's
253411110: arris art's
253986102: baris bat's
256495158: delis den's
256861062: doris dot's
266348430: loris lot's
267026877: manis map's
267031233: maris mat's
267168447: melis men's
268702848: nobis nod's
270582462: palis pan's
270869958: pilis pin's
271301202: pulis pun's
272953215: rakis ram's
272956482: ranis rap's
874715923: heliotropes neurospora
919705060: animate animate
1006249872: appling bedaggle

Santa Monica College


CS 20A Summer 2015
Assignment #5
2. Create and test your own hash function. Please be original. (5 points; prizes for the top three hash
functions.)
Your hash function must have the following signature:
unsigned long hf(unsigned char *str);

Test your hash function for collisions using the dictionary provided in part 1 of this assignment.

You might also like