In this post, I break down how to implement a memory-efficient HashSet from scratch in Java using separate chaining, dynamic resizing, and amortized analysis, without relying on built-in data structures.
Problem Breakdown
In this LeetCode problem, we are asked to implement a class that simulates the functionality of a HashSet without using any built-in HashSet data structure.

We are required to implement the following methods:

- `void add(key)` - Inserts the value `key` into the HashSet.
- `bool contains(key)` - Returns whether the value `key` exists in the HashSet or not.
- `void remove(key)` - Removes the value `key` from the HashSet. If `key` does not exist, do nothing.
At first glance, this may seem straightforward, especially if you have used a `HashSet` in any language before. But the real challenge lies in achieving:

- O(1) average time complexity for all operations (add, contains, remove).
- Efficient memory usage, especially when we only store a small subset of values within a large keyspace that can go up to 1,000,000.

I saw many solutions that essentially just initialise a boolean array of size 1,000,001 and toggle its entries based on the key. This approach works, but it is extremely wasteful in terms of space, particularly when only a few keys are used.
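For comparison, here is roughly what that boolean-array approach looks like (a minimal sketch, not the design we will build):

```java
class NaiveHashSet {
    // one boolean flag per possible key: ~1,000,001 entries allocated up front,
    // no matter how few keys we actually store
    private final boolean[] present = new boolean[1_000_001];

    public void add(int key) { present[key] = true; }
    public void remove(int key) { present[key] = false; }
    public boolean contains(int key) { return present[key]; }
}
```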
Hence we get to the crux of our problem: how can we minimize memory usage while still maintaining fast performance?

Let's dive into a proper design that balances time efficiency and space optimization using a real-world-inspired data structure: a dynamic array of linked lists (i.e., buckets).
Solution Breakdown
Overview
- I will try to explain the solution step by step with small code snippets along the way. The full code can be found at the bottom of the post.
First, to design an efficient `HashSet` from scratch, we draw inspiration from how real-world hash tables work under the hood. The core idea is to:

- Use an array of buckets, where each bucket is implemented as a singly linked list.
- Store keys as `Node` instances within these buckets.
- Dynamically resize the hash table as more keys are added, based on a load factor threshold.
- Maintain average O(1) time complexity for `add`, `remove`, and `contains`, while using memory more efficiently than a flat 1-million-element array.

But why use this more complicated approach? Simpler solutions do exist (e.g., initializing a boolean array of size 1,000,001); however, they are extremely memory-inefficient in scenarios where only a few keys are inserted.

Our approach aims to offer a balance of time efficiency and space optimization, using a hash-based strategy with separate chaining.
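Conceptually, the structure we are about to build looks something like this (an illustrative sketch with made-up keys, assuming the initial capacity of 16):

```text
hashTable (16 buckets)
[ 0] -> 16 -> 32     // 16 % 16 == 0 and 32 % 16 == 0, so they share bucket 0
[ 1] -> 17 -> 1      // 17 and 1 collide in bucket 1
[ 2] -> null
 ...
[15] -> 31           // 31 % 16 == 15
```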
Data Structure Design
This is the given class we must implement:

```java
class MyHashSet {

    public MyHashSet() {
    }

    public void add(int key) {
    }

    public void remove(int key) {
    }

    public boolean contains(int key) {
    }
}
```

We begin by defining a static `Node` class to represent the elements stored in each bucket:

```java
static class Node {
    int val;
    Node next;

    Node(int val) {
        this.val = val;
    }
}
```

- Each `Node` stores a key (`val`) and a reference to the next node in the linked list (`next`).
- Buckets are simply linked lists of these nodes.

The `MyHashSet` class maintains:

```java
private Node[] hashTable;
private int size;

public MyHashSet() {
    this.hashTable = new Node[16];
}
```

- `hashTable` is the array of buckets (initially of size 16).
- `size` tracks the total number of keys currently inserted.
Hashing and Indexing Logic

We define a helper method to compute a stable hash value and another one to derive the target index within the hash table:

```java
private int hashKey(int key) {
    return Math.abs(Integer.hashCode(key));
}

private int getIndex(int key) {
    return hashKey(key) % hashTable.length;
}
```

- `Integer.hashCode(key)` returns the hash of the integer key (in this case, the key itself).
- Taking the modulo of the hash with the array length ensures the key maps to a valid index.
- We use `Math.abs` to avoid negative indices.
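As a quick illustration of the mapping (a hypothetical standalone snippet, not part of the class):

```java
public class IndexDemo {
    public static void main(String[] args) {
        int capacity = 16; // initial table size used above
        // Integer.hashCode(key) is the key itself, so the index is simply key % capacity
        for (int key : new int[]{1, 17, 42, 1_000_000}) {
            int index = Math.abs(Integer.hashCode(key)) % capacity;
            System.out.println(key + " -> bucket " + index);
        }
        // prints: 1 -> bucket 1, 17 -> bucket 1, 42 -> bucket 10, 1000000 -> bucket 0
        // note that 1 and 17 collide, which is exactly why each bucket is a linked list
    }
}
```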
Load Factor and Resizing

To avoid performance degradation due to collisions, we use the load factor to determine when to resize the hash table:

```java
private double getLoadFactor() {
    return (double) size / hashTable.length; // resize occurs at >= 0.75
}
```

- When the load factor reaches 0.75, we double the capacity of the hash table.
- This reduces the likelihood of collisions and keeps average operation time close to O(1).
- For example, if our hash table has 16 buckets and currently stores 12 keys, the load factor is exactly 12 / 16 = 0.75.
- At this point we would call the `resize()` method, which is discussed next.
The `resize()` method handles this reallocation:

```java
private void resize() {
    int newSize = hashTable.length * 2;
    Node[] newHashTable = new Node[newSize]; // new table will be twice the size

    for (Node head : hashTable) {
        while (head != null) {
            Node next = head.next;
            int newIndex = hashKey(head.val) % newSize;
            head.next = newHashTable[newIndex]; // in case of a collision, we insert the node at the head
            newHashTable[newIndex] = head;
            head = next;
        }
    }

    hashTable = newHashTable; // re-assign the old hash table reference to the new one
}
```

- This method creates a new hash table in memory that is twice the size of the old one.
- It then loops through each bucket (each head `Node`) in the old hash table.
- For each bucket, we traverse its linked list using a `while` loop.
- The first thing we do inside the `while` loop is store a reference to the next node so that we don't lose it.
- We then compute the index of the current node in the new hash table and store this value in `newIndex`.
- We then insert the current node at the head of the linked list found at `newHashTable[newIndex]`.
  - I was initially confused by this part, but we do it because collisions can also occur during the copying phase.
  - That is, `newHashTable[newIndex]` could already hold one or more nodes, so if we simply did `newHashTable[newIndex] = head`, we would overwrite the entire linked list already stored at `newHashTable[newIndex]`.
- For this reason, we first do `head.next = newHashTable[newIndex]` so that the current node `head` points to the beginning of that list.
- After that, we can safely do `newHashTable[newIndex] = head` so that a reference to `head` is stored at `newHashTable[newIndex]`.
- We then move to the next element of the old linked list with `head = next`. See why we stored a reference to the next node earlier?
- Once the entire old hash table has been copied over, we make `hashTable` point to `newHashTable`.
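To make the rehashing concrete: with the old capacity of 16, keys 1 and 17 both live in bucket 1 (1 % 16 == 17 % 16 == 1). After doubling to 32, key 1 stays in bucket 1 while key 17 moves to bucket 17 (17 % 32 == 17), so the old chain gets spread out across the larger table.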
Method Implementation
Now that we understand the helper methods, let's start with the `add()` method:

```java
public void add(int key) {
    if (getLoadFactor() >= 0.75) {
        resize(); // copy the table over to a new, larger one
    }

    // check if the key already exists in the table
    if (contains(key)) return;

    // get the key's index
    int index = getIndex(key);

    // insert the new node at the beginning of the bucket at index
    Node newNode = new Node(key);
    newNode.next = hashTable[index];
    hashTable[index] = newNode;
    size++;
}
```

- Before adding the `key`, we first check whether we have reached the load factor threshold. If we have, we call `resize()` to create a new hash table that is double the current size and copy all the elements over (as we saw above).
- This step is important because if we keep adding elements while the table stays fixed in size, the number of collisions keeps growing until it is virtually guaranteed that any new element maps to a bucket that is already occupied (i.e., a collision).
- Not resizing would cause the other two methods, `remove()` and `contains()`, to degrade to a time complexity of O(k) instead of O(1), where k is the number of elements in a given bucket.
- The `add()` method itself would always run in O(1) time even if we never resized the hash table. Adding the call to `resize()` makes the method occasionally run in O(n) time, but those calls become exponentially less frequent as the hash table grows, so `add()` runs in amortized O(1) time (see the worked numbers below).
  - Amortization is the term we use whenever there is a fixed cost within our system and we reduce the frequency at which this cost is incurred.
  - In our case, the hash table is resized less and less often over time, which amortizes the cost of `add()` down to O(1) per call.
- After checking the load factor, we check whether the `key` already exists in the hash table by calling the `contains()` method (discussed below). For now, just know that this method runs in average O(1) time, but can take O(k) time when there are collisions.
- If the key does not exist in the hash table, we get the index of the bucket it will be inserted into.
- We then create a new `Node` storing the `key` and insert it at the head of the linked list found at `hashTable[index]`.
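To put rough numbers on the amortized claim: starting from a capacity of 16 with a 0.75 threshold, inserting 1,000 distinct keys triggers a resize when the size reaches 12, 24, 48, 96, 192, 384, and 768. That is only 7 resizes, rehashing 12 + 24 + 48 + 96 + 192 + 384 + 768 = 1,524 nodes in total, or roughly 1.5 extra node moves per insertion on average.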
Now let's move on to the `remove()` method:

```java
public void remove(int key) {
    int index = getIndex(key);
    if (hashTable[index] == null) return; // key does not exist

    Node cur = hashTable[index];

    // key is the first node in the bucket
    if (cur.val == key) {
        hashTable[index] = cur.next;
        size--;
        return;
    }

    // find the node previous to the one that needs to be removed
    while (cur != null && cur.next != null) {
        if (cur.next.val == key) {
            cur.next = cur.next.next;
            size--;
            return;
        }
        cur = cur.next;
    }
    // key does not exist
}
```

- The first thing we do is get the index of `key` and check whether its bucket is empty. If it is, `key` does not exist in our hash table.
- However, if there is a `Node` at `hashTable[index]`, there is a possibility that `key` exists. We still cannot be 100% sure because of collisions.
- Before traversing the list, we check whether `key` is stored at the head. If it is, we remove it by overwriting `hashTable[index]` with `cur.next`, the second element in the list. We also decrement the size of our hash table and exit the method.
- If `key` is not stored at the head of the list, we must traverse the list to find the node previous to the one storing `key`.
- We look for the previous node so that we can remove any reference to the `key` node by pointing the previous node's `next` pointer at the node after it.
  - This effectively unlinks the `key` node from the list.
- The garbage collector will then automatically reclaim that node, since it is no longer referenced anywhere in our program.
- If we traverse the entire list without finding the node storing `key`, then it does not exist.
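As a quick trace: suppose the capacity is 16 and we added 1 and then 17, which both hash to bucket 1. Because new nodes go to the head, the chain is 17 -> 1. Calling `remove(1)` sees that the head holds 17, not 1, so it walks the chain; at the node holding 17, `cur.next.val == 1`, so `cur.next` is set to `cur.next.next` (null here), leaving the bucket with just 17.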
Finally, we can look at `contains()`, our last method:

```java
public boolean contains(int key) {
    int index = getIndex(key);
    Node cur = hashTable[index];
    while (cur != null) {
        if (cur.val == key) return true;
        cur = cur.next;
    }
    return false;
}
```

- This method simply checks whether a `key` exists in our hash table.
- It does so by finding the index the `key` would map to.
- It then traverses the list found at `hashTable[index]`, comparing the `val` of each node with `key`.
- We return `true` if we find a node whose value equals `key`; otherwise, we return `false` after exiting the loop.
Full Code
```java
class MyHashSet {

    static class Node {
        int val;
        Node next;

        Node(int val) {
            this.val = val;
        }
    }

    private Node[] hashTable;
    private int size;

    public MyHashSet() {
        this.hashTable = new Node[16];
    }

    /**
     * Worst-case time complexity: O(k), where k is the number of nodes in a bucket (due to collisions).
     * Amortized time complexity: O(1) for add/remove/contains when keys are well-distributed.
     */
    public void add(int key) {
        if (getLoadFactor() >= 0.75) {
            resize(); // copy the table over to a new, larger one
        }

        // check if the key already exists in the table
        if (contains(key)) return;

        // get the key's index
        int index = getIndex(key);

        // insert the new node at the beginning of the bucket at index
        Node newNode = new Node(key);
        newNode.next = hashTable[index];
        hashTable[index] = newNode;
        size++;
    }

    public void remove(int key) {
        int index = getIndex(key);
        if (hashTable[index] == null) return; // key does not exist

        Node cur = hashTable[index];

        // key is the first node in the bucket
        if (cur.val == key) {
            hashTable[index] = cur.next;
            size--;
            return;
        }

        // find the node previous to the one that needs to be removed
        while (cur != null && cur.next != null) {
            if (cur.next.val == key) {
                cur.next = cur.next.next;
                size--;
                return;
            }
            cur = cur.next;
        }
        // key does not exist
    }

    public boolean contains(int key) {
        int index = getIndex(key);
        Node cur = hashTable[index];
        while (cur != null) {
            if (cur.val == key) return true;
            cur = cur.next;
        }
        return false;
    }

    private int getIndex(int key) {
        return hashKey(key) % hashTable.length;
    }

    private int hashKey(int key) {
        return Math.abs(Integer.hashCode(key));
    }

    private double getLoadFactor() {
        return (double) size / hashTable.length; // resize occurs at >= 0.75
    }

    private void resize() {
        int newSize = hashTable.length * 2;
        Node[] newHashTable = new Node[newSize]; // new table will be twice the size

        for (Node head : hashTable) {
            while (head != null) {
                Node next = head.next;
                int newIndex = hashKey(head.val) % newSize;
                head.next = newHashTable[newIndex]; // in case of a collision, insert the node at the head
                newHashTable[newIndex] = head;
                head = next;
            }
        }

        hashTable = newHashTable; // re-assign the old hash table reference to the new one
    }
}
```
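Here is a quick usage sketch of the finished class (assuming it is compiled together with a standard `main` method):

```java
public class Main {
    public static void main(String[] args) {
        MyHashSet myHashSet = new MyHashSet();
        myHashSet.add(1);
        myHashSet.add(2);
        System.out.println(myHashSet.contains(1)); // true
        System.out.println(myHashSet.contains(3)); // false
        myHashSet.add(2);                          // 2 is already present, so nothing changes
        System.out.println(myHashSet.contains(2)); // true
        myHashSet.remove(2);
        System.out.println(myHashSet.contains(2)); // false
    }
}
```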
Complexity Analysis
- The hash set uses separate chaining to handle collisions, storing elements in linked lists within each bucket. Operations like add, remove, and contains are generally O(1) on average when keys are well-distributed.
- In the worst case, when all keys hash to the same bucket, operations degrade to O(k), where k is the number of elements in that bucket, as mentioned earlier.
- The hash table resizes (doubles in capacity) when the load factor reaches 0.75, which ensures low collision probability.
- Resizing is O(n) but occurs infrequently, so its cost is amortized over multiple operations.
- Space complexity accounts for both the hash table array (m buckets) and all inserted elements (n keys).
Time Complexity Table
| Operation    | Average Case     | Worst Case |
|--------------|------------------|------------|
| `add()`      | O(1) (amortized) | O(k)       |
| `remove()`   | O(1) (amortized) | O(k)       |
| `contains()` | O(1) (amortized) | O(k)       |
Space Complexity Table
| Component        | Space Used | Explanation                                            |
|------------------|------------|--------------------------------------------------------|
| Hash Table Array | O(m)       | m is the number of buckets (doubles on resize)         |
| Stored Elements  | O(n)       | Each inserted key is stored as a node in a linked list |
| Total Space      | O(n + m)   | Accounts for both table size and number of elements    |
Possible Improvements
- In production-grade hash tables like Java's `HashMap`, buckets that exceed a certain threshold (typically 8 nodes) are converted from linked lists into balanced trees (e.g., red-black trees) to improve worst-case lookup time from O(k) to O(log k), where k is the total number of elements in the bucket.
- Memory usage can be further optimized by shrinking the hash table when the load factor drops below a lower threshold (commonly 0.25), which helps avoid wasted space after many deletions; a rough sketch follows after this list.
- Advanced hash functions or open addressing (e.g., linear probing, quadratic probing) could be explored as alternatives to separate chaining, though they come with their own trade-offs.
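As a sketch of that shrinking idea (hypothetical additions to the class above; `MIN_CAPACITY` and `maybeShrink()` are names I made up, and `remove()` would call `maybeShrink()` after a successful deletion):

```java
private static final int MIN_CAPACITY = 16;

private void maybeShrink() {
    // shrink when the table is less than a quarter full, but never below the initial capacity
    if (hashTable.length > MIN_CAPACITY && getLoadFactor() < 0.25) {
        int newSize = hashTable.length / 2;
        Node[] newHashTable = new Node[newSize];
        // rehash every node into the smaller table, mirroring resize()
        for (Node head : hashTable) {
            while (head != null) {
                Node next = head.next;
                int newIndex = hashKey(head.val) % newSize;
                head.next = newHashTable[newIndex];
                newHashTable[newIndex] = head;
                head = next;
            }
        }
        hashTable = newHashTable;
    }
}
```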
If you found this breakdown helpful or took a different approach to designing HashSet, I’d love to hear your thoughts!
Feel free to drop a comment, leave a ❤️, or share how you'd optimize it further.
Follow me for more deep dives into LeetCode design problems and backend engineering concepts.