String Hashing using the Polynomial Rolling Hash Function

17 Mar 2025 | 4 min read

Introduction:

String matching algorithms have had a significant influence on computer science and play a fundamental role in solving practical problems across many domains. Their efficiency matters most in tasks that involve searching for a particular string inside another. String matching techniques are used in areas such as database schema design and network systems, where they help optimize the performance of operations.

Problem with string matching:

Time Complexity: Given two strings s1 and s2 of equal length n, comparing them directly (s1 == s2) takes O(n) time.

Hash Function:

A hash function transforms data of arbitrary size into fixed-size values. These resulting values are commonly referred to as hash values. The purpose of a hash function is to produce a compact identifier for a given piece of data, regardless of the original data's size or complexity. Because many inputs map onto a fixed range of outputs, this identifier is not guaranteed to be unique.

Solution using Hashing:

Time Complexity: Given two strings s1 and s2 of equal length n, comparing them (s1 == s2) takes O(1) time in the ideal case, by comparing their precomputed hash values instead of their characters.

String Hashing:

String -> Hash function -> Hash value/Key

The hash function mentioned above takes the string as its input and produces a fixed-size value known as the hash value or key.


Example:

Suppose the strings s1, s2, and s3 are given as input to the hash function, which generates the values 109469, 236853, and 945739 respectively.

Now, instead of comparing the strings directly, which takes O(min(|s1|, |s2|)) time in the worst case, we simply compare their hash values, which is O(1) once the hashes have been computed.

Important points:

Equal strings must have the same hash value.
Equal hash values mean the strings may be equal, not that they must be.

Two different strings can have the same hash value. When this happens, it is known as a collision.

Polynomial Rolling Hash Function:

We want to compare strings efficiently. The idea is simple: convert strings into integers (hash values) and compare those.

To convert them into integers, we will use the polynomial rolling hash as the hash function. Equal strings must produce equal hash values.

The polynomial rolling hash function is a hash function that uses only multiplications and additions. The following is the function.

hash(s) = (s[0] + s[1]·p¹ + s[2]·p² + ... + s[n-1]·pⁿ⁻¹) mod m

Here s[i] is the integer assigned to the i-th character, n is the length of the string, and m is the modulus.

p >= size of the character set.

p is any prime number.

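For instance, hash("abc") = 1·5⁰ + 2·5¹ + 3·5² = 1 + 10 + 75 = 86

Here a is mapped to 1, b is mapped to 2, and so on, and p = 5, which is a prime number. The small p is used only to keep the arithmetic short, and no modulus is applied because the value is small.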
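The example can be evaluated term by term with a few lines of Python. This is a minimal sketch: the function name poly_hash_plain and the use of ord() to map a to 1, b to 2, and so on are illustrative choices, and no modulus is applied.

```python
def poly_hash_plain(s: str, p: int) -> int:
    """Sum of value(s[i]) * p**i, with 'a' -> 1, ..., 'z' -> 26 (no modulus)."""
    return sum((ord(ch) - ord("a") + 1) * p**i for i, ch in enumerate(s))


# 1*5**0 + 2*5**1 + 3*5**2 = 1 + 10 + 75
print(poly_hash_plain("abc", 5))  # 86
```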

Why should we use Modulo?

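Since the hash function is a polynomial in p, hash values grow exponentially with the length of the string and quickly overflow fixed-width integer types. For example, with p = 11:

Integer (32-bit): overflows after roughly 10 characters

Long long int (64-bit): overflows after roughly 20 characters

Taking the result modulo m keeps every hash value in the range [0, m), whatever the length of the string.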
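The growth can be observed directly in Python, whose integers do not overflow. The sketch below is illustrative only: poly_hash_plain is a hypothetical helper, the strings of repeated "z" are chosen as a worst case, and m = 10^9 + 9 is an assumed modulus.

```python
def poly_hash_plain(s, p):
    """Polynomial hash without a modulus, with 'a' -> 1, ..., 'z' -> 26."""
    return sum((ord(ch) - ord("a") + 1) * p**i for i, ch in enumerate(s))


P = 11          # base used in the example above
M = 10**9 + 9   # assumed modulus, a commonly used large prime

for n in (5, 10, 20):
    raw = poly_hash_plain("z" * n, P)
    # bit_length() shows how many bits the unreduced value would need
    print(f"length {n:2d}: raw hash needs {raw.bit_length()} bits, "
          f"reduced hash = {raw % M}")
```

At length 10 the raw value no longer fits in a signed 32-bit integer, and at length 20 it no longer fits in a signed 64-bit integer, while the reduced value always stays below m.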

Why should p be greater than |character set|?

p should be larger than the size of the character set in order to reduce collisions. With a smaller p, different combinations of characters can add up to the same value, so collisions become more likely.
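A small experiment makes the effect of a too-small p visible. This is a sketch under stated assumptions: the helpers poly_hash_plain and count_distinct_hashes are hypothetical names, the alphabet is the 26 lowercase letters mapped to 1..26, and no modulus is used so that only the choice of p matters.

```python
from itertools import product
from string import ascii_lowercase


def poly_hash_plain(s, p):
    """Polynomial hash without a modulus, with 'a' -> 1, ..., 'z' -> 26."""
    return sum((ord(ch) - ord("a") + 1) * p**i for i, ch in enumerate(s))


def count_distinct_hashes(p, length=2):
    """Number of distinct hash values over all lowercase strings of a given length."""
    words = ("".join(t) for t in product(ascii_lowercase, repeat=length))
    return len({poly_hash_plain(w, p) for w in words})


# p = 2 is far smaller than the alphabet: "ca" and "ab" collide.
print(poly_hash_plain("ca", 2), poly_hash_plain("ab", 2))    # 5 5
# p = 31 is larger than the alphabet: the same strings are separated.
print(poly_hash_plain("ca", 31), poly_hash_plain("ab", 31))  # 34 63

# Of the 676 two-letter strings, how many distinct hashes does each p give?
print(count_distinct_hashes(2), count_distinct_hashes(31))
```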

Simple code implementation for polynomial rolling hash function:
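The original listing is not available here, so the following is a minimal Python sketch rather than the article's own code. The function name polynomial_hash and the defaults p = 31 and m = 10^9 + 9 are assumptions (both are commonly used values), and the input is assumed to consist of lowercase letters mapped to 1..26.

```python
def polynomial_hash(s: str, p: int = 31, m: int = 10**9 + 9) -> int:
    """Polynomial rolling hash of a lowercase string, reduced modulo m."""
    hash_value = 0
    power = 1  # holds p**i mod m for the current position i
    for ch in s:
        value = ord(ch) - ord("a") + 1           # 'a' -> 1, ..., 'z' -> 26
        hash_value = (hash_value + value * power) % m
        power = (power * p) % m                  # advance to the next power of p
    return hash_value


if __name__ == "__main__":
    for word in ("abc", "hello", "hello", "world"):
        print(word, polynomial_hash(word))
    # Equal strings always produce equal hashes.
    print(polynomial_hash("hello") == polynomial_hash("hello"))  # True
```

Reducing both the running sum and the running power modulo m at every step keeps all intermediate values small, which is what makes the same loop safe in languages with fixed-width integers.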


Collisions in Polynomial Rolling Hash Function and its resolution:

The hash function, which outputs an integer in the range [0, m), can lead to collisions, where different strings produce the same hash value. For instance, with p = 37 and m = 10^9 + 9 (and a mapped to 1, b to 2, and so on), the strings "answers" and "stead" produce the same hash value. A one-to-one mapping is impossible in general, because there are far more possible strings than values in [0, m).

While a larger m reduces the chances of collisions, it also slows down the algorithm. Practical constraints, such as integer size limits in languages like C, C++, and Java, restrict how far m can be increased.

To reduce the collision probability, one strategy is to generate a pair of hashes for a given string using two different parameter pairs (p, m). This approach doesn't eliminate collisions entirely but significantly reduces their probability.
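The "answers" / "stead" collision can be reproduced with a short Python check. The function name polynomial_hash is an illustrative choice, and the sketch assumes lowercase letters mapped to 1..26 with the polynomial evaluated from the first character upward.

```python
def polynomial_hash(s, p, m):
    """Polynomial rolling hash with 'a' -> 1, ..., 'z' -> 26, reduced modulo m."""
    hash_value, power = 0, 1
    for ch in s:
        hash_value = (hash_value + (ord(ch) - ord("a") + 1) * power) % m
        power = (power * p) % m
    return hash_value


P, M = 37, 10**9 + 9
print(polynomial_hash("answers", P, M))  # 7554901
print(polynomial_hash("stead", P, M))    # 7554901
```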
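The pair-of-hashes idea might be sketched as follows. The names polynomial_hash, double_hash and PARAMS are hypothetical, and the second parameter pair (p = 31, m = 10^9 + 7) is an assumption chosen only because both numbers are commonly used primes.

```python
def polynomial_hash(s, p, m):
    """Polynomial rolling hash with 'a' -> 1, ..., 'z' -> 26, reduced modulo m."""
    hash_value, power = 0, 1
    for ch in s:
        hash_value = (hash_value + (ord(ch) - ord("a") + 1) * power) % m
        power = (power * p) % m
    return hash_value


# Two independent (p, m) pairs; the first pair is an assumed choice.
PARAMS = ((31, 10**9 + 7), (37, 10**9 + 9))


def double_hash(s):
    """Return a tuple with one hash per (p, m) pair."""
    return tuple(polynomial_hash(s, p, m) for p, m in PARAMS)


a, b = double_hash("answers"), double_hash("stead")
print(a[1] == b[1])  # True: the strings collide under (37, 10**9 + 9) alone
print(a == b)        # False: the first component tells them apart
```

Two strings are treated as equal only when both components match, so a collision now requires the strings to collide under both parameter pairs at once.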

Conclusion:

String hashing with the polynomial rolling hash function transforms strings into integers so that they can be compared efficiently. The function relies only on multiplications and additions, which keeps it simple and fast. The choice of the prime p and the modulus m has a significant impact on how well hash values are distributed. The modulo operation is essential for containing the exponential growth of hash values. With well-chosen parameters and a collision-reducing strategy such as using a pair of hashes, the polynomial rolling hash function improves the efficiency and reliability of string-matching algorithms.
