Ever tried finding a word in a huge chunk of text and thought,
There has to be a faster way than checking every single character!๐ฉ
Enter the Rabin-Karp Algorithm โ a smart and efficient way to search for patterns in strings using a clever trick called rolling hash ๐ช
Letโs break it down step by step โ no jargon, no confusion, just clarity and fun. ๐ง โจ
๐ค The Problem
Youโre given:
Text โ "geeksforgeeks"
Pattern โ "geek"
And your task is to find all positions where the pattern appears in the text.
โ
Expected Output:
[1, 9] // Using 1-based indexing (because humans count from 1 ๐
)
๐ง The Big Idea (Why Rabin-Karp Is Smart?)
Instead of comparing every letter of the pattern with every possible substring of the text, what if we could:
1.Convert the pattern into a number (hash)
2.Convert parts of the text into numbers too
3.Compare those numbers instead of characters
โ Much faster than checking each letter!
If the hashes match, we double-check the actual characters โ just to be sure (because different strings can sometimes have the same hash โ that's called a collision).
๐ข What's a Hash Anyway?
Think of a hash as a stringโs fingerprint:
Same strings โ same hash
Different strings โ usually different hashes
Rabin-Karp uses a rolling hash, which is just a clever way of updating the hash as we move through the text, without recalculating everything from scratch.
๐Rolling Hash โ The Star of the Show๐
Letโs say weโve already calculated the hash for "geek" โ now we want the hash for the next window: "eeks".
Instead of this:
Recalculate hash("eeks") from scratch ๐ฉ
We do this:
t_new = (d * (t_old - old_char * h) + new_char) % q
Where:
-t_old is the previous windowโs hash
-old_char is the character sliding out of the window
-new_char is the character sliding into the window
-d is the number of characters in the alphabet (e.g., 256 for extended -ASCII)
-h = d^(M-1) % q is used to remove the effect of the oldest character
-q is a large prime number used to reduce hash collisions
We move the window and get a new hash in constant time! ๐ฅ
๐ฅWhy Rabin-Karp is Awesome
โ
Fast search using hashing
โ
Great for multiple pattern searches
โ
Average time complexity: O(N + M)
โ
Used in:
1.Plagiarism detection ๐งโ๐
2.Big text search engines
3.DNA sequence matching ๐งฌ
4.Intrusion detection systems ๐
๐งพ Dry Run: Letโs See It in Action
๐ธ Input:
Text: "geeksforgeeks"
Pattern: "geek"
Step 1: Initialize Variables
M = 4 // length of pattern
N = 13 // length of text
d = 256 // characters in the alphabet
q = 101 // prime number for hashing
Step 2: Compute Hash Multiplier
h = pow(d, M-1) % q = (256^3) % 101 = 88
Step 3: Calculate Initial Hashes
p = hash("geek") % q = 27
t = hash("geek") % q = 27
Step 4: Slide the Window and Compare
Iteration 1: "geek" โ Match found โ
at index 1
Iteration 2: "eeks" โ Hash โ 27 โ skip โ
Iteration 3: "eksf" โ Hash โ 27 โ skip โ
...
Iteration 9: "geek" โ Match found โ
at index 9
๐งโ๐ปFinal Result:
[1, 9]
๐จโ๐ป C++ Code: Rabin-Karp Algorithm
#include <iostream> #include <vector> using namespace std; // Your Rabin-Karp search function vector<int> search(string pattern, string text) { vector<int> result; int d = 256; // Total characters (ASCII) int q = 101; // Prime number to avoid collision int M = pattern.length(); int N = text.length(); int p = 0; // hash for pattern int t = 0; // hash for text window int h = 1; // Step 1: Calculate hash multiplier h = pow(d, M-1) % q for (int i = 0; i < M - 1; i++) h = (h * d) % q; // Step 2: Initial hash values for (int i = 0; i < M; i++) { p = (d * p + pattern[i]) % q; t = (d * t + text[i]) % q; } // Step 3: Slide pattern over text for (int i = 0; i <= N - M; i++) { // If hash values match, check characters one by one if (p == t) { bool match = true; for (int j = 0; j < M; j++) { if (text[i + j] != pattern[j]) { match = false; break; } } if (match) result.push_back(i + 1); // 1-based indexing } // Step 4: Update hash using rolling hash if (i < N - M) { t = (d * (t - text[i] * h) + text[i + M]) % q; if (t < 0) t = t + q; // Make sure hash stays positive } } return result; } // Main Function int main() { string text = "ababcabcabababd"; string pattern = "ababd"; vector<int> positions = search(pattern, text); if (!positions.empty()) { cout << "Pattern found at position(s): "; for (int pos : positions) cout << pos << " "; cout << endl; } else { cout << "Pattern not found in the text." << endl; } return 0; }
๐ง Key Takeaways
Rabin-Karp is a hash-based string searching algorithm.
It uses a rolling hash to speed up window comparisons.
Perfect for situations where you need to find multiple patterns efficiently.
Top comments (1)
Nice explanation!