The Z algorithm is a powerful string-matching algorithm used to find all occurrences of a pattern within a text. It operates efficiently, with a linear time complexity of O(n+m), where n is the length of the text and m is the length of the pattern. This makes it particularly useful for problems involving large texts. In this article, we'll explore the Z algorithm, understand its underlying concepts, and learn how to implement it in Python.
The Z algorithm computes an array, known as the Z-array, for a given string. The Z-array at position i stores the length of the longest substring starting from i that is also a prefix of the string. This information can then be used to efficiently search for a pattern within a text.
Z-array Definition:
Given a string S of length n, the Z-array Z is defined as follows: Z[i] is the length of the longest substring starting from S[i] which is also a prefix of S.
Example:
Consider the string S = "aabcaabxaaaz". The Z-array for S is calculated as follows:
- Z[0] = 12 (by convention Z[0] = n, since the entire string is a prefix of itself; some implementations leave it at 0 because it is never used)
- Z[1] = 1 (the suffix starting at index 1 is "abcaabxaaaz"; only its first character "a" matches the prefix "aabc...")
- Z[2] = 0 (S[2] = "b" differs from S[0] = "a")
- Z[3] = 0 (S[3] = "c" differs from S[0] = "a")
- Z[4] = 3 (the suffix starting at index 4 begins with "aab", which matches the prefix; the next character "x" differs from "c")
- and so on.
- The Z-array for S is [12, 1, 0, 0, 3, 1, 0, 0, 2, 2, 1, 0].
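The example values can be checked directly from the definition. The following is a minimal sketch, not the linear-time algorithm itself: it compares every suffix with the prefix character by character, which takes quadratic time in the worst case. The function name naive_z is chosen here for illustration only.

```python
def naive_z(s):
    """Compute the Z-array straight from the definition (quadratic time)."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n  # the whole string is trivially a prefix of itself
    for i in range(1, n):
        length = 0
        # Compare the suffix starting at i with the prefix, one character at a time
        while i + length < n and s[i + length] == s[length]:
            length += 1
        z[i] = length
    return z


print(naive_z("aabcaabxaaaz"))
# [12, 1, 0, 0, 3, 1, 0, 0, 2, 2, 1, 0]
```

A definition-based version like this is also a convenient reference when testing a faster implementation.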
The Z Algorithm: Step-by-Step
Here's a detailed breakdown of how the Z algorithm works:
- Initialization:
  - Start with the string S of length n, and initialize the Z-array Z with zeroes.
  - Set the variables L and R to 0. These variables delimit the Z-box: the window S[L:R+1] with the largest R found so far that matches a prefix of S.
- Iterate through the string: For each position i from 1 to n - 1:
  - Case 1: If i > R, then no known Z-box covers i, so nothing computed earlier can be reused.
    - Set L = R = i and extend the window to the right as long as R < n and S[R] == S[R-L].
    - Set Z[i] = R - L and decrement R, so that R again points at the last matching character.
  - Case 2: If i ≤ R, then i falls within a Z-box, so S[i:R+1] equals S[i-L:R-L+1] and the previously computed value Z[i-L] can be reused:
    - Sub-case 2a: If Z[i-L] < R - i + 1, the match ends inside the Z-box, so Z[i] = Z[i-L].
    - Sub-case 2b: If Z[i-L] ≥ R - i + 1, the match reaches the end of the Z-box and may continue beyond it. Set L = i and extend the window as long as R < n and S[R] == S[R-L]. Set Z[i] = R - L and decrement R.
- Output the Z-array: After processing all positions in the string, the Z-array contains the lengths of the longest substrings starting from each position that match the prefix of S.
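To make the case analysis concrete, the sketch below follows the steps above and records which case fires at each position. The name trace_z and the cases list are illustrative additions, and Z[0] is left at 0 because the loop starts at index 1.

```python
def trace_z(s):
    """Return (z, cases), where cases[i] names the branch taken at position i."""
    n = len(s)
    z = [0] * n
    cases = [None] * n  # index 0 is never processed by the loop
    l = r = 0
    for i in range(1, n):
        if i > r:
            cases[i] = "1"   # outside every known Z-box: compare from scratch
            l = r = i
            while r < n and s[r] == s[r - l]:
                r += 1
            z[i] = r - l
            r -= 1
        elif z[i - l] < r - i + 1:
            cases[i] = "2a"  # the reused value ends inside the Z-box
            z[i] = z[i - l]
        else:
            cases[i] = "2b"  # the match reaches the box boundary: extend it
            l = i
            while r < n and s[r] == s[r - l]:
                r += 1
            z[i] = r - l
            r -= 1
    return z, cases


z, cases = trace_z("aabcaabxaaaz")
for i in range(1, len(z)):
    print(f"i={i:2d}  case {cases[i]:<2}  Z[i]={z[i]}")
```

For this string, positions 5 and 6 are answered by case 2a without any character comparison, while positions 9 and 10 fall into case 2b and extend the window.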
Implementing the Z Algorithm in Python:
To understand the Z algorithm better, let's break down the implementation step by step.
- calculate_z(s):
  - This function computes the Z-array for a given string s.
  - The Z-array is an array where the value at each position i indicates the length of the longest substring starting from s[i] which is also a prefix of s.
- z_algorithm(pattern, text):
  - This function uses the Z algorithm to search for all occurrences of pattern in text.
  - It concatenates the pattern, a unique delimiter ($), and the text to create a combined string.
  - It then computes the Z-array for the combined string and checks for positions in the Z-array where the Z-value equals the length of the pattern, indicating a match.
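The effect of the concatenation can be seen by printing the combined string next to its Z-values. This is only a sketch: z_from_definition is a hypothetical quadratic stand-in used to keep the example short, and it assumes "$" occurs in neither the pattern nor the text.

```python
def z_from_definition(s):
    # Quadratic stand-in for a real Z-array routine (illustration only)
    n = len(s)
    z = [0] * n
    for i in range(1, n):
        while i + z[i] < n and s[i + z[i]] == s[z[i]]:
            z[i] += 1
    return z


pattern, text = "abc", "ababcabc"
combined = pattern + "$" + text  # assumes "$" appears in neither string
z = z_from_definition(combined)

# A position i in combined corresponds to index i - len(pattern) - 1 in text
matches = [i - len(pattern) - 1
           for i in range(len(pattern) + 1, len(combined))
           if z[i] == len(pattern)]

print(combined)  # abc$ababcabc
print(z)         # [0, 0, 0, 0, 2, 0, 3, 0, 0, 3, 0, 0]
print(matches)   # [2, 5]
```

Because the delimiter never matches a character of the text, no Z-value after index 0 can exceed the pattern length, so an equality test is enough to detect a full match.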
Below is the implementation of the above approach:
```python
def calculate_z(s):
    n = len(s)       # Length of the input string
    z = [0] * n      # Z-array; z[0] stays 0 here because it is never read
    l, r = 0, 0      # Left and right boundaries of the current Z-box

    for i in range(1, n):
        if i > r:
            # Case 1: i is outside the current Z-box, so compare from scratch
            l, r = i, i
            while r < n and s[r] == s[r - l]:
                r += 1
            z[i] = r - l
            r -= 1
        else:
            # Case 2: i is inside the current Z-box
            k = i - l
            if z[k] < r - i + 1:
                # Case 2a: the reused value does not stretch outside the Z-box
                z[i] = z[k]
            else:
                # Case 2b: the value reaches the end of the Z-box, so extend it
                l = i
                while r < n and s[r] == s[r - l]:
                    r += 1
                z[i] = r - l
                r -= 1
    return z


def z_algorithm(pattern, text):
    # Concatenate pattern, delimiter, and text.
    # The delimiter "$" must not occur in pattern or text.
    combined = pattern + "$" + text
    # Calculate the Z-array for the combined string
    z = calculate_z(combined)
    pattern_length = len(pattern)

    # List to store the result indices
    result = []
    # Only positions after the delimiter belong to the text
    for i in range(pattern_length + 1, len(z)):
        # If the Z-value equals the pattern length, the pattern is found
        if z[i] == pattern_length:
            # Convert the position in combined to an index in text
            result.append(i - pattern_length - 1)
    return result


# Example usage:
pattern = "abc"
text = "ababcabc"
result = z_algorithm(pattern, text)
print("Pattern found at indices:", result)  # Output should be [2, 5]
```
Output:
Pattern found at indices: [2, 5]
Time Complexity: O(n + m), where n is the length of the text and m is the length of the pattern. The Z-array is computed over the combined string of length n + m + 1. Because R never moves left from one iteration to the next, the total number of character comparisons is linear in that length, and a single pass over the Z-array then reports every occurrence of the pattern.
Auxiliary Space: O(n + m), where n is the length of the text and m is the length of the pattern. The algorithm stores the combined string and its Z-array, each of length n + m + 1.
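As an informal check of the linear bound, the sketch below counts the character comparisons made while building a Z-array. The helper count_comparisons is an illustrative name, not part of the implementation above, and its branches are arranged slightly differently while covering the same three cases.

```python
def count_comparisons(s):
    """Build the Z-array of s and return the number of character comparisons made."""
    n = len(s)
    z = [0] * n
    comparisons = 0
    l = r = 0
    for i in range(1, n):
        if i <= r and z[i - l] < r - i + 1:
            z[i] = z[i - l]  # Case 2a: answered from the Z-box, no comparisons
            continue
        if i > r:
            l = r = i        # Case 1: start a new window at i
        else:
            l = i            # Case 2b: keep R, move L to i
        while r < n:
            comparisons += 1
            if s[r] != s[r - l]:
                break
            r += 1
        z[i] = r - l
        r -= 1
    return comparisons


for sample in ["a" * 1000, "ab" * 500, "aabcaabxaaaz" * 100]:
    print(len(sample), count_comparisons(sample))
```

On inputs like these the count should stay within a small constant multiple of the string length, even for highly repetitive strings where a definition-based approach would need quadratic work.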