Advanced Algorithm Design
String Matching •String Matching is the process of finding occurrences of a pattern (substring) within a text (main string). •Used in text processing, search engines, and bioinformatics. •Key Problem: Find all positions in the text where the pattern occurs.
String Matching •We formalize the string-matching problem as follows. We assume that the text is an array T[1.. n] of length n and that the pattern is an array P[ 1..m] of length m≤ n. •We further assume that the elements of P and T are characters drawn from a finite alphabet . For example, we may have = {0,1} or = ,a, b, …, z-. The character arrays P and T are often called strings of characters.
String Matching we say that pattern P occurs with shift s in text T (or, equivalently, that pattern P occurs beginning at position s + 1 in text T ) if 0 ≤ s ≤ n-m and T*s+1… s +m+ = P*1…m+ (that is, if T[s+j] = P[j], for 1 ≤ j ≤ m). If P occurs with shift s in T , then we call s a valid shift; otherwise, we call s an invalid shift. The string matching problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T
Naïve String Matching •A simple approach to string matching. •Compares the pattern with the text character by character at all possible positions. •Works by sliding the pattern one position at a time over the text. •No preprocessing required.
The naive string-matching algorithm
Example Text: "AABAACAADAABAABA" Pattern: "AABA“ Step-by-Step Process: •Compare "AABA" with the first four characters of the text: "AABA". (Match found at index 0) •Slide the pattern one position to compare "AABA" with "ABAAC". (No match) •Slide again and repeat comparisons until the end of the text. Output: Pattern found at indices: 0, 9, 12
All Matches
Rabin karp Algorithm Like the Naive Algorithm, the Rabin-Karp algorithm also check every substring. But unlike the Naive algorithm, the Rabin Karp algorithm matches the hash value of the pattern with the hash value of the current substring of text, and if the hash values match then only it starts matching individual characters. So Rabin Karp algorithm needs to calculate hash values for the following strings.
Time complexity RABIN-KARP-MATCHER takes Ө(m) preprocessing time, and its matching time is Ө(n-m+1)m) in the worst case, since (like the naive string-matching algorithm) the Rabin-Karp algorithm expl In many applications, we expect few valid shifts— perhaps some constant c of them. In such applications, the expected matching time of the algorithm is only • O((n-m+1)+ cm) = O(n + m)=O(n) icitly verifies every valid shift.
Problem: Find occurrences of the pattern "abc" in the text "abcxabcdabxabcd". Initialize parameters: Pattern: P="abc“ Text: T="abcxabcdabxabcd“ Pattern length: m=3 Text length: n=15 Prime number for hashing: q=101 Radix: d=256
Example 2: Multiple Pattern Occurrences Problem: Find all occurrences of the pattern "101" in the text "10111010101". Initialize parameters: Pattern: P="101“ Text: T="10111010101" Pattern length: m=3 Text length: n=11 Prime number for hashing: q=13 Radix: d=2 (binary strings).
String Matching Algorithms: Naive, KMP, Rabin-Karp
String Matching Algorithms: Naive, KMP, Rabin-Karp
String Matching Algorithms: Naive, KMP, Rabin-Karp

String Matching Algorithms: Naive, KMP, Rabin-Karp

  • 1.
  • 2.
    String Matching •String Matchingis the process of finding occurrences of a pattern (substring) within a text (main string). •Used in text processing, search engines, and bioinformatics. •Key Problem: Find all positions in the text where the pattern occurs.
  • 3.
    String Matching •We formalizethe string-matching problem as follows. We assume that the text is an array T[1.. n] of length n and that the pattern is an array P[ 1..m] of length m≤ n. •We further assume that the elements of P and T are characters drawn from a finite alphabet . For example, we may have = {0,1} or = ,a, b, …, z-. The character arrays P and T are often called strings of characters.
  • 4.
    String Matching we saythat pattern P occurs with shift s in text T (or, equivalently, that pattern P occurs beginning at position s + 1 in text T ) if 0 ≤ s ≤ n-m and T*s+1… s +m+ = P*1…m+ (that is, if T[s+j] = P[j], for 1 ≤ j ≤ m). If P occurs with shift s in T , then we call s a valid shift; otherwise, we call s an invalid shift. The string matching problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T
  • 5.
    Naïve String Matching •Asimple approach to string matching. •Compares the pattern with the text character by character at all possible positions. •Works by sliding the pattern one position at a time over the text. •No preprocessing required.
  • 6.
  • 7.
    Example Text: "AABAACAADAABAABA" Pattern: "AABA“ Step-by-StepProcess: •Compare "AABA" with the first four characters of the text: "AABA". (Match found at index 0) •Slide the pattern one position to compare "AABA" with "ABAAC". (No match) •Slide again and repeat comparisons until the end of the text. Output: Pattern found at indices: 0, 9, 12
  • 8.
  • 10.
    Rabin karp Algorithm Likethe Naive Algorithm, the Rabin-Karp algorithm also check every substring. But unlike the Naive algorithm, the Rabin Karp algorithm matches the hash value of the pattern with the hash value of the current substring of text, and if the hash values match then only it starts matching individual characters. So Rabin Karp algorithm needs to calculate hash values for the following strings.
  • 14.
    Time complexity RABIN-KARP-MATCHER takesӨ(m) preprocessing time, and its matching time is Ө(n-m+1)m) in the worst case, since (like the naive string-matching algorithm) the Rabin-Karp algorithm expl In many applications, we expect few valid shifts— perhaps some constant c of them. In such applications, the expected matching time of the algorithm is only • O((n-m+1)+ cm) = O(n + m)=O(n) icitly verifies every valid shift.
  • 15.
    Problem: Find occurrences ofthe pattern "abc" in the text "abcxabcdabxabcd". Initialize parameters: Pattern: P="abc“ Text: T="abcxabcdabxabcd“ Pattern length: m=3 Text length: n=15 Prime number for hashing: q=101 Radix: d=256
  • 17.
    Example 2: MultiplePattern Occurrences Problem: Find all occurrences of the pattern "101" in the text "10111010101". Initialize parameters: Pattern: P="101“ Text: T="10111010101" Pattern length: m=3 Text length: n=11 Prime number for hashing: q=13 Radix: d=2 (binary strings).