Skip to content

The project analyzes the performance of pattern-matching algorithms on pseudo-randomly generated texts of 500,000 characters. Tests were conducted using alphabets of sizes 2 and 70, with pattern lengths varying from 4 to 50.

License

Notifications You must be signed in to change notification settings

unixisking/pattern-matching-algorithms

Repository files navigation

Table of Contents

  1. Overview
  2. Features
  3. Project Structure
  4. String Matching Algorithms
  5. Usage

Trie Data Structure

This repository contains an implementation of a Trie (prefix tree) in C, a data structure used to store and search strings efficiently, especially useful in applications like autocomplete and spell-check. Additionally, the repository includes implementations of various pattern matching algorithms.

Features

  • Insert words into the Trie.
  • Search words to check their existence in the Trie.
  • Add Prefixes of a word
  • Add Suffixes of a word
  • Add Factors of a word
Trie createTrie(int maxNode); void insertInTrie(Trie trie, unsigned char *w); int isInTrie(Trie trie, unsigned char *w); void insertPrefixes(Trie trie, unsigned char *w); void insertSuffixes(Trie trie, unsigned char *w); void insertFactors(Trie trie, unsigned char *w);

Project Structure

Transition table implementation

  • trie-tm.h: Header file containing Trie structure definitions and function prototypes.
  • trie-tm.c: Source file with function implementations for the Trie operations.
  • main_tm.c: Example usage by adding the word the prefixes, suffixes, and the factors of "abaababa"
struct _trie { int maxNode; /* maximum number of trie nodes */ int nextNode; /* Index of next available node */ int **transitions; /* transition matrix */ char *finite; /* finite states */ }; typedef struct _trie *Trie;

String Matching Algorithms

The repository includes implementations of several string matching algorithms:

  • Brute Force Algorithms: Simple and direct methods to find matches in a string.
  • Morris-Pratt (MP): Efficient searching algorithm using a prefix function.
  • Knuth-Morris-Pratt (KMP): Extension of MP that optimizes prefix matching.
  • Boyer-Moore: A highly efficient algorithm leveraging bad character and good suffix heuristics.
/**  *  * Brute force algorithm, no fast loop & no sentinelle.  */ int bruteFS1(const char *text, int n, const char *pattern, int m); /**  *  * Brute force algorithm, with fast loop & no sentinelle.  */ int bruteFS2(const char *text, int n, const char *pattern, int m); /**  *  * Brute force algorithm, with fast loop & sentinelle.  */ int bruteFS3(char *text, int n, const char *pattern, int m); /**  * Brute force algorithm, with strncmp, no fast loop  */ int searchWithHelperFunc1(const char *text, int n, const char *pattern, int m); /**  * Brute force algorithm, with strncmp, with fast loop  */ int searchWithHelperFunc2(const char *text, int n, const char *pattern, int m); /**  * Brute force algorithm, with strncmp, with fast loop & sentinelle  */ int searchWithHelperFunc3(char *text, int n, const char *pattern, int m); /**  * Implements the Morris-Pratt algorithm for pattern matching.  */ void mp(const char *text, int n, const char *pattern, int m); /**  * Implements the Knuth-Morris-Pratt algorithm for efficient pattern matching.  */ void kmp(const char *text, int n, const char *pattern, int m); /**  * Implements the Boyer-Moore algorithm for advanced string searching.  */ void BM(const char *text, int n, const char *pattern, int m); /**  * Implements the Horspool algorithm, a simplified variant of Boyer-Moore.  */ void horspool(const char *text, int n, const char *pattern, int m); /**  * Implements a quick search algorithm for approximate string matching.  */ int quick_search(const char *text, const char *pattern);

Usage

And to run the transition table implementation :

make trie_tm && ./trie_tm

Hashtable based implementation

  • trie-ht.h: Header file containing Trie structure definitions and function prototypes.
  • trie-ht.c: Source file with function implementations for the Trie operations.
  • main_ht.c: Example usage by adding the word the prefixes, suffixes, and the factors of "abaababa"
struct _list { int startNode, targetNode; unsigned char letter; struct _list *next; }; typedef struct _list *List; struct _trie { int maxNode, /* Max amount of trie nodes */ nextNode; /* Index of next available node */ List **transitions; /* Adjacency list */ char *finite; /* Finite states */ }; typedef struct _trie *Trie;

Run the code

To run the hash table implementation :

make trie_ht && ./trie_ht

To clean the project from executables :

make clean

About

The project analyzes the performance of pattern-matching algorithms on pseudo-randomly generated texts of 500,000 characters. Tests were conducted using alphabets of sizes 2 and 70, with pattern lengths varying from 4 to 50.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published