datatoinfinity

Spell Checker-Predicting Correct Word-NLP-Part 1

Here is what we have done so far:

  1. Imported the corpus.
  2. Tokenized the text into words.
  3. Made a list of the tokenized words.
  4. Performed operations on the text.

From that corpus we built a dictionary of words (in other words, a word list) that we can use to check whether a word exists and whether its spelling is correct.
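As a recap of those steps, here is a minimal sketch of how such a dictionary could be built and queried. The tiny `corpus` string, the regex tokenizer, and the `is_known` helper are assumptions made for illustration; `word_probability` is the name used later in this post, and it is assumed here to map each word to its relative frequency.

```python
import re
from collections import Counter

# Hypothetical tiny corpus; the real one comes from the earlier steps.
corpus = "I love my dog and my dog loves me. Love is all you need."

# Tokenize: lowercase the text and keep runs of letters only.
words = re.findall(r"[a-z]+", corpus.lower())

# Count each word, then convert the counts to relative frequencies.
word_count = Counter(words)
total = sum(word_count.values())
word_probability = {w: c / total for w, c in word_count.items()}

def is_known(word):
    """Return True if the word appears in the dictionary built from the corpus."""
    return word.lower() in word_probability

print(is_known("love"))  # True
print(is_known("lve"))   # False
```

A word that is missing from the dictionary is treated as a possible misspelling, which is where the edit function comes in.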

 def edit(word):
     return set(insert(word) + delete(word) + swap(word) + replace(word))

The function works in three steps:

  1. It calls the four helper functions:
    • insert(word)
    • delete(word)
    • swap(word)
    • replace(word)

     Each one returns a list of new words created by:
    • inserting one character
    • deleting one character
    • swapping two adjacent characters
    • replacing one character
  2. It combines the results:
    • insert(...) + delete(...) + ... concatenates the four lists into one big list of all variations
  3. It converts that list to a set, which:
    • removes duplicates
    • gives you the unique words that are one edit away

Example: edit("lve")

  • insert("lve") might return: 'alve', 'blve', ..., 'lave', 'love', ..., 'lvez' (104 total)

  • delete("lve") returns: 've', 'le', 'lv' (one result per character removed)

  • swap("lve") might return: 'vle', 'lev'

  • replace("lve") might return: 'ave', 'bve', ..., 'lze', ..., 'lvz' (78 total: 3 positions × 26 letters)

  • Then the combined set(...) removes overlaps.
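The four helper functions are not shown in this part, so here is a minimal sketch of how they might look, written so that the example above can be reproduced. The lowercase a-z alphabet (`LETTERS`) and the exact implementations are assumptions; your own versions may differ in detail, for example in whether `replace` keeps the original word.

```python
import string

LETTERS = string.ascii_lowercase  # assumed alphabet: a-z

def insert(word):
    # Put every letter into every gap, including before and after the word.
    return [word[:i] + c + word[i:] for i in range(len(word) + 1) for c in LETTERS]

def delete(word):
    # Drop one character at a time.
    return [word[:i] + word[i + 1:] for i in range(len(word))]

def swap(word):
    # Exchange each pair of adjacent characters.
    return [word[:i] + word[i + 1] + word[i] + word[i + 2:] for i in range(len(word) - 1)]

def replace(word):
    # Overwrite each character with every letter (this includes the word itself).
    return [word[:i] + c + word[i + 1:] for i in range(len(word)) for c in LETTERS]

def edit(word):
    return set(insert(word) + delete(word) + swap(word) + replace(word))

print(len(insert("lve")))     # 104
print(delete("lve"))          # ['ve', 'le', 'lv']
print(swap("lve"))            # ['vle', 'lev']
print(len(replace("lve")))    # 78
print("love" in edit("lve"))  # True
```

The lists contain repeats (for instance, `replace` produces 'lve' three times), which is why the final `set(...)` is useful.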

What the function does:

  • Simulates every typo a person could make with one small mistake.
  • Generates every candidate "fix" that is one edit away from a misspelled word.
  • You then compare these candidates with your real dictionary (word_probability) to find the best suggestion.
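That last comparison step could be sketched as follows. This is only an illustration: the probabilities in `word_probability` are made-up values, `suggest` is a hypothetical helper name, and ranking by probability is one reasonable reading of "best suggestion". The `edit` function is restated in compact form so that the block runs on its own.

```python
import string

LETTERS = string.ascii_lowercase  # assumed alphabet: a-z

def edit(word):
    """Compact restatement of edit(): all strings one edit away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    return set(inserts + deletes + swaps + replaces)

# Hypothetical probabilities; in practice these come from the corpus.
word_probability = {"love": 0.004, "live": 0.003, "leave": 0.002, "dog": 0.001}

def suggest(word, n=3):
    """Return up to n known words one edit away, most probable first."""
    if word in word_probability:
        return [word]  # already a known word, nothing to fix
    candidates = [w for w in edit(word) if w in word_probability]
    return sorted(candidates, key=word_probability.get, reverse=True)[:n]

print(suggest("lve"))  # ['love', 'live']
```

Note that "leave" is not suggested for "lve" because it is two edits away, and `edit` only covers a single edit.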
