What we have done so far:
- Imported the corpus.
- Tokenized the text into words.
- Made a list of the tokenized words.
- Performed operations on the text.
From that corpus we built a dictionary of words, i.e. a word list where we can check whether a word exists and therefore whether its spelling is correct.
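As a quick recap in code, here is a minimal sketch of those earlier steps. The filename `corpus.txt` and the `Counter`-based frequency table are assumptions; the article only names `word_probability`, and your own preprocessing may differ:

```python
import re
from collections import Counter

# Read the corpus and split it into lowercase word tokens
with open("corpus.txt", encoding="utf-8") as f:
    text = f.read().lower()
words = re.findall(r"[a-z]+", text)   # list of tokenized words

# Vocabulary: lets us check whether a spelling exists in the corpus
vocab = set(words)

# word_probability: relative frequency of each word in the corpus
counts = Counter(words)
total = sum(counts.values())
word_probability = {w: c / total for w, c in counts.items()}

print("the" in vocab)                     # True if 'the' appears in the corpus
print(word_probability.get("the", 0.0))   # its relative frequency
```

With the word list in place, the edit() function below generates candidate corrections: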
def edit(word):
    # Collect every string that is one edit away from `word`
    return set(insert(word) + delete(word) + swap(word) + replace(word))
- It calls your four helper functions:
- insert(word)
- delete(word)
- swap(word)
- replace(word)
Each one returns a list of new words created by:
- inserting one character
- deleting one character
- swapping two adjacent characters
- replacing one character
- Combining them:
- insert(...) + delete(...) + ... creates a big list of all variations
- Converting to a set:
  * Removes duplicates
  * Gives you unique words that are one change away (a sketch of the four helpers follows this list)
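Here is one possible way to write those four helpers. This is a minimal sketch assuming lowercase English letters only; your own implementations may differ in details:

```python
import string

letters = string.ascii_lowercase

def insert(word):
    # Insert one letter at every position: 26 * (len(word) + 1) variants
    return [word[:i] + c + word[i:] for i in range(len(word) + 1) for c in letters]

def delete(word):
    # Drop one character at every position
    return [word[:i] + word[i+1:] for i in range(len(word))]

def swap(word):
    # Swap each pair of adjacent characters
    return [word[:i] + word[i+1] + word[i] + word[i+2:] for i in range(len(word) - 1)]

def replace(word):
    # Replace each character with every letter of the alphabet
    return [word[:i] + c + word[i+1:] for i in range(len(word)) for c in letters]
```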
Example: edit("lve")

- insert("lve") might return: 'alve', 'blve', ..., 'lave', 'love', ..., 'lvez' (104 total)
- delete("lve") might return: 've', 'le', 'lv', ...
- swap("lve") might return: 'vle', 'lev'
- replace("lve") might return: 'ave', 'bve', ..., 'lve', 'lze' (130 total)

Then the combined set(...) removes overlaps.
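With the sketch helpers above you can try the example yourself. The exact totals depend on how each helper is written, so they may not match the counts quoted here exactly:

```python
candidates = edit("lve")

print(len(candidates))        # number of unique one-edit variants of 'lve'
print("love" in candidates)   # True: 'love' is one replacement away
print("vle" in candidates)    # True: produced by swapping adjacent characters
```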
In short, the edit() function:
- Simulates all the typos a human might make with one small mistake.
- Generates all possible "fixes" for a misspelled word.
- You then compare these "fixes" against your real dictionary (word_probability) to find the best suggestion, as sketched below.
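That selection step could look something like this. It is a minimal sketch: the function name `correct` and the fallback behaviour are assumptions, but it only uses the `edit()` and `word_probability` already described:

```python
def correct(word):
    # If the word is already in the dictionary, keep it as-is
    if word in word_probability:
        return word
    # Otherwise, keep only the one-edit candidates that are real words
    candidates = [w for w in edit(word) if w in word_probability]
    if not candidates:
        return word  # no suggestion found; return the input unchanged
    # Pick the candidate that occurs most often in the corpus
    return max(candidates, key=lambda w: word_probability[w])

print(correct("lve"))   # e.g. 'love' or 'live', whichever is more frequent
```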