What we have done so far:
- Imported the corpus.
- Tokenized the text into words.
- Made a list of the tokenized words.
- Performed operations on the text.
From that corpus we built a dictionary of words, i.e. a word list where we can check whether a word exists and therefore whether its spelling is correct.
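As a quick recap in code, here is a minimal sketch of those earlier steps. The filename `corpus.txt` and the `Counter`-based frequency table are assumptions; the article only names `word_probability`, and your own preprocessing may differ:

```python
import re
from collections import Counter

# Read the corpus and split it into lowercase word tokens
with open("corpus.txt", encoding="utf-8") as f:
    text = f.read().lower()
words = re.findall(r"[a-z]+", text)   # list of tokenized words

# Vocabulary: lets us check whether a spelling exists in the corpus
vocab = set(words)

# word_probability: relative frequency of each word in the corpus
counts = Counter(words)
total = sum(counts.values())
word_probability = {w: c / total for w, c in counts.items()}

print("the" in vocab)                     # True if 'the' appears in the corpus
print(word_probability.get("the", 0.0))   # its relative frequency
```

With the word list in place, the edit() function below generates candidate corrections: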
def edit(word):
    # Collect every string that is one edit away from `word`
    return set(insert(word) + delete(word) + swap(word) + replace(word))
- It calls your four helper functions:
- insert(word)
- delete(word)
- swap(word)
- replace(word)
Each one returns a list of new words created by:
- inserting one character
- deleting one character
- swapping two adjacent characters
- replacing one character
- Combining them:
- insert(...) + delete(...) + ... creates a big list of all variations
- Converting to a set:
  * Removes duplicates
  * Gives you unique words that are one change away (a sketch of the four helpers follows this list)
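Here is one possible way to write those four helpers. This is a minimal sketch assuming lowercase English letters only; your own implementations may differ in details:

```python
import string

letters = string.ascii_lowercase

def insert(word):
    # Insert one letter at every position: 26 * (len(word) + 1) variants
    return [word[:i] + c + word[i:] for i in range(len(word) + 1) for c in letters]

def delete(word):
    # Drop one character at every position
    return [word[:i] + word[i+1:] for i in range(len(word))]

def swap(word):
    # Swap each pair of adjacent characters
    return [word[:i] + word[i+1] + word[i] + word[i+2:] for i in range(len(word) - 1)]

def replace(word):
    # Replace each character with every letter of the alphabet
    return [word[:i] + c + word[i+1:] for i in range(len(word)) for c in letters]
```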
Example: edit("lve")

- insert("lve") might return: 'alve', 'blve', ..., 'lave', 'love', ..., 'lvez' (104 total)
- delete("lve") might return: 've', 'le', 'lv', ...
- swap("lve") might return: 'vle', 'lev'
- replace("lve") might return: 'ave', 'bve', ..., 'lve', 'lze' (130 total)

Then the combined set(...) removes overlaps.
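With the sketch helpers above you can try the example yourself. The exact totals depend on how each helper is written, so they may not match the counts quoted here exactly:

```python
candidates = edit("lve")

print(len(candidates))        # number of unique one-edit variants of 'lve'
print("love" in candidates)   # True: 'love' is one replacement away
print("vle" in candidates)    # True: produced by swapping adjacent characters
```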
In short, the edit() function:
- Simulates all the typos a human might make with one small mistake.
- Generates all possible "fixes" for a misspelled word.
- You then compare these "fixes" against your real dictionary (word_probability) to find the best suggestion, as sketched below.
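That selection step could look something like this. It is a minimal sketch: the function name `correct` and the fallback behaviour are assumptions, but it only uses the `edit()` and `word_probability` already described:

```python
def correct(word):
    # If the word is already in the dictionary, keep it as-is
    if word in word_probability:
        return word
    # Otherwise, keep only the one-edit candidates that are real words
    candidates = [w for w in edit(word) if w in word_probability]
    if not candidates:
        return word  # no suggestion found; return the input unchanged
    # Pick the candidate that occurs most often in the corpus
    return max(candidates, key=lambda w: word_probability[w])

print(correct("lve"))   # e.g. 'love' or 'live', whichever is more frequent
```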