Posted on Jul 2

Spell Checker-Predicting Correct Word-NLP-Part 2

    def spell_checker(word, count=5):
        output = []
        suggested_words = edit(word)
        for wrd in suggested_words:
            if wrd in word_probability.keys():
                output.append([wrd, word_probability[wrd]])
        return list(pd.DataFrame(output, columns=['word', 'prob']).sort_values(by='prob', ascending=False).head(count)['word'].values)

Let's break it down step by step.

 def spell_checker(word,count=5):

Defines a function called spell_checker.
word is the misspelled word you want to correct.
count=5 is the number of top suggestions you want to return (default = 5).

 output=[]

Initializes an empty list to store valid suggested words with their probabilities.

 suggested_words=edit(word)

Calls the edit() function which is defined earlier.

    def edit(word):
        return set(insert(word) + delete(word) + swap(word) + replace(word))

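This returns a set of every string that is one edit (an insertion, deletion, swap, or replacement) away from the input word. Many of these strings are not real words; the next step filters them out.
Example: for "lve", the set includes 'love', 'live', 'lave', ...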
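The four helpers that edit() combines are not shown in this part. As a rough illustration of how they could work, here is a minimal sketch; the helper bodies, the `_splits` function, and the lowercase-only `LETTERS` alphabet are assumptions made for this example and may differ from the versions defined earlier.

```python
import string

# Assumption: candidates are built from lowercase ASCII letters only.
LETTERS = string.ascii_lowercase

def _splits(word):
    # Every way to cut the word into a left and a right part.
    return [(word[:i], word[i:]) for i in range(len(word) + 1)]

def insert(word):
    # Add one letter at every position.
    return [left + c + right for left, right in _splits(word) for c in LETTERS]

def delete(word):
    # Drop one letter at every position.
    return [left + right[1:] for left, right in _splits(word) if right]

def swap(word):
    # Exchange each pair of adjacent letters.
    return [left + right[1] + right[0] + right[2:]
            for left, right in _splits(word) if len(right) > 1]

def replace(word):
    # Substitute each letter with every letter of the alphabet.
    return [left + c + right[1:]
            for left, right in _splits(word) if right for c in LETTERS]

def edit(word):
    return set(insert(word) + delete(word) + swap(word) + replace(word))

candidates = edit('lve')
print('love' in candidates, 'live' in candidates)  # True True
print('lvx' in candidates)                         # True: non-words are included too
```

The last line shows why the dictionary check in the next step is needed: most generated candidates are not real words.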

    for wrd in suggested_words:
        if wrd in word_probability.keys():
            output.append([wrd, word_probability[wrd]])

What happens here:

Loops through each wrd in the set of suggested words.
Checks whether wrd is a real word, i.e. whether it is a key in word_probability, which is built from the big.txt corpus.
If it is, appends the pair [wrd, probability] to the output list.

Example:

If 'love' is in the corpus and has probability 0.0042:

 Output: [['love', 0.0042], ['live', 0.0021], ...]
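To make the filtering step concrete, here is a small self-contained sketch. It assumes word_probability maps each word to its relative frequency in the corpus; the one-line `corpus` string is a toy stand-in for big.txt, and the way the dictionary is built here is an assumption, not necessarily how it was built earlier in the series.

```python
import re
from collections import Counter

# Toy stand-in for the text of big.txt (hypothetical sample, not the real corpus).
corpus = "I love to live and I love to learn"

words = re.findall(r"[a-z]+", corpus.lower())
word_counts = Counter(words)
total = sum(word_counts.values())

# Assumption: probability = relative frequency of the word in the corpus.
word_probability = {w: n / total for w, n in word_counts.items()}

# A few candidates of the kind edit('lve') produces; 'lvx' is not a real word.
suggested_words = {'love', 'live', 'lvx'}

output = []
for wrd in suggested_words:
    if wrd in word_probability:  # equivalent to `in word_probability.keys()`
        output.append([wrd, word_probability[wrd]])

print(sorted(output))  # 'lvx' is dropped; 'live' and 'love' remain
```

Testing membership on the dictionary directly does the same job as calling .keys() and is the more common idiom.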

 return list(pd.DataFrame(output, columns=['word', 'prob']).sort_values(by='prob', ascending=False).head(count)['word'].values)

pd.DataFrame(output, columns=['word', 'prob'])

Converts the list of [word, prob] pairs into a pandas DataFrame:

       word    prob
    0  love  0.0042
    1  live  0.0021

.sort_values(by='prob', ascending=False)

Sorts the DataFrame so the most frequent (most likely correct) words come first.

.head(count)

Selects the top count words (default = 5).

['word'].values and list(...)

Extracts just the 'word' column and converts it to a Python list.
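The sort, truncate, and extract steps do not strictly need a DataFrame. As a point of comparison, here is a minimal sketch of the same ranking with only built-in functions; the probabilities for 'love' and 'live' are the example values used above, and the 'lave' value is made up for illustration.

```python
# Example [word, prob] pairs; the 'lave' value is illustrative only.
output = [['live', 0.0021], ['love', 0.0042], ['lave', 0.0001]]
count = 2

# Sort pairs by probability, highest first, then keep the first `count` words.
ranked = sorted(output, key=lambda pair: pair[1], reverse=True)
top_words = [word for word, _ in ranked[:count]]

print(top_words)  # ['love', 'live']
```

For a handful of candidates either approach works; the pandas version is convenient if you also want to inspect the probabilities as a table.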

 spell_checker('famili')

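Because edit() only generates candidates that are one edit away, the result can contain only words such as family (replace the final i with y). Words like familiar, fail, facility, and famine are two or more edits away from 'famili', so this version cannot suggest them. If family is in the corpus, the result would likely be:

 ['family']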
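One way to check which words a one-edit search can reach from 'famili' is to compute edit distances directly. The following is a minimal sketch using a plain Levenshtein distance (inserts, deletes, and replaces); the `levenshtein` function is introduced here for illustration and is not part of the original code. Unlike edit(), it counts a swap of adjacent letters as two edits.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, and replaces turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # replace or match
        previous = current
    return previous[-1]

for candidate in ['family', 'familiar', 'fail', 'facility', 'famine']:
    print(candidate, levenshtein('famili', candidate))
# family 1
# familiar 2
# fail 2
# facility 3
# famine 2
```

Only candidates at distance 1 can come out of a single call to edit(); reaching the others would require applying edit() to its own results to cover two or more edits.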
