Python - Lemmatization Approaches with Examples
Last Updated : 23 Jul, 2025
Lemmatization is the process of reducing words to their base or dictionary form (lemma). Unlike stemming, which simply cuts off word endings, it uses a full vocabulary and linguistic rules to ensure accurate word reduction. For example:
- meeting → meet
- was → be
- mice → mouse
Let's explore several popular Python libraries for performing lemmatization.
1. WordNet
WordNet is a large lexical database of the English language and one of the earliest resources used for lemmatization in Python. It groups words into sets of synonyms (synsets) that are related to each other. WordNet is available through the NLTK (Natural Language Toolkit) library and is widely used for text preprocessing tasks.
For installation run the following command:
!pip install nltk
Let's see an example:
Python
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()
word = "meeting"
lemma = lemmatizer.lemmatize(word, pos='v')
print(f"Lemmatized Word: {lemma}")
Output:
Lemmatized Word: meet
2. WordNet with POS Tagging
By default, the WordNet lemmatizer assumes every word is a noun. For more accurate lemmatization, especially of verbs and adjectives, Part-of-Speech (POS) tagging is required. POS tagging tells the lemmatizer whether a word is a noun, verb or adjective. Let's see an example to understand this better:
Python
import nltk
from nltk import pos_tag
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()
sentence = "The dogs are running"
tokens = word_tokenize(sentence)
tagged = pos_tag(tokens)

# Treat verb tags (VB, VBD, VBG, ...) as verbs; everything else as nouns
lemmatized_words = [
    lemmatizer.lemmatize(word, pos='v' if tag.startswith('V') else 'n')
    for word, tag in tagged
]
print(lemmatized_words)
Output:
['The', 'dog', 'be', 'run']
3. TextBlob
TextBlob is a simpler library built on top of NLTK and Pattern. It provides a convenient API to perform common NLP tasks like lemmatization. TextBlob’s lemmatization is easy to use and requires minimal setup.
For installation run the following command:
!pip install textblob
!python -m textblob.download_corpora
Let's see an example:
Python
from textblob import Word

word = Word("running")
print(word.lemmatize("v"))
Output:
run
4. TextBlob with POS Tagging
Using POS tagging with TextBlob ensures that words are lemmatized accurately. By default, TextBlob treats every word as a noun, so for verbs and adjectives POS tagging can significantly improve lemmatization accuracy. Let's see an example:
Python
from textblob import TextBlob

sentence = "The dogs barking"
blob = TextBlob(sentence)

# Lemmatize as a verb only where the POS tag marks a verb (VB, VBD, VBG, ...)
lemmatized_words = [
    word.lemmatize('v') if tag.startswith('VB') else word
    for word, tag in blob.tags
]
print(f"Lemmatized Sentence: {' '.join(lemmatized_words)}")
Output:
Lemmatized Sentence: The dogs bark
5. SpaCy
spaCy is one of the most powerful NLP libraries in Python, known for its speed and ease of use. It provides pre-trained models for tokenization, lemmatization, POS tagging and more. spaCy's lemmatization is highly accurate and works well with complex sentence structures.
For installation run the following command:
pip install spacy
python -m spacy download en_core_web_sm
Let's see an example:
Python
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp("The cats are sitting")

for token in doc:
    print(token.text, token.lemma_)
Output:
The the
cats cat
are be
sitting sit
6. Gensim
Gensim is widely used for topic modeling and document similarity over large text corpora. Its own lemmatization utility relied on the Pattern library and was removed in Gensim 4.0, so the usual approach today is to tokenize with gensim and lemmatize the tokens with NLTK's WordNetLemmatizer. This combination is well suited to large-scale text processing.
Installation:
!pip install gensim nltk
Let's see an example:
Python
import nltk
from nltk.stem import WordNetLemmatizer
from gensim.utils import simple_preprocess

nltk.download('wordnet')
nltk.download('omw-1.4')

lemmatizer = WordNetLemmatizer()
text = "The cats are running and the dogs were barking."

# simple_preprocess lowercases, strips punctuation and tokenizes
tokens = simple_preprocess(text)
lemmatized_tokens = [lemmatizer.lemmatize(word) for word in tokens]

print("Original Tokens:", tokens)
print("Lemmatized Tokens:", lemmatized_tokens)
Output:
Original Tokens: ['the', 'cats', 'are', 'running', 'and', 'the', 'dogs', 'were', 'barking']
Lemmatized Tokens: ['the', 'cat', 'are', 'running', 'and', 'the', 'dog', 'were', 'barking']
Note that without POS information the lemmatizer treats every token as a noun, so verb forms such as "running" and "were" pass through unchanged.
With these techniques, lemmatization can be performed easily in Python and applied to real-world text-processing projects.