Lemmatization


Description

Lemmatization maps a word to its lemma (its dictionary form). For instance, the word "was" is mapped to "be".

Name | Annotator class name | Requirement | Generated Annotation | Description
lemma | MorphaAnnotator | TokensAnnotation, SentencesAnnotation, PartOfSpeechAnnotation | LemmaAnnotation | Determine lemmas for each token.
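The table lists PartOfSpeechAnnotation as a requirement because the right lemma can depend on the tag: "saw" as a past-tense verb lemmatizes to "see", but as a noun it stays "saw". The stdlib-only Java sketch below illustrates this with a toy POS-aware lemmatizer. It uses a tiny hand-picked exception table keyed on (word, coarse tag) plus one naive plural rule. The class `ToyLemmatizer` and its entries are hypothetical, and the Penn Treebank tags for the example sentence are assumed. This is not how MorphaAnnotator is implemented.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy, POS-aware lemmatizer for illustration only (hypothetical class).
// It is NOT CoreNLP's MorphaAnnotator: just a tiny lookup table plus one suffix rule.
public class ToyLemmatizer {

  // Irregular forms keyed on "lowercased word/coarse POS" (hand-picked, hypothetical entries).
  private static final Map<String, String> IRREGULAR = new HashMap<>();

  static {
    IRREGULAR.put("was/V", "be");
    IRREGULAR.put("were/V", "be");
    IRREGULAR.put("born/V", "bear");
    IRREGULAR.put("saw/V", "see");
    IRREGULAR.put("mice/N", "mouse");
  }

  // Collapse a Penn Treebank tag into a coarse class: V (verb), N (noun), O (other).
  public static String coarse(String tag) {
    if (tag.startsWith("VB")) return "V";
    if (tag.startsWith("NN")) return "N";
    return "O";
  }

  public static String lemma(String word, String tag) {
    // Proper nouns (NNP, NNPS) are returned unchanged, casing included.
    if (tag.startsWith("NNP")) return word;
    String lower = word.toLowerCase(Locale.ROOT);
    String irregular = IRREGULAR.get(lower + "/" + coarse(tag));
    if (irregular != null) return irregular;
    // Naive regular-plural rule: "cats"/NNS -> "cat" (wrong for words like "glasses").
    if (tag.equals("NNS") && lower.endsWith("s") && lower.length() > 1) {
      return lower.substring(0, lower.length() - 1);
    }
    // Otherwise the lowercased word is treated as its own lemma.
    return lower;
  }

  public static void main(String[] args) {
    // Tokens of the example sentence with assumed Penn Treebank tags.
    String[][] tagged = {
      {"Marie", "NNP"}, {"was", "VBD"}, {"born", "VBN"},
      {"in", "IN"}, {"Paris", "NNP"}, {".", "."}
    };
    for (String[] wt : tagged) {
      System.out.println(wt[0] + "\t" + lemma(wt[0], wt[1]));
    }
    // Same surface form, different tag, different lemma.
    System.out.println("saw/VBD -> " + lemma("saw", "VBD")); // see
    System.out.println("saw/NN  -> " + lemma("saw", "NN"));  // saw
  }
}
```

For "Marie was born in Paris." the toy yields Marie/Marie, was/be, born/bear, in/in, Paris/Paris and ./. Without the tag, a lookup keyed only on the word could not tell the verb "saw" from the noun.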

Lemmatization From The Command Line

This command will find lemmas for the input text:

java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,pos,lemma -file input.txt 

Other output formats, selected with the -outputFormat option, include conllu, conll, json, and serialized.
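In the CoNLL-U format (conllu), each token is one line of ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The lemma is the third column, and "_" marks an unspecified value. The hand-rolled Java sketch below builds such lines from words, lemmas and Penn Treebank tags to show where the lemma lands. The class `ConlluLemmaSketch` is hypothetical and is not CoreNLP's conllu writer. Placing the Penn Treebank tag in the XPOS column is an assumption. CoreNLP's own conllu output may fill more columns, depending on which annotators run.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-rolled CoNLL-U sketch (hypothetical class), not CoreNLP's conllu writer.
public class ConlluLemmaSketch {

  // Each token line has 10 tab-separated columns:
  // ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC; "_" marks an unspecified value.
  public static List<String> toConllu(String[] words, String[] lemmas, String[] xpos) {
    if (words.length != lemmas.length || words.length != xpos.length) {
      throw new IllegalArgumentException("words, lemmas and xpos must have the same length");
    }
    List<String> lines = new ArrayList<>();
    for (int i = 0; i < words.length; i++) {
      String[] cols = {
        Integer.toString(i + 1), words[i], lemmas[i], "_", xpos[i], "_", "_", "_", "_", "_"
      };
      lines.add(String.join("\t", cols));
    }
    lines.add(""); // a blank line ends the sentence
    return lines;
  }

  public static void main(String[] args) {
    String[] words = {"Marie", "was", "born", "in", "Paris", "."};
    String[] lemmas = {"Marie", "be", "bear", "in", "Paris", "."};
    String[] xpos = {"NNP", "VBD", "VBN", "IN", "NNP", "."};
    for (String line : toConllu(words, lemmas, xpos)) {
      System.out.println(line);
    }
  }
}
```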

Lemmatization From Java

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;

import java.util.*;


public class LemmatizingExample {

  public static String text = "Marie was born in Paris.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,pos,lemma");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = pipeline.processToCoreDocument(text);
    // display tokens
    for (CoreLabel tok : document.tokens()) {
      System.out.println(String.format("%s\t%s", tok.word(), tok.lemma()));
    }
  }

}

This demo code prints each token followed by its lemma, separated by a tab:

Marie	Marie
was	be
born	bear
in	in
Paris	Paris
.	.

Copyright © 2020 Stanford NLP Group.