Lemmatization

Description

Lemmatization maps a word to its lemma (dictionary form). For instance, the word "was" is mapped to the lemma "be".

Name	Annotator class name	Requirement	Generated Annotation	Description
lemma	MorphaAnnotator	TokensAnnotation, SentencesAnnotation, PartOfSpeechAnnotation	LemmaAnnotation	Determine lemmas for each token.
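
The PartOfSpeechAnnotation requirement exists because the lemma of a word can depend on its part of speech: "saw" as a past-tense verb has the lemma "see", while "saw" as a noun stays "saw". The sketch below illustrates that dependency with a small hand-written lookup table. It is a toy example only: the class name ToyLemmaLookup, the table contents, and the "word/TAG" key format are all hypothetical, and this is not how MorphaAnnotator is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration only: a hand-written table, NOT the algorithm used by MorphaAnnotator.
final class ToyLemmaLookup {

    // Hypothetical table keyed by "word/TAG", using Penn Treebank part-of-speech tags.
    private static final Map<String, String> TABLE = new HashMap<>();

    static {
        TABLE.put("was/VBD", "be");
        TABLE.put("born/VBN", "bear");
        TABLE.put("saw/VBD", "see"); // "saw" as a past-tense verb
        TABLE.put("saw/NN", "saw");  // "saw" as a noun (the tool)
    }

    // Returns the lemma for a (word, tag) pair, falling back to the word itself
    // when the pair is not in the table.
    static String lemma(String word, String posTag) {
        return TABLE.getOrDefault(word + "/" + posTag, word);
    }

    public static void main(String[] args) {
        System.out.println(lemma("saw", "VBD"));   // see
        System.out.println(lemma("saw", "NN"));    // saw
        System.out.println(lemma("Paris", "NNP")); // Paris
    }
}
```

The same surface form yields different lemmas under different tags, which is why the pos annotator has to run before lemma in the pipeline.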

Lemmatization From The Command Line

This command will find lemmas for the input text:

java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,pos,lemma -file input.txt 

Other output formats, selected with the -outputFormat option, include conllu, conll, json, and serialized.

Lemmatization From Java

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;

import java.util.*;

public class LemmatizingExample {

  public static String text = "Marie was born in Paris.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,pos,lemma");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = pipeline.processToCoreDocument(text);
    // display tokens
    for (CoreLabel tok : document.tokens()) {
      System.out.println(String.format("%s\t%s", tok.word(), tok.lemma()));
    }
  }

}

This demo code prints each token followed by its lemma, separated by a tab:

Marie	Marie
was	be
born	bear
in	in
Paris	Paris
.	.
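
Because the demo prints one tab-separated word and lemma per line, its output is easy to post-process. As a rough sketch, the following standalone class reads that text and keeps only the tokens whose lemma differs from the surface form. The class name LemmaOutputReader and the method changedTokens are hypothetical helpers written for this illustration; they are not part of CoreNLP, and the input string is simply the demo output shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of CoreNLP.
final class LemmaOutputReader {

    // Returns "word -> lemma" for every line whose lemma differs from the word.
    static List<String> changedTokens(String output) {
        List<String> changed = new ArrayList<>();
        for (String line : output.split("\n")) {
            String[] cols = line.split("\t");
            if (cols.length != 2) {
                continue; // skip blank or malformed lines
            }
            if (!cols[0].equals(cols[1])) {
                changed.add(cols[0] + " -> " + cols[1]);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // The word/lemma lines printed by the demo for "Marie was born in Paris."
        String output = "Marie\tMarie\nwas\tbe\nborn\tbear\nin\tin\nParis\tParis\n.\t.\n";
        System.out.println(changedTokens(output)); // [was -> be, born -> bear]
    }
}
```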
