Lemmatization
Description
Lemmatization maps a word to its lemma (dictionary form). For instance, the word "was" is mapped to the lemma "be".
| Name | Annotator class name | Requirement | Generated Annotation | Description |
|---|---|---|---|---|
| lemma | MorphaAnnotator | TokensAnnotation, SentencesAnnotation, PartOfSpeechAnnotation | LemmaAnnotation | Determine lemmas for each token. |
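
The PartOfSpeechAnnotation requirement reflects the fact that a lemma can depend on the part of speech: "saw" as a past-tense verb has the lemma "see", while "saw" as a noun stays "saw". The sketch below illustrates only that idea, using a hand-written lookup table keyed on word and Penn Treebank tag. The class `ToyLemmaLookup`, its table entries, and the fallback to the surface form are illustrative assumptions; this is not how MorphaAnnotator is implemented.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy illustration only: a hand-written table, not CoreNLP's implementation.
class ToyLemmaLookup {

    // Keys are "word/POS" with Penn Treebank style tags (hypothetical sample data).
    private static final Map<String, String> TABLE = new HashMap<>();

    static {
        TABLE.put("was/VBD", "be");
        TABLE.put("born/VBN", "bear");
        TABLE.put("saw/VBD", "see"); // "saw" as a past-tense verb
        TABLE.put("saw/NN", "saw");  // "saw" as a noun (the tool)
    }

    // Returns the lemma for a (word, tag) pair, falling back to the word itself.
    static String lemma(String word, String posTag) {
        String key = word.toLowerCase(Locale.ROOT) + "/" + posTag;
        return TABLE.getOrDefault(key, word);
    }

    public static void main(String[] args) {
        System.out.println(lemma("was", "VBD"));   // be
        System.out.println(lemma("saw", "VBD"));   // see
        System.out.println(lemma("saw", "NN"));    // saw
        System.out.println(lemma("Paris", "NNP")); // Paris (not in the table)
    }
}
```

The same surface form yields different lemmas depending on the tag, which is why a lemmatizer of this kind needs part-of-speech tags as input.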
Lemmatization From The Command Line
This command will find lemmas for the input text:

    java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,pos,lemma -file input.txt

Other output formats, selected with the -outputFormat option, include conllu, conll, json, and serialized.
Lemmatization From Java

    package edu.stanford.nlp.examples;

    import edu.stanford.nlp.ling.*;
    import edu.stanford.nlp.pipeline.*;

    import java.util.*;

    public class LemmatizingExample {

        public static String text = "Marie was born in Paris.";

        public static void main(String[] args) {
            // set up pipeline properties
            Properties props = new Properties();
            // set the list of annotators to run
            props.setProperty("annotators", "tokenize,pos,lemma");
            // build pipeline
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            // create a document object
            CoreDocument document = pipeline.processToCoreDocument(text);
            // display tokens
            for (CoreLabel tok : document.tokens()) {
                System.out.println(String.format("%s\t%s", tok.word(), tok.lemma()));
            }
        }
    }

This demo code will print out the lemmas for each token:

    Marie	Marie
    was	be
    born	bear
    in	in
    Paris	Paris
    .	.
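
Because the demo prints each token as a word, a tab, and a lemma on its own line, the output is easy to post-process. As a minimal sketch that needs only the Java standard library, the following lists the tokens whose lemma differs from the surface form. The class `ChangedLemmas` and its `changed` method are hypothetical names introduced for illustration, and the input string is the demo output reproduced by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for post-processing the demo's tab-separated output.
class ChangedLemmas {

    // Returns "word -> lemma" entries for lines whose lemma differs from the word.
    static List<String> changed(String output) {
        List<String> result = new ArrayList<>();
        for (String line : output.split("\\R")) { // \R matches any line break
            String[] cols = line.split("\t");
            if (cols.length == 2 && !cols[0].equals(cols[1])) {
                result.add(cols[0] + " -> " + cols[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String output = "Marie\tMarie\nwas\tbe\nborn\tbear\nin\tin\nParis\tParis\n.\t.\n";
        System.out.println(changed(output)); // [was -> be, born -> bear]
    }
}
```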