This project collects a number of core libraries for Natural Language Processing (NLP) developed by the Cognitive Computation Group.
Depending on what you are after, follow one of the items:
- If you want to annotate your raw text (i.e. no need to open the annotator boxes to retrain them) you should look into the pipeline.
- If you want to train and test an NLP annotator (i.e. you want to open an annotator box), see the list of components below and choose the desired one.
- If you want to read a corpus you should look into the corpus-readers module.
- If you want to do feature extraction, you should look into the edison module.
Each library contains a detailed README with instructions on how to use it. In addition, the Javadoc for the whole project is available here.
| Module | Description |
|---|---|
| nlp-pipeline | Provides an end-to-end NLP processing application that runs a variety of NLP tools on input text. |
| core-utilities | Provides a set of NLP-friendly data structures and a number of NLP-related utilities that support writing NLP applications, running experiments, etc. |
| corpusreaders | Provides classes to read documents from corpora into core-utilities data structures. |
| curator | Supports use of CogComp NLP Curator, a tool to run NLP applications as services. |
| edison | A library for feature extraction from core-utilities data structures. |
| lemmatizer | An application that uses WordNet and simple rules to find the root forms of words in plain text. |
| tokenizer | An application that identifies sentence and word boundaries in plain text. |
| transliteration | An application that transliterates names between different scripts. |
| pos | An application that identifies the part of speech (e.g. verb + tense, noun + number) of each word in plain text. |
| ner | An application that identifies named entities in plain text according to two different sets of categories. |
| md | An application that identifies entity mentions in plain text. |
| relation-extraction | An application that identifies entity mentions, then identifies relation pairs among the detected mentions. |
| quantifier | This tool detects mentions of quantities in the text and normalizes them to a standard form. |
| inference | A suite of unified wrappers for a set of optimization libraries, as well as some basic approximate solvers. |
| depparse | An application that identifies the dependency parse tree of a sentence. |
| verbsense | This system addresses the verb sense disambiguation (VSD) problem for English. |
| prepsrl | An application that identifies semantic relations expressed by prepositions and develops statistical learning models for predicting the relations. |
| commasrl | This software extracts relations that commas participate in. |
| similarity | This software compares objects (especially Strings) and returns a score indicating how similar they are. |
| temporal-normalizer | A temporal extractor and normalizer. |
| dataless-classifier | Classifies text into a user-specified label hierarchy from just the textual label descriptions. |
| external-annotators | A collection of useful external annotators. |
- Questions? Have a look at our FAQs.
To include one of the modules in your Maven project, add the following snippet with the `#modulename#` and `#version#` entries replaced with the relevant module name and the version listed in this project's pom.xml file. Note that you also need to add the `<repository>` element for the CogComp Maven repository to the `<repositories>` element.
```xml
<dependencies>
    ...
    <dependency>
        <groupId>edu.illinois.cs.cogcomp</groupId>
        <artifactId>#modulename#</artifactId>
        <version>#version#</version>
    </dependency>
    ...
</dependencies>
...
<repositories>
    <repository>
        <id>CogCompSoftware</id>
        <name>CogCompSoftware</name>
        <url>http://cogcomp.org/m2repo/</url>
    </repository>
</repositories>
```

If you are using the framework, please cite our paper:
```bibtex
@inproceedings{2018_lrec_cogcompnlp,
    author = {Daniel Khashabi and Mark Sammons and Ben Zhou and Tom Redman and Christos Christodoulopoulos and Vivek Srikumar and Nicholas Rizzolo and Lev Ratinov and Guanheng Luo and Quang Do and Chen-Tse Tsai and Subhro Roy and Stephen Mayhew and Zhili Feng and John Wieting and Xiaodong Yu and Yangqiu Song and Shashank Gupta and Shyam Upadhyay and Naveen Arivazhagan and Qiang Ning and Shaoshi Ling and Dan Roth},
    title = {CogCompNLP: Your Swiss Army Knife for NLP},
    booktitle = {11th Language Resources and Evaluation Conference},
    year = {2018},
    url = "http://cogcomp.org/papers/2018_lrec_cogcompnlp.pdf",
}
```