WIP: Adding SRL #215
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| Version 3.0.73 | ||
| Moved to the super-project and changed the versioning to the super-project versioning | ||
| | ||
| Version 5.1.12 | ||
| Added Windows support (including access to a non-Gurobi solver) | ||
| | ||
| Version 5.1.4 | ||
| Switched entirely to illinois-sl for structured prediction (removed JLIS traces) | ||
| Using the latest AnnotatorService from illinois-core-utilities for both Curator & pipeline annotation | ||
| Major cleaning up | ||
| | ||
| Version 5.1 | ||
| Added JUnit tests | ||
| Removed unnecessary dependencies | ||
| Switched to illinois-nlp-pipeline-0.1.2 | ||
| Minor fixes | ||
| | ||
| Version 5.0 | ||
| Standalone SRL using illinois-nlp-pipeline | ||
| | ||
| Version 4.1.1 | ||
| Switched to edison-0.7.1 and LBJava-1.0 | ||
| Added dependency to illinois-common-resources | ||
| | ||
| Version 4.1 | ||
| Various bugfixes | ||
| | ||
| Version 4.0.2 | ||
| Updated inference dependency to latest version and modified inference | ||
| code accordingly. | ||
| | ||
| Version 4.0.1 | ||
| Removed duplicate code from JLIS-core and moved to IllinoisSL. Minor edits. | ||
| | ||
| Version 4.0 | ||
| A complete rewrite of the SRL. Includes predicate and sense detectors, | ||
| new constraints and a memory footprint of only 3GB. | ||
| | ||
| Version 3.0.3 | ||
| Minor bugfixes. Uses edison v0.2.9 | ||
| | ||
| Version 3.0.2 | ||
| Added an option to trim leading prepositions from arguments. | ||
| | ||
| Revamped the training mechanism to train using LBJ's BatchTrainer in | ||
| the code. This allows manual lexicon handling, which reduces the | ||
| memory requirements by nearly 40 percent. | ||
| | ||
| Version 3.0.1 | ||
| Minor bugfix | ||
| | ||
| Version 3.0 | ||
| A complete Java-based re-implementation of the Illinois SRL from | ||
| Punyakanok 2008. This version uses LBJ to train classifiers and | ||
| for performing inference with a home-brewed beam search. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| # illinois-srl: Semantic Role Labeler | ||
| | ||
| ### Running | ||
| You can use the **illinois-srl** system in either *interactive* or *annotator* mode. | ||
| #### Interactive mode | ||
| In *interactive mode* the user can input a single piece of text and get back the output of both | ||
| the **Nom**inal and **Verb**al SRL systems in plain text. | ||
| | ||
| To run the system in *interactive mode* see the class `edu.illinois.cs.cogcomp.srl.SemanticRoleLabeler` | ||
| or simply execute the `run-interactive` script: | ||
| | ||
| For Linux: | ||
| ``` | ||
| scripts/run-interactive.sh | ||
| ``` | ||
| | ||
| For Windows: | ||
| ``` | ||
| cd scripts | ||
| run-interactive-win.bat | ||
| ``` | ||
| | ||
| #### As an `Annotator` component | ||
| **illinois-srl** can also be used programmatically through the `SemanticRoleLabeler` class which implements CogComp's | ||
| [Annotator interface](http://cogcomp.cs.illinois.edu/software/doc/illinois-core-utilities/apidocs/edu/illinois/cs/cogcomp/core/datastructures/textannotation/Annotator.html). | ||
| | ||
| The main method is `getView(TextAnnotation)` inside `SemanticRoleLabeler`. This will add a new | ||
| [`PredicateArgumentView`](http://cogcomp.cs.illinois.edu/software/doc/illinois-core-utilities/apidocs/edu/illinois/cs/cogcomp/core/datastructures/textannotation/PredicateArgumentView.html) | ||
| for either **Nom**inal or **Verb**al SRL. | ||
| | ||
| ### Training | ||
| To train the SRL system you will require access to the [Propbank](https://verbs.colorado.edu/~mpalmer/projects/ace.html) | ||
| or [Nombank](http://nlp.cs.nyu.edu/meyers/NomBank.html) corpora. You need to set pointers to these in the | ||
| `config/srl-config.properties` file. | ||
| (To train the system with a non-Prop/Nombank corpus, you need to extend | ||
| [`AbstractSRLAnnotationReader`](http://cogcomp.cs.illinois.edu/software/doc/illinois-core-utilities/apidocs/edu/illinois/cs/cogcomp/nlp/corpusreaders/AbstractSRLAnnotationReader.html)) | ||
| | ||
| To perform the whole training/testing suite, run the `Main` class with parameters `<config-file> expt Verb|Nom true`. | ||
| This will: | ||
| | ||
| 1. Read and cache the datasets (train/test) | ||
| 2. Annotate each `TextAnnotation` with the required views | ||
| (here you can set the `useCurator` flag to false to use CogComp's standalone NLP pipeline) | ||
| 3. Pre-extract and cache the features for the classifiers | ||
| 4. Train the classifiers | ||
| 5. Evaluate on the (cached) test corpus | ||
| | ||
| **IMPORTANT** After training, make sure you comment out the pre-trained SRL model dependencies inside | ||
| `pom.xml` (lines 27-38). |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| # Available learning models: {L2LossSSVM, StructuredPerceptron} | ||
| LEARNING_MODEL = L2LossSSVM | ||
| | ||
| # Available solver types: {DCDSolver, ParallelDCDSolver, DEMIParallelDCDSolver} | ||
| L2_LOSS_SSVM_SOLVER_TYPE = ParallelDCDSolver | ||
| | ||
| NUMBER_OF_THREADS = 8 | ||
| | ||
| # Regularization parameter | ||
| C_FOR_STRUCTURE = 1.0 | ||
| | ||
| # Mini-batch for 'warm' start | ||
| TRAINMINI = true | ||
| TRAINMINI_SIZE = 10000 | ||
| | ||
| # Suppress optimality check | ||
| CHECK_INFERENCE_OPT = false | ||
| | ||
| # Number of training rounds | ||
| MAX_NUM_ITER = 100 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| ## Flags for whether to use different annotators | ||
| usePos = true | ||
| useLemma = true | ||
| useShallowParse = true | ||
| useNerConll = true | ||
| useNerOntonotes = false | ||
| useStanfordParse = true | ||
| useStanfordDep = true | ||
| useSrlVerb = false | ||
| useSrlNom = false | ||
| | ||
| ## Flags for the Stanford parser (for pre-processing) | ||
| # Max time per sentence (in milliseconds) | ||
| stanfordMaxTimePerSentence = 1000 | ||
| | ||
| # Max sentence length (will throw exception if larger) | ||
| stanfordParseMaxSentenceLength = 80 |
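
The `use*` flags above decide which annotators the pipeline runs. As a rough illustration only, the following sketch lists the enabled ones using plain `java.util.Properties`; the class `PipelineFlags` and its method are hypothetical and are not part of illinois-nlp-pipeline, which has its own configuration handling.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the pipeline: lists the annotators
// switched on by "use*" flags in a pipeline.properties-style text.
final class PipelineFlags {

    static Set<String> enabledAnnotators(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // TreeSet keeps the names sorted so the output is deterministic.
        Set<String> enabled = new TreeSet<>();
        for (String key : props.stringPropertyNames()) {
            // Properties keeps trailing whitespace in values, so trim before parsing.
            boolean on = Boolean.parseBoolean(props.getProperty(key).trim());
            if (key.startsWith("use") && on) {
                enabled.add(key.substring("use".length()));
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        String sample = "usePos = true\nuseLemma = true\nuseSrlVerb = false\n"
                + "stanfordMaxTimePerSentence = 1000\n";
        System.out.println(enabledAnnotators(sample));
    }
}
```

Keys that do not start with `use`, such as the Stanford parser limits, are ignored by this sketch.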
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| ## Illinois SRL Configuration ## | ||
| | ||
| # Whether to use the Illinois Curator to get the required annotations for training/testing | ||
| # If set to false, Illinois NLP pipeline will be used | ||
| UseCurator = false | ||
| | ||
| # The configuration of the Illinois NLP pipeline | ||
| PipelineConfig = config/pipeline.properties | ||
| | ||
| # The parser used to extract constituents and syntactic features | ||
| # Options are: Charniak, Berkeley, Stanford | ||
| # NB: Only Stanford can be used in standalone mode. | ||
| DefaultParser = Stanford | ||
| | ||
| # The configuration for the Structured learner | ||
| LearnerConfig = config/learner.properties | ||
| | ||
| # Num of threads for feat. ext. | ||
| NumFeatExtThreads = 10 | ||
| | ||
| # The ILP solver to use for the joint inference | ||
| # Options are: Gurobi, OJAlgo | ||
| ILPSolver = OJAlgo | ||
| | ||
| # The TextAnnotation caching mechanism to use | ||
| # Options are: MapDB, H2 | ||
| DatasetCache = MapDB | ||
| | ||
| ### Training corpora directories ### | ||
| # This is the directory of the merged (mrg) WSJ files | ||
| PennTreebankHome = /shared/corpora/corporaWeb/treebanks/eng/pennTreebank/treebank-3/parsed/mrg/wsj/ | ||
| PropbankHome = /shared/corpora/corporaWeb/treebanks/eng/propbank_1/data | ||
| NombankHome = /shared/corpora/corporaWeb/treebanks/eng/nombank/ | ||
| | ||
| # The directory of the sentence and pre-extracted features database (~5G of space required) | ||
| # Not used during test/working with pre-trained models | ||
| CacheDirectory = cache | ||
| | ||
| ModelsDirectory = models | ||
| | ||
| # Directory to output gold and predicted files for manual comparison | ||
| # Comment out for no output | ||
| OutputDirectory = srl-out |
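
Several keys in this file accept only the values listed in their comments. A minimal sketch of checking them up front, assuming the option lists in the comments are exhaustive and that "standalone mode" means `UseCurator = false`; the class `SrlConfigCheck` is hypothetical and does not reflect how the SRL code itself reads its configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical validator for an srl-config.properties-style text.
final class SrlConfigCheck {
    // Allowed values copied from the comments in the config file.
    static final List<String> PARSERS = Arrays.asList("Charniak", "Berkeley", "Stanford");
    static final List<String> SOLVERS = Arrays.asList("Gurobi", "OJAlgo");
    static final List<String> CACHES = Arrays.asList("MapDB", "H2");

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static String requireOneOf(Properties props, String key, List<String> allowed) {
        String value = props.getProperty(key);
        if (value == null || !allowed.contains(value.trim())) {
            throw new IllegalArgumentException(
                    key + " must be one of " + allowed + " but was " + value);
        }
        return value.trim();
    }

    static void validate(Properties props) {
        String parser = requireOneOf(props, "DefaultParser", PARSERS);
        requireOneOf(props, "ILPSolver", SOLVERS);
        requireOneOf(props, "DatasetCache", CACHES);
        boolean useCurator =
                Boolean.parseBoolean(props.getProperty("UseCurator", "false").trim());
        // Assumption: standalone mode corresponds to UseCurator = false,
        // in which case the config notes that only Stanford is usable.
        if (!useCurator && !parser.equals("Stanford")) {
            throw new IllegalArgumentException(
                    "DefaultParser must be Stanford when UseCurator = false");
        }
    }

    public static void main(String[] args) {
        validate(parse("UseCurator = false\nDefaultParser = Stanford\n"
                + "ILPSolver = OJAlgo\nDatasetCache = MapDB\n"));
        System.out.println("config OK");
    }
}
```

Failing fast on an unknown solver or cache name avoids discovering the typo only after the datasets have been read and cached.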
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,133 @@ | ||
| <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
| xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> | ||
| | ||
| <parent> | ||
| <artifactId>illinois-cogcomp-nlp</artifactId> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| <version>3.0.77</version> | ||
| </parent> | ||
| | ||
| <modelVersion>4.0.0</modelVersion> | ||
| <artifactId>illinois-srl</artifactId> | ||
| <packaging>jar</packaging> | ||
| <url>http://cogcomp.cs.illinois.edu</url> | ||
| | ||
| <properties> | ||
| <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> | ||
| <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding> | ||
| <cogcomp-nlp-pipeline-version>0.1.24</cogcomp-nlp-pipeline-version> | ||
| </properties> | ||
| | ||
| <dependencies> | ||
| <!-- Include the pre-trained SRL models for running SemanticRoleLabeler --> | ||
| <!-- Notice that the models need to match up to the minor version number --> | ||
| <dependency> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| <artifactId>illinois-srl-models</artifactId> | ||
| <classifier>verb-stanford</classifier> | ||
| <version>5.1</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| <artifactId>illinois-srl-models</artifactId> | ||
| <classifier>nom-stanford</classifier> | ||
| <version>5.1</version> | ||
| </dependency> | ||
| | ||
| <!--The Illinois pipeline can be used instead --> | ||
| <dependency> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| <artifactId>illinois-nlp-pipeline</artifactId> | ||
| <version>${cogcomp-nlp-pipeline-version}</version> | ||
| <exclusions> | ||
| <exclusion> | ||
| <artifactId>illinois-srl</artifactId> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| </exclusion> | ||
| </exclusions> | ||
| </dependency> | ||
| | ||
| <!-- The following 3 projects are now developed under illinois-cogcomp-nlp --> | ||
| <dependency> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| <artifactId>illinois-core-utilities</artifactId> | ||
| <version>3.0.77</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| <artifactId>illinois-curator</artifactId> | ||
| <version>3.0.77</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| <artifactId>illinois-edison</artifactId> | ||
| <version>3.0.77</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>com.gurobi</groupId> | ||
| <artifactId>gurobi</artifactId> | ||
| <version>6.5</version> | ||
| <optional>true</optional> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| <artifactId>illinois-common-resources</artifactId> | ||
| <classifier>illinoisSRL</classifier> | ||
| Review comment (Member): Note the use of the classifier in case this needs to be addressed (see #43) | ||
| Review comment (Member Author): huh? I am not clean on the | ||
| Review comment (Member): I was using the | ||
| <version>1.5</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| <artifactId>illinois-common-resources</artifactId> | ||
| <version>1.5</version> | ||
| Review comment (Member): Why do we need both the general | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| <artifactId>illinois-common-resources</artifactId> | ||
| <classifier>ner</classifier> | ||
| <version>1.5</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| <artifactId>illinois-sl-core</artifactId> | ||
| <version>1.0.3</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>commons-lang</groupId> | ||
| <artifactId>commons-lang</artifactId> | ||
| <version>2.6</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>com.h2database</groupId> | ||
| <artifactId>h2</artifactId> | ||
| <version>1.4.190</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>edu.illinois.cs.cogcomp</groupId> | ||
| <artifactId>illinois-inference</artifactId> | ||
| <version>0.6.0</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>org.tartarus</groupId> | ||
| <artifactId>snowball</artifactId> | ||
| <version>1.0</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>junit</groupId> | ||
| <artifactId>junit</artifactId> | ||
| <version>4.12</version> | ||
| <scope>test</scope> | ||
| </dependency> | ||
| </dependencies> | ||
| | ||
| <reporting> | ||
| <excludeDefaults>true</excludeDefaults> | ||
| <plugins> | ||
| <plugin> | ||
| <groupId>org.apache.maven.plugins</groupId> | ||
| <artifactId>maven-javadoc-plugin</artifactId> | ||
| <version>2.10.3</version> | ||
| </plugin> | ||
| </plugins> | ||
| </reporting> | ||
| | ||
| </project> | ||

Make sure to change the models version (after retraining)
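
The `pom.xml` comment says the models must match the code "up to the minor version number". One way to read that rule is sketched below: compare only the `major.minor` prefix of two dotted version strings. The class `VersionMatch` and the version strings in `main` are illustrative assumptions, not anything the build actually runs.

```java
// Hypothetical helper illustrating a "match up to the minor version" check.
final class VersionMatch {

    /** Returns the "major.minor" prefix of a dotted version, e.g. "5.1.12" gives "5.1". */
    static String majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected at least major.minor: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    /** True when both versions share the same major and minor components. */
    static boolean sameMinor(String a, String b) {
        return majorMinor(a).equals(majorMinor(b));
    }

    public static void main(String[] args) {
        System.out.println(sameMinor("5.1.12", "5.1"));
        System.out.println(sameMinor("5.1.4", "5.0"));
    }
}
```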