Commit 7216d03

Merge branch 'v0.4.0'
2 parents 6a44aff + 9e028ec commit 7216d03

File tree: 183 files changed (+28712 / -12064 lines)

.dockerignore

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+**/.git
+**/.gitignore
+**/*.md
+**/*~
+.dockerignore
+docker
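
These new rules shrink the Docker build context: version-control metadata, Markdown docs, editor backups, the ignore file itself and the `docker/` directory are no longer sent to the daemon. A minimal sketch of how this is typically exercised; the image tag and Dockerfile location below are illustrative assumptions, not part of this commit.

```bash
# Hypothetical build command; the tag and Dockerfile path are assumptions.
cd transition-amr-parser
# With the new .dockerignore, **/.git, **/*.md, *~ backups, .dockerignore
# itself and docker/ are excluded from the context sent to the Docker daemon.
docker build -t transition-amr-parser:dev -f docker/Dockerfile .
```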

.gitignore

Lines changed: 20 additions & 0 deletions
@@ -9,12 +9,24 @@ set_environment.sh
 *.wiki/
 
 # data
+*.zip
 DATA*
+*oracles/
+EXP*
+checkpoints*
+amr_corpus*
+#!EXPR
+*/gigaword_ref.txt
+file.amr
 
 #
 preprocess/jamr
 preprocess/kevin
 
+# run scripts logs
+logs*
+*.log
+
 # external tools
 smatch*
 amr-evaluation/
@@ -24,21 +36,28 @@ fairseq-*
 .python-version
 venv*/
 cenv*/
+amr0.4_ody/
+amr0.4_o8/
+amr0.4_draft/
 
 # debug
 PROGRESS
 *.lprof
 debug*
 tmp*
 TMP*
+PROGRESS
 
 # other
 __pycache__/
+*.ipynb_checkpoints/
 transition_amr_parser.egg-info/
 # assumed used to store models
 models/
+!fairseq_ext/models
 # assumed where data stored
 data/
+!fairseq_ext/data/
 
 # python package
 dist/
@@ -50,3 +69,4 @@ jbsub_logs/
 # vim
 .vim/
 *.swp
+*~
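
The new ignore/re-include pairs (`models/` ignored globally but `!fairseq_ext/models` kept, and likewise `data/` vs `!fairseq_ext/data/`) can be verified with `git check-ignore`. A small sketch; the file paths are hypothetical examples, not files in the repository.

```bash
# -v prints the .gitignore line and the rule that applies to each path.
# The paths below are hypothetical examples.
git check-ignore -v checkpoints_run1/model.pt       # matched by "checkpoints*"
git check-ignore -v logs_train/epoch1.log           # matched by "logs*" / "*.log"
git check-ignore -v models/best.pt                  # matched by "models/"
git check-ignore -v fairseq_ext/models/__init__.py  # not ignored: re-included by "!fairseq_ext/models"
```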

README.md

Lines changed: 40 additions & 65 deletions
@@ -1,119 +1,94 @@
 Transition-based AMR Parser
 ============================
 
-Transition-based parser for Abstract Meaning Representation (AMR) in Pytorch. The code includes two fundamental components.
-
-1. A State machine and oracle transforming the sequence-to-graph task into a sequence-to-sequence problem. This follows the AMR oracles in [(Ballesteros and Al-Onaizan 2017)](https://arxiv.org/abs/1707.07755v1) with improvements from [(Naseem et al 2019)](https://arxiv.org/abs/1905.13370) and [(Fernandez Astudillo et al 2020)](https://openreview.net/pdf?id=b36spsuUAde)
-
-2. The stack-Transformer [(Fernandez Astudillo et al 2020)](https://openreview.net/pdf?id=b36spsuUAde). A sequence to sequence model that also encodes stack and buffer state of the parser into its attention heads.
-
-Current version is `0.3.3` and yields `80.5` Smatch on the AMR2.0 test-set using the default stack-Transformer configuration. Aside from listed [contributors](https://github.com/IBM/transition-amr-parser/graphs/contributors), the initial commit was developed by Miguel Ballesteros and Austin Blodgett while at IBM.
+Transition-based parser for Abstract Meaning Representation (AMR) in Pytorch. Current version (`v0.4.0`) implements the `action-pointer` model [(Zhou et al 2021)](https://openreview.net/forum?id=X9KK-SCmKWn). For the `stack-Transformer` model [(Fernandez Astudillo et al 2020)](https://arxiv.org/abs/2010.10669) checkout `v0.3.3`. Aside from listed [contributors](https://github.com/IBM/transition-amr-parser/graphs/contributors), the initial commit was developed by Miguel Ballesteros and Austin Blodgett while at IBM.
 
 ## IBM Internal Features
 
 Check [Parsing Services](https://github.ibm.com/mnlp/transition-amr-parser/wiki/Parsing-Services) for the endpoint URLs and Docker instructions. If you have acess to CCC and LDC data, we have available both the train data and trained models.
 
-## Manual Installation
-
-Clone the repository
+## Installation
 
+We use a `set_environment.sh` script to activate conda/pyenv and virtual
+environments. You can leave this empty if you dont want to use it, but scripts
+will assume at least an empty file exists.
 ```bash
 git clone git@github.ibm.com:mnlp/transition-amr-parser.git
 cd transition-amr-parser
-```
-
-The code has been tested on Python `3.6.9`. We use a script to activate
-conda/pyenv and virtual environments. If you prefer to handle this yourself
-just create an empty file (the training scripts will assume it exists in any
-case).
-
-```bash
 touch set_environment.sh
+. set_environment.sh
+pip install .
 ```
 
-Then for `pip` only install do
+The AMR aligner uses additional tools that can be donwloaded and installed with
 
 ```
-. set_environment.sh
-pip install -r scripts/stack-transformer/requirements.txt
-bash scripts/download_and_patch_fairseq.sh
-pip install --no-deps --editable fairseq-stack-transformer
-pip install --editable .
+bash preprocess/install_alignment_tools.sh
 ```
 
-Alternatively for a `conda` install do
+If you use already aligned AMR, you will not need this.
 
+## Installation Details
+
+An example of `set_environment.sh`
 ```
-. set_environment.sh
-conda env update -f scripts/stack-transformer/environment.yml
-pip install spacy==2.2.3 smatch==1.0.4 ipdb
-bash scripts/download_and_patch_fairseq.sh
-pip install --no-deps --editable fairseq-stack-transformer
-pip install --editable .
+# Activate conda and local virtualenv for this machine
+eval "$(/path/to/miniconda3/bin/conda shell.bash hook)"
+[ ! -d cenv_x86 ] && conda create -y -p ./cenv_x86
+conda activate ./cenv_x86
 ```
 
-If you are installing in PowerPCs, you will have to use the conda option. Also
-spacy has to be installed with conda instead of pip (2.2.3 version will not be
-available, which affects the lematizer behaviour)
-
-To check if install worked do
+The code has been tested on Python `3.6` and `3.7` (x86 only). Alternatively,
+you may pre-install some of the packages with conda, if this works better on
+your achitecture, and the do the pip install above. You will need this for PPC
+instals.
+```
+conda install pytorch=1.3.0 -y -c pytorch
+conda install -c conda-forge nvidia-apex -y
+```
 
+To test if install worked
 ```bash
-. set_environment.sh
-python tests/correctly_installed.py
+bash tests/correctly_installed.sh
 ```
-
-As a further check, you can do a mini test with 25 annotated sentences that we
-provide under DATA/, you can use this
-
+To do a mini-test with 25 annotated sentences that we provide. This should take 1-3 minutes. It wont learn anything but at least will run all stages.
 ```bash
 bash tests/minimal_test.sh
 ```
 
-This runs a full train test excluding alignment and should take around a
-minute. Note that the model will not be able to learn from only 25 sentences.
-
-The AMR aligner uses additional tools that can be donwloaded and installed with
-
-```
-bash preprocess/install_alignment_tools.sh
-```
-
 ## Training a model
 
 You first need to preprocess and align the data. For AMR2.0 do
 
 ```bash
 . set_environment.sh
-python preprocess/merge_files.py /path/to/LDC2017T10/data/amrs/split/ DATA/AMR/corpora/amr2.0/
+python preprocess/merge_files.py /path/to/LDC2017T10/data/amrs/split/ DATA/AMR2.0/corpora/
 ```
 
-The same for AMR1.0
+You will also need to unzip the precomputed BLINK cache
 
 ```
-python preprocess/merge_files.py /path/to/LDC2014T12/data/amrs/split/ DATA/AMR/corpora/amr1.0/
+unzip /dccstor/ykt-parse/SHARED/CORPORA/EL/linkcache.zip
 ```
 
-You will also need to unzip the precomputed BLINK cache
+To launch train/test use
 
 ```
-unzip /dccstor/ykt-parse/SHARED/CORPORA/EL/linkcache.zip
+bash run/run_experiment.sh configs/amr2.0-action-pointer.sh
 ```
 
-Then just call a config to carry a desired experiment
+you can check training status with
 
-```bash
-bash scripts/stack-transformer/experiment.sh configs/amr2_o5+Word100_roberta.large.top24_stnp6x6.sh
+```
+python run/status.py --config configs/amr2.0-action-pointer.sh
 ```
 
-To display the results use
+Note that for CCC there is a version using `jbsub` that split the task into
+multiple sequential jobs and supports multiple seeds and testing in paralell
 
-```bash
-python scripts/stack-transformer/rank_results.py --seed-average
 ```
-
-Note that there is cluster version of this script, currently only supporting
-LSF but easily adaptable to e.g. Slurm
+bash run/lsf/run_experiment.sh configs/amr2.0-action-pointer.sh
+```
 
 ## Decode with Pre-trained model
 
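
Taken together, the updated README walks through a single path from install to a first training run. Below is a consolidated sketch of that workflow, using only commands that appear in the new README; the LDC corpus path and the BLINK cache path are site-specific placeholders you must adapt.

```bash
# Consolidated workflow from the updated README; adjust placeholder paths.
git clone git@github.ibm.com:mnlp/transition-amr-parser.git
cd transition-amr-parser
touch set_environment.sh            # may stay empty, or activate conda/pyenv here
. set_environment.sh
pip install .

# Optional: alignment tools, only needed if your AMR is not already aligned
bash preprocess/install_alignment_tools.sh

# Sanity checks
bash tests/correctly_installed.sh
bash tests/minimal_test.sh          # 1-3 minutes on 25 sentences, runs all stages

# Preprocess AMR2.0, unpack the BLINK cache, then launch and monitor training
python preprocess/merge_files.py /path/to/LDC2017T10/data/amrs/split/ DATA/AMR2.0/corpora/
unzip /dccstor/ykt-parse/SHARED/CORPORA/EL/linkcache.zip
bash run/run_experiment.sh configs/amr2.0-action-pointer.sh
python run/status.py --config configs/amr2.0-action-pointer.sh
```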