Transition-based AMR Parser
============================

Transition-based parser for Abstract Meaning Representation (AMR) in Pytorch. The current version (`v0.4.0`) implements the `action-pointer` model [(Zhou et al 2021)](https://openreview.net/forum?id=X9KK-SCmKWn). For the `stack-Transformer` model [(Fernandez Astudillo et al 2020)](https://arxiv.org/abs/2010.10669), check out `v0.3.3`. Aside from the listed [contributors](https://github.com/IBM/transition-amr-parser/graphs/contributors), the initial commit was developed by Miguel Ballesteros and Austin Blodgett while at IBM.

## IBM Internal Features

Check [Parsing Services](https://github.ibm.com/mnlp/transition-amr-parser/wiki/Parsing-Services) for the endpoint URLs and Docker instructions. If you have access to CCC and LDC data, both the training data and trained models are available.

## Installation

We use a `set_environment.sh` script to activate conda/pyenv and virtual
environments. You can leave this file empty if you don't want to use it, but
the scripts assume that at least an empty file exists.
```bash
git clone git@github.ibm.com:mnlp/transition-amr-parser.git
cd transition-amr-parser
touch set_environment.sh
. set_environment.sh
pip install .
```

The AMR aligner uses additional tools that can be downloaded and installed with

```
bash preprocess/install_alignment_tools.sh
```

|
44 | | -Alternatively for a `conda` install do |
| 29 | +If you use already aligned AMR, you will not need this. |
45 | 30 |
|
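Whether a corpus is already aligned can usually be told from its metadata lines. As an illustrative sketch (assuming JAMR-style `# ::alignments` comments; other aligners may use a different metadata key):

```python
def has_alignments(amr_path):
    """Return True if the AMR file contains alignment metadata lines.

    Assumes JAMR-style '# ::alignments' comment lines; aligners that
    use a different metadata key would need a different check.
    """
    with open(amr_path) as f:
        return any(line.startswith('# ::alignments') for line in f)
```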
## Installation Details

An example of `set_environment.sh`:
```
# Activate conda and local virtualenv for this machine
eval "$(/path/to/miniconda3/bin/conda shell.bash hook)"
[ ! -d cenv_x86 ] && conda create -y -p ./cenv_x86
conda activate ./cenv_x86
```

The code has been tested on Python `3.6` and `3.7` (x86 only). Alternatively,
you may pre-install some of the packages with conda, if this works better on
your architecture, and then do the pip install above. You will need this for
PPC installs.

```
conda install pytorch=1.3.0 -y -c pytorch
conda install -c conda-forge nvidia-apex -y
```

To test if the install worked, run

```bash
bash tests/correctly_installed.sh
```

To run a mini-test with the 25 annotated sentences that we provide, use the
command below. It should take 1-3 minutes; the model won't learn anything from
so few sentences, but it runs all stages.

```bash
bash tests/minimal_test.sh
```
## Training a model

You first need to preprocess and align the data. For AMR2.0 do

```bash
. set_environment.sh
python preprocess/merge_files.py /path/to/LDC2017T10/data/amrs/split/ DATA/AMR2.0/corpora/
```
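For reference, the merging step amounts to concatenating the per-split annotation files of the LDC release into one file per split. A rough illustrative sketch of that idea (the repo's `preprocess/merge_files.py` is the canonical implementation, and the exact file layout of the release may differ):

```python
import glob
import os

def merge_split(split_dir, out_path):
    # Concatenate all AMR annotation files of one split (e.g. training,
    # dev, test) into a single file, in a deterministic order.
    with open(out_path, 'w') as out:
        for path in sorted(glob.glob(os.path.join(split_dir, '*.txt'))):
            with open(path) as f:
                out.write(f.read())
```

For example, `merge_split('/path/to/LDC2017T10/data/amrs/split/training', 'DATA/AMR2.0/corpora/train.txt')` would produce a single training file (hypothetical output name, for illustration only).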

You will also need to unzip the precomputed BLINK cache

```
unzip /dccstor/ykt-parse/SHARED/CORPORA/EL/linkcache.zip
```

To launch train/test use

```
bash run/run_experiment.sh configs/amr2.0-action-pointer.sh
```

You can check training status with

```
python run/status.py --config configs/amr2.0-action-pointer.sh
```

Note that for CCC there is a version using `jbsub` that splits the task into
multiple sequential jobs and supports multiple seeds and testing in parallel.

```
bash run/lsf/run_experiment.sh configs/amr2.0-action-pointer.sh
```

## Decode with Pre-trained model