Transition-based AMR Parser
============================

Transition-based parser for Abstract Meaning Representation (AMR) in Pytorch. The current version (`v0.4.0`) implements the `action-pointer` model [(Zhou et al 2021)](https://openreview.net/forum?id=X9KK-SCmKWn). For the `stack-Transformer` model [(Fernandez Astudillo et al 2020)](https://arxiv.org/abs/2010.10669), check out `v0.3.3`. Aside from the listed [contributors](https://github.com/IBM/transition-amr-parser/graphs/contributors), the initial commit was developed by Miguel Ballesteros and Austin Blodgett while at IBM.

## IBM Internal Features

Check [Parsing Services](https://github.ibm.com/mnlp/transition-amr-parser/wiki/Parsing-Services) for the endpoint URLs and Docker instructions. If you have access to CCC and LDC data, both the training data and trained models are available.

## Installation

We use a `set_environment.sh` script to activate conda/pyenv and virtual
environments. You can leave this file empty if you don't want to use it, but
the scripts assume that at least an empty file exists.
```bash
git clone git@github.ibm.com:mnlp/transition-amr-parser.git
cd transition-amr-parser
touch set_environment.sh
. set_environment.sh
pip install .
```

The AMR aligner uses additional tools that can be downloaded and installed with

```
bash preprocess/install_alignment_tools.sh
```

|
44 | | -Alternatively for a `conda` install do |
| 29 | +If you use already aligned AMR, you will not need this. |
45 | 30 |
|
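Whether a corpus is already aligned can usually be told from its metadata lines. As an illustrative sketch (assuming JAMR-style `# ::alignments` comments; other aligners may use a different metadata key):

```python
def has_alignments(amr_path):
    """Return True if the AMR file contains alignment metadata lines.

    Assumes JAMR-style '# ::alignments' comment lines; aligners that
    use a different metadata key would need a different check.
    """
    with open(amr_path) as f:
        return any(line.startswith('# ::alignments') for line in f)
```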
## Installation Details

An example of `set_environment.sh`:
```
# Activate conda and local virtualenv for this machine
eval "$(/path/to/miniconda3/bin/conda shell.bash hook)"
[ ! -d cenv_x86 ] && conda create -y -p ./cenv_x86
conda activate ./cenv_x86
```

The code has been tested on Python `3.6` and `3.7` (x86 only). Alternatively,
you may pre-install some of the packages with conda, if this works better on
your architecture, and then do the pip install above. You will need this for
PPC installs.

```
conda install pytorch=1.3.0 -y -c pytorch
conda install -c conda-forge nvidia-apex -y
```

To test if the install worked, run

```bash
bash tests/correctly_installed.sh
```

To run a mini-test with the 25 annotated sentences that we provide, use the
command below. It should take 1-3 minutes; the model won't learn anything from
so few sentences, but it runs all stages.

```bash
bash tests/minimal_test.sh
```
## Training a model

You first need to preprocess and align the data. For AMR2.0 do

```bash
. set_environment.sh
python preprocess/merge_files.py /path/to/LDC2017T10/data/amrs/split/ DATA/AMR2.0/corpora/
```
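For reference, the merging step amounts to concatenating the per-split annotation files of the LDC release into one file per split. A rough illustrative sketch of that idea (the repo's `preprocess/merge_files.py` is the canonical implementation, and the exact file layout of the release may differ):

```python
import glob
import os

def merge_split(split_dir, out_path):
    # Concatenate all AMR annotation files of one split (e.g. training,
    # dev, test) into a single file, in a deterministic order.
    with open(out_path, 'w') as out:
        for path in sorted(glob.glob(os.path.join(split_dir, '*.txt'))):
            with open(path) as f:
                out.write(f.read())
```

For example, `merge_split('/path/to/LDC2017T10/data/amrs/split/training', 'DATA/AMR2.0/corpora/train.txt')` would produce a single training file (hypothetical output name, for illustration only).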

You will also need to unzip the precomputed BLINK cache

```
unzip /dccstor/ykt-parse/SHARED/CORPORA/EL/linkcache.zip
```

To launch train/test use

```
bash run/run_experiment.sh configs/amr2.0-action-pointer.sh
```

You can check training status with

```
python run/status.py --config configs/amr2.0-action-pointer.sh
```

Note that for CCC there is a version using `jbsub` that splits the task into
multiple sequential jobs and supports multiple seeds and testing in parallel.

```
bash run/lsf/run_experiment.sh configs/amr2.0-action-pointer.sh
```

## Decode with Pre-trained model