
Code for The Annotated Transformer blog post:

http://nlp.seas.harvard.edu/2018/04/03/attention.html

Package Dependencies

Use requirements.txt to install library dependencies with pip:

pip install -r requirements.txt

torchtext 0.12 unable to download IWSLT2016 - temporary workaround

Unfortunately, torchtext 0.12 cannot download the IWSLT2016 dataset because of a bug reported here.

For the time being, follow the instructions from the issue (reproduced below) to download the dataset manually.

1. Download the IWSLT2016 data manually from this link. If the download worked, the file should be close to 187.6 MB.
2. Open the annotated transformer notebook and, in the jupyter lab file browser, click the upload button to upload the 2016-01.tgz file downloaded in step 1.
3. Open a terminal and move 2016-01.tgz from the notebook folder to ~/.torchtext/cache/IWSLT2016/ using mv 2016-01.tgz ~/.torchtext/cache/IWSLT2016/. This is the cache location where torchtext checks whether the dataset has already been downloaded.
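
The last step above can also be scripted. The following is a minimal sketch, not part of the repository: the helper name place_archive is hypothetical, and it assumes the archive sits in the current folder under the name 2016-01.tgz. It prints the file size so it can be compared against the expected 187.6 MB, creates the cache directory if it does not exist yet, and moves the file there.

```python
import shutil
from pathlib import Path

# Cache directory named in the steps above; torchtext checks it for the dataset.
CACHE_DIR = Path.home() / ".torchtext" / "cache" / "IWSLT2016"


def place_archive(archive: Path, cache_dir: Path = CACHE_DIR) -> Path:
    """Move a manually downloaded archive into the cache directory.

    Hypothetical helper for illustration; not part of torchtext or this repo.
    """
    if not archive.is_file():
        raise FileNotFoundError(f"archive not found: {archive}")

    # Report the size so it can be compared with the expected ~187.6 MB.
    size_mb = archive.stat().st_size / 1e6
    print(f"{archive.name}: {size_mb:.1f} MB (expected roughly 187.6 MB)")

    # Create the cache directory if it does not exist yet.
    cache_dir.mkdir(parents=True, exist_ok=True)

    target = cache_dir / archive.name
    shutil.move(str(archive), str(target))
    return target


# Example call, run from the notebook folder:
# place_archive(Path("2016-01.tgz"))
```

The size check is informational only; the sketch does not verify the archive contents.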

Once torchtext resolves the download problem, we should be able to eliminate these steps. Track the issue for the current status of the fix.

Notebook Setup

The Annotated Transformer is created using jupytext.

Regular notebooks pose problems for source control - cell outputs end up in the repo history and diffs between commits are difficult to examine. Using jupytext, there is a python script (.py file) that is automatically kept in sync with the notebook file by the jupytext plugin.

The Python script is committed to the repository, contains all the cell content, and can be used to generate the notebook file. It is a regular Python source file: markdown sections are included using a standard comment convention, and outputs are not saved. The notebook itself is treated as a build artifact and is not committed to the git repository.
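
To illustrate the comment convention, here is a small sketch of what a jupytext-paired script can look like. It assumes jupytext's percent format, where `# %%` starts a code cell and `# %% [markdown]` starts a markdown cell; the actual script in this repo may use additional cell metadata. The function scaled_sum is a hypothetical example and does not come from the project.

```python
# %% [markdown]
# # A section title
# Markdown text is stored as ordinary Python comments under a
# `# %% [markdown]` marker, so the file stays a valid Python script.

# %%
# A code cell: plain Python, with no cell outputs saved in the file.
def scaled_sum(values, scale=1.0):
    """Hypothetical helper used only for this illustration."""
    return scale * sum(values)


total = scaled_sum([1, 2, 3], scale=2.0)
```

Because the file contains only source text, a diff between two commits shows just the changed lines of code or markdown.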

Prior to using this repo, make sure jupytext is installed by following the installation instructions here.

To produce the .ipynb notebook file from the Python script, run the following (under the hood, the notebook build target simply runs jupytext --to ipynb the_annotated_transformer.py):

make notebook

To produce the html version of the notebook, run:

make html

make html is just a shortcut for generating the notebook with jupytext --to ipynb the_annotated_transformer.py and then converting it to html with jupyter nbconvert --to html the_annotated_transformer.ipynb.
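
For readers who want to run the same two steps without make, the sequence can be sketched in Python. This is an illustration under assumptions, not the contents of the Makefile: the function names html_build_commands and build_html are hypothetical, and the sketch assumes jupytext and jupyter are installed and on the PATH.

```python
import subprocess
from pathlib import Path

SCRIPT = "the_annotated_transformer.py"


def html_build_commands(script=SCRIPT):
    """Return the two commands described above, in the order they run."""
    notebook = str(Path(script).with_suffix(".ipynb"))
    return [
        # Step 1: generate the notebook from the Python script.
        ["jupytext", "--to", "ipynb", script],
        # Step 2: convert the generated notebook to html.
        ["jupyter", "nbconvert", "--to", "html", notebook],
    ]


def build_html(script=SCRIPT):
    """Run both steps, stopping at the first command that fails."""
    for cmd in html_build_commands(script):
        subprocess.run(cmd, check=True)


# Example call, run from the repository root:
# build_html()
```

Passing check=True makes a failed jupytext step raise an error before nbconvert runs on a missing or stale notebook.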

Formatting and Linting

To keep the code formatting clean, the annotated transformer repo has a GitHub Actions workflow that checks that the code conforms to PEP8 coding standards.

To make this easier, there are two Makefile build targets: one formats the code automatically with black, and the other checks it with flake8.

Be sure to install black and flake8.

You can then run:

make black

(or call black manually with black --line-length 79 the_annotated_transformer.py) to format the code automatically, and:

make flake

(or call flake8 manually with flake8 --show-source the_annotated_transformer.py) to check for PEP8 violations.

When submitting a PR, it's recommended to run these two commands and fix any flake8 errors that arise; otherwise the GitHub Actions CI will report an error.
