Transformer (NMT)
Model Description
The Transformer, introduced in the paper Attention Is All You Need, is a powerful sequence-to-sequence modeling architecture capable of producing state-of-the-art neural machine translation (NMT) systems.
Recently, the fairseq team has explored large-scale semi-supervised training of Transformers using back-translated data, further improving translation quality over the original model. More details can be found in this blog post.
Requirements
We require a few additional Python dependencies for preprocessing:
pip install bitarray fastBPE hydra-core omegaconf regex requests sacremoses subword_nmt
English-to-French Translation
To translate from English to French using the model from the paper Scaling Neural Machine Translation:
import torch

# Load an En-Fr Transformer model trained on WMT'14 data:
en2fr = torch.hub.load('pytorch/fairseq', 'transformer.wmt14.en-fr', tokenizer='moses', bpe='subword_nmt')

# Use the GPU (optional):
en2fr.cuda()

# Translate with beam search:
fr = en2fr.translate('Hello world!', beam=5)
assert fr == 'Bonjour à tous !'

# Manually tokenize:
en_toks = en2fr.tokenize('Hello world!')
assert en_toks == 'Hello world !'

# Manually apply BPE:
en_bpe = en2fr.apply_bpe(en_toks)
assert en_bpe == 'H@@ ello world !'

# Manually binarize:
en_bin = en2fr.binarize(en_bpe)
assert en_bin.tolist() == [329, 14044, 682, 812, 2]

# Generate five translations with top-k sampling:
fr_bin = en2fr.generate(en_bin, beam=5, sampling=True, sampling_topk=20)
assert len(fr_bin) == 5

# Convert one of the samples to a string and detokenize:
fr_sample = fr_bin[0]['tokens']
fr_bpe = en2fr.string(fr_sample)
fr_toks = en2fr.remove_bpe(fr_bpe)
fr = en2fr.detokenize(fr_toks)
assert fr == en2fr.decode(fr_sample)
English-to-German Translation
Semi-supervised training with back-translation is an effective way of improving translation systems. In the paper Understanding Back-Translation at Scale, we back-translate over 200 million German sentences to use as additional training data. An ensemble of five of these models was the winning submission to the WMT’18 English-German news translation competition.
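To make the idea concrete, here is a minimal sketch of how back-translated training data can be generated, not the paper's actual large-scale pipeline: a reverse De-En model translates monolingual German into synthetic English sources, and each synthetic source is paired with its original German target. The monolingual_de list is a made-up placeholder.

import torch

# Load a De-En model to serve as the back-translation system
# (illustrative only; the paper uses a dedicated large-scale pipeline):
de2en = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.de-en.single_model', tokenizer='moses', bpe='fastbpe')

# A hypothetical list of monolingual German sentences:
monolingual_de = ['Maschinelle Übersetzung ist nützlich.']

# Back-translate into synthetic English sources; the paper found that
# sampling-based back-translation outperforms pure beam search:
synthetic_en = de2en.translate(monolingual_de, sampling=True, sampling_topk=10)

# Pair each synthetic source with its original German target to form
# additional (English, German) training examples:
synthetic_pairs = list(zip(synthetic_en, monolingual_de))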
This approach can be further improved through noisy-channel reranking. More details can be found in this blog post. An ensemble of models trained with this technique was the winning submission to the WMT'19 English-German news translation competition.
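As a rough illustration of the reranking idea (a sketch, not the submission's implementation): candidate translations y for a source x are rescored by combining the direct model log p(y|x), a channel model log p(x|y) trained in the reverse direction, and a target-side language model log p(y). The scoring callables and weights below are hypothetical placeholders, since the hub interface does not expose these scores directly.

def rerank_noisy_channel(source, candidates, direct_score, channel_score, lm_score,
                         w_channel=1.0, w_lm=0.3):
    """Pick the candidate translation with the best combined score.

    direct_score(source, y)  -> log p(y|source) from the forward model
    channel_score(y, source) -> log p(source|y) from a reverse (channel) model
    lm_score(y)              -> log p(y) from a target-side language model
    All three are hypothetical callables; the weights are illustrative.
    """
    def combined(y):
        return (direct_score(source, y)
                + w_channel * channel_score(y, source)
                + w_lm * lm_score(y))
    return max(candidates, key=combined)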
To translate from English to German using one of the models from the winning submission:
import torch

# Load an En-De Transformer model trained on WMT'19 data:
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model', tokenizer='moses', bpe='fastbpe')

# Access the underlying TransformerModel:
assert isinstance(en2de.models[0], torch.nn.Module)

# Translate from En-De:
de = en2de.translate('PyTorch Hub is a pre-trained model repository designed to facilitate research reproducibility.')
assert de == 'PyTorch Hub ist ein vorgefertigtes Modell-Repository, das die Reproduzierbarkeit der Forschung erleichtern soll.'
We can also do a round-trip translation to create a paraphrase:
# Round-trip translations between English and German:
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model', tokenizer='moses', bpe='fastbpe')
de2en = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.de-en.single_model', tokenizer='moses', bpe='fastbpe')

paraphrase = de2en.translate(en2de.translate('PyTorch Hub is an awesome interface!'))
assert paraphrase == 'PyTorch Hub is a fantastic interface!'

# Compare the results with English-Russian round-trip translation:
en2ru = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-ru.single_model', tokenizer='moses', bpe='fastbpe')
ru2en = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.ru-en.single_model', tokenizer='moses', bpe='fastbpe')

paraphrase = ru2en.translate(en2ru.translate('PyTorch Hub is an awesome interface!'))
assert paraphrase == 'PyTorch is a great interface!'
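The hub interface's translate method also accepts a list of sentences, which is convenient when translating many inputs in one call. A small sketch, reusing the en2de model loaded above:

# Translate a batch of sentences at once; the result is a list of strings:
sentences = ['PyTorch Hub is an awesome interface!', 'Machine translation is fun.']
translations = en2de.translate(sentences, beam=5)
assert isinstance(translations, list) and len(translations) == len(sentences)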
References
Vaswani et al. (2017), Attention Is All You Need.
Ott et al. (2018), Scaling Neural Machine Translation.
Edunov et al. (2018), Understanding Back-Translation at Scale.
Ng et al. (2019), Facebook FAIR's WMT19 News Translation Task Submission.