@@ -28,37 +28,34 @@ Use of OpenNMT consists of four steps:

### 4) Evaluate.

-``` wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/generic/multi-bleu.perl ```
-
-``` perl multi-bleu.perl data/tgt-test.txt < demo_pred.txt ```
+```bash
+wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/generic/multi-bleu.perl
+perl multi-bleu.perl data/tgt-test.txt < demo_pred.txt
+```

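`multi-bleu.perl` reports a corpus-level BLEU score for the predictions against the reference file. As intuition for what that score measures, here is a minimal Python sketch of the computation (clipped n-gram precisions up to 4-grams, pooled over the corpus, with a brevity penalty). It is an illustration only, not a replacement for the script, and the toy sentences are made up:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus BLEU: clipped n-gram matches pooled over all sentence pairs,
    geometric mean of the precisions, scaled by a brevity penalty."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            matches[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in ngrams(hyp, n).items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:  # any zero precision zeroes the geometric mean
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)

# Toy check: a perfect hypothesis scores BLEU = 1.0 (reported as 100).
ref = ["ein", "mann", "schläft", "in", "einem", "grünen", "raum"]
print(corpus_bleu([ref], [list(ref)]))  # 1.0
```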
## WMT'16 Multimodal Translation: Multi30k (de-en)

Data might not come as clean as the demo data. Here is a second example that uses the Moses tokenizer (http://www.statmt.org/moses/) to prepare the Multi30k data from the WMT'16 Multimodal Translation task (http://www.statmt.org/wmt16/multimodal-task.html).

### 0) Download the data.

-``` mkdir -p data/multi30k ```
-
-``` wget http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz && tar -xf training.tar.gz -C data/multi30k && rm training.tar.gz ```
-
-``` wget http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/validation.tar.gz && tar -xf validation.tar.gz -C data/multi30k && rm validation.tar.gz ```
-
-``` wget https://staff.fnwi.uva.nl/d.elliott/wmt16/mmt16_task1_test.tgz && tar -xf mmt16_task1_test.tgz -C data/multi30k && rm mmt16_task1_test.tgz ```
+```bash
+mkdir -p data/multi30k
+wget http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz && tar -xf training.tar.gz -C data/multi30k && rm training.tar.gz
+wget http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/validation.tar.gz && tar -xf validation.tar.gz -C data/multi30k && rm validation.tar.gz
+wget https://staff.fnwi.uva.nl/d.elliott/wmt16/mmt16_task1_test.tgz && tar -xf mmt16_task1_test.tgz -C data/multi30k && rm mmt16_task1_test.tgz
+```

### 1) Preprocess the data.

-``` wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/tokenizer/tokenizer.perl ```
-
-``` sed -i "s/$RealBin\/..\/share\/nonbreaking_prefixes//" tokenizer.perl ```
-
-``` wget https://github.com/moses-smt/mosesdecoder/blob/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.de ```
-
-``` wget https://github.com/moses-smt/mosesdecoder/blob/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.en ```
-
-``` for l in en de; do for f in data/multi30k/*.$l; do if [[ "$f" != *"test"* ]]; then sed -i "$ d" $f; fi; perl tokenizer.perl -no-escape -l $l -q < $f > $f.tok; done; done ```
-
-``` python preprocess.py -train_src data/multi30k/train.en.tok -train_tgt data/multi30k/train.de.tok -valid_src data/multi30k/val.en.tok -valid_tgt data/multi30k/val.de.tok -save_data data/multi30k ```
+```bash
+wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/tokenizer/tokenizer.perl
+sed -i "s/$RealBin\/..\/share\/nonbreaking_prefixes//" tokenizer.perl
+wget https://github.com/moses-smt/mosesdecoder/blob/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.de
+wget https://github.com/moses-smt/mosesdecoder/blob/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.en
+for l in en de; do for f in data/multi30k/*.$l; do if [[ "$f" != *"test"* ]]; then sed -i "$ d" $f; fi; perl tokenizer.perl -no-escape -l $l -q < $f > $f.tok; done; done
+python preprocess.py -train_src data/multi30k/train.en.tok -train_tgt data/multi30k/train.de.tok -valid_src data/multi30k/val.en.tok -valid_tgt data/multi30k/val.de.tok -save_data data/multi30k
+```

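The one-line tokenization loop above packs three steps together: leave the test files intact, drop the last line of the train/val files (the `sed "$ d"`), and run every file through the Moses tokenizer. Here is a small Python sketch of the same per-file logic, with a plain whitespace split standing in for `tokenizer.perl` (an assumption made so the sketch runs standalone; the real tokenizer also separates punctuation using the non-breaking-prefix rules downloaded above):

```python
def tokenize(sentence):
    # Stand-in for the Moses tokenizer.perl: a bare whitespace split.
    return sentence.split()

def prepare(lines, is_test):
    """Mirror of the shell loop: train/val files lose their last line
    (the sed "$ d"), then every sentence is tokenized."""
    if not is_test and lines:
        lines = lines[:-1]
    return [" ".join(tokenize(line)) for line in lines]

# Toy train file whose final line is the stray one the sed call removes.
print(prepare(["Two young guys.", "Several men stand.", ""], is_test=False))
# ['Two young guys.', 'Several men stand.']
```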
### 2) Train the model.

@@ -70,9 +67,10 @@ Data might not come as clean as the demo data. Here is a second example that use

### 4) Evaluate.

-``` wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/generic/multi-bleu.perl ```
-
-``` perl multi-bleu.perl data/multi30k/test.de.tok < multi30k_pred.txt ```
+```bash
+wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/generic/multi-bleu.perl
+perl multi-bleu.perl data/multi30k/test.de.tok < multi30k_pred.txt
+```

## Pretrained Models
