Conversation

@francoishernandez
Contributor

Here is the recipe coming with the recent release of TED-LIUM corpus version 3.

I copied the ability to download the LMs from David's PR, and made the same for the RNNLM models. Could you add these models to kaldi-asr.org? (I added a dummy link in local/ted_download_rnnlm.sh for now.)

I did not add what Dan is mentioning about the ivector training here yet.

Also, please note that the "join_suffix" normalization script present in r2 has already been applied to the LM data available in the r3 tarball, which explains its removal from this recipe.

@danpovey
Contributor

@david-ryan-snyder, do you have time to deal with the model upload?

export train_cmd="queue.pl"
export decode_cmd="queue.pl --mem 4G"

host=$(hostname -f)
Contributor

Please remove the stuff from this line onwards from this file. We really shouldn't be putting these cluster-specific things in these files. (I know this was copied from elsewhere.)

Contributor Author

Sorry about that, copied it from s5_r2, you might want to remove it from there as well.

# this does some data-cleaning. It actually degrades the GMM-level results
# slightly, but the cleaned data should be useful when we add the neural net and chain
# systems. If not we'll remove this stage.
local/run_cleanup_segmentation.sh
Contributor

did you ever test whether this data cleanup is helpful?

Contributor Author

Not yet.

@david-ryan-snyder
Contributor

david-ryan-snyder commented May 23, 2018

Sure, I'll upload the LM.

@david-ryan-snyder
Contributor

@francoishernandez,

You can find the model here: http://kaldi-asr.org/models/m5
Here's a direct link: http://kaldi-asr.org/models/5/tedlium_rnnlm.tgz

I can update the Kaldi version (that currently says TODO) after this is merged.

@xiaohui-zhang
Contributor

@francoishernandez @vince62s any comments? I know it might not be easy to recover those <unk>s...

BTW, I've got results:
With cleanup (it helps, so we'd better keep the cleanup stage):
%WER 7.53 [ 1339 / 17783, 215 ins, 329 del, 795 sub ] exp/chain_cleaned/tdnnf_1a/decode_dev_rescore/wer_10
%WER 7.87 [ 2164 / 27500, 297 ins, 610 del, 1257 sub ] exp/chain_cleaned/tdnnf_1a/decode_test_rescore/wer_9

Without cleanup:
%WER 7.93 [ 1411 / 17783, 210 ins, 422 del, 779 sub ] exp/chain/tdnnf_1a/decode_dev_rescore/wer_9
%WER 8.23 [ 2262 / 27500, 292 ins, 790 del, 1180 sub ] exp/chain/tdnnf_1a/decode_test_rescore/wer_9

My results are still worse than the reported numbers though... (7.2/7.5 on page 6)

@francoishernandez
Contributor Author

Hi,
I agree it might not be easy to recover those <unk>. After removing those at the beginning and end of utterances, about 100k of them remain -- in a total corpus of 4.8M words. I had a look at a few random files and compared them with the original unaligned transcripts. It seems a lot of those 'intra-utterance' <unk> are punctuation, so they might not interfere much.
You're right about the lexicon being noisy; this is not helping. Your work on lexicon learning looks very interesting for optimizing such systems.
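As a rough illustration of that kind of count, here is a minimal shell sketch (the file contents below are invented for illustration; in the real setup this would run on a Kaldi-style transcript file such as the hypothetical data/train/text):

```shell
# Count <unk> tokens vs. total words in a Kaldi-style text file
# (the first field on each line is the utterance id, so it is stripped).
# The sample data is made up for illustration only.
text=$(mktemp)
cat > "$text" <<'EOF'
talk1-0001 <unk> hello world <unk>
talk1-0002 this is a test <unk>
EOF
unk=$(grep -o '<unk>' "$text" | wc -l | tr -d ' ')
total=$(cut -d' ' -f2- "$text" | wc -w | tr -d ' ')
echo "unk=$unk total=$total"   # -> unk=3 total=9
rm -f "$text"
```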

About the results, I checked our original run and I see two possible sources of difference:

  1. the changes in the ivector training routine suggested by Dan (LDA --> PCA)
  2. the LMs are downloaded from David's run here and seem a bit different from the ones we trained

Did you change anything else in the pipeline?

Also, I checked the tri3 numbers for the same run where we got 7.2/7.5:
%WER 18.4 | 507 17783 | 84.8 10.8 4.4 3.2 18.4 91.7 | -0.056 | exp/tri3/decode_dev_rescore/score_14_0.0/ctm.filt.filt.sys
%WER 16.4 | 1155 27500 | 86.1 10.6 3.3 2.5 16.4 86.4 | -0.025 | exp/tri3/decode_test_rescore/score_14_0.0/ctm.filt.filt.sys

So, we're also 0.4/0.3 points below, but that might be less of an issue than on tdnnf.

@francoishernandez
Contributor Author

Hey,
In fact, it might come from the LM. The default script bypasses the ted_train_lm step by downloading David's LM, but the ted_train_lm script also takes into account (new) data from the audio train set (2x the amount in s5_r2).
Could you try with our LMs instead?

@xiaohui-zhang
Contributor

hi @francoishernandez, with your LM I re-decoded and here are the results:
xzhang@b02:/export/b04/xzhang/kaldi/egs/tedlium/s5_r3$ cat exp/tri3/decode_test_rescore/we* | utils/best_wer.sh
%WER 16.64 [ 4575 / 27500, 679 ins, 915 del, 2981 sub ]
xzhang@b02:/export/b04/xzhang/kaldi/egs/tedlium/s5_r3$ cat exp/tri3/decode_dev_rescore/we* | utils/best_wer.sh
%WER 18.95 [ 3369 / 17783, 567 ins, 841 del, 1961 sub ]

xzhang@b02:/export/b04/xzhang/kaldi/egs/tedlium/s5_r3$ cat exp/chain_cleaned/tdnnf_1a/decode_dev_rescore/we* | utils/best_wer.sh
%WER 7.52 [ 1337 / 17783, 214 ins, 329 del, 794 sub ]
xzhang@b02:/export/b04/xzhang/kaldi/egs/tedlium/s5_r3$ cat exp/chain_cleaned/tdnnf_1a/decode_test_rescore/we* | utils/best_wer.sh
%WER 7.87 [ 2164 / 27500, 297 ins, 610 del, 1257 sub ]

Not better than before. I guess we should first try to match the tri3 numbers... I didn't make any other changes. Could you also try starting a clean run with the checked-in scripts?

@francoishernandez
Contributor Author

francoishernandez commented Jun 12, 2018

Ok thanks for these results.
Yes I already started a new run with the checked-in recipe. It's at tri2 right now, I'll keep you posted.

@francoishernandez
Contributor Author

Ok guys,

I reran the checked-in recipe with the released data, and got exactly the same tri3 results as @xiaohui-zhang:
%WER 18.84 [ 3351 / 17783, 553 ins, 852 del, 1946 sub ] exp/tri3/decode_dev_rescore/wer_17
%WER 16.68 [ 4586 / 27500, 682 ins, 907 del, 2997 sub ] exp/tri3/decode_test_rescore/wer_15

I noticed a difference in scores between sclite and score_basic on the same lattices (sclite being better). The original run from the paper gives these results for tri3 with score_basic:
%WER 18.75 [ 3334 / 17783, 591 ins, 792 del, 1951 sub ] exp/tri3/decode_dev_rescore/wer_14
%WER 16.58 [ 4559 / 27500, 667 ins, 909 del, 2983 sub ] exp/tri3/decode_test_rescore/wer_15

As a reminder, they were the following with sclite:
%WER 18.4 | 507 17783 | 84.8 10.8 4.4 3.2 18.4 91.7 | -0.056 | exp/tri3/decode_dev_rescore/score_14_0.0/ctm.filt.filt.sys
%WER 16.4 | 1155 27500 | 86.1 10.6 3.3 2.5 16.4 86.4 | -0.025 | exp/tri3/decode_test_rescore/score_14_0.0/ctm.filt.filt.sys

So, we're almost closing the gap here.

The thing is, right before submitting the paper and releasing the corpus, we chose to remove a few talks that added unnecessary weight to the archive for only a few noisy utterances (around 200, almost empty or containing speech inside songs). The small remaining difference might come from there.

As for the chain results, here are the ones of the run from the paper, with score_basic:
exp/chain_cleaned/tdnnf_1a/decode_dev_rescore/wer_10
%WER 7.37 [ 1310 / 17783, 213 ins, 316 del, 781 sub ]
exp/chain_cleaned/tdnnf_1a/decode_test_rescore/wer_9
%WER 7.55 [ 2075 / 27500, 280 ins, 588 del, 1207 sub ]

We're not closing the gap here yet. I suspect it comes from the modification to the ivector training strategy. I've launched a run starting from the checked-in tri3, with the original ivector strategy, to see where that leads.

Once this is done I'll update the results based on score_basic to avoid confusion.

@jtrmal
Contributor

jtrmal commented Jun 15, 2018 via email

@francoishernandez
Contributor Author

Thanks Yenda for mentioning this!
I made a full run from scratch and updated the results accordingly. I will update the paper as well (final deadline is the 24th).
We chose to keep sclite scores in the paper, so I added a column in the run_tdnnf_1a.sh header to mention both sclite and score_basic results.

@xiaohui-zhang
Contributor

Thanks @francoishernandez for updating. Maybe you can also remove those "garbage" sentences from the released dataset?

@francoishernandez
Contributor Author

Yes, that's what we did. (The original run from the paper had 2397 files; the released data has 2351.)

@xiaohui-zhang
Contributor

Cool, thanks for clarifying.

@vince62s
Contributor

Let us know if you guys need anything else to merge.
Cheers.
Vincent.

Contributor

@danpovey left a comment

I looked through it quickly and noticed a few issues.

# JHU cluster (or most clusters using GridEngine, with a suitable
# conf/queue.conf).
export train_cmd="queue.pl"
export decode_cmd="queue.pl --mem 4G"
\ No newline at end of file
Contributor

please add missing newline

Contributor Author

addressed

@@ -0,0 +1,233 @@
#!/bin/bash

# See run_tdnnf_1a.sh for comparative results.
Contributor

you should still always include the current result in any given example script, even if there is nothing to compare with. But please don't make TDNN and TDNN-F part of different sequences: call them both 'tdnn'.

# Final train prob -0.0802 -0.0899
# Final valid prob -0.0980 -0.0974
# Final train prob (xent) -1.1450 -0.9449
# Final valid prob (xent) -1.2498 -1.0002
Contributor

please have the compare.sh script also output the number of parameters.
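For instance, a minimal sketch of how such a count could be extracted (the `num-parameters:` line format and the canned output below are assumptions; in compare.sh the info string would come from running `nnet3-info` on each model):

```shell
# Hypothetical sketch: pull the parameter count out of nnet3-info-style
# output. The canned $info string stands in for something like
#   info=$(nnet3-info exp/chain_cleaned/tdnnf_1a/final.mdl)
info="num-parameters: 9432116
component-node name=tdnn1.affine component=tdnn1.affine input=lda"
num_params=$(printf '%s\n' "$info" | awk '/^num-parameters:/ {print $2}')
echo "num-parameters: $num_params"   # -> num-parameters: 9432116
```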

Contributor Author

addressed


echo "$0: creating neural net configs using the xconfig parser";

num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
Contributor

I thought I mentioned this before, but this example script is kind of broken. The xconfig refers to shell variables $opts, $linear_opts, $output_opts (and maybe others) which are not defined.

Contributor Author

Removed these in 75e9d60

--feat.cmvn-opts "--norm-means=false --norm-vars=false" \
--chain.xent-regularize 0.1 \
--chain.leaky-hmm-coefficient 0.1 \
--chain.l2-regularize 0.00005 \
Contributor

You probably want this set to zero (especially after fixing the accidental omission of l2 regularization at the model level).

Contributor Author

addressed


. ./path.sh

export LC_ALL=C
Contributor

this should be unnecessary, as path.sh should export LC_ALL=C.

dir=data/local/dict_nosp
mkdir -p $dir

srcdict=db//TEDLIUM_release-3/TEDLIUM.152k.dic
Contributor

remove double slash.

@danpovey
Contributor

Are you OK to wait to get results with the fixed TDNN-F training run, or do you want me to merge it now and update the results later?

@francoishernandez
Contributor Author

francoishernandez commented Jun 29, 2018 via email

@danpovey
Contributor

danpovey commented Jun 29, 2018 via email

@francoishernandez
Contributor Author

francoishernandez commented Jun 29, 2018 via email

@vince62s
Contributor

Dan, if you don't mind, I think it is good to merge as is. It will match the paper and serve as the baseline.
It is always good to have a baseline (even an imperfect one) that people can play with and try to improve.
The paper will be presented by François at SPECOM.
Thanks.

@danpovey
Contributor

Can you at least put a warning at the top of the TDNN-F script saying that it has some problems (you can mention the specific problems if you want) and should not be used as an example to copy from?
I'll try to run a more up-to-date (and fixed) setup soon.

@danpovey danpovey merged commit fdb6774 into kaldi-asr:master Jul 12, 2018
dpriver pushed a commit to dpriver/kaldi that referenced this pull request Sep 13, 2018
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018