@hhadian
Contributor

@hhadian hhadian commented Apr 23, 2018

This adds a few scripts (with results) which use unconstrained-egs (in PR #2341).

@danpovey
Contributor

That's cool. Would you mind merging this branch with PR #2341 so I can merge them together?
I think I've decided to merge this as-is, and hopefully this week, separately merge @jtrmal's speed improvements when they're done.

@danpovey
Contributor

Cool.
I'll merge this soon.

@danpovey
Contributor

Looks like the test code doesn't compile; my branch has the same issue. Can you please address that?

@francisr
Contributor

I haven't followed the whole thing closely, but it seemed to me that the time constraint was found to be useful when LF-MMI was first developed. Why is it better without it now?

@hhadian hhadian changed the title from "[WIP] Add unconstrained-egs scripts+results for IAM and swbd" to "[WIP] Add support for unconstrained-egs for chain training + example recipes" Apr 24, 2018
@hhadian
Contributor Author

hhadian commented Apr 24, 2018

The time constraints are still used, but indirectly: we can still change the left/right tolerance, but it is less likely to make a difference, I guess. The supervision FSTs are converted to unconstrained FSTs (i.e. e2e-style) only after the time constraints are applied and after they are split into chunks.

WriteBasicType(os, binary, label_dim);
KALDI_ASSERT(frames_per_sequence > 0 && label_dim > 0 &&
             num_sequences > 0);
WriteToken(os, binary, "<End2End>");

(for context, Hossein said offline:)

The problem is in reading a non-binary supervision which starts like this:

    <Supervision> <Weight> 1 <NumSequences> 1 <FramesPerSeq> 16 <LabelDim> 981
    0 1 498 498
    0 2 497 497
    0 3 496 496

I guess what happens is that after the label dim is read, we do if (PeekToken(is, binary) == 'E' ) to see if there is a <End2End> tag, but this peeking removes the '\n' at the end of the first line, which causes the next call, ReadFstKaldi, to fail (I guess because it expects a '\n' or whitespace but sees a '0').

I can fix ReadFstKaldi or PeekToken (e.g. to put back ws/newlines if no '<' was read). Not sure what the best fix is. What would you suggest?

Likely the issue is that ungetc failed (which it sometimes can, but not always).
@hhadian, I think the easiest fix would be to revert this change here, to make it always write the <End2End> tag.
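
The newline loss described in the quoted message can be reproduced with plain standard-library streams. The sketch below is only an illustration under stated assumptions: `PeekSkippingWhitespace`, `PeekResult` and `DemoPeek` are hypothetical names, not Kaldi functions, and the sketch does not model the `ungetc` failure mode. It shows only that a peek which skips whitespace first also extracts the line break that a later reader might expect to find.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a token peek (NOT Kaldi's PeekToken): skip
// leading whitespace, then return the next character without extracting it.
int PeekSkippingWhitespace(std::istream &is) {
  is >> std::ws;     // extracts spaces and newlines from the stream
  return is.peek();  // looks at the next character, leaves it in place
}

// Records which character is next in the stream at three points in time.
struct PeekResult {
  int before;  // next character right after reading the label dim
  int peeked;  // character returned by the whitespace-skipping peek
  int after;   // next character once the peek has run
};

// Reads "<LabelDim> N" from `text`, then peeks for an optional tag.
PeekResult DemoPeek(const std::string &text) {
  std::istringstream is(text);
  std::string tag;
  int label_dim = 0;
  is >> tag >> label_dim;  // stops just before the '\n' that ends the line
  PeekResult r;
  r.before = is.peek();
  r.peeked = PeekSkippingWhitespace(is);
  r.after = is.peek();
  return r;
}
```

Calling `DemoPeek("<LabelDim> 981\n0 1 498 498\n")` leaves the stream positioned on the `0` of the first arc line: the newline that was next in the stream before the peek is no longer there afterwards.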

@hhadian
Contributor Author

hhadian commented Apr 24, 2018

I pushed a fix. I noticed there are no tests for the new features, i.e. e2e/unconstrained supervisions. Should I add tests for them?

@danpovey
Contributor

If you can quickly add any tests, e.g. that the I/O works, then do it. Just don't make it a blocker.
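
A quick I/O test of the kind suggested here could be a write-then-read round trip. The following is a minimal sketch on a toy text header, assuming the tag is always written so that the reader never has to peek for it. `ToyHeader`, `WriteToyHeader`, `ReadToyHeader` and `RoundTripOk` are hypothetical names, and the layout is an illustration, not Kaldi's actual Supervision format.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy header with just the two fields needed for the illustration.
struct ToyHeader {
  int label_dim;
  bool e2e;
};

// The <End2End> tag is written unconditionally, followed by T or F.
void WriteToyHeader(std::ostream &os, const ToyHeader &h) {
  os << "<LabelDim> " << h.label_dim
     << " <End2End> " << (h.e2e ? "T" : "F") << "\n";
}

// Expects both tags in order; returns false on any mismatch.
bool ReadToyHeader(std::istream &is, ToyHeader *h) {
  std::string tag, flag;
  if (!(is >> tag >> h->label_dim) || tag != "<LabelDim>") return false;
  if (!(is >> tag >> flag) || tag != "<End2End>") return false;
  if (flag != "T" && flag != "F") return false;
  h->e2e = (flag == "T");
  return true;
}

// Writes a header plus one arc-like line, reads the header back, and checks
// that the fields match and the line break after the header is still unread.
bool RoundTripOk(const ToyHeader &in) {
  std::ostringstream os;
  WriteToyHeader(os, in);
  os << "0 1 498 498\n";
  std::istringstream is(os.str());
  ToyHeader out = {0, false};
  if (!ReadToyHeader(is, &out)) return false;
  return out.label_dim == in.label_dim && out.e2e == in.e2e &&
         is.peek() == '\n';
}
```

Because the reader consumes a fixed sequence of tokens, the stream position after the header does not depend on whether the flag is set, which is the property the round trip checks.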

@danpovey danpovey merged commit f0333bb into kaldi-asr:master Apr 24, 2018
@danpovey danpovey changed the title from "[WIP] Add support for unconstrained-egs for chain training + example recipes" to "Add support for unconstrained-egs for chain training + example recipes" Apr 25, 2018
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018
…example recipes (kaldi-asr#2383) This enables ignoring the supervision phone-to-frame alignment information inside chunks.