Attention modeling, with example scripts #1731
Conversation
| @hhadian, there's some stuff you can help with here:
Note: eventually, if this works, we may ask @kangshiyin to write CUDA versions of GetAttentionDotProducts(), ApplyScalesToOutput(), and ApplyScalesToInput(). But this probably won't take more than half the time, even with the naive implementation, so there's no need to do that just yet. |
| Will do |
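For orientation while reading the diff, here is a plain-loop sketch of the semantics I infer for GetAttentionDotProducts() from the doc-comments quoted later in this thread; the parameter order and the row_shift derivation are assumptions, and the actual code operates on CuMatrixBase<BaseFloat>, not nested vectors:

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU reference:
//   C(i, j) = alpha * dot(A.row(i), B.row(i + j * row_shift))
// B has extra rows relative to A, which supply the temporal context.
void GetAttentionDotProductsRef(float alpha,
                                const std::vector<std::vector<float> > &A,
                                const std::vector<std::vector<float> > &B,
                                std::vector<std::vector<float> > *C) {
  int num_output_rows = A.size(), context_dim = (*C)[0].size();
  int num_extra_rows = B.size() - A.size();
  assert(context_dim > 1 && num_extra_rows % (context_dim - 1) == 0);
  int row_shift = num_extra_rows / (context_dim - 1);  // assumed derivation
  for (int i = 0; i < num_output_rows; i++) {
    for (int j = 0; j < context_dim; j++) {
      float sum = 0.0f;
      for (size_t d = 0; d < A[i].size(); d++)
        sum += A[i][d] * B[i + j * row_shift][d];
      (*C)[i][j] = alpha * sum;
    }
  }
}
```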
| void TimeHeightConvolutionComponent::Check() { |
Is TimeHeightConvolutionComponent supposed to be here? Almost all the functions relating to TimeHeightConvolutionComponent are duplicated here.
| I am copying-and-modifying that file. I'll remove all that stuff. For now, don't bother compiling that file; I'm working on it. |
| OK, sure, I was just wondering. |
| Do you want me to write the test for GetAttentionDotProducts in a new file attention-test.cc or in an existing tester file? Also, the function itself seems not to be implemented yet (I looked in attention.cc); should I implement it? I read the docs, but I'm not sure what query, key, and value are going to be in ASR tasks. The values should be the frames of speech, but what are the keys and queries? |
| A new file, attention-test.cc. It's implemented but misnamed as AttentionCoreForward() in the .cc file; fix the name. Query, key and value are all just different sub-ranges of the input to the layer, i.e. they're ranges of column indexes. |
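The diff hunk this review started from shows what that means in code: keys and values are just column ranges of the same input matrix (quoted from the PR as of this review; the exact offsets may still change):

```cpp
// 'keys' contains the keys; note, these are not extended with
// context information; that happens further in.
CuSubMatrix<BaseFloat> keys(in, 0, in.NumRows(), 0, key_dim_);

// 'values' contains the values which we will interpolate; context
// information will be added later if output_context_ == true.
CuSubMatrix<BaseFloat> values(in, 0, in.NumRows(), key_dim_, value_dim_);

AttentionForward(key_scale_, keys, queries, values, c, out);
```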
src/nnet3/attention.h Outdated
| This function implements: |
| A->Row(i) += alpha * C(i, j) * B.Row(i + j * row_shift). |
This line has j on the right-hand side but not on the left-hand side. I'm not sure I'm reading it right.
| Add a \sum_j on the right-hand side. |
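For reference, here is a minimal CPU sketch of the corrected semantics. The argument order (alpha, B, C, A) matches the ApplyScalesToOutput(1.0, values, *c, &output_values_part) call discussed just below; the row_shift derivation is my assumption from the dimension discussion, and the real code operates on CuMatrixBase<BaseFloat>, not nested vectors:

```cpp
#include <cassert>
#include <vector>

// A.row(i) += alpha * sum_j C(i, j) * B.row(i + j * row_shift),
// i.e. the doc-comment formula with the missing \sum_j made explicit.
void ApplyScalesToOutputRef(float alpha,
                            const std::vector<std::vector<float> > &B,
                            const std::vector<std::vector<float> > &C,
                            std::vector<std::vector<float> > *A) {
  int num_output_rows = A->size(), context_dim = C[0].size();
  int num_extra_rows = B.size() - A->size();
  assert(context_dim > 1 && num_extra_rows % (context_dim - 1) == 0);
  int row_shift = num_extra_rows / (context_dim - 1);  // assumed derivation
  for (int i = 0; i < num_output_rows; i++)
    for (int j = 0; j < context_dim; j++)   // the \sum_j
      for (size_t d = 0; d < (*A)[i].size(); d++)
        (*A)[i][d] += alpha * C[i][j] * B[i + j * row_shift][d];
}
```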
| CuSubMatrix<BaseFloat> output_values_part( |
| *output, 0, num_output_rows, 0, value_dim); |
| ApplyScalesToOutput(1.0, values, *c, &output_values_part); |
Since it is asserted (a few lines before) that values and c both have num_output_rows rows, this will set A, B, and C in a way that all have the same number of rows, so row_shift will become 0.
| 'values' should have 'num_input_rows' rows; the assert must have been a mistake. |
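To spell out the dimension relationship behind this point (my reading, not text from the PR): B carries the input rows and A and C the output rows, so

```latex
\mathrm{rows}(B) \;=\; \mathrm{rows}(A) + (\mathrm{context\_dim} - 1)\cdot \mathrm{row\_shift}
```

and asserting that 'values' (the B argument) has num_output_rows rows forces row_shift = 0, i.e. every context position would read the same row.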
ApplyScalesToInput not done yet
[WIP] [attention] add ApplyScalesToInput/Output functions and tests
Add ApplyScalesToInput + test
Add test template for AttentionForward/Backward
| @hhadian, I think the component-level code is now working and tested. Can you please work on the script-level changes required to test this? E.g. we want someone to be able to write in a config line: attention-renorm-layer num-heads=10 value-dim=50 key-dim=50 time-stride=3 num-left-inputs=5 num-right-inputs=2. You can intersperse these with regular relu-batchnorm-layers for initial experiments. You can have num-left-inputs-required and num-right-inputs-required and key-dim all present but defaulting to -1, and output-context present and defaulting to true; time-stride can default to 1 and num-heads to 1, but require the user to specify value-dim, key-dim, num-left-inputs and num-right-inputs. |
| Will do |
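A sketch of the kind of config fragment being requested (illustrative only; the layer names and the relu-batchnorm dims are placeholders I made up, not from the PR):

```
# hypothetical xconfig excerpt: an attention layer interspersed with
# regular relu-batchnorm-layers, per the spec above
relu-batchnorm-layer name=tdnn3 dim=512
attention-renorm-layer name=attention1 num-heads=10 value-dim=50 key-dim=50 time-stride=3 num-left-inputs=5 num-right-inputs=2
relu-batchnorm-layer name=tdnn4 dim=512
```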
| What values do you suggest for the first experiment? With num-heads=10 value-dim=50 key-dim=50 time-stride=3 num-left-inputs=5 num-right-inputs=2, the input dim of the attention block is 1580 and the output dim is 580. |
| That's similar to TDNNs with relu-dim=512, so I think it's a reasonable setting. |
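As a sanity check, those dims match the formulas Hossein gives later in the thread, where dim = key-dim = value-dim and the context size is C = num-left-inputs + num-right-inputs + 1:

```latex
C = 5 + 2 + 1 = 8,\qquad
\text{input dim} = \text{num\_heads}\,(3\,\text{dim} + C) = 10\,(3\cdot 50 + 8) = 1580,\qquad
\text{output dim} = \text{num\_heads}\,(\text{dim} + C) = 10\,(50 + 8) = 580.
```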
| @hhadian, I notice that the stats are not being printed in the progress logs. |
| Sure, will do. |
Add Scale/Add/ZeroStats + xconfig scripts for Attention
A fix in Add function
| void GetTList(const std::vector<Index> &indexes, |
| std::vector<int32> *t_values) { |
| // set of t values |
| std::unordered_set<int32> t_set; |
Optionally, use std::set, which is sorted; it might be marginally more efficient.
You may also use some STL magic to reduce the function to just three lines, arguably more readable:

    std::set<int32> t_set;
    std::remove_copy(indexes.begin(), indexes.end(),
                     std::inserter(t_set, t_set.begin()), kNoTime);
    t_values->assign(t_set.begin(), t_set.end());

(include <algorithm> and <iterator>).
In the normal case, there could be many (e.g. 128) copies of each 't' value, so in that case I think
the way we have it is more efficient. (Also Index is a struct, not an integer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, missed the iter->t part.
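Putting the two observations together, a corrected form of the STL suggestion might look like this (a sketch; it assumes Kaldi's Index struct with an int32 t member and the kNoTime sentinel, as in the diff above, and uses std::set so the output comes back sorted):

```cpp
#include <set>
#include <vector>
// Assumes Kaldi's Index (with a .t member) and kNoTime from the nnet3 headers.

void GetTList(const std::vector<Index> &indexes,
              std::vector<int32> *t_values) {
  std::set<int32> t_set;  // sorted set of distinct t values
  for (std::vector<Index>::const_iterator iter = indexes.begin();
       iter != indexes.end(); ++iter)
    if (iter->t != kNoTime)   // skip the "no time" sentinel
      t_set.insert(iter->t);  // Index is a struct, so take iter->t
  t_values->assign(t_set.begin(), t_set.end());
}
```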
Update recipes + minor fix to allow for bigger contexts
The results of changing num_heads while keeping the output/input dimension constant. [The input dim of the attention layer is num_heads * (3 * dim + C), assuming key_dim = value_dim = dim, where C is the context size. The output dim is num_heads * (dim + C).]

In the following results, input_dim ~= 1580:

| System | tdnn_7k | head15_dim34 | head10_dim50 | head8_dim63 |
|---|---|---|---|---|
| WER on train_dev (tg) | 13.93 | 14.33 | 14.26 | 14.10 |
| WER on train_dev (fg) | 12.85 | 13.27 | 12.93 | 12.99 |
| WER on eval2000 (tg) | 16.7 | 16.9 | 16.6 | 16.6 |
| WER on eval2000 (fg) | 15.0 | 15.2 | 14.8 | 15.0 |
| Final train prob | -0.085 | -0.079 | -0.079 | -0.080 |
| Final valid prob | -0.106 | -0.103 | -0.102 | -0.101 |
| Final train prob (xent) | -1.260 | -1.026 | -1.024 | -1.037 |
| Final valid prob (xent) | -1.3193 | -1.1072 | -1.1048 | -1.1147 |

In the following results, input_dim ~= 2330 (i.e. ~50% bigger layers):

| System | tdnn_7k | head20_dim37 | head15_dim50 | head10_dim75 |
|---|---|---|---|---|
| WER on train_dev (tg) | 13.93 | 14.07 | 13.96 | 14.01 |
| WER on train_dev (fg) | 12.85 | 12.98 | 12.90 | 12.81 |
| WER on eval2000 (tg) | 16.7 | 16.9 | 16.4 | 16.4 |
| WER on eval2000 (fg) | 15.0 | 15.3 | 14.8 | 14.9 |
| Final train prob | -0.085 | -0.076 | -0.078 | -0.077 |
| Final valid prob | -0.106 | -0.100 | -0.101 | -0.101 |
| Final train prob (xent) | -1.260 | -0.999 | -0.995 | -1.003 |
| Final valid prob (xent) | -1.3193 | -1.0964 | -1.0946 | -1.0926 |

In all these, there are 2 attention layers, one near the beginning and one near the end, and the context is (5, 2).
| Hm. It does look like having the key/value dimension less than about 50 is harmful. Try setups where the value dimension is larger than the key dimension (e.g. twice as large), with slightly fewer heads. If you could do some of these experiments in a context where you replace more of the TDNN layers with attention layers, the results might be more different, even if we ultimately decide to use only a couple of attention layers. |
| Will do |
| Results regarding the position of the attention layer in the network: |
Since attention is good at layer 7, I tried bigger value dimensions with that:

| System | tdnn_7k | L7_key50_val50 | L7_key50_val100 | L7_key40-val80 |
|---|---|---|---|---|
| WER on train_dev (tg) | 13.93 | 13.64 | 13.64 | 13.76 |
| WER on train_dev (fg) | 12.85 | 12.55 | 12.68 | 12.62 |
| WER on eval2000 (tg) | 16.7 | 16.3 | 16.3 | 16.2 |
| WER on eval2000 (fg) | 15.0 | 14.8 | 14.7 | 14.6 |
| Final train prob | -0.085 | -0.077 | -0.074 | -0.076 |
| Final valid prob | -0.106 | -0.099 | -0.095 | -0.098 |
| Final train prob (xent) | -1.260 | -1.009 | -0.984 | -0.997 |
| Final valid prob (xent) | -1.3193 | -1.0980 | -1.0727 | -1.0887 |

So the best result is currently L7_key40-val80, with 0.5% and 0.4% absolute improvement on eval2000 tg and fg.
| Cool! Don't ignore the train_dev; those numbers are just as important as eval2000. Overall it's not clear to me that you're getting an improvement from a larger value-dim. Try a larger amount of left and right context for the attention layer, or two attention layers right at the end. |
| Will do. Re train_dev, is it really just as important as eval2000? My impression was that train_dev has a lot of speaker overlap and should be considered less important when tuning. |
| I think even the test sets have speaker overlap on Switchboard. The only time I'd be concerned about speaker overlap is in things that are specifically about speaker adaptation. These changes are orthogonal to that, and I'm more concerned about test-set noise due to limited size. |
Add/update attention recipes + minor xconfig update
| @hhadian, sorry, your authorship seems to have been lost by git due to the squash (I don't like to merge, except between versions of Kaldi). Next time you merge stuff, it will be to master anyway. |
* 'master' of https://github.com/kaldi-asr/kaldi: (43 commits)
  * [src,scripts,egs] Transfer learning for ASR with nnet3 (kaldi-asr#1633)
  * [src,scripts,egs] Attention modeling, with example scripts (kaldi-asr#1731)
  * [src] Fix bug in block matrix addition (thanks: Sidhi Adkoli).
  * [egs] Fix inconsequential input-checking bug in Swbd example script (kaldi-asr#1886)
  * [build] dependency-check: that python2.7 and python3 exist and 2.7 is default (kaldi-asr#1876)
  * [scripts] A cosmetic change to info messages in chain training (kaldi-asr#1880)
  * [doc] Keep tutorial code up to date (thanks: Luwei Yang)
  * [scripts] Bug-fix in long-utterance-segmentation script (thanks: Armin Oliya) (kaldi-asr#1877)
  * [egs] Fixed some issues in the multilingual BABEL example scripts (kaldi-asr#1850)
  * [build] Cosmetic fix in Makefile
  * Remove memory leaks and unused variables (when CUDA is not enabled) (kaldi-asr#1866)
  * [scripts] Fix default for egs.cmd in nnet3 training scripts (kaldi-asr#1865)
  * [doc] Fix to how documentation is built (thanks: David van Leeuwen)
  * [scripts] Add --decode-extra-opts in steps/decode.sh (required for speech activity detection scripts) (kaldi-asr#1859)
  * [src] Adding documentation for lattice discriminative training functions (kaldi-asr#1854)
  * [src] Typo fixes in documentation. (kaldi-asr#1857)
  * [egs] Update to score.sh in fisher_swbd setup, allow --iter option (kaldi-asr#1853)
  * [scripts] bug-fix in TFRNNLM rescoring script (no 'ark' needed for unk.probs file) (kaldi-asr#1851)
  * [src] Remove repeated parameter documentation. (kaldi-asr#1849)
  * [egs] Aspire example scripts: Update autoencoder example to xconfig (kaldi-asr#1847)
  * ...
still far from compiling.