Bad convergence when using momentum optimizer #696

@kuke

Description

The comparative training experiment in #676 shows that the DeepASR model converges well when the Adam optimizer is used. But when switched to the momentum optimizer, convergence turns out to be poor. There may be a problem in the implementation of the momentum optimizer in Fluid.
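For reference, the classic (heavy-ball) momentum SGD update can be sketched as below. This is a minimal illustration, not Fluid's actual code; note that frameworks differ in whether the learning rate scales the gradient or the accumulated velocity, and mixing the two conventions changes the effective step size, which is one common source of convergence discrepancies between implementations.

```python
import numpy as np

def momentum_update(param, grad, velocity, lr, mu=0.9):
    """One heavy-ball momentum step (illustrative names, not Fluid internals):
        v_t = mu * v_{t-1} + g_t
        p_t = p_{t-1} - lr * v_t
    The alternative convention v_t = mu * v_{t-1} + lr * g_t, p_t = p_{t-1} - v_t
    yields a different effective learning rate when mu != 0."""
    velocity = mu * velocity + grad
    param = param - lr * velocity
    return param, velocity

# Toy check on the quadratic loss f(p) = 0.5 * p^2, whose gradient is p.
p = np.array([1.0])
v = np.zeros_like(p)
for _ in range(1000):
    p, v = momentum_update(p, grad=p, velocity=v, lr=0.1, mu=0.9)
# p should have converged close to the minimum at 0.
```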

Here is the comparison of training accuracy on 4 GPUs between Fluid and Houyi with the same settings:

[Figure: training accuracy curves with the momentum optimizer on 4 GPUs, Fluid vs. Houyi]

Parameters:

batch_size: 128
device: GPU
hidden_dim: 1024
learning_rate: 0.00016
minimum_batch_size: 1
parallel: True
proj_dim: 512
stacked_num: 5
momentum: 0.9
