I've discovered that dalle-pytorch is surprisingly resilient to a variety of learning rates. My earlier advice ("use 3e-4 unless you have reason otherwise") turned out to be incorrect.
The most important result I've found is that a learning rate of 4e-4 to 5e-4 works better than 3e-4 for depth >= 26. Increase the default learning rate when training with higher depth!
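The recommendation above can be sketched as a small helper for a training script. The exact value 4.5e-4 and the function name are my own illustrative choices; the source only says 4e-4 to 5e-4 beats 3e-4 at depth >= 26.

```python
def pick_learning_rate(depth: int) -> float:
    """Heuristic default LR based on the sweep results above.

    Assumptions (not from the source): 4.5e-4 as a midpoint of the
    reported 4e-4..5e-4 range; shallower models keep the old 3e-4.
    """
    return 4.5e-4 if depth >= 26 else 3e-4
```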
I had access to two A100s with 40 GiB of VRAM yesterday, so I ran a "hyperparameter sweep" with Weights & Biases.
I chose only three parameters to tune: learning rate, depth, and heads. wandb ran the first 1200 iterations of a training session 48 times while varying those values. Here are the results:
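For anyone wanting to reproduce this kind of sweep, here is a minimal sketch of a W&B sweep configuration over the three parameters mentioned. The search method, metric name, and candidate values are my assumptions; the source does not state which ranges or search strategy were used.

```python
# Sketch of a Weights & Biases sweep over the three tuned parameters.
# Candidate values below are illustrative, not the ones from the issue.
sweep_config = {
    "method": "random",  # assumed; could also be "grid" or "bayes"
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [3e-4, 4e-4, 5e-4]},
        "depth": {"values": [16, 26, 32]},
        "heads": {"values": [8, 16]},
    },
}

# Typical usage (requires a logged-in wandb environment, so left commented):
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="dalle-pytorch-sweep")
# wandb.agent(sweep_id, function=train, count=48)  # 48 runs, as in the issue
```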