I've discovered that dalle-pytorch is surprisingly resilient to a variety of learning rates. My earlier advice ("use 3e-4 unless you have reason otherwise") turned out to be incorrect.
The most important result I've found is that a learning rate of 4e-4 to 5e-4 works better than 3e-4 for depth >= 26. Increase the default learning rate when training with higher depth!
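The recommendation above can be sketched as a small helper for a training script. The exact value 4.5e-4 and the function name are my own illustrative choices; the source only says 4e-4 to 5e-4 beats 3e-4 at depth >= 26.

```python
def pick_learning_rate(depth: int) -> float:
    """Heuristic default LR based on the sweep results above.

    Assumptions (not from the source): 4.5e-4 as a midpoint of the
    reported 4e-4..5e-4 range; shallower models keep the old 3e-4.
    """
    return 4.5e-4 if depth >= 26 else 3e-4
```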
I had access to two A100s with 40 GiB of VRAM yesterday, so I ran a "hyperparameter sweep" with Weights & Biases.
I chose only three parameters to tune: learning rate, depth, and heads. wandb ran the first 1200 iterations of a training session 48 times while varying those values. Here are the results:
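For anyone wanting to reproduce this kind of sweep, here is a minimal sketch of a W&B sweep configuration over the three parameters mentioned. The search method, metric name, and candidate values are my assumptions; the source does not state which ranges or search strategy were used.

```python
# Sketch of a Weights & Biases sweep over the three tuned parameters.
# Candidate values below are illustrative, not the ones from the issue.
sweep_config = {
    "method": "random",  # assumed; could also be "grid" or "bayes"
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [3e-4, 4e-4, 5e-4]},
        "depth": {"values": [16, 26, 32]},
        "heads": {"values": [8, 16]},
    },
}

# Typical usage (requires a logged-in wandb environment, so left commented):
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="dalle-pytorch-sweep")
# wandb.agent(sweep_id, function=train, count=48)  # 48 runs, as in the issue
```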