Finding Good Learning Rate For Different Values of Depth, Heads #84

@afiaka87

Description

I've discovered that dalle-pytorch is resilient to a wide variety of learning rates. Use 3e-4 unless you have a reason to do otherwise; the information below is incorrect.

The most important result I've found is that a learning rate of 4e-4 to 5e-4 works better than 3e-4 for depth >= 26. Increase the default learning rate when training at higher depths!

I had access to two A100s (40 GiB of VRAM each) yesterday, so I ran a hyperparameter sweep with Weights and Biases.

I chose only three parameters to tune: learning rate, depth, and heads.

Wandb ran the first 1200 iterations of a training session 48 times while varying those values. Here are the results:

https://wandb.ai/afiaka87/hp_tuning/reports/DALLE-Pytorch-Sweep-over-Learning-Rate-Depth-and-Heads--Vmlldzo1Mjg4Mjk

(Charts from the report: loss over time with hyperparameter importance; all parameters.)
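For anyone who wants to reproduce this kind of sweep, here is a minimal sketch of a W&B sweep configuration over the same three parameters. The search method, the candidate values, and the `train` function are assumptions for illustration; they are not taken from the linked report.

```python
# Sketch of a W&B sweep over learning rate, depth, and heads.
# The method and value lists below are assumed, not the actual sweep settings.
sweep_config = {
    "method": "random",  # assumed; W&B also supports "grid" and "bayes"
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [3e-4, 4e-4, 5e-4]},
        "depth": {"values": [16, 26, 32]},
        "heads": {"values": [8, 16]},
    },
}

# To launch (requires a wandb login and a train() that reads wandb.config
# and logs "loss" each step, e.g. for the first 1200 iterations):
#
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="hp_tuning")
# wandb.agent(sweep_id, function=train, count=48)  # 48 runs, as above
```

Each agent run samples one combination from `parameters`, so 48 runs gives a coarse picture of which ranges matter, which is what the importance chart in the report summarizes.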
