
How to use FastaiLRFinder with Ignite

This how-to guide demonstrates how we can leverage the FastaiLRFinder handler to find an optimal learning rate for training our model. For a better understanding, we will compare the results produced with and without the handler.

In this example, we will be using a ResNet18 model on the MNIST dataset. The base code is the same as used in the Getting Started Guide.

Basic Setup

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision.models import resnet18
from torchvision.transforms import Compose, Normalize, ToTensor

from ignite.engine import create_supervised_trainer, create_supervised_evaluator
from ignite.metrics import Accuracy, Loss
from ignite.handlers import FastaiLRFinder
```
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.model = resnet18(num_classes=10)
        # MNIST images have a single channel, so replace the first convolution
        self.model.conv1 = nn.Conv2d(
            1, 64, kernel_size=3, padding=1, bias=False
        )

    def forward(self, x):
        return self.model(x)


data_transform = Compose([ToTensor(), Normalize((0.1307,), (0.3081,))])

train_loader = DataLoader(
    MNIST(download=True, root=".", transform=data_transform, train=True),
    batch_size=128,
    shuffle=True,
)

test_loader = DataLoader(
    MNIST(download=True, root=".", transform=data_transform, train=False),
    batch_size=256,
    shuffle=False,
)

model = Net().to(device)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-06)
criterion = nn.CrossEntropyLoss()
```

We will first train the model with a fixed learning rate (lr) of 1e-06 and inspect our results. Let’s save the initial state of the model and the optimizer to restore them later for comparison.

```python
import copy

# state_dict() returns references to the live tensors, so deep-copy the
# snapshots to make sure we can truly restore the initial weights later
init_model_state = copy.deepcopy(model.state_dict())
init_opt_state = copy.deepcopy(optimizer.state_dict())
```

Without LR Finder

```python
trainer = create_supervised_trainer(model, optimizer, criterion, device=device)

trainer.run(train_loader, max_epochs=3)
```
```
State:
  iteration: 1407
  epoch: 3
  epoch_length: 469
  max_epochs: 3
  output: 0.5554001927375793
  batch: <class 'list'>
  metrics: <class 'dict'>
  dataloader: <class 'torch.utils.data.dataloader.DataLoader'>
  seed: <class 'NoneType'>
  times: <class 'dict'>
```
```python
evaluator = create_supervised_evaluator(
    model, metrics={"Accuracy": Accuracy(), "Loss": Loss(criterion)}, device=device
)
evaluator.run(test_loader)

print(evaluator.state.metrics)
```
```
{'Accuracy': 0.8655, 'Loss': 0.602867822265625}
```

Let’s see how we can achieve better results by using the FastaiLRFinder handler. But first, let’s restore the initial state of the model and optimizer so we can re-train them from scratch.

```python
model.load_state_dict(init_model_state)
optimizer.load_state_dict(init_opt_state)
```

With LR Finder

When attached to the trainer, this handler follows the same procedure used by fastai. The model is trained for num_iter iterations while the learning rate is increased from start_lr (which defaults to the initial value set on the optimizer, here 1e-06) to the upper bound end_lr. The increase can be linear (step_mode="linear") or exponential (step_mode="exp"). The default step_mode is exponential, which is recommended for larger learning-rate ranges, while linear gives good results for small ranges.
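For reference, all of these options can be passed directly to attach(). The sketch below only illustrates the parameters described above; the num_iter value is an arbitrary example choice, not something used in this guide.

```python
lr_finder = FastaiLRFinder()
to_save = {"model": model, "optimizer": optimizer}

# Illustrative only: choose the sweep range, length and schedule explicitly
with lr_finder.attach(
    trainer,
    to_save,
    start_lr=1e-06,   # lower bound (defaults to the optimizer's current lr)
    end_lr=1e-02,     # upper bound of the sweep
    num_iter=300,     # arbitrary example value for the sweep length
    step_mode="exp",  # "exp" (default) or "linear"
) as finder_trainer:
    finder_trainer.run(train_loader)
```

The run used for the rest of this guide, shown next, keeps the defaults and only overrides end_lr.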

```python
lr_finder = FastaiLRFinder()

# To restore the model's and optimizer's states after running the LR Finder
to_save = {"model": model, "optimizer": optimizer}

with lr_finder.attach(trainer, to_save, end_lr=1e-02) as trainer_with_lr_finder:
    trainer_with_lr_finder.run(train_loader)
```

Let’s plot the loss recorded at each learning rate within our specified range and print the suggested learning rate.

```python
lr_finder.plot()

print("Suggested LR", lr_finder.lr_suggestion())
```

(Plot of loss versus learning rate produced by lr_finder.plot().)

```
Suggested LR 1.0148376909312998e-05
```
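If you prefer the raw numbers behind the plot, the handler also records them. A minimal sketch using get_results(), which returns the logged history (assumed here to hold "lr" and "loss" lists):

```python
# Inspect the recorded learning-rate / loss history from the LR Finder run
history = lr_finder.get_results()
print(history["lr"][:5])    # first few learning rates tried
print(history["loss"][:5])  # corresponding loss values
```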

Now we will apply the suggested learning rate to the optimizer and train the model again with it.

```python
lr_finder.apply_suggested_lr(optimizer)
print(optimizer.param_groups[0]["lr"])
```
```
1.0148376909312998e-05
```
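apply_suggested_lr is a convenience; for a single parameter group it amounts to writing the suggested value into the optimizer by hand, roughly as in this sketch:

```python
# Manual equivalent of apply_suggested_lr for a single parameter group
suggested_lr = lr_finder.lr_suggestion()
for param_group in optimizer.param_groups:
    param_group["lr"] = suggested_lr
```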
```python
trainer.run(train_loader, max_epochs=3)
```
```
State:
  iteration: 1407
  epoch: 3
  epoch_length: 469
  max_epochs: 3
  output: 0.09644963592290878
  batch: <class 'list'>
  metrics: <class 'dict'>
  dataloader: <class 'torch.utils.data.dataloader.DataLoader'>
  seed: <class 'NoneType'>
  times: <class 'dict'>
```
```python
# Calculate the new metrics after using the optimal lr
evaluator.run(test_loader)
print(evaluator.state.metrics)
```
```
{'Accuracy': 0.9715, 'Loss': 0.0908882568359375}
```

As we can see, training with the suggested learning rate for the same number of epochs gives a noticeably higher accuracy and lower loss on the test dataset.