littlebullGit
Contributor

@littlebullGit littlebullGit commented Sep 22, 2025

This PR enhances the ModelCheckpoint callback to properly handle manual optimization, so that checkpoints reflect the intended model state and users get clear guidance.

Fixes #20947

Key Changes:

  • Manual Optimization Support: Ensures checkpoints capture the model state before optimization when using manual optimization with every_n_train_steps.
  • User Warning: Adds a clear warning when pre-optimization state isn't saved, helping users understand the checkpoint behavior.
  • Documentation: Updates docstrings and examples to clarify the behavior with manual optimization.
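
The difference between saving pre-optimization and post-optimization state can be sketched with a toy loop. This is an illustration only, not the code in this PR and not Lightning's implementation: `ToyModel`, `train`, and `save_pre_step_state` are hypothetical names, and only `every_n_train_steps` mirrors the real callback argument.

```python
import copy
import warnings


class ToyModel:
    """Hypothetical stand-in for a LightningModule; holds a single weight."""

    def __init__(self):
        self.weight = 0.0


def train(num_steps, every_n_train_steps, save_pre_step_state):
    """Run a toy loop and return {step: saved_state} for each checkpoint."""
    model = ToyModel()
    checkpoints = {}
    for step in range(1, num_steps + 1):
        # Snapshot the state before the (manual) optimizer step.
        pre_step_state = copy.deepcopy(model.__dict__)

        # Stand-in for the user calling optimizer.step() inside training_step.
        model.weight += 1.0

        if step % every_n_train_steps == 0:
            if save_pre_step_state:
                checkpoints[step] = pre_step_state
            else:
                # Assumed wording; the real warning text may differ.
                warnings.warn("Checkpoint contains post-optimization state.")
                checkpoints[step] = copy.deepcopy(model.__dict__)
    return checkpoints


pre = train(num_steps=4, every_n_train_steps=2, save_pre_step_state=True)
post = train(num_steps=4, every_n_train_steps=2, save_pre_step_state=False)
print(pre)
print(post)
```

In this sketch, the checkpoint at step 2 holds the weight as it was before the second update when `save_pre_step_state=True`, and the already-updated weight otherwise.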

Testing:

  • Added test cases to verify checkpoint behavior with manual optimization
  • Ensured backward compatibility with automatic optimization
  • Verified warning messages are shown in appropriate scenarios

📚 Documentation preview 📚: https://pytorch-lightning--21239.org.readthedocs.build/en/21239/

@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Sep 22, 2025
@littlebullGit littlebullGit force-pushed the fix/20947-checkpoint-manual-opt branch from 4f06495 to 16552e5 Compare September 22, 2025 03:53
@Borda Borda changed the title Fix ModelCheckpoint with manual optimization and every_n_train_steps Fix ModelCheckpoint with manual optimization and every_n_train_steps Sep 22, 2025
@littlebullGit littlebullGit reopened this Sep 22, 2025
@littlebullGit
Contributor Author

littlebullGit commented Sep 23, 2025

The link error is (generated/CONTRIBUTING: line 6) broken https://medium.com/pytorch-lightning/quick-contribution-guide-86d977171b3a - 429 Client Error: Too Many Requests for url: https://medium.com/pytorch-lightning/quick-contribution-guide-86d977171b3a. Not related to my code. The other one is just timed out.
@Borda @SkafteNicki let me know how to proceed.

@SkafteNicki
Collaborator

> The link error is (generated/CONTRIBUTING: line 6) broken https://medium.com/pytorch-lightning/quick-contribution-guide-86d977171b3a - 429 Client Error: Too Many Requests for url: https://medium.com/pytorch-lightning/quick-contribution-guide-86d977171b3a. Not related to my code. The other one is just timed out.
> @Borda @SkafteNicki let me know how to proceed.

Our CI is broken at the moment, so there is nothing you can do. Please stand by while it is being fixed.

@littlebullGit littlebullGit force-pushed the fix/20947-checkpoint-manual-opt branch from 9754a79 to 059625b Compare September 28, 2025 14:41
- Ensure checkpoints reflect the model state before optimization when using manual optimization
- Add warning when pre-optimization state isn't saved
- Update documentation to clarify the behavior with manual optimization

Fixes Lightning-AI#20947
@littlebullGit littlebullGit force-pushed the fix/20947-checkpoint-manual-opt branch from 059625b to 7672de3 Compare October 1, 2025 12:48
@littlebullGit
Contributor Author

@SkafteNicki @justusschock can we try CI again? The last run seems fine to me.
