Lightning-AI/pytorch-lightning
Issues matching: state:open label:checkpointing

Training loss will increase drastically when resume from ckpts after adopting self-defined sampler in multi-GPU experiments [Fabric]
Labels: accelerator: cuda (Compute Unified Device Architecture GPU); bug (Something isn't working); checkpointing (Related to checkpointing); fabric (lightning.fabric.Fabric); ver: 2.3.x
Status: Open · #21312 · opened by Moondok on Oct 24, 2025

LightingModel.save_checkpoint()
Labels: checkpointing (Related to checkpointing); feature (Is an improvement or enhancement)
Status: Open · #21280 · opened by Karesto on Oct 10, 2025

[CLI] predict(ckpt_path="best") raises ValueError even though ModelCheckpoint is configured under trainer.callbacks
Labels: bug (Something isn't working); checkpointing (Related to checkpointing); lightningcli (pl.cli.LightningCLI); trainer: predict; ver: 2.5.x
Status: Open · #21254 · opened by ilinvai on Sep 29, 2025

Directly save checkpoint to specified disk location
Labels: checkpointing (Related to checkpointing); question (Further information is requested)
Status: Open · #21253 · opened by HarveyYan on Sep 29, 2025

Feature request: Support splitting model weights and training states into separate checkpoint files
Labels: callback: model checkpoint; checkpointing (Related to checkpointing); feature (Is an improvement or enhancement)
Status: Open · #21170 · opened by yilin404 on Sep 7, 2025

FSDP with custom state_dict()
Labels: bug (Something isn't working); checkpointing (Related to checkpointing); strategy: fsdp (Fully Sharded Data Parallel); ver: 2.5.x; waiting on author (Waiting on user action, correction, or update)
Status: Open · #21124 · opened by LiuTaowen-Tony on Aug 28, 2025

Can't run a model trained in MPS-system on CPU-only system.
Labels: bug (Something isn't working); checkpointing (Related to checkpointing); ver: 2.5.x; waiting on author (Waiting on user action, correction, or update)
Status: Open · #21094 · opened by obbiondo on Aug 18, 2025

Resume training from checkpoint that only save trainable parameters
Labels: checkpointing (Related to checkpointing); feature (Is an improvement or enhancement)
Status: Open · #21053 · opened by jasonrichdarmawan on Aug 11, 2025

With automatic_optimization disabled and checkpointing every n steps, the best checkpointed model is the model obtained after backpropagation and not the one used for computing the loss
Labels: bug (Something isn't working); checkpointing (Related to checkpointing); ver: 2.5.x
Status: Open · #20947 · opened by Yann-CV on Jun 26, 2025

fabric FSDP strategy save/load checkpoint does not support s3 url
Labels: bug (Something isn't working); checkpointing (Related to checkpointing); fabric (lightning.fabric.Fabric); strategy: fsdp (Fully Sharded Data Parallel); ver: 2.5.x
Status: Open · #20749 · opened by likesum on Apr 24, 2025

Progress bar is broken when loading trainer state from checkpoint
Labels: bug (Something isn't working); checkpointing (Related to checkpointing); progress bar: tqdm; ver: 2.5.x
Status: Open · #20603 · opened by JLenzy on Feb 25, 2025

Error learning rate when load ckpt for continue training if check_val_every_n_epoch > 1
Labels: bug (Something isn't working); callback: lr monitor; checkpointing (Related to checkpointing); ver: 2.4.x
Status: Open · #20495 · opened by razgzy on Dec 13, 2024