Releases: Lightning-AI/pytorch-lightning
Weekly patch release
App
Changed
Fixed
- Refactored the path to root to prevent a circular import (#18357)
 
Fabric
Changed
- On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)
 
Fixed
- Fixed model parameters getting shared between processes when running with `strategy="ddp_spawn"` and `accelerator="cpu"`; this has a necessary memory impact, as parameters are now replicated for each process (#18238)
- Removed a false-positive warning when using `fabric.no_backward_sync` with XLA strategies (#17761); see the sketch after this list
- Fixed an issue where Fabric would not initialize the global rank, world size, and rank-zero-only rank after initialization and before launch (#16966)
- Fixed FSDP full-precision `param_dtype` training (`16-mixed`, `bf16-mixed`, and `32-true` configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
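A minimal sketch of how `fabric.no_backward_sync` is typically used during gradient accumulation (the model, data, and accumulation count below are illustrative); with this release, the context manager no longer emits a false-positive warning on XLA strategies:

```python
import torch
import lightning as L

fabric = L.Fabric(accelerator="cpu", devices=1)
fabric.launch()

model = torch.nn.Linear(32, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

accumulate_every = 4
for step in range(16):
    batch = torch.randn(8, 32)
    is_accumulating = (step + 1) % accumulate_every != 0
    # Skip gradient synchronization on accumulation steps
    with fabric.no_backward_sync(model, enabled=is_accumulating):
        loss = model(batch).sum()
        fabric.backward(loss)
    if not is_accumulating:
        optimizer.step()
        optimizer.zero_grad()
```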
PyTorch
Changed
- On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)
- Fixed an inefficiency in the rich progress bar (#18369)
 
Fixed
- Fixed FSDP full-precision `param_dtype` training (`16-mixed` and `bf16-mixed` configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
- Fixed an issue that prevented the use of custom logger classes without an `experiment` property defined (#18093)
- Fixed setting the tracking URI in `MLFlowLogger` for logging artifacts to the MLflow server (#18395); see the sketch after this list
- Fixed a redundant `iter()` call to the dataloader when checking the dataloading configuration (#18415)
- Fixed model parameters getting shared between processes when running with `strategy="ddp_spawn"` and `accelerator="cpu"`; this has a necessary memory impact, as parameters are now replicated for each process (#18238)
- Properly manage `fetcher.done` with `dataloader_iter` (#18376)
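A minimal sketch of pointing `MLFlowLogger` at a tracking server (the server address and experiment name are hypothetical); with this fix, the same tracking URI is also used when uploading checkpoint artifacts:

```python
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import MLFlowLogger

logger = MLFlowLogger(
    experiment_name="demo",                # hypothetical experiment name
    tracking_uri="http://127.0.0.1:5000",  # hypothetical MLflow server
    log_model=True,                        # upload checkpoints as MLflow artifacts
)
trainer = Trainer(logger=logger, max_epochs=1)
```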
Contributors
@awaelchli, @Borda, @carmocca, @quintenroets, @rlizzo, @speediedan, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Weekly patch release
App
Changed
- Removed the top-level import `lightning.pdb`; import `lightning.app.pdb` instead (#18177)
- The client now retries forever (#18065)
 
Fixed
- Fixed an issue that would prevent the user from setting the multiprocessing start method after importing lightning (#18177); see the sketch below
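A minimal sketch of the now-supported pattern (the choice of start method is illustrative):

```python
import multiprocessing

import lightning  # importing lightning no longer locks in a start method

if __name__ == "__main__":
    # The user can still choose the start method after the import
    multiprocessing.set_start_method("spawn")
```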
 
Fabric
Changed
- Disabled the auto-detection of the Kubeflow environment (#18137)
 
Fixed
- Fixed an issue where DDP subprocesses that used Hydra would set Hydra's working directory to the current directory (#18145)
- Fixed an issue that would prevent the user from setting the multiprocessing start method after importing lightning (#18177)
- Fixed an issue with `Fabric.all_reduce()` not performing an in-place operation consistently across all backends (#18235); see the sketch after this list
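A minimal sketch of `Fabric.all_reduce()` (the device count and reduce op are illustrative); regardless of backend, it is safest to use the returned tensor rather than relying on in-place mutation:

```python
import torch
import lightning as L

fabric = L.Fabric(accelerator="cpu", devices=2, strategy="ddp")
fabric.launch()

value = torch.tensor([float(fabric.global_rank)])
# Reduce across processes; use the returned tensor (in-place behavior is now
# consistent across backends)
total = fabric.all_reduce(value, reduce_op="sum")
fabric.print(total)
```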
PyTorch
Added
- Added `LightningOptimizer.refresh()` to update the `__dict__` in case the optimizer it wraps has changed its internal state (#18280)
Changed
- Disabled the auto-detection of the Kubeflow environment (#18137)
 
Fixed
- Fixed a `Missing folder` exception when using a Google Storage URL as a `default_root_dir` (#18088); see the sketch after this list
- Fixed an issue that would prevent the user from setting the multiprocessing start method after importing lightning (#18177)
- Fixed the gradient unscaling logic when the training step skipped the backward pass (by returning `None`) (#18267)
- Ensured that the closure running inside the optimizer step has gradients enabled, even if the optimizer step has them disabled (#18268)
- Fixed an issue that could cause the `LightningOptimizer` wrapper returned by `LightningModule.optimizers()` to have different internal state than the optimizer it wraps (#18280)
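A minimal sketch of using a remote URL as the root directory (the bucket name is hypothetical, and gcsfs/fsspec credentials are assumed to be configured):

```python
from lightning.pytorch import Trainer

# Checkpoints and logs are written to the remote path; the "Missing folder"
# exception is no longer raised for such URLs
trainer = Trainer(default_root_dir="gs://my-bucket/lightning-runs")
```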
Contributors
@0x404, @awaelchli, @bilelomrani1, @Borda, @ethanwharris, @nisheethlahoti
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
2.0.6
App
- Fixed handling a `None` request in the file orchestration queue (#18111)
Fabric
- Fixed `TensorBoardLogger.log_graph` not unwrapping the `_FabricModule` (#17844)
PyTorch
- Fixed `LightningCLI` not saving `seed_everything` correctly when `run=True` and `seed_everything=True` (#18056)
- Fixed validation of non-PyTorch LR schedulers in manual optimization mode (#18092)
- Fixed an attribute error for `_FaultTolerantMode` when loading an old checkpoint that pickled the enum (#18094)
Contributors
@awaelchli, @lantiga, @mauvilsa, @shihaoyin
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
App
Added
- plugin: store source app (#17892)
- Added a colocation identifier (#16796)
- Added exponential backoff to HTTPQueue put (#18013)
- Content for plugins (#17243)
 
Changed
- Save a reference to created tasks, to avoid tasks disappearing (#17946)
 
Fabric
Added
- Added validation against misconfigured device selection when using the DeepSpeed strategy (#17952)
 
Changed
- Avoid info message when loading 0 entry point callbacks (#17990)
 
Fixed
- Fixed the emission of a false-positive warning when calling a method on the Fabric-wrapped module that accepts no arguments (#17875)
- Fixed the check for FSDP's flat parameters in all parameter groups (#17914)
- Fixed automatic step tracking in Fabric's CSVLogger (#17942)
- Fixed an issue causing the `torch.set_float32_matmul_precision` info message to show multiple times (#17960)
- Fixed loading model state when `Fabric.load()` is called after `Fabric.setup()` (#17997); see the sketch after this list
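A minimal sketch of the order of calls this fix addresses (the model, optimizer, and checkpoint path are illustrative):

```python
import torch
import lightning as L

fabric = L.Fabric(accelerator="cpu", devices=1)
fabric.launch()

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

# Loading after setup() now correctly restores weights into the wrapped module
state = {"model": model, "optimizer": optimizer}
fabric.load("checkpoint.ckpt", state)  # hypothetical path
```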
PyTorch
Fixed
- Fixed delayed creation of experiment metadata and checkpoint/log dir name when using `WandbLogger` (#17818)
- Fixed incorrect parsing of arguments when augmenting exception messages in DDP (#17948)
- Fixed an issue causing the `torch.set_float32_matmul_precision` info message to show multiple times (#17960)
- Added the missing `map_location` argument to the `LightningDataModule.load_from_checkpoint` function (#17950); see the sketch after this list
- Fixed support for `neptune-client` (#17939)
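A minimal sketch of the new argument (the datamodule class and checkpoint path are illustrative; `BoringDataModule` is one of Lightning's demo classes):

```python
import torch
from lightning.pytorch.demos.boring_classes import BoringDataModule

# map_location is now forwarded when restoring a datamodule from a checkpoint
datamodule = BoringDataModule.load_from_checkpoint(
    "checkpoint.ckpt",                 # hypothetical path
    map_location=torch.device("cpu"),
)
```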
Contributors
@anio, @awaelchli, @Borda, @ethanwharris, @lantiga, @nicolai86, @rjarun8, @schmidt-ai, @schuhschuh, @wouterzwerink, @yurijmikhalevich
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
App
Fixed
- Bumped several dependencies to address security vulnerabilities
 
Fabric
Fixed
- Fixed validation of the parameters of `plugins.precision.MixedPrecision` (#17687)
- Fixed an issue with HPU imports leading to performance degradation (#17788)
 
PyTorch
Changed
- Changes to the `NeptuneLogger` (#16761):
  - It now supports neptune-client 0.16.16 and neptune >= 1.0, and the `log()` method has been replaced with `append()` and `extend()`.
  - It now accepts a namespace `Handler` as an alternative to `Run` for the `run` argument. This means you can call it as `NeptuneLogger(run=run["some/namespace"])` to log everything to the `some/namespace/` location of the run; see the sketch after this list.
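A minimal sketch of the new `run` argument (the project name and namespace are hypothetical, and a Neptune API token is assumed to be configured):

```python
import neptune
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import NeptuneLogger

run = neptune.init_run(project="my-workspace/my-project")  # hypothetical project
# Passing a namespace Handler logs everything under "finetuning/" in the run
logger = NeptuneLogger(run=run["finetuning"])
trainer = Trainer(logger=logger)
```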
 
Fixed
- Fixed validation of the parameters of `plugins.precision.MixedPrecisionPlugin` (#17687)
- Fixed deriving the default map location in `LightningModule.load_from_checkpoint` when there is extra state (#17812)
Contributors
@akreuzer, @awaelchli, @Borda, @jerome-habana, @kshitij12345
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
App
Added
- Added the property `LightningWork.public_ip` that exposes the public IP of the `LightningWork` instance (#17742)
- Added the missing python-multipart dependency (#17244)
 
Changed
- Made type hints public (#17100)
 
Fixed
- Fixed `LightningWork.internal_ip`, which was mistakenly exposing the public IP; it now exposes the private/internal IP address (#17742); see the sketch after this list
- Fixed resolution of the latest version in the CLI (#17351)
- Fixed a property being raised instead of returned (#17595)
- Fixed getting the project (#17617, #17666)
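A minimal sketch of reading the two IP properties from inside a Work (the work class is illustrative):

```python
from lightning.app import LightningWork


class AddressReporter(LightningWork):
    def run(self):
        # public_ip: the publicly reachable address of this Work's machine
        # internal_ip: the private/internal address (after this fix)
        print("public:", self.public_ip)
        print("internal:", self.internal_ip)
```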
 
Fabric
Added
- Added support for `Callback` registration through entry points (#17756)
Changed
Fixed
- Fixed computing the next version folder in `CSVLogger` (#17139)
- Fixed inconsistent settings for FSDP precision (#17670)
 
PyTorch
Changed
- Made type hints public (#17100)
 
Fixed
- `CombinedLoader` only starts DataLoader workers when necessary when operating in sequential mode (#17639); see the sketch after this list
- Fixed a potential bug with uploading model checkpoints to Neptune.ai by uploading files from a stream (#17430)
- Fixed signature inspection of decorated hooks (#17507)
- The `WandbLogger` no longer flattens dictionaries in the hyperparameters logged to the dashboard (#17574)
- Fixed computing the next version folder in `CSVLogger` (#17139)
- Fixed a formatting issue when the filename in `ModelCheckpoint` contained metrics that were substrings of each other (#17610)
- Fixed `WandbLogger` ignoring the `WANDB_PROJECT` environment variable (#16222)
- Fixed inconsistent settings for FSDP precision (#17670)
- Fixed an edge case causing overlapping samples in DDP when no global seed is set (#17713)
- Fall back to a module-availability check for mlflow (#17467)
- Fixed LR finder max val batches (#17636)
- Fixed multithreaded checkpoint loading (#17678)
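A minimal sketch of the sequential mode this change affects (the datasets and worker counts are illustrative); in "sequential" mode the loaders are iterated one after another, and workers are now only started for the loader that is actually running:

```python
from torch.utils.data import DataLoader
from lightning.pytorch.utilities import CombinedLoader

loaders = {
    "a": DataLoader(range(10), batch_size=2, num_workers=2),
    "b": DataLoader(range(20), batch_size=4, num_workers=2),
}
combined = CombinedLoader(loaders, mode="sequential")
for batch, batch_idx, dataloader_idx in combined:
    ...
```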
 
Contributors
@adamjstewart, @AleksanderWWW, @awaelchli, @baskrahmer, @bkiat1123, @Borda, @carmocca, @ethanwharris, @leng-yue, @lightningforever, @manangoel99, @mukhery, @Quasar-Kim, @water-vapor, @yurijmikhalevich
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release: App jobs
App
Fixed
Fabric
Changed
- Enable precision autocast for LightningModule step methods in Fabric (#17439)
 
Fixed
- Fixed an issue with `LightningModule.*_step` methods bypassing the DDP/FSDP wrapper (#17424)
- Fixed device handling in `Fabric.setup()` when the model has no parameters (#17441)
PyTorch
Fixed
- Fixed `Model.load_from_checkpoint("checkpoint.ckpt", map_location=map_location)` always returning the model on CPU (#17308)
- Fixed syncing of module states during non-fit stages (#17370)
- Fixed an issue that caused `num_nodes` not to be set correctly for `FSDPStrategy` (#17438); see the sketch after this list
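A minimal sketch of the multi-node FSDP configuration affected by this fix (the device and node counts are illustrative):

```python
from lightning.pytorch import Trainer

# num_nodes is now propagated correctly to the FSDP strategy
trainer = Trainer(
    accelerator="gpu",
    devices=8,
    num_nodes=2,
    strategy="fsdp",
)
```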
Contributors
@awaelchli, @Borda, @carmocca, @ethanwharris, @ryan597, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
App
Changed
Fabric
Changed
- Let `TorchCollective` work on the `torch.distributed` WORLD process group by default (#16995)
Fixed
- Fixed `_cuda_clearCublasWorkspaces` being called on teardown (#16907)
- Improved the error message for installing tensorboard or tensorboardx (#17053)
 
PyTorch
Changed
- Changes to the `NeptuneLogger` (#16761):
  - It now supports neptune-client 0.16.16 and neptune >= 1.0, and the `log()` method has been replaced with `append()` and `extend()`.
  - It now accepts a namespace `Handler` as an alternative to `Run` for the `run` argument. This means you can call it like `NeptuneLogger(run=run["some/namespace"])` to log everything to the `some/namespace/` location of the run.
- Allow `sys.argv` and args in `LightningCLI` (#16808)
- Moved the HPU broadcast override to the HPU strategy file (#17011)
 
Deprecated
- Removed registration of `ShardedTensor` state dict hooks in `LightningModule.__init__` with `torch>=2.1` (#16892)
- Removed the `lightning.pytorch.core.saving.ModelIO` class interface (#16974)
Fixed
- Fixed `num_nodes` not being set for `DDPFullyShardedNativeStrategy` (#17160)
- Fixed parsing the precision config for inference in `DeepSpeedStrategy` (#16973)
- Fixed the availability check for `rich` that prevented Lightning from being imported in Google Colab (#17156)
- Fixed `_cuda_clearCublasWorkspaces` being called on teardown (#16907)
- The `psutil` package is now required for CPU monitoring (#17010)
- Improved the error message for installing tensorboard or tensorboardx (#17053)
 
Contributors
@awaelchli, @belerico, @carmocca, @colehawkins, @dmitsf, @Erotemic, @ethanwharris, @kshitij12345, @Borda
If we forgot someone due to not matching commit email with GitHub account, let us know :]
2.0.1 appendix
App
Fixed
- Fixed frontend hosts when running with multi-process in the cloud (#17324)
 
Fabric
No changes.
PyTorch
Fixed
- Made the `is_picklable` function more robust (#17270)
Contributors
@eng-yue @ethanwharris @Borda @awaelchli @carmocca
If we forgot someone due to not matching commit email with GitHub account, let us know :]
2.0.1 patch release
App
No changes
Fabric
Changed
- Generalized `Optimizer` validation to accommodate both FSDP 1.x and 2.x (#16733)
PyTorch
Changed
- Pickling the `LightningModule` no longer pickles the `Trainer` (#17133)
- Generalized `Optimizer` validation to accommodate both FSDP 1.x and 2.x (#16733)
- Disabled `torch.inference_mode` with `torch.compile` in PyTorch 2.0 (#17215)
Fixed
- Fixed an issue where pickling the module instance would fail with a DataLoader error (#17130)
- Fixed `WandbLogger` not showing "best" aliases for model checkpoints when `ModelCheckpoint(save_top_k > 0)` is used (#17121)
- Fixed the availability check for `rich` that prevented Lightning from being imported in Google Colab (#17156)
- Fixed parsing the precision config for inference in `DeepSpeedStrategy` (#16973)
- Fixed an issue where `torch.compile` would fail when logging to WandB (#17216)
Contributors
@Borda @williamFalcon @lightningforever @adamjstewart @carmocca @tshu-w @saryazdi @parambharat @awaelchli @colehawkins @woqidaideshi @md-121 @yhl48 @gkroiz @idc9 @speediedan
If we forgot someone due to not matching commit email with GitHub account, let us know :]