Releases: Lightning-AI/pytorch-lightning
Weekly patch release
App
Changed
Fixed
- Refactored the path to root to prevent a circular import (#18357)
 
Fabric
Changed
- On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)
 
Fixed
- Fixed model parameters getting shared between processes when running with `strategy="ddp_spawn"` and `accelerator="cpu"`; this has a necessary memory impact, as parameters are now replicated for each process (#18238)
- Removed a false-positive warning when using `fabric.no_backward_sync` with XLA strategies (#17761); see the sketch after this list
- Fixed an issue where Fabric would not initialize the global rank, world size, and rank-zero-only rank after initialization and before launch (#16966)
- Fixed FSDP full-precision `param_dtype` training (`16-mixed`, `bf16-mixed`, and `32-true` configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
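A minimal sketch of how `fabric.no_backward_sync` is typically used during gradient accumulation (the model, data, and accumulation count below are illustrative); with this release, the context manager no longer emits a false-positive warning on XLA strategies:

```python
import torch
import lightning as L

fabric = L.Fabric(accelerator="cpu", devices=1)
fabric.launch()

model = torch.nn.Linear(32, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

accumulate_every = 4
for step in range(16):
    batch = torch.randn(8, 32)
    is_accumulating = (step + 1) % accumulate_every != 0
    # Skip gradient synchronization on accumulation steps
    with fabric.no_backward_sync(model, enabled=is_accumulating):
        loss = model(batch).sum()
        fabric.backward(loss)
    if not is_accumulating:
        optimizer.step()
        optimizer.zero_grad()
```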
PyTorch
Changed
- On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)
- Fixed an inefficiency in the rich progress bar (#18369)
 
Fixed
- Fixed FSDP full-precision `param_dtype` training (`16-mixed` and `bf16-mixed` configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
- Fixed an issue that prevented the use of custom logger classes without an `experiment` property defined (#18093)
- Fixed setting the tracking URI in `MLFlowLogger` for logging artifacts to the MLflow server (#18395); see the sketch after this list
- Fixed a redundant `iter()` call to the dataloader when checking the dataloading configuration (#18415)
- Fixed model parameters getting shared between processes when running with `strategy="ddp_spawn"` and `accelerator="cpu"`; this has a necessary memory impact, as parameters are now replicated for each process (#18238)
- Properly manage `fetcher.done` with `dataloader_iter` (#18376)
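A minimal sketch of pointing `MLFlowLogger` at a tracking server (the server address and experiment name are hypothetical); with this fix, the same tracking URI is also used when uploading checkpoint artifacts:

```python
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import MLFlowLogger

logger = MLFlowLogger(
    experiment_name="demo",                # hypothetical experiment name
    tracking_uri="http://127.0.0.1:5000",  # hypothetical MLflow server
    log_model=True,                        # upload checkpoints as MLflow artifacts
)
trainer = Trainer(logger=logger, max_epochs=1)
```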
Contributors
@awaelchli, @Borda, @carmocca, @quintenroets, @rlizzo, @speediedan, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Weekly patch release
App
Changed
- Removed the top-level import `lightning.pdb`; import `lightning.app.pdb` instead (#18177)
- The client now retries forever (#18065)
 
Fixed
- Fixed an issue that would prevent the user from setting the multiprocessing start method after importing lightning (#18177); see the sketch below
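A minimal sketch of the now-supported pattern (the choice of start method is illustrative):

```python
import multiprocessing

import lightning  # importing lightning no longer locks in a start method

if __name__ == "__main__":
    # The user can still choose the start method after the import
    multiprocessing.set_start_method("spawn")
```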
 
Fabric
Changed
- Disabled the auto-detection of the Kubeflow environment (#18137)
 
Fixed
- Fixed an issue where DDP subprocesses that used Hydra would set Hydra's working directory to the current directory (#18145)
- Fixed an issue that would prevent the user from setting the multiprocessing start method after importing lightning (#18177)
- Fixed an issue with `Fabric.all_reduce()` not performing an in-place operation consistently across all backends (#18235); see the sketch after this list
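A minimal sketch of `Fabric.all_reduce()` (the device count and reduce op are illustrative); regardless of backend, it is safest to use the returned tensor rather than relying on in-place mutation:

```python
import torch
import lightning as L

fabric = L.Fabric(accelerator="cpu", devices=2, strategy="ddp")
fabric.launch()

value = torch.tensor([float(fabric.global_rank)])
# Reduce across processes; use the returned tensor (in-place behavior is now
# consistent across backends)
total = fabric.all_reduce(value, reduce_op="sum")
fabric.print(total)
```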
PyTorch
Added
- Added `LightningOptimizer.refresh()` to update the `__dict__` in case the optimizer it wraps has changed its internal state (#18280)
Changed
- Disabled the auto-detection of the Kubeflow environment (#18137)
 
Fixed
- Fixed a `Missing folder` exception when using a Google Storage URL as a `default_root_dir` (#18088); see the sketch after this list
- Fixed an issue that would prevent the user from setting the multiprocessing start method after importing lightning (#18177)
- Fixed the gradient unscaling logic when the training step skipped the backward pass (by returning `None`) (#18267)
- Ensured that the closure running inside the optimizer step has gradients enabled, even if the optimizer step has them disabled (#18268)
- Fixed an issue that could cause the `LightningOptimizer` wrapper returned by `LightningModule.optimizers()` to have different internal state than the optimizer it wraps (#18280)
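A minimal sketch of using a remote URL as the root directory (the bucket name is hypothetical, and gcsfs/fsspec credentials are assumed to be configured):

```python
from lightning.pytorch import Trainer

# Checkpoints and logs are written to the remote path; the "Missing folder"
# exception is no longer raised for such URLs
trainer = Trainer(default_root_dir="gs://my-bucket/lightning-runs")
```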
Contributors
@0x404, @awaelchli, @bilelomrani1, @Borda, @ethanwharris, @nisheethlahoti
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
2.0.6
App
- Fixed handling a `None` request in the file orchestration queue (#18111)
Fabric
- Fixed `TensorBoardLogger.log_graph` not unwrapping the `_FabricModule` (#17844)
PyTorch
- Fixed `LightningCLI` not saving `seed_everything` correctly when `run=True` and `seed_everything=True` (#18056)
- Fixed validation of non-PyTorch LR schedulers in manual optimization mode (#18092)
- Fixed an attribute error for `_FaultTolerantMode` when loading an old checkpoint that pickled the enum (#18094)
Contributors
@awaelchli, @lantiga, @mauvilsa, @shihaoyin
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
App
Added
- plugin: store source app (#17892)
- Added a colocation identifier (#16796)
- Added exponential backoff to HTTPQueue put (#18013)
- Content for plugins (#17243)
 
Changed
- Save a reference to created tasks, to avoid tasks disappearing (#17946)
 
Fabric
Added
- Added validation against misconfigured device selection when using the DeepSpeed strategy (#17952)
 
Changed
- Avoid info message when loading 0 entry point callbacks (#17990)
 
Fixed
- Fixed the emission of a false-positive warning when calling a method on the Fabric-wrapped module that accepts no arguments (#17875)
- Fixed the check for FSDP's flat parameters in all parameter groups (#17914)
- Fixed automatic step tracking in Fabric's CSVLogger (#17942)
- Fixed an issue causing the `torch.set_float32_matmul_precision` info message to show multiple times (#17960)
- Fixed loading model state when `Fabric.load()` is called after `Fabric.setup()` (#17997); see the sketch after this list
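A minimal sketch of the order of calls this fix addresses (the model, optimizer, and checkpoint path are illustrative):

```python
import torch
import lightning as L

fabric = L.Fabric(accelerator="cpu", devices=1)
fabric.launch()

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

# Loading after setup() now correctly restores weights into the wrapped module
state = {"model": model, "optimizer": optimizer}
fabric.load("checkpoint.ckpt", state)  # hypothetical path
```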
PyTorch
Fixed
- Fixed delayed creation of experiment metadata and checkpoint/log dir name when using `WandbLogger` (#17818)
- Fixed incorrect parsing of arguments when augmenting exception messages in DDP (#17948)
- Fixed an issue causing the `torch.set_float32_matmul_precision` info message to show multiple times (#17960)
- Added the missing `map_location` argument to the `LightningDataModule.load_from_checkpoint` function (#17950); see the sketch after this list
- Fixed support for `neptune-client` (#17939)
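A minimal sketch of the new argument (the datamodule class and checkpoint path are illustrative; `BoringDataModule` is one of Lightning's demo classes):

```python
import torch
from lightning.pytorch.demos.boring_classes import BoringDataModule

# map_location is now forwarded when restoring a datamodule from a checkpoint
datamodule = BoringDataModule.load_from_checkpoint(
    "checkpoint.ckpt",                 # hypothetical path
    map_location=torch.device("cpu"),
)
```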
Contributors
@anio, @awaelchli, @Borda, @ethanwharris, @lantiga, @nicolai86, @rjarun8, @schmidt-ai, @schuhschuh, @wouterzwerink, @yurijmikhalevich
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
App
Fixed
- Bumped several dependencies to address security vulnerabilities
 
Fabric
Fixed
- Fixed validation of the parameters of `plugins.precision.MixedPrecision` (#17687)
- Fixed an issue with HPU imports leading to performance degradation (#17788)
 
PyTorch
Changed
- Changes to the `NeptuneLogger` (#16761):
  - It now supports neptune-client 0.16.16 and neptune >= 1.0, and the `log()` method has been replaced with `append()` and `extend()`.
  - It now accepts a namespace `Handler` as an alternative to `Run` for the `run` argument. This means you can call it as `NeptuneLogger(run=run["some/namespace"])` to log everything to the `some/namespace/` location of the run; see the sketch after this list.
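A minimal sketch of the new `run` argument (the project name and namespace are hypothetical, and a Neptune API token is assumed to be configured):

```python
import neptune
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import NeptuneLogger

run = neptune.init_run(project="my-workspace/my-project")  # hypothetical project
# Passing a namespace Handler logs everything under "finetuning/" in the run
logger = NeptuneLogger(run=run["finetuning"])
trainer = Trainer(logger=logger)
```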
 
Fixed
- Fixed validation of the parameters of `plugins.precision.MixedPrecisionPlugin` (#17687)
- Fixed deriving the default map location in `LightningModule.load_from_checkpoint` when there is extra state (#17812)
Contributors
@akreuzer, @awaelchli, @Borda, @jerome-habana, @kshitij12345
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
App
Added
- Added the property `LightningWork.public_ip` that exposes the public IP of the `LightningWork` instance (#17742)
- Added the missing python-multipart dependency (#17244)
 
Changed
- Made type hints public (#17100)
 
Fixed
- Fixed `LightningWork.internal_ip`, which was mistakenly exposing the public IP; it now exposes the private/internal IP address (#17742); see the sketch after this list
- Fixed resolution of the latest version in the CLI (#17351)
- Fixed a property being raised instead of returned (#17595)
- Fixed getting the project (#17617, #17666)
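A minimal sketch of reading the two IP properties from inside a Work (the work class is illustrative):

```python
from lightning.app import LightningWork


class AddressReporter(LightningWork):
    def run(self):
        # public_ip: the publicly reachable address of this Work's machine
        # internal_ip: the private/internal address (after this fix)
        print("public:", self.public_ip)
        print("internal:", self.internal_ip)
```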
 
Fabric
Added
- Added support for `Callback` registration through entry points (#17756)
Changed
Fixed
- Fixed computing the next version folder in `CSVLogger` (#17139)
- Fixed inconsistent settings for FSDP precision (#17670)
 
PyTorch
Changed
- Made type hints public (#17100)
 
Fixed
- `CombinedLoader` only starts DataLoader workers when necessary when operating in sequential mode (#17639); see the sketch after this list
- Fixed a potential bug with uploading model checkpoints to Neptune.ai by uploading files from a stream (#17430)
- Fixed signature inspection of decorated hooks (#17507)
- The `WandbLogger` no longer flattens dictionaries in the hyperparameters logged to the dashboard (#17574)
- Fixed computing the next version folder in `CSVLogger` (#17139)
- Fixed a formatting issue when the filename in `ModelCheckpoint` contained metrics that were substrings of each other (#17610)
- Fixed `WandbLogger` ignoring the `WANDB_PROJECT` environment variable (#16222)
- Fixed inconsistent settings for FSDP precision (#17670)
- Fixed an edge case causing overlapping samples in DDP when no global seed is set (#17713)
- Fall back to a module-availability check for mlflow (#17467)
- Fixed LR finder max val batches (#17636)
- Fixed multithreaded checkpoint loading (#17678)
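A minimal sketch of the sequential mode this change affects (the datasets and worker counts are illustrative); in "sequential" mode the loaders are iterated one after another, and workers are now only started for the loader that is actually running:

```python
from torch.utils.data import DataLoader
from lightning.pytorch.utilities import CombinedLoader

loaders = {
    "a": DataLoader(range(10), batch_size=2, num_workers=2),
    "b": DataLoader(range(20), batch_size=4, num_workers=2),
}
combined = CombinedLoader(loaders, mode="sequential")
for batch, batch_idx, dataloader_idx in combined:
    ...
```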
 
Contributors
@adamjstewart, @AleksanderWWW, @awaelchli, @baskrahmer, @bkiat1123, @Borda, @carmocca, @ethanwharris, @leng-yue, @lightningforever, @manangoel99, @mukhery, @Quasar-Kim, @water-vapor, @yurijmikhalevich
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release: App jobs
App
Fixed
Fabric
Changed
- Enable precision autocast for LightningModule step methods in Fabric (#17439)
 
Fixed
- Fixed an issue with `LightningModule.*_step` methods bypassing the DDP/FSDP wrapper (#17424)
- Fixed device handling in `Fabric.setup()` when the model has no parameters (#17441)
PyTorch
Fixed
- Fixed `Model.load_from_checkpoint("checkpoint.ckpt", map_location=map_location)` always returning the model on CPU (#17308)
- Fixed syncing of module states during non-fit stages (#17370)
- Fixed an issue that caused `num_nodes` not to be set correctly for `FSDPStrategy` (#17438); see the sketch after this list
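A minimal sketch of the multi-node FSDP configuration affected by this fix (the device and node counts are illustrative):

```python
from lightning.pytorch import Trainer

# num_nodes is now propagated correctly to the FSDP strategy
trainer = Trainer(
    accelerator="gpu",
    devices=8,
    num_nodes=2,
    strategy="fsdp",
)
```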
Contributors
@awaelchli, @Borda, @carmocca, @ethanwharris, @ryan597, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
App
Changed
Fabric
Changed
- Let `TorchCollective` work on the `torch.distributed` WORLD process group by default (#16995)
Fixed
- Fixed `_cuda_clearCublasWorkspaces` being called on teardown (#16907)
- Improved the error message for installing tensorboard or tensorboardx (#17053)
 
PyTorch
Changed
- Changes to the `NeptuneLogger` (#16761):
  - It now supports neptune-client 0.16.16 and neptune >= 1.0, and the `log()` method has been replaced with `append()` and `extend()`.
  - It now accepts a namespace `Handler` as an alternative to `Run` for the `run` argument. This means you can call it like `NeptuneLogger(run=run["some/namespace"])` to log everything to the `some/namespace/` location of the run.
- Allow `sys.argv` and args in `LightningCLI` (#16808)
- Moved the HPU broadcast override to the HPU strategy file (#17011)
 
Deprecated
- Removed registration of `ShardedTensor` state dict hooks in `LightningModule.__init__` with `torch>=2.1` (#16892)
- Removed the `lightning.pytorch.core.saving.ModelIO` class interface (#16974)
Fixed
- Fixed `num_nodes` not being set for `DDPFullyShardedNativeStrategy` (#17160)
- Fixed parsing the precision config for inference in `DeepSpeedStrategy` (#16973)
- Fixed the availability check for `rich` that prevented Lightning from being imported in Google Colab (#17156)
- Fixed `_cuda_clearCublasWorkspaces` being called on teardown (#16907)
- The `psutil` package is now required for CPU monitoring (#17010)
- Improved the error message for installing tensorboard or tensorboardx (#17053)
 
Contributors
@awaelchli, @belerico, @carmocca, @colehawkins, @dmitsf, @Erotemic, @ethanwharris, @kshitij12345, @Borda
If we forgot someone due to not matching commit email with GitHub account, let us know :]
2.0.1 appendix
App
Fixed
- Fixed frontend hosts when running with multi-process in the cloud (#17324)
 
Fabric
No changes.
PyTorch
Fixed
- Made the `is_picklable` function more robust (#17270)
Contributors
@eng-yue @ethanwharris @Borda @awaelchli @carmocca
If we forgot someone due to not matching commit email with GitHub account, let us know :]
2.0.1 patch release
App
No changes
Fabric
Changed
- Generalized `Optimizer` validation to accommodate both FSDP 1.x and 2.x (#16733)
PyTorch
Changed
- Pickling the `LightningModule` no longer pickles the `Trainer` (#17133)
- Generalized `Optimizer` validation to accommodate both FSDP 1.x and 2.x (#16733)
- Disabled `torch.inference_mode` with `torch.compile` in PyTorch 2.0 (#17215)
Fixed
- Fixed an issue where pickling the module instance would fail with a DataLoader error (#17130)
- Fixed `WandbLogger` not showing "best" aliases for model checkpoints when `ModelCheckpoint(save_top_k > 0)` is used (#17121)
- Fixed the availability check for `rich` that prevented Lightning from being imported in Google Colab (#17156)
- Fixed parsing the precision config for inference in `DeepSpeedStrategy` (#16973)
- Fixed an issue where `torch.compile` would fail when logging to WandB (#17216)
Contributors
@Borda @williamFalcon @lightningforever @adamjstewart @carmocca @tshu-w @saryazdi @parambharat @awaelchli @colehawkins @woqidaideshi @md-121 @yhl48 @gkroiz @idc9 @speediedan
If we forgot someone due to not matching commit email with GitHub account, let us know :]