🚀 Feature
PyTorch Lightning recently added native support for Microsoft DeepSpeed.
I believe it would also help users if Ignite incorporated the DeepSpeed pipeline for memory-efficient distributed training.
1. for idist.auto_model ..?
To initialize the DeepSpeed engine:
model_engine, optimizer, _, _ = deepspeed.initialize(args=cmd_args, model=model, model_parameters=params)
And for the distributed environment setup, we need to replace torch.distributed.init_process_group(...) with deepspeed.init_distributed().
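To make the idea more concrete, here is a minimal sketch of what a DeepSpeed-aware counterpart to idist.auto_model might look like. The names `make_ds_config` and `auto_model_deepspeed` are hypothetical and not part of Ignite or DeepSpeed. The config keys follow the DeepSpeed JSON config schema, and the chosen optimizer and default values are assumptions for illustration only.

```python
from typing import Any, Dict


def make_ds_config(
    train_batch_size: int,
    fp16: bool = False,
    zero_stage: int = 0,
    lr: float = 1e-3,
) -> Dict[str, Any]:
    """Build a small DeepSpeed config dict (hypothetical helper).

    The Adam optimizer and learning rate are placeholder choices.
    """
    return {
        "train_batch_size": train_batch_size,
        "optimizer": {"type": "Adam", "params": {"lr": lr}},
        "fp16": {"enabled": fp16},
        "zero_optimization": {"stage": zero_stage},
    }


def auto_model_deepspeed(model, config: Dict[str, Any]):
    """Hypothetical DeepSpeed analogue of idist.auto_model.

    Sets up the distributed backend through DeepSpeed and wraps the
    model in a DeepSpeed engine. Intended to run under a distributed
    launcher with DeepSpeed installed.
    """
    # Imported lazily so that this module can be loaded without DeepSpeed.
    import deepspeed

    # Used in place of torch.distributed.init_process_group(...).
    deepspeed.init_distributed()

    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=config,
    )
    return model_engine, optimizer
```

In a training script this might be called as `model_engine, optimizer = auto_model_deepspeed(model, make_ds_config(32, fp16=True, zero_stage=2))`. How such a helper would interact with the existing idist backends is an open design question.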
2. checkpoint handler
Checkpointing works slightly differently:
model_engine.save_checkpoint(args.save_dir, ckpt_id, client_state=client_sd)
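One possible shape for such a checkpoint handler is sketched below. `DeepSpeedCheckpoint` is a hypothetical class, not an existing Ignite handler, and the tag format and the contents of the client state are assumptions. The stand-in objects at the end only exist so the sketch can be exercised without DeepSpeed or Ignite installed.

```python
from types import SimpleNamespace


class DeepSpeedCheckpoint:
    """Hypothetical Ignite-style handler that delegates to DeepSpeed.

    `model_engine` is expected to be the engine returned by
    deepspeed.initialize(...); any object with a compatible
    save_checkpoint method works for this sketch.
    """

    def __init__(self, model_engine, save_dir: str):
        self.model_engine = model_engine
        self.save_dir = save_dir

    def __call__(self, engine) -> str:
        # `engine` is the Ignite trainer; its state holds the counters.
        tag = f"epoch_{engine.state.epoch}"
        client_state = {
            "epoch": engine.state.epoch,
            "iteration": engine.state.iteration,
        }
        self.model_engine.save_checkpoint(
            self.save_dir, tag, client_state=client_state
        )
        return tag


# Dry run with stand-ins (assumptions, not real DeepSpeed/Ignite objects).
class _RecordingEngine:
    """Records save_checkpoint calls instead of writing files."""

    def __init__(self):
        self.calls = []

    def save_checkpoint(self, save_dir, tag=None, client_state=None):
        self.calls.append((save_dir, tag, client_state))


fake_model_engine = _RecordingEngine()
handler = DeepSpeedCheckpoint(fake_model_engine, "checkpoints")
fake_trainer = SimpleNamespace(state=SimpleNamespace(epoch=3, iteration=1200))
saved_tag = handler(fake_trainer)
```

With a real trainer, the handler could be attached via `trainer.add_event_handler(Events.EPOCH_COMPLETED, handler)`. Note that DeepSpeed expects every process to call save_checkpoint, not only rank 0, so the handler should probably not be restricted to the main process the way Ignite's usual checkpointing is.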