Code to reproduce the experiments reported in this paper:
Jianyu Wang, Hao Liang, Gauri Joshi, "Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD," ICASSP 2020. (arXiv)
This repo contains the implementations of the following algorithms:
- Local SGD Stich ICLR 2018, Yu et al. AAAI 2019, Wang and Joshi 2018
- Overlap-Local-SGD (proposed in this paper)
- Elastic Averaging SGD Zhang et al. NeurIPS 2015
- CoCoD-SGD Shen et al. IJCAI 2019
- Blockwise Model-update Filtering (BMUF) Chen and Huo ICASSP 2016, also equivalent to SlowMo-Local SGD.
Please cite this paper if you use this code for your research/projects.
The code runs on Python 3.5 with PyTorch 1.0.0 and torchvision 0.2.1. The non-blocking communication is implemented using Python threading package.
We implement all the above mentioned algorithms as subclasses of torch.optim.optimizer. A typical usage is shown as follows:
import distoptim # Before training # define the optimizer # One can use: 1) LocalSGD (including BMUF); 2) OverlapLocalSGD; # 3) EASGD; 4) CoCoDSGD # tau is the number of local updates / communication period optimizer = distoptim.SELECTED_OPTIMIZER(tau) ...... # define model, criterion, logging, etc.. # Start training for batch_id, (data, label) in enumerate(data_loader): # same as serial training output = model(data) # forward loss = criterion(output, label) loss.backward() # backward optimizer.step() # gradient step optimizer.zero_grad() # additional line to average local models at workers # communication happens after every tau iterations # optimizer has its own iteration counter inside optimizer.average()In addition, one need to initialize the process group as described in this documentation. In our private cluster, each machine has one GPU.
# backend = gloo or nccl # rank: 0,1,2,3,... # size: number of workers # h0 is the host name of worker0, you need to change it torch.distributed.init_process_group(backend=args.backend, init_method='tcp://h0:22000', rank=args.rank, world_size=args.size)@article{wang2020overlap, title={Overlap Local-{SGD}: An Algorithmic Approach to Hide Communication Delays in Distributed {SGD}}, author={Wang, Jianyu and Liang, Hao and Joshi, Gauri}, journal={arXiv preprint arXiv:2002.09539}, year={2020} }