Description
Final resolution:
The resolution should therefore be alternative 1, since we agree that we don't want to get rid of the 'number of GPUs' functionality (which is what the originally proposed aggressive solution would have removed).
If we detect `--gpus 0` passed as an int, a warning should suffice, alongside updated docs.
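For illustration, the warning implied by that resolution could look roughly like the following; this is a minimal sketch with a hypothetical function name and message, not Lightning's actual implementation.

```python
import warnings
from typing import List, Optional, Union


def warn_if_ambiguous_gpus(gpus: Optional[Union[int, str, List[int]]]) -> None:
    # Hypothetical check (not Lightning's actual code): only the ambiguous
    # `gpus=0`-as-int case triggers a warning; other values pass through silently.
    if isinstance(gpus, int) and not isinstance(gpus, bool) and gpus == 0:
        warnings.warn(
            "`gpus=0` (int) means 'run on CPU', not 'run on GPU 0'. "
            "Pass `gpus=[0]` or `gpus='0'` to select GPU 0.",
            UserWarning,
        )
```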
Is your feature request related to a problem? Please describe.
`Trainer.gpus` can currently be used to specify either a number of GPUs or the specific GPUs to run on. This makes values like
`0` (run on CPU), `"0"` (run on GPU 0), and `[0]` (run on GPU 0)
confusing for newcomers.
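To make the ambiguity concrete, here are the three spellings side by side, assuming the current `Trainer` API as described above (the string and list forms would need a visible GPU to actually run):

```python
from pytorch_lightning import Trainer

# The same "0" in three different spellings, with three different meanings:
Trainer(gpus=0)    # int 0   -> run on CPU
Trainer(gpus="0")  # str "0" -> run on GPU 0
Trainer(gpus=[0])  # list    -> run on GPU 0
```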
Describe the solution you'd like
As an aggressive solution to this issue, we propose that `gpus` always specify specific GPUs, since that is the more encompassing case. Going forward, we can put up a deprecation notice when a single int is passed in:
"In the future, `gpus` will be used to specify the specific GPUs the model will run on. If you would like to run on CPU, pass in `None` or an empty list."
Then, in the next breaking version, we can simplify the behaviour.
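Once that breaking version lands, the argument could be normalized roughly like this; a hypothetical sketch of the simplified behaviour (in particular, treating a bare int as a single device index is just one possible reading), not a committed design:

```python
from typing import List, Optional, Union


def parse_gpu_ids(gpus: Optional[Union[int, str, List[int]]]) -> Optional[List[int]]:
    # Hypothetical post-deprecation normalizer: every form names specific device indices.
    if gpus is None or gpus == []:
        return None  # run on CPU
    if isinstance(gpus, int):
        return [gpus]  # one possible reading: a bare int is a single device index
    if isinstance(gpus, str):
        return [int(idx) for idx in gpus.split(",") if idx.strip()]  # "0,2" -> [0, 2]
    return list(gpus)
```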
Describe alternatives you've considered
- Keep as is: This is a viable solution; we could just document the behaviour more carefully. However, anecdotally, this is confusing for our team and most likely for other users as well.
- Have `gpus` mean the number of GPUs: There are many cases where researchers need to run multiple experiments at the same time on a multi-GPU machine, so being able to easily specify which GPU to use is valuable. As a counter-argument, one could use `CUDA_VISIBLE_DEVICES` for this instead (see the sketch after this list).
- Create a new `num_gpus` argument: This would be self-documenting and allow for both workflows. However, it would be an additional argument to maintain.
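For reference, the `CUDA_VISIBLE_DEVICES` workaround mentioned in the second alternative amounts to something like this; it is standard CUDA environment-variable behaviour, not anything Lightning-specific, and must be set before CUDA is initialized:

```python
import os

# Restrict this process to physical GPU 2 before any torch/CUDA initialization;
# inside the process that GPU then shows up as device index 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
```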
Additional context