Update docs to be clear on --gpus behaviour. #563

@jeffling

Final resolution:

The resolution should then be alternative 1, since we agree that we don't want to get rid of the 'number of gpus' functionality (removing it was the original, aggressive proposal).

If we detect --gpus 0 passed as an int, a warning should suffice, alongside updated docs.
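
A minimal sketch of what such a warning could look like. The helper name `warn_on_int_zero` and the message wording are hypothetical and are not taken from the library; the sketch only illustrates the check described above (an integer 0 triggers a warning, everything else passes through).

```python
import warnings


def warn_on_int_zero(gpus):
    """Warn when `gpus` is the integer 0, then return the value unchanged.

    Hypothetical helper for illustration only, not the library's API.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(gpus, int) and not isinstance(gpus, bool) and gpus == 0:
        warnings.warn(
            "gpus=0 (int) requests zero GPUs, so training runs on CPU. "
            "To run on GPU 0, pass gpus='0' or gpus=[0].",
            UserWarning,
        )
    return gpus


warn_on_int_zero(0)    # emits a UserWarning
warn_on_int_zero("0")  # no warning: a string names a specific GPU
warn_on_int_zero([0])  # no warning: a list names specific GPUs
```

Returning the value unchanged keeps the existing behaviour intact, which matches the decision to keep alternative 1 and only improve the messaging.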


Is your feature request related to a problem? Please describe.
Trainer.gpus can currently be used to specify a number of GPUs or specific GPUs to run on. This makes values like

0 (run on CPU), "0" (run on GPU 0), [0] (run on GPU 0)

confusing for newcomers.
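
To make the three readings concrete, here is a rough sketch of the interpretation rules described above. The function `interpret_gpus` is hypothetical and is not the library's implementation; it assumes that an int n selects the first n devices and that a string is a comma-separated list of device indices.

```python
from typing import List, Optional, Union


def interpret_gpus(gpus: Optional[Union[int, str, List[int]]]) -> List[int]:
    """Return the GPU indices a `gpus` value would select (empty list means CPU).

    Hypothetical helper for illustration only.
    """
    if gpus is None:
        return []
    if isinstance(gpus, bool):
        raise TypeError("gpus must be None, an int, a str, or a list of ints")
    if isinstance(gpus, int):
        # Assumption: an int is a count, taken as the first n devices.
        return list(range(gpus))
    if isinstance(gpus, str):
        # Assumption: a string is a comma-separated list of device indices.
        return [int(part) for part in gpus.split(",") if part.strip()]
    # A list already names specific devices.
    return list(gpus)


for value in (0, "0", [0]):
    print(repr(value), "->", interpret_gpus(value))
```

Under these assumptions the int 0 yields no devices, while "0" and [0] both select device 0, which is exactly the asymmetry the issue describes.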

Describe the solution you'd like
As an aggressive solution to this issue, we move to have gpus always specify specific GPUs as that is the more encompassing case. Going forward, we can put a deprecation notice up when a single int is passed in:

"In the future, gpus will be used to specify the specific GPUs the model will run on. If you would like to run on CPU, pass in None or an empty list."

Then, in the next breaking version, we can simplify the behaviour.

Describe alternatives you've considered

  1. Keep as is: This is a viable solution. We could just document more carefully. However, anecdotally, this is confusing for our team and most likely other users.
  2. Have gpus mean number of GPUs: There are many cases where researchers need to run multiple experiments at the same time on a multi-GPU machine, and being able to easily specify which GPU to use would be useful. An argument in favor of this option is that one could use 'CUDA_VISIBLE_DEVICES' to select the devices instead.
  3. Create a new num_gpus argument: This could make it self-documenting and allow for both workflows. However, it will be an additional argument to maintain.
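
For reference, the 'CUDA_VISIBLE_DEVICES' approach mentioned in alternative 2 can be sketched as follows. The helper `pin_visible_gpus` is a hypothetical name used for illustration; the environment variable itself is the standard CUDA mechanism and has to be set before CUDA is initialized in the process.

```python
import os


def pin_visible_gpus(indices):
    """Restrict this process to the given physical GPU indices.

    Hypothetical helper for illustration only. It must run before the
    framework initializes CUDA, otherwise the setting has no effect.
    """
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only physical GPU 1; inside this process it is then seen as device 0.
pin_visible_gpus([1])
```

With the device pinned this way, a count-style value such as gpus=1 would land on the chosen card, at the cost of managing the selection outside the Trainer.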
