@rlplays commented on Oct 18, 2025

To use this: in your `config.ini`, set `backend=Multithreading` instead of `Multiprocessing` under `[vec]`. For envs that are CPU-deep (i.e. they do a lot of computation per `c_step`) but not GPU-wide (i.e. not too many params), this offers a nice speedup, anywhere from 1.1x to 3x.

For envs that spend very few CPU cycles per `c_step`, this may not provide a good speedup and may in fact be slower (see the results below). You can also limit the maximum number of threads, e.g. `max_num_threads=4` under `[vec]`, if your env's step is very short and still take advantage of native multi-threading.
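
For reference, a minimal `config.ini` sketch with the settings described above (the `max_num_threads` value of 4 is illustrative, and the comments are mine):

```ini
[vec]
; Use the native multi-threading backend instead of Multiprocessing
backend=Multithreading
; Optional: cap worker threads for envs with very short c_step
max_num_threads=4
```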

Performance Comparison: Multithreading vs. Multiprocessing

| Environment | SPS before | SPS after (MT only) | SPS after (torch pin, TODO) | Notes |
| --- | --- | --- | --- | --- |
| rlplays | 44K | 112K | - | my pixel platformer env |
| go | 520K | 690K | - | |
| pacman | 800K | 890K | - | |
| drone_swarm | 1.1M | 940K | - | (*) |
| enduro | 720K | 520K | - | (*) |
| terraform | 370K | 380K | - | (**) |

Notes:

- (*) For some envs, Multiprocessing, or even Serial with a single worker stepping the envs one by one, is better than sharding across too many threads. I verified this matches the perf when limiting `max_num_threads` to a small number (or even 0 to force Serial mode) rather than spreading across all cores.
- (**) GPU bound: most of the time is spent in copying/learning operations.



```c
// TODO(perumaal): Should this be multi-thread aware as well? (see vec_step below).
// Main issue is that srand is not thread-safe. But do we care?
```

It feels like we should have a `puffer_rnd` or something like that so it's TLS-aware. But envs that use rnd a lot may be complex. I'm not sure it matters, though?
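
To make the idea concrete, here is a rough sketch of what a TLS-aware replacement for `srand`/`rand` could look like; `puffer_rnd` is only the name floated above, and the `puffer_srnd` helper and the xorshift64 generator are my assumptions, not existing API:

```c
// Hypothetical TLS-aware RNG sketch ("puffer_rnd" is a suggested name,
// not an existing API). Each thread keeps its own generator state, so
// there is no shared srand() state and no locking in vec_step workers.
#include <stdint.h>

// Per-thread state; the nonzero default seed keeps xorshift well-defined
// even if a thread never calls puffer_srnd().
static _Thread_local uint64_t puffer_rnd_state = 0x9E3779B97F4A7C15ULL;

// Seed the calling thread's generator (per-thread analogue of srand).
static inline void puffer_srnd(uint64_t seed) {
    puffer_rnd_state = seed ? seed : 0x9E3779B97F4A7C15ULL;
}

// xorshift64: fast, reasonable-quality PRNG; per-thread analogue of rand.
static inline uint64_t puffer_rnd(void) {
    uint64_t x = puffer_rnd_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    puffer_rnd_state = x;
    return x;
}
```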

@rlplays changed the title from "This introduces native multi-threading in a single Python process using a new backend Multithreading." to "Native multi-threading backend" on Oct 18, 2025
