@rlplays commented on Oct 18, 2025

To use this: in your `config.ini`, set `backend=Multithreading` instead of `Multiprocessing` under `[vec]`. For envs that are CPU-deep (i.e. they do a lot of computation per `c_step`) but not GPU-wide (i.e. not too many params), this offers a nice speedup, anywhere from 1.1x to 3x.

For envs that spend very few CPU cycles per `c_step`, this may not provide a good speedup and may in fact be slower (see the results below). You can also limit the maximum number of threads, e.g. `max_num_threads=4` under `[vec]`, if your env's step is very short and still take advantage of native multi-threading.
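
For reference, a minimal `config.ini` sketch with the settings described above (the `max_num_threads` value of 4 is illustrative, and the comments are mine):

```ini
[vec]
; Use the native multi-threading backend instead of Multiprocessing
backend=Multithreading
; Optional: cap worker threads for envs with very short c_step
max_num_threads=4
```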

Performance Comparison: Multithreading vs. Multiprocessing

| Environment | SPS before | SPS after (MT only) | SPS after (torch pin, TODO) | Notes |
| --- | --- | --- | --- | --- |
| rlplays | 44K | 112K | - | my pixel platformer env |
| go | 520K | 690K | - | |
| pacman | 800K | 890K | - | |
| drone_swarm | 1.1M | 940K | - | (*) |
| enduro | 720K | 520K | - | (*) |
| terraform | 370K | 380K | - | (**) |

Notes:

- (*) For some envs, Multiprocessing, or even Serial with a single worker stepping the envs one by one, is better than sharding across too many threads. I verified this matches the perf when limiting `max_num_threads` to a small number (or even 0 to force Serial mode) rather than spreading across all cores.
- (**) GPU bound: most of the time is spent in copying/learning operations.



```c
// TODO(perumaal): Should this be multi-thread aware as well? (see vec_step below).
// Main issue is that srand is not thread-safe. But do we care?
```

It feels like we should have a `puffer_rnd` or something like that so it's TLS-aware. But envs that use rnd a lot may be complex. I'm not sure it matters, though?
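
To make the idea concrete, here is a rough sketch of what a TLS-aware replacement for `srand`/`rand` could look like; `puffer_rnd` is only the name floated above, and the `puffer_srnd` helper and the xorshift64 generator are my assumptions, not existing API:

```c
// Hypothetical TLS-aware RNG sketch ("puffer_rnd" is a suggested name,
// not an existing API). Each thread keeps its own generator state, so
// there is no shared srand() state and no locking in vec_step workers.
#include <stdint.h>

// Per-thread state; the nonzero default seed keeps xorshift well-defined
// even if a thread never calls puffer_srnd().
static _Thread_local uint64_t puffer_rnd_state = 0x9E3779B97F4A7C15ULL;

// Seed the calling thread's generator (per-thread analogue of srand).
static inline void puffer_srnd(uint64_t seed) {
    puffer_rnd_state = seed ? seed : 0x9E3779B97F4A7C15ULL;
}

// xorshift64: fast, reasonable-quality PRNG; per-thread analogue of rand.
static inline uint64_t puffer_rnd(void) {
    uint64_t x = puffer_rnd_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    puffer_rnd_state = x;
    return x;
}
```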

@rlplays changed the title from "This introduces native multi-threading in a single Python process using a new backend Multithreading." to "Native multi-threading backend" on Oct 18, 2025
