I am following the RNN tutorial for reinforcement learning in PyTorch from this link. In there, I came across the SyncDataCollector class, whose documentation looked a bit confusing. I have the following questions:
What is a frame? Is a single frame just the state of the environment?
Assuming that a frame is just the state, what is a batch? If I choose a fixed number for the batch, is it going to concatenate simulations back to back until the number of states matches it? Does it always start a new simulation on every batch?
Is an iter a set of batches? Would the batches start a fresh new environment if I pass reset_at_each_iter as True?
In my case, it is important that the environment always starts with the same seed whenever env.reset is called. How do I tell the data collector that it should begin with a fixed seed?
A “frame” (arguably a poorly chosen term) is a step in the environment.
Assuming that a frame is just the state, what is a batch?
In the context of collectors, a batch is a collection of frames. If you set frames_per_batch=10, you are saying “I want to collect batches that have 10 steps in them”.
Is an iter a set of batches? Would the batches start a fresh new environment if I pass reset_at_each_iter as True?
An iter of the collector gives you one batch:
for batch in collector:
    assert batch.numel() == collector.frames_per_batch
And yes, if you set reset_at_each_iter to True, every batch starts from a freshly reset environment, so you’ll get
for batch in collector:
    # the first frame of every batch comes right after a reset
    assert batch[..., 0]["is_init"].all()
    assert batch.numel() == collector.frames_per_batch
(assuming your env has an InitTracker transform appended)
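To make the relation between frames, batches and iterations concrete, here is a toy sketch in plain Python. It is not TorchRL code: ToyEnv and collect are hypothetical names, and the loop only mimics what I understand the collector to do, namely keep stepping until the batch holds frames_per_batch frames, reset in the middle of a batch when an episode ends, and carry an unfinished episode over into the next batch unless reset_at_each_iter is set.

```python
class ToyEnv:
    """Hypothetical environment: an episode ends after `max_steps` steps."""

    def __init__(self, max_steps=4):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self):
        self.t += 1
        done = self.t >= self.max_steps
        return self.t, done


def collect(env, frames_per_batch, total_frames, reset_at_each_iter=False):
    """Yield batches that each hold exactly `frames_per_batch` frames."""
    needs_reset = True  # the very first frame always follows a reset
    for _ in range(total_frames // frames_per_batch):
        if reset_at_each_iter:
            needs_reset = True  # drop the running episode at each iteration
        batch = []
        for _ in range(frames_per_batch):
            is_init = needs_reset
            if needs_reset:
                env.reset()
            obs, done = env.step()
            batch.append({"obs": obs, "is_init": is_init, "done": done})
            needs_reset = done  # an episode that ended is reset mid-batch
        yield batch


# Episodes of 4 steps, batches of 6 frames: episodes straddle batches.
for batch in collect(ToyEnv(max_steps=4), frames_per_batch=6, total_frames=12):
    print([frame["obs"] for frame in batch])
```

With reset_at_each_iter=False the second batch picks up in the middle of the second episode; with reset_at_each_iter=True every batch begins with a frame flagged is_init.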
In my case, it is important that the environment always starts with the same seed whenever env.reset is called. How do I tell the data collector that it should begin with a fixed seed?
That should be easy to implement with a very small transform that passes the seed to the env during reset. What do resetting and seeding look like in your env?
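Without knowing your env, here is only a rough, library-agnostic sketch of the idea: reseed right before every reset so each episode starts from the same random state. ToyRandomEnv and FixedSeedOnReset are hypothetical names and plain Python classes, not TorchRL APIs; in TorchRL the same logic would sit in the reset hook of a small transform, and the details depend on how your env handles seeding.

```python
import random


class ToyRandomEnv:
    """Hypothetical env whose dynamics are driven by its own RNG."""

    def __init__(self):
        self.rng = random.Random()
        self.state = 0.0

    def seed(self, seed):
        self.rng.seed(seed)

    def reset(self):
        self.state = self.rng.random()
        return self.state

    def step(self):
        self.state += self.rng.uniform(-1.0, 1.0)
        return self.state


class FixedSeedOnReset:
    """Wrapper that reseeds the wrapped env right before every reset."""

    def __init__(self, env, seed):
        self.env = env
        self.seed = seed

    def reset(self):
        self.env.seed(self.seed)  # same seed each time: same episode start
        return self.env.reset()

    def step(self):
        return self.env.step()


env = FixedSeedOnReset(ToyRandomEnv(), seed=0)
first = [env.reset(), env.step(), env.step()]
second = [env.reset(), env.step(), env.step()]
assert first == second  # both episodes replay the same trajectory
```

Note that reseeding on every reset makes every episode identical as long as the actions are the same, which is what the question asks for but also removes any variation in the initial state.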