Develop magic string + trajectory #3122
Conversation
```diff
 if trajectory.done_reached:
-    self._update_end_episode_stats(agent_id)
+    self._update_end_episode_stats(
+        agent_id, self.get_policy(trajectory.behavior_id)
```
Clever
```diff
 :param n_steps: number of steps to increment the step count by
 """
-self.step = self.policy.increment_step(n_steps)
+self.step += n_steps
```
Are we still incrementing the policy's steps somewhere? I believe the steps are needed to update the nodes in the TF graph that hold the global steps, which are used for annealing the different params.
Yes, the policy steps get updated right after this is called in trainer_controller.
Is there a reason we don't just have increment_step(self, n_steps, behavior_name_id) and do the get_policy inside the method, so we don't have to keep track of two step increments?
Basically, in my latest code (28ad94a) I've replaced all of these calls on the trainer with a single advance() call that takes care of the step incrementing, trajectory ingestion, and policy updates, so getting the policy and operating on it in the trainer controller seems overly complicated.
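For illustration, a rough sketch of what that consolidated advance() pattern could look like; the Trainer skeleton, queue handling, and attribute names below are assumptions made for the sketch, not the actual ml-agents code:

```python
from typing import Any, Dict, List


class Trainer:
    """Skeleton only; attribute and method names here are illustrative assumptions."""

    def __init__(self) -> None:
        self.step = 0
        self.policies: Dict[str, Any] = {}     # behavior_id -> policy (assumed mapping)
        self.trajectory_queue: List[Any] = []  # filled elsewhere by an agent processor

    def advance(self) -> None:
        # Drain queued trajectories, incrementing the trainer's own step count
        # and the policy's TF global step in one place instead of two.
        while self.trajectory_queue:
            trajectory = self.trajectory_queue.pop(0)
            n_steps = len(trajectory.steps)
            self.step += n_steps
            policy = self.policies[trajectory.behavior_id]
            policy.increment_step(n_steps)  # keeps the TF graph's step node in sync
            self._process_trajectory(trajectory)

    def _process_trajectory(self, trajectory: Any) -> None:
        """Buffer the trajectory for the next policy update (stubbed here)."""
```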
```diff
 self.stats_reporter.add_stat(stat, np.mean(stat_list))

-bc_module = self.sac_policy.bc_module
+bc_module = self.policy.bc_module
```
I'm surprised this works, since self.policy should be a TFPolicy, not a SACPolicy.
Is it because I removed self.policy from the trainer? self.policy now just exists at the level of ppo/sac.
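A minimal sketch of the typing situation being discussed, using simplified stand-in classes rather than the real TFPolicy/SACPolicy: once the base trainer no longer declares self.policy, each concrete trainer's attribute is inferred with its concrete policy type, so accessing bc_module type-checks.

```python
class BCModule:
    pass


class TFPolicy:
    pass


class SACPolicy(TFPolicy):
    def __init__(self) -> None:
        self.bc_module = BCModule()


class Trainer:
    # No self.policy declared here anymore, so nothing pins the
    # attribute to the base TFPolicy type for the type checker.
    pass


class SACTrainer(Trainer):
    def __init__(self) -> None:
        self.policy = SACPolicy()  # inferred as SACPolicy, not TFPolicy

    def _update_sac_policy(self) -> None:
        bc_module = self.policy.bc_module  # OK: SACPolicy defines bc_module
```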
```diff
 def __init__(
-    self, brain, reward_buff_cap, trainer_parameters, training, load, seed, run_id
+    self,
```
Could you add type annotations on these?
will do
```diff
 def __init__(
     self,
-    brain,
+    brain_name,
```
Ditto for type annotations
It's odd that the trainers have escaped type annotations for this long.
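For reference, an annotated signature might look like the sketch below; the parameter types are inferred from how the arguments appear to be used and are assumptions, not the PR's actual annotations.

```python
from typing import Any, Dict


class PPOTrainer:
    def __init__(
        self,
        brain_name: str,                       # behavior name string, replacing the brain object
        reward_buff_cap: int,                  # capacity of the reward buffer
        trainer_parameters: Dict[str, Any],    # hyperparameters from the trainer config
        training: bool,
        load: bool,
        seed: int,
        run_id: str,
    ) -> None:
        self.brain_name = brain_name
        self.reward_buff_cap = reward_buff_cap
        self.trainer_parameters = trainer_parameters
        self.training = training
        self.load = load
        self.seed = seed
        self.run_id = run_id
```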
Looks good, just a few questions on the type checks. I can handle those next week if you want.
This is a new PR for spawning multiple brain names with the new trajectory-centric structure of the trainers, as outlined in this design doc. As is, it passes all pytest/C# tests and trains 3DBall.
The major changes:
- self.policy no longer lives on the base trainer; the concrete PPO/SAC trainers own their policies, and callers look them up with get_policy(behavior_id).
- The trainer's step count is incremented directly (self.step += n_steps) instead of through policy.increment_step().
- _update_end_episode_stats() now takes the policy for the trajectory's behavior_id.

The minor changes:
- Type annotations on the trainer __init__ signatures.
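Putting the pieces from the diffs together, here is a rough, hypothetical sketch of the behavior-name dispatch; the class and method shapes are simplified stand-ins, not the real trainer_controller code.

```python
from typing import Dict


class Policy:
    pass


class Trainer:
    """Stub trainer exposing only the pieces referenced in the diffs above."""

    def __init__(self) -> None:
        self.policies: Dict[str, Policy] = {}

    def get_policy(self, behavior_id: str) -> Policy:
        return self.policies[behavior_id]

    def process_trajectory(self, trajectory: "Trajectory") -> None:
        ...

    def _update_end_episode_stats(self, agent_id: str, policy: Policy) -> None:
        ...


class Trajectory:
    def __init__(self, behavior_id: str, agent_id: str, done_reached: bool) -> None:
        self.behavior_id = behavior_id
        self.agent_id = agent_id
        self.done_reached = done_reached


class TrainerController:
    def __init__(self, trainers: Dict[str, Trainer]) -> None:
        self.trainers = trainers  # behavior name -> trainer

    def on_trajectory(self, trajectory: Trajectory) -> None:
        # The behavior_id string on the trajectory (the "magic string")
        # selects both the trainer and, within it, the policy to use,
        # mirroring the first diff hunk above.
        trainer = self.trainers[trajectory.behavior_id]
        trainer.process_trajectory(trajectory)
        if trajectory.done_reached:
            trainer._update_end_episode_stats(
                trajectory.agent_id, trainer.get_policy(trajectory.behavior_id)
            )
```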