
Conversation

@andrewcoh
Contributor

This is a new PR for spawning multiple brain names with the new trajectory-centric structure of the trainers as outlined in this design doc. As is, it passes all pytest/C# tests and trains 3DBall.

The major changes:

  • behavior name is now the root behavior name + team id (team id field added to the editor)
  • python side has some parsing in trainer controller to handle name_behavior_id
  • policies are created in a separate function within the trainer
  • getter/setter for policies by name_behavior_id (sketched after these lists)
  • self.policy is removed from trainer.py. Multiagent trainers can now sensibly inherit from this hierarchy. save_model/export_model now expect a name_behavior_id identifier for a policy.
  • modified tests for new trainer functions

The minor changes:

  • trainer no longer takes brain_parameters as an argument, just the brain name.
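To make the naming and policy-registry changes above concrete, here is a minimal sketch. The `?team=` separator, the `TFPolicy` stand-in, and `add_policy` are illustrative assumptions; `get_policy` keyed by name_behavior_id is what the PR describes.

```python
from typing import Dict

class TFPolicy:
    """Stand-in for the real policy class (assumption for this sketch)."""

# Hypothetical delimiter; the PR only says behavior name = root name + team id.
TEAM_SEPARATOR = "?team="

def make_name_behavior_id(brain_name: str, team_id: int) -> str:
    # Compose the full behavior id from the root behavior name and the team id.
    return f"{brain_name}{TEAM_SEPARATOR}{team_id}"

class Trainer:
    def __init__(self, brain_name: str) -> None:
        self.brain_name = brain_name
        # One policy per name_behavior_id, so a multiagent trainer can hold several.
        self.policies: Dict[str, TFPolicy] = {}

    def add_policy(self, name_behavior_id: str, policy: TFPolicy) -> None:
        self.policies[name_behavior_id] = policy

    def get_policy(self, name_behavior_id: str) -> TFPolicy:
        return self.policies[name_behavior_id]
```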
```diff
 if trajectory.done_reached:
-    self._update_end_episode_stats(agent_id)
+    self._update_end_episode_stats(
+        agent_id, self.get_policy(trajectory.behavior_id)
+    )
```
Contributor

Clever

```diff
     :param n_steps: number of steps to increment the step count by
     """
-    self.step = self.policy.increment_step(n_steps)
+    self.step += n_steps
```
Contributor

Are we still incrementing the policy's steps somewhere? I believe the steps are needed to update the nodes in the TF graph that hold the global steps, which are used for annealing the different params
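For context, an illustrative TF1-style sketch (not the PR's actual graph code) of why the policy's step matters: a global-step variable lives in the policy's graph and schedules are annealed against it, so something has to keep incrementing it.

```python
import tensorflow as tf  # TF1-style API, as ml-agents used at the time

# A global-step variable held in the policy's graph.
global_step = tf.Variable(0, trainable=False, dtype=tf.int32, name="global_step")
steps_to_increment = tf.placeholder(shape=[], dtype=tf.int32)
increment_op = tf.assign(global_step, global_step + steps_to_increment)

# Example of a parameter annealed against the global step (values illustrative).
learning_rate = tf.train.polynomial_decay(
    learning_rate=3e-4,
    global_step=global_step,
    decay_steps=500000,
    end_learning_rate=1e-10,
)

def increment_step(sess, n_steps):
    # If nothing runs this op, the annealing schedules above stop advancing.
    return sess.run(increment_op, feed_dict={steps_to_increment: n_steps})
```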

Contributor Author

Yes, the policy steps get updated right after this is called in trainer_controller.

Contributor

Is there a reason why we don't just have increment_step(self, n_steps, behavior_name_id) and do the get_policy inside the method, so we don't have to keep track of two step increments?

@ervteng
Contributor
Dec 23, 2019

Basically in my latest code (28ad94a) I've replaced all of these calls on the trainer with a single advance() call that takes care of the incrementing, trajectory ingestion, and policy updates, so getting the policy and operating on it in TC seems overly complicated
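A hedged sketch of what such a consolidated advance() might look like; _drain_trajectory_queues, _is_ready_update, and trajectory.steps are assumed names here, not necessarily those in 28ad94a.

```python
def advance(self) -> None:
    # Ingest any trajectories queued for this trainer.
    for trajectory in self._drain_trajectory_queues():  # hypothetical helper
        self._process_trajectory(trajectory)
        # Increment the trainer's step count and the policy's graph step in one
        # place, so the two counters can never drift apart.
        n_steps = len(trajectory.steps)
        self.step += n_steps
        self.get_policy(trajectory.behavior_id).increment_step(n_steps)
    # Update the policy once enough new experience has accumulated.
    if self._is_ready_update():  # hypothetical readiness check
        self._update_policy()
```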

```diff
     self.stats_reporter.add_stat(stat, np.mean(stat_list))

-    bc_module = self.sac_policy.bc_module
+    bc_module = self.policy.bc_module
```
Contributor

I'm surprised this works since self.policy should be a TFPolicy not a SACPolicy

Contributor Author

Is it because I removed self.policy from the trainer? self.policy now just exists at the level of ppo/sac.
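In other words, a simplified, runnable sketch of the typing situation (all class bodies here are stand-ins): with self.policy removed from the abstract trainer, each concrete trainer declares it with a concrete type, so accesses like self.policy.bc_module resolve.

```python
class BCModule:
    """Stand-in for the behavioral-cloning module (assumption)."""

class TFPolicy:
    """Stand-in base policy class."""

class SACPolicy(TFPolicy):
    def __init__(self) -> None:
        self.bc_module = BCModule()  # only the SAC/PPO policies expose this

class Trainer:
    """Base trainer: no self.policy attribute here anymore."""

class SACTrainer(Trainer):
    def __init__(self) -> None:
        # Declared at the concrete-trainer level with its concrete type.
        self.policy = SACPolicy()

trainer = SACTrainer()
bc_module = trainer.policy.bc_module  # fine: policy is SACPolicy, not bare TFPolicy
```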


```diff
 def __init__(
-    self, brain, reward_buff_cap, trainer_parameters, training, load, seed, run_id
+    self,
```
Contributor

Could you add type annotations on these?

Contributor Author

Will do.

```diff
 def __init__(
     self,
-    brain,
+    brain_name,
```
Contributor

Ditto for type annotations

Contributor Author

It's odd that the trainers have escaped type annotations for this long.
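For illustration, an annotated signature inferred from the parameter names visible in the diffs above; the exact types are assumptions, not the PR's final annotations.

```python
def __init__(
    self,
    brain_name: str,
    reward_buff_cap: int,
    trainer_parameters: dict,
    training: bool,
    load: bool,
    seed: int,
    run_id: str,
) -> None:
    ...
```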

@chriselion
Contributor
left a comment

Looks good, just a few questions on the type checks. I can handle those next week if you want.

@andrewcoh merged commit f504908 into master Dec 27, 2019
delete-merged-branch bot deleted the develop-magic-string-trajectory branch December 27, 2019 21:50
github-actions bot locked as resolved and limited conversation to collaborators May 17, 2021