Develop magic string + trajectory #3122
Conversation
```diff
 if trajectory.done_reached:
-    self._update_end_episode_stats(agent_id)
+    self._update_end_episode_stats(
+        agent_id, self.get_policy(trajectory.behavior_id)
```
Clever
```diff
 :param n_steps: number of steps to increment the step count by
 """
-self.step = self.policy.increment_step(n_steps)
+self.step += n_steps
```
Are we still incrementing the policy's steps somewhere? I believe the steps are needed to update the nodes in the TF graph that hold the global steps, which are used for annealing the different params.
Yes, the policy steps get updated right after this is called in trainer_controller.
Is there a reason we don't just have increment_step(self, n_steps, behavior_name_id) and do the get_policy inside the method, so we don't have to keep track of two step increments?
Basically, in my latest code (28ad94a) I've replaced all of these calls on the trainer with a single advance() call that takes care of the step incrementing, trajectory ingestion, and policy updates, so getting the policy and operating on it in the trainer controller seems overly complicated.
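For illustration, a rough sketch of what that consolidated advance() pattern could look like; the Trainer skeleton, queue handling, and attribute names below are assumptions made for the sketch, not the actual ml-agents code:

```python
from typing import Any, Dict, List


class Trainer:
    """Skeleton only; attribute and method names here are illustrative assumptions."""

    def __init__(self) -> None:
        self.step = 0
        self.policies: Dict[str, Any] = {}     # behavior_id -> policy (assumed mapping)
        self.trajectory_queue: List[Any] = []  # filled elsewhere by an agent processor

    def advance(self) -> None:
        # Drain queued trajectories, incrementing the trainer's own step count
        # and the policy's TF global step in one place instead of two.
        while self.trajectory_queue:
            trajectory = self.trajectory_queue.pop(0)
            n_steps = len(trajectory.steps)
            self.step += n_steps
            policy = self.policies[trajectory.behavior_id]
            policy.increment_step(n_steps)  # keeps the TF graph's step node in sync
            self._process_trajectory(trajectory)

    def _process_trajectory(self, trajectory: Any) -> None:
        """Buffer the trajectory for the next policy update (stubbed here)."""
```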
```diff
 self.stats_reporter.add_stat(stat, np.mean(stat_list))

-bc_module = self.sac_policy.bc_module
+bc_module = self.policy.bc_module
```
I'm surprised this works, since self.policy should be a TFPolicy, not a SACPolicy.
Is it because I removed self.policy from the trainer? self.policy now just exists at the level of ppo/sac.
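A minimal sketch of the typing situation being discussed, using simplified stand-in classes rather than the real TFPolicy/SACPolicy: once the base trainer no longer declares self.policy, each concrete trainer's attribute is inferred with its concrete policy type, so accessing bc_module type-checks.

```python
class BCModule:
    pass


class TFPolicy:
    pass


class SACPolicy(TFPolicy):
    def __init__(self) -> None:
        self.bc_module = BCModule()


class Trainer:
    # No self.policy declared here anymore, so nothing pins the
    # attribute to the base TFPolicy type for the type checker.
    pass


class SACTrainer(Trainer):
    def __init__(self) -> None:
        self.policy = SACPolicy()  # inferred as SACPolicy, not TFPolicy

    def _update_sac_policy(self) -> None:
        bc_module = self.policy.bc_module  # OK: SACPolicy defines bc_module
```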
```diff
 def __init__(
-    self, brain, reward_buff_cap, trainer_parameters, training, load, seed, run_id
+    self,
```
Could you add type annotations on these?
will do
```diff
 def __init__(
     self,
-    brain,
+    brain_name,
```
Ditto for type annotations
It's odd that the trainers have escaped type annotations for this long.
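For reference, an annotated signature might look like the sketch below; the parameter types are inferred from how the arguments appear to be used and are assumptions, not the PR's actual annotations.

```python
from typing import Any, Dict


class PPOTrainer:
    def __init__(
        self,
        brain_name: str,                       # behavior name string, replacing the brain object
        reward_buff_cap: int,                  # capacity of the reward buffer
        trainer_parameters: Dict[str, Any],    # hyperparameters from the trainer config
        training: bool,
        load: bool,
        seed: int,
        run_id: str,
    ) -> None:
        self.brain_name = brain_name
        self.reward_buff_cap = reward_buff_cap
        self.trainer_parameters = trainer_parameters
        self.training = training
        self.load = load
        self.seed = seed
        self.run_id = run_id
```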
Looks good, just a few questions on the type checks. I can handle those next week if you want.
This is a new PR for spawning multiple brain names with the new trajectory-centric structure of the trainers, as outlined in this design doc. As is, it passes all pytest/C# tests and trains 3DBall.
The major changes:
- self.policy no longer lives on the base trainer; the concrete PPO/SAC trainers own their policies, and callers look them up with get_policy(behavior_id).
- The trainer's step count is incremented directly (self.step += n_steps) instead of through policy.increment_step().
- _update_end_episode_stats() now takes the policy for the trajectory's behavior_id.

The minor changes:
- Type annotations on the trainer __init__ signatures.
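Putting the pieces from the diffs together, here is a rough, hypothetical sketch of the behavior-name dispatch; the class and method shapes are simplified stand-ins, not the real trainer_controller code.

```python
from typing import Dict


class Policy:
    pass


class Trainer:
    """Stub trainer exposing only the pieces referenced in the diffs above."""

    def __init__(self) -> None:
        self.policies: Dict[str, Policy] = {}

    def get_policy(self, behavior_id: str) -> Policy:
        return self.policies[behavior_id]

    def process_trajectory(self, trajectory: "Trajectory") -> None:
        ...

    def _update_end_episode_stats(self, agent_id: str, policy: Policy) -> None:
        ...


class Trajectory:
    def __init__(self, behavior_id: str, agent_id: str, done_reached: bool) -> None:
        self.behavior_id = behavior_id
        self.agent_id = agent_id
        self.done_reached = done_reached


class TrainerController:
    def __init__(self, trainers: Dict[str, Trainer]) -> None:
        self.trainers = trainers  # behavior name -> trainer

    def on_trajectory(self, trajectory: Trajectory) -> None:
        # The behavior_id string on the trajectory (the "magic string")
        # selects both the trainer and, within it, the policy to use,
        # mirroring the first diff hunk above.
        trainer = self.trainers[trajectory.behavior_id]
        trainer.process_trajectory(trajectory)
        if trajectory.done_reached:
            trainer._update_end_episode_stats(
                trajectory.agent_id, trainer.get_policy(trajectory.behavior_id)
            )
```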