Merged
Commits (35)
a2b4fb7 Move sampler config into main YAML (Apr 14, 2020)
94c684c Make CLI override YAML (Apr 14, 2020)
54e9914 Bring back default functionality, curriculum loader (Apr 15, 2020)
df4a358 Load curriculum from same YAML (Apr 15, 2020)
3a84c13 Example WallJump curriculum (Apr 15, 2020)
92c9682 New-format YAML files (Apr 15, 2020)
9dcf38d Fix walljump curriculum (Apr 16, 2020)
a926d4c Commit SAC parameters (Apr 16, 2020)
419a156 Delete old configs and add gail (Apr 16, 2020)
c80c359 Change some of the documentation (Apr 17, 2020)
f020ecc Merge master into develop-single-config (Apr 17, 2020)
0fa8f8b More doc updates (Apr 17, 2020)
72b39f0 Fix Yamato test (Apr 17, 2020)
0c89258 Fix learn.py test (Apr 17, 2020)
b84396f More docs updates (Apr 17, 2020)
756a75f Update migrating.md file (Apr 17, 2020)
cb97315 Update changelog and improve migrating (Apr 17, 2020)
7bb6366 Don't hard break trying to get curriculum out of bad config (Apr 17, 2020)
e0b8c9c Use behavior name instead of brain (Apr 17, 2020)
8d37045 Fix yamato_utils (Apr 17, 2020)
b20ab5d Merge branch 'master' of github.com:Unity-Technologies/ml-agents into… (Apr 17, 2020)
50eafc2 Delete curricula (Apr 17, 2020)
cf920b6 Merge branch 'master' of github.com:Unity-Technologies/ml-agents into… (Apr 17, 2020)
eb3df94 Make RunOptions and YAML compatible (Apr 20, 2020)
4171565 Rename walljump yaml SAC (Apr 21, 2020)
4330c02 Fix newline formatting (Apr 21, 2020)
41dd3f7 Merge branch 'master' into develop-single-config (Apr 22, 2020)
75ad833 Update SAC configurations (Apr 22, 2020)
5a75d7f Edit Changelog (Apr 22, 2020)
9ba2ef3 Fix learn.py tests (Apr 22, 2020)
36c9591 Update strikers vs goalie and add Worm (Apr 23, 2020)
79c8a6c Merge branch 'master' into develop-single-config (Apr 23, 2020)
3568c2e Merge branch 'master' into develop-single-config (Apr 29, 2020)
4d27ed5 Use hard links in Migrating.md (Apr 29, 2020)
597635c Merge branch 'master' into develop-single-config (Apr 29, 2020)

Merge master into develop-single-config
Ervin Teng committed Apr 17, 2020
commit f020eccd921e6a00dcb67d3649cdb95dbaa55dd4
66 changes: 19 additions & 47 deletions docs/Getting-Started.md
@@ -124,49 +124,25 @@ example.

## Training a new model with Reinforcement Learning

While we provide pre-trained `.nn` files for the agents in this environment, any environment you make yourself will require training agents from scratch to generate a new model file. We can do this using reinforcement learning.

In order to train an agent to correctly balance the ball, we provide two
deep reinforcement learning algorithms.

The default algorithm is Proximal Policy Optimization (PPO). This
is a method that has been shown to be more general purpose and stable
than many other RL algorithms. For more information on PPO, OpenAI
has a [blog post](https://blog.openai.com/openai-baselines-ppo/)
explaining it, and [our page](Training-PPO.md) for how to use it in training.

We also provide Soft-Actor Critic, an off-policy algorithm that
has been shown to be both stable and sample-efficient.
For more information on SAC, see UC Berkeley's
[blog post](https://bair.berkeley.edu/blog/2018/12/14/sac/) and
[our page](Training-SAC.md) for more guidance on when to use SAC vs. PPO. To
use SAC to train Balance Ball, replace all references to `config/ppo/3DBall.yaml`
with `config/sac/3DBall.yaml` below.

To train the agents within the Balance Ball environment, we will be using the
ML-Agents Python package. We have provided a convenient command called `mlagents-learn`
which accepts arguments used to configure both training and inference phases.
While we provide pre-trained `.nn` files for the agents in this environment, any
environment you make yourself will require training agents from scratch to
generate a new model file. In this section we will demonstrate how to use the
reinforcement learning algorithms that are part of the ML-Agents Python package
to accomplish this. We have provided a convenient command `mlagents-learn` which
accepts arguments used to configure both training and inference phases.

### Training the environment

1. Open a command or terminal window.
2. Navigate to the folder where you cloned the ML-Agents toolkit repository.
**Note**: If you followed the default [installation](Installation.md), then
you should be able to run `mlagents-learn` from any directory.
3. Run `mlagents-learn <trainer-config-path> --run-id=<run-identifier>`
where:
- `<trainer-config-path>` is the relative or absolute filepath of the
trainer configuration. The defaults used by example environments included
in `MLAgentsSDK` can be found in the `config/ppo/` and `config/sac` folders.
- `<run-identifier>` is a string used to separate the results of different
training runs. Make sure to use one that hasn't been used already!
4. If you cloned the ML-Agents repo, then you can simply run

```sh
mlagents-learn config/ppo/3DBall.yaml --run-id=firstRun
```

5. When the message _"Start training by pressing the Play button in the Unity
1. Navigate to the folder where you cloned the `ml-agents` repository. **Note**:
If you followed the default [installation](Installation.md), then you should
be able to run `mlagents-learn` from any directory.
1. Run `mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun`.
- `config/ppo/3DBall.yaml` is the path to a default training
configuration file that we provide. The `config/ppo` folder includes training configuration
files for all our example environments, including 3DBall.
- `run-id` is a unique name for this training session.
1. When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button
in Unity to start training in the Editor.

@@ -261,14 +237,10 @@ mlagents-learn config/ppo/3DBall.yaml --run-id=firstRun --resume
```

Your trained model will be at `models/<run-identifier>/<behavior_name>.nn` where
`<behavior_name>` is the name of the `Behavior Name` of the agents corresponding to the model.
(**Note:** There is a known bug on Windows that causes the saving of the model to
fail when you early terminate the training, it's recommended to wait until Step
has reached the max_steps parameter you set in your config YAML.) This file
corresponds to your model's latest checkpoint. You can now embed this trained
model into your Agents by following the steps below, which is similar to
the steps described
[above](#running-a-pre-trained-model).
`<behavior_name>` is the `Behavior Name` of the agents corresponding to the
model. This file corresponds to your model's latest checkpoint. You can now
embed this trained model into your Agents by following the steps below, which
are similar to the steps described [above](#running-a-pre-trained-model).

1. Move your model file into
`Project/Assets/ML-Agents/Examples/3DBall/TFModels/`.
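As a quick reference for the workflow described in the updated Getting-Started.md above, here is a minimal shell sketch covering training, resuming, and embedding the result. The run IDs are illustrative, and the `3DBall.nn` filename assumes the agents' `Behavior Name` is `3DBall`; the actual output file is named after whatever `Behavior Name` your agents use.

```sh
# Train the 3DBall example with the provided PPO config
# (press Play in the Unity Editor when prompted)
mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun

# Resume the same run later; the run ID must match the original run
mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun --resume

# Copy the latest checkpoint into the example project
# (assumes the Behavior Name, and therefore the .nn filename, is 3DBall)
cp models/first3DBallRun/3DBall.nn Project/Assets/ML-Agents/Examples/3DBall/TFModels/
```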
14 changes: 7 additions & 7 deletions docs/Learning-Environment-Examples.md
@@ -317,13 +317,13 @@ you would like to contribute environments, please see our
objects, goals, and walls.
- Vector Action space: (Discrete) 1 Branch, 4 actions corresponding to agent
rotation and forward/backward movement.
* Visual Observations (Optional): First-person view for the agent. Use
`VisualHallway` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Float Properties: None
* Benchmark Mean Reward: 0.7
* To train this environment, you can enable curiosity by adding the `curiosity` reward signal in `config/ppo/Hallway.yaml`
- Visual Observations (Optional): First-person view for the agent. Use
`VisualHallway` scene. **The visual observation version of this environment
does not train with the provided default training parameters.**
- Float Properties: None
- Benchmark Mean Reward: 0.7
- To train this environment, you can enable curiosity by adding the `curiosity` reward signal
in `config/ppo/Hallway.yaml`

## Bouncer

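The Hallway note above only involves editing `config/ppo/Hallway.yaml` to add the `curiosity` reward signal; the training command itself does not change. A sketch, with an illustrative run ID:

```sh
# After adding the curiosity reward signal to config/ppo/Hallway.yaml, train as usual
mlagents-learn config/ppo/Hallway.yaml --run-id=HallwayCuriosity
```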
6 changes: 6 additions & 0 deletions ml-agents/mlagents/trainers/learn.py
@@ -54,6 +54,12 @@ def _create_parser():
        dest="env_path",
        help="Path to the Unity executable to train",
    )
    argparser.add_argument(
        "--lesson",
        default=0,
        type=int,
        help="The lesson to start with when performing curriculum training",
    )
    argparser.add_argument(
        "--keep-checkpoints",
        default=5,
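The `--lesson` flag added to `learn.py` above takes an integer lesson index (default 0) for curriculum training. A hedged usage sketch; the config filename is hypothetical and stands in for whichever YAML in this branch defines a WallJump curriculum:

```sh
# Start curriculum training at lesson 2 instead of the default lesson 0
# (config/ppo/WallJump_curriculum.yaml is a hypothetical path used for illustration)
mlagents-learn config/ppo/WallJump_curriculum.yaml --lesson=2 --run-id=wallJumpCurriculum
```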
You are viewing a condensed version of this merge commit.