Merged
Commits (35)
a2b4fb7 Move sampler config into main YAML (Apr 14, 2020)
94c684c Make CLI override YAML (Apr 14, 2020)
54e9914 Bring back default functionality, curriculum loader (Apr 15, 2020)
df4a358 Load curriculum from same YAML (Apr 15, 2020)
3a84c13 Example WallJump curriculum (Apr 15, 2020)
92c9682 New-format YAML files (Apr 15, 2020)
9dcf38d Fix walljump curriculum (Apr 16, 2020)
a926d4c Commit SAC parameters (Apr 16, 2020)
419a156 Delete old configs and add gail (Apr 16, 2020)
c80c359 Change some of the documentation (Apr 17, 2020)
f020ecc Merge master into develop-single-config (Apr 17, 2020)
0fa8f8b More doc updates (Apr 17, 2020)
72b39f0 Fix Yamato test (Apr 17, 2020)
0c89258 Fix learn.py test (Apr 17, 2020)
b84396f More docs updates (Apr 17, 2020)
756a75f Update migrating.md file (Apr 17, 2020)
cb97315 Update changelog and improve migrating (Apr 17, 2020)
7bb6366 Don't hard break trying to get curriculum out of bad config (Apr 17, 2020)
e0b8c9c Use behavior name instead of brain (Apr 17, 2020)
8d37045 Fix yamato_utils (Apr 17, 2020)
b20ab5d Merge branch 'master' of github.com:Unity-Technologies/ml-agents into… (Apr 17, 2020)
50eafc2 Delete curricula (Apr 17, 2020)
cf920b6 Merge branch 'master' of github.com:Unity-Technologies/ml-agents into… (Apr 17, 2020)
eb3df94 Make RunOptions and YAML compatible (Apr 20, 2020)
4171565 Rename walljump yaml SAC (Apr 21, 2020)
4330c02 Fix newline formatting (Apr 21, 2020)
41dd3f7 Merge branch 'master' into develop-single-config (Apr 22, 2020)
75ad833 Update SAC configurations (Apr 22, 2020)
5a75d7f Edit Changelog (Apr 22, 2020)
9ba2ef3 Fix learn.py tests (Apr 22, 2020)
36c9591 Update strikers vs goalie and add Worm (Apr 23, 2020)
79c8a6c Merge branch 'master' into develop-single-config (Apr 23, 2020)
3568c2e Merge branch 'master' into develop-single-config (Apr 29, 2020)
4d27ed5 Use hard links in Migrating.md (Apr 29, 2020)
597635c Merge branch 'master' into develop-single-config (Apr 29, 2020)

Merge master into develop-single-config
Ervin Teng committed Apr 17, 2020
commit f020eccd921e6a00dcb67d3649cdb95dbaa55dd4
66 changes: 19 additions & 47 deletions docs/Getting-Started.md
@@ -124,49 +124,25 @@ example.

## Training a new model with Reinforcement Learning

While we provide pre-trained `.nn` files for the agents in this environment, any environment you make yourself will require training agents from scratch to generate a new model file. We can do this using reinforcement learning.

In order to train an agent to correctly balance the ball, we provide two
deep reinforcement learning algorithms.

The default algorithm is Proximal Policy Optimization (PPO). This
is a method that has been shown to be more general purpose and stable
than many other RL algorithms. For more information on PPO, OpenAI
has a [blog post](https://blog.openai.com/openai-baselines-ppo/)
explaining it, and [our page](Training-PPO.md) for how to use it in training.

We also provide Soft-Actor Critic, an off-policy algorithm that
has been shown to be both stable and sample-efficient.
For more information on SAC, see UC Berkeley's
[blog post](https://bair.berkeley.edu/blog/2018/12/14/sac/) and
[our page](Training-SAC.md) for more guidance on when to use SAC vs. PPO. To
use SAC to train Balance Ball, replace all references to `config/ppo/3DBall.yaml`
with `config/sac/3DBall.yaml` below.

To train the agents within the Balance Ball environment, we will be using the
ML-Agents Python package. We have provided a convenient command called `mlagents-learn`
which accepts arguments used to configure both training and inference phases.
While we provide pre-trained `.nn` files for the agents in this environment, any
environment you make yourself will require training agents from scratch to
generate a new model file. In this section we will demonstrate how to use the
reinforcement learning algorithms that are part of the ML-Agents Python package
to accomplish this. We have provided a convenient command `mlagents-learn` which
accepts arguments used to configure both training and inference phases.

### Training the environment

1. Open a command or terminal window.
2. Navigate to the folder where you cloned the ML-Agents toolkit repository.
**Note**: If you followed the default [installation](Installation.md), then
you should be able to run `mlagents-learn` from any directory.
3. Run `mlagents-learn <trainer-config-path> --run-id=<run-identifier>`
where:
- `<trainer-config-path>` is the relative or absolute filepath of the
trainer configuration. The defaults used by example environments included
in `MLAgentsSDK` can be found in the `config/ppo/` and `config/sac` folders.
- `<run-identifier>` is a string used to separate the results of different
training runs. Make sure to use one that hasn't been used already!
4. If you cloned the ML-Agents repo, then you can simply run

```sh
mlagents-learn config/ppo/3DBall.yaml --run-id=firstRun
```

5. When the message _"Start training by pressing the Play button in the Unity
1. Navigate to the folder where you cloned the `ml-agents` repository. **Note**:
If you followed the default [installation](Installation.md), then you should
be able to run `mlagents-learn` from any directory.
1. Run `mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun`.
- `config/ppo/3DBall.yaml` is the path to a default training
configuration file that we provide. The `config/ppo` folder includes training configuration
files for all our example environments, including 3DBall.
- `run-id` is a unique name for this training session.
1. When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button
in Unity to start training in the Editor.

@@ -261,14 +237,10 @@ mlagents-learn config/ppo/3DBall.yaml --run-id=firstRun --resume
```

Your trained model will be at `models/<run-identifier>/<behavior_name>.nn` where
`<behavior_name>` is the name of the `Behavior Name` of the agents corresponding to the model.
(**Note:** There is a known bug on Windows that causes the saving of the model to
fail when you early terminate the training, it's recommended to wait until Step
has reached the max_steps parameter you set in your config YAML.) This file
corresponds to your model's latest checkpoint. You can now embed this trained
model into your Agents by following the steps below, which is similar to
the steps described
[above](#running-a-pre-trained-model).
`<behavior_name>` is the `Behavior Name` of the agents corresponding to the
model. This file corresponds to your model's latest checkpoint. You can now
embed this trained model into your Agents by following the steps below, which
are similar to the steps described [above](#running-a-pre-trained-model).

1. Move your model file into
`Project/Assets/ML-Agents/Examples/3DBall/TFModels/`.
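As a quick reference for the workflow described in the updated Getting-Started.md above, here is a minimal shell sketch covering training, resuming, and embedding the result. The run IDs are illustrative, and the `3DBall.nn` filename assumes the agents' `Behavior Name` is `3DBall`; the actual output file is named after whatever `Behavior Name` your agents use.

```sh
# Train the 3DBall example with the provided PPO config
# (press Play in the Unity Editor when prompted)
mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun

# Resume the same run later; the run ID must match the original run
mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun --resume

# Copy the latest checkpoint into the example project
# (assumes the Behavior Name, and therefore the .nn filename, is 3DBall)
cp models/first3DBallRun/3DBall.nn Project/Assets/ML-Agents/Examples/3DBall/TFModels/
```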
14 changes: 7 additions & 7 deletions docs/Learning-Environment-Examples.md
@@ -317,13 +317,13 @@ you would like to contribute environments, please see our
objects, goals, and walls.
- Vector Action space: (Discrete) 1 Branch, 4 actions corresponding to agent
rotation and forward/backward movement.
* Visual Observations (Optional): First-person view for the agent. Use
`VisualHallway` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Float Properties: None
* Benchmark Mean Reward: 0.7
* To train this environment, you can enable curiosity by adding the `curiosity` reward signal in `config/ppo/Hallway.yaml`
- Visual Observations (Optional): First-person view for the agent. Use
`VisualHallway` scene. **The visual observation version of this environment
does not train with the provided default training parameters.**
- Float Properties: None
- Benchmark Mean Reward: 0.7
- To train this environment, you can enable curiosity by adding the `curiosity` reward signal
in `config/ppo/Hallway.yaml`

## Bouncer

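The Hallway note above only involves editing `config/ppo/Hallway.yaml` to add the `curiosity` reward signal; the training command itself does not change. A sketch, with an illustrative run ID:

```sh
# After adding the curiosity reward signal to config/ppo/Hallway.yaml, train as usual
mlagents-learn config/ppo/Hallway.yaml --run-id=HallwayCuriosity
```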
6 changes: 6 additions & 0 deletions ml-agents/mlagents/trainers/learn.py
@@ -54,6 +54,12 @@ def _create_parser():
        dest="env_path",
        help="Path to the Unity executable to train",
    )
    argparser.add_argument(
        "--lesson",
        default=0,
        type=int,
        help="The lesson to start with when performing curriculum training",
    )
    argparser.add_argument(
        "--keep-checkpoints",
        default=5,
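The `--lesson` flag added to `learn.py` above takes an integer lesson index (default 0) for curriculum training. A hedged usage sketch; the config filename is hypothetical and stands in for whichever YAML in this branch defines a WallJump curriculum:

```sh
# Start curriculum training at lesson 2 instead of the default lesson 0
# (config/ppo/WallJump_curriculum.yaml is a hypothetical path used for illustration)
mlagents-learn config/ppo/WallJump_curriculum.yaml --lesson=2 --run-id=wallJumpCurriculum
```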
You are viewing a condensed version of this merge commit.