diff --git a/docs/Installation.md b/docs/Installation.md
index a3c381c415..e6194c17c6 100644
--- a/docs/Installation.md
+++ b/docs/Installation.md
@@ -83,7 +83,7 @@ the `develop` branch as it may have potential fixes for bugs and dependency issu
 (Optional to get bleeding edge)
 
 ```sh
-git clone --branch https://github.com/Unity-Technologies/ml-agents.git
+git clone https://github.com/Unity-Technologies/ml-agents.git
 ```
 
 #### Advanced: Local Installation for Development
diff --git a/docs/Learning-Environment-Design-Agents.md b/docs/Learning-Environment-Design-Agents.md
index 255dd44b17..8000a88465 100644
--- a/docs/Learning-Environment-Design-Agents.md
+++ b/docs/Learning-Environment-Design-Agents.md
@@ -2,39 +2,44 @@
 
 **Table of Contents:**
 
-- [Decisions](#decisions)
-- [Observations and Sensors](#observations-and-sensors)
-  - [Generating Observations](#generating-observations)
-    - [Agent.CollectObservations()](#agentcollectobservations)
-    - [Observable Fields and Properties](#observable-fields-and-properties)
-    - [ISensor interface and SensorComponents](#isensor-interface-and-sensorcomponents)
-  - [Vector Observations](#vector-observations)
-    - [One-hot encoding categorical information](#one-hot-encoding-categorical-information)
-    - [Normalization](#normalization)
-    - [Stacking](#stacking)
-    - [Vector Observation Summary & Best Practices](#vector-observation-summary--best-practices)
-  - [Visual Observations](#visual-observations)
-    - [Visual Observation Summary & Best Practices](#visual-observation-summary--best-practices)
-  - [Raycast Observations](#raycast-observations)
-    - [RayCast Observation Summary & Best Practices](#raycast-observation-summary--best-practices)
-  - [Variable Length Observations](#variable-length-observations)
-    - [Variable Length Observation Summary & Best Practices](#variable-length-observation-summary--best-practices)
-  - [Goal Signal](#goal-signal)
-    - [Goal Signal Summary & Best Practices](#goal-signal-summary--best-practices)
-- [Actions and Actuators](#actions-and-actuators)
-  - [Continuous Actions](#continuous-actions)
-  - [Discrete Actions](#discrete-actions)
-    - [Masking Discrete Actions](#masking-discrete-actions)
-  - [Actions Summary & Best Practices](#actions-summary--best-practices)
-- [Rewards](#rewards)
-  - [Examples](#examples)
-  - [Rewards Summary & Best Practices](#rewards-summary--best-practices)
-- [Agent Properties](#agent-properties)
-- [Destroying an Agent](#destroying-an-agent)
-- [Defining Multi-agent Scenarios](#defining-multi-agent-scenarios)
-  - [Teams for Adversarial Scenarios](#teams-for-adversarial-scenarios)
-  - [Groups for Cooperative Scenarios](#groups-for-cooperative-scenarios)
-- [Recording Demonstrations](#recording-demonstrations)
+- [Agents](#agents)
+  - [Decisions](#decisions)
+  - [Observations and Sensors](#observations-and-sensors)
+    - [Generating Observations](#generating-observations)
+      - [Agent.CollectObservations()](#agentcollectobservations)
+      - [Observable Fields and Properties](#observable-fields-and-properties)
+      - [ISensor interface and SensorComponents](#isensor-interface-and-sensorcomponents)
+    - [Vector Observations](#vector-observations)
+      - [One-hot encoding categorical information](#one-hot-encoding-categorical-information)
+      - [Normalization](#normalization)
+      - [Stacking](#stacking)
+      - [Vector Observation Summary \& Best Practices](#vector-observation-summary--best-practices)
+    - [Visual Observations](#visual-observations)
+      - [Visual Observation Summary \& Best Practices](#visual-observation-summary--best-practices)
+    - [Raycast Observations](#raycast-observations)
+      - [RayCast Observation Summary \& Best Practices](#raycast-observation-summary--best-practices)
+    - [Grid Observations](#grid-observations)
+      - [Grid Observation Summary \& Best Practices](#grid-observation-summary--best-practices)
+    - [Variable Length Observations](#variable-length-observations)
+      - [Variable Length Observation Summary \& Best Practices](#variable-length-observation-summary--best-practices)
+    - [Goal Signal](#goal-signal)
+      - [Goal Signal Summary \& Best Practices](#goal-signal-summary--best-practices)
+  - [Actions and Actuators](#actions-and-actuators)
+    - [Continuous Actions](#continuous-actions)
+    - [Discrete Actions](#discrete-actions)
+      - [Masking Discrete Actions](#masking-discrete-actions)
+    - [IActuator interface and ActuatorComponents](#iactuator-interface-and-actuatorcomponents)
+    - [Actions Summary \& Best Practices](#actions-summary--best-practices)
+  - [Rewards](#rewards)
+    - [Examples](#examples)
+    - [Rewards Summary \& Best Practices](#rewards-summary--best-practices)
+  - [Agent Properties](#agent-properties)
+  - [Destroying an Agent](#destroying-an-agent)
+  - [Defining Multi-agent Scenarios](#defining-multi-agent-scenarios)
+    - [Teams for Adversarial Scenarios](#teams-for-adversarial-scenarios)
+    - [Groups for Cooperative Scenarios](#groups-for-cooperative-scenarios)
+      - [Cooperative Behaviors Notes and Best Practices](#cooperative-behaviors-notes-and-best-practices)
+  - [Recording Demonstrations](#recording-demonstrations)
 
 An agent is an entity that can observe its environment, decide on the best
 course of action using those observations, and execute those actions within its
@@ -661,7 +666,7 @@ a `CameraSensor` is a goal by attaching a `VectorSensorComponent` or a
 `CameraSensorComponent` to the Agent and selecting `Goal Signal` as `Observation Type`.
 On the trainer side, there are two different ways to condition the policy. This
 setting is determined by the
-[conditioning_type parameter](Training-Configuration-File.md#common-trainer-configurations).
+[goal_conditioning_type parameter](Training-Configuration-File.md#common-trainer-configurations).
 If set to `hyper` (default) a [HyperNetwork](https://arxiv.org/pdf/1609.09106.pdf)
 will be used to generate some of the weights of the policy using the goal
 observations as input. Note that using a
@@ -674,7 +679,7 @@ For an example on how to use a goal signal, see the
 
 #### Goal Signal Summary & Best Practices
   - Attach a `VectorSensorComponent` or `CameraSensorComponent` to an agent and set the observation type to goal to use the feature.
-  - Set the conditioning_type parameter in the training configuration.
+  - Set the `goal_conditioning_type` parameter in the training configuration.
   - Reduce the number of hidden units in the network when using the HyperNetwork
     conditioning type.
 
diff --git a/docs/Training-Configuration-File.md b/docs/Training-Configuration-File.md
index e5d089e551..1828acb28e 100644
--- a/docs/Training-Configuration-File.md
+++ b/docs/Training-Configuration-File.md
@@ -2,21 +2,22 @@
 
 **Table of Contents**
 
-- [Common Trainer Configurations](#common-trainer-configurations)
-- [Trainer-specific Configurations](#trainer-specific-configurations)
-  - [PPO-specific Configurations](#ppo-specific-configurations)
-  - [SAC-specific Configurations](#sac-specific-configurations)
-- [Reward Signals](#reward-signals)
-  - [Extrinsic Rewards](#extrinsic-rewards)
-  - [Curiosity Intrinsic Reward](#curiosity-intrinsic-reward)
-  - [GAIL Intrinsic Reward](#gail-intrinsic-reward)
-  - [RND Intrinsic Reward](#rnd-intrinsic-reward)
-  - [Reward Signal Settings for SAC](#reward-signal-settings-for-sac)
-- [Behavioral Cloning](#behavioral-cloning)
-- [Memory-enhanced Agents using Recurrent Neural Networks](#memory-enhanced-agents-using-recurrent-neural-networks)
-- [Self-Play](#self-play)
-  - [Note on Reward Signals](#note-on-reward-signals)
-  - [Note on Swap Steps](#note-on-swap-steps)
+- [Training Configuration File](#training-configuration-file)
+  - [Common Trainer Configurations](#common-trainer-configurations)
+  - [Trainer-specific Configurations](#trainer-specific-configurations)
+    - [PPO-specific Configurations](#ppo-specific-configurations)
+    - [SAC-specific Configurations](#sac-specific-configurations)
+    - [MA-POCA-specific Configurations](#ma-poca-specific-configurations)
+  - [Reward Signals](#reward-signals)
+    - [Extrinsic Rewards](#extrinsic-rewards)
+    - [Curiosity Intrinsic Reward](#curiosity-intrinsic-reward)
+    - [GAIL Intrinsic Reward](#gail-intrinsic-reward)
+    - [RND Intrinsic Reward](#rnd-intrinsic-reward)
+  - [Behavioral Cloning](#behavioral-cloning)
+  - [Memory-enhanced Agents using Recurrent Neural Networks](#memory-enhanced-agents-using-recurrent-neural-networks)
+  - [Self-Play](#self-play)
+    - [Note on Reward Signals](#note-on-reward-signals)
+    - [Note on Swap Steps](#note-on-swap-steps)
 
 ## Common Trainer Configurations
 
@@ -44,7 +45,7 @@ choice of the trainer (which we review on subsequent sections).
 | `network_settings -> num_layers` | (default = `2`) The number of hidden layers in the neural network. Corresponds to how many hidden layers are present after the observation input, or after the CNN encoding of the visual observation. For simple problems, fewer layers are likely to train faster and more efficiently. More layers may be necessary for more complex control problems. <br><br> Typical range: `1` - `3` |
 | `network_settings -> normalize` | (default = `false`) Whether normalization is applied to the vector observation inputs. This normalization is based on the running average and variance of the vector observation. Normalization can be helpful in cases with complex continuous control problems, but may be harmful with simpler discrete control problems. |
 | `network_settings -> vis_encode_type` | (default = `simple`) Encoder type for encoding visual observations. <br><br> `simple` (default) uses a simple encoder which consists of two convolutional layers, `nature_cnn` uses the CNN implementation proposed by [Mnih et al.](https://www.nature.com/articles/nature14236), consisting of three convolutional layers, and `resnet` uses the [IMPALA Resnet](https://arxiv.org/abs/1802.01561) consisting of three stacked layers, each with two residual blocks, making a much larger network than the other two. `match3` is a smaller CNN ([Gudmundsoon et al.](https://www.researchgate.net/publication/328307928_Human-Like_Playtesting_with_Deep_Learning)) that can capture more granular spatial relationships and is optimized for board games. `fully_connected` uses a single fully connected dense layer as encoder without any convolutional layers. <br><br> Due to the size of convolution kernel, there is a minimum observation size limitation that each encoder type can handle - `simple`: 20x20, `nature_cnn`: 36x36, `resnet`: 15 x 15, `match3`: 5x5. `fully_connected` doesn't have convolutional layers and thus no size limits, but since it has less representation power it should be reserved for very small inputs. Note that using the `match3` CNN with very large visual input might result in a huge observation encoding and thus potentially slow down training or cause memory issues. |
-| `network_settings -> conditioning_type` | (default = `hyper`) Conditioning type for the policy using goal observations. <br><br> `none` treats the goal observations as regular observations, `hyper` (default) uses a HyperNetwork with goal observations as input to generate some of the weights of the policy. Note that when using `hyper` the number of parameters of the network increases greatly. Therefore, it is recommended to reduce the number of `hidden_units` when using this `conditioning_type`
+| `network_settings -> goal_conditioning_type` | (default = `hyper`) Conditioning type for the policy using goal observations. <br><br> `none` treats the goal observations as regular observations, `hyper` (default) uses a HyperNetwork with goal observations as input to generate some of the weights of the policy. Note that when using `hyper` the number of parameters of the network increases greatly. Therefore, it is recommended to reduce the number of `hidden_units` when using this `goal_conditioning_type`. |
 
 ## Trainer-specific Configurations
 
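
For reference, the renamed option sits under `network_settings` in a trainer configuration file. The fragment below is a minimal sketch, assuming the usual `behaviors` → behavior name → settings layout of ML-Agents trainer configs; the behavior name `GoalAgent` and the `hidden_units` value are hypothetical placeholders chosen to illustrate the advice to shrink the network when `hyper` is used, not values taken from this change.

```yaml
behaviors:
  GoalAgent:                      # hypothetical behavior name
    trainer_type: ppo
    network_settings:
      hidden_units: 64            # illustrative reduced size: `hyper` greatly increases parameter count
      num_layers: 2
      goal_conditioning_type: hyper   # use `none` to treat goal observations as regular observations
```

Switching `goal_conditioning_type` to `none` removes the HyperNetwork, so the reduced `hidden_units` value is then no longer needed for that reason.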