# Work with steps
In this document we discuss the hyper-parameters that refer to the concept of a *step*. Since a step can be interpreted in several ways, it is necessary to clearly specify the interpretation used here.

## Policy steps
We start from the concept of a *policy step*: the step in which the policy selects the action to perform in the environment, given the observation it has received.

> **Note**
>
> The *environment step* is the step performed by the environment: the environment takes an action as input and computes the next observation and the next reward.
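
The distinction between the two kinds of step can be illustrated with a minimal sketch. `ToyEnv` and `policy` are hypothetical stand-ins, not the library's API:

```python
class ToyEnv:
    """A toy environment: takes an action, returns the next observation and reward."""

    def __init__(self):
        self.t = 0  # number of environment steps performed

    def step(self, action):
        # One *environment step*: consume an action, produce obs and reward.
        self.t += 1
        obs, reward = self.t, float(action)
        return obs, reward


def policy(obs):
    # One *policy step*: select an action given an observation.
    return obs % 2


env, obs, policy_steps = ToyEnv(), 0, 0
for _ in range(5):
    action = policy(obs)            # policy step
    obs, reward = env.step(action)  # environment step
    policy_steps += 1

print(policy_steps)  # 5: one policy step per environment step
```

With a single environment and a single process, policy steps and environment steps advance together, one per loop iteration.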
Now that we have introduced the concept of the policy step, some aspects need to be clarified:

1. When there are multiple parallel environments, the number of policy steps grows proportionally to the number of parallel environments. E.g., if there are $m$ environments, the actor has to choose $m$ actions and each environment performs an environment step: this means that $\bold{m}$ **policy steps** are performed.
2. When there are multiple parallel processes (i.e., the script has been run with `lightning run model --devices>=2 ...`), the number of policy steps grows proportionally to the number of parallel processes. E.g., assume there are $n$ processes, each containing a single environment: the $n$ actors select an action and a (per-process) environment step is performed. In this case $\bold{n}$ **policy steps** are performed.

In general, if we have $n$ parallel processes, each with $m$ independent environments, the policy step increases **globally** by $n \cdot m$ at each iteration.

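The bookkeeping can be sketched with a quick calculation (the values below are illustrative, not defaults):

```python
n_processes = 2  # parallel processes (ranks); illustrative value
m_envs = 4       # independent environments per process; illustrative value

# Each iteration, every rank selects m actions (one per environment),
# so the global policy-step counter increases by n * m.
policy_steps_per_iteration = n_processes * m_envs

iterations = 10
global_policy_steps = iterations * policy_steps_per_iteration
print(global_policy_steps)  # 80
```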
The hyper-parameters that refer to the *policy steps* are:

* `total_steps`: the total number of policy steps to perform in an experiment. In practice, this number is divided by $n \cdot m$ to obtain the number of training steps each process has to perform.
* `exploration_steps`: the number of policy steps during which the agent explores the environment in the P2E algorithms.
* `max_episode_steps`: the maximum number of policy steps an episode can last ($\text{max\_steps}$); when this number is reached, the environment returns `terminated=True`. This means that if the action repeat is greater than one ($\text{action\_repeat} > 1$), the environment performs at most $\text{env\_steps} = \text{max\_steps} \cdot \text{action\_repeat}$ environment steps.
* `learning_starts`: how many policy steps the agent has to perform before training starts.
* `train_every`: how many policy steps the agent has to perform between one training and the next.
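The relations above can be sketched as follows. The values are illustrative, not library defaults, and the training schedule is a simplified model of the idea, not the actual training loop:

```python
# max_episode_steps counts *policy* steps; with action repeat, each
# policy step corresponds to action_repeat environment steps.
max_episode_steps = 100  # illustrative value
action_repeat = 4        # illustrative value
env_steps = max_episode_steps * action_repeat
print(env_steps)  # 400: maximum environment steps an episode can last

# Gating training on policy steps: train only after learning_starts
# policy steps, and then once every train_every policy steps.
learning_starts, train_every = 16, 8  # illustrative values
train_points = [s for s in range(1, 41)
                if s >= learning_starts and s % train_every == 0]
print(train_points)  # [16, 24, 32, 40]
```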
## Gradient steps
A *gradient step* consists of an update of the agent's parameters, i.e., a call to the *train* function. The number of gradient steps is proportional to the number of parallel processes: if there are $n$ parallel processes, $n \cdot \text{gradient\_steps}$ calls to the *train* method are executed.

The hyper-parameters that refer to the *gradient steps* are:
* `algo.per_rank_gradient_steps`: the number of gradient steps per rank to perform in a single iteration.
* `algo.per_rank_pretrain_steps`: the number of gradient steps per rank to perform in the first iteration.
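
A small sketch of how the per-rank count scales globally (the values are illustrative, not defaults):

```python
n_processes = 4              # parallel processes (ranks); illustrative value
per_rank_gradient_steps = 2  # gradient steps per rank per iteration; illustrative value

# Each iteration, every rank calls train() per_rank_gradient_steps times,
# so the global number of train() calls per iteration is:
global_train_calls = n_processes * per_rank_gradient_steps
print(global_train_calls)  # 8
```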