Buffer is checkpointed by default
belerico committed May 9, 2024
1 parent 9e4cce2 commit 5ec76bc
Showing 6 changed files with 8 additions and 8 deletions.
6 changes: 3 additions & 3 deletions howto/logs_and_checkpoints.md
@@ -195,11 +195,11 @@ size: ???
memmap: True
validate_args: False
from_numpy: False
- checkpoint: False # Used only for off-policy algorithms
+ checkpoint: True # Used only for off-policy algorithms
```

There can be a few scenarios to pay attention to:

* If the buffer is memory-mapped (i.e. `buffer.memmap=True`) and the buffer is saved in the checkpoint, then one **mustn't delete the buffer folder** of the stopped experiment: when the buffer is memory-mapped, a file is created on disk for every key stored in the replay buffer (e.g. `observations.memmap`, `rewards.memmap`), and when the experiment is resumed those files are read back from the exact same location
- * If the buffer is memory-mapped (i.e. `buffer.memmap=True`), the buffer is saved in the checkpoint, and the buffer had been filled completely during the previous experiment (meaning that the oldest trajectories have been overwritten by newer ones), then the agent may end up being trained on "future" trajectories coming from a "future" policy. To be more precise, the buffer is simply a pre-allocated numpy array with an attribute `pos` that points to the first free slot to be written; if we are using a `sheeprl.data.buffers.SequentialReplayBuffer` we sample sequences starting in `[0, pos - sequence_length) ∪ [pos, buffer_size)` or simply in `[0, pos - sequence_length)`, depending on whether the buffer has been filled or not respectively. When we save the buffer into the checkpoint we save all the relevant information regarding it (the `pos` attribute and the path to the memory-mapped files, which represent the buffer content to be retrieved upon resuming). Suppose we saved a checkpoint at step `N` and the experiment ran for another `K < N` steps before stopping, with a buffer that had already been filled at least once. When we resume, the buffer is loaded from the checkpoint, so the `pos` attribute points to the same position it pointed to at step `N`, and because the buffer is memory-mapped we find in `[pos, pos + K]` a bunch of trajectories that come from a "future" policy: the one we were training in the previous experiment before it stopped! Currently we don't know whether this can cause problems for the agent, nor have we found a nice solution to mitigate it. We have thought of a few ways to solve it: one is to memory-map the buffer metadata, such as the current `pos`; in this way, when we load the buffer from the checkpoint we can remove all the unwanted trajectories in `[old_pos, current_pos]`, although this could erase a lot of the buffer content if, for example, one has a checkpoint at step `N` and the experiment stopped at step `2N - 1`. Another solution could be to employ an online queue where trajectories are stored temporarily and flushed to the replay buffer only upon checkpointing; the problem is that a lot of data has to be kept in memory, and RAM could easily explode when working with images (this can be avoided by also memory-mapping the online queue). Practically, another possible solution is to set `algo.learning_starts=K` from the CLI or in the algorithm section of the experiment config: in this way all the future trajectories will be overwritten by random samples collected by the resumed agent.
- * In any case, when the checkpoint is resumed the buffer **could potentially be pre-filled for `algo.learning_starts` steps** with random actions sampled by the resumed agent. If you don't want to pre-fill the buffer, set `algo.learning_starts=0`
+ * If the buffer is memory-mapped (i.e. `buffer.memmap=True`), the buffer is saved in the checkpoint, and the buffer had been filled completely during the previous experiment (meaning that the oldest trajectories have been overwritten by newer ones), then the agent may end up being trained on "future" trajectories coming from a "future" policy. To be more precise, the buffer is simply a pre-allocated numpy array with an attribute `pos` that points to the first free slot to be written; if we are using a `sheeprl.data.buffers.SequentialReplayBuffer` we sample sequences starting in `[0, pos - sequence_length) ∪ [pos, buffer_size)` or simply in `[0, pos - sequence_length)`, depending on whether the buffer has been filled or not respectively (a minimal sketch of this rule follows after this list). When we save the buffer into the checkpoint we save all the relevant information regarding it (the `pos` attribute and the path to the memory-mapped files, which represent the buffer content to be retrieved upon resuming). Suppose we saved a checkpoint at step `N` and the experiment ran for another `K < N` steps before stopping, with a buffer that had already been filled at least once. When we resume, the buffer is loaded from the checkpoint, so the `pos` attribute points to the same position it pointed to at step `N`, and because the buffer is memory-mapped we find in `[pos, pos + K]` a bunch of trajectories that come from a "future" policy: the one we were training in the previous experiment before it stopped! Currently we don't know whether this can cause problems for the agent, nor have we found a nice solution to mitigate it. We have thought of a few ways to solve it: one is to memory-map the buffer metadata, such as the current `pos`; in this way, when we load the buffer from the checkpoint we can remove all the unwanted trajectories in `[old_pos, current_pos]`, although this could erase a lot of the buffer content if, for example, one has a checkpoint at step `N` and the experiment stopped at step `2N - 1`. Another solution could be to employ an online queue where trajectories are stored temporarily and flushed to the replay buffer only upon checkpointing; the problem is that a lot of data has to be kept in memory, and RAM could easily explode when working with images (this can be avoided by also memory-mapping the online queue). Practically, another possible solution is to set `algo.learning_starts=K` from the CLI or in the algorithm section of the experiment config: in this way all the future trajectories will be overwritten by trajectories conditioned by the resumed agent.
+ * In any case, when the checkpoint is resumed the buffer **could potentially be pre-filled for `algo.learning_starts` steps** with trajectories conditioned by the resumed agent. If you don't want to pre-fill the buffer, set `algo.learning_starts=0`
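To make the scenario above concrete, here is a minimal sketch of the valid-start-index rule quoted in the list, i.e. sampling sequence starts in `[0, pos - sequence_length) ∪ [pos, buffer_size)` when the buffer is full. It is not the actual `sheeprl.data.buffers.SequentialReplayBuffer` implementation; the function name, its arguments, and the numbers in the example are illustrative only.

```python
# A minimal sketch, NOT the actual sheeprl SequentialReplayBuffer code: it only
# illustrates which sequence start indices stay valid around the write cursor `pos`.
import numpy as np


def valid_start_indices(pos: int, buffer_size: int, sequence_length: int, full: bool) -> np.ndarray:
    """Start indices allowed when sampling sequences of length `sequence_length`.

    If the buffer has wrapped around at least once (full=True) the valid starts are
    [0, pos - sequence_length) U [pos, buffer_size); otherwise only [0, pos - sequence_length).
    """
    before_cursor = np.arange(0, max(pos - sequence_length, 0))
    if not full:
        return before_cursor
    after_cursor = np.arange(pos, buffer_size)
    return np.concatenate([before_cursor, after_cursor])


# Hypothetical numbers: checkpoint saved with pos == 500 on a full buffer of 1000 slots.
# After resuming, slots [500, 500 + K) still hold the K "future" steps collected after
# the checkpoint, and they remain samplable until the write cursor overwrites them.
starts = valid_start_indices(pos=500, buffer_size=1000, sequence_length=64, full=True)
print(starts.min(), starts.max(), starts.size)  # 0 999 936
```

This is also why the `algo.learning_starts=K` workaround mentioned above helps: the resumed run starts writing again from `pos`, so collecting `K` environment steps before training should overwrite the slots that still hold the "future" data.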
2 changes: 1 addition & 1 deletion sheeprl/configs/buffer/default.yaml
@@ -2,4 +2,4 @@ size: ???
memmap: True
validate_args: False
from_numpy: False
- checkpoint: False # Used only for off-policy algorithms
+ checkpoint: True # Used only for off-policy algorithms
2 changes: 1 addition & 1 deletion sheeprl/configs/exp/dreamer_v1.yaml
@@ -22,7 +22,7 @@ checkpoint:
# Buffer
buffer:
size: 5000000
- checkpoint: False
+ checkpoint: True

# Distribution
distribution:
2 changes: 1 addition & 1 deletion sheeprl/configs/exp/dreamer_v2.yaml
@@ -23,7 +23,7 @@ checkpoint:
buffer:
size: 5000000
type: sequential
- checkpoint: False
+ checkpoint: True
prioritize_ends: False

# Distribution
2 changes: 1 addition & 1 deletion sheeprl/configs/exp/dreamer_v3.yaml
@@ -23,7 +23,7 @@ checkpoint:
# Buffer
buffer:
size: 1000000
- checkpoint: False
+ checkpoint: True

# Distribution
distribution:
2 changes: 1 addition & 1 deletion sheeprl/configs/exp/sac.yaml
@@ -21,7 +21,7 @@ checkpoint:
# Buffer
buffer:
size: 1000000
- checkpoint: False
+ checkpoint: True
sample_next_obs: False

# Environment
