[RLlib; Offline RL] Enable GPU and multi-GPU training for offline algorithms. #47929
base: master
Conversation
… a couple more simplifications. Tuned offline examples to the new settings. Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
…rner'. Removed 'NumpyToTensor' connector from learner connector prior to 'OfflinePreLearner'. Signed-off-by: simonsays1980 <[email protected]>
…er connector. Signed-off-by: simonsays1980 <[email protected]>
@@ -341,6 +341,19 @@ py_test(
    args = ["--as-test", "--enable-new-api-stack"]
)

py_test(
Awesooomeee!!! This is so cool!
@@ -356,6 +369,19 @@ py_test(
    args = ["--as-test", "--enable-new-api-stack"]
)

py_test(
same! :D
@@ -564,6 +590,19 @@ py_test(
    args = ["--as-test", "--enable-new-api-stack"]
)

py_test(
same same :D
@@ -98,6 +98,7 @@ def build_learner_connector(
    # Remove unneeded connectors from the MARWIL connector pipeline.
    pipeline.remove("AddOneTsToEpisodesAndTruncate")
    pipeline.remove("GeneralAdvantageEstimation")
    pipeline.remove("NumpyToTensor")
Aaahh! So this was one of the problems? That we were already converting everything to torch tensors?
Yes, it was one of them. The major one, however, was that we were passing in a learner that runs on GPU, and that learner needed to be serialized by Ray to be sent to the data workers. Deserializing it there errored out.
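To illustrate the constraint described here, a minimal stdlib-only sketch (all class names are hypothetical, not RLlib's actual implementations): objects shipped to Ray data workers must be picklable on machines that may not have a GPU, so batches stay as plain Python/numpy data until they reach the learner, which converts them on-device.

```python
import pickle


class OfflinePreLearnerSketch:
    """Runs on data workers: stays framework- and device-agnostic."""

    def __call__(self, batch):
        # No tensor conversion here; pass plain data through unchanged.
        return dict(batch)


class LearnerSketch:
    """Runs on the (possibly GPU-backed) trainer process."""

    def to_tensor(self, batch):
        # Stand-in for the on-device NumpyToTensor conversion: the real
        # code would create framework tensors on the learner's device.
        return {k: tuple(v) for k, v in batch.items()}


# The pre-learner carries no device state, so it round-trips cleanly
# through pickle, which is what Ray does when shipping it to workers.
restored = pickle.loads(pickle.dumps(OfflinePreLearnerSketch()))
batch = restored({"obs": [1, 2, 3]})
tensors = LearnerSketch().to_tensor(batch)
```

The design point is only that device-bound state never enters the serialized pre-learner; the conversion happens as the last step, where the device is known.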
if self._learner_connector is None:
    # Note, if we have a learner connector, but a `MultiAgentBatch` is passed in,
    # we are in an offline setting.
    # TODO (simon, sven): Check, if DreamerV3 has the same setting.
We'll see what the tests say :) DreamerV3 does not use connectors. It passes the batch from the replay buffer directly into `update_from_batch`.
self._module = module_spec.build()
self._module.set_state(module_state)

# Build the module from spec. Note, this will be a MultiRLModule.
Nice simplification!
rllib/algorithms/marwil/marwil.py
Outdated
@@ -361,6 +362,9 @@ def build_learner_connector(
    pipeline.append(
        GeneralAdvantageEstimation(gamma=self.gamma, lambda_=self.lambda_)
    )
    pipeline.append(
Makes sense. The GAE connector outputs tensors already, because it requires a VF forward pass (with the tensors coming from `NumpyToTensor`). The pipeline does get a little more complicated now, but I feel it's still ok (not too crazy; connector pieces are named properly, and each piece performs a well-distinguished task).
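As a rough sketch of the kind of pipeline edits discussed here, the following toy `Pipeline` class (an illustrative stand-in, not RLlib's actual connector API) shows how pieces are removed and re-appended so that they run in the order the learner needs:

```python
class Pipeline:
    """Toy ordered pipeline of named, callable pieces."""

    def __init__(self, pieces):
        # Ordered list of (name, callable) pairs; list order is run order.
        self.pieces = list(pieces)

    def remove(self, name):
        self.pieces = [(n, f) for n, f in self.pieces if n != name]

    def append(self, name, fn):
        self.pieces.append((name, fn))

    def __call__(self, batch):
        for _, fn in self.pieces:
            batch = fn(batch)
        return batch


pipe = Pipeline([
    ("AddObservations", lambda b: b + ["obs"]),
    ("NumpyToTensor", lambda b: b + ["tensors"]),
])

# Drop the early conversion, then append the remaining pieces in the
# order the learner pipeline needs them to run.
pipe.remove("NumpyToTensor")
pipe.append("GeneralAdvantageEstimation", lambda b: b + ["advantages"])
pipe.append("NumpyToTensor", lambda b: b + ["tensors"])

result = pipe([])
```

Each piece stays small and named, so reordering is a matter of `remove()`/`append()` calls rather than rewriting the pipeline.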
Yes. And keep in mind: the data workers run in parallel and prefetch batches, which actually makes the pipeline quite smooth. One connector piece more or less will not make a big difference, if any. Users usually have enough resources to run multiple data workers in parallel, and they should.
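The prefetching idea mentioned here can be sketched with a stdlib-only producer thread that stays a few batches ahead of the consumer (this illustrates the concept only; it is not Ray Data's implementation):

```python
import queue
import threading


def prefetching_iter(source, prefetch=2):
    """Yield items from `source` while a producer thread runs ahead."""
    q = queue.Queue(maxsize=prefetch)
    _DONE = object()

    def _producer():
        for item in source:
            q.put(item)  # blocks once `prefetch` items are buffered
        q.put(_DONE)

    threading.Thread(target=_producer, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


batches = list(prefetching_iter(iter(range(5))))
```

While the trainer consumes one batch, the producer is already preparing the next ones, so adding or removing one cheap pipeline step rarely changes end-to-end throughput.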
Awesome PR. Love it! Thanks for these important fixes @simonsays1980 , unlocking GPU training for offline RL on the new API stack. This is huge!
Signed-off-by: Sven Mika <[email protected]>
…o resources not available. Signed-off-by: simonsays1980 <[email protected]>
…0/ray into offline-enable-gpu-training
Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
Why are these changes needed?
GPU and multi-GPU training was not working so far because of serialization errors driven by device mappings during the creation of `map_batches` workers. This PR solves these errors and brings several modifications/simplifications to `OfflineData`, `OfflinePreLearner`, and `Learner`. Furthermore, it adds single-GPU learning tests for offline algorithms.
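The `map_batches` worker creation mentioned above relates to a general Ray Data pattern: rather than instantiating a (possibly GPU-bound) callable on the driver and serializing the instance, one passes the class itself so each worker constructs its own copy locally. A hedged sketch (`BatchMapper` is a hypothetical name, not from this PR):

```python
class BatchMapper:
    """Per-worker callable; holds only cheap, picklable config."""

    def __init__(self, scale):
        # No device handles or framework tensors live on the driver side.
        self.scale = scale

    def __call__(self, batch):
        return {k: [v * self.scale for v in vals] for k, vals in batch.items()}


ctor_kwargs = {"scale": 2}
# With Ray Data this would look like:
#   ds.map_batches(BatchMapper, fn_constructor_kwargs=ctor_kwargs)
# Here we simulate what a single worker does after constructing its copy:
worker_fn = BatchMapper(**ctor_kwargs)
out = worker_fn({"obs": [1, 2, 3]})
```

Because only the class and plain-Python kwargs cross the serialization boundary, no device-mapped state ever needs to be pickled.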
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.