
[RLlib; Offline RL] Enable GPU and multi-GPU training for offline algorithms. #47929

Open
wants to merge 13 commits into base: master

Conversation

simonsays1980 (Collaborator) commented on Oct 8, 2024

Why are these changes needed?

GPU and multi-GPU training has not worked so far because of serialization errors caused by device mappings during the creation of the map_batches workers. This PR fixes these errors and brings several modifications and simplifications to OfflineData, OfflinePreLearner, and Learner.
Furthermore, it adds single-GPU learning tests for offline algorithms.
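
As a rough illustration of what this unlocks (a sketch, not code from this PR; the data path and option values are placeholders), an offline algorithm on the new API stack could now be configured with one or more GPU learners along these lines:

from ray.rllib.algorithms.bc import BCConfig

# Hedged sketch: GPU / multi-GPU offline training config on the new API stack.
# Paths and numbers are placeholders, not settings taken from this PR.
config = (
    BCConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("CartPole-v1")
    # Placeholder offline data location.
    .offline_data(input_=["local:///tmp/cartpole/offline_data"])
    .learners(
        num_learners=2,          # two Learner workers ...
        num_gpus_per_learner=1,  # ... each with its own GPU
    )
    .training(train_batch_size_per_learner=1024)
)

algo = config.build()
print(algo.train())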

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

… a couple more simplifications. Tuned offline examples to the new settings.

Signed-off-by: simonsays1980 <[email protected]>
simonsays1980 added the enhancement, rllib, rllib-gpu-multi-gpu, and rllib-offline-rl labels on Oct 8, 2024
simonsays1980 marked this pull request as ready for review on October 8, 2024, 12:02
…rner'. Removed 'NumpyToTensor' connector from learner connector prior to 'OfflinePreLearner'.

Signed-off-by: simonsays1980 <[email protected]>
sven1977 changed the title from "[RLlib; Offline RL] - Enable GPU and multi-GPU training for offline algorithms." to "[RLlib; Offline RL] Enable GPU and multi-GPU training for offline algorithms." on Oct 8, 2024
@@ -341,6 +341,19 @@ py_test(
args = ["--as-test", "--enable-new-api-stack"]
)

py_test(
Contributor:

Awesooomeee!!! This is so cool!

@@ -356,6 +369,19 @@ py_test(
args = ["--as-test", "--enable-new-api-stack"]
)

py_test(
Contributor:

same! :D

@@ -564,6 +590,19 @@ py_test(
args = ["--as-test", "--enable-new-api-stack"]
)

py_test(
Contributor:

same same :D

@@ -98,6 +98,7 @@ def build_learner_connector(
# Remove unneeded connectors from the MARWIL connector pipeline.
pipeline.remove("AddOneTsToEpisodesAndTruncate")
pipeline.remove("GeneralAdvantageEstimation")
pipeline.remove("NumpyToTensor")
Contributor:

Aaahh! So this was one of the problems? That we were already converting everything to torch tensors?

Collaborator (Author):

Yes, it was one of them. The major one, however, was that we were passing in a learner that runs on the GPU, and that learner needed to be serialized by Ray to be sent to the data workers. Deserializing it there errored out.
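
In other words, the serialization-safe pattern is to ship only the module spec plus a CPU copy of the weights to the map_batches workers and rebuild the module there. A minimal sketch (the class name and constructor arguments below are assumptions for illustration, not the PR's actual OfflinePreLearner API):

# Illustrative sketch only; names are assumptions, not the real OfflinePreLearner.
class OfflinePreLearnerSketch:
    def __init__(self, module_spec, module_state):
        # Rebuild the (Multi)RLModule locally instead of deserializing a
        # GPU-bound Learner that was pickled on the driver.
        self._module = module_spec.build()
        self._module.set_state(module_state)

    def __call__(self, batch: dict) -> dict:
        # ... run the learner connector pipeline on the numpy batch here ...
        return batch

# The Ray Data pipeline would then build one such callable per data worker,
# e.g. (commented out because `ds`, `spec`, and `cpu_state` are placeholders):
# ds.map_batches(
#     OfflinePreLearnerSketch,
#     fn_constructor_kwargs={"module_spec": spec, "module_state": cpu_state},
#     concurrency=num_offline_data_workers,
# )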

if self._learner_connector is None:
# Note, if we have a learner connector, but a `MultiAgentBatch` is passed in,
# we are in an offline setting.
# TODO (simon, sven): Check, if DreamerV3 has the same setting.
Contributor:

We'll see what the tests say :) DreamerV3 does not use connectors. It passes the batch from the replay buffer directly into update_from_batch.
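
For context, a minimal sketch of that direct path (the replay buffer variable and its sample call are assumed names for illustration; only the update_from_batch call is taken from the comment above):

batch = replay_buffer.sample(batch_size)                # assumed buffer API
results = learner_group.update_from_batch(batch=batch)  # no connector pipeline in between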

self._module = module_spec.build()
self._module.set_state(module_state)

# Build the module from spec. Note, this will be a MultiRLModule.
Contributor:

Nice simplification!

@@ -361,6 +362,9 @@ def build_learner_connector(
pipeline.append(
GeneralAdvantageEstimation(gamma=self.gamma, lambda_=self.lambda_)
)
pipeline.append(
Contributor:

Makes sense. The GAE connector already outputs tensors because it requires a VF forward pass (with the tensors coming from NumpyToTensor). The pipeline does get a little more complicated now, but I feel like it's still ok (not too crazy: connector pieces are named properly, each piece performs a well-distinguished task, ...).
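
As a toy illustration of that ordering constraint (plain Python, not RLlib's actual ConnectorV2 classes): any piece that runs a forward pass, like GAE's value-function pass, has to sit downstream of the numpy-to-tensor conversion.

import numpy as np
import torch

def numpy_to_tensor(batch):
    # Convert the numpy batch to framework tensors.
    return {k: torch.as_tensor(v, dtype=torch.float32) for k, v in batch.items()}

def gae_like_piece(batch, value_fn):
    # Needs tensors because it calls into the model (VF forward pass).
    batch["vf_preds"] = value_fn(batch["obs"])
    return batch

value_fn = torch.nn.Linear(4, 1)  # stand-in value function
pipeline = [numpy_to_tensor, lambda b: gae_like_piece(b, value_fn)]

batch = {"obs": np.random.randn(8, 4)}
for piece in pipeline:
    batch = piece(batch)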

Collaborator (Author):

Yes. And keep in mind: the data workers run in parallel and prefetch batches, which makes the pipeline quite smooth. One connector piece more or less will not make a big difference, if any. Users usually have enough resources to run multiple data workers in parallel, and they should.
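
A rough Ray Data sketch of that point (the dataset, transform, and numbers are placeholders, not the PR's actual settings): several map_batches workers preprocess in parallel while the consumer iterates with prefetching, so the learner rarely waits on data.

import ray

ds = ray.data.range(10_000)  # placeholder dataset; in RLlib this would be the offline data

mapped = ds.map_batches(
    lambda batch: batch,  # stand-in for the OfflinePreLearner transform
    batch_size=256,
    concurrency=4,        # several data workers in parallel (assumed value)
)

for batch in mapped.iter_batches(batch_size=256, prefetch_batches=2):
    pass  # feed each batch into the Learner update here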

sven1977 (Contributor) left a comment:

Awesome PR. Love it! Thanks for these important fixes, @simonsays1980, unlocking GPU training for offline RL on the new API stack. This is huge!

sven1977 enabled auto-merge (squash) on October 8, 2024, 18:38
github-actions bot added the go (add ONLY when ready to merge, run all tests) label on Oct 8, 2024
github-actions bot disabled auto-merge on October 9, 2024, 09:04
Labels
enhancement (Request for new feature and/or capability), go (add ONLY when ready to merge, run all tests), rllib (RLlib related issues), rllib-gpu-multi-gpu (RLlib issues related to running on one or multiple GPUs), rllib-newstack, rllib-offline-rl (Offline RL problems)