[Outdated] Scaling workspace resources #2194
Add another workspace memory resource that does not have an explicit memory limit. It should be used for large allocations; a user can set it to a host-memory-backed resource, such as managed memory, for better scaling and to avoid many OOMs.
Thanks Artem for proposing this solution. On one hand, it is nice to have a secondary workspace allocator to handle large allocations, but I still need to think about this. An alternative solution would be to keep a single workspace allocator that falls back to the large allocator when allocating from the fast (but smaller) pool fails.
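The single-allocator alternative described in this comment could look roughly like the sketch below. It uses std::pmr::memory_resource from the C++17 standard library as a host-side stand-in for RMM device resources; limited_resource and fallback_resource are hypothetical names for illustration, not raft or RMM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <unordered_set>

// Hypothetical stand-in for the fast, size-limited workspace pool:
// forwards to an upstream resource but refuses to exceed `limit` bytes.
class limited_resource : public std::pmr::memory_resource {
 public:
  limited_resource(std::pmr::memory_resource* upstream, std::size_t limit)
    : upstream_(upstream), limit_(limit) {}
  std::size_t used() const { return used_; }

 private:
  void* do_allocate(std::size_t bytes, std::size_t align) override {
    if (bytes > limit_ - used_) { throw std::bad_alloc(); }
    void* p = upstream_->allocate(bytes, align);
    used_ += bytes;
    return p;
  }
  void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
    upstream_->deallocate(p, bytes, align);
    used_ -= bytes;
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }
  std::pmr::memory_resource* upstream_;
  std::size_t limit_;
  std::size_t used_ = 0;
};

// Hypothetical single workspace resource: try the fast pool first and
// fall back to the large resource only when the fast pool is exhausted.
class fallback_resource : public std::pmr::memory_resource {
 public:
  fallback_resource(std::pmr::memory_resource* fast, std::pmr::memory_resource* large)
    : fast_(fast), large_(large) {}
  std::size_t fallback_count() const { return from_large_.size(); }

 private:
  void* do_allocate(std::size_t bytes, std::size_t align) override {
    try {
      return fast_->allocate(bytes, align);
    } catch (const std::bad_alloc&) {
      void* p = large_->allocate(bytes, align);
      from_large_.insert(p);  // remember the owner for deallocation
      return p;
    }
  }
  void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
    if (from_large_.erase(p) > 0) {
      large_->deallocate(p, bytes, align);
    } else {
      fast_->deallocate(p, bytes, align);
    }
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }
  std::pmr::memory_resource* fast_;
  std::pmr::memory_resource* large_;
  std::unordered_set<void*> from_large_;
};
```

One trade-off of this design is that the caller no longer decides which backing store a given allocation uses, so a large temporary buffer may land in the fast pool first and crowd out smaller allocations.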
@@ -144,7 +177,7 @@ class workspace_resource_factory : public resource_factory {
     // Note, the workspace does not claim all this memory from the start, so it's still usable by
     // the main resource as well.
     // This limit is merely an order for algorithm internals to plan the batching accordingly.
-    return total_size / 2;
+    return total_size / 4;
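To make the effect of this limit on batching concrete, here is a minimal sketch. Both function names and the batching rule are assumptions for illustration; raft's actual algorithms plan their batches in their own ways.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the heuristic in the hunk above:
// the default workspace limit is a quarter of the total device memory.
std::size_t default_workspace_limit(std::size_t total_size) { return total_size / 4; }

// Hypothetical batch planner: the largest number of rows whose temporary
// buffers fit into the workspace limit, at least 1 and at most n_rows.
std::size_t plan_batch_size(std::size_t workspace_limit,
                            std::size_t bytes_per_row,
                            std::size_t n_rows)
{
  if (n_rows == 0 || bytes_per_row == 0) { return n_rows; }
  std::size_t fit = workspace_limit / bytes_per_row;
  return std::clamp<std::size_t>(fit, 1, n_rows);
}
```

Under these assumptions, a 16 GiB device gives a 4 GiB limit, and a query that needs 1 MiB of temporaries per row is processed in batches of 4096 rows. A smaller limit shrinks the batch proportionally, which is the performance concern raised in this thread.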
The OOM errors we have seen with CAGRA were related to the workspace pool grabbing all this space. What about limiting to a much smaller workspace size? (E.g. faiss has a 1.5 GiB limit).
That is an option, but so far I think it's not necessary. I also think it can hurt performance a little by reducing the batch size in places like ivf_pq::search or ivf_pq::extend.
With the current proposal, the ann-bench executable (as a user of raft) sets these resources:
- default - pool on top of device memory
- limited workspace - shares the same pool with default
- large workspace - managed memory (without pooling)
Hence the dataset/user allocations do not conflict for the same memory with the workspace (as they both use the same pool). At the same time, large temporary allocations (such as the cagra graph on device) use the managed memory and free it as soon as the algorithm finishes.
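The layout described above can be modelled on the host with the C++17 standard library. This is only a sketch: counting_resource, resource_layout and make_bench_layout are hypothetical names, std::pmr resources stand in for the device pool and for managed memory, and the size limit on the workspace is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>

// Counts live bytes so we can see which backing store an allocation hits.
class counting_resource : public std::pmr::memory_resource {
 public:
  explicit counting_resource(std::pmr::memory_resource* upstream) : upstream_(upstream) {}
  std::size_t live_bytes() const { return live_; }

 private:
  void* do_allocate(std::size_t bytes, std::size_t align) override {
    void* p = upstream_->allocate(bytes, align);
    live_ += bytes;
    return p;
  }
  void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
    upstream_->deallocate(p, bytes, align);
    live_ -= bytes;
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }
  std::pmr::memory_resource* upstream_;
  std::size_t live_ = 0;
};

// Hypothetical bundle of the three roles; this is not raft's handle type.
struct resource_layout {
  std::pmr::memory_resource* default_mr;          // long-lived, user-visible data
  std::pmr::memory_resource* workspace_mr;        // small temporaries
  std::pmr::memory_resource* large_workspace_mr;  // temporaries that scale with the problem
};

// Default and workspace share the device pool; the large workspace
// goes to a separate resource standing in for managed memory.
resource_layout make_bench_layout(counting_resource& device_pool, counting_resource& managed)
{
  return resource_layout{&device_pool, &device_pool, &managed};
}
```

In this model, a dataset allocation and a small temporary both draw from the shared pool, while a large temporary (such as a graph copy) is charged only to the separate resource and returns its memory as soon as it is deallocated.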
Thanks for joining the conversation, Tamas. I've updated the description with my rationale since you reviewed the PR.
Benchmarking update: there's limited evidence that the update improves performance of
Thanks Artem for the update! It is a nice idea to have an extra memory resource that we can use for large temporary allocations, potentially backed by host memory. This can be useful for systems with an improved H2D interconnect, such as Grace Hopper.
Opened #2322 dropping the changes to neighbor methods, which are moved to cuVS. Keeping this PR open, so that we can copy those neighbor changes when cuVS is ready for them.
@achirkin is this ready to be closed now that you've started a new PR for this?
@cjnolet If you don't mind, I'd like to keep it open until we open the cuVS PR with the corresponding neighbor changes.
Use raft's large workspace resource for large temporary allocations during ANN index build. This is the port of rapidsai/raft#2194, which didn't make it into raft before the algorithms were ported to cuVS. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: rapidsai/cuvs#181
Closing this as rapidsai/cuvs#181 got merged in cuVS
Brief
Add another workspace memory resource that does not have an explicit memory limit. That is, after the change we have the following:
- rmm::mr::get_current_device_resource() is the default for all allocations, as before. It is used for allocations with unlimited lifetime, e.g. those returned to the user.
- raft::get_workspace_resource() is for temporary allocations and is forced to have a fixed size, as before. However, it becomes smaller and should be used only for allocations that do not scale with the problem size. It defaults to a thin layer on top of the current_device_resource.
- raft::get_large_workspace_resource() (new) is for temporary allocations that can scale with the problem size. Unlike workspace_resource, its size is not fixed. By default, it points to the current_device_resource, but the user can set it to something backed by host memory (e.g. managed memory) to avoid OOM exceptions when there's not enough device memory left.
Problem
We have a list of issues/preferences/requirements, some of which contradict others:
- rmm::mr::pool_memory_resource for performance reasons (to avoid lots of cudaMalloc calls in the loops)
- workspace_resource, because some of them scale with the problem size and would inevitably fail with OOM at some point.
Solution
I propose to split the workspace memory into two:
Notes: