Skip to content

Commit

Permalink
[JNI] remove rmm argument to set rw access for fabric handles (#17553)
Browse files Browse the repository at this point in the history
This is a follow up from #17526, where fabric handles can be enabled from RMM. That PR also sets the memory access protection flag (`cudaMemPoolSetAccess`), but I have learned that this second flag is not needed from the owner device. In fact, it causes confusion because the owning device fails to call this function with some of the flags (access none).  `cudaMemPoolSetAccess` is meant to only be called from peer processes that have imported the pool's handle. In our case, UCX handles this from the peer's side and it does not need to be anywhere in RMM or cuDF.

Sorry for the noise. I'd like to get this fix in, and then I am going to fix RMM by removing that API.

Authors:
  - Alessandro Bellina (https://github.com/abellina)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Nghia Truong (https://github.com/ttnghia)
  - Jason Lowe (https://github.com/jlowe)

URL: #17553
  • Loading branch information
abellina authored Dec 9, 2024
1 parent f595592 commit 5b412dc
Showing 1 changed file with 5 additions and 11 deletions.
16 changes: 5 additions & 11 deletions java/src/main/native/src/RmmJni.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -778,17 +778,11 @@ JNIEXPORT jlong JNICALL Java_ai_rapids_cudf_Rmm_newCudaAsyncMemoryResource(
try {
cudf::jni::auto_set_device(env);

// When we are using fabric, we need to set the memory access to be
// read_write, in order for peer GPUs to have access to this memory.
// Otherwise, choose default parameters (optional set to nullopt).
auto [handle_type, prot_flag] =
fabric
? std::pair{std::optional{
rmm::mr::cuda_async_memory_resource::allocation_handle_type::fabric},
std::optional{rmm::mr::cuda_async_memory_resource::access_flags::read_write}}
: std::pair{std::nullopt, std::nullopt};

auto ret = new rmm::mr::cuda_async_memory_resource(init, release, handle_type, prot_flag);
auto handle_type =
fabric ? std::optional{rmm::mr::cuda_async_memory_resource::allocation_handle_type::fabric}
: std::nullopt;

auto ret = new rmm::mr::cuda_async_memory_resource(init, release, handle_type);

return reinterpret_cast<jlong>(ret);
}
Expand Down

0 comments on commit 5b412dc

Please sign in to comment.