Enable code for dynamic parallelism #96

thedodd · 2022-11-15T01:42:03Z

Closes Dynamic Parallelism | implementation strategy #94

thedodd · 2022-11-20T05:26:28Z

So, interestingly, I'm running into an issue where the generated code cannot be loaded by Module::from_ptx. It returns the error "a PTX JIT compilation failed".

Some background on current testing:

- I've put together a reference C++ program which uses dynamic parallelism (ultra simple).
- I can execute the reference program and all is good: expected output and behavior.
- I also have a reference Rust program which attempts to use this updated code for dynamic parallelism, with the exact same functionality and data types (fixed-size types in C++).
- When I compare the PTX between the two programs, it is nearly identical.
- C++ program runs, expected behavior and output.

Now, what is quite strange is that if I copy the PTX from the working C++ program over to the Rust program (disabling PTX gen in the Rust program to ensure the C++ PTX is not overwritten), the Rust program aborts with that same error: "a PTX JIT compilation failed".

According to ptxas, both PTX files are valid and compile to object code (ptxas -c ...).
This issue is triggered even from attempting to construct a stream device side.
- Note that in my tests to narrow this down, I've removed stream construction and I am just passing in a null stream to the cuda launch call on the device.
- It is just interesting that the module loader does not like the stream or the launch.

So, I am wondering:

Is there something intrinsically wrong with attempting to call cuda::cuModuleLoadDataEx when the PTX is using dynamic parallelism?
Is there a way we can bypass this?
This is where my experimentation is currently at.

thedodd · 2022-11-20T05:29:25Z

Perhaps we need to manually construct a linker, link the PTX with cudadevrt.lib, and then compile the result to a cubin. Will try that.

thedodd · 2022-11-20T05:55:56Z

Yeah, that was it. We need to create a linker, add the PTX, and add libcudadevrt (right now I have the path hard-coded, but I need to create a dynamic search mechanism, as I don't think the CUDA linker will locate it on its own; we'll see).

From there, I was able to successfully execute the PTX from the sample C++ app of mine. The generated Rust PTX has an invalid memory access taking place, and it looks like it is coming from how the buffer is being populated. This is still a step forward, as the code gen is much easier to fix. I at least know what I'm dealing with, instead of some opaque "JIT compilation failed" error.

thedodd · 2022-11-20T06:02:26Z

Yea, that did it. Code gen is far from optimal for loading the param buffer. But it works, and I am able to successfully use dynamic parallelism from the Rust generated PTX end to end. Expected output and behavior.

Macro codegen for populating the buffer can be optimized further, as the generated PTX is not optimal. I'll focus on that later.


apriori · 2022-12-20T00:39:25Z

crates/cust/src/link.rs

@@ -114,6 +114,28 @@ impl Linker {
        }
    }

+    /// Link device runtime lib.
+    pub fn add_libcudadevrt(&mut self) -> CudaResult<()> {
+        let mut bytes = std::fs::read("/usr/local/cuda-11/lib64/libcudadevrt.a")


When this PR is finalized, this should maybe be replaced by searching CUDA_PATH? Not sure what is the proper way.
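
One possible shape for such a lookup, as a minimal sketch rather than what this PR implements: build a list of candidate paths from `CUDA_PATH`, fall back to a conventional install prefix, and take the first file that exists. The function names `devrt_candidates` and `find_libcudadevrt` are hypothetical, and the `/usr/local/cuda` fallback and the two directory layouts (`lib64/libcudadevrt.a` on Linux, `lib/x64/cudadevrt.lib` on Windows) are assumptions about typical toolkit installs, not something the CUDA linker guarantees.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: ordered candidate locations for the device runtime
/// library. `cuda_path` is the value of the CUDA_PATH variable, if any.
fn devrt_candidates(cuda_path: Option<&str>) -> Vec<PathBuf> {
    let mut roots: Vec<PathBuf> = Vec::new();
    if let Some(p) = cuda_path {
        // Ignore an empty CUDA_PATH rather than probing relative paths.
        if !p.is_empty() {
            roots.push(PathBuf::from(p));
        }
    }
    // Assumption: conventional Linux install prefix used as a fallback.
    roots.push(PathBuf::from("/usr/local/cuda"));

    let mut candidates = Vec::new();
    for root in roots {
        // Assumed Linux layout.
        candidates.push(root.join("lib64").join("libcudadevrt.a"));
        // Assumed Windows layout.
        candidates.push(root.join("lib").join("x64").join("cudadevrt.lib"));
    }
    candidates
}

/// Hypothetical helper: first candidate that exists on disk, if any.
fn find_libcudadevrt() -> Option<PathBuf> {
    let cuda_path = env::var("CUDA_PATH").ok();
    devrt_candidates(cuda_path.as_deref())
        .into_iter()
        .find(|p| p.is_file())
}
```

`add_libcudadevrt` could then read the file returned by `find_libcudadevrt()` instead of the hard-coded path, and return an error when no candidate exists. Keeping the candidate list separate from the filesystem probe makes the search order easy to test without a CUDA install.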

Enable code for dynamic parallelism

3fd5949

WIP: working solution, but a lot of cruft. Clean up and unwind the unneeded stuff

7d622ca


WIP: pop this and continue experimentation

95f1066
