Make Go bridge work on Windows. #47
Fails with:
2024/07/26 10:29:48 WIN32 Error #0xc1
goroutine 1 [running]:
Upon further debugging I found out that the error comes from the I'm using |
0xc1 means ERROR_BAD_EXE_FORMAT. For reference, I can compile and load the library; I simply can't exercise the CUDA code, but I do get past LoadLibraryA.
|
And what's your |
Same error
Sending you the headers by mail |
This is the culprit. It should read |
Oops. Changing this made it proceed further. Now I'm getting:
Debugging with gdb it reports |
This is strange. It attempts to copy the dll after a successful self-consistency check... So "FAIL" ought to mean that CUDA worked(*), it simply failed to copy the dll... Is there .dll left after attempt to troubleshoot with
(*) Though it doesn't quite align with the fact that you spot drop_error_message. For reference, 0xc0000409 means STATUS_STACK_BUFFER_OVERRUN in Windows, but Go might be assigning its own meaning to it. |
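For readers following along, the two numeric codes quoted so far can be told apart with a small lookup. This is only a sketch: `describeCode` and `knownCodes` are hypothetical names, the table holds just the two values mentioned in this thread, and the "top two bits set" check is a rough heuristic for spotting an NTSTATUS error value rather than a Win32 system error code.

```go
package main

import "fmt"

// knownCodes lists only the two codes named in this thread.
// It is an illustration, not a complete table.
var knownCodes = map[uint32]string{
	0x000000c1: "ERROR_BAD_EXE_FORMAT",        // Win32 system error code 193
	0xc0000409: "STATUS_STACK_BUFFER_OVERRUN", // NTSTATUS value
}

// describeCode is a hypothetical helper that names a code and guesses its family.
func describeCode(code uint32) string {
	name, ok := knownCodes[code]
	if !ok {
		name = "unknown"
	}
	kind := "Win32 error"
	if code>>30 == 3 { // both top bits set: NTSTATUS with error severity
		kind = "NTSTATUS"
	}
	return fmt.Sprintf("0x%x: %s (%s)", code, name, kind)
}

func main() {
	fmt.Println(describeCode(0xc1))
	fmt.Println(describeCode(0xc0000409))
}
```

The distinction matters when reading the logs: the first code is reported by the loader, while the second is a process-level status.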
Just in case for reference, I can execute |
Every single run's /Temp/ folder is different, so there should be no leftover .dll. However, after each run it does create a new ntt_api.dll in /ntt-cuda/go. Running it with the antivirus off produces the same error. |
So it means that copy succeeds. What is failing then? I mean copy appears to succeed and it's performed after successful data comparison... So what's up with FAIL? ... Does |
Yes, this one passes. |
Try the following. Add |
It fails at the first |
Hmm... According to the documentation, cudaGetDevice can return a value other than cudaSuccess only from previous asynchronous calls, but the first cudaGetDevice is synchronous... Note that the

On ROCm it's different: all calls succeed, but instead of the test failing, the test application hangs so hard that the debugger can't show anything. This is after copying the .dll.

This appears to be due to the mere presence of try-catch in Dfree. I mean, if I replace Dfree-s with direct calls to cudaFreeAsync, the test finishes successfully. "Mere presence" means that I tried try-catch{ignore}, but to no avail. In other words, it looks like no-throwing-destructors has to be taken literally in Windows DLL context...
|
Both calls fail with error code 4.
Works now! |
Mere presence of try-catch in destructor was proven to be problematic in Windows DLL context.
15d6648 to 43889a3
@sandsentinel, could you test this on Windows? As mentioned, annoyingly enough one needs three compilers on the %PATH%, nvcc, cl and mingw[!] gcc, by the time go test is executed. As for the gcc, consider tdm-gcc. I'm not using it myself, I'm using one that is available with msys2, but tdm should work.
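Before running go test, it may help to confirm that all three compilers resolve on the %PATH%. The sketch below uses the standard exec.LookPath; `missingTools` is a hypothetical helper name, and the check only confirms that executables named nvcc, cl and gcc can be found, not that the gcc it finds is a MinGW build.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools returns the names that exec.LookPath cannot resolve
// on the current PATH. Hypothetical helper, for illustration only.
func missingTools(names []string) []string {
	var missing []string
	for _, name := range names {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// The three compilers named in the comment above.
	required := []string{"nvcc", "cl", "gcc"}
	if missing := missingTools(required); len(missing) > 0 {
		fmt.Println("not found on PATH:", missing)
		return
	}
	fmt.Println("nvcc, cl and gcc are all on PATH")
}
```

On Windows, exec.LookPath honours %PATHEXT%, so the bare name cl should resolve to cl.exe.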