Make Go bridge work on Windows. #47

Merged
dot-asm merged 2 commits into supranational:main from go-bridge-for-windows on Jul 29, 2024

Collaborator

@dot-asm dot-asm commented Jul 25, 2024

@sandsentinel, could you test this on Windows? As mentioned, annoyingly enough one needs three compilers on the %PATH%, nvcc, cl and mingw[!] gcc, by the time go test is executed. As for the gcc, consider tdm-gcc. I'm not using it myself, I'm using one that is available with msys2, but tdm should work.
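As a rough illustration of the three-compiler requirement, the following Go sketch checks that nvcc, cl and gcc all resolve on the PATH before go test is run. The helper names missingTools and onPath are hypothetical and not part of sppark. It only checks that some gcc is found, not that it is a mingw build.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// missingTools returns the names for which found reports false.
// (Hypothetical helper for illustration; not part of sppark.)
func missingTools(names []string, found func(string) bool) []string {
	var missing []string
	for _, name := range names {
		if !found(name) {
			missing = append(missing, name)
		}
	}
	return missing
}

// onPath reports whether an executable with the given name resolves on PATH.
func onPath(name string) bool {
	_, err := exec.LookPath(name)
	return err == nil
}

func main() {
	tools := []string{"nvcc", "cl", "gcc"}
	if missing := missingTools(tools, onPath); len(missing) > 0 {
		fmt.Println("not found on PATH:", missing)
		os.Exit(1)
	}
	fmt.Println("nvcc, cl and gcc all resolve on PATH")
}
```

The lookup is passed in as a function so the check can be exercised without the actual compilers installed.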

@sandsentinel
Collaborator

Fails with:

2024/07/26 10:29:48 WIN32 Error #0xc1
panic: WIN32 Error #0xc1

goroutine 1 [running]:
log.Panic({0x18afe44, 0x1, 0x1})
C:/Program Files (x86)/Go/src/log/log.go:432 +0x87
github.com/supranational/sppark/go.Load({0x75cd6d, 0x12}, {0x18afe6c, 0x2, 0x2})
C:/Users/Can/Desktop/sppark/go/sppark.go:179 +0x612
poc_cu/go.init.0()
C:/Users/Can/Desktop/sppark/poc/ntt-cuda/go/goldilocks.go:25 +0x85
exit status 2
FAIL poc_cu/go 5.290s

Upon further debugging I found out that the error comes from the LoadLibraryA call.

I'm using tdm-gcc and all 3 compilers are on %PATH%.

@dot-asm
Collaborator Author

dot-asm commented Jul 26, 2024

0xc1 means ERROR_BAD_EXE_FORMAT. Just in case for reference, I can compile and load the library, I simply can't exercise the CUDA code, but I do pass LoadLibraryA.

  • run go test -c and execute the emitted poc_cu.test.exe, the goal is to generate defective poc.dll in the current directory and examine it with dumpbin /headers;
  • what's your cl version (I mean execute cl without arguments and copy the first line of the output);
  • temporarily switch off antivirus prior to executing go test;

@dot-asm
Collaborator Author

dot-asm commented Jul 26, 2024

And what's your go version?

@sandsentinel
Collaborator

> what's your cl version

Microsoft (R) C/C++ Optimizing Compiler Version 19.40.33811 for x64

> And what's your go version?

go version go1.22.5 windows/386

> temporarily switch off antivirus prior to executing go test

Same error

> and examine it with dumpbin /headers

Sending you the headers by mail

@dot-asm
Collaborator Author

dot-asm commented Jul 26, 2024

> > And what's your go version?
>
> go version go1.22.5 windows/386

This is the culprit. It should read windows/amd64. Well, it should be possible to instruct it to target amd64, but it's better to have it native.
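For illustration, a mismatch of this kind can be detected before LoadLibraryA is ever called, by comparing the DLL's PE machine type (roughly what the machine line of dumpbin /headers reports) with the architecture of the Go binary. This is a minimal sketch using the standard debug/pe package. The helpers machineToGOARCH and checkDLL are hypothetical names, not part of sppark, and the DLL path is assumed to come from the command line.

```go
package main

import (
	"debug/pe"
	"fmt"
	"os"
	"runtime"
)

// machineToGOARCH maps a PE machine type to the GOARCH able to load it.
// It returns "" for machine types this sketch does not know about.
func machineToGOARCH(machine uint16) string {
	switch machine {
	case pe.IMAGE_FILE_MACHINE_AMD64:
		return "amd64"
	case pe.IMAGE_FILE_MACHINE_I386:
		return "386"
	case pe.IMAGE_FILE_MACHINE_ARM64:
		return "arm64"
	}
	return ""
}

// checkDLL returns an error if the DLL at path cannot be opened as a PE file
// or if it targets an architecture other than goarch.
func checkDLL(path, goarch string) error {
	f, err := pe.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if got := machineToGOARCH(f.FileHeader.Machine); got != goarch {
		return fmt.Errorf("%s targets %q (machine 0x%x), but the Go binary is %q",
			path, got, f.FileHeader.Machine, goarch)
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: checkdll <path-to-dll>")
		return
	}
	if err := checkDLL(os.Args[1], runtime.GOARCH); err != nil {
		fmt.Println("mismatch:", err)
		os.Exit(1)
	}
	fmt.Println("architectures match:", runtime.GOARCH)
}
```

Built with a windows/386 toolchain and pointed at an x64 DLL, this would report the mismatch instead of surfacing it later as error 0xc1.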

@sandsentinel
Collaborator

> This is the culprit. It should read windows/amd64

Oops. Changing this made it proceed further. Now I'm getting:

2024/07/26 12:50:57 copying C:\Users\Can\AppData\Local\Temp\go-build1996923370\b001\ntt_api.dll
PASS
exit status 0xc0000409
FAIL    poc_cu/go       5.658s

Debugging with gdb it reports drop_error_message as the top item on stack.

@dot-asm
Collaborator Author

dot-asm commented Jul 26, 2024

> Now I'm getting:

This is strange. It attempts to copy the dll after a successful self-consistency check... So "FAIL" ought to mean that CUDA worked(*), it simply failed to copy the dll... Is there a .dll left over after the attempt to troubleshoot with go test -c? If so, remove it and re-run go test. If it still fails, try suspending the antivirus...

(*) Though it doesn't quite align with the fact that you spot drop_error_message. For reference, 0xc0000409 means STATUS_STACK_BUFFER_OVERRUN in Windows, but it might be that Go is assigning its own meaning to it.
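The two codes seen so far come from different namespaces: 0xc1 is a Win32 error code (as returned by GetLastError after LoadLibraryA), while 0xc0000409 is an NTSTATUS value, where the two top bits set mark error severity. A small sketch to keep them apart follows. The lookup covers only the codes named in this thread, and describe and hasNTStatusErrorSeverity are hypothetical helper names.

```go
package main

import "fmt"

// describe returns the symbolic name for the codes discussed in this thread.
// It is not a general Windows error table.
func describe(code uint32) string {
	switch code {
	case 0xc1:
		return "ERROR_BAD_EXE_FORMAT"
	case 0xc0000409:
		return "STATUS_STACK_BUFFER_OVERRUN"
	}
	return fmt.Sprintf("unknown (0x%x)", code)
}

// hasNTStatusErrorSeverity reports whether the two most significant bits are
// both set, which is how NTSTATUS encodes "error" severity.
func hasNTStatusErrorSeverity(code uint32) bool {
	return code>>30 == 3
}

func main() {
	for _, code := range []uint32{0xc1, 0xc0000409} {
		fmt.Printf("0x%x: %s (NTSTATUS error severity: %v)\n",
			code, describe(code), hasNTStatusErrorSeverity(code))
	}
}
```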

@dot-asm
Collaborator Author

dot-asm commented Jul 26, 2024

Just in case, for reference: when I execute go test in poc/go, I get "panic: CUDA driver version is insufficient for CUDA runtime version". This vouches for the ability to transfer the error message to Go, which entails drop_error_message on the C side.

@sandsentinel
Collaborator

Every single run's /Temp/ folder is different, so there should be no leftover .dll. However after each run it does create a new ntt_api.dll in /ntt-cuda/go.

Running it with antivirus off produces the same error.

@dot-asm
Collaborator Author

dot-asm commented Jul 26, 2024

> Every single run's /Temp/ folder is different, so there should be no leftover .dll. However after each run it does create a new ntt_api.dll in /ntt-cuda/go.

So it means that copy succeeds. What is failing then? I mean copy appears to succeed and it's performed after successful data comparison... So what's up with FAIL? ... Does go test work in poc/go? The "hello from GPU" thing...

@sandsentinel
Copy link
Collaborator

> Does go test work in poc/go?

Yes, this one passes.

@dot-asm
Collaborator Author

dot-asm commented Jul 27, 2024

Try the following. Add return; at the beginning of ~NTTParameters() in ntt/parameters.cuh. If go test works, move the return; downward until it stops working...

@sandsentinel
Collaborator

It fails at the first Dfree, but cudaGetDevice returns an error with code 4 which means the CUDA driver has already been unloaded.

@dot-asm
Collaborator Author

dot-asm commented Jul 29, 2024

Hmm... According to the documentation cudaGetDevice can return a value other than cudaSuccess only from previous asynchronous calls, but the first cudaGetDevice is synchronous... Note that the ~NTTParameters() is called twice, hence I wonder if you could confirm that it's actually the first cudaGetDevice that fails. This is to figure out if documentation is correct...

On ROCm it's different, all calls are succeeding, but instead of test failing, the test application hangs so hard that the debugger couldn't show anything. This is after copying the .dll. This appears to be due to the mere presence of try-catch in Dfree. I mean if I replace Dfree-s with direct calls to cudaFreeAsync, the test finishes successfully. "Mere presence" means that I tried to try-catch{ignore}, but to no avail. In other words it looks like no-throwing-destructors has to be taken literally in Windows DLL context...

git pull and try again :-)

@sandsentinel
Collaborator

> hence I wonder if you could confirm that it's actually the first cudaGetDevice that fails.

Both calls fail with error code 4.

> git pull and try again :-)

Works now!

dot-asm added 2 commits July 29, 2024 15:15
Mere presence of try-catch in destructor was proven to be problematic
in Windows DLL context.
@dot-asm dot-asm force-pushed the go-bridge-for-windows branch from 15d6648 to 43889a3 Compare July 29, 2024 13:19
@dot-asm dot-asm merged commit 874fb50 into supranational:main Jul 29, 2024
1 check passed