Fix segfaults when using CUDA #1397

Open · wants to merge 1 commit into master


@aswild commented Dec 9, 2024

Summary: switch from using xxd to bin2c when generating the .ptx.c files so that the PTX data can be null-terminated.

With newer drivers or CUDA versions, VMAF now segfaults whenever it tries to do anything on the GPU. The core dumps show that the crash happens somewhere inside the `cuModuleLoadData` calls in `init_fex_cuda`.

The documentation for `cuModuleLoadData` states that its `image` argument can be "obtained by mapping a cubin or PTX or fatbin file, [or] passing a cubin or PTX or fatbin file as a NULL-terminated text string...". VMAF appears to use the latter approach: it embeds the PTX text files in C source as byte arrays generated by xxd, but nothing asks xxd for a terminator, so the data is not null-terminated.

I'm a CUDA noob and don't know how this ever worked with older driver versions, but when I edited the `.ptx.c` files by hand to append a 0x00 byte to the end of each array, it worked!

Switch from xxd to bin2c (distributed with the cuda-nvcc package), which supports a `--padd` option that appends a null byte to the PTX data and eliminates the segfaults. The arrays are renamed slightly to drop the `src_` prefix, since bin2c doesn't name the output array automatically.

This should resolve #1357

@nilfm99 requested a review from @kylophone on December 9, 2024 at 19:05
@nilfm99 (Collaborator) commented Dec 9, 2024

Thanks for the contribution! @kylophone is this something you could easily test?

Successfully merging this pull request may close these issues:
libvmaf cuda - init_fex_cuda: Assertion `0' failed.