cuda cross-compilation with sdk #1547

aurelien-enchanted-tools · 2024-05-06T16:31:48Z

Hello,

I decided to start CUDA programming to better understand build issues.
I found an example here (https://developer.nvidia.com/blog/even-easier-introduction-cuda/) and wrote a modified version of it, based on the kernels section of the programming guide (https://docs.nvidia.com/cuda/archive/11.4.0/cuda-c-programming-guide/index.html#kernels):

cat > add.cu << "EOF"
#include <iostream>
#include <math.h>
// Kernel function to add the elements of two arrays
__global__ void add(float* x, float* y, float* z) {
    int i = threadIdx.x;
    z[i] = x[i] + y[i];
}

int main(void) {
    int N = 1<<8;
    float *x, *y, *z;

    // Allocate Unified Memory – accessible from CPU or GPU
    cudaMallocManaged(&x, N*sizeof(float));
    cudaMallocManaged(&y, N*sizeof(float));
    cudaMallocManaged(&z, N*sizeof(float));

    // Initialize the x, y and z arrays on the host
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
        z[i] = 0.0f;
    }

    // Run kernel on 256 elements on the GPU
    add<<<1, N>>>(x, y, z);

    // Wait for GPU to finish before accessing on host
    cudaDeviceSynchronize();

    // all values should be 3.0f
    for (int i = 0; i < N; i++)
        std::cout << i << ":" << z[i] << std::endl;

    // Free memory
    cudaFree(x);
    cudaFree(y);
    cudaFree(z);

    return 0;
}
EOF

I build both the original example and my modified version from an SDK environment with ${CUDACXX} add.cu ${CUDAFLAGS} -ccbin=${CUDAHOSTCXX} -o add, then copy the binary to an Orin NX target and run it there.

But neither gives correct results. The original example prints Max error: 1, and my modified version prints zero for every element of z.

Why do these not give the expected results? Thank you in advance.

Edit:

It seems that the __global__ function is never executed. I suspect something is missing from the build command line.

If I use the error-checking macro from https://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api#answer-14038590:

#include <cstdio>   // fprintf
#include <cstdlib>  // exit

// Wrap a CUDA runtime call and report any error with its file and line
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess)
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

and add a check right after the CUDA kernel launch:

    add<<<1, N>>>(x, y, z);
    gpuErrchk( cudaPeekAtLastError() );

I obtain at runtime:

GPUassert: no kernel image is available for execution on the device add.cu 36

EDIT: I found my mistake: a mismatch between the CUDA architecture of the SDK and that of the image.
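
For anyone hitting the same "no kernel image is available" error, a quick way to see which architecture the target expects is to ask the CUDA runtime for the device's compute capability. The following is only a minimal sketch, not something from my actual build: archFlag is a hypothetical helper I made up for illustration, and whether the SDK's ${CUDAFLAGS} carries a matching architecture option is something to check in your own environment. The program launches no kernel, so it should run even when the kernels in add.cu do not.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): format the nvcc -gencode option
// that corresponds to a given compute capability, e.g. 8.7 -> compute_87/sm_87.
static std::string archFlag(int major, int minor) {
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print the compute capability of every visible device.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
            return 1;
        }
        std::printf("device %d: %s, compute capability %d.%d\n",
                    dev, prop.name, prop.major, prop.minor);
        std::printf("  matching nvcc option: %s\n",
                    archFlag(prop.major, prop.minor).c_str());
    }
    return 0;
}
```

Build it the same way as add.cu and run it on the target. If the compute capability it reports (Orin-class modules are documented as 8.7) does not appear among the architectures in the build flags, the binary contains no kernel image for that GPU, which would explain the error above.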
