cat > add.cu << "EOF"
#include <iostream>
#include <math.h>

// Kernel function to add the elements of two arrays
__global__ void add(float* x, float* y, float* z) {
  int i = threadIdx.x;
  z[i] = x[i] + y[i];
}

int main(void) {
  int N = 1<<8;
  float *x, *y, *z;

  // Allocate Unified Memory – accessible from CPU or GPU
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));
  cudaMallocManaged(&z, N*sizeof(float));

  // initialize x, y and z arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
    z[i] = 0.0f;
  }

  // Run kernel on 256 elements on the GPU
  add<<<1, N>>>(x, y, z);

  // Wait for GPU to finish before accessing on host
  cudaDeviceSynchronize();

  // all values should be 3.0f
  for (int i = 0; i < N; i++)
    std::cout << i << ":" << z[i] << std::endl;

  // Free memory
  cudaFree(x);
  cudaFree(y);
  cudaFree(z);

  return 0;
}
EOF
Hello,
I decided to start CUDA programming to better understand build issues.
I found an example here (https://developer.nvidia.com/blog/even-easier-introduction-cuda/) and modified it following the kernels section of the CUDA C++ Programming Guide (https://docs.nvidia.com/cuda/archive/11.4.0/cuda-c-programming-guide/index.html#kernels); my modified version is the add.cu listing above.
I build both the original example and my modified version with

${CUDACXX} add.cu ${CUDAFLAGS} -ccbin=${CUDAHOSTCXX} -o add

from the SDK environment, then copy the binary to an Orin NX target and run it there. Neither gives correct results: the original example prints

Max error: 1

and my modified version prints 0 for every element of z.

Why don't these give the expected results? Thank you in advance.
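For reference, the blog example initializes x to 1.0f and y to 2.0f, adds x into y on the GPU, and prints the largest deviation of y from 3.0f. So "Max error: 1" means y kept its initial value of 2.0f, i.e. the kernel never wrote to it, and an all-zero z in the modified version points the same way. A minimal sketch of the same host-side check applied to add.cu, assuming the kernel and sizes above; `max_error` is a hypothetical helper name:

```cuda
#include <cmath>
#include <iostream>
#include <cuda_runtime.h>

// Same kernel as add.cu: one thread per element, single block.
__global__ void add(const float* x, const float* y, float* z) {
  int i = threadIdx.x;
  z[i] = x[i] + y[i];
}

// Hypothetical helper: largest |a[i] - expected| over the array (host only).
float max_error(const float* a, int n, float expected) {
  float err = 0.0f;
  for (int i = 0; i < n; i++)
    err = std::fmax(err, std::fabs(a[i] - expected));
  return err;
}

int main() {
  const int N = 1 << 8;  // 256 threads, within the 1024-threads-per-block limit
  float *x, *y, *z;
  cudaMallocManaged(&x, N * sizeof(float));
  cudaMallocManaged(&y, N * sizeof(float));
  cudaMallocManaged(&z, N * sizeof(float));
  for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; z[i] = 0.0f; }

  add<<<1, N>>>(x, y, z);
  cudaDeviceSynchronize();

  // 0 if the kernel ran; 3 if z was left at its initial 0.0f.
  std::cout << "Max error: " << max_error(z, N, 3.0f) << std::endl;

  cudaFree(x);
  cudaFree(y);
  cudaFree(z);
  return 0;
}
```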
Edit:
It seems that the __global__ kernel is never executed. I suspect something is missing from the build command line. If I use the error-checking macro from https://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api#answer-14038590 and add a check after the CUDA kernel call, I obtain at runtime:
EDIT: I found my mistake: an incompatibility between the CUDA architecture of the SDK and that of the target image.
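Both diagnostics from this thread, checking for errors after the launch and checking the architecture, can be combined in one small program. This is a sketch, not the fix that was applied: the `gpuAssert`/`gpuErrchk` pair is modeled on the linked Stack Overflow answer (the `bool` return value is an addition here), `sm_version` and the file name `check_add.cu` are hypothetical, and the build line assumes an Orin-class GPU (compute capability 8.7) and that `${CUDACXX}` is nvcc. `cudaFuncGetAttributes` is expected to return an error when the executable holds no kernel image usable on the device, which surfaces the mismatch before the launch.

```cuda
// Build for Orin (compute capability 8.7), e.g.:
//   ${CUDACXX} check_add.cu ${CUDAFLAGS} -gencode arch=compute_87,code=sm_87 -ccbin=${CUDAHOSTCXX} -o check_add
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Modeled on the linked Stack Overflow answer: report the error with
// file/line and optionally abort. Returns false on error when not aborting.
inline bool gpuAssert(cudaError_t code, const char* file, int line,
                      bool abort = true) {
  if (code != cudaSuccess) {
    std::fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code),
                 file, line);
    if (abort) std::exit(code);
    return false;
  }
  return true;
}
#define gpuErrchk(ans) gpuAssert((ans), __FILE__, __LINE__)

// Hypothetical helper: encode a compute capability as major * 10 + minor,
// the same encoding cudaFuncAttributes::binaryVersion uses.
int sm_version(int major, int minor) { return major * 10 + minor; }

__global__ void add(const float* x, const float* y, float* z) {
  int i = threadIdx.x;
  z[i] = x[i] + y[i];
}

int main() {
  cudaDeviceProp prop;
  gpuErrchk(cudaGetDeviceProperties(&prop, 0));
  std::printf("%s: sm_%d\n", prop.name, sm_version(prop.major, prop.minor));

  // Expected to fail if no kernel image in this binary fits the device.
  cudaFuncAttributes attr;
  gpuErrchk(cudaFuncGetAttributes(&attr, add));
  std::printf("add: binaryVersion %d, ptxVersion %d\n", attr.binaryVersion,
              attr.ptxVersion);

  const int N = 1 << 8;
  float *x, *y, *z;
  gpuErrchk(cudaMallocManaged(&x, N * sizeof(float)));
  gpuErrchk(cudaMallocManaged(&y, N * sizeof(float)));
  gpuErrchk(cudaMallocManaged(&z, N * sizeof(float)));
  for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; z[i] = 0.0f; }

  add<<<1, N>>>(x, y, z);
  gpuErrchk(cudaPeekAtLastError());    // errors reported at launch time
  gpuErrchk(cudaDeviceSynchronize());  // errors raised while the kernel ran
  std::printf("z[0] = %f\n", z[0]);    // 3.000000 once the kernel runs

  gpuErrchk(cudaFree(x));
  gpuErrchk(cudaFree(y));
  gpuErrchk(cudaFree(z));
  return 0;
}
```

A kernel launch itself returns no status, which is why a silent failure like the one above only shows up through an explicit check such as `cudaPeekAtLastError()` followed by `cudaDeviceSynchronize()`.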