[Ubuntu24.04] Install the driver in a single step #156
base: main
Conversation
Force-pushed from ece1c16 to e6bc8a8
Force-pushed from e6bc8a8 to 41d0861
Force-pushed from 41d0861 to 2a1f2b2
fi
fi
fi

export IGNORE_CC_MISMATCH=1
make -s -j ${MAX_THREADS} SYSSRC=/lib/modules/${KERNEL_VERSION}/build nv-linux.o nv-modeset-linux.o > /dev/null
By removing this code, we are no longer using the MAX_THREADS variable, which was introduced here: f18cc79

The runfile installer does expose a similar option that controls the concurrency level when building the kernel modules. I am wondering if we can convert MAX_THREADS (if set) to the relevant option and add it to the list of install_args used when installing the run file below (a sketch follows the quoted option):
-j CONCURRENCY-LEVEL, --concurrency-level=CONCURRENCY-LEVEL
Set the concurrency level for operations such as building the kernel module which may be parallelized on SMP systems. By default, this will be set to the number of detected CPUs, or to '1', if nvidia-installer fails to detect the number of CPUs. Systems with a large number of CPUs will have the default concurrency level limited to 32; setting a higher level on the command line will override this limit.
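A minimal sketch of that conversion, assuming the install_args list and a runfile name as used later in the script (both illustrative here, not verbatim from this PR):

# Translate MAX_THREADS, if set, into nvidia-installer's documented
# --concurrency-level option; otherwise let the installer use its default.
install_args=()
if [ -n "${MAX_THREADS}" ]; then
    install_args+=("--concurrency-level=${MAX_THREADS}")
fi
sh ./NVIDIA-Linux-${DRIVER_ARCH}-${DRIVER_VERSION}.run --silent "${install_args[@]}"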
Nice catch, thanks!
This has been fixed. Please check now.
cd /usr/src/nvidia-${DRIVER_VERSION}
if [ -d /lib/modules/${KERNEL_VERSION}/kernel/drivers/video ]; then
    rm -rf /lib/modules/${KERNEL_VERSION}/kernel/drivers/video
else
    rm -rf /lib/modules/${KERNEL_VERSION}/video
fi
Question -- is it safe to remove this? I am unfamiliar with why this logic was required.
I had to remove this. It wouldn't work otherwise.
ubuntu24.04/nvidia-driver
--x-module-path=/tmp/null \
--x-library-path=/tmp/null \
--x-sysconfig-path=/tmp/null \
-m=${KERNEL_TYPE} \
Out of scope for this PR -- looking at this again, it is probably best not to explicitly set this option unless the user sets KERNEL_TYPE (or whatever API is exposed by the operator). Especially now that we are leveraging the nvidia-installer to perform the compilation and installation, we can depend on the defaults the nvidia-installer applies for KERNEL_TYPE -- for example, on newer driver versions it will automatically choose to install the open modules on compatible systems.
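A sketch of the conditional approach described here, reusing the same hypothetical install_args list:

# Only pass the module type explicitly when the user requested one; when
# KERNEL_TYPE is unset, nvidia-installer applies its own default (e.g. it
# automatically selects the open modules on compatible systems).
if [ -n "${KERNEL_TYPE}" ]; then
    install_args+=("-m=${KERNEL_TYPE}")
fi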
Force-pushed from 2a1f2b2 to b4041f6
Signed-off-by: Tariq Ibrahim <[email protected]>
Force-pushed from b4041f6 to 38a6df2
This change condenses the two-step driver install into a single step.

Currently, the driver image installs the userspace components and the kernel modules separately. This was done to allow signing the kernel modules with a custom private key, relinking the signed kernel modules, and rebuilding the kernel modules should the underlying host kernel be updated.

As none of these workflows apply today, we simplify the driver installation and allow for defining an API in gpu-operator where users can easily pass custom runfile installation arguments.

I have tested driver upgrades and updates (with both the open and closed RM kernel modules) with these changes, and the driver container has been running with no issues.
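For illustration, the single-step install described above might look like the sketch below; the runfile name and argument list are assumptions, while --silent and --kernel-source-path are documented nvidia-installer options:

# Build and install the kernel modules and userspace components in a single
# nvidia-installer invocation, forwarding any user-supplied arguments.
sh ./NVIDIA-Linux-${DRIVER_ARCH}-${DRIVER_VERSION}.run \
    --silent \
    --kernel-source-path=/lib/modules/${KERNEL_VERSION}/build \
    "${install_args[@]}"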