first pass of gpu smoke test #281

RobJY · 2021-11-05T21:28:25Z

Script collects version of driver, toolkit and tensorflow in a way that doesn't require communication with the card

RobJY · 2021-11-05T21:39:06Z

Starting this PR to start the discussion about another attempt at building a script for a build level smoke test for gpu functionality. The committed script test-config-cuda.sh collects the driver, toolkit and tensorflow versions in a way that shouldn't rely on communication with the gpu card, so should be available during the build.

We could then have a JSON file of tested version combinations that this script would compare against and raise a warning if the current combination isn't on the list. This would again be run by config_R_cuda.sh like our last attempt did.

It adds the maintenance burden of keeping up the JSON file, but may help to catch issues during the build.

I'll also work on another PR that will have an Ansible playbook to spin up an EC2 gpu instance, run the stand alone gpu test script in tests/gpu/misc/test-gpu.sh, and then shut down the EC2 instance. This could be used to test new conbinations and populate the above mentioned JSON file. Ideally the playbook would be triggered by a manual GitHub action.

How does this sound?

eitsupi

I created the scripts/tests directory before, do you think test-config-cuda.sh would be better placed under the directory?

And, I was wondering about a few details about scripts, so I commented on them. If they are inappropriate, please ignore them.

scripts/test-config-cuda.sh

eitsupi · 2021-11-09T13:12:42Z

scripts/test-config-cuda.sh

+echo $VERSION_TOOLKIT
+
+# tensorflow
+if ! VERSION_TF_OUTPUT=`python -c 'import tensorflow as tf; print(tf.__version__)' 2>&1`;


$(...) may be preferable to `...`. https://github.com/koalaman/shellcheck/wiki/SC2006

Thanks @eitsupi! I made the change as you suggested in my latest commit. I didn't realize backticks were deprecated. I'll try to use parentheses going forward.

eitsupi · 2021-11-09T13:14:25Z

tests/gpu/misc/examples_tf.R

@@ -0,0 +1,36 @@
+
+## Tensorflow:
+install.packages('keras', repos='http://cran.us.r-project.org')


Is there any reason to use a US CRAN mirror?

rocker-versioned2/scripts/install_R.sh

Line 9 in a382d7d

CRAN=${CRAN:-https://cran.r-project.org}

@RobJY is this to trigger source installation?

Should we need this line at all since tensorflow is installed in the containers by scripts/install_tensorflow.sh?
When I attempt to run exampls_tf.R without it though I get the following error:

Error in library(keras) : there is no package called ‘keras’ Execution halted

Running library(tensorflow) in R gives a similar error message.

Running scripts/test-config-cuda.sh reports the correct tensorflow version, but it's checking the version from Python with python -c 'import tensorflow as tf; print(tf.__version__)'.

Do I need to add the path where tensorflow is installed by scripts/install_tensorflow.sh somewhere so R sees it?

What I wanted to point out is that the repos argument may simply be unnecessary.

Yes @eitsupi, you're right. It worked fine when I removed repos. Thanks! I've committed that change.

It still seems like it shouldn't need to install that though. I tried adding the path /opt/venv/reticulate with the following code, but I got the same error:

old_path <- Sys.getenv("PATH") Sys.setenv(PATH = paste(old_path, "/opt/venv/reticulate", sep = ":"))

Is it fine to install keras here or does the fact that I need to install it indicate that there's an issue with the tensorflow install?

Thanks eitsupi!

and added command line argument checking

RobJY · 2022-03-10T19:29:20Z

I have a question that may have gotten lost in a comment above.

Is it ok to install keras here (scripts/tests/examples_tf.R) or does the fact that I need to install it indicate that there's an issue with the tensorflow install? It seems like keras should already be installed, but I get an error if I just run library(keras) without installing first.

cboettig · 2022-03-11T00:53:19Z

I think it makes sense to install keras here, as the python package are not installed in the base image (see install_tensorflow.sh)

I've tried to move away from the idea of a 'global' venv with python modules pre-installed, since in my experience different users / different projects often want their own versions of all the modules.

I think we need to remove the multi-user venv setup created by setting WORKON_HOME, https://github.com/rocker-org/rocker-versioned2/blob/master/scripts/install_python.sh#L7, since this too easily leads to permission conflicts with mutliple users installing/upgrading packages in a shared environment. I think it would be okay for us to create a default environment in the default user's home dir (i.e. ~/rstudio) instead, but using /opt/venv was a bad idea. R is happy to draw packages across multiple local libraries; e.g. a user can access packages already installed in R_HOME with read-only privileges while adding new/upgraded packages to their home dir, ~/R, but as this doesn't work with python venv, there seems less point in having the shared library pre-installed...

RobJY · 2022-03-11T19:07:01Z

Oh, I see. That makes sense to me now. Thank you for explaining @cboettig!

I think this PR is ready to go. Please let me know if you'd like any other changes.

cboettig · 2022-03-11T19:44:22Z

@noamross can you give this another look over too?

first pass of gpu smoke test

fbc570b

Script collects version of driver, toolkit and tensorflow in a way that doesn't require communication with the card

eitsupi reviewed Nov 9, 2021

View reviewed changes

RobJY added 3 commits November 12, 2021 16:55

removed duplicated code

5e4a5ba

Thanks eitsupi!

changed from backticks to parentheses

a623852

and added command line argument checking

removed repo from keras install

2ef3245

RobJY added 2 commits March 11, 2022 13:46

Merge branch 'master' into smoke-test-2

10b6ea7

removed repo from keras install command

96d5855

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

first pass of gpu smoke test #281

first pass of gpu smoke test #281

RobJY commented Nov 5, 2021

RobJY commented Nov 5, 2021

eitsupi left a comment

eitsupi Nov 9, 2021

RobJY Jan 6, 2022

eitsupi Nov 9, 2021

cboettig Nov 18, 2021

RobJY Jan 6, 2022

eitsupi Jan 7, 2022

RobJY Jan 7, 2022 •

edited

Loading

RobJY commented Mar 10, 2022

cboettig commented Mar 11, 2022

RobJY commented Mar 11, 2022

cboettig commented Mar 11, 2022

		@@ -0,0 +1,36 @@

		## Tensorflow:
		install.packages('keras', repos='http://cran.us.r-project.org')

first pass of gpu smoke test #281

Are you sure you want to change the base?

first pass of gpu smoke test #281

Conversation

RobJY commented Nov 5, 2021

RobJY commented Nov 5, 2021

eitsupi left a comment

Choose a reason for hiding this comment

eitsupi Nov 9, 2021

Choose a reason for hiding this comment

RobJY Jan 6, 2022

Choose a reason for hiding this comment

eitsupi Nov 9, 2021

Choose a reason for hiding this comment

cboettig Nov 18, 2021

Choose a reason for hiding this comment

RobJY Jan 6, 2022

Choose a reason for hiding this comment

eitsupi Jan 7, 2022

Choose a reason for hiding this comment

RobJY Jan 7, 2022 • edited Loading

Choose a reason for hiding this comment

RobJY commented Mar 10, 2022

cboettig commented Mar 11, 2022

RobJY commented Mar 11, 2022

cboettig commented Mar 11, 2022

RobJY Jan 7, 2022 •

edited

Loading