-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
first pass of gpu smoke test #281
base: master
Are you sure you want to change the base?
Conversation
Script collects version of driver, toolkit and tensorflow in a way that doesn't require communication with the card
Starting this PR to start the discussion about another attempt at building a script for a build level smoke test for gpu functionality. The committed script We could then have a JSON file of tested version combinations that this script would compare against and raise a warning if the current combination isn't on the list. This would again be run by config_R_cuda.sh like our last attempt did. It adds the maintenance burden of keeping up the JSON file, but may help to catch issues during the build. I'll also work on another PR that will have an Ansible playbook to spin up an EC2 gpu instance, run the stand alone gpu test script in tests/gpu/misc/test-gpu.sh, and then shut down the EC2 instance. This could be used to test new conbinations and populate the above mentioned JSON file. Ideally the playbook would be triggered by a manual GitHub action. How does this sound? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created the scripts/tests
directory before, do you think test-config-cuda.sh
would be better placed under the directory?
And, I was wondering about a few details about scripts, so I commented on them. If they are inappropriate, please ignore them.
scripts/test-config-cuda.sh
Outdated
echo $VERSION_TOOLKIT | ||
|
||
# tensorflow | ||
if ! VERSION_TF_OUTPUT=`python -c 'import tensorflow as tf; print(tf.__version__)' 2>&1`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$(...)
may be preferable to `...`
. https://github.com/koalaman/shellcheck/wiki/SC2006
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eitsupi! I made the change as you suggested in my latest commit. I didn't realize backticks were deprecated. I'll try to use parentheses going forward.
tests/gpu/misc/examples_tf.R
Outdated
@@ -0,0 +1,36 @@ | |||
|
|||
## Tensorflow: | |||
install.packages('keras', repos='http://cran.us.r-project.org') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to use a US CRAN mirror?
CRAN=${CRAN:-https://cran.r-project.org} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RobJY is this to trigger source installation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we need this line at all since tensorflow is installed in the containers by scripts/install_tensorflow.sh
?
When I attempt to run exampls_tf.R
without it though I get the following error:
Error in library(keras) : there is no package called ‘keras’
Execution halted
Running library(tensorflow)
in R gives a similar error message.
Running scripts/test-config-cuda.sh
reports the correct tensorflow version, but it's checking the version from Python with python -c 'import tensorflow as tf; print(tf.__version__)'
.
Do I need to add the path where tensorflow is installed by scripts/install_tensorflow.sh
somewhere so R sees it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I wanted to point out is that the repos
argument may simply be unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes @eitsupi, you're right. It worked fine when I removed repos
. Thanks! I've committed that change.
It still seems like it shouldn't need to install that though. I tried adding the path /opt/venv/reticulate
with the following code, but I got the same error:
old_path <- Sys.getenv("PATH")
Sys.setenv(PATH = paste(old_path, "/opt/venv/reticulate", sep = ":"))
Is it fine to install keras
here or does the fact that I need to install it indicate that there's an issue with the tensorflow install?
Thanks eitsupi!
and added command line argument checking
I have a question that may have gotten lost in a comment above. Is it ok to install keras here (scripts/tests/examples_tf.R) or does the fact that I need to install it indicate that there's an issue with the tensorflow install? It seems like keras should already be installed, but I get an error if I just run |
I think it makes sense to install keras here, as the python package are not installed in the base image (see install_tensorflow.sh) I've tried to move away from the idea of a 'global' venv with python modules pre-installed, since in my experience different users / different projects often want their own versions of all the modules. I think we need to remove the multi-user venv setup created by setting |
Oh, I see. That makes sense to me now. Thank you for explaining @cboettig! I think this PR is ready to go. Please let me know if you'd like any other changes. |
@noamross can you give this another look over too? |
Script collects version of driver, toolkit and tensorflow in a way that doesn't require communication with the card