include core ML libraries on the ML image #756
base: master
Conversation
Once this builds we should be able to do:
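(Something along these lines; the exact image tag and the R-side checks below are my sketch of the intended usage, not the literal command from this PR.)

```sh
# Sketch only: run the GPU-enabled ML image with the NVIDIA container runtime
# and confirm from R that both frameworks can see the device.
docker run --rm -ti --gpus all rocker/ml R

# then, at the R prompt:
#   tensorflow::tf$config$list_physical_devices("GPU")
#   torch::cuda_is_available()
```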
on any machine with NVIDIA GPUs available and the NVIDIA container runtime configured. tensorflow is still more verbose than torch; at startup it prints messages about plugins already being registered, but I don't think those are cause for concern. It should detect the GPU device at the end. Points for discussion:
@eitsupi is there a way to run super-linter locally, and a way to have it automatically reformat the scripts?
See https://github.com/super-linter/super-linter
@eitsupi the two failing tests occur because the GitHub runners don't have enough disk space for these large images. I don't think the images are too large to build by themselves, but I'm not sure what the right way forward is here; any suggestions?
I think it's common to delete unnecessary things on the runner to free up space. The most thorough approach I know of is the one in the Apache Arrow main repository.
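(For reference, a rough sketch of the usual trick on hosted runners -- not the exact Arrow recipe; the paths below are just the large preinstalled toolchains that are commonly removed:)

```sh
# Free several GB on an ubuntu-latest runner before building large images.
sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
sudo docker image prune --all --force
df -h /
```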
I'll take a look and see what can be done to make more space on the runner; maybe the test can be streamlined. Also, the manual texlive installs from CTAN mirrors continue to be incredibly unreliable and create spurious test failures. I think your proposed new build system will avoid a lot of these issues anyway by not forcing these tests on unrelated images.
Now other unrelated tests are failing because the RStudio daily-builds site is returning HTTP 500 errors 😭 https://rstudio.org/download/latest/daily/server/jammy/rstudio-server-latest-amd64.deb
Projects (e.g. pangeo) will often host separate images for torch vs. tensorflow, as each is quite large. Current sizes (before adding any of these) in our stack are:
rocker/cuda is already quite large; 'ml' adds roughly the tidyverse/rstudio R packages for a modest further increase. But each of the ML frameworks adds another 4-5 GB.
In the grand scheme of things maybe that's not too large, but it's not small... thoughts?
A simple question: can't the user install those packages themselves? That's all we're doing here, right?
python3 -m pip install --no-cache-dir "torch" "tensorflow[and-cuda]" "keras"
install2.r --error --skipmissing --skipinstalled torch tensorflow keras
I'm also concerned that the versions of the Python packages are not pinned, and I heard that the R …
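(If unpinned versions are the concern, the pip line could pin them; the version numbers here are placeholders, not anything specified by this PR:)

```sh
# Hypothetical pinning of the Python ML stack; versions are illustrative only.
python3 -m pip install --no-cache-dir "torch==2.2.*" "tensorflow[and-cuda]==2.16.*" "keras==3.3.*"
```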
This is a good question, and it comes down to a question of context. Yes, obviously these can be installed by a user (just like everything we do here). But there are several reasons why we would want to pre-package them:
I agree that in some use cases users might rather start with the …
Related to this: I think it would be better to stop rebuilding old ml images. In the first place, I suspect that it is faster to install R on a huge Python image than to install a large number of Python packages on an R-based image, since we can now easily install any version of R with rig.
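(A sketch of that alternative, assuming rig's documented Linux install command and an illustrative upstream GPU image tag:)

```sh
# Illustrative only: layer R onto an upstream GPU/Python image with rig,
# instead of layering the Python ML stack onto an R-based image.
docker run --rm -ti tensorflow/tensorflow:latest-gpu bash -c '
  apt-get update && apt-get install -y curl
  curl -Ls https://github.com/r-lib/rig/releases/download/latest/rig-linux-latest.tar.gz |
    tar xz -C /usr/local
  rig add release   # installs the current R release
  R --version
'
```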
Anyway, I strongly disagree with merging this now. These images are too large and may fail to build.
I agree that we should have some separation between the building and testing of the ml images and the base images. Arguably this could be moved out of the … And yes, there may be difficulties in having these images built within the existing deployment action setup we have in place in this repository. However, there is certainly no more general problem in building these images on GitHub Actions -- I am regularly building and pushing images with these libraries via GitHub Actions in my own repositories.
I don't really understand the logic here. This isn't about what is faster; faster isn't the goal. These Python packages are required by some R packages, and the goal of this stack is to support ML use cases for R users. As I stated at the very top of this PR, the goal is for users to be able to do something like what @eddelbuettel suggested:
That is something I'd like to support, and something numerous users have asked for. Yes, there are already lots of ways to do this, and we could just say "find a huge python image and run rig", but in my experience that's not a satisfactory answer for many users. Anyway, I totally want to respect not breaking things here. Yes, there are open questions about how best to go about this, and yes, the answers keep changing -- they always have in the decade-plus we have been doing rocker, and that's the point: to have a place for the community to share and discuss ways of doing this. How do you feel about moving all of the cuda-focused development into a separate repo? It could be a nice platform on which to test-drive a multistage build and other elements. I think the main issue is that it would want to copy the …
I see value in continuing the monorepo, and I think it would be hard to set up the workflows, etc. again in another repository.
Now that we finally have a nice way to provide a default Python environment without locking users into that virtualenv, I think we can provide out-of-the-box support for the main ML frameworks one would expect to have in such a large, GPU-focused image.
This adds support for `tensorflow`, `keras`, and `torch` out of the box, with both the corresponding R and Python libraries.