-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducible image builds for Fornax images #10
base: main
Are you sure you want to change the base?
Conversation
It's not exactly related to this PR, but I wonder if you experimented with pixi (https://github.com/prefix-dev/pixi)? It looked compelling during demos/talks at scipy and the high-energy physics folks have good reproducibility experiences with it (and it finally makes conda and pip on parity with each other). |
dependencies: | ||
- jupyterlab | ||
- dask | ||
- jupytext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have jupyterlab-myst
in here, too?
base-image/packages.txt
Outdated
nest-asyncio==1.6.0 | ||
notebook==7.2.0 | ||
notebook-shim==0.2.4 | ||
numpy==1.26.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at some point we need to bump this to 2.0, and this migration feels to be a good point for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I think the hold back is lsdb notebooks, and lightkurve, which has a some dependency that requires numpy<2. There may be others that I am not aware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, they rely on healpy atm, and also one of the ML notebooks also has a dependency that requires np <2. So I would suggest having a reminder issue about numpy 2, and just roll with the old one for now.
tractor/conda-science_demo.yml
Outdated
- netpbm | ||
- acstools | ||
- astropy | ||
- astroquery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pyvo is pulled in by astroquery, but it would good to spell it out separately as there is direct use of it.
Maybe the policy could be to list everything that is directly imported in the demo notebooks?
I know that we call this tractor as it was one of the complicated packages to install, but I wonder, if the base image doesn't really have the astro packages, that we should call this something more descriptive. Or have a bit more of the standard packages available in the base image, too (e.g. pyvo, astroquery, maybe scikit-learn, or some of the other astropy affiliated packages). |
The idea of the base image is that it contains the system and jupyter environment. This is not meant to be made available as an image to the end user. |
@bsipocz, does github CI supports building these images and say pushing them to dockerhub (assuming we have an account)? |
I haven't yet done it myself, but it seems to be possible. Happy to help figuring it out if we plan to do it here. |
tractor/Dockerfile
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be more useful for the users to call this image something else rather than tractor. E.g. keep the forced photometry name and drop tractor?
(After all we just called it tractor informally as that was the most complicated component to install, but it has zero descriptive information content for a generic user, especially for one who is not familiar with that library)
This looks like a cool tool. Are we willing to force users to use this instead of conda directly? From the documentation, pixi uses conda under the hood, but you are discouraged from interacting with it directly. |
It promises better user experience, and the usage should not be that different, so I don't think it should be a big issue. (Being said, I'm not a conda user myself so it may not be the best input here. I'll ping the scientists on the team fin case they have strong opinions against trying this) |
…ult kernel; add config scripts
Until I figure out how to deal with nasa-fornax/fornax-demo-notebooks#303, can we have the demo notebook kernel named super genericly? E.g. (See Troy's comment here: nasa-fornax/fornax-demo-notebooks#321) |
The names |
I don't think that would work out of the box either, so far I only saw Here is what I got on CI (and of course locally, too): https://app.circleci.com/pipelines/github/nasa-fornax/fornax-demo-notebooks/184/workflows/3b48be23-3bc0-41cc-8e46-95914baa0f3c/jobs/139?invite=true#step-103-273 I have changed some of these back to python3, but in practice, people keep committing back unrelated metadata changes with their content changes, so the situation is not great. |
Re dependencies: I have cleaned up the dependency files for the currently existing notebooks. They are not yet conflicting, so if possible, it would be great to include them in the image. they are stored with their notebooks, so |
The reason there are named kernels is that in the high energy image, there are multiple ones, and different notebooks will use different kernels. |
Sure, but what about keeping the default (FYI: we don't use nbsphinx, but I'll check if our tools are checking nbsphinx config or if I can easily add this into MyST-NB myself). |
(Btw, this would come up to be a problem for anyone downloading a notebook and try to run it on their own machine, thus my preference to be as generic as possible for most things.) |
OK, I've found a config option for the kernel name, so that question should not block this PR, and I'm OK with calling it |
This is an attempt to migrate the Fornax images from gitlab. For now these images are for testing only and are not deployed in production. We will ultimately be using these public images.
The docker images are restructured from what we have on gitlab to make the builds reproducible, so building images at different times should produce images with the exact same packages (as much as possible). This is done with the
conda-lock
package. Reproducible builds will allow us to ultimately to create DOIs for them and cite them.I am putting this out here to get feedback. This PR contains most of the needed parts, and the content of the images mirrors what we have on fornax (as much as possible). A description of how the builds are done is in the readme file. Part of this structure was borrowed from PANGEO