Some biocontainer images fail to load #649

Closed
mdehollander opened this issue Jun 5, 2023 · 43 comments

@mdehollander

mdehollander commented Jun 5, 2023

Describe the bug

$ module load quay.io/biocontainers/fastp
Lmod Warning:  MODULEPATH directory: "~/singularity-hpc/modules" has too many non-modulefiles (413). Please make sure that modulefiles are in their own directory and not mixed
in with non-modulefiles (e.g. source code)

Lmod has detected the following error:  Unable to load module because of error when evaluating modulefile:
     ~/singularity-hpc/modules/quay.io/biocontainers/fastp/0.23.2--h5f740d0_3/module.lua: [string "-- Lmod Module..."]:98: unfinished string near '"deb-list : gcc-8-base_8.3.0-6_amd64.deb'
     Please check the modulefile and especially if there is a line number specified in the above message 
While processing the following module(s):

The "too many non-modulefiles" warning also appears for tools that work (trinity, for example), but this one also hits a deb-list error and therefore does not load.

To Reproduce
Steps to reproduce the behavior:
Install via:

shpc install quay.io/biocontainers/fastp

Expected behavior
The tool loads without warning or errors.

Version of Singularity and Singularity Registry HPC Client

$ shpc --version
0.1.22
$ module --version

Modules based on Lua: Version 8.7.24  2023-05-04 15:12 -05:00
    by Robert McLay [email protected] 
$ singularity --version
apptainer version 1.1.8

Here is a link to the generated fastp module.lua file: https://bin.disroot.org/?440b21e6dfca18ca#9JNBgP8Uuo65TYs27eckSy4AFqn9ZUFVPRsjLKR3gUV2

@vsoch
Member

vsoch commented Jun 5, 2023

This looks like a module software issue, or possibly we have a typo. Pinging @georgiastuart and @marcodelapierre for their expertise! Thank you for the link (this will be hugely helpful to reproduce it).

@georgiastuart
Contributor

georgiastuart commented Jun 5, 2023

Not sure if this is the reason, but these parentheses seem prematurely closed:

whatis("busybox    : BusyBox v1.32.1 (2021-04-13 11:15:36 UTC) multi-call binary.")
whatis("deb-list    : gcc-8-base_8.3.0-6_amd64.deb")
libc6_2.28-10_amd64.deb
libgcc1_1%3a8.3.0-6_amd64.deb
bash_5.0-4_amd64.deb
libc-bin_2.28-10_amd64.deb
libtinfo6_6.1+20181013-2+deb10u2_amd64.deb
ncurses-base_6.1+20181013-2+deb10u2_all.deb
base-files_10.3+deb10u9_amd64.deb")

Specifically in the deb-list line. The parens are closed twice.

@vsoch
Member

vsoch commented Jun 5, 2023

Ah, so possibly we need to test stripping out (or escaping) parentheses in the name and versions.

@georgiastuart
Contributor

Oh, no I think the version parens are fine. It's on the next line. Sorry, grabbed a line too early.

@vsoch
Member

vsoch commented Jun 5, 2023

okay so this part is wonky?

whatis("deb-list    : gcc-8-base_8.3.0-6_amd64.deb")
libc6_2.28-10_amd64.deb
libgcc1_1%3a8.3.0-6_amd64.deb
bash_5.0-4_amd64.deb
libc-bin_2.28-10_amd64.deb
libtinfo6_6.1+20181013-2+deb10u2_amd64.deb
ncurses-base_6.1+20181013-2+deb10u2_all.deb
base-files_10.3+deb10u9_amd64.deb")

It looks like those are labels maybe?

{% if labels %}{% for key, value in labels.items() %}whatis("{{ key }}    : {{ value }}")
{% endfor %}{% endif %}

I think I can probably reproduce this locally and work on it for a fix - thanks for the help @georgiastuart !

@vsoch
Member

vsoch commented Jun 5, 2023

I produced something slightly different (and true to what the template shows): there are newlines in the list, so that is actually one big value. I think we need to do a little more parsing of the labels (by newline). I'll try that now!

whatis("deb-list    : gcc-8-base_8.3.0-6_amd64.deb
libc6_2.28-10_amd64.deb
libgcc1_1%3a8.3.0-6_amd64.deb
bash_5.0-4_amd64.deb
libc-bin_2.28-10_amd64.deb
libtinfo6_6.1+20181013-2+deb10u2_amd64.deb
ncurses-base_6.1+20181013-2+deb10u2_all.deb
base-files_10.3+deb10u9_amd64.deb")
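
The newline handling described above could look roughly like the sketch below. It is not the actual change made in shpc: `flatten_label` and `render_whatis` are hypothetical names, and joining the pieces with ", " is an assumption (it happens to match the comma-separated output reported later in this thread). The idea is to collapse a multi-line label value into one line before it goes inside a Lua `whatis("...")` call, because a plain double-quoted Lua string cannot span several lines. Escaping backslashes and double quotes is an extra precaution that the thread does not discuss.

```python
def flatten_label(value, sep=", "):
    """Collapse a multi-line label value into a single line.

    Assumption: empty lines carry no information and can be dropped.
    """
    parts = [line.strip() for line in str(value).splitlines()]
    return sep.join(part for part in parts if part)


def render_whatis(labels):
    """Render one Lua whatis() call per label (hypothetical helper).

    Mirrors the shape of the template line
    whatis("{{ key }}    : {{ value }}"), but flattens the value first.
    """
    lines = []
    for key, value in labels.items():
        flat = flatten_label(value)
        # Escape characters that would end or corrupt a Lua string literal.
        flat = flat.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'whatis("{key}    : {flat}")')
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "deb-list": "gcc-8-base_8.3.0-6_amd64.deb\nlibc6_2.28-10_amd64.deb\n",
        "io.buildah.version": "1.19.6",
    }
    print(render_whatis(example))
```

In a Jinja2 setup, the same flattening could also be applied to the labels before they are passed to the template, so the template itself stays unchanged.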

@georgiastuart
Contributor

Yours looks right! The closing ") appears only on the last line, not on both the first line and the last line. Not sure why it showed up on the first line in the original.

@vsoch
Member

vsoch commented Jun 5, 2023

okay fix is done! I'll get in the PR this evening. Thanks for reporting @mdehollander !

@vsoch
Member

vsoch commented Jun 5, 2023

And I think we need to figure out how to ignore these non-modulefiles (the wrapper scripts) https://lmod.readthedocs.io/en/latest/025_new.html

@mdehollander
Author

Thanks for looking into this. Hope this works. Here is another example of a lua file for spades for double checking:

whatis("busybox    : BusyBox v1.32.1 (2021-04-13 11:15:36 UTC) multi-call binary.")
whatis("deb-list    : gcc-8-base_8.3.0-6_amd64.deb
libc6_2.28-10_amd64.deb
libgcc1_1%3a8.3.0-6_amd64.deb
bash_5.0-4_amd64.deb
libc-bin_2.28-10_amd64.deb
libtinfo6_6.1+20181013-2+deb10u2_amd64.deb
ncurses-base_6.1+20181013-2+deb10u2_all.deb
base-files_10.3+deb10u9_amd64.deb")
whatis("glibc    : GNU C Library (Debian GLIBC 2.28-10) stable release version 2.28.")
whatis("io.buildah.version    : 1.19.6")
whatis("org.label-schema.build-arch    : amd64")
whatis("org.label-schema.build-date    : Tuesday_23_May_2023_17:14:37_CEST")
whatis("org.label-schema.schema-version    : 1.0")
whatis("org.label-schema.usage.apptainer.version    : 1.1.8")
whatis("org.label-schema.usage.singularity.deffile.bootstrap    : docker")
whatis("org.label-schema.usage.singularity.deffile.from    : quay.io/biocontainers/spades@sha256:7dfda44ae2535ba1ccc7c60c2ec265f8672cfd45885f458a964daf1b839a7ec1")
whatis("pkg-list    : gcc-8-base
libc6
libgcc1
bash
libc-bin
libtinfo6
ncurses-base
base-files")

Looking forward to seeing the PR.

@vsoch
Member

vsoch commented Jun 6, 2023

Ah I'm so sorry! I meant to open it after work and it totally blew out of my ears. Here you go! #650

@marcodelapierre
Contributor

Apologies for the wait -- out and about for work again. On this one:

And I think we need to figure out how to ignore these non-modulefiles (the wrapper scripts) https://lmod.readthedocs.io/en/latest/025_new.html

@vsoch , Did you notice any weird behaviour coming from the wrapper scripts being there? Because otherwise I remember testing this a bit and concluding we were fine, because Lmod only scans files with .lua extension.

@vsoch
Member

vsoch commented Jun 7, 2023

I don't have a production environment where I test, so I can't say either way - I am going by the issue reported here. Could it be a version thing?

@marcodelapierre
Contributor

marcodelapierre commented Jun 7, 2023

ah! hadn't noticed the message, thank you.

I see, from your link:

Lmod (8.7.4+): To catch directory that are full of non-modulefiles, Lmod count the number of regular files that do not start with a “.”. If there are more than 100, Lmod reports a warning.

The one workaround that comes to mind is to host the wrapper scripts in a separate tree. I remember that at first you and I had discussed putting them inside the containers tree, but then ruled it out because with docker/podman (no SIF files) it would create an unnecessary extra tree to manage (e.g. at uninstall time).

Ultimately, I think that unless users notice a slow down in the module functioning, it should not be a problem to leave it as is for now.
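
To see how far a module tree is from the threshold of 100 mentioned in the quoted Lmod note, a rough count can be made with a sketch like the one below. It only approximates the rule as quoted (regular files whose names do not start with "."). Treating every file that ends in `.lua` as a modulefile is an assumption made for this shpc-generated tree, not Lmod's actual detection logic, and `count_non_modulefiles` is a hypothetical helper name.

```python
import os


def count_non_modulefiles(root, modulefile_suffix=".lua"):
    """Count regular, non-hidden files under root that are not modulefiles.

    Assumption: a file is a modulefile if its name ends with
    modulefile_suffix. Lmod's own check is more involved.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("."):
                continue  # hidden files are not counted, per the quoted rule
            if name.endswith(modulefile_suffix):
                continue  # assumed to be a modulefile
            if os.path.isfile(os.path.join(dirpath, name)):
                count += 1
    return count


if __name__ == "__main__":
    modules_dir = os.path.expanduser("~/singularity-hpc/modules")
    print(count_non_modulefiles(modules_dir))
```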

@vsoch
Member

vsoch commented Jun 7, 2023

@mdehollander do you notice a slowdown? if so we can look into an option to place the executables elsewhere.

@mdehollander
Author

mdehollander commented Jun 7, 2023

I tried the strip-parens-templates branch and can confirm the module file works now:

$ module load quay.io/biocontainers/fastp
Lmod Warning:  MODULEPATH directory: "~/singularity-hpc/modules" has too many non-modulefiles (464). Please make sure that modulefiles are in their own directory and not mixed
in with non-modulefiles (e.g. source code)

$ which fastp
~/singularity-hpc/modules/quay.io/biocontainers/fastp/0.23.4--h5f740d0_0/bin/fastp

So thanks 🎉

The part that gave an error now looks like this:

whatis("busybox    : BusyBox v1.32.1 (2021-04-13 11:15:36 UTC) multi-call binary.")
whatis("deb-list    : gcc-8-base_8.3.0-6_amd64.deb, libc6_2.28-10_amd64.deb, libgcc1_1%3a8.3.0-6_amd64.deb, bash_5.0-4_amd64.deb, libc-bin_2.28-10_amd64.deb, libtinfo6_6.1+20181013-2+deb10u2_amd64.deb, ncurses-base_6.1+20181013-2+deb10u2_all.deb, base-files_10.3+deb10u9_amd64.deb")
whatis("glibc    : GNU C Library (Debian GLIBC 2.28-10) stable release version 2.28.")

There are indeed many files in the modules folder that do not start with a dot. I haven't installed many modules, but there are more than 600 files, most of them in the bin folders. I don't notice any slowness when loading a module, so it is indeed not a big issue. Happy that loading the biocontainer modules works now!

Edit:
I did import modules from a /cvmfs mount. I interrupted the import, but quite a few bioconductor modules were still imported. When I remove them, the list is much shorter and there is no lmod warning anymore.

@vsoch
Member

vsoch commented Jun 9, 2023

@mdehollander is there anything else you'd like us to work on or try?

@mdehollander
Author

No, the biocontainer images are now loading. Thanks! The warning about the number of modules is something that can be ignored :-P

@mdehollander
Author

mdehollander commented Aug 16, 2023

Ultimately, I think that unless users notice a slow down in the module functioning, it should not be a problem to leave it as is for now.

@marcodelapierre @vsoch

I could finally test this properly now that we have a better connection to the cvmfs mirror, and I have to say I do notice a slowdown. With all cvmfs biocontainer configs in the module directory, it takes almost a minute to load a module. Loading all available modules also takes very long. In contrast, if I start fresh and only create a few module files with shpc install, everything works instantly. Does this also happen in other setups? Or could it be my version of lmod/tcl? Let me know whether we should reopen this issue or open a new one.

This is the output of loading a cvmfs biocontainer module:

time module load quay.io/biocontainers/megahit
Lmod Warning:  MODULEPATH directory: "/singularity-hpc/modules" has too many non-modulefiles (149544). Please make sure that modulefiles are in their own directory and not
mixed in with non-modulefiles (e.g. source code)

real    2m8.685s
user    0m48.762s
sys     0m21.364s

@vsoch
Member

vsoch commented Aug 16, 2023

This has been my experience generally with cvmfs: it’s slow.

@marcodelapierre
Contributor

Thanks for this precious feedback Matthias!
Are the modulefiles hosted in CVMfs as well, or just the path containing the container images?

If the modules are on CVMfs, would you have the chance to try copying just the modulefile tree to a local filesystem, and see whether at least the module load operation gets smoother?

@vsoch
Member

vsoch commented Aug 17, 2023

We also have a google cloud storage example (that could be extrapolated to other object storage) https://github.com/singularityhub/singularity-hpc/tree/main/example/google-cloud-storage

@mdehollander
Author

The module files are on the local file system, not on the cvmfs. The cvmfs provides only the containers. Using the script from this repo I create the module files.

The speed of the cvmfs mount is not an issue. We are using a mirror that is located nearby, and the response is instant. Loading a container (after first clearing the cache) takes only about three seconds:

$ time singularity run /cvmfs/singularity.galaxyproject.org/m/e/megahit\:1.2.9--h5b5514e_3 megahit --help
MEGAHIT v1.2.9
contact: Dinghua Li <[email protected]>
...
real    0m3.141s
user    0m0.605s
sys     0m0.186s   

So I think lmod has issues querying/loading the module files from shpc. In htop I see this command running for minutes:

~/src/spack/opt/spack/linux-ubuntu20.04-broadwell/gcc-9.4.0/lua-5.4.4-oww5owpjdgiqxgs7kirdehcxzyy5gwov/bin/lua ~/src/spack/linux-ubuntu20.04-broadwell/gcc-9.4.0/lmod-8.7.24-tydjf2snttgvtpjigy5l27attmaqtlgs/lmod/lmod/libexec/lmod shell load quay.io/biocontainers/megahit 

@vsoch
Member

vsoch commented Aug 18, 2023

Can you test without spack?

@mdehollander
Author

Not so easily. Spack was great for getting a recent version of lmod installed. I first used the lmod that comes with the Ubuntu distribution, but that is version 6.6 and I had trouble installing and loading normal modules with shpc. Installing lmod 8.7.24 via spack was very easy and got the normal things running.

@marcodelapierre
Contributor

@mdehollander you can give one of my reference scripts a go to install Lmod on Ubuntu via apt + source:

https://github.com/marcodelapierre/hpc-middleware-scripts/blob/main/modules/install-lmod.sh

@mdehollander
Author

mdehollander commented Aug 18, 2023

Thanks for the script. That made installing lmod easy as well.

Unfortunately the behavior is the same as with the lmod/lua versions I had installed via spack:

time module load quay.io/biocontainers/megahit
real    2m0.533s
user    0m48.118s
sys     0m18.690s

$ module --version

Modules based on Lua: Version 8.5  2021-05-10 13:40 -05:00
    by Robert McLay [email protected]

$ lua
Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio

I guess the slowdown happens when there are a lot of module files. Currently there are 195212 files for 27400 modules in my config.
If I only keep the config for the megahit module, loading is again very fast:

time module load quay.io/biocontainers/megahit

real    0m0.088s
user    0m0.065s
sys     0m0.014s

@marcodelapierre
Contributor

Thanks for testing the "spack" variable, Mattias.

I think it could be interesting to test an alternate setup, where all the bash scripts are in a separate tree, and the module tree only contains the .lua. There would be about a factor 10 reduction in file count, but also a 100% Lua content in the tree...wondering if this may significantly improve Lmod behaviour.

@mdehollander
Author

Is this something I can easily try out myself? Or do the module files need to be regenerated?

I am also pinging @audreystott and @muffato from #598, since they might have used the biocontainer_match.py script as well. I am wondering if they also see performance issues when there are a lot of module files generated from a biocontainer registry. If so, maybe time to open a new issue?

@muffato
Contributor

muffato commented Aug 23, 2023

Hi @mdehollander. Here we don't serve the entire biocontainers repository through shpc, only the containers that have been requested by users (a few dozen so far). So no performance issue.

@vsoch
Member

vsoch commented Aug 23, 2023

I'll also comment again that every experience that I've had with cvmfs has been really slow - just for loading a workflow with easybuild or set of dependencies. I don't mean to defer the problem (because I'm happy to help) but I don't think this is explicitly a problem with shpc.

@mdehollander
Author

@vsoch Indeed, it depends on how good the performance of cvmfs is. But here, using cvmfs without shpc (just singularity run on anything in the cvmfs dir, or just browsing) is super fast, I think due to the nearby mirror. Standalone shpc is also fast in loading modules. Where it becomes slow for us is when all biocontainers are added as modules with the biocontainer_match.py script. I would assume this is either an issue with shpc or, more generally, with how lmod handles many module files. I am not an expert on module files, so I can't figure that out.

Serving a selection of biocontainers as a module is an option, but I still think the possibility to have them all available as module (and the search/spider function), would be nice.

@marcodelapierre
Contributor

Is this something I can easily try out myself? Or do the module files need to be regenerated?

Probably yes, I can outline a strategy I have in mind, and you can evaluate how much support you would need.
Suppose you have a directory path modules/ which contains all the modules + spurious files from the SHPC installation.

  1. You could create a sibling directory wrappers/, where you would move all of the wrapper scripts from all bin/ subdirectories of modules/, preserving the same subdirectory tree; this can be automated using a bash script.
  2. Then, look at line 70 of the singularity lua template: you would use grep to find the line containing prepend_path in each modulefile, and a substitution tool such as sed to replace the string pathJoin(moduleDir, "bin") with another general string that points to the wrappers/ tree.

Apologies, I have a few meetings starting, so I am unable to further detail how to implement point 2. (it might need a bit of Lua magic).

Please shout up for further support :-)
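
The two steps outlined above can be sketched in one pass, as below. This is an untested illustration, not something shpc provides: `relocate_wrappers` is a hypothetical helper, the modulefile name `module.lua` is taken from the paths shown earlier in this thread, and the sketch assumes each generated modulefile contains the literal string `pathJoin(moduleDir, "bin")`. It also assumes the wrapper scripts keep working after being moved, which would need to be checked on a real installation. Try it on a copy of the tree first.

```python
import shutil
from pathlib import Path

# Literal string expected in each modulefile (assumption, from the comment above).
OLD_BIN_EXPR = 'pathJoin(moduleDir, "bin")'


def relocate_wrappers(modules_root, wrappers_root, modulefile_name="module.lua"):
    """Move each module's bin/ directory into a sibling tree and patch the modulefile.

    Returns the list of new bin directories.
    """
    modules_root = Path(modules_root)
    wrappers_root = Path(wrappers_root)
    moved = []
    # Materialise the list first so moving directories does not disturb the walk.
    for modulefile in sorted(modules_root.rglob(modulefile_name)):
        bin_dir = modulefile.parent / "bin"
        if not bin_dir.is_dir():
            continue  # nothing to move for this module
        # Step 1: same relative layout under wrappers_root.
        target = wrappers_root / modulefile.parent.relative_to(modules_root) / "bin"
        if target.exists():
            continue  # already relocated; leave it alone
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(bin_dir), str(target))
        # Step 2: point the modulefile at the new location with a quoted absolute path.
        text = modulefile.read_text()
        modulefile.write_text(text.replace(OLD_BIN_EXPR, f'"{target.resolve()}"'))
        moved.append(target)
    return moved


if __name__ == "__main__":
    # Hypothetical paths; adjust to the actual installation.
    print(relocate_wrappers("modules", "wrappers"))
```

Writing an absolute path is the simplest choice for a sketch; a path computed relative to `moduleDir` would keep the tree relocatable but needs more Lua on the modulefile side.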

@audreystott
Contributor

So I’ve got kind of a workaround with the modules – when I install Lmod, I have to give it a spider cache dir; and then after the biocontainer_match.py script has been run, I will then run an update to lmod to cache the files. So that the system doesn’t have to recache the entire 9000 biocontainers each time I run module avail or module load.

See https://lmod.readthedocs.io/en/latest/130_spider_cache.html

sudo mkdir /opt/mData
sudo chmod a+w /opt/mData
./configure --with-spiderCacheDir=/opt/mData/cacheDir --with-updateSystemFn=/opt/mData/cacheTS.txt

#This step will take several hours as 8000+ module files are being written
sudo python3 singularity-hpc/example/biocontainer-match.py --containers /cvmfs/singularity.galaxyproject.org/all

#This step will take up to 10 minutes, but it enables the spider cache to be stored on the system for use again later.
/usr/local/lmod/lmod/libexec/update_lmod_system_cache_files -d /opt/mData/cacheDir -t /opt/mData/cacheTS.txt /home/ubuntu/singularity-hpc/modules

The spider cache has to be rebuilt every time you add a new module though... so this is not an ideal workaround long term.
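
Because the cache has to be rebuilt after new modules are added, one option is to rebuild only when something has changed. The sketch below is a hypothetical helper, not part of shpc or Lmod. It assumes that the modification time of the timestamp file reflects the last cache build, and that modulefiles are named `module.lua` as in the paths shown in this thread. The command layout (`-d`, `-t`, then the module path) is copied from the command above.

```python
import subprocess
from pathlib import Path


def cache_is_stale(modules_root, timestamp_file, modulefile_name="module.lua"):
    """Return True if any modulefile is newer than the cache timestamp file."""
    ts = Path(timestamp_file)
    if not ts.exists():
        return True  # no cache has been built yet
    cache_mtime = ts.stat().st_mtime
    return any(
        path.stat().st_mtime > cache_mtime
        for path in Path(modules_root).rglob(modulefile_name)
    )


def rebuild_command(updater, cache_dir, timestamp_file, modules_root):
    """Build the argument list for update_lmod_system_cache_files."""
    return [str(updater), "-d", str(cache_dir), "-t", str(timestamp_file), str(modules_root)]


if __name__ == "__main__":
    # Paths taken from the commands above; adjust for the local setup.
    modules = "/home/ubuntu/singularity-hpc/modules"
    stamp = "/opt/mData/cacheTS.txt"
    if cache_is_stale(modules, stamp):
        cmd = rebuild_command(
            "/usr/local/lmod/lmod/libexec/update_lmod_system_cache_files",
            "/opt/mData/cacheDir",
            stamp,
            modules,
        )
        subprocess.run(cmd, check=True)
```

Scanning a tree with tens of thousands of modulefiles is itself not free, so a check like this fits better in a scheduled job or right after an install step than in a login script.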

@mdehollander
Author

mdehollander commented Aug 25, 2023

Probably yes, I can outline a strategy I have in mind, and you can evaluate how much support you would need.

@marcodelapierre Thanks for the steps. It seems doable but will take some time. But after reading the response from @audreystott I am not sure how much effect this will have, because it seems to be a known thing that lmod becomes slow if you have a lot of modulefiles. Using a cache seems to be the way forward here. If that is true, it is indeed not an shpc issue, but rather something that needs to be correctly configured with lmod.

@audreystott Thanks for your detailed answer. I will definitely try this and report back here. I assume you have also set LMOD_CACHED_LOADS to yes to assist in the module load function?

@marcodelapierre
Contributor

My pleasure Mattias.

And shout out to Audrey, our containers&modules&more bio-devops wizard!

@vsoch
Member

vsoch commented Aug 28, 2023

@audreystott ! 🙌

@audreystott
Contributor

@mdehollander yes that's correct. Both these:

export LMOD_SHORT_TIME=86400
export LMOD_CACHED_LOADS=yes

LMOD_SHORT_TIME:
[number, default: 2, --with-shortTime]. If the time to build the spider cache takes longer than this number then write the spider cache out into the user’s account. If you want to prevent the spider cache file being written to the user’s account then set this number to be large, like 86400.

@marcodelapierre
Contributor

Is this something I can easily try out myself? Or do the module files need to be regenerated?

Probably yes, I can outline a strategy I have in mind, and you can evaluate how much support you would need. Suppose you have a directory path modules/ which contains all the modules + spurious files from the SHPC installation.

  1. You could create a sibling directory wrappers/, where you would move all of the wrapper scripts from all bin/ subdirectories of modules/, preserving the same subdirectory tree; this can be automated using a bash script.
  2. Then, look at line 70 of the singularity lua template: you would use grep to find the line containing prepend_path in each modulefile, and a substitution tool such as sed to replace the string pathJoin(moduleDir, "bin") with another general string that points to the wrappers/ tree.

Apologies, I have a few meetings starting, so I am unable to further detail how to implement point 2. (it might need a bit of Lua magic).

Please shout up for further support :-)

Mattias, also note that Vanessa has a PR open on point 1. above: #654

I am aiming to review it ASAP this week.

@mdehollander
Author

Using lmod cache works great! See here the difference:

After enabling the cache:

$ time module load quay.io/biocontainers/spades          
                                               
real    0m1.027s
user    0m0.950s
sys     0m0.072s                      

Loading a module with the cache disabled takes much longer:

$ time module --ignore_cache load quay.io/biocontainers/spades                  

real    2m41.524s
user    1m16.342s
sys     0m26.073s

@vsoch
Member

vsoch commented Aug 28, 2023

oh wow, that is immense! @mdehollander would you care to write up this trick for somewhere in our docs?

@mdehollander
Author

Credits to @audreystott as well, since she pointed me to the lmod caching.

I can add some things to the user guide. Would a PR on https://github.com/singularityhub/singularity-hpc/blob/main/docs/getting_started/user-guide.rst be a good start? If I am correct, the biocontainers_match script is also not yet covered in the docs. I found it mentioned in an issue.

@vsoch
Member

vsoch commented Aug 28, 2023

Yes perhaps we want some section for helper scripts, and/or scaled installs? We could add to developer docs or similar. I will leave it to your judgment.
