Grace-Hopper non-pilot documentation #188

Merged
merged 37 commits into from
Apr 19, 2024


@ptheywood (Member) commented Mar 13, 2024

Adds non-pilot grace-hopper specific documentation.

Closes #185

  • Usage page
    • Add ghlogin partition detail
    • Add a CPU architecture / partitions subsection explaining which partition uses which architecture
    • Reduce the GH pilot subsection to less content plus links, stating that the pilot has ended. To be removed at a future date
  • Add an announcement stating that 3 GH200 480GB nodes are now available
    • link to a usage subsection?
  • Remove wmlce documentation (via orphan)
  • Add an admonition to each ppc64le-only page
  • Add gh details to each software page
    • Conda
    • Python
      • Check whether python2 is still available, and make that section architecture-specific
      • Make the default Python version information partition-specific.
    • PyTorch
      • Include how to get a CUDA-enabled build at the moment
    • Rust - no changes needed
    • TensorFlow
      • Include how to get a CUDA-enabled build at the moment
    • CUDA/NVCC
      • Modules, CUDA versions & limitations, gencodes
    • GCC
      • Mention psABI warnings and how to suppress them.
      • Double-check / rephrase psABI docs
    • NVHPC
    • Blas/Lapack (OpenBLAS)
      • Maybe split this page into 3: the current page plus one per implementation, so it's clearer which are available on which platform. Deferred to Split blas-lapack.rst #192
    • Boost
    • FFTW
    • HDF5
    • OpenMPI
    • NVTX
    • CMake (no modules)
    • Make
    • ncu
      • add version info
    • nsys
      • add version info
    • nvidia-smi
    • Apptainer/Singularity documentation.
      • Current plan is for ppc64le to continue with Singularity, to avoid unnecessary breaking changes
      • Apptainer page: aarch64 only, cross-reference SingularityCE. Mention fakeroot.
      • Singularity page: rename to SingularityCE, ppc64le only. Cross-reference Apptainer
  • Add bash architecture detection for .bashrc to the FAQ and usage pages.
  • Basic tabs theme customisation
  • Finished tabs theme customisation (code tabs don't look great)
  • Guides
  • Sidebar and non-sidebar common admonitions, and use them.

@ptheywood force-pushed the gh-tabs branch 2 times, most recently from d169eab to 5ee6800 (March 18, 2024 14:07)
This requires switching away from a globbed toctree to remove wmlce from the toc (without redirects).
Sadly, the same replacement cannot be redefined multiple times in a single RST file, so this include strategy isn't great for long pages.
…4le only.

This could do with expanding to include other content from the gh pilot docs.
@ptheywood marked this pull request as ready for review April 19, 2024 14:24
@ptheywood requested review from bodgerer and a team April 19, 2024 14:24
@bodgerer (Collaborator)
Hi Peter! Thanks so much for this, it's looking really good. Love the tabs and how they remember which tab you were looking at in the previous box :)

One thing though: someone has been trying out MPI on the Grace Hoppers, and I've recently needed to generate a bede-mpirun to get things running. It has the same semantics as the ppc64le version (although as there's no hardware SMT, the 1ppt option is fairly pointless).

There are two references on the usage page to using mpirun rather than bede-mpirun on aarch64... can these be removed, please?

Other than that, looks good to go to me.

Best,

Mark

This behaves the same as on ppc64le nodes, but due to the absence of hardware SMT, 1ppt is effectively the same as 1ppc.
@bodgerer (Collaborator)
Oh, and the gh mpi example would need updating, too. Sorry!

@ptheywood (Member, Author) commented Apr 19, 2024

0fea049 recombines the aarch64 and ppc64le MPI usage sections, with the tabs now just differentiating the job script by partition.
There is also a .. note:: pointing out that 1ppt is the same as 1ppc on GH nodes.

I've updated the suggestion in the GH pilot section to use bede-mpirun instead of mpirun.


@ptheywood (Member, Author)

I'll get this merged now, and we can tweak it more in the future if any other adjustments are needed / mistakes spotted.

@ptheywood merged commit cf5a15c into main Apr 19, 2024
2 checks passed
@ptheywood deleted the gh-tabs branch April 19, 2024 15:48
Successfully merging this pull request may close these issues.

Grace-Hopper documentation