deploy: 7d8ba09
wikfeldt committed Sep 20, 2024
0 parents commit 2cf94ab
Showing 525 changed files with 80,713 additions and 0 deletions.
Empty file added .nojekyll
Empty file.
Binary file added _images/cluster_diagram.png
58 changes: 58 additions & 0 deletions _sources/connect_to_cluster.rst.txt
@@ -0,0 +1,58 @@
Connecting to an HPC resource
==============================


If you use SSH keys, once the keys have been created and uploaded through the PDC interface, logging in to the cluster is as simple as:

.. code-block:: console

   $ ssh -Y <username>@dardel.pdc.kth.se

This should log you in to the PDC supercomputer. The ``-Y`` flag enables X11 forwarding, so that graphical windows can be opened from the supercomputer, e.g.
to visualise images. This works only if you have a local X server running (if you are on Linux/WSL, you most likely do).

Alternatively, you can use Kerberos as the authentication method. To do that, first request a Kerberos ticket:

.. code-block:: console

   $ kinit -f <username>@NADA.PDC.KTH.SE

After that, the SSH command looks like this:

.. code-block:: console

   $ ssh -o GSSAPIAuthentication=yes -Y <username>@dardel.pdc.kth.se

More information about Kerberos can be found in the `PDC login documentation <https://www.pdc.kth.se/support/documents/login/configuration.html>`__.
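
To avoid retyping these options, they can be stored in an SSH client configuration. The following is a minimal sketch, not PDC's official setup: the host alias ``dardel``, the placeholder ``your_username`` and the scratch file name are assumptions made for illustration. ``ForwardX11`` together with ``ForwardX11Trusted`` corresponds to the ``-Y`` flag, and ``GSSAPIAuthentication`` to the Kerberos option shown above. The snippet writes to a scratch file so that an existing ``~/.ssh/config`` is not overwritten.

```shell
# Hypothetical scratch location; review the result before copying it anywhere.
cfg="${TMPDIR:-/tmp}/dardel_ssh_config"

# "dardel" is an illustrative alias and "your_username" a placeholder.
cat > "$cfg" <<'EOF'
Host dardel
    HostName dardel.pdc.kth.se
    User your_username
    ForwardX11 yes
    ForwardX11Trusted yes
    GSSAPIAuthentication yes
EOF

echo "Wrote $cfg"
```

After appending the contents of that file to ``~/.ssh/config`` and replacing the placeholder, ``ssh dardel`` should behave like the longer commands above.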

.. type-along::

Let us check which node we ended up on. The name of the machine can be checked with the ``hostname`` command:

.. code-block:: console

   $ hostname

We can get a sense of the size of Dardel by using the ``sinfo`` command:

.. code-block:: console

   $ sinfo -s
   PARTITION  AVAIL  TIMELIMIT   NODES(A/I/O/T)   NODELIST
   gpu        up     1-00:00:00  49/9/4/62        nid[002792-002853]
   main       up     1-00:00:00  604/256/112/972  nid[001012-001531,001756-001816,001818-001819,001821-001896,001898-002007,002009-002023,002552-002567,002588-002759]
   scania     up     4-00:00:00  22/187/15/224    nid[001532-001755]
   scania-hf  up     4-00:00:00  0/3/1/4          nid[000011-000014]
   memory     up     7-00:00:00  34/8/0/42        nid[000101-000118,001772-001779,002552-002567]
   shared     up     7-00:00:00  27/5/0/32        nid[001000-001011,002568-002587]
   long       up     7-00:00:00  76/4/0/80        nid[001800-001819,002588-002647]
   eggnog     up     7-00:00:00  4/0/0/4          nid[002536-002539]
   supernova  up     14-00:00:0  5/6/5/16         nid[001817,001820,001897,002008,002540-002551]

For example, the ``main`` partition has 972 nodes in total (the last number in the ``A/I/O/T`` column, which counts allocated/idle/other/total nodes), each containing 128 cores.
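
The ``A/I/O/T`` column is easier to read once it is split into labelled counts. The sketch below is one possible way to do that with ``awk``; the function name ``summarise_nodes`` is made up for this example, and the sample input is two rows copied from the output above (with the node list of ``main`` abridged), so it runs without access to the cluster.

```shell
# Turn the A/I/O/T column of `sinfo -s` into labelled counts per partition.
summarise_nodes() {
    awk 'NR > 1 {
        split($4, n, "/")   # allocated / idle / other / total
        printf "%s allocated=%d idle=%d other=%d total=%d\n", $1, n[1], n[2], n[3], n[4]
    }'
}

# Two rows copied from the output above (node list of main abridged).
sample='PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
gpu up 1-00:00:00 49/9/4/62 nid[002792-002853]
main up 1-00:00:00 604/256/112/972 nid[001012-001531]'

printf '%s\n' "$sample" | summarise_nodes

# Total core count of main, using the 128 cores per node quoted above.
echo "main cores: $((972 * 128))"
```

On the cluster, the same function can be fed live data with ``sinfo -s | summarise_nodes``.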

A general sense of the current workload can be gained with the ``squeue`` command, which lists all jobs (running and queued):

.. code-block:: console

   $ squeue

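
The full ``squeue`` listing can be long, so a per-state summary is often more useful. The following is a small sketch under stated assumptions: ``count_states`` is a helper name invented here, and the three input lines are made-up sample data standing in for real queue contents.

```shell
# Count jobs per state; expects one state name per line on standard input.
count_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Made-up sample input standing in for real queue data.
printf 'RUNNING\nPENDING\nRUNNING\n' | count_states
```

On the cluster, ``squeue -h -o '%T' | count_states`` should feed it real data: ``-h`` suppresses the header and ``-o '%T'`` prints only the job state. Adding ``-u "$USER"`` restricts the listing to your own jobs.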
54 changes: 54 additions & 0 deletions _sources/folders_and_transfer.rst.txt
@@ -0,0 +1,54 @@
Move between folders, ls, transferring to/from local storage
===============================================================

Upon logging in, you should be in your "home" folder, as reported by the prompt:

.. code:: console

   fiusco@login1:~>

where ``login1`` is the name of the host (in this case, the login node) and ``~`` represents
the home folder. The full path of this directory can be printed using the ``pwd`` command
(**p**\ rint **w**\ orking **d**\ irectory):

.. code:: console

   $ pwd
   /cfs/klemming/home/f/fiusco

The contents of a directory can be listed with the ``ls`` command:

.. code-block:: console

   $ ls
   Private  Public  spack-user

The ``cd`` (**c**\ hange **d**\ irectory) command can be used to navigate the filesystem.
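
The three commands can be tried together without touching any real data. The sketch below works in a throwaway directory created with ``mktemp``; the folder names ``demo`` and ``data`` are arbitrary choices for this illustration.

```shell
# Work in a throwaway directory so nothing in the home folder is touched.
scratch="$(mktemp -d)"
cd "$scratch"

mkdir -p demo/data      # create a folder with a subfolder
cd demo                 # relative path: enter demo
ls                      # lists the single entry: data
cd data                 # go one level deeper
cd ..                   # ".." is the parent folder, so we are back in demo
here="$(basename "$PWD")"
echo "now in: $here"
```

Running ``cd`` without arguments (or ``cd ~``) returns to the home folder from anywhere.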

Files and folders can be moved to or from the cluster with the ``scp`` command, which is run locally
(i.e. not on the cluster):

.. code-block:: console

   $ scp [-r] /path/to/local/source <username>@dardel.pdc.kth.se:/path/to/destination

The optional ``-r`` flag requests recursive copying of whole folders and their contents.
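
The command above copies towards the cluster; swapping source and destination copies in the other direction. The sketch below only assembles and prints both variants as a dry run, so it can be executed anywhere. The username, the local folder names and the remote paths are placeholders, not real locations.

```shell
# Placeholder account; replace your_username with your own PDC username.
remote="your_username@dardel.pdc.kth.se"

# Local -> cluster: the remote side is the destination.
upload_cmd="scp -r ./dataset ${remote}:/path/to/destination"

# Cluster -> local: the remote side is the source.
download_cmd="scp -r ${remote}:/path/to/results ./results"

# Dry run: print the commands instead of executing them.
echo "$upload_cmd"
echo "$download_cmd"
```

Both commands are meant to be typed on the local machine, in line with the note above that ``scp`` is not run on the cluster.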

.. type-along::

In this workshop, our working folder will be in ``/cfs/klemming/projects/supr/testingsharedbus/``. You can create your own folder:

.. code-block:: console

   $ cd /cfs/klemming/projects/supr/bustestingshared
   $ mkdir my_name

We can now clone the repository containing the material for the workshop:

.. code-block:: console

   $ cd my_name
   $ git clone https://github.com/ENCCS/supercomputing4ai_demo
   $ cd supercomputing4ai_demo

37 changes: 37 additions & 0 deletions _sources/guide.rst.txt
@@ -0,0 +1,37 @@
Instructor's guide
==================

Why we teach this lesson
------------------------



Intended learning outcomes
--------------------------



Timing
------



Preparing exercises
-------------------

e.g. what to do the day before to set up common repositories.



Other practical aspects
-----------------------



Interesting questions you might get
-----------------------------------



Typical pitfalls
----------------
158 changes: 158 additions & 0 deletions _sources/index.rst.txt
@@ -0,0 +1,158 @@
Introduction to supercomputing for AI
=====================================

High performance computing (HPC) resources can be used to accelerate AI workflows. The EuroHPC Joint Undertaking (JU) offers
free access to such resources to SMEs as well as larger companies. In this hands-on lesson, you will learn:

* What an HPC resource is and how it differs from a cloud environment;
* Which HPC resources are available through the EuroHPC JU;
* How to connect to a cluster and explore its resources;
* How to run a demo AI workflow based on Singularity.



.. prereq::

   You will need credentials to access the `PDC <https://pdc.kth.se/>`__ cluster. A working SSH client is also needed:
   it is included on macOS and most Linux flavours, and is available on Windows in PowerShell or under the `Windows Subsystem for Linux (WSL) <https://learn.microsoft.com/en-us/windows/wsl/install>`__.





Who is the course for?
----------------------

This course is intended for data scientists who want to take advantage of higher computing power to run their workflows.
Some degree of familiarity with a command-line shell is recommended, but no expertise is required. No previous knowledge of
supercomputing environments is required.




About the course
----------------

We will train a U-Net model to recognise water in satellite pictures. The source code can be found in `this repository <https://github.com/ENCCS/supercomputing4ai_demo.git>`__.
The example is based on TensorFlow and will be run using `Singularity <https://docs.sylabs.io/guides/3.5/user-guide/introduction.html>`__.
The structure of the example is the following:

::

   ./supercomputing4ai_demo
   ├── build_singularity.def
   ├── images
   │   └── generated-images
   ├── models
   │   ├── serving
   │   │   └── main.py
   │   └── unet
   │       ├── data
   │       │   └── water
   │       │       ├── Images
   │       │       └── Masks
   │       ├── main.py
   │       └── result
   │           ├── models
   │           └── training
   └── README.md


The ``models`` subfolder contains the model to be trained (``unet``) and the inference code (``serving``). The training dataset is
under ``data``: ``Images`` holds satellite images and ``Masks`` holds the water-covered areas in those images. Running
``main.py`` in the ``unet`` folder trains a U-Net, producing a set of weights in the ``models/`` subfolder and training
statistics (binary cross-entropy loss and accuracy). Inference is then performed with the ``models/serving/main.py`` script, which takes
an image as input and generates a mask of the water-covered parts.

.. csv-table::
   :widths: auto
   :delim: ;

   20 min ; :doc:`supercomputer_why`
   20 min ; :doc:`connect_to_cluster`
   20 min ; :doc:`folders_and_transfer`
   20 min ; :doc:`software_modules`
   20 min ; :doc:`sbatch_singularity`


.. toctree::
   :maxdepth: 1
   :caption: The lesson

   supercomputer_why
   connect_to_cluster
   folders_and_transfer
   software_modules
   sbatch_singularity



.. toctree::
   :maxdepth: 1
   :caption: Reference

   quick-reference
   guide



See also
--------

Further introductory material can be found on the `Introduction to LUMI <https://lumi-supercomputer.github.io/lumi-self-learning/>`__ and
`HPC carpentry <https://carpentries-incubator.github.io/hpc-intro/>`__ pages.



Credits
-------

The lesson file structure and browsing layout are inspired by and derived from
`work <https://github.com/coderefinery/sphinx-lesson>`__ by `CodeRefinery
<https://coderefinery.org/>`__ licensed under the `MIT license
<http://opensource.org/licenses/mit-license.html>`__. We have copied and adapted
most of their license text.

Instructional Material
^^^^^^^^^^^^^^^^^^^^^^

This instructional material is made available under the
`Creative Commons Attribution license (CC-BY-4.0) <https://creativecommons.org/licenses/by/4.0/>`__.
The following is a human-readable summary of (and not a substitute for) the
`full legal text of the CC-BY-4.0 license
<https://creativecommons.org/licenses/by/4.0/legalcode>`__.
You are free to:

- **share** - copy and redistribute the material in any medium or format
- **adapt** - remix, transform, and build upon the material for any purpose,
  even commercially.

The licensor cannot revoke these freedoms as long as you follow these license terms:

- **Attribution** - You must give appropriate credit (mentioning that your work
  is derived from work that is Copyright (c) ENCCS and individual contributors and, where practical, linking
  to `<https://enccs.github.io/sphinx-lesson-template>`_), provide a `link to the license
  <https://creativecommons.org/licenses/by/4.0/>`__, and indicate if changes were
  made. You may do so in any reasonable manner, but not in any way that suggests
  the licensor endorses you or your use.
- **No additional restrictions** - You may not apply legal terms or
  technological measures that legally restrict others from doing anything the
  license permits.

With the understanding that:

- You do not have to comply with the license for elements of the material in
  the public domain or where your use is permitted by an applicable exception
  or limitation.
- No warranties are given. The license may not give you all of the permissions
  necessary for your intended use. For example, other rights such as
  publicity, privacy, or moral rights may limit how you use the material.


Software
^^^^^^^^

Except where otherwise noted, the example programs and other software provided
with this repository are made available under the `OSI <http://opensource.org/>`__-approved
`MIT license <https://opensource.org/licenses/mit-license.html>`__.
2 changes: 2 additions & 0 deletions _sources/quick-reference.rst.txt
@@ -0,0 +1,2 @@
Quick Reference
===============