generated from ENCCS/sphinx-lesson-template
commit 2cf94ab
Showing 525 changed files with 80,713 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
Connecting to an HPC resource | ||
============================== | ||
|
||
|
||
If you use SSH keys, then once the keys have been created and uploaded to the PDC interface, logging in to the cluster is as simple as: | ||
|
||
.. code-block:: console | ||
$ ssh -Y <username>@dardel.pdc.kth.se | ||
This should log you in to the PDC supercomputer. The ``-Y`` flag enables trusted X11 forwarding, so that graphical windows can be opened from the supercomputer, e.g. | ||
to view images. This works only if an X server is running on your local machine (on Linux/WSL, this is most likely the case). | ||
Alternatively, you may choose to use Kerberos as an authentication method. To do that, you first need to ask for a Kerberos ticket: | ||
|
||
.. code-block:: console | ||
$ kinit -f <username>@NADA.PDC.KTH.SE | ||
After that, the SSH command looks like the following: | ||
|
||
.. code-block:: console | ||
$ ssh -o GSSAPIAuthentication=yes -Y <username>@dardel.pdc.kth.se | ||
More information about Kerberos can be found in the `PDC login configuration documentation <https://www.pdc.kth.se/support/documents/login/configuration.html>`__. | ||
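To avoid retyping these options at every login, they can be stored in the local SSH client configuration. The following is a minimal sketch for ``~/.ssh/config``, assuming an OpenSSH client; the alias ``dardel`` is a hypothetical name chosen here, and ``<username>`` stands for your PDC username. The ``GSSAPIAuthentication`` line is only relevant when Kerberos is used.

```text
# ~/.ssh/config (sketch; the alias "dardel" is an arbitrary choice)
Host dardel
    HostName dardel.pdc.kth.se
    User <username>
    # Equivalent of the -Y flag: trusted X11 forwarding
    ForwardX11 yes
    ForwardX11Trusted yes
    # Only needed for Kerberos logins
    GSSAPIAuthentication yes
```

With such an entry in place, ``ssh dardel`` should behave like the longer commands shown above.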
|
||
.. type-along:: | ||
|
||
Let us check which node we ended up on. The name of the machine can be printed with the ``hostname`` command: | ||
|
||
.. code-block:: console | ||
$ hostname | ||
We can get a sense of the size of Dardel by using the ``sinfo`` command: | ||
|
||
.. code-block:: console | ||
$ sinfo -s | ||
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST | ||
gpu up 1-00:00:00 49/9/4/62 nid[002792-002853] | ||
main up 1-00:00:00 604/256/112/972 nid[001012-001531,001756-001816,001818-001819,001821-001896,001898-002007,002009-002023,002552-002567,002588-002759] | ||
scania up 4-00:00:00 22/187/15/224 nid[001532-001755] | ||
scania-hf up 4-00:00:00 0/3/1/4 nid[000011-000014] | ||
memory up 7-00:00:00 34/8/0/42 nid[000101-000118,001772-001779,002552-002567] | ||
shared up 7-00:00:00 27/5/0/32 nid[001000-001011,002568-002587] | ||
long up 7-00:00:00 76/4/0/80 nid[001800-001819,002588-002647] | ||
eggnog up 7-00:00:00 4/0/0/4 nid[002536-002539] | ||
supernova up 14-00:00:0 5/6/5/16 nid[001817,001820,001897,002008,002540-002551] | ||
For example, the ``main`` partition has 972 nodes in total (the ``T`` in ``NODES(A/I/O/T)``), each containing 128 cores. | ||
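The slash-separated node counts can also be split up programmatically. The sketch below uses plain shell and ``awk``; the function name ``summarise_nodes`` is hypothetical, and the input line is copied from the ``sinfo -s`` output above (with the node list shortened) rather than queried live. It assumes that the fourth column holds the four counts, which in Slurm stand for allocated, idle, other and total nodes.

```shell
# Print the node counts of each partition in a labelled form.
# Reads lines in "sinfo -s" layout on standard input.
summarise_nodes() {
    awk '{
        split($4, n, "/")   # n[1..4] = allocated, idle, other, total
        printf "%s allocated=%s idle=%s other=%s total=%s\n", $1, n[1], n[2], n[3], n[4]
    }'
}

# Sample line taken from the output above (node list shortened).
printf '%s\n' 'main up 1-00:00:00 604/256/112/972 nid[001012-001531]' | summarise_nodes
# prints: main allocated=604 idle=256 other=112 total=972
```

On the cluster, the same function could be fed with ``sinfo -s | tail -n +2``, which skips the header line.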
|
||
A general sense of the current workload can be gained with the ``squeue`` command, which lists all running and queued jobs: | ||
|
||
.. code-block:: console | ||
$ squeue |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
Navigating folders and transferring files to/from local storage | ||
=============================================================== | ||
|
||
Upon logging in, you should be in your "home" folder, as reported by the prompt: | ||
|
||
.. code:: console | ||
fiusco@login1:~> | ||
where ``login1`` is the name of the host (in this case, the login node) and ``~`` represents | ||
the home folder. The full path of this directory can be printed using the ``pwd`` command | ||
(**p**\ rint **w**\ orking **d**\ irectory): | ||
|
||
.. code:: console | ||
$ pwd | ||
/cfs/klemming/home/f/fiusco | ||
The contents of a directory can be listed with the ``ls`` command: | ||
|
||
.. code-block:: console | ||
$ ls | ||
Private Public spack-user | ||
The ``cd`` (**c**\ hange **d**\ irectory) command can be used to navigate the filesystem. | ||
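As a brief illustration, the sketch below moves around a throwaway directory tree. The folder names ``project`` and ``data`` are hypothetical and are created in a temporary location, so the commands can be tried anywhere without touching real files.

```shell
# Create a small scratch tree to navigate (hypothetical names).
demo_root=$(mktemp -d)
mkdir -p "$demo_root/project/data"

cd "$demo_root/project/data"     # absolute path: go straight to a folder
cd ..                            # ".." is the parent folder (project)
parent=$(basename "$PWD")
echo "$parent"                   # prints: project
cd data                          # relative path: descend into data again
child=$(basename "$PWD")
echo "$child"                    # prints: data
```

Running ``cd`` without any argument returns to the home folder.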
|
||
Files and folders can be copied to or from the cluster with the ``scp`` command, which is run on your local machine | ||
(i.e. not on the cluster): | ||
|
||
.. code-block:: console | ||
$ scp [-r] /path/to/local/source <username>@dardel.pdc.kth.se:/path/to/destination | ||
The optional ``-r`` flag copies whole folders and their contents recursively. | ||
|
||
.. type-along:: | ||
|
||
In this workshop, our working folder will be in ``/cfs/klemming/projects/supr/testingsharedbus/``. You can create your own folder: | ||
|
||
.. code-block:: console | ||
$ cd /cfs/klemming/projects/supr/bustestingshared | ||
$ mkdir my_name | ||
We can now clone the repository containing the material for the workshop: | ||
|
||
.. code-block:: console | ||
$ cd my_name | ||
$ git clone https://github.com/ENCCS/supercomputing4ai_demo | ||
$ cd supercomputing4ai_demo |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
Instructor's guide | ||
================== | ||
|
||
Why we teach this lesson | ||
------------------------ | ||
|
||
|
||
|
||
Intended learning outcomes | ||
-------------------------- | ||
|
||
|
||
|
||
Timing | ||
------ | ||
|
||
|
||
|
||
Preparing exercises | ||
------------------- | ||
|
||
e.g. what to do the day before to set up common repositories. | ||
|
||
|
||
|
||
Other practical aspects | ||
----------------------- | ||
|
||
|
||
|
||
Interesting questions you might get | ||
----------------------------------- | ||
|
||
|
||
|
||
Typical pitfalls | ||
---------------- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,158 @@ | ||
Introduction to supercomputing for AI | ||
===================================== | ||
|
||
High performance computing (HPC) resources can be used to accelerate AI workflows. The EuroHPC Joint Undertaking (JU) offers | ||
free access to such resources to SMEs as well as larger companies. In this hands-on lesson, you will learn: | ||
|
||
* What an HPC resource is and how it differs from a cloud environment; | ||
* Which HPC resources are available through the EuroHPC JU; | ||
* How to connect to a cluster and explore resources; | ||
* How to run a demo AI workflow based on Singularity. | ||
|
||
|
||
|
||
.. prereq:: | ||
|
||
You will need to have credentials to access the `PDC <https://pdc.kth.se/>`__ cluster. A working SSH client is needed: | ||
it is included on macOS and most Linux flavours; it is also available on Windows in PowerShell or under the `Windows Subsystem for Linux (WSL) <https://learn.microsoft.com/en-us/windows/wsl/install>`__. | ||
|
||
|
||
|
||
|
||
|
||
Who is the course for? | ||
---------------------- | ||
|
||
This course is intended for data scientists who want to take advantage of higher computing power to run their workflows. | ||
Some degree of familiarity with a command-line shell is recommended, but no expertise is required. No previous knowledge of | ||
supercomputing environments is required. | ||
|
||
|
||
|
||
|
||
About the course | ||
---------------- | ||
|
||
We will train a U-Net model to recognise water in satellite pictures. The source code can be found in the `supercomputing4ai_demo <https://github.com/ENCCS/supercomputing4ai_demo.git>`__ | ||
repository. The example is based on TensorFlow and will be run using `Singularity <https://docs.sylabs.io/guides/3.5/user-guide/introduction.html>`__. | ||
The structure of the example is the following: | ||
|
||
:: | ||
|
||
./supercomputing4ai_demo | ||
├── build_singularity.def | ||
├── images | ||
│ └── generated-images | ||
├── models | ||
│ ├── serving | ||
│ │ └── main.py | ||
│ └── unet | ||
│ ├── data | ||
│ │ └── water | ||
│ │ ├── Images | ||
│ │ └── Masks | ||
│ ├── main.py | ||
│ └── result | ||
│ ├── models | ||
│ └── training | ||
└── README.md | ||
|
||
|
||
The ``models`` subfolder contains the model to be trained (``unet``) and the inference code (``serving``). The training dataset is | ||
found under ``data``: ``Images`` holds the satellite images and ``Masks`` holds the water-covered areas in those images. Running | ||
``main.py`` in the ``unet`` folder trains a U-Net, producing a set of weights in the ``models/`` subfolder together with training | ||
statistics (binary cross-entropy loss and accuracy). Inference is then performed with the ``models/serving/main.py`` script, which takes | ||
an image as input and generates a mask of the water-covered parts. | ||
|
||
.. csv-table:: | ||
:widths: auto | ||
:delim: ; | ||
|
||
20 min ; :doc:`supercomputer_why` | ||
20 min ; :doc:`connect_to_cluster` | ||
20 min ; :doc:`folders_and_transfer` | ||
20 min ; :doc:`software_modules` | ||
20 min ; :doc:`sbatch_singularity` | ||
|
||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
:caption: The lesson | ||
|
||
supercomputer_why | ||
connect_to_cluster | ||
folders_and_transfer | ||
software_modules | ||
sbatch_singularity | ||
|
||
|
||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
:caption: Reference | ||
|
||
quick-reference | ||
|
||
guide | ||
|
||
|
||
|
||
See also | ||
-------- | ||
|
||
Further introductory material can be found on the `Introduction to LUMI <https://lumi-supercomputer.github.io/lumi-self-learning/>`__ and | ||
`HPC carpentry <https://carpentries-incubator.github.io/hpc-intro/>`__ pages. | ||
|
||
|
||
|
||
Credits | ||
------- | ||
|
||
The lesson file structure and browsing layout are inspired by and derived from | ||
`work <https://github.com/coderefinery/sphinx-lesson>`__ by `CodeRefinery | ||
<https://coderefinery.org/>`__ licensed under the `MIT license | ||
<http://opensource.org/licenses/mit-license.html>`__. We have copied and adapted | ||
most of their license text. | ||
|
||
Instructional Material | ||
^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
This instructional material is made available under the | ||
`Creative Commons Attribution license (CC-BY-4.0) <https://creativecommons.org/licenses/by/4.0/>`__. | ||
The following is a human-readable summary of (and not a substitute for) the | ||
`full legal text of the CC-BY-4.0 license | ||
<https://creativecommons.org/licenses/by/4.0/legalcode>`__. | ||
You are free to: | ||
|
||
- **share** - copy and redistribute the material in any medium or format | ||
- **adapt** - remix, transform, and build upon the material for any purpose, | ||
even commercially. | ||
|
||
The licensor cannot revoke these freedoms as long as you follow these license terms: | ||
|
||
- **Attribution** - You must give appropriate credit (mentioning that your work | ||
is derived from work that is Copyright (c) ENCCS and individual contributors and, where practical, linking | ||
to `<https://enccs.github.io/sphinx-lesson-template>`_), provide a `link to the license | ||
<https://creativecommons.org/licenses/by/4.0/>`__, and indicate if changes were | ||
made. You may do so in any reasonable manner, but not in any way that suggests | ||
the licensor endorses you or your use. | ||
- **No additional restrictions** - You may not apply legal terms or | ||
technological measures that legally restrict others from doing anything the | ||
license permits. | ||
|
||
With the understanding that: | ||
|
||
- You do not have to comply with the license for elements of the material in | ||
the public domain or where your use is permitted by an applicable exception | ||
or limitation. | ||
- No warranties are given. The license may not give you all of the permissions | ||
necessary for your intended use. For example, other rights such as | ||
publicity, privacy, or moral rights may limit how you use the material. | ||
|
||
|
||
Software | ||
^^^^^^^^ | ||
|
||
Except where otherwise noted, the example programs and other software provided | ||
with this repository are made available under the `OSI <http://opensource.org/>`__-approved | ||
`MIT license <https://opensource.org/licenses/mit-license.html>`__. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Quick Reference | ||
=============== |