The VM was set up by the sysadmin (David R) following https://osc.github.io/ood-documentation/latest/installation/install-software.html, up to and including setting up SSL.
Sysadmin additions on top of that:
- copy SSH host keys in /etc/ssh from the old servers before they are rebuilt
- enable host-based authentication in /etc/ssh/ssh_config.d/00-chpc-config on the OOD servers (see the sketch after this list)
- add all cluster file system mounts
- install mariadb-server to provide resolveip, which OOD uses to find the compute node hostname (hostfromroute.sh)
- add all HPC scratch mounts
- set up passwordless SSH to all interactive nodes
- modify /etc/security/access.conf to add: +:ALL:LOCAL
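A minimal sketch of the host-based authentication stanza (the host pattern and exact options are assumptions; the real 00-chpc-config is site-specific):
Host *.chpc.utah.edu
    HostbasedAuthentication yes
    EnableSSHKeysign yes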
Some info on other sites' implementations is at https://discourse.openondemand.org/t/implementing-authentication-via-cas/34/9.
Build mod_auth_cas from source, based on https://linuxtut.com/en/69296a1f9b6bf93f076f/
$ yum install libcurl-devel pcre-devel
$ cd /usr/local/src
$ wget https://github.com/apereo/mod_auth_cas/archive/v1.2.tar.gz
$ tar xvzf v1.2.tar.gz
$ cd mod_auth_cas-1.2
$ autoreconf -iv
$ ./configure --with-apxs=/usr/bin/apxs
$ make
$ make check
$ make install
or install_scripts/build_cas.sh
Further setup of CAS
# mkdir -p /var/cache/httpd/mod_auth_cas
# chown apache:apache /var/cache/httpd/mod_auth_cas
# chmod a+rX /var/cache/httpd/mod_auth_cas
# vi /etc/httpd/conf.d/auth_cas.conf
LoadModule auth_cas_module modules/mod_auth_cas.so
CASCookiePath /var/cache/httpd/mod_auth_cas/
CASCertificatePath /etc/pki/tls/certs/ca-bundle.crt
CASLoginURL https://go.utah.edu/cas/login
CASValidateURL https://go.utah.edu/cas/serviceValidate
or install_scripts/setup_cas.sh
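The production ood_portal.yml copied in the next step should already carry the CAS settings; for reference, a CAS auth block in ood_portal.yml typically looks along these lines (a sketch, per the discourse thread above):
auth:
  - 'AuthType CAS'
  - 'Require valid-user'
  - 'CASScope /'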
OOD base config files:
# cd /etc/ood/config
# cp ood_portal.yml ood_portal.yml.org
# scp [email protected]:/etc/ood/config/ood_portal.yml .
OR # wget https://raw.githubusercontent.com/CHPC-UofU/OnDemand-info/master/config/ood_portal.yml
# vi ood_portal.yml
- search for "ondemand.chpc.utah.edu", replace with "ondemand-test.chpc.utah.edu"
- for ondemand-test, set the Google Analytics id: 'UA-122259839-4'
- copy the SSLCertificate part from ood_portal.yml.org
- comment out the line " - 'Include "/root/ssl/ssl-standard.conf"'"
Update the Apache configuration and restart it:
# /opt/ood/ood-portal-generator/sbin/update_ood_portal
# systemctl try-restart httpd.service htcacheclean.service
Once this is done, one should be able to log in to https://ondemand-test.chpc.utah.edu and see the vanilla OOD interface.
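A quick smoke test from a shell (with mod_auth_cas in place, an unauthenticated request should get a 302 redirect to the CAS login page):
$ curl -kI https://ondemand-test.chpc.utah.edu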
Set up the Apache event MPM, mainly for performance reasons when there are tens of simultaneous users or more:
# vi /etc/httpd/conf.modules.d/00-mpm.conf
LoadModule mpm_event_module modules/mod_mpm_event.so
<IfModule mpm_event_module>
ServerLimit 32
StartServers 2
MaxRequestWorkers 512
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 32
MaxRequestsPerChild 0
ThreadLimit 512
ListenBacklog 511
</IfModule>
Check Apache config syntax:
# /sbin/httpd -t
Then restart Apache:
# systemctl try-restart httpd.service htcacheclean.service
Check that the Server MPM is event:
# /sbin/httpd -V
or install_scripts/check_apache_config.sh
$ sudo dnf install munge-devel munge munge-libs
$ sudo rsync -av kingspeak1:/etc/munge/ /etc/munge/
$ sudo systemctl enable munge
$ sudo systemctl start munge
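To verify that munge works and that the key matches the cluster (standard munge checks; kingspeak1 as in the rsync above):
$ munge -n | unmunge
$ munge -n | ssh kingspeak1 unmunge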
scp -r [email protected]:/etc/ood/config/clusters.d /etc/ood/config
- !!!! in all /etc/ood/config/clusters.d/*.yml replace ondemand.chpc.utah.edu with ondemand-test.chpc.utah.edu (one-liner below)
- !!!! may replace websockify/0.8.0 with websockify/0.8.0.r8
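A one-liner for the hostname replacement in the first item:
# sed -i 's/ondemand\.chpc\.utah\.edu/ondemand-test.chpc.utah.edu/g' /etc/ood/config/clusters.d/*.yml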
Logo images
# scp -r [email protected]:/var/www/ood/public/CHPC-logo35.png /var/www/ood/public
# scp -r [email protected]:/var/www/ood/public/chpc_logo_block.png /var/www/ood/public
# scp -r [email protected]:/var/www/ood/public/CHPC-logo.png /var/www/ood/public
Locales
# mkdir -p /etc/ood/config/locales/
# scp -r [email protected]:/etc/ood/config/locales/en.yml /etc/ood/config/locales/
Dashboard, incl. logos, quota warnings,...
# mkdir -p /etc/ood/config/apps/dashboard/initializers/
# scp -r [email protected]:/etc/ood/config/apps/dashboard/initializers/ood.rb /etc/ood/config/apps/dashboard/initializers/
# scp -r [email protected]:/etc/ood/config/apps/dashboard/env /etc/ood/config/apps/dashboard
Test disk quota
# vi /etc/ood/config/apps/dashboard/env
Temporarily set OOD_QUOTA_THRESHOLD="0.10", then do a Restart Web Server in the OOD web interface and verify that the quota warnings appear.
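For reference, the quota-related lines in the env file look along these lines (the quota file path here is an assumption; the copied env file has the real values):
OOD_QUOTA_PATH="/uufs/chpc.utah.edu/sys/ondemand/quota.json"
OOD_QUOTA_THRESHOLD="0.10"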
Active jobs environment
# mkdir -p /etc/ood/config/apps/activejobs
# scp -r [email protected]:/etc/ood/config/apps/activejobs/env /etc/ood/config/apps/activejobs
Base apps configs
# scp -r [email protected]:/etc/ood/config/apps/bc_desktop /etc/ood/config/apps/
# scp -r [email protected]:/etc/ood/config/apps/shell /etc/ood/config/apps/
# scp [email protected]:/var/www/ood/apps/sys/shell/bin/ssh /var/www/ood/apps/sys/shell/bin/
Announcements, XDMoD
# scp -r [email protected]:/etc/ood/config/announcement.md.motd /etc/ood/config/
# scp -r [email protected]:/etc/ood/config/nginx_stage.yml /etc/ood/config/
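The XDMoD part of nginx_stage.yml amounts to pointing the per-user NGINX environment at the XDMoD host, roughly (host name assumed; the copied file has the real value):
pun_custom_env:
  OOD_XDMOD_HOST: "https://xdmod.chpc.utah.edu"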
Widgets/pinned apps
# mkdir /etc/ood/config/ondemand.d/
# scp -r [email protected]:/etc/ood/config/ondemand.d/ondemand.yml /etc/ood/config/ondemand.d/
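For reference, the pinned apps in ondemand.d/ondemand.yml follow OOD's documented schema, roughly (the app list here is made up; the copied file has CHPC's actual one):
pinned_apps:
  - sys/shell
  - sys/bc_desktop
pinned_apps_group_by: category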
SLURM job templates
# mkdir -p /etc/ood/config/apps/myjobs
# ln -s /uufs/chpc.utah.edu/sys/ondemand/chpc-myjobs-templates /etc/ood/config/apps/myjobs/templates
OR install_scripts/get_customizations.sh
# /uufs/chpc.utah.edu/sys/ondemand/chpc-apps/update.sh
# cd /var/www/ood/apps/sys
# mkdir org
# mv bc_desktop/ org
# cd /var/www/ood/apps
# ln -s /uufs/chpc.utah.edu/sys/ondemand/chpc-apps/app-templates templates
# cd /var/www/ood/apps/templates
# source /etc/profile.d/chpc.sh
# ./genmodulefiles.sh
OR install_scripts/get_apps.sh
(NB - modules are set up differently; don't run ./genmodulefiles.sh)
Restart the web server in the client to see all the Interactive Apps. If they appear, proceed to testing the apps, including the check cluster status app.
Described in CHPC OOD's readme and below, this setup involves modification of /etc/ood/config/apps/dashboard/initializers/ood.rb to read in the cluster information, which is then used/parsed in the interactive apps (mainly form.yml.erb and form.js).
Supporting infrastructure includes a script that produces a text file listing the GPUs and partitions. The user accounts/partitions list is curled from the portal via a cron job that runs on the ondemand server.
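A sketch of what such a cron entry could look like (the URL, schedule, and output path are all made up for illustration):
0 * * * * curl -s https://portal.chpc.utah.edu/api/user_partitions > /var/www/ood/apps/templates/user_partitions.txt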
Display node status for each node, e.g. for notchpeak. See that URL for a description of the cron jobs that run and what they produce where. A cron job on notchrm runs getmodules.sh once a day to generate the file /uufs/chpc.utah.edu/sys/ondemand/chpc-apps/app-templates/modules/notchpeak.json, which is then symlinked to /var/www/ood/apps/templates/modules/notchpeak.json. As each cluster requires its own json file, the other clusters' files are symlinks to notchpeak.json (incl. redwood.json, as the PE uses a copy of the sys branch from the GE).
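That is, the per-cluster symlinks amount to something like (cluster names other than notchpeak and redwood assumed):
# cd /uufs/chpc.utah.edu/sys/ondemand/chpc-apps/app-templates/modules
# ln -s notchpeak.json kingspeak.json
# ln -s notchpeak.json redwood.json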
vi /etc/ood/config/ondemand.d/ondemand.yml.erb
# single endpoint for all file systems (home, scratch, group)
globus_endpoints:
  - path: "/"
    endpoint: "7cf0baa1-8bd0-4e91-a1e6-c19042952a7c"
    endpoint_path: "/"
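If per-file-system endpoints were ever needed instead, the same key takes one list entry per path, e.g. (hypothetical paths and endpoint UUIDs):
globus_endpoints:
  - path: "/uufs/chpc.utah.edu/common/home"
    endpoint: "11111111-2222-3333-4444-555555555555"
    endpoint_path: "/"
  - path: "/scratch/general"
    endpoint: "66666666-7777-8888-9999-000000000000"
    endpoint_path: "/"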
Using OOD's built-in way to auto-set available module versions for interactive apps. Only in pe-ondemand, not in the GE.
# yum install ondemand-selinux
# setsebool -P ondemand_use_slurm=on
# getsebool -a | grep ondemand
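The last command should show the boolean enabled, along the lines of (other ondemand_* booleans may be listed as well):
ondemand_use_slurm --> on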
!!!! Netdata webserver monitoring
- Dashboard allocation balance warnings: https://osc.github.io/ood-documentation/latest/customization.html#balance-warnings-on-dashboard
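Per the linked docs, the balance warnings would amount to something like this in the dashboard env file (path and threshold are assumptions):
OOD_BALANCE_PATH="/uufs/chpc.utah.edu/sys/ondemand/balance.json"
OOD_BALANCE_THRESHOLD="1000"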