Node Status and Application Status

Table of Contents

1. Node status
2. Application status
3. nodestat

The xCAT nodelist table holds the node reachability status (status) and the application status (appstatus). To turn on status monitoring, run the following commands:

  **monadd xcatmon -n -s [ping-interval=5]**     (The default ping-interval is 3 minutes.)
  **monstart xcatmon**

To turn off the status monitoring, run:

  **monstop xcatmon**

1. Node status

xCAT currently uses fping to get the node status. On Linux, the plan is to switch to nmap (querying the ssh port) for performance reasons.

2. Application status

Format: app1=status1,app2=status2,...

Example: ssh="up",ll="down",gpfs="not working at all"
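As an illustration of the format above, the following minimal Python sketch splits an appstatus value into a dictionary. The function name `parse_appstatus` is hypothetical and not part of xCAT; the sketch assumes that status values are either bare tokens or wrapped in double quotes, as in the example.

```python
import re

# One app=status pair: the status is either double-quoted or a bare token.
_PAIR = re.compile(r'\s*([^=,\s]+)\s*=\s*(?:"([^"]*)"|([^,]*))\s*(?:,|$)')


def parse_appstatus(text):
    """Parse 'app1=status1,app2=status2' into a dict (hypothetical helper)."""
    result = {}
    pos = 0
    while pos < len(text):
        match = _PAIR.match(text, pos)
        if match is None:
            raise ValueError("malformed appstatus near position %d" % pos)
        app = match.group(1)
        quoted, bare = match.group(2), match.group(3)
        # Quoted values keep their inner spaces; bare values are trimmed.
        result[app] = quoted if quoted is not None else bare.strip()
        pos = match.end()
    return result


if __name__ == "__main__":
    print(parse_appstatus('ssh="up",ll="down",gpfs="not working at all"'))
```

Quoting matters here because a status such as "not working at all" contains spaces; a quoted status that itself contains a comma is also handled, since the quoted branch is tried first.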

The basic idea is to use nmap to query the ports of the application daemons. If the port is open, the application is considered healthy. However, some applications may need further checking even though the port is open. For such applications, the user can supply a command (script) that checks the status. The input to the command is a comma-separated list of node names; the output is the application status on each given node. The output format is:

node1:status

node2:status

...

It can be a local command, or a command that will be run remotely on the nodes.
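A user-supplied check command that follows this contract might look like the sketch below. Everything specific in it is an assumption made for illustration: the names `port_open` and `report`, the "up"/"down" status strings, and the use of a plain TCP connect as the health check (port 5001 is borrowed from the ll example in the settings). A real script would replace the probe with whatever deeper check the application needs.

```python
import socket
import sys


def port_open(node, port, timeout=2.0):
    """Placeholder health check: True if a TCP connect to node:port succeeds."""
    try:
        with socket.create_connection((node, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(nodes_arg, port, probe=port_open):
    """Return 'node:status' lines for a comma-separated list of node names."""
    nodes = [n.strip() for n in nodes_arg.split(",") if n.strip()]
    # "up" and "down" are illustrative status strings, not xCAT-defined values.
    return ["%s:%s" % (n, "up" if probe(n, port) else "down") for n in nodes]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: mycmd node1,node2
    print("\n".join(report(sys.argv[1], 5001)))
```

The probe is passed in as a parameter so that the node-list parsing and output formatting can be exercised without touching the network.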

Settings:

Table monsetting:

  name      key      value
  xcatmon   apps     ssh,ll,gpfs,someapp
  xcatmon   gpfs     cmd=/tmp/mycmd,group=compute,group=service
  xcatmon   ll       port=5001,group=compute
  xcatmon   someapp  rmccondname=xxxx,group=all

Keywords:

apps --- a comma-separated list of application names whose status will be queried. To find out how the status of each app is obtained, look for the app name in the key field of a different row.

port --- the application port number. If not specified, the internal list is used, then /etc/services. If there is no key specified for an app, "port" and "group=all" are assumed.

group --- the name of a node group whose application status is to be collected. If not specified, all the nodes in the nodelist table are assumed.

cmd --- the command will be run locally on the management node (mn) or service node (sn).

dcmd --- the command will be run distributed on the nodes (xdsh <nodes> ...).

rmccondname --- the RMC condition name. xCAT first needs to associate the condition with the LogEventToxCATDatabase response, then go to the eventlog table and get the events since the last observation. (This has not been implemented yet.)
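To make the keyword rules concrete, here is a minimal Python sketch that splits one monsetting value into its keywords and applies the defaults described above. The function `parse_app_setting`, the `METHODS` tuple and the returned dictionary layout are hypothetical and only illustrate the rules; this is not how xCAT itself reads the table. The sketch assumes that a command given with cmd or dcmd contains no commas.

```python
# Keywords that select how the status of an app is obtained.
METHODS = ("port", "cmd", "dcmd", "rmccondname")


def parse_app_setting(value):
    """Split a value such as 'port=5001,group=compute' (hypothetical helper)."""
    setting = {"groups": []}
    for item in value.split(","):
        key, _, val = item.strip().partition("=")
        if not key:
            continue
        if key == "group":
            # "group" may be repeated, so collect every occurrence.
            setting["groups"].append(val)
        else:
            # A keyword without a value (for example a bare "port") maps to None.
            setting[key] = val or None
    if not any(method in setting for method in METHODS):
        # Default method is a port check; None means "look the port up".
        setting["port"] = None
    if not setting["groups"]:
        # Default scope is all nodes.
        setting["groups"] = ["all"]
    return setting


if __name__ == "__main__":
    print(parse_app_setting("cmd=/tmp/mycmd,group=compute,group=service"))
```

For the gpfs row in the table above, this yields the command /tmp/mycmd and the groups compute and service; an empty value falls back to a port check on group all.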

3. nodestat

New flags for the nodestat command:

nodestat <nodelist> -u|--updatedb -m|--usemon

The command displays the node status and application status. It can also write the status to the nodelist table.

By default, it works as before, that is:

1. gets the ssh,pbs,xend port status;

2. if none of them are open, it gets the fping status;

3. for pingable nodes that are in the middle of deployment, it gets the deployment status;

4. for non-pingable nodes, it shows 'no ping'.

When -m is specified and there are settings in the monsetting table, it displays the status of the applications specified in the monsetting table. When -u is specified, it saves the status information into the xCAT database. The node's pingable status and deployment status are saved in the nodelist.status column. The node's application status is saved in the nodelist.appstatus column.
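The default decision order (steps 1 to 4 above) can be sketched as a small Python function. This is only an illustration of the ordering: the function name `default_status`, its arguments, the comma-joined display of open ports and the "ping" string for a pingable node that is not deploying are all assumptions, since the text does not specify them.

```python
def default_status(open_ports, pingable, deploy_status=None):
    """Pick the status to display for one node (hypothetical helper).

    open_ports    -- names among ssh, pbs, xend whose port answered
    pingable      -- result of the fping check
    deploy_status -- deployment status if the node is being deployed, else None
    """
    if open_ports:
        # Step 1: at least one service port is open.
        return ",".join(open_ports)
    if not pingable:
        # Step 4: no open port and no ping reply.
        return "no ping"
    if deploy_status:
        # Step 3: pingable node in the middle of deployment.
        return deploy_status
    # Assumed fallback for a pingable node with no open port.
    return "ping"


if __name__ == "__main__":
    print(default_status([], True, "installing"))
```

The fping result is consulted only when no service port is open, which matches step 2.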
