
Node Status and Application Status

ligc edited this page Jul 30, 2015 · 6 revisions



The xCAT nodelist table holds the node reachability status (status column) and the application status (appstatus column). To turn on status monitoring, run the following commands:

  **monadd xcatmon -n -s [ping-interval=5]**
  **monstart xcatmon**

The ping-interval is given in minutes; the default is 3 minutes.

To turn off the status monitoring, run:

  **monstop xcatmon**

1. Node status

xCAT currently uses fping to get the node status. On Linux, we plan to switch to nmap (querying the ssh port) for performance reasons.
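Querying the ssh port comes down to checking whether the node accepts a TCP connection on that port. The Python sketch below illustrates the idea with a plain TCP connect probe instead of nmap itself; the function names, the 1-second timeout, and the thread-pool fan-out across nodes are assumptions for illustration, not xCAT's implementation.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def port_open(host, port=22, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Illustrative stand-in for an nmap port query (assumption: a full TCP
    connect is an acceptable approximation here).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, unknown host, ...
        return False


def ssh_status(nodes, port=22, timeout=1.0, workers=32):
    """Probe many nodes in parallel; returns {node: True/False}."""
    nodes = list(nodes)  # materialize so it can be zipped with the results
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda node: port_open(node, port, timeout), nodes)
        return dict(zip(nodes, results))
```

Probing in parallel keeps the total time close to one timeout rather than one timeout per unreachable node, which is the kind of saving the switch away from per-node checks is after.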

2. Application status

Format: app1=status1,app2=status2,... For example: ssh="up",ll="down",gpfs="not working at all"
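A minimal Python sketch of building and reading the appstatus string. It assumes each status value is either double-quoted, as in the example, or unquoted and free of commas; format_appstatus and parse_appstatus are hypothetical helper names.

```python
import re

# One name=value pair. The value is either "double-quoted" (may contain
# spaces or commas) or an unquoted run of characters up to the next comma.
_PAIR = re.compile(r'\s*([^=,]+?)\s*=\s*(?:"([^"]*)"|([^,]*))\s*(?:,|$)')


def format_appstatus(statuses):
    """{"ssh": "up"} -> 'ssh="up"' (assumes statuses contain no double quotes)."""
    return ",".join(f'{app}="{status}"' for app, status in statuses.items())


def parse_appstatus(text):
    """'ssh="up",ll="down"' -> {"ssh": "up", "ll": "down"}."""
    text = text.strip()
    result, pos = {}, 0
    while pos < len(text):
        m = _PAIR.match(text, pos)
        if m is None:
            raise ValueError(f"malformed appstatus near {text[pos:]!r}")
        quoted, bare = m.group(2), m.group(3)
        result[m.group(1)] = quoted if quoted is not None else bare.strip()
        pos = m.end()
    return result
```

Always quoting on output keeps free-text statuses such as "not working at all" intact, while the parser still accepts the unquoted app1=status1 form.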

The basic idea is to use nmap to query the ports of the application daemons. If the port is open, the application is considered healthy. However, some applications may need further checking even when the port is open. For such applications, the user can supply a command (script) that checks the status. The input to the command is a comma-separated list of node names; the output is the application status on each given node, in this format:

  node1:status
  node2:status
  ...

The command can be run locally, or remotely on the nodes (see the cmd and dcmd keywords below).
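The command contract (comma-separated node names in, one node:status line per node out) can be sketched from the caller's side in Python. run_status_command and parse_status_output are hypothetical names, and passing the node list as the command's single argument is an assumption about how the list is delivered.

```python
import subprocess


def parse_status_output(text):
    """Parse "node:status" lines into {node: status}; other lines are skipped."""
    statuses = {}
    for line in text.splitlines():
        # Split on the first colon only, so a status may itself contain ':'.
        node, sep, status = line.strip().partition(":")
        if sep and node:
            statuses[node.strip()] = status.strip()
    return statuses


def run_status_command(argv, nodes, timeout=60):
    """Run a user-supplied status command on the local machine.

    argv  -- the command as a list, e.g. ["/tmp/mycmd"]
    nodes -- node names, passed as ONE comma-separated argument (assumption)
    """
    completed = subprocess.run(
        list(argv) + [",".join(nodes)],
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return parse_status_output(completed.stdout)
```

For example, run_status_command(["/tmp/mycmd"], ["node1", "node2"]) would invoke /tmp/mycmd node1,node2 and return a dict such as {"node1": "up", "node2": "down"}, depending on what the script prints.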

Settings:

Table monsetting:

  name      key      value
  xcatmon   apps     ssh,ll,gpfs,someapp
  xcatmon   gpfs     cmd=/tmp/mycmd,group=compute,group=service
  xcatmon   ll       port=5001,group=compute
  xcatmon   someapp  rmccondname=xxxx,group=all

Keywords:

**apps** --- a comma-separated list of the applications whose status will be queried. How to get the status of each application is specified in another row whose key is the application name.

**port** --- the application port number. If not specified, xCAT looks the port up in its internal list, then in /etc/services. If there is no row for an application, "port" and "group=all" are assumed.

**group** --- the name of a node group from which to get the application status. Several group entries may be given, as in the gpfs row above. If not specified, all the nodes in the nodelist table are assumed.

**cmd** --- a command that is run locally on the management node (MN) or service node (SN).

**dcmd** --- a command that is run distributed on the nodes themselves (xdsh <nodes> ...).

**rmccondname** --- the RMC condition name. xCAT first needs to associate the condition with the LogEventToxCATDatabase response, then reads the events logged in the eventlog table since the last observation. (This has not been implemented yet.)
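To make the defaults concrete, here is a hypothetical Python helper that turns monsetting rows into per-application settings. It follows the keyword rules listed here (no row means a port check on group all; several group entries accumulate), but it assumes values never contain commas and that each row names a single check method. It is a sketch, not xCAT's own parser.

```python
def resolve_app_settings(rows):
    """Turn monsetting rows (name, key, value) into per-application settings.

    Returns {app: {"method": ..., "value": ..., "groups": [...]}}.
    """
    # Keep only the xcatmon rows, keyed by the "key" column.
    xcatmon = {key: value for name, key, value in rows if name == "xcatmon"}
    apps = [a.strip() for a in xcatmon.get("apps", "").split(",") if a.strip()]

    settings = {}
    for app in apps:
        # Default: port check (port looked up internally / in /etc/services).
        entry = {"method": "port", "value": None, "groups": []}
        # Assumption: no value contains a comma (e.g. no "cmd=/x a,b").
        for item in xcatmon.get(app, "").split(","):
            key, _, val = item.partition("=")
            key, val = key.strip(), val.strip()
            if key == "group":
                entry["groups"].append(val)
            elif key in ("port", "cmd", "dcmd", "rmccondname"):
                entry["method"] = key
                entry["value"] = val or None
        if not entry["groups"]:
            entry["groups"] = ["all"]  # default: all nodes
        settings[app] = entry
    return settings
```

Applied to the example table, ssh (which has no row of its own) resolves to a port check on group all, while gpfs resolves to cmd=/tmp/mycmd on groups compute and service.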

3. nodestat

New flags for the nodestat command:

nodestat <nodelist> -u|--updatedb -m|--usemon

It displays the node status and the application status, and can also write the status to the nodelist table.

By default, it works as before, that is:

1. gets the status of the ssh, pbs, and xend ports;

2. if none of them are open, it gets the fping status;

3. for pingable nodes that are in the middle of deployment, it gets the deployment status;

4. for non-pingable nodes, it shows 'no ping'.
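The default decision order can be written as a small Python function. The probes are passed in as callables (hypothetical stand-ins for the real port, fping, and deployment-status checks). Reporting the names of the open ports, and "ping" for a pingable node that is not being deployed, are assumptions about the output, not nodestat's exact strings.

```python
from typing import Callable, Optional

DEFAULT_PORTS = ("ssh", "pbs", "xend")


def default_status(node: str,
                   port_open: Callable[[str, str], bool],
                   pingable: Callable[[str], bool],
                   deploy_status: Callable[[str], Optional[str]]) -> str:
    """Sketch of the default nodestat order: ports, then ping, then deployment."""
    # Step 1: check the ssh, pbs and xend ports.
    open_ports = [p for p in DEFAULT_PORTS if port_open(node, p)]
    if open_ports:
        return ",".join(open_ports)
    # Step 2: none open, so fall back to a ping check.
    if pingable(node):
        # Step 3: pingable and mid-deployment -> report the deployment status.
        # Assumption: otherwise just report "ping".
        return deploy_status(node) or "ping"
    # Step 4: not pingable.
    return "no ping"
```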

When -m is specified and the monsetting table contains settings, nodestat displays the status of the applications listed in the monsetting table. When -u is specified, it saves the status information into the xCAT database: the node's ping and deployment status are saved in the nodelist.status column, and the node's application status is saved in the nodelist.appstatus column.
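A sketch of the -u bookkeeping: each node's ping/deployment status maps to nodelist.status, and its application statuses are flattened into the app="status" form for nodelist.appstatus. nodelist_updates and its input shape are hypothetical; the helper only builds the column values and does not write to the database.

```python
def nodelist_updates(results):
    """Build nodelist column values from nodestat-style results.

    results -- {node: (status, {app: app_status})}, a hypothetical shape.
    Returns {node: {"status": ..., "appstatus": ...}}; appstatus is set only
    when application statuses were collected (e.g. with -m).
    """
    updates = {}
    for node, (status, apps) in results.items():
        row = {"status": status}
        if apps:
            # Same app="status",... form as the appstatus format above.
            row["appstatus"] = ",".join(f'{app}="{s}"' for app, s in apps.items())
        updates[node] = row
    return updates
```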
