XCAT_Monitoring_Failure_Analysis_Conditions_Batched_Events
## Integrate xCAT with event logging software
The event logging software is closely tied to xCAT. It stores its events in its own table within xCAT's database. xCAT's configuration and setup for the event logging software includes the following:
- xCAT's external table support allows third-party software to add its own tables to xCAT's database.
- xCAT helps set up the management domain for the RMC monitoring used by the event logging software.
- xCAT helps it create conditions, responses, and sensors on the management node (MN), the service nodes (SNs), and the compute nodes.

To consolidate all the events into one table, xCAT will also provide responses that write the events from the conditions monitored by xCAT into the same event log table used by the event logging software.
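To illustrate the consolidation, the following Python sketch uses an in-memory SQLite database as a stand-in for xCAT's database. The table name `eventlog`, its columns, and the functions `record_event`, `xcat_response`, and `events_for` are hypothetical. In practice the table is defined by the event logging software through xCAT's external table support, and the xCAT response would be an RMC response script rather than a Python function.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema for the shared event log table. The real table and
# columns are defined by the event logging software via xCAT's external
# table support.
SCHEMA = """
CREATE TABLE IF NOT EXISTS eventlog (
    eventid   INTEGER PRIMARY KEY AUTOINCREMENT,
    source    TEXT NOT NULL,
    node      TEXT NOT NULL,
    condname  TEXT NOT NULL,
    message   TEXT
)
"""


def open_eventlog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the stand-in event log database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_event(conn: sqlite3.Connection, source: str, node: str,
                 condname: str, message: str) -> None:
    """Insert one event row; `source` tells which component produced it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO eventlog (source, node, condname, message) "
            "VALUES (?, ?, ?, ?)",
            (source, node, condname, message),
        )


def xcat_response(conn: sqlite3.Connection, node: str,
                  condname: str, message: str) -> None:
    """Hypothetical xCAT response: writes an event from an xCAT-monitored
    condition into the same table the event logging software uses."""
    record_event(conn, "xcat", node, condname, message)


def events_for(conn: sqlite3.Connection, node: str) -> List[Tuple[str, str, str]]:
    """All events for one node, regardless of which component logged them."""
    cur = conn.execute(
        "SELECT source, condname, message FROM eventlog "
        "WHERE node = ? ORDER BY eventid",
        (node,),
    )
    return cur.fetchall()
```

Because both writers share one table, a single query per node returns the full event history without merging separate logs.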
## HW service events collected from HMCs to EMS
The SFP connector of the event logging software collects serviceable events from an HMC. xCAT will configure each HMC to send its events to the MN, where the SFP connector captures them. We need to reliably get all of the hardware service events from the HMCs to the EMS. Specifically, we don't want to miss events in the following situations:

1. when the MN is down for a period of time
2. when we fail over to the backup MN
3. when an HMC goes down
4. when an HMC fails over to the backup HMC (if we even support this)

The proposed approach is to use the RMC batched event capability: create a condition on each HMC that monitors the sensor and tells ERRM to batch up the events in batch files on the HMC (probably with a short batching interval so that service events reach the EMS quickly). The EMS then needs code that monitors the conditions on the HMCs and retrieves each new batch file when it is created. It also needs to record which batch files it has retrieved, so that when the EMS first comes (back) up, it can compare its records with the batch files on the HMCs and retrieve any that have not been retrieved yet.
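The bookkeeping the EMS needs can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the xCAT implementation: `RetrievalLedger`, `catch_up`, and the JSON ledger file are hypothetical names; the functions that list and copy batch files on an HMC are passed in as callables because the transport is not specified here; and batch file names are assumed to sort in creation order.

```python
import json
import os
from typing import Callable, Dict, Iterable, List, Set, Tuple


class RetrievalLedger:
    """Records, per HMC, which batch files the EMS has already retrieved.

    The record is kept in a JSON file so that it survives an EMS restart.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.done: Dict[str, Set[str]] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.done = {hmc: set(names) for hmc, names in json.load(f).items()}

    def pending(self, hmc: str, listed: Iterable[str]) -> List[str]:
        """Batch files listed on `hmc` that have not been retrieved yet,
        oldest first (assumes names sort in creation order)."""
        seen = self.done.get(hmc, set())
        return sorted(name for name in listed if name not in seen)

    def mark_retrieved(self, hmc: str, name: str) -> None:
        """Add one file to the record and persist the whole ledger."""
        self.done.setdefault(hmc, set()).add(name)
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({h: sorted(names) for h, names in self.done.items()}, f)
        os.replace(tmp, self.path)  # atomic rename on POSIX: no half-written ledger


def catch_up(ledger: RetrievalLedger,
             hmcs: Iterable[str],
             list_batch_files: Callable[[str], Iterable[str]],
             fetch_batch_file: Callable[[str, str], None]) -> List[Tuple[str, str]]:
    """Retrieve every batch file that is on an HMC but not in the ledger.

    Intended for when the EMS first comes (back) up; the same loop also
    picks up a newly created batch file during normal operation.
    """
    fetched: List[Tuple[str, str]] = []
    for hmc in hmcs:
        for name in ledger.pending(hmc, list_batch_files(hmc)):
            fetch_batch_file(hmc, name)
            ledger.mark_retrieved(hmc, name)  # record only after a successful fetch
            fetched.append((hmc, name))
    return fetched
```

Recording a file only after it has been fetched gives at-least-once delivery: if the EMS stops between the copy and the ledger update, the file is fetched again on restart, so whatever consumes the batch files should tolerate duplicates.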
## Predefined conditions to support Failure analysis
xCAT will provide additional predefined conditions, responses, and sensors to monitor the POWER7 hardware and the HPC software.
## Exploitation of batched event hierarchical support
Events can be batched in RMC so that only one response is sent for a whole batch of events. Batched events are stored in a file, and attributes on the Condition class indicate when a batched event file is ready to be processed. xCAT will supply new responses that are installed on the MN. When invoked, such a response retrieves the batch event file from the SN, parses it, and then calls the action commands.
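The flow of such a response can be sketched in Python. The batch file format used here (one event per line, written as `key=value` pairs separated by `;`) is an assumption for illustration only and is not the actual RMC batch file format. `parse_batch_file`, `handle_batched_event`, the `fetch` callable that copies the file from the SN, and the `actions` callables that stand in for the action commands are all hypothetical.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


def parse_batch_file(text: str) -> List[Event]:
    """Parse the ASSUMED format: one event per line, `key=value` pairs
    separated by `;`. Blank lines are skipped."""
    events: List[Event] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        event: Event = {}
        for item in line.split(";"):
            key, sep, value = item.partition("=")
            if sep:  # ignore fragments without '='
                event[key.strip()] = value.strip()
        events.append(event)
    return events


def handle_batched_event(service_node: str,
                         batch_path: str,
                         fetch: Callable[[str, str], str],
                         actions: List[Callable[[Event], object]]) -> List[object]:
    """Hypothetical MN-side response for one batched event.

    Only one response is sent for the whole batch, so one call walks
    every event in the file and runs each action on it.
    """
    text = fetch(service_node, batch_path)  # copy the batch file from the SN
    results: List[object] = []
    for event in parse_batch_file(text):
        for action in actions:
            results.append(action(event))
    return results
```

Keeping the fetch and the actions as parameters separates the hierarchical part (getting the file from the right SN) from the per-event processing, so the same parsing and dispatch loop can serve any action command.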