diff --git a/README.md b/README.md
index 85cae00..bb4481a 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,29 @@
-# xai-classification
-eXplainable AI for eXtreme Gradient Boosting classification of lateral spreading datasets
+# XAI - Lateral Spreading
+This project applies eXplainable AI (XAI) techniques to machine learning models that predict lateral spreading. We trained three XGBoost models (A, B, and C in Table 1) on a dataset sourced from [Durante and Rathje (2022)](https://www.designsafe-ci.org/data/browser/public/designsafe.storage.published/PRJ-2998v2). The repository provides resources for data preprocessing, model training, and model interpretation using SHAP (SHapley Additive exPlanations) explainers.
+
+## Folder Structure
+**`data` Folder**: Contains both the original and processed datasets. The original dataset, derived from [Durante and Rathje (2021)](https://doi.org/10.1177/87552930211004613), comprises 6,500 data points from Christchurch, New Zealand, recorded for the 2011 Christchurch earthquake. Its features include geometry features, event-specific features such as groundwater depth (GWD) and peak ground acceleration (PGA), cone penetration test (CPT) features, and binary indicators for lateral spreading. Table 1 lists the features used in each model.
+
+
+**Table 1.** Summary of features used in each XGBoost model.
+|Model|L<br>(km)|GWD<br>(m)|PGA<br>(g)|Elevation<br>(m)|Slope<br>(%)|Ic<br>(med)|Ic<br>(std)|qc1Ncs<br>(med)|qc1Ncs<br>(std)|
+|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+|A|✓|✓|✓|✓|✓|O|O|O|O|
+|B|✓|✓|✓|✓|✓|✓|✓|✓|✓|
+|C|✓|✓|✓|✓|O|✓|✓|O|O|
+<br>
+
+**`model_development` Folder**: Includes Jupyter notebooks for data preprocessing (_`data_preprocessing.ipynb`_) and XGBoost model training (_`xgb_training.ipynb`_). The preprocessing notebook loads the dataset, splits it, selects features according to Table 1, and saves the processed data as pickle files (_`data_x.pkl`_) in the **`data`** folder. The training notebook demonstrates the model training process and saves the trained models as pickle files (_`opt_XGB_X.pkl`_) in the **`xgb_models`** folder.
+<br>
+
+**`model_usage` Folder**: Contains Jupyter notebooks (_`shap_explainer_X.ipynb`_) that generate SHAP explanations for each XGBoost model. Each notebook loads a trained model and its corresponding data to create SHAP visualizations.
+<br>
+
+**`xgb_models` Folder**: Stores the trained XGBoost models, each trained on one of the processed datasets in the **`data`** folder.
+
+## References
+Durante, M. G. and Rathje, E. M. (2022). Machine learning models for the evaluation of the lateral spreading hazard in the Avon river area following the 2011 Christchurch earthquake. doi:10.17603/DS2-3ZDJ-4937
+
+Durante, M. G. and Rathje, E. M. (2021). An exploration of the use of machine learning to predict lateral spreading. Earthquake Spectra 37, 2288–2314. doi:10.1177/87552930211004613
+
+## Citation
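
The README text in this diff says that `data_preprocessing.ipynb` selects features according to Table 1 and saves the result as pickle files. The following is a minimal sketch of that selection step, not the notebook's actual code. The column names (`Ic_med`, `qc1Ncs_std`, and so on), the helper names, and the sample row are all hypothetical; only the ✓/O pattern per model is taken from Table 1.

```python
import pickle

# Hypothetical column names; the real dataset may label these differently.
ALL_FEATURES = [
    "L", "GWD", "PGA", "Elevation", "Slope",
    "Ic_med", "Ic_std", "qc1Ncs_med", "qc1Ncs_std",
]

# Feature subsets transcribed from Table 1 (a check mark means "used").
MODEL_FEATURES = {
    "A": ["L", "GWD", "PGA", "Elevation", "Slope"],
    "B": list(ALL_FEATURES),
    "C": ["L", "GWD", "PGA", "Elevation", "Ic_med", "Ic_std"],
}


def select_features(rows, model):
    """Keep only the columns that Table 1 marks as used for `model`."""
    keep = MODEL_FEATURES[model]
    return [{name: row[name] for name in keep} for row in rows]


def to_pickle_bytes(rows, model):
    """Serialize the selected features, as a stand-in for writing a .pkl file."""
    return pickle.dumps(select_features(rows, model))


# Made-up values for illustration only; they are not taken from the dataset.
sample_rows = [{name: float(i) for i, name in enumerate(ALL_FEATURES)}]
selected_c = select_features(sample_rows, "C")
restored_c = pickle.loads(to_pickle_bytes(sample_rows, "C"))
```

In a notebook the same mapping could drive a column selection on a DataFrame, with the bytes written to a file in the `data` folder rather than kept in memory.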