Dataset: https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset
- Data Preprocessing and EDA
- Model Training and Evaluation (Logistic Regression, MLP, XGBoost)
- Training Pipeline
- Inference Pipeline
- Data Ingestion and Transformation
- Model Trainer
- Hyperparameter Tuning
- Automatic Data Augmentation
- Docker Image Creation Script
- CI/CD Workflow (GitHub Actions to Amazon ECR to Amazon EC2)
- Reverse Proxy Setup for HTTPS Requests
- SSL & TLS Certificates
- Clone the repository:
git clone https://github.com/yourusername/IBM_Attrition_Predictor.git
- Navigate to the project directory:
cd IBM_Attrition_Predictor
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
- On Windows:
venv\Scripts\activate
- On macOS/Linux:
source venv/bin/activate
- Install the required packages:
pip install -r requirements.txt
- Open the Jupyter notebook for EDA:
jupyter notebook src/notebooks/EDA.ipynb
- Run the cells to preprocess the data and perform exploratory data analysis.
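The preprocessing performed in the notebook can be sketched roughly as follows. This is a minimal illustration using column names from the IBM dataset; the exact steps in the notebook may differ:

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning for the IBM attrition dataset (illustrative)."""
    df = df.copy()
    # Drop identifier/constant columns that carry no predictive signal.
    df = df.drop(
        columns=["EmployeeNumber", "EmployeeCount", "Over18", "StandardHours"],
        errors="ignore",
    )
    # Encode the binary target as 0/1.
    df["Attrition"] = df["Attrition"].map({"Yes": 1, "No": 0})
    # One-hot encode the remaining categorical features.
    return pd.get_dummies(df, drop_first=True)
```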
- Open the Jupyter notebook for model training:
jupyter notebook src/notebooks/models.ipynb
- Run the cells to train the models and evaluate their performance.
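A condensed version of what the training cells do, shown here for the logistic-regression baseline only (the notebook also trains an MLP and XGBoost; the split ratio and hyperparameters below are illustrative, not the project's actual settings):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def train_and_evaluate(X, y):
    """Train a baseline classifier and report F1 on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    # class_weight="balanced" helps with the dataset's class imbalance
    # (far more "No" than "Yes" attrition labels).
    model = LogisticRegression(max_iter=1000, class_weight="balanced")
    model.fit(X_train, y_train)
    return model, f1_score(y_test, model.predict(X_test))
```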
- Navigate to the backend directory:
cd backend
- Run the backend server:
fastapi dev main.py
This project includes a CI/CD workflow that uses GitHub Actions to build Docker images, push them to Amazon ECR, and deploy them to an Amazon EC2 instance. The workflow also configures a reverse proxy on the server to handle HTTPS requests and manage SSL/TLS certificates.
The configuration settings are stored in the config/config.yaml file. You can modify this file to change the settings for the project.
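Reading the config from code typically looks like this (the key names used in the test are made up; see config/config.yaml for the actual schema):

```python
import yaml  # provided by the PyYAML package

def load_config(path: str = "config/config.yaml") -> dict:
    """Read a YAML configuration file into a plain dict."""
    with open(path) as f:
        # safe_load avoids executing arbitrary YAML tags.
        return yaml.safe_load(f)
```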
Logs are stored in the logs directory. Each log file is named with the timestamp of when it was created.
This project is licensed under the MIT License.