SEOULTECH Junior year
Business Analytics Team Project: Electricity Usage Prediction Model
We use weather data to predict electricity usage so that the optimal amount of energy can be charged in advance. Charging only enough to meet future demand minimizes losses from self-discharge and reduces costs.
- preprocess.py: Data preprocessing
- experiment_basic.py: Basic model experiment
- experiment_clustering.py: Advanced model experiment with clustering
- Clone the repository.
- Navigate to the cloned folder.
- Run Python files in the terminal:
- Example:
python experiment_basic.py
- Augmented data is generated only in the first experiment; subsequently, stored files are used.
- Example:
- All results are displayed: clustering results, optimal parameters, feature importances, classification performance, errors.
- Graphs are saved: augmented data, clustering results, usage prediction results.
- Augmented data is saved for consistent experiments: 'data/augmented_data.pickle'.
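The "generate once, then reuse" behavior described above can be sketched as a small load-or-create helper. This is a minimal illustration, not the project's actual code: the function name `load_or_create` and the `build_fn` callback are hypothetical, and only the path 'data/augmented_data.pickle' comes from this README.

```python
import pickle
from pathlib import Path


def load_or_create(cache_path, build_fn):
    """Return the cached object if the file exists; otherwise build and store it.

    `build_fn` is a hypothetical zero-argument callable that produces the
    augmented data on the first run.
    """
    path = Path(cache_path)
    if path.exists():
        # Later runs: reuse the stored object so every experiment sees the same data.
        with path.open("rb") as f:
            return pickle.load(f)
    # First run: build the object, then persist it for subsequent runs.
    data = build_fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(data, f)
    return data
```

Under these assumptions, a call such as `load_or_create("data/augmented_data.pickle", make_augmented)` would run the hypothetical `make_augmented` only once; deleting the pickle file forces regeneration.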
- Overview: This file imports and preprocesses multiple Excel files. The preprocessed data is saved in a CSV file.
- Usage: Execute directly in the terminal.
- Code Description:
- Load data files and store them as dataframes.
- Vectorize time information and add it as features.
- Transform other features into appropriate formats.
- Replace missing values with the average of the most recent 3 hours.
- Remove unnecessary features.
- Resample data at 24-hour intervals.
- Add recent usage and moving averages.
- Save the preprocessed data to a CSV file.
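Several of the steps above can be sketched with pandas. The following is a rough illustration under stated assumptions, not the contents of preprocess.py: the column name `usage`, the sine/cosine day-of-year encoding (one possible reading of "vectorize time information"), the daily sum/mean aggregation, and the 7-day window are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def preprocess(hourly: pd.DataFrame, usage_col: str = "usage") -> pd.DataFrame:
    """Hypothetical sketch: `hourly` has an hourly DatetimeIndex and numeric columns."""
    df = hourly.copy()

    # Fill each gap with the mean of the (up to) 3 preceding hourly values.
    recent_mean = df.shift(1).rolling(window=3, min_periods=1).mean()
    df = df.fillna(recent_mean)

    # Resample to one row per 24 hours: total usage, mean of every other column.
    agg = {col: ("sum" if col == usage_col else "mean") for col in df.columns}
    daily = df.resample("D").agg(agg)

    # Encode day of year as a point on the unit circle so Dec 31 sits next to Jan 1.
    angle = 2 * np.pi * daily.index.dayofyear.to_numpy() / 365.25
    daily["doy_sin"] = np.sin(angle)
    daily["doy_cos"] = np.cos(angle)

    # Recent usage (previous day) and a moving average over the previous 7 days.
    daily["usage_lag1"] = daily[usage_col].shift(1)
    daily["usage_ma7"] = daily[usage_col].shift(1).rolling(7, min_periods=1).mean()
    return daily
```

Both the lag and the moving average are shifted by one day so that a row never contains the usage value it is meant to predict.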
- Overview: This module provides data augmentation and helper functions used by the experiments.
- Usage: Import the DataProc class in the file.
- Code Description:
- data_augmentation: Augments existing data and returns a dataframe.
- augmentation: Helper used internally by data_augmentation.
- RFE_featureSelection: Performs feature selection using RFE and returns the selected feature list.
- view_figure: Generates and saves graphs based on the input type.
- clf_by_label: Separates X and y by label and returns them in a list.
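As an illustration of the RFE-based selection listed above, the sketch below uses scikit-learn's `RFE`. The function name `select_features`, the random forest estimator, and the `n_features` parameter are assumptions for the example; the README does not state which estimator or feature count RFE_featureSelection uses.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE


def select_features(X: pd.DataFrame, y: pd.Series, n_features: int) -> list:
    """Return the names of the `n_features` columns kept by recursive feature elimination."""
    # Assumed estimator: RFE accepts any model exposing feature_importances_ or coef_.
    estimator = RandomForestRegressor(n_estimators=50, random_state=0)
    selector = RFE(estimator, n_features_to_select=n_features)
    selector.fit(X, y)
    # support_ is a boolean mask over the input columns (True = kept).
    return list(X.columns[selector.support_])
```

RFE refits the estimator repeatedly and drops the weakest feature each round, so runtime grows with the number of columns removed.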
- Overview: This model clusters each time point based on the features of the input samples.
- Usage: Import the ClusterPattern class in the file.
- Code Description:
- dim_reduction: Performs dimension reduction for pattern extraction and visualization.
- clustering: Clusters the input samples and returns labels in dataframe format.
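The two steps above can be sketched as dimension reduction followed by clustering. PCA and k-means are used here purely as stand-ins, since the README does not name the algorithms behind ClusterPattern; the function name `cluster_samples` and its parameters are likewise hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_samples(
    X: pd.DataFrame, n_components: int = 2, n_clusters: int = 3
) -> pd.DataFrame:
    """Reduce dimensions, cluster the reduced samples, and return labels as a dataframe."""
    # Standardize first so no single feature dominates the principal components.
    scaled = StandardScaler().fit_transform(X)
    # Assumed technique: PCA for pattern extraction and 2-D visualization.
    reduced = PCA(n_components=n_components, random_state=0).fit_transform(scaled)
    # Assumed technique: k-means on the reduced representation.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(reduced)
    # Keep the original index so each label maps back to its time point.
    return pd.DataFrame({"label": labels}, index=X.index)
```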
- Overview: This model predicts which cluster a given time point belongs to.
- Usage: Import the ClassifyLabel class in the file.
- Code Description:
- validation: Conducts 5-fold cross-validation for model validation.
- fit: Trains a classification model to predict labels for a specific time point.
- predict: Predicts labels, saves them as a pickle file, and returns them.
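The 5-fold cross-validation step can be sketched with scikit-learn as follows. The random forest classifier, the stratified shuffling, and the accuracy metric are assumptions; the README only states that ClassifyLabel validates with 5 folds.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def validate_classifier(X, y, n_splits: int = 5) -> float:
    """Return the mean accuracy over `n_splits` cross-validation folds."""
    # Assumed model: any scikit-learn classifier could be substituted here.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    # Stratification keeps the cluster-label proportions similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    return float(np.mean(scores))
```

If the samples are ordered in time, shuffled folds let the model see future points during training; an ordered splitter such as `TimeSeriesSplit` may be the safer choice in that case.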
- Overview: Trains the ML model, predicts on the test dataset, and confirms performance with error metrics.
- Usage: Import the PredictUsage class in the file.
- Code Description:
- fit: Finds the optimal hyperparameters with cross-validation.
- predict: Predicts on the test dataset using the optimal hyperparameters.
- calculate_error: Calculates MAE, RMSE, MAPE errors and returns them in a dictionary.
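For reference, the three error metrics can be computed as in the sketch below. It mirrors the described behavior of calculate_error (a dictionary with MAE, RMSE, and MAPE), but the signature, the dictionary keys, and the percentage scaling of MAPE are assumptions.

```python
import math


def calculate_error(y_true, y_pred) -> dict:
    """Return MAE, RMSE, and MAPE (in percent) for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    # Mean absolute error: average size of the miss, in the unit of the target.
    mae = sum(abs(e) for e in errors) / n
    # Root mean squared error: penalizes large misses more heavily than MAE.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when a true value is zero; this sketch assumes nonzero usage.
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}
```

For example, true values `[100, 200, 400]` and predictions `[110, 180, 400]` give an MAE of 10 and a MAPE of about 6.67%.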
- Overview: Runs the basic model experiment, which trains a single model on the entire dataset.
- Usage: Run in the terminal.
- Experiment Steps:
- Load data: DataProc.
- Data augmentation: DataProc.
- Use stored files for consistency after the first experiment.
- Model training: PredictUsage.
- Electricity usage prediction: PredictUsage.
- Visualize and save at intermediate steps: DataProc.
- Overview: Runs the advanced model experiment, which incorporates the clustering and classification models.
- Usage: Run in the terminal.
- Experiment Steps:
- Load data: DataProc.
- Data augmentation: DataProc.
- Use stored files for consistency after the first experiment.
- Dimension reduction: ClusterPattern.
- Generate labels through clustering: ClusterPattern.
- Train models separately for each label: PredictUsage.
- Validate label prediction models: ClassifyLabel.
- Train label prediction models: ClassifyLabel.
- Predict labels for test data at each time point: ClassifyLabel.
- Predict electricity usage using label-specific models: PredictUsage.
- Visualize and save at intermediate steps: DataProc.
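The core idea of the steps above, training one usage model per cluster label and routing each test sample to the model of its predicted label, can be sketched as follows. The function names and the use of linear regression are hypothetical; the README does not say which regressor PredictUsage trains.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_per_label(X, y, labels):
    """Train one regressor per cluster label and return {label: fitted model}."""
    models = {}
    for label in np.unique(labels):
        mask = labels == label
        # Assumed model: a plain linear regression stands in for the real regressor.
        models[label] = LinearRegression().fit(X[mask], y[mask])
    return models


def predict_by_label(models, X, predicted_labels):
    """Route each row to the model of its predicted label."""
    # Rows whose label has no trained model stay NaN instead of getting a guess.
    y_hat = np.full(len(X), np.nan)
    for label, model in models.items():
        mask = predicted_labels == label
        if mask.any():
            y_hat[mask] = model.predict(X[mask])
    return y_hat
```

In this arrangement, the labels passed to `predict_by_label` would come from the label classifier rather than from clustering, since cluster assignments are not known in advance for future time points.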