Add Grid Search Functionality for Best Hyperparameter Tuning #52

Open · Reinaldo-Kn wants to merge 8 commits into main

Conversation

@Reinaldo-Kn commented on Oct 16, 2024

The GridSearchModel class encapsulates the process of hyperparameter tuning using GridSearchCV, providing a modular and flexible interface for model selection and training. It includes methods for fitting a model, predicting, saving/loading models, and additional utility functions.

Parameters

model: sklearn estimator, optional (default=RandomForestRegressor())
    The machine learning model to be tuned. This can be any model compatible with GridSearchCV.

param_grid: dict, optional (default={'n_estimators': [10, 100], 'max_depth': [None, 10], 'min_samples_split': [2, 4]})
    A dictionary containing the hyperparameters and their respective ranges for the grid search. The grid will search through all possible combinations of these hyperparameters.

scoring: str, optional (default='neg_mean_absolute_error')
    The scoring metric used for evaluating the models during grid search. This should be a valid scoring metric recognized by scikit-learn.

cv: int, optional (default=5)
    The number of cross-validation folds to be used during the grid search.

test_size: float, optional (default=0.01)
    The proportion of the dataset to use as the test set. The default is 1% of the data.
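
For reference, here is a minimal sketch of how these defaults might be wired into the constructor. This is illustrative only, not code from the PR; the attribute names (e.g. best_estimator_) are assumptions:

from sklearn.ensemble import RandomForestRegressor

class GridSearchModel:
    def __init__(self, model=None, param_grid=None,
                 scoring='neg_mean_absolute_error', cv=5, test_size=0.01):
        # Mutable defaults are created inside __init__ so instances
        # never share the same dict/estimator object.
        self.model = model if model is not None else RandomForestRegressor()
        self.param_grid = param_grid if param_grid is not None else {
            'n_estimators': [10, 100],
            'max_depth': [None, 10],
            'min_samples_split': [2, 4],
        }
        self.scoring = scoring
        self.cv = cv
        self.test_size = test_size
        self.best_estimator_ = None  # populated by fit()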

Methods

  • fit(self, df, target_column)
    Fits the model using grid search on the specified dataset and target variable (a sketch of a possible implementation follows this list).
    Parameters:
        df (pandas.DataFrame): The input dataset containing both features and the target variable.
        target_column (str): The name of the column that represents the target variable (the variable to predict).
    Returns:
        best_estimator_ (sklearn estimator): The model fitted with the best hyperparameters found during the grid search.
  • predict(self, X)
    Makes predictions using the best model found by grid search.
    Parameters:
        X (pandas.DataFrame or numpy.ndarray): Input data to make predictions on.
    Returns:
        y_pred (numpy.ndarray): The predicted values.
  • score(self, X_test, y_test)
    Evaluates the performance of the best model on a test dataset.
    Parameters:
        X_test (pandas.DataFrame or numpy.ndarray): Features of the test set.
        y_test (pandas.Series or numpy.ndarray): True values of the test set.
    Returns:
        score (float): The performance score of the best model on the test set.
  • get_best_params(self)
    Returns the best hyperparameters found by the grid search.
    Returns:
        best_params (dict): Dictionary of the best hyperparameters.
  • save_model(self, filename)
    Saves the best model to a file.
    Parameters:
        filename (str): The path where the model should be saved.
    Returns:
        None
  • load_model(self, filename)
    Loads a previously saved model from a file.
    Parameters:
        filename (str): The path where the model is saved.
    Returns:
        None
  • plot_feature_importance(self, feature_names)
    Plots the feature importances from the best model. Only works for models that have the feature_importances_ attribute, such as RandomForest.
    Parameters:
        feature_names (list): List of feature names in the same order as they appear in the dataset.
    Returns:
        None
  • cross_val_score_summary(self, X, Y)
    Generates a summary of cross-validation scores for the best model.
    Parameters:
        X (pandas.DataFrame): The features for cross-validation.
        Y (pandas.Series): The target variable for cross-validation.
    Returns:
        summary (dict): A dictionary containing the mean and standard deviation of the cross-validation scores, along with individual fold scores.
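
The following sketch shows one way these methods could fit together, continuing the constructor sketch above. It is illustrative, not the exact code in this PR: it assumes fit() holds out test_size of the data internally with train_test_split, and the 'fold_scores' key is an invented name ('mean_score' and 'std_dev' match the example usage below). Only standard scikit-learn, joblib, numpy, and matplotlib calls are used:

import joblib
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

class GridSearchModel:
    # ... __init__ as sketched under "Parameters" ...

    def fit(self, df, target_column):
        X = df.drop(columns=[target_column])
        y = df[target_column]
        # Hold out test_size of the data; grid search runs on the rest.
        # Storing the holdout on self is an assumption of this sketch.
        X_train, self.X_test_, y_train, self.y_test_ = train_test_split(
            X, y, test_size=self.test_size)
        search = GridSearchCV(self.model, self.param_grid,
                              scoring=self.scoring, cv=self.cv)
        search.fit(X_train, y_train)
        self.best_params_ = search.best_params_
        self.best_estimator_ = search.best_estimator_
        return self.best_estimator_

    def predict(self, X):
        return self.best_estimator_.predict(X)

    def score(self, X_test, y_test):
        # Delegates to the estimator's default scorer (R^2 for regressors).
        return self.best_estimator_.score(X_test, y_test)

    def get_best_params(self):
        return self.best_params_

    def save_model(self, filename):
        joblib.dump(self.best_estimator_, filename)

    def load_model(self, filename):
        self.best_estimator_ = joblib.load(filename)

    def plot_feature_importance(self, feature_names):
        # Requires an estimator exposing feature_importances_.
        importances = self.best_estimator_.feature_importances_
        order = np.argsort(importances)[::-1]  # most important first
        plt.bar([feature_names[i] for i in order], importances[order])
        plt.xticks(rotation=90)
        plt.ylabel('Importance')
        plt.tight_layout()
        plt.show()

    def cross_val_score_summary(self, X, Y):
        scores = cross_val_score(self.best_estimator_, X, Y,
                                 scoring=self.scoring, cv=self.cv)
        return {'mean_score': scores.mean(),
                'std_dev': scores.std(),
                'fold_scores': scores.tolist()}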

Example Usage

from sklearn.ensemble import RandomForestRegressor
from _gridsearch import GridSearchModel

# Initialize the model with default or custom hyperparameters
grid_model = GridSearchModel(
    model=RandomForestRegressor(),
    param_grid={'n_estimators': [50, 100], 'max_depth': [None, 10]},
    scoring='neg_mean_squared_error',
    cv=5,
    test_size=0.2
)

# Fit the model to the dataset (df is a pandas DataFrame that
# includes the 'target_column' to predict)
best_model = grid_model.fit(df, 'target_column')

# Get the best hyperparameters
best_params = grid_model.get_best_params()
print("Best Parameters: ", best_params)

# Predict on new data
y_pred = grid_model.predict(X_new)

# Evaluate the model on the test set
test_score = grid_model.score(X_test, y_test)
print("Test Score: ", test_score)

# Save the model
grid_model.save_model('best_model.joblib')

# Plot feature importance
grid_model.plot_feature_importance(feature_names)

# Get cross-validation summary
cv_summary = grid_model.cross_val_score_summary(X, Y)
print("CV Mean Score: ", cv_summary['mean_score'])
print("CV Standard Deviation: ", cv_summary['std_dev'])

You can view the new functions in Colab.
