Add Functions for Evaluating Target Variables in Predictive Modeling #57

Reinaldo-Kn · 2024-10-18T14:15:19Z

This pull request adds a set of functions in _best_columns.py that evaluate each column of a dataset as a candidate target variable, using statistical measures and predictive performance. The functions use correlation coefficients, regression error metrics, feature importance, and mutual information to indicate which columns are most relevant for predictive modeling.

Functions

bestColumn_pearson_spearman( )

Calculates the Pearson and Spearman correlation coefficients between all columns in the provided DataFrame. For each method, it identifies the column with the highest average (positive) correlation with the other columns. Pearson measures linear relationships, while Spearman measures monotonic relationships.

    Parameters:
        df (pd.DataFrame): The input DataFrame containing the dataset.
    Returns:
        dict: A dictionary containing the best column for each correlation method:
            pearson: The column with the highest average Pearson correlation.
            spearman: The column with the highest average Spearman correlation.
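
As a rough illustration of the selection rule described above, here is a minimal sketch. It is not the code from _best_columns.py: the helper name `best_column_by_correlation`, the use of a signed mean, the exclusion of the diagonal, and the synthetic data are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def best_column_by_correlation(df: pd.DataFrame) -> dict:
    """Hypothetical helper: column with the highest mean correlation per method."""
    numeric = df.select_dtypes(include="number")
    best = {}
    for method in ("pearson", "spearman"):
        corr = numeric.corr(method=method)
        # Hide the diagonal so a column's correlation with itself is not counted.
        off_diagonal = corr.mask(np.eye(len(corr), dtype=bool))
        # Signed mean (assumption): negative correlations lower a column's score.
        best[method] = off_diagonal.mean(axis=1).idxmax()
    return best


# Synthetic data (assumption): "b" and "c" are both noisy copies of "x".
rng = np.random.default_rng(0)
x = rng.normal(size=500)
demo = pd.DataFrame({
    "x": x,
    "b": x + rng.normal(size=500),
    "c": x + rng.normal(size=500),
})
result = best_column_by_correlation(demo)
print(result)
```

Because "b" and "c" are only related through "x", "x" should have the highest average correlation under both methods. Whether the real function uses signed or absolute correlations is not stated in this description, so treat that choice as open.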

bestColumn_with_least_mae_or_r2( )

Evaluates each column as a regression target: an XGBoost regressor is trained to predict it, and the Mean Absolute Error (MAE) and R-squared (R²) of the predictions are computed. Columns are then ranked by lowest MAE and by highest R², indicating which target variable is predicted best.

    Parameters:
        df (pd.DataFrame): The input DataFrame containing the dataset.
    Returns:
        dict: A dictionary with sorted results for MAE and R²:
            mae: Sorted list of columns minimizing Mean Absolute Error.
            r2: Sorted list of columns maximizing R-squared.
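
The ranking described above could look roughly like the sketch below. To keep the example runnable with scikit-learn alone, it swaps in `GradientBoostingRegressor` for the XGBoost regressor the PR uses. The helper name `rank_targets_by_error`, the 75/25 train/test split, the use of the remaining columns as predictors, and the synthetic data are assumptions, not details taken from the PR.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def rank_targets_by_error(df: pd.DataFrame, seed: int = 0) -> dict:
    """Hypothetical helper: score every column as a regression target."""
    mae, r2 = {}, {}
    for target in df.columns:
        # Assumption: all remaining columns are used as predictors.
        X, y = df.drop(columns=target), df[target]
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.25, random_state=seed
        )
        # Stand-in for the XGBoost regressor named in the description.
        model = GradientBoostingRegressor(random_state=seed).fit(X_train, y_train)
        pred = model.predict(X_test)
        mae[target] = mean_absolute_error(y_test, pred)
        r2[target] = r2_score(y_test, pred)
    return {
        "mae": sorted(mae.items(), key=lambda kv: kv[1]),               # lowest first
        "r2": sorted(r2.items(), key=lambda kv: kv[1], reverse=True),   # highest first
    }


# Synthetic data (assumption): "y" depends on "x"; "noise" is unrelated.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
demo = pd.DataFrame({
    "x": x,
    "y": 3 * x + rng.normal(scale=0.1, size=300),
    "noise": rng.normal(size=300),
})
result = rank_targets_by_error(demo)
print(result["r2"])
```

One thing to keep in mind when reading the two rankings: MAE is expressed in the units of the target column, so it is only comparable across columns that share a scale, while R² is scale-free.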

bestColumn_feature_importance( )

Evaluates the importance of each feature by training an XGBoost regressor for each column and computing the average feature importance. This helps to identify which columns contribute most to predicting the target variable.

    Parameters:
        df (pd.DataFrame): The input DataFrame containing the dataset.
    Returns:
        list: A sorted list of feature importances for each column, indicating their contribution to predictive modeling.
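
A minimal sketch of the averaging idea described above, under stated assumptions: scikit-learn's `GradientBoostingRegressor` stands in for the XGBoost regressor, each column in turn is the target while the remaining columns are the features, and a column's score is the mean of its importances over the models in which it was a feature. The helper name `average_feature_importance` and the synthetic data are hypothetical.

```python
from collections import defaultdict

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor


def average_feature_importance(df: pd.DataFrame, seed: int = 0) -> list:
    """Hypothetical helper: mean importance of each column across all targets."""
    collected = defaultdict(list)
    for target in df.columns:
        X, y = df.drop(columns=target), df[target]
        # Stand-in for the XGBoost regressor named in the description.
        model = GradientBoostingRegressor(random_state=seed).fit(X, y)
        for feature, importance in zip(X.columns, model.feature_importances_):
            collected[feature].append(importance)
    averaged = {col: float(np.mean(vals)) for col, vals in collected.items()}
    # Most important column first.
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic data (assumption): "y" depends on "x"; "noise" is unrelated.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
demo = pd.DataFrame({
    "x": x,
    "y": 3 * x + rng.normal(scale=0.1, size=300),
    "noise": rng.normal(size=300),
})
ranking = average_feature_importance(demo)
print(ranking)
```

In this toy dataset the unrelated "noise" column should end up at the bottom of the ranking, since it contributes almost nothing when predicting "x" or "y".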

bestColumn_mutual_information( )

Calculates the mutual information scores between each column and the other columns in the DataFrame. Mutual information quantifies the amount of information obtained about one variable through the other, helping to determine which features are most informative for predicting the target variable.

    Parameters:
        df (pd.DataFrame): The input DataFrame containing the dataset.
    Returns:
        list: A sorted list of mutual information scores for each column, providing insights into their informational value.
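
The per-column score described above can be sketched with scikit-learn's `mutual_info_regression`. This is an illustration under assumptions rather than the PR's implementation: the helper name `rank_by_mutual_information`, averaging the scores over the other columns, and the synthetic data are all choices made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression


def rank_by_mutual_information(df: pd.DataFrame, seed: int = 0) -> list:
    """Hypothetical helper: mean mutual information of each column with the rest."""
    scores = {}
    for target in df.columns:
        X, y = df.drop(columns=target), df[target]
        # One non-negative estimate per remaining column.
        mi = mutual_info_regression(X, y, random_state=seed)
        scores[target] = float(mi.mean())
    # Most informative column first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic data (assumption): "y" depends on "x"; "noise" is unrelated.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
demo = pd.DataFrame({
    "x": x,
    "y": 3 * x + rng.normal(scale=0.1, size=300),
    "noise": rng.normal(size=300),
})
mi_ranking = rank_by_mutual_information(demo)
print(mi_ranking)
```

Unlike the correlation-based ranking, mutual information also picks up non-linear, non-monotonic dependence, which is why it can disagree with the Pearson and Spearman results on real data.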

You can view the new functions in Colab

zRafaF and others added 10 commits October 15, 2024 13:00

added find_df_transitions to _bibmon_tools

d0dca7f

Created test requirements and implemented find_df_transitions unity test

3b72b57

implemented df splitting tool and its unity test

72be3cc

Added 3w dataset example to code base, added 3w loader and tooling for ini inspection

65c119d

added split_dataset to 3w tools

db3d911

Fix relative imports (#1)

f016640

* fixed relative imports * fixed relative imports * removed debbug prints

added grid search

393ff83

added more func for preprocess

0e5ec7a

more func for alarms detection

6a9fd8f

new ways to search for the best columns to train your model

2e2a0c2
