Added functions for outlier detection and handling #53
This pull request adds two new functions to the bibmon.PreProcess class for outlier detection and handling:

- detect_outliers_iqr(df, cols)
- remove_outliers(df, cols, method='remove'):
  - remove: Removes outliers from the DataFrame.
  - median: Replaces outliers with the median value of the column.
  - winsorize: Applies winsorization to limit extreme values.

Motivation:
Outliers can significantly affect the performance of machine learning models, especially those used for anomaly detection. By detecting and handling outliers, we can improve the accuracy and reliability of the models, which is crucial for the effective use of BibMon with real-world datasets like the 3W Dataset, known for its diverse and potentially noisy data.
Benefits:
Example usage:
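A minimal, self-contained sketch of how the two functions could behave, written with plain pandas. This is an illustration under stated assumptions, not the code added to bibmon.PreProcess: the `k=1.5` fence multiplier, the `_iqr_bounds` helper, the column names `pressure` and `temp`, and the choice to implement `winsorize` by clipping to the IQR fences (rather than to fixed percentiles) are all hypothetical.

```python
import pandas as pd


def _iqr_bounds(df, cols, k=1.5):
    """Hypothetical helper: per-column fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def detect_outliers_iqr(df, cols, k=1.5):
    """Return a boolean DataFrame that is True where a value lies outside the fences."""
    lower, upper = _iqr_bounds(df, cols, k)
    return df[cols].lt(lower, axis=1) | df[cols].gt(upper, axis=1)


def remove_outliers(df, cols, method="remove", k=1.5):
    """Handle outliers in `cols` using one of: 'remove', 'median', 'winsorize'."""
    mask = detect_outliers_iqr(df, cols, k)
    out = df.copy()
    if method == "remove":
        # Drop every row that has an outlier in at least one of the columns.
        return out.loc[~mask.any(axis=1)]
    if method == "median":
        # Replace each flagged value with the median of its column.
        for col in cols:
            out[col] = out[col].mask(mask[col], df[col].median())
        return out
    if method == "winsorize":
        # Assumption: limit extreme values by clipping them to the IQR fences.
        lower, upper = _iqr_bounds(df, cols, k)
        for col in cols:
            out[col] = out[col].clip(lower=lower[col], upper=upper[col])
        return out
    raise ValueError(f"unknown method: {method!r}")


# Toy data with one obvious outlier in 'pressure' (column names are made up).
df = pd.DataFrame({
    "pressure": [10.0, 11.0, 12.0, 13.0, 100.0],
    "temp": [1.0, 2.0, 3.0, 4.0, 5.0],
})
cols = ["pressure", "temp"]

flags = detect_outliers_iqr(df, cols)                    # only the 100.0 is flagged
dropped = remove_outliers(df, cols, method="remove")     # 4 rows remain
filled = remove_outliers(df, cols, method="median")      # 100.0 becomes 12.0
clipped = remove_outliers(df, cols, method="winsorize")  # 100.0 becomes 16.0
```

In this sketch, `remove` shortens the DataFrame, while `median` and `winsorize` keep its length. Preserving length can matter for time-series data where row alignment needs to be kept.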
By creating this pull request, I confirm that I have read and fully accept and agree with one of the Petrobras' Contributor License Agreements (CLAs):
Our CLAs are based on the Apache Software Foundation's CLAs: