WARNING: These fixes may change the behavior of the DataFrameWithInfo class (attributes and methods) and its APIs.
This is a list of TODOs in the DataFrameWithInfo class that need further inspection now that the package scope is clearer.
In the column_list_by_type property, exclude NaN columns (self.nan_cols) from col_list too, so that they are not included in num_categorical_cols because of a single non-NaN value.
In the column_list_by_type property, when returning the ColumnListByType instance, numerical_cols should not include bool_cols.
In the to_be_encoded_cat_cols property, cols_by_type.num_categorical_cols (categorical columns with numerical values) probably do not need encoding and should not be included in the returned categorical_cols variable.
In the least_nan_cols method, add a col_list argument to select the columns to analyze when the user wants to find which columns have the lowest count of NaNs.
Rename the check_duplicated_features method to contains_duplicated_features, because it returns a boolean value.
In the check_duplicated_features method, when there are columns with the same name, check whether their values are the same too and inform the user appropriately.
Think about the case where the same column appears in both "original_columns" and "derived_columns". Does this make sense? Should we raise an error? Would we add the same FeatureOperation twice to the column? That could create problems when looking up that operation, because two would be found. This case is supposed to happen when a column provides the original values and is also used to store the resulting values. Is this good practice? (See the following checkbox.)
In get_enc_column_from_original, the found operation found_operat should never also be among found_operat.derived_columns (according to the previous checkbox). The similar check in the get_original_from_enc_column method should be removed.
In the add_operation method, the feature_operation argument should not have original_columns or derived_columns set to None, because None means they are not specified (they are supposed to be set to () instead). The code to raise this error should be implemented.
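The None check described in the last item could look roughly like the sketch below. It is not the package's actual code: the FeatureOperation dataclass here is a simplified, hypothetical stand-in with only three fields, and validate_feature_operation is an assumed helper name that add_operation might call before storing the operation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FeatureOperation:
    """Simplified, hypothetical stand-in for the real FeatureOperation class."""

    operation_type: str
    # An empty tuple means "no columns"; None means "not specified".
    original_columns: Optional[Tuple[str, ...]] = ()
    derived_columns: Optional[Tuple[str, ...]] = ()


def validate_feature_operation(feature_operation: FeatureOperation) -> None:
    """Raise ValueError if original_columns or derived_columns is None.

    Hypothetical helper: add_operation could call this before storing
    the operation, so that unspecified columns are rejected early.
    """
    for attr_name in ("original_columns", "derived_columns"):
        if getattr(feature_operation, attr_name) is None:
            raise ValueError(
                f"'{attr_name}' is None, i.e. not specified. "
                "Use an empty tuple () if there are no such columns."
            )


# Accepted: both attributes are tuples (possibly empty).
validate_feature_operation(FeatureOperation("encoding", ("city",), ("city_enc",)))

# Rejected: original_columns was left unspecified.
try:
    validate_feature_operation(FeatureOperation("encoding", None, ("city_enc",)))
except ValueError as err:
    print(err)
```

Whether a column that appears in both original_columns and derived_columns should also raise an error is left open above, so this sketch deliberately does not check for it.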
List of TODOs in the FeatureOperation class:
Should this class be moved to another file along with the other DataFrameWithInfo supplementary functions?
Remove the details attribute, since its value according to the docstring is the same as encoded_values_map.
Remove either the encoder or the encoding_function attribute (or maybe just distinguish between the instance and the class type; I do not think this is necessary, since the encoding function is simply the type of the encoder).
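If encoding_function really is just the type of the encoder, one option for the last item is to keep only encoder and derive the other value on demand. The sketch below illustrates that idea under stated assumptions: FeatureOperation is again a reduced, hypothetical stand-in, and DummyEncoder is a placeholder for whatever encoder instance the package stores.

```python
from dataclasses import dataclass
from typing import Any, Optional


class DummyEncoder:
    """Hypothetical placeholder for a real encoder instance."""


@dataclass
class FeatureOperation:
    """Reduced, hypothetical stand-in keeping only the encoder attribute."""

    encoder: Optional[Any] = None

    @property
    def encoding_function(self) -> Optional[type]:
        # Derived from the stored instance, so the two values cannot drift apart.
        return None if self.encoder is None else type(self.encoder)


operation = FeatureOperation(encoder=DummyEncoder())
print(operation.encoding_function is DummyEncoder)
```

A read-only property keeps existing code that reads encoding_function working, while removing the possibility of passing an encoder and an encoding_function that disagree.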