Random state in feature selection (#168)
* Upgraded pymatgen and matminer requirements

* backward compatibility warning

* Option to drop (or keep) all-NaN features after featurization.

* Arg in featurize.

* Arg in preset because clean_df is used there as well.

* Easier setting of drop_allnan.

* Leave this for another PR.

* Possibility to tune random_state in feature selection. Useful when segfaults appear with very small datasets (testing).

* Update docstring

---------

Co-authored-by: ppdebreuck <[email protected]>
gbrunin and ppdebreuck authored Jun 24, 2024
1 parent f49b1ca commit e3c9208
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion modnet/preprocessing.py
@@ -805,6 +805,7 @@ def feature_selection(
         drop_thr: float = 0.2,
         n_jobs: int = None,
         ignore_names: Optional[List] = [],
+        random_state: int = None,
     ):
         """Compute the mutual information between features and targets,
         then apply relevance-redundancy rankings to choose the top `n`
@@ -823,6 +824,7 @@ def feature_selection(
             n_jobs: max. number of processes to use when calculating cross NMI.
             ignore_names (List): Optional list of property names to ignore during feature selection.
                 Feature selection will be performed w.r.t. all properties except the ones in ignore_names.
+            random_state (int): Seed used to compute the NMI.
         """
         if getattr(self, "df_featurized", None) is None:
@@ -867,7 +869,11 @@ def feature_selection(
             else:
                 df = self.df_featurized.copy()
             self.cross_nmi, self.feature_entropy = get_cross_nmi(
-                df, return_entropy=True, drop_thr=drop_thr, n_jobs=n_jobs
+                df,
+                return_entropy=True,
+                drop_thr=drop_thr,
+                n_jobs=n_jobs,
+                random_state=random_state,
             )

         if self.cross_nmi.isna().sum().sum() > 0:
@@ -897,6 +903,7 @@ def feature_selection(
                     df,
                     df_target,
                     task_type,
+                    random_state=random_state,
                 )[name]

             LOG.info("Computing optimal features...")
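
The diff forwards one `random_state` value from `feature_selection` into both NMI computations, so a fixed seed should make the selection repeatable. Below is a minimal standard-library sketch of that seed-threading pattern. `noisy_score` and `select_features` are hypothetical stand-ins, not MODNet functions, and the small jitter is only an assumed source of randomness for illustration.

```python
import random
from typing import Dict, List, Optional


def noisy_score(values: List[float], random_state: Optional[int] = None) -> float:
    """Toy stochastic estimator (hypothetical): mean of the values plus small jitter."""
    # A dedicated generator keeps the global random state untouched.
    # With random_state=None the generator is seeded from system entropy,
    # so repeated calls are not guaranteed to agree.
    rng = random.Random(random_state)
    jittered = [v + rng.gauss(0.0, 1e-3) for v in values]
    return sum(jittered) / len(jittered)


def select_features(
    columns: Dict[str, List[float]], n: int, random_state: Optional[int] = None
) -> List[str]:
    """Rank columns by the toy score, forwarding the seed to every stochastic call."""
    scores = {
        name: noisy_score(vals, random_state=random_state)
        for name, vals in columns.items()
    }
    # Highest score first, keep the top n names.
    return sorted(scores, key=scores.get, reverse=True)[:n]


if __name__ == "__main__":
    cols = {"a": [1.0, 2.0, 3.0], "b": [10.0, 11.0, 12.0], "c": [5.0, 5.0, 5.0]}
    first = select_features(cols, n=2, random_state=42)
    second = select_features(cols, n=2, random_state=42)
    print(first == second)  # same seed, same selection
```

Exposing the seed as an optional keyword that defaults to `None` keeps existing callers unaffected while letting tests pin the result.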