
Can't use featurize twice in a notebook without restarting the kernel #202

Closed
shivang-22 opened this issue Mar 27, 2024 · 5 comments

@shivang-22

I am using the following function to run MODNet on a custom, composition-only dataset:

# Imports inferred from usage (ReduceLROnPlateau assumed to come from tf.keras):
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import ReduceLROnPlateau
from modnet.preprocessing import MODData
from modnet.models import MODNetModel


def GNN(df, target, extra_feat, batch, lr):
    # Keep only the rows where the target and all extra features are present
    modnet_mask = ~df[target].isna()
    for feat in extra_feat:
        modnet_mask &= ~df[feat].isna()

    mod_df = df[modnet_mask].reset_index(drop=True)

    data = MODData(
        materials=mod_df["Name"],
        targets=mod_df[target],
        target_names=[target],
    )

    data.featurize()

    # Append the precomputed extra features to the featurized dataframe
    for feat in extra_feat:
        data.df_featurized[feat] = mod_df[feat].values

    # 90/10 train/test split on row indices
    split = train_test_split(range(len(mod_df)), test_size=0.1, random_state=1234)
    train, test = data.split(split)
    train.feature_selection(n=-1)

    model = MODNetModel(
        [[[target]]],
        weights={target: 1},
        num_neurons=[[128], [64], [8], [2]],
        n_feat=100,
        act="relu",
    )

    model.fit(
        train,
        val_fraction=0.1,
        lr=lr,
        batch_size=batch,
        loss="mae",
        epochs=100,
        verbose=1,
        callbacks=[ReduceLROnPlateau()],
    )

    pred = model.predict(test)
    mae_test = np.absolute(pred.values - test.df_targets.values).mean()
    print(f"mae: {mae_test}")

It runs fine the first time I use it, but if I change the inputs to the function and run it again in another cell, it gets stuck forever on the featurize step. So,

GNN(data_df, 'y', ['x1', 'x2'], 32, 0.02)
works fine, but then in the very next cell,
GNN(data_df, 'y', ['x1', 'x2'], 32, 0.04)
gets stuck. Am I missing something?

@ml-evs (Collaborator) commented Mar 28, 2024

Hi @shivang-22, could you give us any more info? When you say it "gets stuck", what was the last output? featurize directly calls matminer's featurize_many under the hood by default, which has been known to be a bit iffy with parallelism (though I'm not sure why it would work the first time on the same data). You could try explicitly setting the number of jobs in the featurizer, e.g. data.featurize(n_jobs=1).
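In the GNN function above, that is a one-line change to the featurize call:

data.featurize(n_jobs=1)  # run featurization serially instead of via a multiprocessing pool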

@shivang-22 (Author)

Certainly! This is the traceback I get when I interrupt the kernel after it gets 'stuck'. I'm not pasting it in its entirety, as that would be far too long, but the excerpts below should help.

The top of the traceback is:

Cell In[15], line 18, in GNN(df, target, extra_feat, batch, lr)
     10 mod_df.reset_index(inplace=True, drop=True)
     12 data = MODData(
     13     materials=mod_df["Name"],
     14     targets=mod_df[target],
     15     target_names=[target]
     16 )
---> 18 data.featurize()
     20 for f in range(len(extra_feat)):
     21     data.df_featurized[extra_feat[f]] = mod_df[extra_feat[f]].values

File /scratch/micromamba/envs/alembic/lib/python3.10/site-packages/modnet/preprocessing.py:783, in MODData.featurize(self, fast, db_file, n_jobs, drop_allnan)
    779         df_final = df_done
    781 # otherwise, no structures were loaded, so we need to compute all
    782 else:
--> 783     df_final = self.featurizer.featurize(self.df_structure)
    785 # replace infinite values by nan that are handled during the fit
    786 df_final = clean_df(df_final, drop_allnan=drop_allnan)

File /scratch/micromamba/envs/alembic/lib/python3.10/site-packages/modnet/featurizers/featurizers.py:91, in MODFeaturizer.featurize(self, df)
     89 df_composition = pd.DataFrame([])
     90 if self.composition_featurizers or self.oxid_composition_featurizers:
---> 91     df_composition = self.featurize_composition(df)
     93 df_structure = pd.DataFrame([])
     94 if self.structure_featurizers:

This suggests that it's still computing the features. The bottom of the traceback was more interesting to me, and reads as follows:

File /scratch/micromamba/envs/alembic/lib/python3.10/site-packages/matminer/featurizers/base.py:476, in BaseFeaturizer.featurize_many(self, entries, ignore_errors, return_errors, pbar)
    470 with Pool(self.n_jobs, maxtasksperchild=1) as p:
    471     func = partial(
    472         self.featurize_wrapper,
    473         return_errors=return_errors,
    474         ignore_errors=ignore_errors,
    475     )
--> 476     res = p.map(func, entries, chunksize=self.chunksize)
    477     return res

File /scratch/micromamba/envs/alembic/lib/python3.10/multiprocessing/pool.py:367, in Pool.map(self, func, iterable, chunksize)
    362 def map(self, func, iterable, chunksize=None):
    363     '''
    364     Apply `func` to each element in `iterable`, collecting the results
    365     in a list that is returned.
    366     '''
--> 367     return self._map_async(func, iterable, mapstar, chunksize).get()

File /scratch/micromamba/envs/alembic/lib/python3.10/multiprocessing/pool.py:768, in ApplyResult.get(self, timeout)
    767 def get(self, timeout=None):
--> 768     self.wait(timeout)
    769     if not self.ready():
    770         raise TimeoutError

File /scratch/micromamba/envs/alembic/lib/python3.10/multiprocessing/pool.py:765, in ApplyResult.wait(self, timeout)
    764 def wait(self, timeout=None):
--> 765     self._event.wait(timeout)

File /scratch/micromamba/envs/alembic/lib/python3.10/threading.py:607, in Event.wait(self, timeout)
    605 signaled = self._flag
    606 if not signaled:
--> 607     signaled = self._cond.wait(timeout)
    608 return signaled

File /scratch/micromamba/envs/alembic/lib/python3.10/threading.py:320, in Condition.wait(self, timeout)
    318 try:    # restore state no matter what (e.g., KeyboardInterrupt)
    319     if timeout is None:
--> 320         waiter.acquire()
    321         gotit = True
    322     else:

KeyboardInterrupt: 

It seems to me that the code is waiting indefinitely?

@ml-evs (Collaborator) commented Mar 28, 2024

So it gets stuck in the parallel internals of matminer (maybe; it depends on your luck when you actually interrupt). I would rerun with n_jobs=1 as suggested above and see if you get the same problem. Otherwise, you can also try changing the featurizer mode from "multi" to "single", which will change the parallelism to be over structures rather than features.

e.g. add to the snippet above:

data.featurizer.featurizer_mode = "single"

This will either "just work" or it will give us better debug info on which featurizer is causing it to hang.
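In context, that means setting the mode on the MODData's featurizer before the featurize call (a sketch based on the GNN function above):

data.featurizer.featurizer_mode = "single"  # parallelise over structures rather than features
data.featurize()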

@shivang-22 (Author)

Okay, so both n_jobs=1 and data.featurizer.featurizer_mode = "single" work, but featurization is significantly slower than with the defaults. The latter still (understandably) does better, but is there a way to make this faster?

@ml-evs (Collaborator) commented Mar 29, 2024

The speed is unfortunately just a limitation of matminer. Glad it is working now, though. See hackingmaterials/matminer#902 for a full description of the parallelism problems in matminer.
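In the meantime, one practical mitigation is to featurize once and cache the result on disk, so that hyperparameter sweeps like the repeated GNN calls above skip the slow featurization step entirely. A minimal sketch, assuming MODData's save/load round-trip behaves as in recent modnet releases (the file name is hypothetical):

# featurize once, serially to avoid the hang, then cache to disk
data.featurize(n_jobs=1)
data.save("featurized_moddata")  # hypothetical cache path

# in later runs, reload instead of recomputing the features
data = MODData.load("featurized_moddata")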

@ml-evs closed this as not planned on Mar 29, 2024.