
[Bug]: Parallel/multiprocessing do not work for Describers #637

Open
kavanase opened this issue Jun 12, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@kavanase
Contributor

Email (Optional)

No response

Version

v2023.9.9

Which OS(es) are you using?

  • MacOS
  • Windows
  • Linux

What happened?

Firstly, thanks for developing a really nice package!
When trying to featurize a list of Structure objects with M3GNetStructure(n_jobs=4).transform (to then perform DIRECT sampling), using the n_jobs argument to run this in parallel (to speed up parsing, as suggested in the example notebook), the following error is raised, stating that the parsing functions are not pickle-able and thus cannot be used in parallel:

joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/joblib/externals/loky/backend/queues.py", line 159, in _feed
    obj_ = dumps(obj, reducers=reducers)
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/joblib/externals/loky/backend/reduction.py", line 215, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/joblib/externals/loky/backend/reduction.py", line 208, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/joblib/externals/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'weakref.ReferenceType' object
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "M3GNet_Structure_DIRECT_generation.py", line 66, in <module>
    m3gnet_struct.transform(collated_data["structures"][:1000])
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/maml/base/_describer.py", line 122, in transform
    features = Parallel(n_jobs=self.n_jobs)(delayed(cached_transform_one)(self, obj) for obj in objs)
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/joblib/parallel.py", line 1952, in __call__
    return output if self.return_generator else list(output)
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/joblib/parallel.py", line 1595, in _get_outputs
    yield from self._retrieve()
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/joblib/parallel.py", line 1699, in _retrieve
    self._raise_error_fast()
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/joblib/parallel.py", line 1734, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/joblib/parallel.py", line 736, in get_result
    return self._return_or_raise()
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/joblib/parallel.py", line 754, in _return_or_raise
    raise self._result
_pickle.PicklingError: Could not pickle the task to send it to the workers.
Exception ignored in: <function _CheckpointRestoreCoordinatorDeleter.__del__ at 0x36761e290>
Traceback (most recent call last):
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/tensorflow/python/checkpoint/checkpoint.py", line 197, in __del__
TypeError: 'NoneType' object is not callable

When I instead tried to run the transform function in parallel using multiprocessing (Pool.imap_unordered()) rather than joblib's Parallel, I got a similar error about pickle-ability:

  File "M3GNet_Structure_DIRECT_generation.py", line 77, in <module>
    results = list(tqdm(
  File "/Users/kavanase/miniconda3/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/Users/kavanase/miniconda3/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
  File "/Users/kavanase/miniconda3/lib/python3.10/multiprocessing/pool.py", line 540, in _handle_tasks
    put(task)
  File "/Users/kavanase/miniconda3/lib/python3.10/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Users/kavanase/miniconda3/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function _lambdifygenerated at 0x36e5f5000>: attribute lookup _lambdifygenerated on __main__ failed

For now, I can get around this by manually dividing up my dataset and running separate Python jobs to featurize the individual chunks (see the sketch below), but it would be much easier for users if parallel processing were possible, as featurization can take quite a while.
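Roughly, the manual workaround looks like this (just a sketch: the script name, chunk size and output filenames are illustrative, and it assumes transform() returns a pandas DataFrame):

import sys

from maml.describers import M3GNetStructure

CHUNK_SIZE = 250
chunk_idx = int(sys.argv[1])  # e.g. submit `python featurize_chunk.py 0`, `python featurize_chunk.py 1`, ... as separate jobs

structures = ...  # load the full list of pymatgen Structure objects here (e.g. collated_data["structures"])
chunk = structures[chunk_idx * CHUNK_SIZE : (chunk_idx + 1) * CHUNK_SIZE]

features = M3GNetStructure().transform(chunk)  # serial within each job, so the describer is never pickled
features.to_pickle(f"m3gnet_features_chunk_{chunk_idx}.pkl")  # concatenate the chunk outputs afterwards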

Code snippet

from maml.describers import M3GNetStructure

M3GNetStructure(n_jobs=4).transform(list_of_structures)  # list_of_structures: a list of pymatgen Structure objects

Log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
kavanase added the bug label on Jun 12, 2024
@zz11ss11zz
Contributor

Hi @kavanase, I suggest using M3GNetStructure().transform_one(). Then you can generate the respective features in parallel yourself and combine them for analysis.
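For example, something along these lines (a rough, untested sketch: it assumes transform_one returns a pandas DataFrame per structure, and it builds one describer per worker process so that the describer itself never has to be pickled):

from multiprocessing import Pool

import pandas as pd
from maml.describers import M3GNetStructure

_describer = None  # one describer instance per worker process


def _init_worker():
    global _describer
    _describer = M3GNetStructure()  # constructed inside the worker, never sent between processes


def _featurize(structure):
    return _describer.transform_one(structure)  # Structure objects pickle fine; the describer does not


if __name__ == "__main__":
    structures = ...  # your list of pymatgen Structure objects
    with Pool(processes=4, initializer=_init_worker) as pool:
        features = pool.map(_featurize, structures)
    all_features = pd.concat(features, ignore_index=True)  # combine for analysis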
