Adding PyTorch support in data_utils #79

jackmiller2003 · 2024-02-19T08:13:22Z

This pull request adds the option of using PyTorch as the backend for the data_utils class, so that users can obtain a PyTorch dataset from the load_ncdata_with_generator function.
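A minimal sketch of how a backend option like this could be validated and imported on demand. The names SUPPORTED_BACKENDS, normalize_backend, and import_backend are hypothetical and are not taken from data_utils.py; only the two backends and the TensorFlow default come from this PR.

```python
import importlib

# Hypothetical mapping from a user-facing backend name to its import name.
SUPPORTED_BACKENDS = {"tensorflow": "tensorflow", "pytorch": "torch"}


def normalize_backend(ml_backend="tensorflow"):
    """Return the canonical lowercase backend name, or raise ValueError."""
    key = ml_backend.strip().lower()
    if key not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unknown ml_backend {ml_backend!r}; "
            f"expected one of {sorted(SUPPORTED_BACKENDS)}"
        )
    return key


def import_backend(ml_backend="tensorflow"):
    """Import only the selected backend, so the other need not be installed."""
    module_name = SUPPORTED_BACKENDS[normalize_backend(ml_backend)]
    return importlib.import_module(module_name)
```

Importing inside a function rather than at module level means that a user with only one of the two libraries installed can still import the module.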

This PR makes the following changes:

Modification of data_utils.py so that one can choose an ml_backend (currently either PyTorch or TensorFlow, defaulting to TensorFlow), which is used in load_ncdata_with_generator and therefore elsewhere in the class.
Modification of setup.py so that one can set an environment variable to install PyTorch instead of TensorFlow. Running setup.py as usual, without the variable, still installs TensorFlow.
Creation of testing_data_utils_with_backends.py, a small script that checks that both backends can be used correctly and that outputs can still be saved to NumPy arrays.
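The setup.py behaviour described above could look roughly like the following sketch. The environment variable name ML_BACKEND, the helper select_backend_requirements, and the package lists are assumptions made for illustration; the PR text does not state the actual variable name or requirement strings.

```python
import os

# Hypothetical requirement lists for each backend.
BACKEND_REQUIREMENTS = {
    "tensorflow": ["tensorflow"],
    "pytorch": ["torch"],
}


def select_backend_requirements(environ=None, variable="ML_BACKEND"):
    """Pick the backend requirement list from an environment variable.

    Falls back to TensorFlow when the variable is unset, mirroring the
    default described in this PR.
    """
    environ = os.environ if environ is None else environ
    choice = environ.get(variable, "tensorflow").strip().lower()
    if choice not in BACKEND_REQUIREMENTS:
        raise ValueError(f"Unsupported backend {choice!r} in ${variable}")
    # Return a copy so callers can extend the list without mutating the table.
    return list(BACKEND_REQUIREMENTS[choice])
```

In a setup.py, the returned list would be appended to the shared dependencies before being passed to install_requires.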

I was also informed that a new testing framework is coming; I am happy to open another PR with proper tests once it is available. For now, I have tested the logic in data_utils.py with the script testing_data_utils_with_backends.py (including a comparison of the output arrays), and the changes to setup.py by installing into separate virtual environments.
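A comparison of output arrays across backends can be sketched as below. This is not the content of testing_data_utils_with_backends.py; the helpers collect_batches and outputs_match are hypothetical, and plain NumPy arrays stand in for the batches that each backend's dataset would yield after conversion to NumPy.

```python
import numpy as np


def collect_batches(batches):
    """Stack an iterable of array-like batches along the first axis."""
    return np.concatenate([np.asarray(batch) for batch in batches], axis=0)


def outputs_match(batches_a, batches_b, rtol=1e-6, atol=1e-8):
    """Check that two backends produced the same data, ignoring batch sizes."""
    a = collect_batches(batches_a)
    b = collect_batches(batches_b)
    return a.shape == b.shape and bool(np.allclose(a, b, rtol=rtol, atol=atol))
```

Concatenating before comparing keeps the check independent of how each backend happens to batch the samples.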

jerrylin96

This looks good to me! Thanks for adding it!

jackmiller2003 added 9 commits February 19, 2024 13:14

7598b6a  feat: added conditional importing dependning on the backend of choice (either torch or tensorflow).
3ebce7d  feat: added iterable torch datasets, including as_numpy_iterator replacement. Have not tested
6b184e2  feat: changed setup.py to detect an OS environment variable that controls whether to use tensorflow or pytorch as the backend
e640f4d  chore: modified some aspects to adhere to modern tf and changed wording to this_self inside the nested class IterableTorchDataset
03540af  chore: changed back to old formatting to enable easier git diff
0055a2c  chore: added small testing script to make sure backends could be used and that saving to NumPy still works.
f0615b8  chore: updated comments.
76e938e  chore: changed logic and name of tests/testing_data_utils_with_backends.py
7a63db8  chore: added capability of testing multiple backends and then comparing their outputs to NumPy

jerrylin96 approved these changes Feb 19, 2024

jerrylin96 merged commit 6c52b96 into leap-stc:main Feb 19, 2024
1 check passed
