FusedAdam requires cuda extensions #11

Open
mehranagh20 opened this issue Apr 15, 2021 · 6 comments

@mehranagh20

I have built the apex module following the procedure explained, but when trying to train the model on CIFAR-10 I get:

/lustre03/project/6054857/mehranag/vdvae/data.py:147: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.
  trX = np.vstack(data['data'] for data in tr_data)
Traceback (most recent call last):
  File "train.py", line 144, in <module>
    main()
  File "train.py", line 140, in main
    train_loop(H, data_train, data_valid_or_test, preprocess_fn, vae, ema_vae, logprint)
  File "train.py", line 59, in train_loop
    optimizer, scheduler, cur_eval_loss, iterate, starting_epoch = load_opt(H, vae, logprint)
  File "/lustre03/project/6054857/mehranag/vdvae/train_helpers.py", line 180, in load_opt
    optimizer = AdamW(vae.parameters(), weight_decay=H.wd, lr=H.lr, betas=(H.adam_beta1, H.adam_beta2))
  File "/home/mehranag/anaconda3/envs/env/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/optimizers/fused_adam.py", line 79, in __init__
    raise RuntimeError('apex.optimizers.FusedAdam requires cuda extensions')
RuntimeError: apex.optimizers.FusedAdam requires cuda extensions

I understand that this is an apex-related issue, since I get the following warning when trying to run examples/simple/distributed in the apex repo:

Warning:  multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback.  Original ImportError was: ImportError("/lib64/libm.so.6: version `GLIBC_2.29' not found (required by /home/mehranag/anaconda3/envs/env/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/amp_C.cpython-36m-x86_64-linux-gnu.so)",)
final loss =  tensor(0.5392, device='cuda:0', grad_fn=<MseLossBackward>)

I have tried many things to fix this issue, but no luck. I have two questions:

  • Does anybody know why I get FusedAdam requires cuda extensions even though I built apex with the --global-option="--cpp_ext" --global-option="--cuda_ext" options?
  • How can I avoid using apex? I am only trying to test some things on CIFAR-10 and don't need the distributed training feature, especially given the weird errors I'm getting.
@rewonc
Contributor

rewonc commented May 3, 2021

@mehranagh20 -- Are you using the code on a GPU, and do you have the appropriate CUDA drivers enabled?

If you want to avoid using apex, you can swap out the AdamW optimizer for PyTorch's AdamW (torch.optim.AdamW). I think you might need to adjust some of the arguments.
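
For reference, a minimal sketch of that swap (assuming train_helpers.py imports apex's FusedAdam under the name AdamW, as the traceback suggests; the model and hyperparameter values below are placeholders, not vdvae's defaults):

# Sketch: replace apex's fused optimizer with PyTorch's built-in AdamW.
# from apex.optimizers import FusedAdam as AdamW   # old import (needs the CUDA extensions)
import torch
from torch.optim import AdamW                       # pure-PyTorch replacement

model = torch.nn.Linear(16, 16)                     # placeholder model; in vdvae this would be the VAE
optimizer = AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.999), weight_decay=0.01)

loss = model(torch.randn(4, 16)).pow(2).mean()      # dummy forward/backward to exercise the optimizer
loss.backward()
optimizer.step()

Since the call in the traceback only passes lr, betas, and weight_decay, all of which torch.optim.AdamW also accepts, the optimizer = AdamW(vae.parameters(), ...) line itself can likely stay unchanged once the import is swapped.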

@Chiang97912

Chiang97912 commented Nov 28, 2022

This is because apex cannot import amp_C. You can check the file "/home/mehranag/anaconda3/envs/env/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/optimizers/fused_adam.py", or verify it from a Python shell:

import torch
import amp_C  # torch must be imported before amp_C

You may get an error like libstdc++.so.6: version 'GLIBCXX_3.4.20' not found. If so, try the following commands:

conda install libgcc
export LD_LIBRARY_PATH=/path/to/anaconda/envs/myenv/lib:$LD_LIBRARY_PATH
cd /path/to/anaconda/envs/myenv/lib
ln -s libstdc++.so.6.0.30 libstdc++.so.6

You can also add export LD_LIBRARY_PATH=/path/to/anaconda/envs/myenv/lib:$LD_LIBRARY_PATH to your ~/.bashrc file so the setting persists across shells.
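
After applying the fix, a quick sanity check (a sketch that only uses the modules already named in the traceback, amp_C and apex.optimizers.FusedAdam):

# Sanity check: if amp_C now imports and FusedAdam can be constructed without the
# "requires cuda extensions" RuntimeError, the original issue should be resolved.
import torch  # torch must be imported before amp_C

try:
    import amp_C  # built only when apex is installed with --cpp_ext --cuda_ext
    from apex.optimizers import FusedAdam
except ImportError as err:
    print("apex CUDA extensions still unavailable:", err)
else:
    if torch.cuda.is_available():
        FusedAdam([torch.nn.Parameter(torch.zeros(4, device="cuda"))])  # dummy parameter
        print("FusedAdam constructed fine; the CUDA extensions are available.")
    else:
        print("amp_C imports fine, but PyTorch cannot see a CUDA device.")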

@ShoufaChen

I solved this problem by building with

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

rather than

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./

My pip version is 22.3.1.

@AanchalChugh

I tried this: pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./, but it did not solve the problem.

@barikata1984

barikata1984 commented Sep 30, 2023

Try the command below:

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings --global-option=--cpp_ext --config-settings --global-option=--cuda_ext ./

It worked with pip 23.2.1 on Python 3.9.

@Guodanding

The command from @barikata1984 above works for me! Thanks!
