AttributeError: module 'torch.distributed' has no attribute '_all_gather_base' #201

Open
xc012 opened this issue Jan 19, 2023 · 2 comments

Comments

@xc012

xc012 commented Jan 19, 2023

Problem: Training fails with the message “apex is not installed”, even though apex has been installed in several ways, including a direct "pip install apex" and building it from source.
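
The traceback below shows that apex is present but that importing it fails with an AttributeError. This fits the “not installed” message only if the message comes from a handler that treats any import failure as a missing package; that is an assumption, not something confirmed from the repository source. The sketch uses a hypothetical throwaway module (stub_apex) that fails the same way. It shows that a broad except hides the real error, while a handler for ImportError alone lets the AttributeError through, which is what happens at "from apex import amp" in timm/utils/cuda.py.

```python
import os
import sys
import tempfile
import textwrap


def import_failing_module(handled):
    """Import a throwaway module whose import raises AttributeError.

    `handled` is the exception type (or tuple of types) the caller catches.
    Returns a short description when the error is caught; otherwise the
    AttributeError propagates to the caller, as it does in the traceback.
    """
    # Hypothetical stand-in for an installed package whose import touches an
    # attribute that the installed torch version does not provide.
    source = textwrap.dedent(
        """
        import types
        dist = types.ModuleType("torch.distributed")  # stub, not the real module
        _all_gather_base = dist._all_gather_base      # raises AttributeError
        """
    )
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "stub_apex.py"), "w") as fh:
            fh.write(source)
        sys.path.insert(0, tmp)
        try:
            import stub_apex  # noqa: F401
            return "imported"
        except handled as exc:
            return f"caught {type(exc).__name__}"
        finally:
            # Undo the temporary path entry and any partial module state.
            sys.path.remove(tmp)
            sys.modules.pop("stub_apex", None)


if __name__ == "__main__":
    # A broad handler swallows the AttributeError, so only a generic
    # "not installed"-style message would be shown.
    print(import_failing_module(Exception))
    # A handler for ImportError only lets the AttributeError escape.
    try:
        import_failing_module(ImportError)
    except AttributeError as exc:
        print("escaped:", exc)
```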

Command: python tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py

Error
apex is not installed
Traceback (most recent call last):
  File "tools/train.py", line 15, in <module>
    from mmdet.apis import set_random_seed, train_detector
  File "/root/Swin-Transformer-Object-Detection-master/Swin-Transformer-Object-Detection/mmdet/apis/__init__.py", line 1, in <module>
    from .inference import (async_inference_detector, inference_detector,
  File "/root/Swin-Transformer-Object-Detection-master/Swin-Transformer-Object-Detection/mmdet/apis/inference.py", line 11, in <module>
    from mmdet.datasets import replace_ImageToTensor
  File "/root/Swin-Transformer-Object-Detection-master/Swin-Transformer-Object-Detection/mmdet/datasets/__init__.py", line 10, in <module>
    from .utils import (NumClassCheckHook, get_loading_pipeline,
  File "/root/Swin-Transformer-Object-Detection-master/Swin-Transformer-Object-Detection/mmdet/datasets/utils.py", line 9, in <module>
    from mmdet.models.dense_heads import GARPNHead, RPNHead
  File "/root/Swin-Transformer-Object-Detection-master/Swin-Transformer-Object-Detection/mmdet/models/__init__.py", line 1, in <module>
    from .backbones import * # noqa: F401,F403
  File "/root/Swin-Transformer-Object-Detection-master/Swin-Transformer-Object-Detection/mmdet/models/backbones/__init__.py", line 13, in <module>
    from .swin_transformer import SwinTransformer
  File "/root/Swin-Transformer-Object-Detection-master/Swin-Transformer-Object-Detection/mmdet/models/backbones/swin_transformer.py", line 13, in <module>
    from timm.models.layers import DropPath, to_2tuple, trunc_normal_
  File "/root/.local/lib/python3.7/site-packages/timm/__init__.py", line 2, in <module>
    from .models import create_model, list_models, is_model, list_modules, model_entrypoint,
  File "/root/.local/lib/python3.7/site-packages/timm/models/__init__.py", line 1, in <module>
    from .beit import *
  File "/root/.local/lib/python3.7/site-packages/timm/models/beit.py", line 49, in <module>
    from timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
  File "/root/.local/lib/python3.7/site-packages/timm/data/__init__.py", line 5, in <module>
    from .dataset import ImageDataset, IterableImageDataset, AugMixDataset
  File "/root/.local/lib/python3.7/site-packages/timm/data/dataset.py", line 12, in <module>
    from .parsers import create_parser
  File "/root/.local/lib/python3.7/site-packages/timm/data/parsers/__init__.py", line 1, in <module>
    from .parser_factory import create_parser
  File "/root/.local/lib/python3.7/site-packages/timm/data/parsers/parser_factory.py", line 3, in <module>
    from .parser_image_folder import ParserImageFolder
  File "/root/.local/lib/python3.7/site-packages/timm/data/parsers/parser_image_folder.py", line 11, in <module>
    from timm.utils.misc import natural_key
  File "/root/.local/lib/python3.7/site-packages/timm/utils/__init__.py", line 4, in <module>
    from .cuda import ApexScaler, NativeScaler
  File "/root/.local/lib/python3.7/site-packages/timm/utils/cuda.py", line 8, in <module>
    from apex import amp
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/__init__.py", line 27, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/transformer/__init__.py", line 4, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/transformer/pipeline_parallel/__init__.py", line 1, in <module>
    # -*- coding: utf-8 -*-
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/transformer/pipeline_parallel/schedules/__init__.py", line 3, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/transformer/pipeline_parallel/schedules/fwd_bwd_no_pipelining.py", line 10, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/transformer/pipeline_parallel/schedules/common.py", line 9, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/transformer/pipeline_parallel/p2p_communication.py", line 25, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/transformer/utils.py", line 11, in <module>
AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'
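
The last frame shows apex/transformer/utils.py reading torch.distributed._all_gather_base, which the installed torch 1.7.1 evidently does not provide. A minimal check, sketched here with a hypothetical helper named module_has_attr, reports whether a given torch build exposes that attribute without importing apex at all:

```python
import importlib


def module_has_attr(module_name, attr):
    """Report whether `module_name` can be imported and exposes `attr`.

    Returns True or False after a successful import, or None when the module
    itself cannot be imported (for example, torch is not installed).
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


if __name__ == "__main__":
    # With torch 1.7.1, as in the Environment list, the traceback implies
    # this check comes back False.
    print(module_has_attr("torch.distributed", "_all_gather_base"))
```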

Environment
Python 3.7.6
torch 1.7.1+cu110
torchaudio 0.7.2
torchvision 0.8.2+cu110
apex 0.1
mmcv-full 1.2.4
mmdet 2.11.0 /root/Swin-Transformer-Object-Detection-master/Swin-Transformer-Object-Detection
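
A small script can collect the same information as the list above, which makes it easier to compare setups when others report the same error. This is a sketch: the import names (in particular mmcv for mmcv-full) are assumptions, packages without a __version__ attribute are reported as "unknown", and the broad except is deliberate so that a broken apex shows its real error instead of stopping the report.

```python
import importlib
import platform

# Import names for the packages in the Environment list; "mmcv" is assumed
# to be the import name of mmcv-full.
PACKAGES = ["torch", "torchvision", "torchaudio", "mmcv", "mmdet", "timm", "apex"]


def report_versions(names):
    """Map each import name to its __version__ or to the error its import raised."""
    versions = {"python": platform.python_version()}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # broad on purpose: a broken apex raises AttributeError
            versions[name] = f"import failed: {type(exc).__name__}: {exc}"
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions(PACKAGES).items():
        print(f"{name:<12} {version}")
```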

Installation method:
Apex Installation method:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir ./
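
The traceback loads apex from /opt/conda/lib/python3.7/site-packages/apex-0.1-py3.7.egg, while timm comes from /root/.local/lib/python3.7/site-packages, so after several install attempts it may be worth checking which copy Python actually resolves. As far as we know, a bare "pip install apex" fetches an unrelated PyPI project of the same name rather than NVIDIA apex. The hypothetical helper below relies on importlib.util.find_spec, which locates a top-level module without executing it, so it still works when importing apex raises:

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be loaded from, without running it.

    Returns None when no installed copy is found on sys.path.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Shows whether the egg in /opt/conda from the traceback is the copy
    # that "import apex" picks up.
    print(locate_module("apex"))
```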

MMCV Installation method:
pip install mmcv-full==1.2.4 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7/index.html

mmdet Installation method:
git clone https://github.com/SwinTransformer/Swin-Transformer-Object-Detection.git
cd Swin-Transformer-Object-Detection
python setup.py develop

@f2367976412

I also encountered the same problem. Please let me know if you solved it.

@JankinHou

I also ran into the same problem. Was it ever solved?
