dino_eva_02_vitdet_*_1024_* configs throw tensor shape mismatch error #356

Open
dgcnz opened this issue Jul 10, 2024 · 2 comments

dgcnz commented Jul 10, 2024

Description

I tested all dino_eva_02_vitdet models from here; the ones with image_size=1024 seem to be failing.

Used this image from the installation tutorial.

Working:

  • projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_l_4attn_1280_lrd0p8_4scale_12ep.py
  • projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_l_8attn_1536_lrd0p8_4scale_12ep.py
  • projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_b_6attn_win32_1536_lrd0p7_4scale_12ep.py

Not working:

  • projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_l_4attn_1024_lrd0p8_4scale_12ep.py
  • projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.py

Looking at the logs, the culprit seems to be this snippet of code:

if square_size < max_size and square_size != 0:
    warnings.warn("square_size={}, is smaller than max_size={} in batch".format(
        self.backbone.padding_constraints['square_size'], max_size))
    padding_constraints['square_size'] = max_size

Log info example

Command

python demo/demo.py --config-file projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.py \
                    --input idea.jpg \
                    --output visualized_results_eva_no_window_gpu.jpg \
                    --opts train.init_checkpoint="dino_eva_02_in21k_pretrain_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.pth"

Logs

[07/10 11:00:20 detectron2]: Arguments: Namespace(config_file='projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.py', webcam=False, video_input=None, input=['idea.jpg'], output='visualized_results_eva_no_window_gpu.jpg', min_size_test=800, max_size_test=1333, img_format='RGB', metadata_dataset='coco_2017_val', confidence_threshold=0.5, opts=['train.init_checkpoint=dino_eva_02_in21k_pretrain_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.pth'])
======== shape of rope freq torch.Size([256, 64]) ========
======== shape of rope freq torch.Size([4096, 64]) ========
[07/10 11:00:24 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from dino_eva_02_in21k_pretrain_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.pth ...
[07/10 11:00:24 fvcore.common.checkpoint]: [Checkpointer] Loading from dino_eva_02_in21k_pretrain_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.pth ...
  0% 0/1 [00:00<?, ?it/s]/content/detrex/./projects/dino_eva/modeling/dino.py:530: UserWarning: square_size=1024, is smaller than max_size=1199 in batch
  warnings.warn("square_size={}, is smaller than max_size={} in batch".format(
  0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/detrex/demo/demo.py", line 141, in <module>
    predictions, visualized_output = demo.run_on_image(img, args.confidence_threshold)
  File "/content/detrex/./demo/predictors.py", line 80, in run_on_image
    predictions = self.predictor(image)
  File "/content/detrex/./demo/predictors.py", line 207, in __call__
    predictions = self.model([inputs])[0]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/detrex/./projects/dino_eva/modeling/dino.py", line 198, in forward
    features = self.backbone(images.tensor)  # output feature dict
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/detrex/./detrex/modeling/backbone/eva.py", line 583, in forward
    bottom_up_features = self.net(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/detrex/./detrex/modeling/backbone/eva_02.py", line 431, in forward
    x = blk(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/detrex/./detrex/modeling/backbone/eva_02.py", line 275, in forward
    x = self.attn(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/detrex/./detrex/modeling/backbone/eva_02.py", line 117, in forward
    q = self.rope(q).type_as(v)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/detrex/./detrex/modeling/backbone/eva_02_utils.py", line 349, in forward
    return  t * self.freqs_cos + rotate_half(t) * self.freqs_sin
RuntimeError: The size of tensor a (5476) must match the size of tensor b (4096) at non-singleton dimension 2
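
The two sizes in the error are consistent with a patch-count mismatch. A minimal sketch of the arithmetic, assuming the backbone embeds patches with a 16x16 convolution at stride 16 and no padding (the patch size is an assumption, not stated in this issue) and that the image is padded to a 1199x1199 square, as the override in the snippet above suggests:

```python
PATCH_SIZE = 16  # assumption: patch size/stride of the ViT patch embedding


def patches_per_side(side: int, patch: int = PATCH_SIZE) -> int:
    """Output length of a conv with kernel=stride=patch and no padding."""
    return (side - patch) // patch + 1


# RoPE table built for the config's square_size (1024 in the warning)
rope_tokens = patches_per_side(1024) ** 2
# Tokens actually produced once the square is enlarged to max_size (1199)
actual_tokens = patches_per_side(1199) ** 2

print(rope_tokens, actual_tokens)  # 4096 5476
```

Under these assumptions the RoPE frequencies cover a 64x64 grid (4096 positions, matching the `torch.Size([4096, 64])` line in the log), while the padded input yields a 74x74 grid (5476 tokens), which are the two numbers in the RuntimeError.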
@dgcnz dgcnz changed the title dino_eva_02_vitdet_b_4attn_1024_lrd0p7_4scale_12ep config throws tensor shape mismatch error dino_eva_02_vitdet_*_1024_* configs throw tensor shape mismatch error Jul 10, 2024

dgcnz commented Jul 10, 2024

Commenting out line 532 in the snippet above (the padding_constraints['square_size'] = max_size assignment) silences the error but results in bounding boxes with a vertical offset:

[Image: visualized_results_eva_no_window_gpu]

dgcnz commented Jul 10, 2024

Okay, it seems that there is some padding happening somewhere that messes up predictions. If I manually resize the image to have square dimensions, then everything works as expected.

[Image: visualized_results_eva_no_window_gpu]
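
The square-resize workaround could be sketched roughly as follows. This is an illustration, not the code used in this issue: the helper names are hypothetical, the side length of 1024 is an assumption chosen to match the config's square_size, and Pillow is assumed to be installed. Because a plain resize changes the aspect ratio, the second helper maps predicted boxes back to the original image coordinates.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def resize_to_square(src_path: str, dst_path: str, side: int = 1024) -> Tuple[int, int]:
    """Resize an image to side x side (hypothetical helper); returns the original (width, height)."""
    from PIL import Image  # Pillow, imported lazily so the box helper has no dependencies

    with Image.open(src_path) as img:
        orig_size = img.size  # (width, height)
        img.convert("RGB").resize((side, side)).save(dst_path)
    return orig_size


def boxes_to_original(boxes: List[Box], orig_w: int, orig_h: int, side: int = 1024) -> List[Box]:
    """Rescale boxes predicted on the side x side image back to the original size."""
    sx, sy = orig_w / side, orig_h / side
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]
```

For example, `resize_to_square("idea.jpg", "idea_square.jpg")` would write a 1024x1024 copy that can be passed to `demo/demo.py` via `--input`, and `boxes_to_original` would then undo the distortion on the predicted boxes.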
