`dino_eva_02_vitdet_*_1024_*` configs throw tensor shape mismatch error #356

dgcnz · 2024-07-10T11:04:25Z

Description

I tested all of the dino_eva_02_vitdet models, and the ones with image_size=1024 seem to be failing.

I used the sample image from the installation tutorial (idea.jpg in the command below).

Working:

projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_l_4attn_1280_lrd0p8_4scale_12ep.py
projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_l_8attn_1536_lrd0p8_4scale_12ep.py
projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_b_6attn_win32_1536_lrd0p7_4scale_12ep.py

Not working:

projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_l_4attn_1024_lrd0p8_4scale_12ep.py
projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.py

Looking at the logs, the culprit seems to be this snippet of code:

detrex/projects/dino_eva/modeling/dino.py

Lines 529 to 532 in 03e02cb

    if square_size < max_size and square_size != 0:
        warnings.warn("square_size={}, is smaller than max_size={} in batch".format(
            self.backbone.padding_constraints['square_size'], max_size))
        padding_constraints['square_size'] = max_size

Log info example

Command

python demo/demo.py --config-file projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.py \
                    --input idea.jpg \
                    --output visualized_results_eva_no_window_gpu.jpg \
                    --opts train.init_checkpoint="dino_eva_02_in21k_pretrain_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.pth"

Logs

[07/10 11:00:20 detectron2]: Arguments: Namespace(config_file='projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.py', webcam=False, video_input=None, input=['idea.jpg'], output='visualized_results_eva_no_window_gpu.jpg', min_size_test=800, max_size_test=1333, img_format='RGB', metadata_dataset='coco_2017_val', confidence_threshold=0.5, opts=['train.init_checkpoint=dino_eva_02_in21k_pretrain_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.pth'])
======== shape of rope freq torch.Size([256, 64]) ========
======== shape of rope freq torch.Size([4096, 64]) ========
[07/10 11:00:24 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from dino_eva_02_in21k_pretrain_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.pth ...
[07/10 11:00:24 fvcore.common.checkpoint]: [Checkpointer] Loading from dino_eva_02_in21k_pretrain_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.pth ...
  0% 0/1 [00:00<?, ?it/s]/content/detrex/./projects/dino_eva/modeling/dino.py:530: UserWarning: square_size=1024, is smaller than max_size=1199 in batch
  warnings.warn("square_size={}, is smaller than max_size={} in batch".format(
  0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/detrex/demo/demo.py", line 141, in <module>
    predictions, visualized_output = demo.run_on_image(img, args.confidence_threshold)
  File "/content/detrex/./demo/predictors.py", line 80, in run_on_image
    predictions = self.predictor(image)
  File "/content/detrex/./demo/predictors.py", line 207, in __call__
    predictions = self.model([inputs])[0]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/detrex/./projects/dino_eva/modeling/dino.py", line 198, in forward
    features = self.backbone(images.tensor)  # output feature dict
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/detrex/./detrex/modeling/backbone/eva.py", line 583, in forward
    bottom_up_features = self.net(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/detrex/./detrex/modeling/backbone/eva_02.py", line 431, in forward
    x = blk(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/detrex/./detrex/modeling/backbone/eva_02.py", line 275, in forward
    x = self.attn(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/detrex/./detrex/modeling/backbone/eva_02.py", line 117, in forward
    q = self.rope(q).type_as(v)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/detrex/./detrex/modeling/backbone/eva_02_utils.py", line 349, in forward
    return  t * self.freqs_cos + rotate_half(t) * self.freqs_sin
RuntimeError: The size of tensor a (5476) must match the size of tensor b (4096) at non-singleton dimension 2
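
The two sizes in the error line up with a token-grid mismatch. This is a minimal sketch of the arithmetic, assuming a patch size of 16 (not stated in this thread) and a patch embedding that drops any remainder. Under that assumption a 1024-pixel square gives 64 x 64 = 4096 positions, which matches the second "rope freq" shape in the log, while a side of 1199 (the max_size from the warning) gives 74 x 74 = 5476 tokens. The helper name `tokens_per_image` is made up for illustration.

```python
# Assumption: the backbone splits the image into non-overlapping 16x16 patches.
PATCH_SIZE = 16


def tokens_per_image(side: int, patch_size: int = PATCH_SIZE) -> int:
    """Number of patch tokens for a square input with the given side length."""
    per_side = side // patch_size  # a strided patch embedding drops the remainder
    return per_side * per_side


rope_positions = tokens_per_image(1024)  # grid the RoPE table was sized for
padded_tokens = tokens_per_image(1199)   # grid once square_size is raised to max_size

print(rope_positions, padded_tokens)  # prints: 4096 5476
```

If this reading is right, the failure comes from padding to 1199 while the rotary position table is still sized for 1024.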


dgcnz · 2024-07-10T11:56:30Z

Commenting out line 532 in the snippet above (`padding_constraints['square_size'] = max_size`) silences the error, but the predicted bounding boxes then have a vertical offset.

dgcnz · 2024-07-10T12:33:28Z

Okay, it seems that there is some padding happening somewhere that messes up predictions. If I manually resize the image to have square dimensions, then everything works as expected.
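
A possible explanation for why a square input behaves differently: the demo arguments in the log show min_size_test=800 and max_size_test=1333, so a wide image can end up with a long side above 1024 after resizing, while a square one ends up at 800 x 800. The sketch below reproduces shortest-edge resizing as I understand it. The function name `resized_shape` and the example image sizes are hypothetical, and the real implementation may round differently.

```python
def resized_shape(h, w, min_size=800, max_size=1333):
    """Scale so the short side equals min_size, capping the long side at max_size."""
    scale = min_size / min(h, w)
    new_h, new_w = h * scale, w * scale
    if max(new_h, new_w) > max_size:
        cap = max_size / max(new_h, new_w)
        new_h, new_w = new_h * cap, new_w * cap
    # Round to the nearest integer pixel count.
    return int(new_h + 0.5), int(new_w + 0.5)


SQUARE_SIZE = 1024  # value reported in the warning for the failing configs

# Hypothetical inputs: one wide image and one square image.
for h, w in [(1000, 1500), (1000, 1000)]:
    new_h, new_w = resized_shape(h, w)
    fits = max(new_h, new_w) <= SQUARE_SIZE
    print((h, w), "->", (new_h, new_w), "fits" if fits else "exceeds square_size")
```

With these sizes only the wide image exceeds 1024, so only it would reach the `square_size < max_size` branch quoted in the description.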

dgcnz changed the title ~~dino_eva_02_vitdet_b_4attn_1024_lrd0p7_4scale_12ep config throws tensor shape mismatch error~~ dino_eva_02_vitdet_*_1024_* configs throw tensor shape mismatch error Jul 10, 2024
