Image Feature Map #7

yixiao1 · 2022-09-16T13:04:16Z

Hi,

Congratulations on this impressive work.

I have a question relating to the image feature map. In the paper, you mentioned that you used ResNet34 pretrained on ImageNet as the image encoder. Could you please provide more details about the layers that you have used? Did you remove the last global average pooling and FC layers of the ResNet backbone?

I assume that you ultimately encode each input image into a single feature map F, since you later compute an attention map and apply it back to this feature map F at each step. If so, you must have added some decoder layers after the last conv block of the ResNet-34, right? Please correct me if I have misunderstood. Thanks!

Best wishes

jiaxiaosong1002 · 2022-10-11T07:03:06Z

Hi yixiao1,

Since the code has been released, you can see that we use both the 2D feature map and the flattened one from ResNet34. No additional layers are used.
