Regarding the data of the VIST Dataset #13

Open
SaulZhang opened this issue Mar 6, 2023 · 1 comment

Comments

@SaulZhang

Hi @xichenpan
When I tried to reproduce the experiment on the VIST dataset, I noticed that there are numerous duplicate story images in the testing set as illustrated in the figure below, although their text descriptions differ. Is this because some image URLs were inaccessible during the download process? I utilized the vist_img_download.py script to download a total of 184011 images, but I am unsure if some images may have been missing. Would it be possible for you to share the vist.h5 file through Google Drive?
[Image: QQ截图20230306160245 (QQ screenshot)]

@xichenpan
Owner

Hi @SaulZhang, sorry for the delayed reply, I was busy with ICCV last week. The VIST dataset does contain duplicate images. This is because the same visual story has multiple captions, so the images you downloaded are actually correct.
Unfortunately, I do not have a vist.h5 file, because we use a NAS table at Alibaba for data storage, and the h5 scripts were written only so that users can speed up I/O. Hope this helps!
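
To check the point about duplicates yourself, you can group stories by the image sequence they use. The snippet below is only a minimal sketch and is not taken from this repository: the record fields `story_id`, `photo_id`, `order`, and `text`, and the function `group_stories_by_images`, are hypothetical names, so you would need to map them onto the fields of your own annotation file.

```python
from collections import defaultdict


def group_stories_by_images(records):
    """Map each image sequence to the list of story ids that use it.

    `records` is a list of dicts with the hypothetical keys
    `story_id`, `photo_id`, and `order` (position of the photo in the story).
    """
    # Collect the ordered photo ids of every story.
    photos = defaultdict(list)
    for r in sorted(records, key=lambda r: (r["story_id"], r["order"])):
        photos[r["story_id"]].append(r["photo_id"])

    # Stories that share the same photo sequence end up in the same group.
    groups = defaultdict(list)
    for story_id, seq in photos.items():
        groups[tuple(seq)].append(story_id)
    return dict(groups)


# Toy data (made up for illustration): two stories describe the same photos.
records = [
    {"story_id": "s1", "photo_id": "p1", "order": 0, "text": "We arrived."},
    {"story_id": "s1", "photo_id": "p2", "order": 1, "text": "We ate."},
    {"story_id": "s2", "photo_id": "p1", "order": 0, "text": "The trip began."},
    {"story_id": "s2", "photo_id": "p2", "order": 1, "text": "Lunch was great."},
    {"story_id": "s3", "photo_id": "p3", "order": 0, "text": "A quiet day."},
]

groups = group_stories_by_images(records)
shared = {seq: ids for seq, ids in groups.items() if len(ids) > 1}
unique_photos = {r["photo_id"] for r in records}
```

If `shared` is non-empty while the texts differ, the repeated images come from multiple stories written for the same photos rather than from failed downloads. Comparing `len(unique_photos)` with the number of image files you downloaded is a separate check for missing images.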
