-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chinese Text2video retrieval support? #201
Comments
You could use https://huggingface.co/OpenGVLab/InternVideo2-CLIP-1B-224p-f8, it supports Chinese text search! |
Thanks for your reply. I use 'InternVideo2-stage2_1b-224p-f4.pt'+'1B_clip.pth' as the vision encoder, 'chinese_alpaca_lora_7b' as tokenizer, 'internvl_c_13b_224px.pth' as the text encoder, but I got the error:
I found the vision feature is Lineard to 512d, but the text feature is 768d and cannot be Lineard to 512d, so how to multiply this two mat? I did something wrong? |
Thank you for contributing such outstanding work, I would like to ask InternVideo2 support Chinese text search video? What model do I need to replace the VisionEncoder and TextEncoder with? Or how to modify our finetune? Thank you very much
———————————————————————————————————————————————————————
您好,感谢贡献如此杰出的工作,我想请问InternVideo2支持中文文字检索视频吗?我需要把VisionEncoder和TextEncoder换成什么模型呢?或者需要怎么修改我们finetune吗?非常感谢
The text was updated successfully, but these errors were encountered: