
Question about Caption Model #195

Open
zhiyuanyou opened this issue Oct 15, 2024 · 1 comment

Comments

@zhiyuanyou

Hello,

Thanks for your great work! I am trying to caption some videos with your caption model, THUDM/cogvlm2-llama3-caption. However, I get the following warning:

This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.

I know I can revise the max_position_embeddings parameter in THUDM/cogvlm2-llama3-caption/config.json to change the predefined maximum length.

However, I am not sure whether directly changing the max_position_embeddings parameter will degrade performance.
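For reference, this is a minimal sketch of the edit I am considering. The path and config contents here are stand-ins for illustration (written to a temporary directory), not the actual model files:

```python
import json
import os
import tempfile

# Stand-in for a local copy of THUDM/cogvlm2-llama3-caption/config.json,
# reduced to the one field in question (hypothetical setup for illustration).
cfg_dir = tempfile.mkdtemp()
path = os.path.join(cfg_dir, "config.json")
with open(path, "w") as f:
    json.dump({"max_position_embeddings": 2048}, f)

# Raise the predefined maximum length from 2048 to 4096.
with open(path) as f:
    cfg = json.load(f)
cfg["max_position_embeddings"] = 4096
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```

My concern is whether the model weights remain valid for positions beyond 2048 after this edit.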

Thanks for your time.

@zhiyuanyou (Author)

Thanks again for your great work! I also have three questions:

  1. How many frames per video should I input to get the best performance?
  2. Given the name max_position_embeddings, are the position embeddings sinusoidal embeddings or learned embeddings?
  3. If I change max_position_embeddings from 2048 to 4096, is there an interpolation operation to obtain 4096 embeddings from the predefined 2048 embeddings?
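For context on question 3, here is a minimal sketch of the kind of position interpolation I mean, assuming Llama-style rotary embeddings, which are computed from fixed sinusoids rather than stored as learned weights (the dimensions and base are illustrative, not the model's actual values):

```python
import math

def rope_angles(position, dim=8, base=10000.0):
    # Rotary position embeddings are computed on the fly:
    # angle_i = position / base^(2i/dim), then applied as sin/cos rotations.
    return [position / base ** (2 * i / dim) for i in range(dim // 2)]

# Position interpolation: when extending 2048 -> 4096, positions are
# scaled down so every position maps back into the trained range [0, 2048).
old_max, new_max = 2048, 4096
scale = old_max / new_max

p = 3000  # a position beyond the original maximum
interpolated = rope_angles(p * scale)  # same angles as original position 1500
```

Is something like this scaling applied when extending the context, or are positions beyond 2048 simply extrapolated?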

Thanks in advance.
