
Question about Caption Model #195

Open
zhiyuanyou opened this issue Oct 15, 2024 · 1 comment

Comments

@zhiyuanyou

Hello,

Thanks for your great work! I am trying to caption some videos with your caption model, THUDM/cogvlm2-llama3-caption. However, I get the following warning:

This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.

I know I can revise the max_position_embeddings parameter in THUDM/cogvlm2-llama3-caption/config.json to change the predefined maximum length.

However, I am not sure whether directly changing the max_position_embeddings parameter will degrade performance.
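For reference, this is a minimal sketch of the edit I am considering. The path and config contents here are stand-ins for illustration (written to a temporary directory), not the actual model files:

```python
import json
import os
import tempfile

# Stand-in for a local copy of THUDM/cogvlm2-llama3-caption/config.json,
# reduced to the one field in question (hypothetical setup for illustration).
cfg_dir = tempfile.mkdtemp()
path = os.path.join(cfg_dir, "config.json")
with open(path, "w") as f:
    json.dump({"max_position_embeddings": 2048}, f)

# Raise the predefined maximum length from 2048 to 4096.
with open(path) as f:
    cfg = json.load(f)
cfg["max_position_embeddings"] = 4096
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```

My concern is whether the model weights remain valid for positions beyond 2048 after this edit.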

Thanks for your time.

@zhiyuanyou (Author)

Thanks again for your great work! I also have three questions:

  1. How many frames per video should I input to get the best performance?
  2. Given the name max_position_embeddings, are the position embeddings sinusoidal embeddings or learned embeddings?
  3. If I change max_position_embeddings from 2048 to 4096, is there an interpolation operation to obtain 4096 embeddings from the predefined 2048 embeddings?
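For context on question 3, here is a minimal sketch of the kind of position interpolation I mean, assuming Llama-style rotary embeddings, which are computed from fixed sinusoids rather than stored as learned weights (the dimensions and base are illustrative, not the model's actual values):

```python
import math

def rope_angles(position, dim=8, base=10000.0):
    # Rotary position embeddings are computed on the fly:
    # angle_i = position / base^(2i/dim), then applied as sin/cos rotations.
    return [position / base ** (2 * i / dim) for i in range(dim // 2)]

# Position interpolation: when extending 2048 -> 4096, positions are
# scaled down so every position maps back into the trained range [0, 2048).
old_max, new_max = 2048, 4096
scale = old_max / new_max

p = 3000  # a position beyond the original maximum
interpolated = rope_angles(p * scale)  # same angles as original position 1500
```

Is something like this scaling applied when extending the context, or are positions beyond 2048 simply extrapolated?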

Thanks in advance.
