
Regarding the issue of training templates in Qwen2VLDataCollator #57

Closed
Asunatan opened this issue Nov 28, 2024 · 6 comments

Comments

@Asunatan

Hello, I am a beginner in the field of VLMs and have a question about the training template. In the `Qwen2VLDataCollator` you provide, I noticed that some extra fields end up in the serialized text.

[screenshot: collator output]

This differs from directly calling

`apply_chat_template_text = self.processor.apply_chat_template(cur_text, tokenize=False, add_generation_prompt=True)`

which produces a somewhat different result. Could this lead to discrepancies at prediction time? Below is the result obtained directly from `apply_chat_template`:

[screenshot: apply_chat_template output]

The gpt_response from `apply_chat_template` does not contain the `[{"type": "text", "text":` wrapper. The source of the difference seems to be here:

[screenshot: the relevant collator code]

I am curious whether the difference between the two could introduce a training bias.
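To make the difference concrete, here is a minimal sketch (plain Python, no transformers dependency; `render_assistant_turn` is a simplified stand-in for the real chat template, and the sample text is invented) of how the two `content` shapes render differently when a template inserts `content` verbatim:

```python
def render_assistant_turn(content):
    """Simplified stand-in for a chat template that inserts `content` verbatim."""
    return f"<|im_start|>assistant\n{content}<|im_end|>"

text = "The image shows a cat."

# Plain-string content renders cleanly:
clean = render_assistant_turn(text)

# List-of-dict content leaks its Python repr into the rendered string,
# so the [{"type": "text", ...}] wrapper becomes part of the training target:
leaky = render_assistant_turn([{"type": "text", "text": text}])

print(clean)
print(leaky)
```

This is exactly the kind of train/inference mismatch the question raises: if training targets contain the leaked wrapper but inference prompts do not, the model is being fit to a format it will never see at prediction time.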

@Elenore1997

I ran into the same problem you mention, so I edited the code directly:

```python
cur_text.append({
    "role": "assistant",
    # "content": [
    #     {"type": "text", "text": text},
    # ]
    "content": text
})
```
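If you would rather keep the collator tolerant of both shapes, a small normalization helper is another option (a sketch only; `flatten_content` is a hypothetical name, not part of this repository):

```python
def flatten_content(content):
    """Collapse list-of-parts content into a plain string, keeping text parts only."""
    if isinstance(content, str):
        return content
    return "".join(part["text"] for part in content if part.get("type") == "text")

text = "The image shows a cat."
assert flatten_content(text) == text
assert flatten_content([{"type": "text", "text": text}]) == text
```

Either way, the assistant turn should reach `apply_chat_template` as a plain string so that train-time and inference-time prompts match.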

@Asunatan
Author

Asunatan commented Dec 3, 2024

> I found the same problem as you mention. So I directly edit the code: `cur_text.append({"role": "assistant", "content": text})`

Yes, I have adopted the same strategy as you, but is it correct to do so?

@Elenore1997

I think it is OK to edit it like this. I trained a model with this repository (after editing those few lines) and ran inference with the official Qwen2-VL code (using `apply_chat_template`), and got correct results.

@Asunatan
Author

Asunatan commented Dec 3, 2024

> I think it is ok to edit like this, I have trained model using this repository (edit this few lines of code) and inference with qwen2vl official code (using apply_chat_template) and get the correct result.

Thank you, this is very helpful to me.

@zjysteven
Owner

@Asunatan @Elenore1997 Yes, I think the proposed fix looks good. Sorry for not responding earlier; I was moving over the past few days. Tagging @linyueqian to be aware of this.

@linyueqian
Collaborator

Yes, as mentioned in #56, we should use the text directly as the `content` value. Just updated in the latest commit.
