Question about the loss #51

tulvgengenr · 2024-11-12T07:03:10Z

Hi! In models/modeling_showo.py, the loss is computed as follows:

        if labels is not None:
            # 1. Mask token prediction (discrete diffusion) for image generation
            # Note: max_seq_length is the maximum number of text tokens, which may be a bit confusing.
            loss_t2i = F.cross_entropy(
                logits[:batch_size_t2i, max_seq_length + 1:].contiguous().view(-1, self.output_size),
                labels[:batch_size_t2i, max_seq_length + 1:].contiguous().view(-1), ignore_index=-100,
            )

            # 2. Next token prediction for language modeling
            loss_lm = F.cross_entropy(
                logits[batch_size_t2i:batch_size_t2i + batch_size_lm, :-1].contiguous().view(-1, self.output_size),
                labels[batch_size_t2i:batch_size_t2i + batch_size_lm, 1:].contiguous().view(-1), ignore_index=-100,
            )

            # 3. Next token prediction for captioning/multimodal understanding
            loss_mmu = F.cross_entropy(
                logits[-batch_size_mmu:, :-1].contiguous().view(-1, self.output_size),
                labels[-batch_size_mmu:, 1:].contiguous().view(-1), ignore_index=-100,
            )

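I have a question. For the t2i task, the cross-entropy loss is computed from logits[:batch_size_t2i, max_seq_length + 1:] and labels[:batch_size_t2i, max_seq_length + 1:]. This seems to mean that, when predicting image tokens, the logits no longer represent the probability of the next token but that of the current token: unlike the lm and mmu losses, logits and labels are not offset by one position. Isn't this different from conventional autoregressive generation?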
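The two alignments the question contrasts can be checked on a toy example. This is a minimal PyTorch sketch, not Show-o code: the vocabulary size, the sequence length, and the helper `one_hot_logits` are made up for illustration. It builds a "perfect" model for each objective and shows which label slice its logits must be compared against to get a near-zero loss.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 16, 6
# Distinct token ids [0, 1, ..., 5], so shifted and unshifted targets never coincide.
tokens = torch.arange(seq_len).unsqueeze(0)  # shape (1, seq_len)


def one_hot_logits(targets, vocab_size, scale=20.0):
    # Hypothetical helper: a large logit on the target id, zero elsewhere.
    return F.one_hot(targets, vocab_size).float() * scale


# Next-token prediction (lm / mmu): the output at position i predicts token i + 1,
# so a perfect model's first seq_len - 1 outputs encode tokens[:, 1:].
ntp_logits = torch.zeros(1, seq_len, vocab_size)
ntp_logits[:, :-1] = one_hot_logits(tokens[:, 1:], vocab_size)
ntp_loss = F.cross_entropy(ntp_logits[:, :-1].reshape(-1, vocab_size),
                           tokens[:, 1:].reshape(-1))
# Scoring the same outputs against the unshifted tokens is misaligned: high loss.
ntp_loss_unshifted = F.cross_entropy(ntp_logits[:, :-1].reshape(-1, vocab_size),
                                     tokens[:, :-1].reshape(-1))

# Mask-token prediction (t2i): the output at position i recovers the token at the
# same position i, so logits and labels use the same slice with no shift.
mtp_logits = one_hot_logits(tokens, vocab_size)
mtp_loss = F.cross_entropy(mtp_logits.reshape(-1, vocab_size), tokens.reshape(-1))

print(ntp_loss.item(), ntp_loss_unshifted.item(), mtp_loss.item())
```

The unshifted t2i slice is therefore not a missing shift in itself; whether a shift is needed depends on what each output position is trained to predict.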


Sierkinhane · 2024-11-12T08:12:12Z

For image generation we use discrete diffusion (i.e., mask token prediction) rather than next-token prediction; please see the Preliminaries and Method sections of the paper for details.
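A mask-token-prediction loss of this kind can be sketched as follows, under stated assumptions: the mask id, the vocabulary size, and the fixed masking ratio are illustrative, and `mask_image_tokens` / `mask_token_loss` are hypothetical helper names (how Show-o actually samples the masking ratio and builds its labels is described in the paper and repository, not here). Unmasked positions get the label -100, which `F.cross_entropy` skips via `ignore_index`, so only the masked image tokens contribute to the loss, and logits and labels are compared at the same positions.

```python
import torch
import torch.nn.functional as F


def mask_image_tokens(image_tokens, mask_id, mask_ratio, generator=None):
    """Replace a random subset of image tokens with mask_id (hypothetical helper).

    Returns (masked_input, labels): labels keep the original id at masked
    positions and -100 elsewhere, so F.cross_entropy(ignore_index=-100)
    only scores the masked positions.
    """
    is_masked = torch.rand(image_tokens.shape, generator=generator) < mask_ratio
    masked_input = torch.where(is_masked, torch.full_like(image_tokens, mask_id), image_tokens)
    labels = torch.where(is_masked, image_tokens, torch.full_like(image_tokens, -100))
    return masked_input, labels


def mask_token_loss(logits, labels):
    # No shift: the logit at position i is scored against the label at position i.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1), ignore_index=-100)


# Usage with toy sizes (assumed, not Show-o's configuration).
g = torch.Generator().manual_seed(0)
MASK_ID, VOCAB = 31, 32  # hypothetical: the last id is reserved for [MASK]
image_tokens = torch.randint(0, MASK_ID, (2, 16), generator=g)
masked_input, labels = mask_image_tokens(image_tokens, MASK_ID, mask_ratio=0.5, generator=g)
# A stand-in "model" that already knows the original tokens at every position.
perfect_logits = F.one_hot(image_tokens, VOCAB).float() * 20.0
loss = mask_token_loss(perfect_logits, labels)
```

Since each masked position is asked to recover its own original token, there is nothing to shift, which matches the unshifted t2i slice quoted in the issue. Note that if no position happens to be masked, every label is -100 and the mean cross-entropy is undefined, so a real training loop has to guarantee at least one masked token.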
