We are currently exploring relevant aspects of this. On one hand, attention over latents fully flattened along T, H, and W requires a significant amount of memory for the attention blocks to operate. On the other hand, we have observed that its convergence is not particularly fast.
As a side note, OpenAI's Sora does not explicitly state that its transformer attends to every spatiotemporal token, which would be computationally expensive for generating 2K videos. Is it possible that only the encoder is spatiotemporal, while the diffusion backbone is 2+1D?
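To make the trade-off concrete, here is a minimal sketch of the 2+1D factorization being contrasted with full THW attention. The module and tensor layout are illustrative assumptions on my part, not this repo's (or Sora's) actual implementation:

```python
import torch
import torch.nn as nn

class Factorized21DAttention(nn.Module):
    """Hypothetical 2+1D attention block: spatial attention within each
    frame, then temporal attention across frames at each spatial location.
    Names and shapes are illustrative, not this repo's actual modules."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, S, C) latent tokens, where S = H * W
        B, T, S, C = x.shape

        # Spatial pass: each frame attends over its own S tokens,
        # costing O(T * S^2) instead of O((T * S)^2) for full attention.
        xs = x.reshape(B * T, S, C)
        xs, _ = self.spatial_attn(xs, xs, xs, need_weights=False)
        x = xs.reshape(B, T, S, C)

        # Temporal pass: each spatial location attends across the T
        # frames, adding O(S * T^2).
        xt = x.permute(0, 2, 1, 3).reshape(B * S, T, C)
        xt, _ = self.temporal_attn(xt, xt, xt, need_weights=False)
        return xt.reshape(B, S, T, C).permute(0, 2, 1, 3)


# Example: 16 frames of 32x32 latents with 512-dim tokens.
x = torch.randn(1, 16, 32 * 32, 512)
print(Factorized21DAttention(512)(x).shape)  # torch.Size([1, 16, 1024, 512])
```

With S = H·W, full attention over the flattened THW sequence scales as O((T·S)²), whereas the factorized version scales as O(T·S² + S·T²); that quadratic gap is exactly the memory pressure described above, and why a 2+1D diffusion backbone would be the cheaper hypothesis at 2K resolution.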