Latent Coordinate Networks for Image and Video Memorization

We attach a learned latent code to the input of a coordinate MLP so that a single network can learn multiple images and videos. We train on selected videos from the WAIC-TSR dataset and show only our video results below:

Using a Coordinate MLP to Remember Multiple Videos

We use the same positional encoding scheme as the 2D image MLP, $\gamma(\mathbf{x}) = [\sin(2\pi\mathbf{B}\mathbf{x}), \cos(2\pi\mathbf{B}\mathbf{x})]^T$. The only change is that the smaller dimension of the $\mathbf{B}$ matrix grows from 2 to 3, since a video coordinate has a time component in addition to the two spatial ones. We vary the hyperparameter $\sigma$, the standard deviation of the elements of $\mathbf{B}$. Notice the blurriness of the video generated without positional encoding, as well as the "static" texture of the $\sigma = 100$ positional encoding.
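The encoding above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's implementation: the function names `sample_B` and `positional_encoding`, the number of features, and the seed are all hypothetical. It also assumes that the entries of $\mathbf{B}$ are drawn from a zero-mean Gaussian and that coordinates are normalized to $[0, 1]$; the text only states that $\sigma$ is the standard deviation of the entries.

```python
import math
import random

def sample_B(num_features, in_dim, sigma, seed=0):
    """Draw a (num_features x in_dim) matrix with N(0, sigma^2) entries.

    The zero-mean Gaussian is an assumption; the README only fixes sigma.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(in_dim)]
            for _ in range(num_features)]

def positional_encoding(x, B):
    """Return [sin(2*pi*Bx), cos(2*pi*Bx)] as a flat list of length 2 * len(B)."""
    proj = [2.0 * math.pi * sum(b * xi for b, xi in zip(row, x)) for row in B]
    return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]

# A video coordinate is (x, y, t), so B has 3 columns instead of 2.
B = sample_B(num_features=4, in_dim=3, sigma=10.0)
features = positional_encoding((0.25, 0.5, 0.75), B)
```

Larger values of `sigma` spread the rows of `B` over higher frequencies, which is consistent with the sharper but noisier reconstructions described above.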

🔴 Note: if you are on mobile it may be helpful to zoom in on the videos. 🔴

Results After Learning 2 Videos

Ground Truth	No Pos. Enc.	$\sigma = 1$	$\sigma = 10$	$\sigma = 100$

Results After Learning 4 Videos

Ground Truth	No Pos. Enc.	$\sigma = 1$	$\sigma = 10$	$\sigma = 100$

Interpolation Between Latent Codes After Learning 2 Videos

As in our image experiments, we interpolate between latent codes and show the results below. The column headers give the interpolation weight.

	0.0	0.25	0.5	0.75	1.0
No Pos. Enc.
$\sigma = 1$
$\sigma = 10$
$\sigma = 100$
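A rough sketch of how such an interpolation could be set up is shown here. It assumes a linear blend between two codes and assumes that the latent code is attached to the coordinate by concatenation; neither detail is stated in the text. The names `lerp_latent` and `network_input` and the two-dimensional codes are hypothetical placeholders for the learned codes.

```python
def lerp_latent(z_a, z_b, alpha):
    """Linear blend: alpha = 0 returns z_a, alpha = 1 returns z_b."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]

def network_input(coord, z):
    """Attach a latent code to a coordinate by concatenation (an assumption)."""
    return list(coord) + list(z)

# Hypothetical 2-D latent codes standing in for two learned video codes.
z_video_0 = [0.0, 2.0]
z_video_1 = [1.0, 4.0]

# The same weights as the column headers of the interpolation table.
weights = [0.0, 0.25, 0.5, 0.75, 1.0]
blended = [lerp_latent(z_video_0, z_video_1, w) for w in weights]

# Input for one (x, y, t) sample at the halfway point between the two codes.
mid_input = network_input((0.25, 0.5, 0.75), blended[2])
```

Evaluating the trained MLP on every coordinate with a fixed blended code would then produce one interpolated video per weight.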

Interpolation Between Latent Codes After Learning 4 Videos

Curiously, the interpolations after learning 4 videos are of higher quality than those after learning 2 videos, with more faithful color and shape reconstruction.

	0.0	0.25	0.5	0.75	1.0
No Pos. Enc.
$\sigma = 1$
$\sigma = 10$
$\sigma = 100$

Results After Learning a Single Video

To compare against the multi-video scenario, we also train MLPs that each learn only a single video.

Ground Truth	No Pos. Enc.	$\sigma = 1$	$\sigma = 10$	$\sigma = 100$
