Latent Coordinate Networks for Image and Video Memorization
We use a coordinate MLP with a learned latent code attached to the input in order to learn multiple images and videos with a single network. We use selected videos from the WAIC-TSR dataset, showing only our video results below:
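As a rough illustration of what attaching a latent code to the input could look like, the sketch below concatenates one code per video onto every sampled coordinate. Concatenation, the 8-dimensional code size, and the names `build_inputs`, `latents`, and `coords` are assumptions made for this example and are not taken from the project code.

```python
import numpy as np

def build_inputs(coords, latents, video_index):
    """Append one video's latent code to every coordinate row.

    coords:  (n, coord_dim) array of sample coordinates, e.g. (x, y, t).
    latents: (num_videos, latent_dim) table of per-video codes
             (learned in practice; random here for illustration).
    """
    code = latents[video_index]                                  # (latent_dim,)
    tiled = np.broadcast_to(code, (coords.shape[0], code.shape[0]))
    return np.concatenate([coords, tiled], axis=1)               # (n, coord_dim + latent_dim)

rng = np.random.default_rng(0)
latents = rng.normal(size=(2, 8))   # hypothetical: 2 videos, 8-dim codes
coords = rng.uniform(size=(5, 3))   # 5 sample points, each (x, y, t)
inputs = build_inputs(coords, latents, video_index=1)
```

Under this assumption, the network sees the same coordinates for every video, and only the appended code tells it which video to reproduce.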
Using a Coordinate MLP to Remember Multiple Videos
We use the same positional encoding scheme as the 2D image MLP, $\gamma(\mathbf{x}) = [\sin(2\pi\mathbf{Bx}), \cos(2\pi\mathbf{Bx})]^T$, changing only the smaller dimension of the $\mathbf{B}$ matrix from 2 to 3, since video coordinates have three components rather than two. We vary the hyperparameter $\sigma$, the standard deviation of the elements of the positional encoding matrix $\mathbf{B}$. Notice the blurriness of the video generated without positional encoding, as well as the "static" texture of the $\sigma = 100$ positional encoding.
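The encoding above can be sketched in a few lines of NumPy. This is a minimal example, assuming the elements of $\mathbf{B}$ are drawn from a zero-mean Gaussian with standard deviation $\sigma$; the feature count of 256 and the name `fourier_features` are hypothetical choices for illustration.

```python
import numpy as np

def fourier_features(x, B):
    """Compute gamma(x) = [sin(2*pi*B x), cos(2*pi*B x)] for each row of x.

    x: (n, d) coordinates; d = 3 for video.
    B: (m, d) projection matrix; d is its smaller dimension.
    """
    proj = 2.0 * np.pi * x @ B.T                                  # (n, m)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)  # (n, 2m)

rng = np.random.default_rng(0)
sigma = 10.0          # standard deviation of the entries of B
num_features = 256    # hypothetical number of rows in B
B = sigma * rng.normal(size=(num_features, 3))   # assumed zero-mean Gaussian entries
xyt = rng.uniform(size=(4, 3))                   # 4 sample points, each (x, y, t)
encoded = fourier_features(xyt, B)               # shape (4, 512)
```

A larger $\sigma$ spreads the rows of $\mathbf{B}$ over higher frequencies, which is consistent with the smoother output at low $\sigma$ and the noisy "static" texture at $\sigma = 100$ described above.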
🔴 Note: if you are on mobile it may be helpful to zoom in on the videos. 🔴
Results After Learning 2 Videos
Video panels: Ground Truth, No Pos. Enc., $\sigma = 1$, $\sigma = 10$, $\sigma = 100$.
Results After Learning 4 Videos
Video panels: Ground Truth, No Pos. Enc., $\sigma = 1$, $\sigma = 10$, $\sigma = 100$.
Interpolation Between Latent Codes After Learning 2 Videos
Similar to our experiments for images, we also interpolate between latent codes and show the results below:
Video grid: interpolation weights 0.0, 0.25, 0.5, 0.75, 1.0, shown for No Pos. Enc., $\sigma = 1$, $\sigma = 10$, and $\sigma = 100$.
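For reference, interpolating between two latent codes at the weights shown in this section might look like the sketch below. It assumes plain linear interpolation, with weight 0.0 giving the first code and 1.0 the second; the source does not state the exact scheme, and `interpolate_latents` is a hypothetical helper name.

```python
import numpy as np

def interpolate_latents(z_a, z_b, weights):
    """Linearly blend two latent codes: (1 - w) * z_a + w * z_b for each w."""
    w = np.asarray(weights, dtype=float)[:, None]   # (k, 1) for broadcasting
    return (1.0 - w) * z_a + w * z_b                # (k, latent_dim)

z_a = np.array([1.0, 0.0, -2.0])   # toy stand-in for the first video's code
z_b = np.array([3.0, 4.0, 2.0])    # toy stand-in for the second video's code
blended = interpolate_latents(z_a, z_b, [0.0, 0.25, 0.5, 0.75, 1.0])
```

Each row of `blended` would then be attached to the coordinates in place of a learned code, and the network evaluated as usual to render the in-between video.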
Interpolation Between Latent Codes After Learning 4 Videos
Curiously, the interpolations after learning 4 videos are of higher quality than those after learning 2 videos, with more faithful color and shape reconstruction.
Video grid: interpolation weights 0.0, 0.25, 0.5, 0.75, 1.0, shown for No Pos. Enc., $\sigma = 1$, $\sigma = 10$, and $\sigma = 100$.
Results After Learning a Single Video
To compare against the multi-video scenario, we also train MLPs that each learn only a single video.