The aim of this fork is to use the FILM model in Unity for frame interpolation. It is based on asus4/tf-lite-unity-sample and uses TensorFlow Lite to perform the operations. Make sure to check the compatibility of this library with respect to your target device. As of writing, it is the following:
| | iOS | Android | macOS | Ubuntu | Windows |
|---|---|---|---|---|---|
| Core CPU | ✅ | ✅ | ✅ | ✅ | ✅ |
| Metal Delegate | ✅ | - | ✅ | - | - |
| GPU Delegate | - | ✅ | - | ✅ Experimental | - |
| NNAPI Delegate | - | ✅ | - | - | - |
The pretrained model included in this project has been converted to the `.tflite` format. You may use it for other projects and purposes as well. Unfortunately, I was not able to make it compatible with the GPU delegate. You may also perform the conversion yourself.
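A minimal sketch of such a conversion from the TF2 SavedModel (see the pretrained models section below) is given here. The output file name, the ops flags, and whether the resulting ops work with your target delegate are assumptions you should verify for your own setup.

```python
# Hedged sketch: convert the FILM SavedModel to .tflite.
# Paths, flags, and the output name are assumptions; adjust to your environment.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(
    "<pretrained_models>/film_net/Style/saved_model")
# Allow falling back to TF ops for anything outside the built-in TFLite set;
# this may be required depending on which ops the graph contains.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()
with open("film_net.tflite", "wb") as f:  # assumed output name
    f.write(tflite_model)
```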
---
Here are the tensor shapes required for inference:

```
// IT: Input Tensor
// OT: Output Tensor
(1, 1)[[0.5]]         -> IT[0]  // Time parameter; a frame is always inferred at t = 0.5 (interpolator.py#L102)
(1, WIDTH, HEIGHT, 3) -> IT[1]  // Input image 1
(1, WIDTH, HEIGHT, 3) -> IT[2]  // Input image 2
...
OT[1] -> (1, WIDTH, HEIGHT, 3)  // Output image, retrieved from index 1 of OT
```
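As a rough illustration of this layout, here is a minimal Python sketch that runs the converted model with the TensorFlow Lite interpreter. The file name `film_net.tflite`, the resolution, and the assumption that the input details are already ordered as IT[0..2] are hypothetical; check them against your converted model.

```python
# Hedged sketch: run the converted FILM .tflite model with the layout above.
import numpy as np
import tensorflow as tf

WIDTH, HEIGHT = 256, 256  # assumed resolution baked into the converted model

interpreter = tf.lite.Interpreter(model_path="film_net.tflite")  # assumed name
interpreter.allocate_tensors()
inputs = interpreter.get_input_details()    # verify ordering via each "shape"
outputs = interpreter.get_output_details()

time = np.array([[0.5]], dtype=np.float32)                  # IT[0]
frame1 = np.zeros((1, WIDTH, HEIGHT, 3), dtype=np.float32)  # IT[1], dummy data
frame2 = np.ones((1, WIDTH, HEIGHT, 3), dtype=np.float32)   # IT[2], dummy data

interpreter.set_tensor(inputs[0]["index"], time)
interpreter.set_tensor(inputs[1]["index"], frame1)
interpreter.set_tensor(inputs[2]["index"], frame2)
interpreter.invoke()

middle = interpreter.get_tensor(outputs[1]["index"])  # OT[1]: (1, WIDTH, HEIGHT, 3)
```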
For the Unity project, download and place the `.tflite` file in the StreamingAssets folder.
PS: I also tried converting to the `.onnx` format with tf2onnx using multiple opsets to make it compatible with Barracuda; however, I could not get it to work due to some incompatible operations, such as this one. Feel free to create an issue/PR if you manage to make it work.
The official TensorFlow 2 implementation of our high-quality frame interpolation neural network. We present a unified single-network approach that does not use additional pre-trained networks, like optical flow or depth, and yet achieves state-of-the-art results. We use a multi-scale feature extractor that shares the same convolution weights across the scales. Our model is trainable from frame triplets alone.
FILM: Frame Interpolation for Large Motion
Fitsum Reda¹, Janne Kontkanen¹, Eric Tabellion¹, Deqing Sun¹, Caroline Pantofaru¹, Brian Curless¹,²
¹Google Research, ²University of Washington
In ECCV 2022.
FILM transforms near-duplicate photos into slow-motion footage that looks like it was shot with a video camera.
Integrated into Hugging Face Spaces 🤗 using Gradio. Try out the Web Demo:
Try the interpolation model with the Replicate web demo at
Try FILM to interpolate between two or more images with the PyTTI-Tools at
An alternative Colab for running FILM on arbitrarily many input images, not just on two images,
- Nov 28, 2022: Upgrade `eval.interpolator_cli` for high-resolution frame interpolation. `--block_height` and `--block_width` determine the total number of patches (`block_height*block_width`) to subdivide the input images. By default, both arguments are set to 1, and so no subdivision will be done.
- Mar 12, 2022: Support for Windows, see WINDOWS_INSTALLATION.md.
- Mar 09, 2022: Support for high-resolution frame interpolation. Set `--block_height` and `--block_width` in `eval.interpolator_test` to extract patches from the inputs, and reconstruct the interpolated frame from the iteratively interpolated patches.
- Get the Frame Interpolation source code:

  ```
  git clone https://github.com/google-research/frame-interpolation
  cd frame-interpolation
  ```
- Optionally, pull the recommended Docker base image:

  ```
  docker pull gcr.io/deeplearning-platform-release/tf2-gpu.2-6:latest
  ```
- If you do not use Docker, set up your NVIDIA GPU environment with:
- Install the frame interpolation dependencies:

  ```
  pip3 install -r requirements.txt
  sudo apt-get install -y ffmpeg
  ```

  See WINDOWS_INSTALLATION.md for Windows support.
- Create a directory where you can keep large files. Ideally, not in this directory.

  ```
  mkdir -p <pretrained_models>
  ```
- Download the pre-trained TF2 saved models from Google Drive and put them into `<pretrained_models>`.
The downloaded folder should have the following structure:

```
<pretrained_models>/
├── film_net/
│   ├── L1/
│   ├── Style/
│   ├── VGG/
├── vgg/
│   ├── imagenet-vgg-verydeep-19.mat
```
The following instructions run the interpolator on the photos provided in 'frame-interpolation/photos'.
To generate an intermediate photo from the input near-duplicate photos, simply run:
```
python3 -m eval.interpolator_test \
  --frame1 photos/one.png \
  --frame2 photos/two.png \
  --model_path <pretrained_models>/film_net/Style/saved_model \
  --output_frame photos/output_middle.png
```
This will produce the sub-frame at `t=0.5` and save it as 'photos/output_middle.png'.
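For reference, loading the TF2 SavedModel directly in Python looks roughly like the sketch below. The input keys `x0`, `x1`, and `time` and the output key `image` are assumptions about the model signature; inspect the SavedModel with `saved_model_cli show` to confirm them before relying on this.

```python
# Hedged sketch: call the film_net SavedModel directly.
# Input/output key names are assumptions; verify them with saved_model_cli.
import tensorflow as tf

model = tf.saved_model.load("<pretrained_models>/film_net/Style/saved_model")

def load_image(path):
    """Read a PNG into a (1, H, W, 3) float32 tensor in [0, 1]."""
    data = tf.io.read_file(path)
    image = tf.io.decode_png(data, channels=3)
    return tf.cast(image, tf.float32)[tf.newaxis, ...] / 255.0

inputs = {
    "x0": load_image("photos/one.png"),
    "x1": load_image("photos/two.png"),
    "time": tf.constant([[0.5]], dtype=tf.float32),  # mid frame
}
result = model(inputs, training=False)
middle = result["image"]  # (1, H, W, 3) float image in [0, 1]
```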
For many in-between frames, `eval.interpolator_cli` takes in a set of directories identified by a glob (`--pattern`). Each directory is expected to contain at least two input frames, with each contiguous frame pair treated as an input to generate in-between frames. Frames should be named such that when sorted naturally with `natsort`, their desired order is unchanged.
```
python3 -m eval.interpolator_cli \
  --pattern "photos" \
  --model_path <pretrained_models>/film_net/Style/saved_model \
  --times_to_interpolate 6 \
  --output_video
```
You will find the interpolated frames (including the input frames) in 'photos/interpolated_frames/', and the interpolated video at 'photos/interpolated.mp4'.
The number of frames is determined by `--times_to_interpolate`, which controls the number of times the frame interpolator is invoked. When the number of frames in a directory is `num_frames`, the number of output frames will be `(2^times_to_interpolate+1)*(num_frames-1)`.
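As a quick worked example of this formula, the command above with the two sample photos ('photos/one.png' and 'photos/two.png') and `--times_to_interpolate 6` gives:

```python
# Worked example of the output-frame count for the command above:
# 2 input frames and --times_to_interpolate 6.
times_to_interpolate = 6
num_frames = 2
output_frames = (2**times_to_interpolate + 1) * (num_frames - 1)
print(output_frames)  # 65
```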
We use Vimeo-90K as our main training dataset. For quantitative evaluations, we rely on commonly used benchmark datasets.
The training and benchmark evaluation scripts expect the frame triplets in the TFRecord storage format. We have included scripts that encode the relevant frame triplets into a `tf.train.Example` data format and export them to a TFRecord file. Run `python3 -m datasets.create_<dataset_name>_tfrecord --help` for more information.
For example, run the command below to create a TFRecord for the Middlebury-other dataset. Download the images and point `--input_dir` to the unzipped folder path.

```
python3 -m datasets.create_middlebury_tfrecord \
  --input_dir=<root folder of middlebury-other> \
  --output_tfrecord_filepath=<output tfrecord filepath> \
  --num_shards=3
```

The above command will output a TFRecord file with 3 shards as `<output tfrecord filepath>@3`.
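If you want a quick sanity check that the triplets were written, a sketch like the one below counts the records across the shards. The on-disk shard naming pattern (`-00000-of-00003`, etc.) is an assumption and may differ in your setup.

```python
# Hedged sketch: count frame triplets across the sharded TFRecord output.
# The shard filename pattern is an assumption; adjust it if yours differs.
import glob
import tensorflow as tf

shards = sorted(glob.glob("<output tfrecord filepath>-*-of-00003"))
num_records = sum(1 for _ in tf.data.TFRecordDataset(shards))
print(f"{num_records} frame triplets across {len(shards)} shards")
```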
Below are our training gin configuration files for the different loss functions:

```
training/
├── config/
│   ├── film_net-L1.gin
│   ├── film_net-VGG.gin
│   ├── film_net-Style.gin
```
To launch a training run, simply pass the configuration filepath of the desired experiment. By default, it uses all visible GPUs for training. To debug or train on a CPU, append `--mode cpu`.
```
python3 -m training.train \
  --gin_config training/config/<config filename>.gin \
  --base_folder <base folder for all training runs> \
  --label <descriptive label for the run>
```
- When training finishes, the folder structure will look like this:

  ```
  <base_folder>/
  ├── <label>/
  │   ├── config.gin
  │   ├── eval/
  │   ├── train/
  │   ├── saved_model/
  ```
Optionally, to build a SavedModel format from a trained checkpoints folder, you can use this command:

```
python3 -m training.build_saved_model_cli \
  --base_folder <base folder of training sessions> \
  --label <the name of the run>
```

- By default, a SavedModel is created when the training loop ends, and it will be saved at `<base_folder>/<label>/saved_model`.
Below are the evaluation gin configuration files for the benchmarks we have considered:

```
eval/
├── config/
│   ├── middlebury.gin
│   ├── ucf101.gin
│   ├── vimeo_90K.gin
│   ├── xiph_2K.gin
│   ├── xiph_4K.gin
```
To run an evaluation, simply pass the configuration file of the desired evaluation dataset. If a GPU is visible, it runs on it.

```
python3 -m eval.eval_cli \
  --gin_config eval/config/<eval_dataset>.gin \
  --model_path <pretrained_models>/film_net/L1/saved_model
```
The above command will produce the PSNR and SSIM scores presented in the paper.
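For context on the reported metrics, the sketch below shows how PSNR and SSIM can be computed for a single predicted frame with TensorFlow's built-in ops. It is illustrative only and is not necessarily the exact code path used by `eval.eval_cli`.

```python
# Hedged sketch: PSNR and SSIM for one predicted frame vs. its ground truth.
# Illustrative only; eval.eval_cli may compute these differently.
import tensorflow as tf

def frame_metrics(prediction, target):
    """Both arguments are float32 images in [0, 1] with shape (H, W, 3)."""
    psnr = tf.image.psnr(prediction, target, max_val=1.0)
    ssim = tf.image.ssim(prediction, target, max_val=1.0)
    return float(psnr), float(ssim)
```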
If you find this implementation useful in your works, please acknowledge it appropriately by citing:
```
@inproceedings{reda2022film,
  title     = {FILM: Frame Interpolation for Large Motion},
  author    = {Fitsum Reda and Janne Kontkanen and Eric Tabellion and Deqing Sun and Caroline Pantofaru and Brian Curless},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2022}
}
```

```
@misc{film-tf,
  title        = {Tensorflow 2 Implementation of "FILM: Frame Interpolation for Large Motion"},
  author       = {Fitsum Reda and Janne Kontkanen and Eric Tabellion and Deqing Sun and Caroline Pantofaru and Brian Curless},
  year         = {2022},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/google-research/frame-interpolation}}
}
```
We would like to thank Richard Tucker, Jason Lai and David Minnen. We would also like to thank Jamie Aspinall for the imagery included in this repository.
Coding style:
- 2 spaces for indentation
- 80 character line length
- PEP8 formatting
This is not an officially supported Google product.