Merge pull request NVlabs#2 from swook/master
First code commit for open-source release of FAZE
shalinidemello authored Dec 9, 2019
2 parents e2b4b08 + a2cf1a7 commit 72e548b
Showing 24 changed files with 3,201 additions and 11 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
__pycache__/
src/output*
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "preprocess"]
path = preprocess
url = https://github.com/swook/faze_preprocess
49 changes: 49 additions & 0 deletions LICENSE
@@ -0,0 +1,49 @@
Nvidia Source Code License (1-Way Commercial)


1. Definitions

“Licensor” means any person or entity that distributes its Work.

“Software” means the original work of authorship made available under this License.

“Work” means the Software and any additions to or derivative works of the Software that are made available under this License.

“Nvidia Processors” means any central processing unit (CPU), graphics processing unit (GPU), field-programmable gate array (FPGA), application-specific integrated circuit (ASIC) or any combination thereof designed, made, sold, or provided by Nvidia or its affiliates.

The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning as provided under U.S. copyright law; provided, however, that for the purposes of this License, derivative works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work.

Works, including the Software, are “made available” under this License by including in or with the Work either (a) a copyright notice referencing the applicability of this License to the Work, or (b) a copy of this License.


2. License Grants

2.1 Copyright Grant. Subject to the terms and conditions of this License, each Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free, copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form.

2.2 Patent Grant. Subject to the terms and conditions of this License, each Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free patent license to make, have made, use, sell, offer for sale, import, and otherwise transfer its Work, in whole or in part. The foregoing license applies only to the patent claims licensable by Licensor that would be infringed by Licensor’s Work (or portion thereof) individually and excluding any combinations with any other materials or technology.


3. Limitations

3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under this License, (b) you include a complete copy of this License with your distribution, and (c) you retain without modification any copyright, patent, trademark, or attribution notices that are present in the Work.

3.2 Derivative Works. You may specify that additional or different terms apply to the use, reproduction, and distribution of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative works, and (b) you identify the specific derivative works that are subject to Your Terms. Notwithstanding Your Terms, this License (including the redistribution requirements in Section 3.1) will continue to apply to the Work itself.

3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for use non-commercially. The Work or derivative works thereof may be used or intended for use by Nvidia or its affiliates commercially or non-commercially. As used herein, “non-commercially” means for research or evaluation purposes only.

3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor (including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then your rights under this License from such Licensor (including the grants in Sections 2.1 and 2.2) will terminate immediately.

3.5 Trademarks. This License does not grant any rights to use any Licensor’s or its affiliates’ names, logos, or trademarks, except as necessary to reproduce the notices described in this License.

3.6 Termination. If you violate any term of this License, then your rights under this License (including the grants in Sections 2.1 and 2.2) will terminate immediately.


4. Disclaimer of Warranty.

THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE. SOME STATES’ CONSUMER LAWS DO NOT ALLOW EXCLUSION OF AN IMPLIED WARRANTY, SO THIS DISCLAIMER MAY NOT APPLY TO YOU.


5. Limitation of Liability.

EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

76 changes: 65 additions & 11 deletions README.md
@@ -1,18 +1,72 @@
# Faze: Few-Shot Adaptive Gaze Estimation
# FAZE: Few-Shot Adaptive Gaze Estimation

This repository will contain the code for training, evaluation, and live demonstration of our ICCV 2019 work, which was presented as an Oral presentation in Seoul, Korea. Faze is a framework for few-shot adaptation of gaze estimation networks, consisting of equivariance learning (via the **DT-ED** or Disentangling Transforming Encoder-Decoder architecture) and meta-learning with gaze embeddings as input.
This repository contains the code for training and evaluation of our ICCV 2019 work, which was presented as an Oral presentation. FAZE is a framework for few-shot adaptation of gaze estimation networks, consisting of equivariance learning (via the **DT-ED** or Disentangling Transforming Encoder-Decoder architecture) and meta-learning with gaze-direction embeddings as input.

![The FAZE Framework](https://ait.ethz.ch/projects/2019/faze/banner.jpg)


## Links
* [NVIDIA Project Page](https://research.nvidia.com/publication/2019-10_Few-Shot-Adaptive-Gaze)
* [ETH Zurich Project Page](https://ait.ethz.ch/projects/2019/faze/)
* [arXiv Page](https://arxiv.org/abs/1905.01941)
* [CVF Open Access PDF](http://openaccess.thecvf.com/content_ICCV_2019/papers/Park_Few-Shot_Adaptive_Gaze_Estimation_ICCV_2019_paper.pdf)
* [ICCV 2019 Presentation](https://conftube.com/video/ByfFufRhuRc?tocitem=17)
* [Pre-processing Code GitHub Repository](https://github.com/swook/faze_preprocess) _(also included as a submodule in this repository)_

![The Faze Framework](https://ait.ethz.ch/projects/2019/faze/banner.jpg)

## Setup
Further setup instructions will be made available soon. For now, please pre-process the *GazeCapture* and *MPIIGaze* datasets using the code-base at https://github.com/swook/faze_preprocess

## Additional Resources
* Project Page (ETH Zurich): https://ait.ethz.ch/projects/2019/faze/
* Project Page (Nvidia): https://research.nvidia.com/publication/2019-10_Few-Shot-Adaptive-Gaze
* arXiv Page: https://arxiv.org/abs/1905.01941
* CVF Open Access PDF: http://openaccess.thecvf.com/content_ICCV_2019/papers/Park_Few-Shot_Adaptive_Gaze_Estimation_ICCV_2019_paper.pdf
* Pre-processing Code: https://github.com/swook/faze_preprocess
### 1. Datasets

Pre-process the *GazeCapture* and *MPIIGaze* datasets using the codebase at https://github.com/swook/faze_preprocess, which is also available as a git submodule at the relative path `preprocess/`.

If you have already cloned this `few_shot_gaze` repository without pulling the submodules, please run:

git submodule update --init --recursive
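
Alternatively, if you are cloning from scratch, the submodule can be fetched in a single step (a sketch, assuming the repository is hosted under the NVlabs organization; substitute your actual clone URL):

    git clone --recurse-submodules https://github.com/NVlabs/few_shot_gaze.git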

Once the datasets have been pre-processed, you can move on to the next steps.

### 2. Prerequisites

This codebase should run on most standard Linux systems. We specifically used Ubuntu.

Please install the following prerequisites manually (as well as their dependencies) by following the instructions linked below:
* PyTorch 1.3 - https://pytorch.org/get-started/locally/
* NVIDIA Apex - https://github.com/NVIDIA/apex#quick-start
* *Please note that only NVIDIA Volta and newer GPU architectures can benefit from AMP (automatic mixed precision) training via NVIDIA Apex.*

The remaining Python package dependencies can be installed by running:

pip3 install --user --upgrade -r requirements.txt
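
To sanity-check the setup, you can confirm that PyTorch is installed and sees a CUDA-capable GPU (a generic check, not part of the FAZE codebase):

    python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"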

### 3. Pre-trained weights for the DT-ED architecture

You can obtain a copy of the pre-trained weights for the Disentangling Transforming Encoder-Decoder by running:

cd src/
wget -N https://ait.ethz.ch/projects/2019/faze/downloads/outputs_of_full_train_test_and_plot.zip
unzip -o outputs_of_full_train_test_and_plot.zip

### 4. Training, Meta-Learning, and Final Evaluation

Run the all-in-one example bash script with:

cd src/
bash full_train_test_and_plot.bash

The bash script should be self-explanatory and can be edited to replicate the final FAZE model evaluation procedure, provided that the hardware requirements are satisfied (8x Tesla V100 GPUs, each with 32GB of memory).

The pre-trained DT-ED weights should be loaded automatically by the script `1_train_dt_ed.py`. Please note that this model can take a long time to train from scratch, so we recommend adjusting batch sizes and using multiple GPUs (the code is multi-GPU-ready).
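
To control which GPUs are used, the standard `CUDA_VISIBLE_DEVICES` environment variable can be set when launching training. The exact script arguments are best taken from `full_train_test_and_plot.bash`; this is only a sketch:

    # restrict PyTorch to two GPUs; see full_train_test_and_plot.bash for the actual arguments
    CUDA_VISIBLE_DEVICES=0,1 python3 1_train_dt_ed.py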

The meta-learning step is also time-consuming, particularly because it must be run for every value of `k`, the *number of calibration samples*. The code for this step is `2_meta_learning.py`, and we recommend running it in parallel as shown in `full_train_test_and_plot.bash` and sketched below.
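
For instance, several values of `k` could be dispatched as background jobs from a simple shell loop. This is a hypothetical sketch: the `--k` flag below is illustrative only, and the real interface should be taken from `full_train_test_and_plot.bash`:

    # launch one meta-learning run per calibration-set size, in parallel
    for k in 1 2 4 8 16; do
        python3 2_meta_learning.py --k "$k" &  # --k is an assumed flag, not the confirmed interface
    done
    wait  # block until all background runs finish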

### 5. Outputs

When the full pipeline successfully runs, you will find some outputs in the path `src/outputs_of_full_train_test_and_plot`, in particular:
* **walks/**: MP4 videos of latent-space walks in gaze direction and head orientation.
* **Zg_OLR1e-03_IN5_ILR1e-05_Net64/**: outputs of the meta-learning step.
* **Zg_OLR1e-03_IN5_ILR1e-05_Net64 MAML MPIIGaze.pdf**: plotted results of the few-shot learning evaluations on MPIIGaze.
* **Zg_OLR1e-03_IN5_ILR1e-05_Net64 MAML GazeCapture (test).pdf**: plotted results of the few-shot learning evaluations on the GazeCapture test set.

## BibTeX
Please cite our paper when referencing or using our code.
@@ -25,6 +79,6 @@ Please cite our paper when referencing or using our code.
location = {Seoul, Korea}
}

## Acknowledgements
Seonwook Park carried out this work during his internship at Nvidia. This work was supported in part by the ERC Grant OPTINT (StG-2016-717054).
1 change: 1 addition & 0 deletions preprocess
Submodule preprocess added at 5c33ca
9 changes: 9 additions & 0 deletions requirements.txt
@@ -0,0 +1,9 @@
apex
h5py
imageio
moviepy
numpy
opencv_python
torch
torchvision
tqdm
7 changes: 7 additions & 0 deletions src/.flake8
@@ -0,0 +1,7 @@
[flake8]
doctests = True
enable-extensions = docstrings
ignore = E402, W503
max-line-length = 100
statistics = True
show-source = True
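
With this configuration in place, the linter can be run against the `src/` directory; flake8 automatically picks up the `.flake8` file in the working directory (standard flake8 usage):

    cd src/
    flake8 .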
