AF-SfMLearner

This is the official PyTorch implementation for training and testing depth estimation models using the method described in

Self-Supervised Monocular Depth and Ego-Motion Estimation in Endoscopy: Appearance Flow to the Rescue

Shuwei Shao, Zhongcai Pei, Weihai Chen, Wentao Zhu, Xingming Wu, Dianmin Sun and Baochang Zhang

accepted by Medical Image Analysis (arXiv pdf)

and

Self-Supervised Learning for Monocular Depth Estimation on Minimally Invasive Surgery Scenes

Shuwei Shao, Zhongcai Pei, Weihai Chen, Baochang Zhang, Xingming Wu, Dianmin Sun and David Doermann

ICRA 2021 (pdf).

Overview

✏️ 📄 Citation

If you find our work useful in your research please consider citing our paper:

@article{shao2022self,
  title={Self-Supervised monocular depth and ego-Motion estimation in endoscopy: Appearance flow to the rescue},
  author={Shao, Shuwei and Pei, Zhongcai and Chen, Weihai and Zhu, Wentao and Wu, Xingming and Sun, Dianmin and Zhang, Baochang},
  journal={Medical image analysis},
  volume={77},
  pages={102338},
  year={2022},
  publisher={Elsevier}
}

@inproceedings{shao2021self,
  title={Self-Supervised Learning for Monocular Depth Estimation on Minimally Invasive Surgery Scenes},
  author={Shao, Shuwei and Pei, Zhongcai and Chen, Weihai and Zhang, Baochang and Wu, Xingming and Sun, Dianmin and Doermann, David},
  booktitle={2021 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={7159--7165},
  year={2021},
  organization={IEEE}
}

⚙️ Setup

We ran our experiments with PyTorch 1.2.0, torchvision 0.4.0, CUDA 10.2, Python 3.7.3 and Ubuntu 18.04.

🖼️ Prediction for a single image or a folder of images

You can predict scaled disparity for a single image or a folder of images with:

CUDA_VISIBLE_DEVICES=0 python test_simple.py --model_path <your_model_path> --image_path <your_image_or_folder_path>

💾 Datasets

You can download the Endovis or SCARED dataset by signing the challenge rules and emailing them to max.allan@intusurg.com, the EndoSLAM dataset, the SERV-CT dataset, and the Hamlyn dataset.

Endovis split

The train/test/validation split for Endovis dataset used in our works is defined in the splits/endovis folder.

Endovis data preprocessing

We use the ffmpeg to convert the RGB.mp4 into images.png:

find . -name "*.mp4" -print0 | xargs -0 -I {} sh -c 'output_dir=$(dirname "$1"); ffmpeg -i "$1" "$output_dir/%10d.png"' _ {}

We only use the left frames in our experiments and please refer to extract_left_frames.py. For dataset 8 and 9, we rephrase keyframes 0-4 as keyframes 1-5.

Data structure

The directory of dataset structure is shown as follows:

/path/to/endovis_data/
  dataset1/
    keyframe1/
      image_02/
        data/
          0000000001.png

⏳ Endovis training

Stage-wise fashion:

Stage one:

CUDA_VISIBLE_DEVICES=0 python train_stage_one.py --data_path <your_data_path> --log_dir <path_to_save_model (optical flow)>

Stage two:

CUDA_VISIBLE_DEVICES=0 python train_stage_two.py --data_path <your_data_path> --log_dir <path_to_save_model (depth, pose, appearance flow, optical flow)> --load_weights_folder <path_to_the_trained_optical_flow_model_in_stage_one>

End-to-end fashion:

CUDA_VISIBLE_DEVICES=0 python train_end_to_end.py --data_path <your_data_path> --log_dir <path_to_save_model (depth, pose, appearance flow, optical flow)>

📊 Endovis evaluation

To prepare the ground truth depth maps run:

CUDA_VISIBLE_DEVICES=0 python export_gt_depth.py --data_path endovis_data --split endovis

...assuming that you have placed the endovis dataset in the default location of ./endovis_data/.

The following example command evaluates the epoch 19 weights of a model named mono_model:

CUDA_VISIBLE_DEVICES=0 python evaluate_depth.py --data_path <your_data_path> --load_weights_folder ~/mono_model/mdp/models/weights_19 --eval_mono

Appearance Flow

Depth Estimation

Visual Odometry

3D Reconstruction

📦 Model zoo

Model	Abs Rel	Sq Rel	RMSE	RMSE log	Link
Stage-wise (ID 5 in Table 8)	0.059	0.435	4.925	0.082	baidu (code:n6lh); google
End-to-end (ID 3 in Table 8)	0.059	0.470	5.062	0.083	baidu (code:z4mo); google
ICRA	0.063	0.489	5.185	0.086	baidu (code:wbm8); google

Important Note

If you use the latest PyTorch version,

Note1: please try to add 'align_corners=True' to 'F.interpolate' and 'F.grid_sample' when you train the network, to get a good camera trajectory.

Note2: please revise color_aug=transforms.ColorJitter.get_params(self.brightness,self.contrast,self.saturation,self.hue) to color_aug=transforms.ColorJitter(self.brightness,self.contrast,self.saturation,self.hue).

Contact

If you have any questions, please feel free to contact swshao@buaa.edu.cn.

Acknowledgement

Our code is based on the implementation of Monodepth2. We thank Monodepth2's authors for their excellent work and repository.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

AF-SfMLearner

Overview

✏️ 📄 Citation

⚙️ Setup

🖼️ Prediction for a single image or a folder of images

💾 Datasets

⏳ Endovis training

📊 Endovis evaluation

Appearance Flow

Depth Estimation

Visual Odometry

3D Reconstruction

📦 Model zoo

Important Note

Contact

Acknowledgement

Files

README.md

Latest commit

History

README.md

File metadata and controls

AF-SfMLearner

Overview

✏️ 📄 Citation

⚙️ Setup

🖼️ Prediction for a single image or a folder of images

💾 Datasets

⏳ Endovis training

📊 Endovis evaluation

Appearance Flow

Depth Estimation

Visual Odometry

3D Reconstruction

📦 Model zoo

Important Note

Contact

Acknowledgement