
verlab/DescriptorReasoning_ACCV_2024


Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence
⭐ACCV 2024⭐

Felipe Cadar · Guilherme Potje · Renato Martins · Cédric Demonceaux · Erickson R Nascimento


Leveraging semantic information for improving visual correspondence.

Installation

To set up the environment for training, run the following command to create a new conda environment. We recommend using Python 3.9:

conda create -n reason python=3.9

Activate the environment before proceeding:

conda activate reason

Install the package:

pip install -e .

Inference

from reasoning.features.desc_reasoning import load_reasoning_from_checkpoint, Reasoning

# load the model with pre-trained weights
semantic_reasoning = load_reasoning_from_checkpoint('models/xfeat/')
# load it into the auxiliary class
reasoning_model = Reasoning(semantic_reasoning['model'])

# match two images
match_response = reasoning_model.match({
    'image0': image0, # BxCxHxW normalized to [0,1]
    'image1': image1  # BxCxHxW normalized to [0,1]
})

# get the matches
mkpts0 = match_response['matches0'] # BxNx2
mkpts1 = match_response['matches1'] # BxNx2

The example.py script shows how to automatically download and run a specific model.
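
The snippet above expects `image0` and `image1` as BxCxHxW batches normalized to [0,1]. The following is a minimal sketch of that layout conversion, assuming the image starts as an HxWxC uint8 array (as most image readers return). The helper name `to_model_input` is hypothetical and not part of this repository. The model presumably expects torch tensors, so the result would still need a conversion such as `torch.from_numpy(batch)`.

```python
import numpy as np


def to_model_input(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """Convert an HxWxC uint8 image into a 1xCxHxW float32 array in [0, 1].

    Hypothetical helper for illustration; not part of the repository.
    """
    if image_hwc_uint8.ndim != 3:
        raise ValueError("expected an HxWxC image")
    chw = np.transpose(image_hwc_uint8, (2, 0, 1))  # HxWxC -> CxHxW
    scaled = chw.astype(np.float32) / 255.0         # [0, 255] -> [0, 1]
    return scaled[None, ...]                        # add the batch dimension


# Dummy all-white 480x640 RGB image standing in for a real photo.
dummy = np.full((480, 640, 3), 255, dtype=np.uint8)
batch = to_model_input(dummy)
print(batch.shape, batch.dtype)
```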

The following table contains links to all the models and weights we used in our experiments.

Descriptor                Pre-trained weights   Size
xfeat                     Download              91.6 MB
superpoint                Download              91.0 MB
alike                     Download              92.1 MB
aliked                    Download              91.9 MB
dedode_B                  Download              92.2 MB
dedode_G                  Download              94.1 MB
xfeat-12_layers-dino_G    Download              221.0 MB
xfeat-12_layers           Download              219.0 MB
xfeat-3_layers            Download              57.1 MB
xfeat-7_layers            Download              132 MB
xfeat-9_layers            Download              167 MB
xfeat-dino-G              Download              94.3 MB
xfeat-dino_B              Download              92.3 MB
xfeat-dino_L              Download              92.6 MB

Training

You might want to train your own model to reason about your own descriptors. This requires some preparation:

1. Scannet Data Preparation

The processed dataset is available for download here: h5_scannet.zip

If you prefer to reproduce the steps we took to create it, take a look at the steps below.

To prepare the Scannet dataset for training, follow these steps:

  1. Download Scannet: First, download the Scannet dataset. Make sure to read and accept the terms of use.
python reasoning/scripts/scannet/01_download_scannet.py --out_dir datasets/scannet
  2. Extract Frames: Extract frames from the downloaded dataset, skipping every 15 frames.
python reasoning/scripts/scannet/02_extract_scannet.py --data_path datasets/scannet
  3. Calculate Covisibility: Calculate the covisibility between frames to identify good pairs for training.
python reasoning/scripts/scannet/03_calculate_scannet_covisibility.py --data_path datasets/scannet
  4. Convert to H5 Files: Convert the prepared data into H5 files for easier handling during training. It also helps to keep the number of files small in cluster environments.
python reasoning/scripts/scannet/04_build_h5.py --data_path datasets/scannet --output datasets/h5_scannet/
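
The four preparation steps above can also be chained from a single script. This is only a sketch: the script paths and flags are copied from the commands listed above, while `build_commands` and `run_pipeline` are hypothetical helper names. It defaults to a dry run that only prints the commands, since the download step requires accepting the Scannet terms of use first.

```python
import subprocess
import sys

SCRIPTS = "reasoning/scripts/scannet"


def build_commands(scannet_dir="datasets/scannet", h5_dir="datasets/h5_scannet/"):
    """Return the four preparation commands in the order they must run."""
    py = sys.executable  # use the interpreter of the active conda environment
    return [
        [py, f"{SCRIPTS}/01_download_scannet.py", "--out_dir", scannet_dir],
        [py, f"{SCRIPTS}/02_extract_scannet.py", "--data_path", scannet_dir],
        [py, f"{SCRIPTS}/03_calculate_scannet_covisibility.py", "--data_path", scannet_dir],
        [py, f"{SCRIPTS}/04_build_h5.py", "--data_path", scannet_dir, "--output", h5_dir],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one step fails
            subprocess.run(cmd, check=True)


commands = build_commands()
run_pipeline(commands, dry_run=True)
```

Stopping at the first failure matters here because each step consumes the output of the previous one.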

2. Feature Extraction

To speed up the training process, pre-extract some features from the dataset. Our scripts read the H5 dataset and save the features to its features/ directory.

DINOv2-S Features Extraction

Extract DINOv2-S features from the H5 dataset. You can adjust the batch size according to your system's capabilities.

python reasoning/scripts/export_dino.py --data ./datasets/h5_scannet --batch_size 4 --dino_model dinov2_vits14

For larger models, simply change the --dino_model argument to one of the following: dinov2_vitb14, dinov2_vitl14, or dinov2_vitg14.
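
If you plan to export features for several DINOv2 variants, a small wrapper can guard against typos in the model name. This is a sketch, assuming only the four variant names and the flags shown above; `dino_export_command` is a hypothetical helper, and the commands are printed rather than executed.

```python
import shlex

# The four variants named in this README.
DINO_MODELS = ("dinov2_vits14", "dinov2_vitb14", "dinov2_vitl14", "dinov2_vitg14")


def dino_export_command(dino_model, data="./datasets/h5_scannet", batch_size=4):
    """Build the export_dino.py command line for one DINOv2 variant."""
    if dino_model not in DINO_MODELS:
        raise ValueError(f"unknown DINOv2 variant: {dino_model!r}")
    return [
        "python", "reasoning/scripts/export_dino.py",
        "--data", data,
        "--batch_size", str(batch_size),
        "--dino_model", dino_model,
    ]


# Print one command per variant so they can be reviewed before running.
for name in DINO_MODELS:
    print(shlex.join(dino_export_command(name)))
```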

XFeat Features Extraction

Extract XFeat features from the dataset. Adjust the batch size as needed.

python reasoning/scripts/export_xfeat.py --data ./datasets/h5_scannet --batch_size 4 --num_keypoints 2048

Your dataset folder should look like this:

datasets/
├── h5_scannet/
│   ├── train/
│   ├── features/
│   │   ├── dino-scannet-dinov2_vits14/
│   │   └── xfeat-scannet-n2048/
└── scannet/
    └── scans/

For other descriptors, please check the reasoning/scripts/export_*.py scripts.
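
Before launching a training run, it can help to confirm that the folder layout shown above is in place. The check below is a minimal sketch: the folder names are taken from that layout, `missing_dirs` is a hypothetical helper, and the demo runs on a temporary directory rather than on real data.

```python
import tempfile
from pathlib import Path


def missing_dirs(data_root, extractor_cache, dino_cache):
    """Return the expected sub-folders of data_root that do not exist yet."""
    root = Path(data_root)
    expected = [
        root / "train",
        root / "features" / dino_cache,
        root / "features" / extractor_cache,
    ]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_dir()]


# Demo on a throwaway directory that only contains the DINOv2 features.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "h5_scannet"
    (root / "train").mkdir(parents=True)
    (root / "features" / "dino-scannet-dinov2_vits14").mkdir(parents=True)
    demo_missing = missing_dirs(
        root, "xfeat-scannet-n2048", "dino-scannet-dinov2_vits14"
    )
    print(demo_missing)  # the XFeat folder has not been exported in this demo
```

On a real setup you would call `missing_dirs("./datasets/h5_scannet", ...)` with the cache names you intend to train with, and expect an empty list.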

3. Training the Model

All training and experiments were conducted on a SLURM cluster with 4xV100 32GB GPUs. Adjust the batch size to match your system's capabilities.

To start training, run the following command:

# --data: dataset folder with images and features
# --plot_every: tensorboard matching plots
# --extractor_cache: local features
# --dino_cache: semantic features
# -C: comment for tracking your exps
python reasoning/train_multigpu_reasoning.py \
    --batch_size 16 \
    --data ./datasets/h5_scannet \
    --plot_every 200 \
    --extractor_cache 'xfeat-scannet-n2048' \
    --dino_cache 'dino-scannet-dinov2_vits14' \
    -C xfeat-dinov2

If you want to skip all the multi-gpu shenanigans, you can simply add the --local flag.
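
If you launch training from Python (for example from a job script), building the argument list in code lets you annotate each flag and toggle --local without editing a shell command. This is a sketch using only the flags shown in this README; `training_command` is a hypothetical helper, and the command is printed rather than executed.

```python
import shlex


def training_command(
    data="./datasets/h5_scannet",
    batch_size=16,
    plot_every=200,
    extractor_cache="xfeat-scannet-n2048",
    dino_cache="dino-scannet-dinov2_vits14",
    comment="xfeat-dinov2",
    local=False,
):
    """Build the train_multigpu_reasoning.py command line as a list."""
    cmd = [
        "python", "reasoning/train_multigpu_reasoning.py",
        "--batch_size", str(batch_size),
        "--data", data,                         # dataset folder with images and features
        "--plot_every", str(plot_every),        # tensorboard matching plots
        "--extractor_cache", extractor_cache,   # local features
        "--dino_cache", dino_cache,             # semantic features
        "-C", comment,                          # comment for tracking experiments
    ]
    if local:
        cmd.append("--local")  # skip the multi-GPU setup
    return cmd


# Smaller batch for a single workstation GPU; the value 4 is only an example.
print(shlex.join(training_command(batch_size=4, local=True)))
```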

Acknowledgements

This work was partially supported by grants from CAPES, CNPq, FAPEMIG, Google, ANER MOVIS from Conseil Régional BFC and ANR (ANR-23-CE23-0003-01), to whom we are grateful. This project was also provided with AI computing and storage resources by GENCI at IDRIS thanks to the grant 2024-AD011015289 on the supercomputer Jean Zay’s V100 partitions.

Shout out to the authors of DeDoDe for this readme header. It's quite nice.