Deep Similarity learning for Stereo Image Matching

DaliCHEBBI/DeepSimNets

DeepSimNets

Official repository for the paper 📄 DeepSim-Nets: Deep Similarity Networks for Stereo Image Matching, accepted at the CVPR 2023 EarthVision Workshop.

The paper code is divided into two parts:

  • Training and testing of the classifiers' performance: the code in this repository covers this part.
  • Inference: The code is integrated under MicMac and is written in C++ (including Torch C++)

Overall training pipeline

[Figure panels: Epipolar | Our MS-AFF | PSMNet | Normalized Cross Correlation]

We propose to learn dense similarities by training three multi-scale learning architectures on wider image tiles. To enable robust self-supervised contrastive learning, we develop a sample mining strategy. Our main idea is to rely on wider support regions to learn pixel-level, similarity-aware embeddings. The whole set of pixel embeddings of a reference image is then matched to the corresponding embeddings at once. By exploiting the wider image context, our approach alleviates the distinctiveness shortcomings of block matching. The resulting similarity measures are distinctive and outperform standard hand-crafted correlation (NCC) and deep learning patch-based approaches (MC-CNN). Compared to end-to-end methods, our DeepSim-Nets are highly versatile and readily suited to standard multi-resolution and large-scale stereo matching pipelines.

Multi-Scale Attentional Feature Fusion (MS-AFF)

We additionally propose a lightweight architecture named MS-AFF, whose inputs are 4 multi-scale (multi-resolution) tiles, as highlighted below. The generated multi-scale features are iteratively fused using an attention mechanism adapted from Attentional Feature Fusion. Here is the architecture together with the multi-scale attention module.

Training

DeepSim-Nets are trained on aerial data from the Dublin dataset using 4 GPUs. The training environment is summarized below:

  • Ubuntu 18.04.6 LTS/CentOS Linux release 7.9.2009
  • Python 3.9.12
  • PyTorch 1.11.0
  • pytorch_lightning 1.6.3
  • CUDA 10.2, 11.2 and 11.4
  • NVIDIA V100 32G/ NVIDIA A100 40G
  • 64G RAM

Dataset structure:

In general, to train DeepSim-Nets, a dataset should provide the following elementary batch composition:

  • Left image tile
  • Right image tile
  • Ground truth densified disparity map
  • Occlusion mask
  • Definition mask: disparity maps sometimes contain NaN values where no information is available; this mask accounts for them when defining the region of interest (ROI).
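
How the occlusion and definition masks are combined is not detailed here. The following is a minimal sketch, assuming the disparity map stores NaN where no ground truth exists and the occlusion mask is non-zero on occluded pixels. The function name `valid_pixel_mask` and these conventions are illustrative assumptions, not the repository's API.

```python
import numpy as np


def valid_pixel_mask(disparity, occlusion_mask):
    """Return a boolean mask of pixels usable for supervision.

    Assumptions (not taken from the repository):
      - `disparity` is a float array with NaN where no ground truth exists;
      - `occlusion_mask` is non-zero where a pixel is occluded.
    """
    disparity = np.asarray(disparity, dtype=np.float32)
    occluded = np.asarray(occlusion_mask).astype(bool)
    defined = np.isfinite(disparity)  # definition mask: drops NaN and inf
    return defined & ~occluded        # keep defined, non-occluded pixels


# Tiny 2x2 example: one undefined pixel and one occluded pixel.
disp = np.array([[1.5, np.nan], [3.0, 4.0]], dtype=np.float32)
occ = np.array([[0, 0], [1, 0]], dtype=np.uint8)
mask = valid_pixel_mask(disp, occ)
```

A mask of this kind would typically restrict the loss and the evaluation metrics to pixels with reliable ground truth.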

The following is an example of what the aforementioned image tiles should look like:

The project follows the structure described below:

├─ configs                 # Configuration files for training 
├─ datasets                # Datasets classes 
├─ models                  # Models' architectures 
├─ trained_models          # Holds some models' checkpoints 
├─ utils                   # Scripts for logging 
├─ Trainer.py              # Main script for training DeepSim-Nets 
└─ Tester.py               # Main script for testing DeepSim-Nets classification accuracy (Joint probabilities, AUC, etc)

To train DeepSim-Nets, run Trainer.py; its arguments are listed below:

python3 Trainer.py -h
usage: Trainer.py [-h] --config_file CONFIG_FILE --model MODEL --checkpoint CHECKPOINT

optional arguments:
  -h, --help            show this help message and exit
  --config_file CONFIG_FILE, -cfg CONFIG_FILE
                        Path to the yaml config file
  --model MODEL, -mdl MODEL
                        Model name to train, possible names are: 'MS-AFF', 'U-Net32', 'U-Net_Attention'
  --checkpoint CHECKPOINT, -ckpt CHECKPOINT
                        Model checkpoint to load
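
The help output above is consistent with an argparse parser in which all three options are required. The sketch below is a reconstruction under that assumption, not the actual content of Trainer.py, and the paths in the example invocation (`configs/example.yaml`, `trained_models/example.ckpt`) are hypothetical placeholders.

```python
import argparse


def build_parser():
    """Rebuild a parser matching the documented Trainer.py options (assumed layout)."""
    parser = argparse.ArgumentParser(prog="Trainer.py")
    parser.add_argument("--config_file", "-cfg", required=True,
                        help="Path to the yaml config file")
    parser.add_argument("--model", "-mdl", required=True,
                        help="Model name to train, possible names are: "
                             "'MS-AFF', 'U-Net32', 'U-Net_Attention'")
    parser.add_argument("--checkpoint", "-ckpt", required=True,
                        help="Model checkpoint to load")
    return parser


# Hypothetical invocation; the file names below are placeholders.
args = build_parser().parse_args([
    "-cfg", "configs/example.yaml",
    "-mdl", "MS-AFF",
    "-ckpt", "trained_models/example.ckpt",
])
```

On the command line, the same call would read `python3 Trainer.py -cfg <config> -mdl MS-AFF -ckpt <checkpoint>`.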

Evaluation

To evaluate our classifiers' performance, we estimate joint distributions of matching and non-similarity random variables on test data. These metrics are obtained by running the testing Python script.

python3 Tester.py -h
usage: Tester.py [-h] --model MODEL --checkpoint CHECKPOINT --output_folder OUTPUT_FOLDER

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL, -mdl MODEL
                        Model name to train, possible names are: 'MS-AFF', 'U-Net32', 'U-Net_Attention'
  --checkpoint CHECKPOINT, -ckpt CHECKPOINT
                        Model checkpoint to load
  --output_folder OUTPUT_FOLDER, -o OUTPUT_FOLDER
                        output folder to store results

Inference

Models

After training, models are scripted and arranged so that similarities could be computed by:

  • normalized dot product between embeddings: this relies on the feature maps output by the feature extractor.
  • learned similarity function from the MLP decision network (feature extractor + MLP).

Model name                      Dataset                     Joint_Probability(JP)   💾        👇
MS-AFF feature                  Dublin/Vaihingen/Enschede   --                      4 M       link
MS-AFF decision (MLP)           Dublin/Vaihingen/Enschede   89.6                    1,4 M     link
Unet32                          Dublin/Vaihingen/Enschede   --                      31,4 M    link
Unet32 decision (MLP)           Dublin/Vaihingen/Enschede   88.6                    1,4 M     link
Unet Attention                  Dublin/Vaihingen/Enschede   --                      38,1 M    link
Unet Attention decision (MLP)   Dublin/Vaihingen/Enschede   88.9                    1,4 M     link
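
The normalized dot product option can be illustrated with a small NumPy sketch that builds a similarity volume from two feature maps. This is not the MicMac C++ implementation: the function name `cosine_similarity_volume`, the `(C, H, W)` layout, and the convention that left pixel `x` matches right pixel `x - d` are assumptions made for the example.

```python
import numpy as np


def cosine_similarity_volume(feat_left, feat_right, max_disp):
    """Cosine similarity between left and right embeddings for each disparity.

    feat_left, feat_right: arrays of shape (C, H, W).
    Returns an array of shape (max_disp + 1, H, W) where entry [d, y, x]
    compares left pixel (y, x) with right pixel (y, x - d); positions
    without a valid counterpart keep the value -1.
    """
    eps = 1e-8
    # L2-normalize each pixel embedding along the channel axis.
    fl = feat_left / (np.linalg.norm(feat_left, axis=0, keepdims=True) + eps)
    fr = feat_right / (np.linalg.norm(feat_right, axis=0, keepdims=True) + eps)
    _, h, w = fl.shape
    volume = np.full((max_disp + 1, h, w), -1.0, dtype=np.float32)
    for d in range(max_disp + 1):
        # Dot product of unit vectors equals their cosine similarity.
        volume[d, :, d:] = np.sum(fl[:, :, d:] * fr[:, :, : w - d], axis=0)
    return volume


# Synthetic check: the right features are the left ones shifted by 2 pixels.
rng = np.random.default_rng(0)
left = rng.standard_normal((8, 4, 10))
right = np.roll(left, -2, axis=2)
volume = cosine_similarity_volume(left, right, max_disp=4)
best = volume.argmax(axis=0)  # winner-takes-all disparity per pixel
```

In the actual pipeline the raw similarities are regularized (SGM) rather than resolved by a per-pixel argmax; the argmax here only checks that the volume peaks at the known shift.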

Inference requires an SGM implementation for cost volume regularization. Our similarity models are scripted (*.pt files) and fed to our C++ implementation under the MicMac photogrammetry software. The main C++ production code is located at MMVII/src/LearningMatching. Our approach is embedded into the MicMac multi-resolution image matching pipeline and can be parametrized using a MicMac-compliant xml file. The figure below illustrates image

To reproduce the obtained results, we provide an epipolar pair of high-resolution aerial images (GSD = 6 cm). To run our code, we recommend following the steps below:

Docker Image

Pull the DeepSim-Nets Docker image:

 docker pull dali1210/micmac_deepsimnets:latest

Path to models Feature extractor + MLP

We provide MicMac .xml configuration files that should be edited to match the location of your models. More specifically, the tag FileModeleParams should contain the path to both the feature extractor and MLP scripted models (*.pt).

 
 <EtapeMEC>
    <DeZoom> 4 </DeZoom> <!-- DeepSim-Nets run @ zoom 4-->
    <CorrelAdHoc>
        <SzBlocAH> 40000000 </SzBlocAH>
        <TypeCAH>
            <ScoreLearnedMMVII>
                <FileModeleCost> MVCNNCorrel</FileModeleCost>
                <FileModeleParams>./MODEL_AERIAL_MSNET_DECISION/.*.pt</FileModeleParams>
                <FileModeleArch>UnetMLPMatcher</FileModeleArch>
            </ScoreLearnedMMVII>
        </TypeCAH>
    </CorrelAdHoc>
</EtapeMEC>
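
Editing the tag by hand is enough for a single run. For scripted experiments, the sketch below shows one way to update FileModeleParams with Python's standard xml.etree.ElementTree module. The helper name `set_model_params` and the path `/process/models/MS_AFF/.*.pt` are hypothetical, and the default parser drops XML comments, so this is better suited to generating a working copy than to editing the provided files in place.

```python
import xml.etree.ElementTree as ET


def set_model_params(xml_text, models_pattern):
    """Return xml_text with every <FileModeleParams> set to models_pattern."""
    root = ET.fromstring(xml_text)  # note: comments are not preserved
    nodes = list(root.iter("FileModeleParams"))
    if not nodes:
        raise ValueError("no <FileModeleParams> tag found")
    for node in nodes:
        node.text = models_pattern
    return ET.tostring(root, encoding="unicode")


# Reduced copy of the configuration fragment shown above.
snippet = """<EtapeMEC>
    <DeZoom> 4 </DeZoom>
    <CorrelAdHoc>
        <TypeCAH>
            <ScoreLearnedMMVII>
                <FileModeleParams>./MODEL_AERIAL_MSNET_DECISION/.*.pt</FileModeleParams>
            </ScoreLearnedMMVII>
        </TypeCAH>
    </CorrelAdHoc>
</EtapeMEC>"""

# Hypothetical models folder inside the container.
updated = set_model_params(snippet, "/process/models/MS_AFF/.*.pt")
```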

 

Steps to run dense matching with DeepSim-Nets

#1. Download scripted models 
#2. Gather each model feature extractor and decision (MLP) under the same folder 
#3. Update the models path as explained above: <FileModeleParams> path_to_models_folder/.*.pt </FileModeleParams>
#4. run docker image
docker run --gpus all --network=host --privileged --shm-size 25G -v path_to_images_folder:/process -it dali1210/micmac_deepsimnets:latest
#5. go to images folder 
cd /process
#6. run micmac with appropriate xml file (examples are in 
mm3d MICMAC XML_CONFIGURATION_FILE.xml +Im1=Epip1.tif +Im2=Epip2.tif +DirMEC=TEST_DEEPSIM_NETS +ZReg=0.002 +IncPix=100
#7. The output disparity maps follow the MicMac naming conventions
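
When several image pairs have to be processed, the mm3d call of step 6 can be assembled programmatically. The sketch below only builds the command line with the parameter names shown above; the helper `mm3d_command` is hypothetical, and the default values simply mirror the example (+ZReg=0.002, +IncPix=100).

```python
import shlex


def mm3d_command(xml_config, im1, im2, dir_mec, z_reg=0.002, inc_pix=100):
    """Build the argument list of the mm3d MICMAC call used in step 6."""
    return [
        "mm3d", "MICMAC", xml_config,
        f"+Im1={im1}",
        f"+Im2={im2}",
        f"+DirMEC={dir_mec}",
        f"+ZReg={z_reg}",
        f"+IncPix={inc_pix}",
    ]


cmd = mm3d_command("XML_CONFIGURATION_FILE.xml", "Epip1.tif", "Epip2.tif",
                   "TEST_DEEPSIM_NETS")
command_line = shlex.join(cmd)  # shell-ready string, quoted where needed
```

Inside the container, the list `cmd` could be passed to `subprocess.run(cmd, check=True)` from the images folder.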

Contact information

Please contact us at [email protected] or [email protected]
