Official repository for the DeepSim-Nets: Deep Similarity Networks for Stereo Image Matching paper 📄, accepted at the CVPR 2023 EarthVision Workshop.
The paper code is divided into two parts:
- Training and testing of the classifiers' performance: covered by the code in this repository!
- Inference: the code is integrated into MicMac and written in C++ (using the Torch C++ API)
Overall training pipeline
(Figure: epipolar input, our MS-AFF, PSMNet, and Normalized Cross Correlation.)

We propose to learn dense similarities by training three multi-scale learning architectures on wide image tiles. To enable robust self-supervised contrastive learning, a sample mining strategy is developed. Our main idea is to rely on wide support regions to leverage pixel-level similarity-aware embeddings. The whole set of pixel embeddings of a reference image is then matched to the corresponding embeddings at once. Our approach alleviates the distinctiveness shortcomings of block matching by exploiting the wider image context. We thereby obtain quite distinctive similarity measures that outperform standard hand-crafted correlation (NCC) and deep learning patch-based approaches (MC-CNN). Compared to end-to-end methods, our DeepSim-Nets are highly versatile and readily suited for standard multi-resolution and large-scale stereo matching pipelines.
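To make the idea concrete, here is a toy sketch of a hinge-style contrastive objective over pixel embeddings. The function name, margin value, and negative mining scheme are illustrative assumptions, not the paper's actual formulation:

```python
import torch
import torch.nn.functional as F

# Toy pixel-level contrastive matching (illustrative only; this is not
# the paper's actual loss or sample mining strategy).
def pixel_contrastive_loss(left_emb, right_emb, disparity, margin=0.3):
    """left_emb, right_emb: (C, H, W) embeddings; disparity: (H, W) ground truth."""
    C, H, W = left_emb.shape
    ys = torch.arange(H).unsqueeze(1).expand(H, W)
    xs = torch.arange(W).expand(H, W)
    # Positive sample: the ground-truth match along the epipolar line.
    pos_x = (xs - disparity).long().clamp(0, W - 1)
    # Negative sample: a randomly mined pixel offset from the true match.
    neg_x = (pos_x + torch.randint(5, 50, (H, W))).clamp(0, W - 1)

    l = F.normalize(left_emb, dim=0)
    r = F.normalize(right_emb, dim=0)
    pos_sim = (l * r[:, ys, pos_x]).sum(dim=0)  # similarity to the true match
    neg_sim = (l * r[:, ys, neg_x]).sum(dim=0)  # similarity to the mined negative
    return F.relu(margin + neg_sim - pos_sim).mean()
```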
We additionally propose a lightweight architecture dubbed MS-AFF, whose inputs are 4 multi-scale (multi-resolution) tiles as highlighted below. The generated multi-scale features are iteratively fused based on an attention mechanism adapted from Attentional Feature Fusion. Here is the architecture together with the multi-scale attention module.
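For intuition, here is a minimal sketch of attention-based fusion of two feature maps, loosely in the spirit of Attentional Feature Fusion; the module below is illustrative, not the repository's actual implementation:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse two feature maps with a learned per-pixel attention weight (sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions produce a per-pixel attention map from the summed features.
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        w = self.attn(x + y)          # attention weights in [0, 1]
        return w * x + (1.0 - w) * y  # convex combination of the two inputs

# Example: fuse two 64-channel feature maps of the same spatial size.
fusion = AttentionFusion(64)
out = fusion(torch.rand(1, 64, 128, 128), torch.rand(1, 64, 128, 128))
```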
DeepSim-Nets are trained on aerial data from the Dublin dataset on 4 GPUs. The following summarizes the training environment:
- Ubuntu 18.04.6 LTS/CentOS Linux release 7.9.2009
- Python 3.9.12
- PyTorch 1.11.0
- pytorch_lightning 1.6.3
- CUDA 10.2, 11.2 and 11.4
- NVIDIA V100 32 GB / NVIDIA A100 40 GB
- 64 GB RAM
To train DeepSim-Nets, datasets should in general include the following elementary batch composition (a sketch of such a batch follows the list):
- Left image tile
- Right image tile
- Ground truth densified disparity map
- Occlusion mask
- Definition mask: disparity maps may contain NaN values where no information is provided; this mask should be used to define the region of interest (ROI).
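As a rough illustration (the keys and shapes below are assumptions, not the repository's actual dataset interface), a batch could be laid out as:

```python
import torch

batch = {
    "left":       torch.rand(4, 1, 768, 768),             # left epipolar image tile
    "right":      torch.rand(4, 1, 768, 768),             # right epipolar image tile
    "disparity":  torch.rand(4, 1, 768, 768) * 64,        # densified ground-truth disparity
    "occlusion":  torch.randint(0, 2, (4, 1, 768, 768)),  # 1 where the pixel is occluded
    "definition": torch.randint(0, 2, (4, 1, 768, 768)),  # 1 where disparity is defined (no NaN)
}
```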
The following is an example of what the aforementioned image tiles should look like:
The project follows the structure described below:
├─ configs # Configuration files for training
├─ datasets # Datasets classes
├─ models # Models' architectures
├─ trained_models # Holds some models' checkpoints
├─ utils # Scripts for logging
├─ Trainer.py # Main script for training DeepSim-Nets
└─ Tester.py # Main script for testing DeepSim-Nets classification accuracy (joint probabilities, AUC, etc.)
To train DeepSim-Nets, run:
python3 Trainer.py -h
usage: Trainer.py [-h] --config_file CONFIG_FILE --model MODEL --checkpoint CHECKPOINT
optional arguments:
-h, --help show this help message and exit
--config_file CONFIG_FILE, -cfg CONFIG_FILE
Path to the yaml config file
--model MODEL, -mdl MODEL
Model name to train, possible names are: 'MS-AFF', 'U-Net32', 'U-Net_Attention'
--checkpoint CHECKPOINT, -ckpt CHECKPOINT
Model checkpoint to load
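For instance, a training run could look like this (the config file and checkpoint paths are illustrative, not files shipped with the repository):

python3 Trainer.py -cfg configs/ms_aff.yaml -mdl MS-AFF -ckpt trained_models/ms_aff.ckpt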
To evaluate our classifiers' performance, we estimate the joint distributions of the matching and similarity random variables on test data. These metrics are obtained by running the testing Python script.
python3 Tester.py -h
usage: Tester.py [-h] --model MODEL --checkpoint CHECKPOINT --output_folder OUTPUT_FOLDER
optional arguments:
-h, --help show this help message and exit
--model MODEL, -mdl MODEL
Model name to test, possible names are: 'MS-AFF', 'U-Net32', 'U-Net_Attention'
--checkpoint CHECKPOINT, -ckpt CHECKPOINT
Model checkpoint to load
--output_folder OUTPUT_FOLDER, -o OUTPUT_FOLDER
output folder to store results
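For instance (the checkpoint and output paths are illustrative):

python3 Tester.py -mdl MS-AFF -ckpt trained_models/ms_aff.ckpt -o results/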
After training, models are scripted and arranged so that similarities can be computed either by:
- a normalized dot product between embeddings: this relies on the feature extractor's output feature maps (see the sketch after this list);
- the learned similarity function from the MLP decision network (feature extractor + MLP).
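As a minimal sketch of the first option, assuming a scripted TorchScript feature extractor whose output feature maps share the input's spatial size (the file name below is an assumption):

```python
import torch
import torch.nn.functional as F

# Load the scripted feature extractor (file name is illustrative).
extractor = torch.jit.load("MS-AFF_feature.pt").eval()

left_tile = torch.rand(1, 1, 768, 768)   # left epipolar tile
right_tile = torch.rand(1, 1, 768, 768)  # right epipolar tile

with torch.no_grad():
    left_feats = F.normalize(extractor(left_tile), dim=1)
    right_feats = F.normalize(extractor(right_tile), dim=1)
    # Normalized dot product = cosine similarity between per-pixel embeddings.
    similarity = (left_feats * right_feats).sum(dim=1)  # (1, H, W) similarity map
```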
Model name | Dataset | Joint Probability (JP) | Size 💾 | Download 👇
---|---|---|---|---
MS-AFF feature | Dublin/Vaihingen/Enschede | -- | 4 M | link
MS-AFF decision (MLP) | Dublin/Vaihingen/Enschede | 89.6 | 1.4 M | link
Unet32 | Dublin/Vaihingen/Enschede | -- | 31.4 M | link
Unet32 decision (MLP) | Dublin/Vaihingen/Enschede | 88.6 | 1.4 M | link
Unet Attention | Dublin/Vaihingen/Enschede | -- | 38.1 M | link
Unet Attention decision (MLP) | Dublin/Vaihingen/Enschede | 88.9 | 1.4 M | link
Inference requires an SGM implementation for cost volume regularization. Our similarity models are scripted (*.pt files) and fed to our C++ implementation within the MicMac photogrammetry software. The main C++ production code is located at MMVII/src/LearningMatching. Our approach is embedded into the MicMac multi-resolution image matching pipeline and can be parametrized using a MicMac-compliant xml file, as the figure below illustrates.
To reproduce the reported results, we provide an epipolar pair of high-resolution aerial images (GSD = 6 cm). To run our code, we recommend the following steps.
Pull the DeepSim-Nets Docker image:
docker pull dali1210/micmac_deepsimnets:latest
We provide MicMac .xml configuration files that should be edited according to the models' locations. More specifically, the FileModeleParams tag should contain the path to both the feature extractor and the MLP scripted models (*.pt).
<EtapeMEC>
<DeZoom> 4 </DeZoom> <!-- DeepSim-Nets run @ zoom 4-->
<CorrelAdHoc>
<SzBlocAH> 40000000 </SzBlocAH>
<TypeCAH>
<ScoreLearnedMMVII>
<FileModeleCost> MVCNNCorrel</FileModeleCost>
<FileModeleParams>./MODEL_AERIAL_MSNET_DECISION/.*.pt</FileModeleParams>
<FileModeleArch>UnetMLPMatcher</FileModeleArch>
</ScoreLearnedMMVII>
</TypeCAH>
</CorrelAdHoc>
</EtapeMEC>
#1. Download the scripted models
#2. Gather each model's feature extractor and decision (MLP) under the same folder
#3. Update the models path in <FileModeleParams> as explained above
#4. run docker image
docker run --gpus all --network=host --privileged --shm-size 25G -v path_to_images_folder:/process -it dali1210/micmac_deepsimnets:latest
#5. go to images folder
cd /process
#6. run MicMac with the appropriate xml configuration file (see the provided examples)
mm3d MICMAC XML_CONFIGURATION_FILE.xml +Im1=Epip1.tif +Im2=Epip2.tif +DirMEC=TEST_DEEPSIM_NETS +ZReg=0.002 +IncPix=100
#7. The output disparity maps follow the MicMac naming conventions
Please contact us at [email protected] or [email protected].