reshalfahsi/instance-segmentation-vit-maskrcnn
Instance Segmentation Using ViT-based Mask R-CNN

[Open in Colab badge] [Sample qualitative result]

Instance segmentation assigns each pixel not only a semantic class but also an instance identity, distinguishing the individual objects in a scene. Mask R-CNN is one such approach, combining object detection and semantic segmentation by predicting a segmentation mask for each detected object. Its backbone can also be replaced with a Vision Transformer (ViT). In this project, a pre-trained ViT-based Mask R-CNN model is fine-tuned and evaluated on the Penn-Fudan Database for Pedestrian Detection and Segmentation. The dataset is split into train, validation, and test sets with an 80:10:10 ratio.
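The 80:10:10 split described above can be sketched as follows. This is a minimal illustration, not the repository's actual code; the helper name `split_indices` and the fixed seed are hypothetical, and the Penn-Fudan Database contains 170 images.

```python
import random


def split_indices(n, ratios=(0.8, 0.1, 0.1), seed=42):
    # Hypothetical helper mirroring the 80:10:10 split: shuffle the
    # sample indices deterministically, then cut at the 80% and 90% marks.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    c1 = int(n * ratios[0])
    c2 = c1 + int(n * ratios[1])
    return idx[:c1], idx[c1:c2], idx[c2:]


# Penn-Fudan has 170 images, so 80:10:10 yields 136/17/17 samples.
train, val, test = split_indices(170)
```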

Experiment

A Jupyter Notebook containing the entire experiment is available at this link.

Result

Quantitative Result

The following table reports the quantitative performance of the ViT-based Mask R-CNN model on the test set.

| Test Metric | Score |
| ----------- | ----- |
| mAP<sub>box</sub>@0.5:0.95 | 96.85% |
| mAP<sub>mask</sub>@0.5:0.95 | 79.58% |
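The @0.5:0.95 suffix means the average precision is averaged over ten IoU (intersection-over-union) thresholds from 0.50 to 0.95 in steps of 0.05, following the COCO convention. A minimal sketch of the box IoU computation underlying these thresholds (illustrative only, not the repository's evaluation code):

```python
def box_iou(a, b):
    # Boxes are (x1, y1, x2, y2); IoU = intersection area / union area.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# The ten COCO-style IoU thresholds averaged by mAP@0.5:0.95.
thresholds = [0.5 + 0.05 * i for i in range(10)]
```

A predicted box counts as a true positive at a given threshold only if its IoU with a ground-truth box meets that threshold; the mask variant applies the same idea to pixel masks instead of boxes.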

Loss Curve

[Loss curve figure] Loss curves of the ViT-based Mask R-CNN model on the Penn-Fudan Database for Pedestrian Detection and Segmentation train and validation sets.

Qualitative Result

The qualitative results are presented below.

[Seven sample images] A few samples of qualitative results from the ViT-based Mask R-CNN model.

Credit
