Instance segmentation aims to assign each pixel to a distinct object instance in the scene. Mask R-CNN is one such approach, combining object detection with semantic segmentation, and a ViT can serve as its backbone. In this project, a pre-trained ViT-based Mask R-CNN model is fine-tuned and evaluated on the Penn-Fudan Database for Pedestrian Detection and Segmentation. The dataset is split into train, validation, and test sets with an 80:10:10 ratio, as sketched below.
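As a minimal sketch of the 80:10:10 split, `torch.utils.data.random_split` can be used; here `PennFudanDataset` and `get_transform` are assumed to be defined as in the TorchVision Object Detection Finetuning Tutorial referenced below, and the seed is an arbitrary choice for reproducibility.

```python
import torch
from torch.utils.data import random_split

# `PennFudanDataset` and `get_transform` are assumed to follow the
# TorchVision Object Detection Finetuning Tutorial (see references).
dataset = PennFudanDataset("PennFudanPed", transforms=get_transform(train=True))

n = len(dataset)              # Penn-Fudan contains 170 images
n_train = int(0.8 * n)        # 80% train
n_val = int(0.1 * n)          # 10% validation
n_test = n - n_train - n_val  # remainder (~10%) test

# Fixed seed so the split is reproducible across runs
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(42),
)
```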
A Jupyter Notebook covering the entire experiment is available at this link.
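The notebook holds the full implementation; the snippet below is only a simplified sketch of how a plain ViT encoder could be wired into TorchVision's `MaskRCNN` as a single-feature-map backbone. The `ViTBackbone` wrapper is hypothetical, and it drops the class token and positional embeddings for brevity; a faithful ViTDet-style setup (see the benchmarking paper below) would interpolate the positional embeddings instead.

```python
import torch
from torch import nn
import torchvision
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

class ViTBackbone(nn.Module):
    """Hypothetical wrapper exposing a ViT-B/16 encoder as a single 2D feature map."""
    def __init__(self):
        super().__init__()
        vit = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")
        self.conv_proj = vit.conv_proj  # 16x16 patch embedding
        self.encoder = vit.encoder
        self.out_channels = 768         # attribute required by MaskRCNN

    def forward(self, x):
        n = x.shape[0]
        x = self.conv_proj(x)              # (N, 768, H/16, W/16)
        gh, gw = x.shape[-2:]
        x = x.flatten(2).transpose(1, 2)   # (N, L, 768) token sequence
        # Simplification: class token and positional embeddings are skipped;
        # a proper port would interpolate the positional embeddings.
        x = self.encoder.ln(self.encoder.layers(self.encoder.dropout(x)))
        return x.transpose(1, 2).reshape(n, 768, gh, gw)

anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),), aspect_ratios=((0.5, 1.0, 2.0),)
)
model = MaskRCNN(
    ViTBackbone(),
    num_classes=2,  # background + pedestrian
    rpn_anchor_generator=anchor_generator,
    box_roi_pool=MultiScaleRoIAlign(["0"], output_size=7, sampling_ratio=2),
    mask_roi_pool=MultiScaleRoIAlign(["0"], output_size=14, sampling_ratio=2),
)
```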
The following table reports the quantitative performance of the ViT-based Mask R-CNN model.
| Test Metric | Score |
|---|---|
| mAP<sub>box</sub>@0.5:0.95 | 96.85% |
| mAP<sub>mask</sub>@0.5:0.95 | 79.58% |
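Scores of this kind can be computed with TorchMetrics' `MeanAveragePrecision`; the sketch below assumes a trained `model` and a `test_loader` yielding images and target dictionaries in the TorchVision detection format, which are not defined here.

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

# One metric instance per IoU type: "bbox" for boxes, "segm" for masks
map_box = MeanAveragePrecision(iou_type="bbox")
map_mask = MeanAveragePrecision(iou_type="segm")

model.eval()
with torch.no_grad():
    for images, targets in test_loader:  # assumed test DataLoader
        preds = model(images)
        # "segm" expects boolean masks; Mask R-CNN outputs soft (N, 1, H, W) masks
        for p in preds:
            p["masks"] = p["masks"].squeeze(1) > 0.5
        for t in targets:
            t["masks"] = t["masks"].bool()
        map_box.update(preds, targets)
        map_mask.update(preds, targets)

print(map_box.compute()["map"])   # mAP_box @ IoU=0.5:0.95
print(map_mask.compute()["map"])  # mAP_mask @ IoU=0.5:0.95
```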
*Loss curves of ViT-based Mask R-CNN on the Penn-Fudan Database for Pedestrian Detection and Segmentation train and validation sets.*
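Since PyTorch Lightning (referenced below) is used, curves like these typically come from `self.log` calls. A minimal sketch, assuming a hypothetical `LitMaskRCNN` wrapper around a TorchVision Mask R-CNN, which returns its loss dictionary only in training mode:

```python
import torch
import pytorch_lightning as pl

class LitMaskRCNN(pl.LightningModule):
    """Hypothetical Lightning wrapper around a TorchVision Mask R-CNN."""
    def __init__(self, model, lr=1e-4):
        super().__init__()
        self.model = model
        self.lr = lr

    def training_step(self, batch, batch_idx):
        images, targets = batch
        loss_dict = self.model(images, targets)  # dict of losses in train mode
        loss = sum(loss_dict.values())
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def validation_step(self, batch, batch_idx):
        images, targets = batch
        # TorchVision detection models return losses only in train mode,
        # so switch modes temporarily to obtain a validation loss
        self.model.train()
        with torch.no_grad():
            loss_dict = self.model(images, targets)
        self.model.eval()
        self.log("val_loss", sum(loss_dict.values()), prog_bar=True)

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)
```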
Qualitative results are presented below.
*A few samples of qualitative results from the ViT-based Mask R-CNN model.*
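Overlays like these can be produced with `torchvision.utils.draw_segmentation_masks`; in the sketch below, the image path and the 0.7/0.5 thresholds are illustrative choices, and `model` is the fine-tuned network from above.

```python
import torch
from torchvision.io import read_image
from torchvision.utils import draw_segmentation_masks
import torchvision.transforms.functional as F

model.eval()
img = read_image("PennFudanPed/PNGImages/FudanPed00001.png")  # uint8 (C, H, W)
with torch.no_grad():
    pred = model([F.convert_image_dtype(img, torch.float)])[0]

keep = pred["scores"] > 0.7                   # illustrative confidence threshold
masks = pred["masks"][keep].squeeze(1) > 0.5  # soft (N, 1, H, W) masks -> boolean
overlay = draw_segmentation_masks(img, masks, alpha=0.6)
F.to_pil_image(overlay).save("qualitative_sample.png")
```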
- An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Mask R-CNN
- Benchmarking Detection Transfer Learning with Vision Transformers
- TorchVision's Mask R-CNN
- TorchVision Object Detection Finetuning Tutorial
- Penn-Fudan Database for Pedestrian Detection and Segmentation
- PyTorch Lightning