This is a TensorFlow implementation of the FCN-8s model architecture for semantic image segmentation introduced by Shelhamer et al. in the paper Fully Convolutional Networks for Semantic Segmentation.
This repository contains only the 'all-at-once' version of the FCN-8s model, which converges significantly faster than the version trained in stages. A convolutionalized VGG-16 model pre-trained on ImageNet classification is provided and serves as the encoder of the FCN-8s. Documentation and a tutorial on how to train and evaluate the model and use it for prediction are also provided. Some useful TensorBoard summaries can be recorded out of the box.
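For readers unfamiliar with the term, "convolutionalizing" a classifier means reinterpreting its fully connected layers as convolutions so that the network accepts inputs of arbitrary size. The following is a minimal NumPy sketch of that idea, not the code used to produce the provided VGG-16 model. The function name `convolutionalize_fc` is hypothetical, and the sketch assumes the dense layer was applied to a feature map flattened in row-major (height, width, channels) order.

```python
import numpy as np


def convolutionalize_fc(fc_weights, height, width, channels):
    """Reshape dense-layer weights into a convolution kernel.

    fc_weights has shape (height * width * channels, num_outputs).
    The result has shape (height, width, channels, num_outputs), i.e. the
    HWIO kernel layout. Assumption: the dense layer's input was flattened
    in row-major (height, width, channels) order.
    """
    num_inputs, num_outputs = fc_weights.shape
    if num_inputs != height * width * channels:
        raise ValueError("fc_weights does not match the given feature map size")
    return fc_weights.reshape(height, width, channels, num_outputs)


# Tiny stand-in for a VGG-16-style layer (the real fc6 maps 7*7*512 -> 4096).
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((2, 3, 4))
dense_weights = rng.standard_normal((2 * 3 * 4, 5))

kernel = convolutionalize_fc(dense_weights, 2, 3, 4)

# Applying the kernel at a single 'valid' position reproduces the dense output.
dense_out = feature_map.ravel() @ dense_weights
conv_out = np.tensordot(feature_map, kernel, axes=([0, 1, 2], [0, 1, 2]))
assert np.allclose(dense_out, conv_out)
```

On an input larger than the kernel, the same weights slide over the feature map and yield a spatial grid of class scores instead of a single vector, which is what makes the model usable as a segmentation encoder.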
Below are some prediction examples of the model trained on the Cityscapes dataset for 13,000 steps at batch size 16, at which point the model achieves a mean IoU of 38.2% on the validation dataset. This is of course far from convergence; the purpose of these examples is just to demonstrate that the code works and the model learns. You can watch the model in action on the Cityscapes demo videos here.
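As a reminder of what the mean IoU figure measures, here is a minimal NumPy sketch of the metric computed from a confusion matrix. It is an illustration only and not necessarily how this repository's evaluation code computes it. The function name `mean_iou` is hypothetical, and the sketch assumes integer class labels in the range `[0, num_classes)` with no ignored or void label.

```python
import numpy as np


def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union across classes.

    y_true and y_pred are integer label arrays of identical shape.
    Classes that appear in neither array are left out of the average.
    """
    y_true = np.asarray(y_true).astype(np.int64).ravel()
    y_pred = np.asarray(y_pred).astype(np.int64).ravel()

    # confusion[i, j] counts pixels with ground truth i that were predicted as j.
    confusion = np.bincount(
        num_classes * y_true + y_pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection

    present = union > 0
    return float(np.mean(intersection[present] / union[present]))


# Two-class toy example on a 2x2 "image".
ground_truth = np.array([[0, 0], [1, 1]])
prediction = np.array([[0, 1], [1, 1]])
score = mean_iou(ground_truth, prediction, num_classes=2)
```

In the toy example, class 0 has an IoU of 1/2 and class 1 an IoU of 2/3, so the mean is 7/12.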
- Python 3.x
- TensorFlow 1.x
- NumPy
- SciPy
- OpenCV (for data augmentation)
- tqdm
fcn8s_tutorial.ipynb explains how to train and evaluate the model and how to make and visualize predictions.
You can download the pre-trained, convolutionalized VGG-16 model here.