A TensorFlow model for real-world image text extraction problems.
This folder contains the code needed to train a new Attention OCR model on the FSNS dataset dataset to transcribe street names in France. You can also use it to train it on your own data.
More details can be found in our paper:
"Attention-based Extraction of Structured Information from Street View Imagery"
Authors: Zbigniew Wojna [email protected], Alexander Gorban [email protected]
Pull requests: alexgorban
- Installed TensorFlow library (instructions).
- At least 158Gb of free disk space to download FSNS dataset:
aria2c -c -j 20 -i ../street/python/fsns_urls.txt
- 16Gb of RAM or more, 32Gb is recommended.
- The train.py works with in both modes CPU and GPU, using GPU is preferable. The GPU mode was tested with Titan X and GTX980.
To run all unit tests:
python -m unittest discover -p '*_test.py'
To train from scratch:
python train.py
To train a model using a pre-trained inception weights as initialization:
wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
tar xf inception_v3_2016_08_28.tar.gz
python train.py --checkpoint_inception=inception_v3.ckpt
To fine tune the Attention OCR model using a checkpoint:
wget http://download.tensorflow.org/models/attention_ocr_2017_05_01.tar.gz
tar xf attention_ocr_2017_05_01.tar.gz
python train.py --checkpoint=model.ckpt-232572
This code is a modified version of the internal model we used for our paper. Currently it reaches 82.71% full sequence accuracy after 215k steps of training. The main difference between this version and the version used in the paper - for the paper we used a distributed training with 50 GPU (K80) workers (asynchronous updates), the provided checkpoint was created using this code after ~60 hours of training on a single GPU (Titan X).