Denoising Dirty Documents

Author

Nhat Pham (https://github.com/nhatsmrt) & Hoang Phan (https://github.com/petrpan26)

Introduction

This project is based on Kaggle's competition: https://www.kaggle.com/c/denoising-dirty-documents
The challenge is to remove various types of synthetic noise from scanned text.
NOTE: This project is written in TensorFlow 1.9.

Approach

Small fixed-size windows (patches) of the scanned text are passed through an autoencoder-like neural network.
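Window-based processing means a full page has to be cut into patches before it goes through the network, and the outputs have to be stitched back into a page afterwards. The sketch below shows one way to do that in NumPy. The window size (16), the stride (8, i.e. overlapping windows), edge padding, and averaging of overlaps are all assumptions for illustration; the actual window size and tiling scheme are defined in the project's code.

```python
import numpy as np

def _pad_amount(n, size, stride):
    """Padding needed so windows of `size` at step `stride` exactly tile n + pad."""
    if n <= size:
        return size - n
    return (-(n - size)) % stride

def extract_patches(image, size=16, stride=8):
    """Cut a 2-D grayscale image into (possibly overlapping) size x size windows.

    Hypothetical defaults: size=16, stride=8. The bottom/right borders are
    edge-padded so every pixel is covered. Returns the patches, their
    top-left coordinates, and the padded shape.
    """
    assert 0 < stride <= size, "stride > size would leave pixels uncovered"
    h, w = image.shape
    padded = np.pad(image, ((0, _pad_amount(h, size, stride)),
                            (0, _pad_amount(w, size, stride))), mode="edge")
    H, W = padded.shape
    coords = [(i, j)
              for i in range(0, H - size + 1, stride)
              for j in range(0, W - size + 1, stride)]
    patches = np.stack([padded[i:i + size, j:j + size] for i, j in coords])
    return patches, coords, padded.shape

def reassemble(patches, coords, padded_shape, original_shape):
    """Stitch (denoised) patches back together, averaging where they overlap."""
    size = patches.shape[1]
    total = np.zeros(padded_shape)
    count = np.zeros(padded_shape)
    for patch, (i, j) in zip(patches, coords):
        total[i:i + size, j:j + size] += patch
        count[i:i + size, j:j + size] += 1
    h, w = original_shape
    # Crop away the padding to recover the original page size.
    return (total / count)[:h, :w]
```

In use, a hypothetical `model` would sit between the two calls: `patches, coords, shape = extract_patches(page)` followed by `reassemble(model(patches), coords, shape, page.shape)`. Averaging overlapping predictions is one common way to smooth seams between neighbouring windows.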
The network has a convolutional encoder with residual connections. For the decoder, a simple feedforward (fully connected) layer would be sufficient; however, a deconvolutional (transposed convolution) layer is used instead because it has fewer parameters, which speeds up training.
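To make the idea of a residual connection concrete, here is a minimal NumPy sketch of a generic ResNet-style block: conv, ReLU, conv, then the block input is added back before the final ReLU. The kernel size, channel count, and the absence of batch normalization are assumptions; it is written in plain NumPy rather than TensorFlow 1.9 so that it runs on its own, and it is not the project's exact encoder.

```python
import numpy as np

def conv2d_same(x, kernel, bias):
    """Stride-1 2-D convolution with zero 'same' padding.

    Computed as cross-correlation, as deep-learning frameworks do.
    x: (H, W, C_in); kernel: (k, k, C_in, C_out) with odd k; bias: (C_out,).
    """
    k = kernel.shape[0]
    r = k // 2
    xp = np.pad(x, ((r, r), (r, r), (0, 0)))
    H, W, _ = x.shape
    out = np.empty((H, W, kernel.shape[3]))
    for i in range(H):
        for j in range(W):
            window = xp[i:i + k, j:j + k, :]  # (k, k, C_in)
            # Contract the window with the kernel over (k, k, C_in) -> (C_out,)
            out[i, j] = np.tensordot(window, kernel, axes=3) + bias
    return out

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, k1, b1, k2, b2):
    """Identity-shortcut block: relu(x + conv(relu(conv(x)))).

    The shortcut requires conv outputs to have the same channel count as x.
    """
    h = relu(conv2d_same(x, k1, b1))
    return relu(x + conv2d_same(h, k2, b2))
```

Because the input is added back unchanged, a block whose convolutions output zeros simply passes its (non-negative) input through, so each block only has to learn a correction to its input. That property is a common motivation for residual connections in general.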
The detailed architecture can be found in the code and the project report.
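The parameter saving of the deconvolutional decoder over a fully connected one can be checked with simple arithmetic. A dense layer needs one weight per (input feature, output pixel) pair, while a transposed convolution shares one small kernel across all spatial positions. The sizes below (an 8×8×32 encoder output, a 16×16 output window, a 4×4 kernel) are hypothetical and chosen only for illustration.

```python
def dense_params(in_features, out_features):
    # One weight per input/output pair, plus one bias per output unit.
    return in_features * out_features + out_features

def conv_transpose_params(kernel_size, in_channels, out_channels):
    # One k x k kernel per (input channel, output channel) pair, plus one bias
    # per output channel; the count does not depend on the spatial size.
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

# Hypothetical sizes: encoder output 8x8x32, decoder output one 16x16 window.
print(dense_params(8 * 8 * 32, 16 * 16))   # 524544
print(conv_transpose_params(4, 32, 1))     # 513
```

Under these assumed sizes the transposed convolution uses roughly a thousand times fewer weights, which is consistent with the faster training described above.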

Some demos (from the competition's test files)

[Images: three Before/After pairs of test-set scans, showing each page before and after denoising]