Denoising Dirty Documents

Authors

Nhat Pham (https://github.com/nhatsmrt) & Hoang Phan (https://github.com/petrpan26)

Introduction

This project is based on Kaggle's competition: https://www.kaggle.com/c/denoising-dirty-documents
The challenge is to remove different types of synthetic noise from scanned text.
NOTE: This project is written in TensorFlow 1.9.

Approach

Small windows (e.g of size ) of the scanned texts are passed through an autoencoder-like neural network.
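As a rough illustration of the windowing step, here is a minimal sketch in plain Python. The function name `extract_windows`, the window size, and the stride are all hypothetical; the window size actually used by this project is given in the code and report, not here.

```python
def extract_windows(image, window, stride):
    """Yield (top, left, patch) for every full window that fits in the image.

    `image` is a list of rows of grayscale values. `window` and `stride`
    are illustrative parameters, not the values used by this project.
    """
    height, width = len(image), len(image[0])
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            patch = [row[left:left + window] for row in image[top:top + window]]
            yield top, left, patch


# Toy 4x6 "scan"; real scanned pages are far larger.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
patches = list(extract_windows(image, window=2, stride=2))
print(len(patches))  # number of non-overlapping 2x2 windows
```

Each patch would then be fed to the network on its own, and the denoised patches written back at their `(top, left)` positions to rebuild the page.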
The network has a convolutional encoder with residual connections. For the decoder, a simple feedforward layer would be sufficient. However, a deconvolutional (transposed convolution) layer is used instead because it has fewer parameters, which speeds up training.
The detailed architecture can be found in the code and the project report.
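To make the parameter argument for the decoder concrete, the sketch below counts weights for a fully connected decoder and for a single transposed convolution. All sizes (a 7x7x64 encoder output, a 28x28 output window, a 4x4 kernel) are assumptions chosen for illustration and are not taken from this project's architecture.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_transpose_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases of a 2-D transposed convolution layer."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Hypothetical sizes, for illustration only.
feature_h, feature_w, channels = 7, 7, 64  # assumed encoder output shape
window = 28                                # assumed output window side

# Dense decoder: flatten the feature map and map it to every output pixel.
dense = dense_params(feature_h * feature_w * channels, window * window)

# Transposed convolution: one 4x4 kernel with stride 4 upsamples 7x7 to 28x28.
deconv = conv_transpose_params(4, 4, channels, 1)

print(dense, deconv)
```

Under these assumed sizes the dense decoder needs about 2.5 million parameters while the transposed convolution needs about a thousand, because the convolution kernel is shared across all spatial positions.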

Some demos (from the competition's test files)

[Three Before/After image pairs]
