Skip to content

hackNY-labs-2018/dany-parsing

Repository files navigation

DANY Bank Parser

Setup

Before handover to DANY, we will have dockerized this code for easy deployment (I will check with Alex if this is a good idea.

  • Install virtualenv
  • You should have a virtualenv created. This repository automatically ignores the venv folder so it would be a good name for a virtualenv.
  • Remember to activate your virtualenv with source venv/bin/activate
  • Install all the dependencies with pip install -r requirements.txt.
  • Install imagemagick and libimagemagickdev (differs by platform). Available via apt on Ubuntu
  • Install tesseract (see above.)

Invoking

The main point of entry for this program is currently ocr_parser.py. To do OCR on a file, run: python ocr_parser.py <FILE_NAME>. If your file is a PDF, we will start by converting it to an image and then run OCR on it. If it is an image file, OCR will be run directly. We detect whether filetype entirely via extension.

About

Bank record parser for DANY

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •  

Languages