pdf_ocr

Summary

A web interface for performing Optical Character Recognition (OCR) on PDF files. Upload a PDF file using the provided form and, after a while, you will be presented with a zip file containing its pages in text format. Note: Processing may be very slow, so either great hardware or great patience (and sometimes both) is advised.

Usage

This web interface is best deployed as a Docker container, either locally or in a more advanced configuration with an ingress service. For this purpose, a ready-to-build Dockerfile is provided.

Security Warning

Apart from very rudimentary input sanitization, no security or authentication is provided; great caution is therefore advised when exposing the interface to an untrusted network. In addition, since OCR processing can be very CPU-intensive, a denial-of-service attack through request flooding is extremely easy to carry out.
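
To illustrate the request-flooding concern, here is a minimal sketch of one possible mitigation: capping how many CPU-heavy jobs run at once and rejecting new work when a short waiting queue is full. It is not part of pdf_ocr. The class name JobLimiter, its parameters, and the "busy" error are all hypothetical, and how it would be wired into the actual upload handler is an assumption.

```typescript
// Hypothetical helper, not part of pdf_ocr: bounds concurrent CPU-heavy jobs.
class JobLimiter {
  private running = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxConcurrent: number;
  private readonly maxQueued: number;

  constructor(maxConcurrent: number, maxQueued: number) {
    this.maxConcurrent = maxConcurrent;
    this.maxQueued = maxQueued;
  }

  // Number of jobs currently holding a slot.
  get active(): number {
    return this.running;
  }

  // Number of callers waiting for a slot.
  get queued(): number {
    return this.waiting.length;
  }

  // Runs `job` once a slot is free; rejects immediately if the queue is full.
  async run<T>(job: () => Promise<T>): Promise<T> {
    if (this.running >= this.maxConcurrent) {
      if (this.waiting.length >= this.maxQueued) {
        throw new Error("busy");
      }
      // Wait until a finishing job hands its slot over to us.
      await new Promise<void>((resolve) => {
        this.waiting.push(() => resolve());
      });
    } else {
      this.running++;
    }
    try {
      return await job();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass the slot on; the running count stays the same
      } else {
        this.running--;
      }
    }
  }
}
```

A request handler could wrap its OCR step in limiter.run(...) and answer with an HTTP 503 when the call rejects. This only bounds CPU load per instance; it does not replace authentication or network-level protection.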


nilssonk/pdf_ocr
