Ocr Miner is a Tesseract-based image-to-text service.
Ocr Miner can detect the following types of data:
- Phone Number
  - 555-543-2109
  - 0212-9876543
  - 543-987-6543
  - 222 987 6543
  - (501)234-5678
  - +90539.456.7890
- Turkish identity number, US Social Security Number, European VAT number
  - BG1214317890
  - 60925736682
  - 001-26-4753
- Credit Card
  - Visa, Mastercard...
  - 3530-1113-3330-0000
  - 6011000990139424
  - 5105 1051 0510 5109
- License Plate
  - USA, Germany, China, Russia, Turkey
- Date
  - 02-02-1337
  - 02 02 1339
  - 12/02/1555
  - 22.02.1556
- Email
  - Validated with email_validator
- Domain
  - google.io
  - Strong validation against the IANA TLD list
- Url
- Hash
  - Strong validation with Shannon entropy calculation
  - MD5
  - MD4
  - SHA1
  - SHA256
  - SHA512
  - NTLM
- Combolist
  - [email protected]:pa@ssw0rd!
  - usern:password
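
The Shannon entropy check mentioned for hashes in the list above can be illustrated with a short Python sketch. This is not Ocr Miner's actual code: the names `shannon_entropy`, `classify_hash`, and `HASH_LENGTHS`, as well as the `min_entropy` threshold of 3.0 bits per character, are assumptions chosen for illustration. The idea is that a real digest uses most hex digits fairly evenly, while a false positive such as a run of zeros does not.

```python
import math
import re
from collections import Counter

# Hex digest lengths. MD5, MD4 and NTLM all produce 32 hex characters,
# so length alone cannot tell them apart.
HASH_LENGTHS = {32: "MD5/MD4/NTLM", 40: "SHA1", 64: "SHA256", 128: "SHA512"}
HEX_RE = re.compile(r"[0-9a-fA-F]+")


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def classify_hash(candidate: str, min_entropy: float = 3.0):
    """Return a hash family label, or None if the candidate is rejected.

    min_entropy is a hypothetical threshold, not a value taken from Ocr Miner.
    """
    if not HEX_RE.fullmatch(candidate):
        return None  # not a pure hex string
    label = HASH_LENGTHS.get(len(candidate))
    if label is None:
        return None  # length does not match any known digest size
    if shannon_entropy(candidate.lower()) < min_entropy:
        return None  # too repetitive to be a plausible digest
    return label
```

With this sketch, the MD5 digest of the empty string (`d41d8cd98f00b204e9800998ecf8427e`) is accepted as a 32-character hash, while a string of 32 zeros is rejected because its entropy is zero.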
Ocr Miner is built with the following tools and libraries:
- FastAPI
- Docker
- Redis
- Tesseract
- SQLAlchemy
- pyvat
- python-magic
- email-validator
- opencv-python-headless
- Jinja2
Edit the envs/.env file:
- HOST="psql-service-name"
- REDIS_HOST="redis-service-name"
- USERNAME="psql-username"
- PASSWORD="psql-password"
- UPLOAD_FOLDER="data/uploads"
- CLOUDFLARE_TURNSTILE="cloudflare-private-key"
- Change the sitekey in ocrminer.js to your own Cloudflare Turnstile site key.
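
As a rough illustration of how the variables above might be consumed, the following Python sketch reads them from the process environment and fails early when one is missing. The `Settings` dataclass and `load_settings` function are hypothetical names, not part of Ocr Miner, and the sketch assumes the values from envs/.env have already been exported into the environment (for example by Docker Compose).

```python
import os
from dataclasses import dataclass

# Variable names taken from envs/.env.
REQUIRED_KEYS = (
    "HOST",
    "REDIS_HOST",
    "USERNAME",
    "PASSWORD",
    "UPLOAD_FOLDER",
    "CLOUDFLARE_TURNSTILE",
)


@dataclass(frozen=True)
class Settings:
    host: str
    redis_host: str
    username: str
    password: str
    upload_folder: str
    cloudflare_turnstile: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        # Fail at startup instead of at the first database or Redis call.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        host=env["HOST"],
        redis_host=env["REDIS_HOST"],
        username=env["USERNAME"],
        password=env["PASSWORD"],
        upload_folder=env["UPLOAD_FOLDER"],
        cloudflare_turnstile=env["CLOUDFLARE_TURNSTILE"],
    )
```

Passing a plain dictionary to `load_settings` instead of relying on `os.environ` makes the check easy to exercise in tests.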
To run the API under Gunicorn with Uvicorn workers, set docker-compose > fastapi-service > command to:
gunicorn ocr_miner.api.ocr_miner_api:APP --workers 2 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
For more information, see https://fastapi.tiangolo.com/deployment/server-workers/
To start the API through manage.py instead, set docker-compose > fastapi-service > command to:
python3.11 manage.py --api
Then start the services:
docker-compose up