Gemini JSON OCR is a proof of concept showing how easily the latest Google Gemini models can extract structured JSON from documents.
$ export GOOGLE_API_KEY=<get your API key at https://aistudio.google.com/app/apikey>
$ uv run scan.py /Users/maurycy/Desktop/test
INFO:root:Processing file: MX-C304W_16122024_143019.pdf
Results for MX-C304W_16122024_143019.pdf have been written to /Users/maurycy/Desktop/test/MX-C304W_16122024_143019.pdf.json
This produces a JSON file such as:
{
  "waybill": {
    "scac": "SEAU",
    "booking_no": "4803804131",
    "bl_no": "4803804131",
    "vessel": "MERIDIAN",
    "containers": ["TLLU5242619", "MSKU829454"]
  }
}
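Once a result file exists, it can be consumed like any other JSON. The following is a minimal sketch, assuming the output follows the shape of the example above. The helper name summarize_waybill is hypothetical and not part of this project, and the real keys depend on the document and on the prompt.

```python
import json


def summarize_waybill(json_text):
    """Pick a few fields out of a result shaped like the example above.

    Hypothetical helper: the "waybill" key and its fields are taken from
    the sample output and may differ for other documents or prompts.
    """
    data = json.loads(json_text)
    waybill = data.get("waybill", {})
    return {
        "bl_no": waybill.get("bl_no"),
        "vessel": waybill.get("vessel"),
        # Missing "containers" is treated as an empty list.
        "container_count": len(waybill.get("containers", [])),
    }
```

To use it, read the generated .pdf.json file as text and pass its contents to summarize_waybill. Missing keys yield None (or a count of 0) rather than raising, which is convenient when the model omits a field.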
Make sure that you've got uv installed:
# macOS
brew install uv
(No need to install Python etc.; uv will take care of that!)
The prompt is in prompt.txt.
Supported environment variables:
GEMINI_MODEL, by default gemini-2.0-flash-exp
GOOGLE_API_KEY, to be retrieved from Google AI Studio
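How a script might resolve these two variables can be sketched as follows. This is an illustration under assumptions, not the actual code in scan.py: the function name load_settings and the returned dictionary are hypothetical, and only the variable names and the default model come from the list above.

```python
import os

# Default model name as documented above.
DEFAULT_MODEL = "gemini-2.0-flash-exp"


def load_settings(env=None):
    """Resolve the API key and model name from environment variables.

    Hypothetical helper; `env` can be any mapping, which keeps the
    function easy to test. It falls back to os.environ when omitted.
    """
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of a later API error.
        raise RuntimeError("GOOGLE_API_KEY is not set")
    # An unset or empty GEMINI_MODEL falls back to the default.
    model = env.get("GEMINI_MODEL") or DEFAULT_MODEL
    return {"api_key": api_key, "model": model}
```

Calling load_settings() with no arguments reads the real environment, so exporting GEMINI_MODEL before running would override the default model.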