Tool to help organize your invoices, documents...

ymettier/fileganizer

Fileganizer

Fileganizer is a tool that will

  • run a command to extract text from an input file,
  • parse the extracted text with grok-like patterns,
  • choose a pre-configured go-template depending on parsing results,
  • generate a result with go-template,
  • optionally run the result (as a command).

The typical use case is to run a pdftotext command to extract text from invoices and similar documents, look for patterns such as IDs, dates, and names, and then rename (move) the file using the parsed values.

Tutorial

Copy config.yaml.sample as config.yaml. Edit the file:

Leave ExtractTextCommand as is if you have pdftotext installed. Or change it if you prefer using another tool.

Leave env as is or declare other environment variables according to your needs. These environment variables will be available in your go-templates.

Leave commonTemplate empty. You will fill it later, according to your needs.

Leave months as is or translate the month names into your language. This is used to convert month names into numbers. For example, octobre (French for October) is converted to 10.
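The month conversion can be pictured as a simple lookup. The Go sketch below is only an illustration: the months map and the monthNumber function are hypothetical, and the case-insensitive, whitespace-trimming lookup is an assumption rather than documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// months is a hypothetical in-memory form of a French months configuration.
var months = map[string]int{
	"janvier": 1, "février": 2, "mars": 3, "avril": 4,
	"mai": 5, "juin": 6, "juillet": 7, "août": 8,
	"septembre": 9, "octobre": 10, "novembre": 11, "décembre": 12,
}

// monthNumber looks up a month name, ignoring case and surrounding spaces.
func monthNumber(name string) (int, bool) {
	n, ok := months[strings.ToLower(strings.TrimSpace(name))]
	return n, ok
}

func main() {
	if n, ok := monthNumber("Octobre"); ok {
		fmt.Printf("%02d\n", n) // prints: 10
	}
}
```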

Leave grokPatterns as is. You may add new patterns later, according to your needs.

Next, work on fileDescriptions. It contains the patterns to try against the input file and the output, written as a go-template, which we configure here as a shell command.

  1. Run fileganizer -c config.yaml -f yourfile.pdf -t. This prints the output of the ExtractTextCommand.
  2. Identify some interesting patterns, for example a date, an identifier...
  3. Add these patterns using grok syntax (the Grok filter plugin documentation from Logstash is a good way to learn it). Note that the parser is Grokky, which is not fully compatible with Grok.
  4. Write a go-template output using the available variables (.filename, .env.XXX for environment variables, .grok.xxx for parsed data).
  5. Run fileganizer -c config.yaml -f yourfile.pdf (without the -t option). This does the whole job and prints the generated result.
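To get a feel for how a template sees the variables from step 4, the following Go sketch renders a mv command with the standard text/template package. The data values, the render function, and the template text are made up for illustration, and the missingkey=error option is a choice made for this sketch, not necessarily what Fileganizer does.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render executes tmplText against data and fails on unknown keys
// (missingkey=error is an assumption made for this sketch).
func render(tmplText string, data map[string]any) (string, error) {
	tmpl, err := template.New("output").Option("missingkey=error").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Hypothetical values for the variables available in a template.
	data := map[string]any{
		"filename": "scan0001.pdf",
		"env":      map[string]string{"HOME": "/home/alice"},
		"grok":     map[string]string{"id": "4521", "year": "2023", "month": "10"},
	}
	tmpl := `mv "{{.filename}}" "{{.env.HOME}}/invoices/{{.grok.year}}-{{.grok.month}}-invoice-{{.grok.id}}.pdf"`

	out, err := render(tmpl, data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
	// prints: mv "scan0001.pdf" "/home/alice/invoices/2023-10-invoice-4521.pdf"
}
```

Failing on a missing key makes a typo such as .grok.yaer visible while iterating on a template, instead of silently producing an odd filename.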

You can iterate as many times as you need to improve the template. You can also add other fileDescriptions to identify other document types and print from other go-templates.

When you want to run the output as a shell command, add the -r option: fileganizer -c config.yaml -f yourfile.pdf -r.
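Running a generated string as a shell command can be sketched in Go as follows. This is an assumption about the general technique, not a description of how Fileganizer implements -r: the runShell function is hypothetical, and it presumes a POSIX sh is available on the PATH.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runShell passes command to "sh -c" and returns its combined
// stdout and stderr output (hypothetical helper, assumes sh exists).
func runShell(command string) (string, error) {
	out, err := exec.Command("sh", "-c", command).CombinedOutput()
	return string(out), err
}

func main() {
	// A harmless stand-in for a generated "mv ..." command.
	out, err := runShell(`echo "hello from the generated command"`)
	if err != nil {
		fmt.Println("command failed:", err)
		return
	}
	fmt.Print(out)
}
```

Because the generated text is executed as is, it is worth checking the printed result without -r before running it.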

Build

go build

Test

go test ./...

Run

Run fileganizer on a file and print the generated output:

./fileganizer -c <config.yaml> -f <file.pdf>

Run fileganizer on a file and run the generated output:

./fileganizer -c <config.yaml> -f <file.pdf> -r

Show the text extracted from the PDF:

./fileganizer -c <config.yaml> -f <file.pdf> -t

Environment variables

Name                Value
LOG_TXT_FILENAME    File to log to, in plain text. Possible values: stdout, stderr, or any filename.
LOG_JSON_FILENAME   File to log to, in JSON format. Possible values: stdout, stderr, or any filename.
LOG_LEVEL           One of debug, info, warn, error, panic or fatal. Default is info.

Note: if neither LOG_TXT_FILENAME nor LOG_JSON_FILENAME is set, logs are written to stdout in plain text format, the same as with LOG_TXT_FILENAME=stdout.
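The defaults described above can be summarised in a few lines of Go. This is a sketch of the documented rules only: the logConfig type and the resolveLogConfig function are hypothetical names, and validation of LOG_LEVEL values is left out.

```go
package main

import (
	"fmt"
	"os"
)

// logConfig holds the resolved logging settings (hypothetical type).
type logConfig struct {
	textTarget string // stdout, stderr, a filename, or empty if unused
	jsonTarget string // stdout, stderr, a filename, or empty if unused
	level      string
}

// resolveLogConfig applies the documented defaults. getenv is injected
// so the logic can be exercised without touching the real environment.
func resolveLogConfig(getenv func(string) string) logConfig {
	cfg := logConfig{
		textTarget: getenv("LOG_TXT_FILENAME"),
		jsonTarget: getenv("LOG_JSON_FILENAME"),
		level:      getenv("LOG_LEVEL"),
	}
	// Neither target set: plain text on stdout.
	if cfg.textTarget == "" && cfg.jsonTarget == "" {
		cfg.textTarget = "stdout"
	}
	if cfg.level == "" {
		cfg.level = "info"
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", resolveLogConfig(os.Getenv))
}
```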

Licensing

This project is licensed under the MIT License. See the LICENSE file for the full license text.
