This module extracts text from a PDF. It currently contains a single function that traverses a PDF line by line and applies a RuleSet, passed as a parameter, to extract specific pieces of information. It is set up to extract the total, VAT, date, and time from receipts.
Function | Alias | Description |
---|---|---|
Get-TextFromPdf | | Extracts text from a PDF using a RuleSet. |
Get-TextFromPDF -Path 'c:\temp\receipt01.pdf'
$ruleSet = @(
    [pscustomobject]@{
        Name       = "Total"
        Expression = "(?i)Net: ?"
        Function   = {
            # Return the first number of the form d.dd or dd.dd found in $text
            return [regex]::Match($text, "\d{1,2}\.\d{2}").Value
        }
    }
)
Get-TextFromPDF -Path '.\receipt01.pdf' -RuleSet $ruleSet
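
The sketch below illustrates one way a RuleSet of this shape could be evaluated line by line. It is not the module's actual implementation: the sample `$lines`, the `VAT` rule, and the evaluation loop are all assumptions made for illustration, and it assumes each rule's `Function` reads the current line from a variable named `$text`, as the example above suggests.

```powershell
# Hypothetical sample lines standing in for text already extracted from a PDF.
$lines = @(
    'Receipt 0001'
    'Net: 12.50'
    'VAT: 2.50'
)

# Rules in the same shape as the RuleSet above. The "VAT" rule is illustrative only.
$ruleSet = @(
    [pscustomobject]@{
        Name       = 'Total'
        Expression = '(?i)Net: ?'
        Function   = { return [regex]::Match($text, '\d{1,2}\.\d{2}').Value }
    }
    [pscustomobject]@{
        Name       = 'VAT'
        Expression = '(?i)VAT: ?'
        Function   = { return [regex]::Match($text, '\d{1,2}\.\d{2}').Value }
    }
)

# Assumed evaluation loop: test each line against each rule's Expression and,
# on a match, run the rule's Function with the current line visible as $text.
$result = [ordered]@{}
foreach ($text in $lines) {
    foreach ($rule in $ruleSet) {
        if ($text -match $rule.Expression) {
            $result[$rule.Name] = & $rule.Function
        }
    }
}

# Emit one object with a property per rule that matched.
[pscustomobject]$result
```

For the sample lines, this yields `Total = 12.50` and `VAT = 2.50`. Note that the `\d{1,2}\.\d{2}` pattern only captures amounts up to 99.99 in full; receipts with larger totals would need a wider pattern.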