A cute PDF parser that reports the position of each element on the page, for inspection purposes.

LiterateInk/PDFInspector

@literate.ink/pdf-inspector

Installation

npm install @literate.ink/pdf-inspector
yarn add @literate.ink/pdf-inspector
pnpm add @literate.ink/pdf-inspector
bun add @literate.ink/pdf-inspector

Usage

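import { readFile } from "node:fs/promises";
import { parsePDF } from "@literate.ink/pdf-inspector";

// "document.pdf" is a placeholder path; `buffer` holds the raw PDF bytes.
// Assumption: parsePDF accepts the bytes as returned by readFile.
const buffer = await readFile("document.pdf");
const pages = await parsePDF(buffer);

// Print the dimensions and element counts of every page.
for (const page of pages) {
  console.log(`Page of ${page.Width}x${page.Height}px`);
  console.log("- contains", page.Texts.length, "texts");
  console.log("- contains", page.Fills.length, "fills");
}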
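
When inspecting larger documents, it can help to collapse each page into a small record instead of logging it line by line. The following is a minimal sketch, not part of the library: `PageLike`, `PageSummary` and `summarizePages` are hypothetical names, and the only page fields assumed are the four used above (`Width`, `Height`, `Texts`, `Fills`). The sample data is a stand-in so the sketch runs without a PDF.

```typescript
// Hypothetical minimal shape: only the four fields shown in the usage
// example are assumed; the real page type likely carries more.
interface PageLike {
  Width: number;
  Height: number;
  Texts: unknown[];
  Fills: unknown[];
}

// Hypothetical summary record, one per page.
interface PageSummary {
  index: number;
  width: number;
  height: number;
  textCount: number;
  fillCount: number;
}

// Collapse each page into a small, serializable summary record.
function summarizePages(pages: PageLike[]): PageSummary[] {
  return pages.map((page, index) => ({
    index,
    width: page.Width,
    height: page.Height,
    textCount: page.Texts.length,
    fillCount: page.Fills.length,
  }));
}

// Stand-in data so the sketch runs on its own; in practice, pass the
// result of parsePDF(buffer) instead.
const sample: PageLike[] = [
  { Width: 595, Height: 842, Texts: [{}, {}], Fills: [{}] },
];

console.table(summarizePages(sample));
```

Because the summary is plain data, it can also be serialized with `JSON.stringify` and compared between two versions of the same document.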

Credits

The JS/TS implementation is a fork of the following projects:

  • pdf2json, a Node.js module that parses PDF files and converts them from binary to JSON format
  • pdf.js, trimmed down because this project only needs a few parts of it