A pure JavaScript, cross-platform module designed for extracting text from PDF files using pdf.js.
- Extract text from PDF files.
- Supports both browser and Node.js environments.
- Easy to use, with a promise-based API.
npm install pdf-parse2
Or
yarn add pdf-parse2
const fs = require('fs');
const PDFParse = require('pdf-parse2');

(async () => {
  // Read the PDF into a Buffer.
  const dataBuffer = fs.readFileSync('path/to/your/document.pdf');
  // Use a distinct name for the instance so it does not shadow the PDFParse class.
  const parser = new PDFParse();
  try {
    const pdfData = await parser.loadPDF(dataBuffer);
    console.log('Text:', pdfData.text);
  } catch (error) {
    console.error(error);
  }
})();
Ensure the pdf.js library is included in your project. You can then use PDFParse
as in the Node.js example, fetching the PDF file with the Fetch API or XMLHttpRequest and passing the resulting ArrayBuffer to loadPDF.
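The browser flow described above can be sketched roughly as follows. This is a minimal sketch, not part of the library: extractTextFromUrl is a hypothetical helper, and it assumes that parser is a PDFParse instance whose loadPDF accepts an ArrayBuffer and resolves to an object with a text property, as in the Node.js example. The fetch function is passed in as a parameter so the helper can be exercised without a network.

```javascript
// Hypothetical helper (not part of pdf-parse2): fetch a PDF and extract its text.
// Assumes `parser.loadPDF(arrayBuffer)` resolves to an object with a `text` field.
async function extractTextFromUrl(url, parser, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }
  // Read the response body as an ArrayBuffer, one of the input types loadPDF accepts.
  const arrayBuffer = await response.arrayBuffer();
  const pdfData = await parser.loadPDF(arrayBuffer);
  return pdfData.text;
}
```

In a page, this might be called as extractTextFromUrl('path/to/your/document.pdf', new PDFParse()), assuming PDFParse is available in scope through your bundler or script setup.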
- loadPDF(src, options): Loads a PDF file and extracts text. src can be a Buffer or ArrayBuffer. options is optional.
- renderPage(pageData, options): A helper function for rendering a single page. This function is used internally by loadPDF.
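Because loadPDF is documented to take a Buffer or an ArrayBuffer, it can help to check the input before calling it. The guard below is a sketch under that assumption; assertValidSource is a hypothetical name, not a function exported by the library, and the library itself may accept or reject inputs differently.

```javascript
// Hypothetical guard (not part of pdf-parse2): accept only the documented src types.
function assertValidSource(src) {
  // Buffer exists only in Node.js, so check for it before using it.
  const isNodeBuffer = typeof Buffer !== 'undefined' && Buffer.isBuffer(src);
  if (isNodeBuffer || src instanceof ArrayBuffer) {
    return src;
  }
  throw new TypeError('src must be a Buffer or an ArrayBuffer');
}
```

A call such as parser.loadPDF(assertValidSource(data)) would then fail early with a clear message when, for example, a file path string is passed by mistake.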
Contributions are welcome! Please feel free to submit a Pull Request or open an issue for any bugs or feature requests.
This project is licensed under the MIT License - see the LICENSE file for details.