Extract all links from a PDF and archive the URLs in the Internet Archive's Wayback Machine

thoth-pub/archive-pdf-urls

Archive PDF URLs

This command-line tool extracts URLs from a PDF file and archives them using the Wayback Machine.

Installation

You can build and install the tool using Cargo:

cargo install archive-pdf-urls

Usage

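The tool takes the path to a PDF file as its argument, extracts the URLs found in that file, and archives each one using the Wayback Machine. The --exclude option skips URLs that match the given pattern.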

Example usage:

archive-pdf-urls file.pdf --exclude https://some.pattern/\*
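
To illustrate what an exclusion pattern such as https://some.pattern/* could mean, here is a minimal sketch of wildcard filtering in Rust. It assumes that `*` matches any run of characters; the tool's actual matching rules are not documented here, and the names `matches_pattern` and `filter_urls` are hypothetical helpers, not part of archive-pdf-urls.

```rust
/// Returns true if `url` matches `pattern`.
/// Assumption: `*` matches any run of characters (including none);
/// this may differ from how archive-pdf-urls interprets --exclude.
fn matches_pattern(pattern: &str, url: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        // No wildcard: require an exact match.
        return pattern == url;
    }
    let first = parts[0];
    let last = parts[parts.len() - 1];
    // The URL must start with the text before the first `*`
    // and end with the text after the last `*`, without overlap.
    if url.len() < first.len() + last.len()
        || !url.starts_with(first)
        || !url.ends_with(last)
    {
        return false;
    }
    // Every literal piece between wildcards must appear in order.
    let mut rest = &url[first.len()..url.len() - last.len()];
    for part in &parts[1..parts.len() - 1] {
        match rest.find(part) {
            Some(i) => rest = &rest[i + part.len()..],
            None => return false,
        }
    }
    true
}

/// Keeps only the URLs that match none of the exclusion patterns.
fn filter_urls<'a>(urls: &[&'a str], exclude: &[&str]) -> Vec<&'a str> {
    urls.iter()
        .copied()
        .filter(|u| !exclude.iter().any(|p| matches_pattern(p, u)))
        .collect()
}
```

Under this assumption, a URL such as https://some.pattern/page would be skipped by the pattern above, while https://example.org/ would still be archived.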

Docker usage

docker run --rm -v ./file.pdf:/file.pdf ghcr.io/thoth-pub/archive-pdf-urls file.pdf