Extract ToC from Deutsche Nationalbibliothek #10119

zorae · 2024-12-05T15:20:09Z

Proposal

The Deutsche Nationalbibliothek (DNB) has high-quality scans of the tables of contents for a large part of its holdings. These can be freely accessed as PDFs. For each OL edition that has a DNB identifier attached, OL could attempt to download the corresponding PDF and extract the ToC text. Note that some of the PDFs have a wildly inaccurate text layer, so it makes sense to run our own OCR rather than rely on the embedded text.
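As a rough sketch of the download-then-OCR flow described above (not an existing OL module): the function names `fetch_bytes`, `ocr_pdf`, and `extract_toc_text` are hypothetical, and the OCR step assumes the third-party packages `pdf2image` and `pytesseract` are installed along with poppler and the German Tesseract language data. The embedded text layer is deliberately ignored, since it can be inaccurate.

```python
from typing import Callable, Optional
from urllib.request import urlopen


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the raw response body."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def ocr_pdf(pdf_bytes: bytes, lang: str = "deu") -> str:
    """OCR every page of a PDF, ignoring any embedded text layer.

    Assumption: pdf2image (which needs poppler) and pytesseract (which
    needs Tesseract with the chosen language data) are available.
    They are imported lazily so the rest of the module works without them.
    """
    from pdf2image import convert_from_bytes
    import pytesseract

    # Render each page to an image, then run OCR on the rendered image.
    pages = convert_from_bytes(pdf_bytes, dpi=300)
    return "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)


def extract_toc_text(
    toc_url: str,
    fetch: Callable[[str], bytes] = fetch_bytes,
    ocr: Callable[[bytes], str] = ocr_pdf,
) -> Optional[str]:
    """Return OCR text for the ToC PDF at toc_url, or None if unavailable."""
    data = fetch(toc_url)
    # Not every edition has a ToC scan; skip anything that is not a PDF.
    if not data.startswith(b"%PDF-"):
        return None
    text = ocr(data).strip()
    return text or None
```

Passing `fetch` and `ocr` as parameters keeps the flow testable without network access or an OCR engine, and leaves room to swap in a different OCR backend. The raw OCR output would still need to be parsed into OL's ToC format, which this sketch does not attempt.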

Example edition page at DNB:
https://d-nb.info/973546166

Example ToC:
https://d-nb.info/973546166/04
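The two example URLs suggest that the ToC PDF lives at the edition URL plus a `/04` suffix. A minimal helper under that assumption might look like the following; the suffix is inferred from this single example and may not hold for every record, and the function names and the loose identifier check are hypothetical.

```python
import re

DNB_BASE_URL = "https://d-nb.info"
# Assumption: "/04" addresses the ToC scan, as in the one example above.
TOC_SUFFIX = "04"


def _clean_dnb_id(dnb_id: str) -> str:
    """Strip whitespace and apply a loose sanity check (digits, optional X)."""
    cleaned = dnb_id.strip()
    if not re.fullmatch(r"[0-9]+[Xx]?", cleaned):
        raise ValueError(f"Does not look like a DNB identifier: {dnb_id!r}")
    return cleaned


def dnb_edition_url(dnb_id: str) -> str:
    """Build the DNB edition page URL for an identifier."""
    return f"{DNB_BASE_URL}/{_clean_dnb_id(dnb_id)}"


def dnb_toc_url(dnb_id: str) -> str:
    """Build the presumed ToC PDF URL for an identifier."""
    return f"{dnb_edition_url(dnb_id)}/{TOC_SUFFIX}"
```

For the example edition, `dnb_toc_url("973546166")` yields `https://d-nb.info/973546166/04`. Since not every record necessarily has a ToC scan, a caller would still need to check that the response is actually a PDF.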

Justification

Problem: OL currently only has a table of contents for a small fraction of editions. This impacts patrons’ ability to learn what a book is about.

Impact: Increase the number of ToCs, especially for German-language books.

Research: I’ve been manually OCRing and/or transcribing a number of ToCs from DNB for use on OL and can attest that the scans are of consistently high quality.

Breakdown

Requirements Checklist

[ ]

Related files

Stakeholders

Instructions for Contributors

Please run these commands to ensure your repository is up to date before creating a new branch to work on this issue, and again each time after pushing code to GitHub, because the pre-commit bot may add commits to your PRs upstream.

mekarpeles added the Type: Proposal, Lead: @cdrini, and Module: Table of Contents labels and removed the Type: Feature Request and Needs: Lead labels on Dec 16, 2024
