Workflow Guide generic transformations
Konstantin Baierer edited this page Feb 9, 2022 · 3 revisions
Sometimes PAGE-XML annotations need special processing so that a workflow's processors interoperate properly. For example, a text-producing processor might fail to keep `TextEquiv` consistent between hierarchy levels, or it might be necessary to remove specific region types. Repairing minor syntactic or semantic deficiencies is also usually required for export or visualization, such as removing empty `ReadingOrder` and dead `@regionRef`s, ensuring each `TextEquiv` has a `Unicode`, or fixing negative or floating-point coordinates. While it is always possible to do this ad hoc via scripts, it can help to formulate it as a proper workflow step via the processor CLI.
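To make one of the repairs mentioned above concrete, here is a minimal Python sketch of fixing negative or floating-point coordinates. It is only an illustration of the kind of ad-hoc script the paragraph refers to, not the implementation used by any OCR-D processor. The helper names `fix_points` and `fix_coords` are hypothetical, and the sketch assumes the 2019-07-15 PAGE namespace and only touches `Coords/@points`.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the 2019-07-15 PAGE schema namespace.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def fix_points(points: str) -> str:
    """Round floating-point coordinates and clamp negative ones to 0."""
    fixed = []
    for pair in points.split():
        x, y = pair.split(",")
        fixed.append(f"{max(0, round(float(x)))},{max(0, round(float(y)))}")
    return " ".join(fixed)


def fix_coords(page_xml: str) -> str:
    """Apply fix_points to every Coords element of a PAGE-XML string."""
    root = ET.fromstring(page_xml)
    for coords in root.iter(f"{{{PAGE_NS}}}Coords"):
        coords.set("points", fix_points(coords.get("points", "")))
    return ET.tostring(root, encoding="unicode")
```

A script like this works on a single file at a time. Wrapping the same logic in a processor call makes it a reproducible step of the workflow.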
Processor | Parameter | Remarks | Call |
---|---|---|---|
ocrd-page-transform | -P xsl page-remove-regions.xsl -P xslt-params "-s type=ImageRegion" | Many useful XSLTs come as preinstalled resources, but any XSL file can be passed. Specify mimetype if the output is no longer PAGE-XML. | ocrd-page-transform |
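
For readers who want to see what removing a region type amounts to, the following Python sketch drops all regions of a given element name from a PAGE-XML string, together with reading-order entries whose `regionRef` would otherwise be left dangling. It is an independent illustration and not a port of `page-remove-regions.xsl`, whose actual behavior may differ. The function name `remove_regions` is hypothetical, and the 2019-07-15 PAGE namespace is assumed.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the 2019-07-15 PAGE schema namespace.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def remove_regions(page_xml: str, region_type: str = "ImageRegion") -> str:
    """Remove all regions of one type, plus reading-order refs to them."""
    root = ET.fromstring(page_xml)
    tag = f"{{{PAGE_NS}}}{region_type}"
    removed_ids = set()
    # Snapshot the tree first, because elements are removed while looping.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                # Remember the ids of the region and anything nested in it.
                removed_ids.update(
                    el.get("id") for el in child.iter()
                    if el.get("id") is not None
                )
                parent.remove(child)
    # Drop entries whose regionRef now points at a removed region.
    for parent in list(root.iter()):
        for child in list(parent):
            ref = child.get("regionRef")
            if ref is not None and ref in removed_ids:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

The sketch does not remove a `ReadingOrder` that ends up empty. That cleanup is one of the separate repairs mentioned in the introduction.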