Dswt Reader

Description

A text reader for scenes with dynamic text using docTR's ocr_predictor as the OCR module.

Input

The wrapper takes a VideoDocument with SWT TimeFrame annotations. Specifically, for each TimeFrame that the swt app classified as credits, it uses the timePoint property of the first and last timepoints among that TimeFrame's targets.

This app relies entirely on the swt app's classification of each scene with text as credits or not, and assumes that classification is correct.
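
The lookup described above can be sketched in plain Python. This is a minimal illustration over simplified dictionaries, not the app's own code. The `timePoint` property and the `credits` label come from the description above, while the flat dictionary layout, the `targets` and `label` keys, the annotation ids, and the `credit_span` helper are assumptions made for the example. A real MMIF file nests annotations inside views.

```python
from typing import Dict, List, Tuple


def credit_span(time_frame: Dict, time_points: Dict[str, Dict]) -> Tuple[int, int]:
    """Return the (start, end) timePoint values for one TimeFrame.

    Assumes time_frame["targets"] lists timepoint ids in temporal order
    and that time_points maps each id to its properties.
    """
    targets: List[str] = time_frame["targets"]
    if not targets:
        raise ValueError("TimeFrame has no targets")
    first, last = targets[0], targets[-1]
    return time_points[first]["timePoint"], time_points[last]["timePoint"]


# Hypothetical, simplified annotations (times in milliseconds).
time_points = {
    "tp_1": {"timePoint": 10000},
    "tp_2": {"timePoint": 130000},
    "tp_3": {"timePoint": 250000},
}
frames = [
    {"label": "credits", "targets": ["tp_1", "tp_2", "tp_3"]},
    {"label": "slate", "targets": ["tp_1"]},
]

# Only TimeFrames labeled as credits are read.
spans = [credit_span(f, time_points) for f in frames if f["label"] == "credits"]
```

Only the first and last targets matter here, so any timepoints in between do not affect the resulting span.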

Output

For each TimeFrame classified as credits, a single TextDocument is generated and added to the MMIF as a new view. The text value of the TextDocument stores the text extracted from the dynamic credits in the best possible reading order, considering the positional arrangement of text blocks or columns within each scene.

The best reading order is usually a job title followed by the corresponding name or names.

The TextDocument is aligned to the TimeFrame.

User instruction

General user instructions for CLAMS apps are available in the CLAMS Apps documentation.
For details on the OCR model used in this app, see the docTR documentation.
The examples/gold_transcriptions/ folder contains gold annotations, formatted as follows:
- Each job title and its corresponding names are listed in one of the following layouts, chosen to match the placement of the text within the scene. Job title-names pairs are separated by two newlines (\n\n):
  - <Job title> <name> or
  - <Job title>\n<names> or
  - <Job title> <name>\n<names>
- Logo Part: Logos are annotated using <Logo> or <Logos>.
- Other texts are transcribed based on their placement within the scene.
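
As a rough illustration of the gold format above, the following sketch splits a transcription into entries on the two-newline separator. The `split_gold` helper and the sample text (including the names in it) are hypothetical and are not taken from the repository's gold files. When a job title and a name share one line, they cannot be separated without extra knowledge, so the whole first line is kept as-is.

```python
from typing import List, Tuple

LOGO_TAGS = {"<Logo>", "<Logos>"}


def split_gold(text: str) -> List[Tuple[str, List[str]]]:
    """Split a gold transcription into (first line, remaining lines) entries."""
    entries = []
    # Entries are separated by two newlines.
    for block in text.strip().split("\n\n"):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if lines:
            entries.append((lines[0], lines[1:]))
    return entries


# Hypothetical sample in the documented layout.
sample = "Director\nJane Doe\n\nProducers\nJohn Roe\nMary Major\n\n<Logo>"
entries = split_gold(sample)

# Entries whose first line is a logo tag carry no names.
logo_entries = [head for head, _ in entries if head in LOGO_TAGS]
```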
Start and end timePoint (in ms) of the annotated TimeFrames in each example video:
- sample_video_1.mp4 <start>: 0, <end>: 282000
- sample_video_2.mp4 <start>: 10000, <end>: 250000
- sample_video_3.mp4 <start>: 52000, <end>: 295000
When running this app, the user can decide whether to apply an algorithm that identifies scenes (timepoints) with multiple columns of text and reorders the output text considering the positional arrangement of these text blocks or columns within each scene. This option is controlled by a boolean parameter called multiColumn, which defaults to True.
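
To make the idea of column-aware reordering concrete, here is a minimal sketch of one possible approach: group text blocks into columns by horizontal position, then read each column from top to bottom. This is not the algorithm the app implements. The `Block` type, the `column_order` function, the `gap` threshold, and the sample coordinates are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    text: str
    x: float  # left edge, normalized to [0, 1]
    y: float  # top edge, normalized to [0, 1]


def column_order(blocks: List[Block], gap: float = 0.2) -> List[str]:
    """Order blocks column by column, top to bottom within each column."""
    columns: List[List[Block]] = []
    for block in sorted(blocks, key=lambda b: b.x):
        # Stay in the current column unless the horizontal jump exceeds `gap`.
        if columns and block.x - columns[-1][-1].x <= gap:
            columns[-1].append(block)
        else:
            columns.append([block])
    return [b.text for col in columns for b in sorted(col, key=lambda b: b.y)]


# Hypothetical two-column scene, listed in arbitrary detection order.
blocks = [
    Block("Jane Doe", 0.10, 0.30),
    Block("Editor", 0.60, 0.10),
    Block("Director", 0.10, 0.10),
    Block("John Roe", 0.62, 0.30),
]
ordered = column_order(blocks)
```

A fixed gap threshold is the simplest choice and would likely need tuning for real footage. Credits that pair a title in one column with a name in the next would instead call for row-wise reading.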

System requirements

Requires mmif-python[cv] for the VideoDocument helper functions.
Requires a GPU to run the docTR model at a reasonable speed.
Please refer to the requirements.txt for the required libraries and their version information.

Configurable runtime parameter

For the full list of parameters, please refer to the app metadata from the CLAMS App Directory or the metadata.py file in this repository.
