Skip to content

clamsproject/app-dswt-reader

Repository files navigation

Dswt Reader

Description

A text reader for scenes with dynamic text using docTR's ocr_predictor as the OCR module.

Input

The wrapper takes a VideoDocument with SWT TimeFrame annotations. Specifically, it uses the property timePoint of the first and last timepoints of the target in the TimeFrame classified as credits by the swt app.

The classification of whether each scene with text is credits or not is assumed to be perfectly handled by the swt-app.

Output

For each TimeFrame classified as credits, a single TextDocument is generated and added to the MMIF as a new view. The text value of the TextDocument stores the text extracted from the dynamic credits in the best possible reading order, considering the positional arrangement of text blocks or columns within each scene.

The best reading order is usually with job titles followed by the corresponding name (or names).

The TextDocument is aligned to the TimeFrame.

User instruction

  • General user instructions for CLAMS apps are available at CLAMS Apps documentation.

  • The documentation for docTR, the OCR model used in this app.

  • The examples/gold_transcriptions/ folder contains gold annotations, formatted as follows:

    • Each Job Title and corresponding Names are listed as follows, based on the placement of the text within the scenes. Each Job title-Names pair is separated by two newlines (\n\n):

      • <Job title> <name> or
      • <Job title>\n<names> or
      • <Job title> <name>\n<names>
    • Logo Part: Logos are annotated using <Logo> or <Logos>.

    • Other texts are transcribed based on their placement within the scene.

  • Start and end timePoint (in ms) of annotated TimeFrames in each example video

    • sample_video_1.mp4 <start>: 0, <end>: 282000
    • sample_video_2.mp4 <start>: 10000, <end>: 250000
    • sample_video_3.mp4 <start>: 52000, <end>: 295000
  • When running this app, the user can decide whether to apply an algorithm that identifies scenes (timepoints) with multiple columns of text and reorders the output text considering the positional arrangement of these text blocks or columns within each scene. This option is controlled by a boolean parameter called multiColumn, which defaults to True.

System requirements

  • Requires mmif-python[cv] for the VideoDocument helper functions
  • Requires GPU to run docTR model at a reasonable speed
  • Please refer to the requirements.txt for the required libraries and their version information.

Configurable runtime parameter

For the full list of parameters, please refer to the app metadata from the CLAMS App Directory or the metadata.py file in this repository.

About

Text reader app for dynamic scenes with text

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published