The Standard Transcription JSON (STJ) format is designed to be flexible and comprehensive. Following best practices ensures consistency, interoperability, and ease of use across different applications and services.
- Mandatory Fields: Always include all mandatory fields as specified in the STJ specification.
- Optional Fields: Use optional fields where appropriate to enhance the richness of the data.
- Timestamps: Represent all time fields (`start`, `end`) in seconds as floating-point numbers.
- Language Codes: Use standard ISO 639-1 or ISO 639-3 language codes.
- Creation Timestamp: Use the ISO 8601 format for the `created_at` field (e.g., `"2023-10-20T12:00:00Z"`).
- Field Names: Use lowercase letters with underscores (`_`) for multi-word field names.
- Identifiers: Use unique strings for `id` fields in `speakers` and `styles`.
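As a minimal sketch of the timestamp and naming conventions above, the helpers below parse a `created_at` value and test whether a field name uses lowercase letters with underscores. The function names `parse_created_at` and `is_snake_case` are illustrative and are not part of the STJ specification.

```python
import re
from datetime import datetime

# Lowercase words separated by single underscores, e.g. "created_at".
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def parse_created_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2023-10-20T12:00:00Z"."""
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def is_snake_case(name: str) -> bool:
    """Return True if name uses only lowercase letters, digits, and single underscores."""
    return _SNAKE_CASE.fullmatch(name) is not None
```

Parsing the timestamp on write, rather than only on read, is one way to catch a malformed `created_at` before the file reaches other tools.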
- Source Languages (`metadata.source.languages`):
  - Represent the languages present in the original media.
  - Useful for understanding the content and planning transcription or translation processes.
- Transcription Languages (`metadata.languages`):
  - Represent the languages included in the transcription.
  - May differ from source languages if translations are included.
- Specify Language per Segment:
  - Use the `language` field in each segment to indicate the language of the transcribed text.
  - Crucial for multilingual transcriptions and translations.
- Transcription with Translation:
  - Source media contains English and Spanish.
  - Transcription includes translations into French and German.
  - `metadata.source.languages`: `["en", "es"]`
  - `metadata.languages`: `["fr", "de"]`
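The translation scenario above can be written out as data, together with one consistency check an application might choose to run: every segment `language` should appear in `metadata.languages`. This is a sketch under assumptions. The helper `undeclared_segment_languages` is hypothetical, and the specification text here does not state this check as a requirement.

```python
def undeclared_segment_languages(metadata: dict, segments: list) -> set:
    """Return segment language codes that are missing from metadata["languages"]."""
    declared = set(metadata.get("languages", []))
    used = {seg["language"] for seg in segments if "language" in seg}
    return used - declared


metadata = {
    "source": {"languages": ["en", "es"]},  # languages present in the media
    "languages": ["fr", "de"],              # languages in the transcription
}
segments = [
    {"start": 0.0, "end": 2.5, "text": "Bonjour", "language": "fr"},
    {"start": 2.5, "end": 5.0, "text": "Guten Tag", "language": "de"},
]

# An empty set means every segment language is declared in the metadata.
missing = undeclared_segment_languages(metadata, segments)
```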
- Non-Overlapping Segments: Ensure that segments do not overlap in time.
- Sequential Order: Segments should be ordered sequentially based on their `start` times.
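The two segment rules above can be checked in a single pass. The sketch below assumes each segment is a dictionary with numeric `start` and `end` values, and it treats a segment that begins exactly where the previous one ends as non-overlapping. The function name and the problem labels are illustrative.

```python
def find_segment_problems(segments: list) -> list:
    """Return (index, problem) pairs for out-of-order or overlapping segments."""
    problems = []
    previous_start = float("-inf")
    latest_end = float("-inf")
    for index, seg in enumerate(segments):
        if seg["start"] < previous_start:
            # Starts before the segment that precedes it in the list.
            problems.append((index, "out_of_order"))
        elif seg["start"] < latest_end:
            # Starts before an earlier segment has finished.
            problems.append((index, "overlap"))
        previous_start = seg["start"]
        latest_end = max(latest_end, seg["end"])
    return problems
```

An empty result means the list is sorted by `start` and no segment begins before an earlier one has ended.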
- Consistent Speaker IDs: Use consistent `speaker_id` values throughout the transcript.
- Unknown Speakers: If the speaker is unknown, omit the `speaker_id` field or use a placeholder like `"speaker_id": "Unknown"`.
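One way to catch inconsistent speaker references is to compare the `speaker_id` values used in segments with the `id` values defined in `speakers`. The sketch below assumes `speakers` is a list of objects that each carry an `id`. The helper name is hypothetical, and segments that omit `speaker_id` are skipped.

```python
def undefined_speaker_ids(speakers: list, segments: list) -> set:
    """Return speaker_id values used in segments but not defined in speakers."""
    defined = {speaker["id"] for speaker in speakers}
    referenced = {seg["speaker_id"] for seg in segments if "speaker_id" in seg}
    return referenced - defined
```

Under this check, a placeholder such as `"Unknown"` is reported unless it is also defined in `speakers`. Whether to define it there is left to the application.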
- Per-Segment Language: Specify the `language` field for each segment in multilingual transcripts.
- Default Language: If the entire transcript is in a single language, specifying `language` in each segment is optional.
- Detailed Word Data: Include the `words` array in segments when word-level timing or confidence is needed.
- Minimal Segments: If word-level details are not required, omit the `words` array to reduce file size.
- Complete Word Data:
  - When `word_timing_mode` is `"complete"`, ensure that all words in the `text` field are represented in the `words` array.
- Partial Word Data:
  - When `word_timing_mode` is `"partial"`, include as many words as have timing data, and ensure they are in the correct order.
- No Word Data:
  - When there is no word-level timing data, you may omit the `words` array or set `word_timing_mode` to `"none"`.
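The three word timing modes can be illustrated with a small check. This is a sketch under several assumptions: each entry in `words` has a `text` key, the segment `text` is tokenized on whitespace, and the mode is passed in as an argument rather than read from a particular location in the file. The function `check_words` is hypothetical.

```python
def check_words(segment: dict, mode: str) -> bool:
    """Check a segment's words array against its text for a word timing mode."""
    tokens = segment["text"].split()
    words = [word["text"] for word in segment.get("words", [])]
    if mode == "complete":
        # Every token of the text must appear, in order, in the words array.
        return words == tokens
    if mode == "partial":
        # The words must form an in-order subsequence of the tokens.
        remaining = iter(tokens)
        return all(word in remaining for word in words)
    if mode == "none":
        # No word-level data is expected.
        return not words
    raise ValueError(f"unknown word timing mode: {mode!r}")
```

Real transcripts may need a tokenizer that handles punctuation and languages without whitespace word boundaries, so treat the whitespace split as a placeholder.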
- Use of Styles: Define styles in the `styles` array and reference them in segments using `style_id`.
- Default Styles: If no styling is needed, omit the `styles` section.
- Including Confidence: Provide `confidence` scores for segments and words when available.
- Interpreting Confidence: Use the `confidence_threshold` from metadata to determine if segments need review.
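Applying the threshold can be as simple as the sketch below. It assumes `confidence_threshold` sits directly under `metadata` and that segments without a `confidence` value are not flagged. Both are assumptions made for illustration, as is the function name.

```python
def segments_needing_review(metadata: dict, segments: list) -> list:
    """Return indices of segments whose confidence is below the threshold."""
    threshold = metadata.get("confidence_threshold")
    if threshold is None:
        return []  # No threshold declared, so nothing is flagged.
    return [
        index
        for index, seg in enumerate(segments)
        if "confidence" in seg and seg["confidence"] < threshold
    ]
```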
- Custom Data: Use the `additional_info` field to include application-specific data.
- Namespacing: For custom fields, use a prefix (e.g., `x_custom_field`) to avoid conflicts with future specification updates.
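A small helper can apply the prefix convention before custom data is written. The sketch below assumes the `x_` prefix from the example above. The helper name `with_prefix` and the sample keys are hypothetical.

```python
def with_prefix(fields: dict, prefix: str = "x_") -> dict:
    """Return a copy of fields in which every key starts with the prefix."""
    return {
        key if key.startswith(prefix) else prefix + key: value
        for key, value in fields.items()
    }


# Attach application-specific data to a segment under additional_info.
segment = {"start": 0.0, "end": 1.5, "text": "Hello"}
segment["additional_info"] = with_prefix({"review_status": "pending"})
```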
- Source URI: Provide the `uri` in `metadata.source` if the source media is accessible.
- Duration: Ensure `duration` matches the actual length of the media.
- ISO 8601 Format: Use the ISO 8601 format for the `created_at` field (e.g., `"2023-10-20T12:00:00Z"`).
- Schema Validation: Validate STJ files against the provided JSON schema to ensure compliance.
- Automated Testing: Incorporate validation into automated testing workflows.
- Compatibility: When converting to other formats (e.g., SRT, VTT), ensure that essential information is preserved.
- Extensions: Avoid breaking changes when extending the format for specific needs.
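To make the conversion point concrete, here is a minimal sketch that turns segments into SRT text. It assumes each segment has `start`, `end`, and `text`, and the function names are illustrative. SRT has no fields for speakers, confidence, or language, so a converter like this one drops them; decide beforehand whether that information needs to be carried some other way.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for number, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{number}\n{timing}\n{seg['text']}\n")
    return "\n".join(cues)
```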
- File Size: Be mindful of file size, especially when including detailed word-level data.
- Efficient Parsing: Structure data to facilitate efficient parsing and processing by applications.
- Data Sanitization: Sanitize any user-generated content to prevent injection attacks.
- Sensitive Information: Avoid including sensitive personal information in the `additional_info` or other fields unless necessary and secured.
By adhering to these best practices, you can ensure that your use of the STJ format is consistent, reliable, and compatible with a wide range of tools and applications.