srt

The goal of srt is to read SubRip text files as tabular data for easy analysis and manipulation.

Installation

You can install the development version of srt from GitHub with:

# install.packages("remotes")
remotes::install_github("kiernann/srt")

Example

The .srt format defines the components of each subtitle, which become the columns of a data frame:

1. A numeric counter identifying each sequential subtitle
2. The time that the subtitle should appear, followed by --> and the time it should disappear
3. The subtitle text itself, on one or more lines
4. A blank line containing no text, indicating the end of the subtitle

library(srt)
library(tidyverse)
library(tidytext)
srt <- srt_example()

#> 1
#> 00:01:25,210 --> 00:01:28,004
#> I owe everything to George Bailey.
#> 
#> 2
#> 00:01:28,422 --> 00:01:30,298
#> Help him, dear Father.
#> 
#> 3
#> 00:01:30,674 --> 00:01:33,718
#> Joseph, Jesus and Mary,
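
A raw preview like the one above can be produced with base R alone. The sketch below is only an illustration: it writes a small sample file (sample_path is a hypothetical stand-in for the path returned by srt_example()) and prints its first lines unchanged.

```r
# Hypothetical sample file standing in for the path returned by srt_example()
sample_path <- tempfile(fileext = ".srt")
writeLines(c(
  "1",
  "00:01:25,210 --> 00:01:28,004",
  "I owe everything to George Bailey.",
  "",
  "2",
  "00:01:28,422 --> 00:01:30,298",
  "Help him, dear Father.",
  ""
), sample_path)

# Read the first seven lines back and print them exactly as stored
raw_lines <- readLines(sample_path, n = 7)
writeLines(raw_lines)
```

Each block of counter, timing line, text, and blank line corresponds to one subtitle.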

Each subtitle file is parsed into a data frame with one row per subtitle and separate columns for the counter, the start and end times (in seconds), and the text.

(wonderful_life <- read_srt(path = srt, collapse = " "))
#> # A tibble: 2,268 x 4
#>        n start   end subtitle                           
#>    <int> <dbl> <dbl> <chr>                              
#>  1     1  85.2  88.0 I owe everything to George Bailey. 
#>  2     2  88.4  90.3 Help him, dear Father.             
#>  3     3  90.7  93.7 Joseph, Jesus and Mary,            
#>  4     4  93.8  96.4 help my friend Mr. Bailey.         
#>  5     5  96.9  99.5 Help my son George tonight.        
#>  6     6 100.  102.  He never thinks about himself, God.
#>  7     7 102.  104.  That's why he's in trouble.        
#>  8     8 104.  105.  George is a good guy.              
#>  9     9 106.  108.  Give him a break, God.             
#> 10    10 108.  110.  I love him, dear Lord.             
#> # … with 2,258 more rows
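
The start and end columns hold the timestamps as seconds. As a rough sketch of that conversion (srt_to_seconds is a hypothetical helper written for illustration, not a function from the package, and not necessarily how read_srt() does it), an HH:MM:SS,mmm string can be turned into seconds like this:

```r
# Hypothetical helper: convert "HH:MM:SS,mmm" timestamps to numeric seconds
srt_to_seconds <- function(x) {
  # Swap the decimal comma for a point, then split into hours, minutes, seconds
  parts <- strsplit(sub(",", ".", x, fixed = TRUE), ":", fixed = TRUE)
  vapply(
    parts,
    function(p) sum(as.numeric(p) * c(3600, 60, 1)),
    numeric(1)
  )
}

seconds <- srt_to_seconds(c("00:01:25,210", "00:01:28,004"))
print(seconds)
```

The first timestamp comes out at about 85.21 seconds, which the tibble above displays rounded as 85.2.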

Having the subtitles in a data frame makes it easy to run text analyses on them.

wonderful_life %>% 
  unnest_tokens(word, subtitle) %>% 
  count(word, sort = TRUE) %>% 
  anti_join(stop_words, by = "word")
#> # A tibble: 1,651 x 2
#>    word       n
#>    <chr>  <int>
#>  1 george   216
#>  2 mary      85
#>  3 bailey    74
#>  4 hey       56
#>  5 harry     53
#>  6 yeah      50
#>  7 gonna     45
#>  8 potter    45
#>  9 home      34
#> 10 money     34
#> # … with 1,641 more rows

You can also shift every numeric timestamp by the same offset:

wonderful_life <- srt_shift(wonderful_life, seconds = 9.99)
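
Conceptually, a uniform shift adds the same number of seconds to both time columns. The base R sketch below illustrates the idea on a two-row data frame built by hand; the subs and offset names are made up for the example, and this is not a claim about how srt_shift() is implemented.

```r
# Small hand-built stand-in for the parsed subtitle data frame
subs <- data.frame(
  n = 1:2,
  start = c(85.210, 88.422),
  end = c(88.004, 90.298),
  subtitle = c("I owe everything to George Bailey.", "Help him, dear Father.")
)

# Add the same offset (in seconds) to both the start and end times
offset <- 9.99
shifted <- transform(subs, start = start + offset, end = end + offset)
print(shifted)
```

Because both columns move by the same amount, the duration of each subtitle is unchanged.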

The subtitle data frame can then be written back out as a valid SubRip file.

tmp <- tempfile(fileext = ".srt")
write_srt(wonderful_life, tmp, wrap = FALSE)

#> 1
#> 00:01:35,200 --> 00:01:37,994
#> I owe everything to George Bailey.
#> 
#> 2
#> 00:01:38,412 --> 00:01:40,288
#> Help him, dear Father.
#> 
#> 3
#> 00:01:40,664 --> 00:01:43,708
#> Joseph, Jesus and Mary,
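
Writing the file means turning the numeric seconds back into HH:MM:SS,mmm strings. The sketch below shows one way to do that in base R; seconds_to_srt is a hypothetical helper for illustration only, and write_srt() may format its output differently internally.

```r
# Hypothetical helper: format numeric seconds as "HH:MM:SS,mmm"
seconds_to_srt <- function(x) {
  # Work in whole milliseconds to avoid floating-point drift
  ms <- round(x * 1000)
  hours <- ms %/% 3600000
  minutes <- (ms %% 3600000) %/% 60000
  secs <- (ms %% 60000) %/% 1000
  millis <- ms %% 1000
  sprintf(
    "%02d:%02d:%02d,%03d",
    as.integer(hours), as.integer(minutes), as.integer(secs), as.integer(millis)
  )
}

stamps <- seconds_to_srt(c(95.200, 97.994))
print(paste(stamps[1], "-->", stamps[2]))
```

Applied to the shifted times of the first subtitle, this reproduces the timing line shown in the output above.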
