-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
92 lines (68 loc) · 3.99 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
---
output:
bookdown::github_document2:
html_preview: false
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# youtubecaption
<!-- badges: start -->
[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](http://www.gnu.org/licenses/gpl-3.0)
[![CRAN status](https://www.r-pkg.org/badges/version/youtubecaption)](https://cran.r-project.org/package=youtubecaption)
[![Total Downloads](https://cranlogs.r-pkg.org/badges/grand-total/youtubecaption?color=orange)](https://cranlogs.r-pkg.org/badges/grand-total/youtubecaption)
[![Travis build status](https://travis-ci.org/jooyoungseo/youtubecaption.svg?branch=master)](https://travis-ci.org/jooyoungseo/youtubecaption)
[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/jooyoungseo/youtubecaption?branch=master&svg=true)](https://ci.appveyor.com/project/jooyoungseo/youtubecaption)
[![Codecov test coverage](https://codecov.io/gh/jooyoungseo/youtubecaption/branch/master/graph/badge.svg)](https://codecov.io/gh/jooyoungseo/youtubecaption?branch=master)
<!-- badges: end -->
## Motivation
Although there exist some R packages tailored for YouTube API (e.g., 'tuber'), downloading YouTube video subtitle (i.e., caption) in a tidy form has never been a low-hanging fruit. Using 'youtube-transcript-api' Python package under the hood, this R package provides users with a convenient way of parsing and converting a desired YouTube caption into a handy tibble data_frame object. Furthermore, users can easily save a desired YouTube caption data as a tidy Excel file without advanced programming background knowledge.
## Installation
### Python Dependencies
`youtubecaption` requires Anaconda Python environment on your system Path.
If you have not installed Conda environment on your system, please [download and install Anaconda](https://www.anaconda.com/download/) (Python 3.6 or later is recommended).
For this package, I have employed [**youtube-transcript-api**](https://pypi.org/project/youtube-transcript-api/) Python module into R using [**reticulate**](https://rstudio.github.io/reticulate/).
### R Package Installation
### Development Version
You can install the latest development version as follows:
```{r, eval=FALSE}
if(!require(remotes)) {
install.packages("remotes")
}
remotes::install_github("jooyoungseo/youtubecaption")
```
### Stable Version
You can install the released version of youtubecaption from [CRAN](https://CRAN.R-project.org) with:
```{r, eval=FALSE}
install.packages('youtubecaption')
```
## Usage
Please use `get_caption()` function after loading `youtubecaption` package like below:
```{r test, eval=FALSE}
library(youtubecaption)
# Let's get the video caption out of Hadley Wickham's "You can't do data science in a GUI":
url <- "https://www.youtube.com/watch?v=cpbtcsGE0OA"
caption <- get_caption(url)
caption
#> # A tibble: 1,420 x 5
#> segment_id text start duration vid
#> <int> <chr> <dbl> <dbl> <chr>
#> 1 1 thank you for coming to a meeting ~ 7.13 8.32 cpbtcsGE0~
#> 2 2 in regards to data science GUI with 10.7 8.44 cpbtcsGE0~
#> 3 3 happy with chief data scientist in~ 15.4 7.11 cpbtcsGE0~
#> 4 4 studio as well as the member of th~ 19.1 7.23 cpbtcsGE0~
#> 5 5 Foundation and an attempt professo~ 22.6 6 cpbtcsGE0~
#> 6 6 Stanford and at the University of 26.4 6.48 cpbtcsGE0~
#> 7 7 Auckland he builds both computatio~ 28.6 7.17 cpbtcsGE0~
#> 8 8 and cognitive tools to make data s~ 32.8 7.5 cpbtcsGE0~
#> 9 9 easier faster and more times his w~ 35.7 7.01 cpbtcsGE0~
#> 10 10 includes various packages as well ~ 40.4 6.21 cpbtcsGE0~
#> # ... with 1,410 more rows
# Save the caption as an Excel file and open it right away:
get_caption(url = url, savexl = TRUE, openxl = TRUE)
```