---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# stoRy
<!-- badges: start -->
[![R-CMD-check](https://github.com/theme-ontology/stoRy/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/theme-ontology/stoRy/actions/workflows/R-CMD-check.yaml)
[![Codecov test
coverage](https://codecov.io/gh/theme-ontology/stoRy/branch/master/graph/badge.svg)](https://codecov.io/gh/theme-ontology/stoRy?branch=master)
[![Life
cycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html)
[![License: GPL
v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](http://www.gnu.org/licenses/gpl-3.0)
<!-- badges: end -->
stoRy is a [Tidyverse](https://www.tidyverse.org) friendly package for
downloading, exploring, and analyzing
[Literary Theme Ontology](https://themeontology.org/) (LTO) data in **R**.
## Installation
``` r
# Install the released version of stoRy from CRAN with:
install.packages("stoRy")
# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("theme-ontology/stoRy")
```
## Using stoRy
The easiest way to get started with stoRy is to use the LTO demo data. It
consists of the themes and the 335 thematically annotated stories of the
American media franchise
[The Twilight Zone](https://en.wikipedia.org/wiki/The_Twilight_Zone), taken
from the latest [LTO version](https://themeontology.org/pub/data/).
Begin by loading the stoRy package:
``` r
library(stoRy)
```
### Exploring the Demo Data
The LTO demo version is loaded by default:
``` r
which_lto()
```
Get a feel for the demo data by printing some basic information about it to
the console:
``` r
print_lto(version = "demo")
```
See the demo data help page for a more in-depth description:
``` r
?`lto-demo`
```
#### Exploring the Demo Stories
Thematically annotated stories are initialized by *story ID*. For example, run
``` r
story <- Story$new(story_id = "tz1959e1x22")
```
to initialize a `Story` object representing the classic *The Twilight Zone*
(1959) television series episode *The Monsters Are Due on Maple Street*.
The story's thematic annotations, together with its identifying metadata, are
printed to the console in either the default format or the standard `.st.txt`
format:
``` r
story
story$print(canonical = TRUE)
```
There are two complementary ways to find story IDs. First, the LTO website
[story search box](https://themeontology.org/stories) offers a quick way to
locate story IDs of interest in the LTO development version. Since story IDs
are stable, development version *The Twilight Zone* story IDs can be expected
to agree with their demo data counterparts. Alternatively, a demo data story
ID can be obtained directly from an episode title as follows:
``` r
# install.packages("dplyr")
library(dplyr)
title <- "The Monsters Are Due on Maple Street"
demo_stories_tbl <- clone_active_stories_tbl()
story_id <- demo_stories_tbl %>% filter(title == !!title) %>% pull(story_id)
story_id
```
The `dplyr` package is required to run the `%>%` pipeline.
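The `!!` in the pipeline above matters because the variable `title` shares its name with the `title` column. Inside `filter()`, a bare `title` refers to the column, so `!!title` (or `.env$title`) is needed to reach the variable defined outside the data frame. The following is a minimal sketch on a toy data frame, not on the real demo data; the second row is a made-up placeholder:

``` r
library(dplyr)

# Toy stand-in for the stories tibble; the second row is a hypothetical placeholder.
toy_stories_tbl <- data.frame(
  story_id = c("tz1959e1x22", "hypothetical_id"),
  title = c("The Monsters Are Due on Maple Street", "A Placeholder Title"),
  stringsAsFactors = FALSE
)
title <- "The Monsters Are Due on Maple Street"

# Without `!!`, both sides refer to the column, so every row is kept.
n_unquoted <- toy_stories_tbl %>% filter(title == title) %>% nrow()

# `!!title` and `.env$title` both refer to the variable outside the data frame.
id_bang <- toy_stories_tbl %>% filter(title == !!title) %>% pull(story_id)
id_env <- toy_stories_tbl %>% filter(title == .env$title) %>% pull(story_id)
```

Naming the variable differently from the column (for example `episode_title`, a hypothetical name) avoids the collision altogether.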
A tibble of thematic annotations is obtained by running:
``` r
themes <- story$themes()
themes
```
#### Exploring the Demo Themes
*The Monsters Are Due on Maple Street* is a story about how
[mass hysteria](https://themeontology.org/theme.php?name=mass%20hysteria)
can transform otherwise normal people into an angry mob.
To view the *mass hysteria* theme entry, initialize a `Theme` object with
`theme_name` argument defined accordingly:
``` r
theme <- Theme$new(theme_name = "mass hysteria")
theme
theme$print(canonical = TRUE)
```
To view a tibble of all demo data stories featuring *mass hysteria*, run:
``` r
theme$annotations()
```
As with story IDs, there are two ways to look for themes of interest.
Development version themes are searchable from the LTO website
[theme search box](https://themeontology.org/themes). Demo version themes can
be explored in tibble format. For example, here is one way to search for *mass
hysteria* directly in the demo themes:
``` r
# install.packages("stringr")
library(stringr)
demo_themes_tbl <- clone_active_themes_tbl()
demo_themes_tbl %>% filter(str_detect(theme_name, "mass"))
```
Notice that all themes containing the substring `"mass"` are returned.
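If a substring match turns out to be too broad, a regular expression with word boundaries narrows it. The sketch below runs on a toy character vector rather than the demo themes; only *mass hysteria* comes from the text above, and the other two names are illustrative placeholders that may not be LTO themes:

``` r
library(stringr)

# Illustrative names; only "mass hysteria" is taken from the text above.
toy_theme_names <- c("mass hysteria", "massacre", "biomass")

# Substring match: every name containing "mass" is flagged.
substring_hits <- str_detect(toy_theme_names, "mass")

# Whole-word match: "\\b" marks a word boundary on each side of "mass".
word_hits <- str_detect(toy_theme_names, "\\bmass\\b")

toy_theme_names[word_hits]
```

The same pattern can be passed to `str_detect()` inside the `filter()` call shown above.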
#### Exploring the Demo Collections
Each story belongs to at least one collection (i.e. a set of related stories).
*The Monsters Are Due on Maple Street*, for instance, belongs to two
collections:
``` r
story$collections()
```
To initialize a `Collection` object for *The Twilight Zone* (1959) television
series, of which *The Monsters Are Due on Maple Street* is an episode, run:
``` r
collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
```
Collection info is printed to console in the same way as with stories and
themes:
``` r
collection
collection$print(canonical = TRUE)
```
In general, development version collections can be explored from the LTO
website [story search box](https://themeontology.org/stories), while demo
version collections can be explored through the package in the usual way:
``` r
demo_collections_tbl <- clone_active_collections_tbl()
demo_collections_tbl
```
### Analyzing the Demo Data
The LTO thematically annotated story data can be analyzed in various ways.
#### Topmost Featured Themes
To view the top 10 most featured themes in *The Twilight Zone* (1959) series,
run:
``` r
collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
result_tbl <- get_featured_themes(collection)
result_tbl
```
To view the top 10 most featured themes in the demo data as a whole, run:
``` r
result_tbl <- get_featured_themes()
result_tbl
```
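As a rough illustration of what ranking themes by how often they are featured can look like, the base R sketch below tallies theme occurrences in a toy annotation table. The story IDs and theme assignments are made up, and the counting scheme is an assumption for illustration rather than a description of how `get_featured_themes()` computes its result:

``` r
# Toy annotations in long format: one row per (story, theme) pair.
# Story IDs and theme assignments are hypothetical.
toy_annotations <- data.frame(
  story_id = c("s1", "s1", "s2", "s2", "s3"),
  theme_name = c("mass hysteria", "paranoia", "mass hysteria",
                 "space alien", "mass hysteria"),
  stringsAsFactors = FALSE
)

# Count how many stories feature each theme, then sort from most to least frequent.
theme_counts <- sort(table(toy_annotations$theme_name), decreasing = TRUE)
top_theme <- names(theme_counts)[1]
theme_counts
```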
#### Topmost Enriched Themes
To view the top 10 most enriched, or over-represented, themes in *The Twilight
Zone* (1959) series, with all *The Twilight Zone* stories as background, run:
``` r
test_collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
result_tbl <- get_enriched_themes(test_collection)
result_tbl
```
To run the same analysis without counting *minor* level themes, run:
``` r
result_tbl <- get_enriched_themes(test_collection, weights = list(choice = 1, major = 1, minor = 0))
result_tbl
```
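Over-representation of a theme in a test collection relative to a background is commonly scored with a hypergeometric test. The base R sketch below illustrates that general idea with made-up counts; it is not taken from the stoRy source, and all four numbers are hypothetical:

``` r
# Hypothetical counts, for illustration only.
N <- 100  # stories in the background
K <- 10   # background stories featuring the theme
n <- 20   # stories in the test collection
k <- 6    # test collection stories featuring the theme

# Probability of seeing k or more themed stories in the test collection
# if its n stories were drawn at random from the background.
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same tail probability, summed term by term as a cross-check.
p_check <- sum(dhyper(k:min(K, n), K, N - K, n))
p_value
```

Here the theme would be expected in about `n * K / N = 2` test stories by chance, so observing 6 yields a small p-value.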
#### Topmost Similar Stories
To view the top 10 *The Twilight Zone* franchise stories most thematically
similar to *The Monsters Are Due on Maple Street*, run:
``` r
query_story <- Story$new(story_id = "tz1959e1x22")
result_tbl <- get_similar_stories(query_story)
result_tbl
```
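One common way to score thematic similarity is the cosine similarity between story-by-theme indicator vectors. The sketch below shows that idea on a toy matrix in base R. The matrix, the row names, and the choice of cosine similarity are assumptions for illustration; this README does not state which metric `get_similar_stories()` uses:

``` r
# Toy story-by-theme indicator matrix: rows are stories, columns are themes.
# A 1 means the story features the theme. All values are made up.
toy_matrix <- rbind(
  query   = c(1, 1, 0, 0),
  story_a = c(1, 1, 1, 0),
  story_b = c(0, 0, 1, 1)
)

# Cosine similarity between two numeric vectors.
cosine_similarity <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Score every candidate story against the query, then rank from most to least similar.
candidates <- toy_matrix[-1, , drop = FALSE]
scores <- apply(candidates, 1, cosine_similarity, y = toy_matrix["query", ])
ranked <- sort(scores, decreasing = TRUE)
ranked
```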
#### Similar Story Clusters
Cluster *The Twilight Zone* franchise stories according to thematic
similarity:
``` r
library(dplyr)
set.seed(123)
result_tbl <- get_story_clusters()
result_tbl
```
The command `set.seed(123)` is run here for the purpose of reproducibility.
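The effect of seeding can be checked in isolation. In this small base R sketch, resetting the seed before a random draw reproduces the draw exactly:

``` r
# Seeding the random number generator makes a random draw repeatable.
set.seed(123)
first_draw <- sample(1:100, 5)

# Resetting the same seed yields the same draw again.
set.seed(123)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)
```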
Explore a cluster of stories related to executions:
``` r
cluster_id <- 8
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
```
Explore a cluster of stories related to old people wanting to be young:
``` r
cluster_id <- 10
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
```
Explore a cluster of stories related to space aliens:
``` r
cluster_id <- 13
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
```
Explore a cluster of stories related to wish making:
``` r
cluster_id <- 15
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
```
## Downloading Data
The package works with data from these LTO versions:
``` r
lto_version_statuses()
```
To download and cache the latest versioned LTO release, run:
``` r
configure_lto(version = "latest")
```
This can take a while.
Load the newly configured LTO version as the active version in the R session:
``` r
set_lto(version = "latest")
```
To double-check that it has been loaded successfully, run:
``` r
which_lto()
```
Now that the latest LTO version is loaded into the R session, its stories and
themes can be analyzed in the same way as the demo data shown above.
## Getting Help
If you encounter a bug, please file a minimal reproducible example on
[GitHub issues](https://github.com/theme-ontology/stoRy/issues). For questions
and other discussion, please post on the
[GitHub discussions board](https://github.com/theme-ontology/stoRy/discussions/).
## License
All code in this repository is published with the [GPL v3](./LICENSE) license.