stoRy is a Tidyverse-friendly package for downloading, exploring, and analyzing Literary Theme Ontology (LTO) data in R.
# Install the released version of stoRy from CRAN with:
install.packages("stoRy")
# Or the developmental version from GitHub:
# install.packages("devtools")
devtools::install_github("theme-ontology/stoRy")
The easiest way to get started with stoRy is to use the LTO demo version data. It consists of the themes and 335 thematically annotated stories from the American media franchise The Twilight Zone, taken from the latest LTO version.
Begin by loading the stoRy package:
library(stoRy)
The LTO demo version is loaded by default:
which_lto()
Get a feel for the demo data by printing some basic information about it to the console:
print_lto()
See the demo data help page for a more in-depth description:
?`lto-demo`
Thematically annotated stories are initialized by story ID. For example, run
story <- Story$new(story_id = "tz1959e1x22")
to initialize a Story object representing the classic The Twilight Zone (1959) television series episode The Monsters Are Due on Maple Street.
Story thematic annotations, along with episode-identifying metadata, are printed to the console in either the default or the standard .st.txt format:
story
story$print(canonical = TRUE)
There are two complementary ways to find story IDs. First, the LTO website story search box offers a quick way to locate LTO developmental version story IDs of interest. Since story IDs are stable, developmental version The Twilight Zone story IDs can be expected to agree with their demo data counterparts. Alternatively, a demo data story ID can be obtained directly from an episode title as follows:
# install.packages("dplyr")
library(dplyr)
title <- "The Monsters Are Due on Maple Street"
demo_stories_tbl <- clone_active_stories_tbl()
story_id <- demo_stories_tbl %>% filter(title == !!title) %>% pull(story_id)
story_id
The dplyr package is required to run the %>%-mediated pipeline.
A tibble of thematic annotations is obtained by running:
themes <- story$themes()
themes
The Monsters Are Due on Maple Street is a story about how mass hysteria can transform otherwise normal people into an angry mob. To view the mass hysteria theme entry, initialize a Theme object with the theme_name argument defined accordingly:
theme <- Theme$new(theme_name = "mass hysteria")
theme
theme$print(canonical = TRUE)
To view a tibble of all demo data stories featuring mass hysteria, run:
theme$annotations()
As with story IDs, there are two ways to look for themes of interest. Developmental version themes are searchable from the LTO website theme search box. Demo version themes are explorable in tibble format. For example, here is one way to search for mass hysteria directly in the demo themes:
# install.packages("stringr")
library(stringr)
demo_themes_tbl <- clone_active_themes_tbl()
demo_themes_tbl %>% filter(str_detect(theme_name, "mass"))
Notice that all themes containing the substring "mass" are returned.
Each story belongs to at least one collection (i.e., a set of related stories). The Monsters Are Due on Maple Street, for instance, belongs to two collections:
story$collections()
To initialize a Collection object for The Twilight Zone (1959) television series, of which The Monsters Are Due on Maple Street is an episode, run:
collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
Collection info is printed to the console in the same way as with stories and themes:
collection
collection$print(canonical = TRUE)
In general, developmental version collections can be explored from the LTO website story search box or through the package in the usual way:
demo_collections_tbl <- clone_active_collections_tbl()
demo_collections_tbl
The LTO thematically annotated story data can be analyzed in various ways.
To view the top 10 most featured themes in The Twilight Zone (1959) series, run:
collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
result_tbl <- get_featured_themes(collection)
result_tbl
To view the top 10 most featured themes in the demo data as a whole, run:
result_tbl <- get_featured_themes()
result_tbl
To view the top 10 most enriched, or over-represented, themes in The Twilight Zone (1959) series, with all The Twilight Zone stories as background, run:
test_collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
result_tbl <- get_enriched_themes(test_collection)
result_tbl
To run the same analysis without counting minor-level themes, run:
result_tbl <- get_enriched_themes(test_collection, weights = list(choice = 1, major = 1, minor = 0))
result_tbl
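To make the idea of over-representation concrete, here is a minimal base R sketch that scores a single theme with a hypergeometric test. It is an illustration on invented counts, not a description of how get_enriched_themes() is implemented; the function name enrichment_p_value and all four count arguments are hypothetical, and theme weights are not modeled.

```r
# Hypothetical helper: probability of seeing at least `k_test` stories with a
# given theme in a test collection of `n_test` stories, when `k_background`
# of the `n_background` background stories feature that theme.
enrichment_p_value <- function(k_test, n_test, k_background, n_background) {
  # phyper(q, m, n, k, lower.tail = FALSE) returns P(X > q), so pass
  # k_test - 1 to obtain P(X >= k_test).
  phyper(
    q = k_test - 1,
    m = k_background,                 # background stories with the theme
    n = n_background - k_background,  # background stories without the theme
    k = n_test,                       # stories drawn into the test collection
    lower.tail = FALSE
  )
}

# Toy counts (invented for illustration, not taken from the LTO):
# 10 of 100 background stories feature the theme; 6 of them fall in a
# test collection of 20 stories.
p_value <- enrichment_p_value(
  k_test = 6, n_test = 20, k_background = 10, n_background = 100
)
p_value
```

A small p-value indicates that the theme occurs in the test collection more often than chance alone would suggest, which is the intuition behind ranking themes by enrichment.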
To view the 10 The Twilight Zone franchise stories most thematically similar to The Monsters Are Due on Maple Street, run:
query_story <- Story$new(story_id = "tz1959e1x22")
result_tbl <- get_similar_stories(query_story)
result_tbl
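As a rough illustration of what thematic similarity can mean, the base R sketch below ranks two toy stories by cosine similarity to a query story, using binary theme indicator vectors. The similarity measure, the story names, and every theme other than "mass hysteria" are assumptions made for this example; get_similar_stories() may compute its scores differently.

```r
# Cosine similarity between two numeric vectors of equal length.
cosine_similarity <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Toy story-by-theme indicator matrix (1 = story features the theme).
# Story names and most theme names are hypothetical.
story_theme <- rbind(
  query   = c(1, 1, 0, 0),
  story_a = c(1, 1, 1, 0),
  story_b = c(0, 0, 1, 1)
)
colnames(story_theme) <- c(
  "mass hysteria", "hypothetical theme 1",
  "hypothetical theme 2", "hypothetical theme 3"
)

# Score every non-query story against the query, then rank high to low.
candidates <- story_theme[rownames(story_theme) != "query", , drop = FALSE]
scores <- apply(candidates, 1, cosine_similarity, y = story_theme["query", ])
ranked <- sort(scores, decreasing = TRUE)
ranked
```

Here story_a shares two themes with the query and so ranks above story_b, which shares none.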
Cluster The Twilight Zone franchise stories according to thematic similarity:
# install.packages("isa2")
library(isa2)
set.seed(123)
result_tbl <- get_story_clusters()
result_tbl
The command set.seed(123) is run here for the sake of reproducibility.
Explore a cluster of stories related to traveling back in time:
cluster_id <- 3
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
Explore a cluster of stories related to mass panics:
cluster_id <- 5
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
Explore a cluster of stories related to executions:
cluster_id <- 7
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
Explore a cluster of stories related to space aliens:
cluster_id <- 10
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
Explore a cluster of stories related to old people wanting to be young:
cluster_id <- 11
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
Explore a cluster of stories related to wish making:
cluster_id <- 13
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
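The cluster lookups above repeat the same two lines with a different cluster_id each time. One way to cut the repetition is a small helper like the sketch below. The name show_cluster is hypothetical, and the helper assumes only what the calls above already rely on: a table with list columns named stories and themes. It is demonstrated on a toy data frame so that it runs without the package.

```r
# Hypothetical helper: return the stories and themes of one cluster.
# Assumes `clusters_tbl` has list columns named `stories` and `themes`.
show_cluster <- function(clusters_tbl, cluster_id) {
  stopifnot(cluster_id >= 1, cluster_id <= nrow(clusters_tbl))
  list(
    stories = clusters_tbl$stories[[cluster_id]],
    themes  = clusters_tbl$themes[[cluster_id]]
  )
}

# Toy stand-in for a clustering result (IDs and names are made up).
toy_clusters <- data.frame(cluster_id = 1:2)
toy_clusters$stories <- list(c("s1", "s2"), "s3")
toy_clusters$themes  <- list("t1", c("t2", "t3"))

toy_result <- show_cluster(toy_clusters, 2)
toy_result
```

With a real clustering result, a call such as show_cluster(result_tbl, 5) would stand in for the pair of pull() calls.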
The package works with data from these LTO versions:
lto_version_statuses()
To download and cache the latest versioned LTO release, run:
configure_lto(version = "latest")
This can take a while.
Load the newly configured LTO version as the active version in the R session:
set_lto(version = "latest")
To double-check that it has been loaded successfully, run:
which_lto()
Now that the latest LTO version is loaded into the R session, its stories and themes can be analyzed in the same way as with the “demo” LTO version data as shown above.
If you encounter a bug, please file a minimal reproducible example on GitHub issues. For questions and other discussion, please post on the GitHub discussions board.
All code in this repository is published with the GPL v3 license.