stoRy

stoRy is a Tidyverse-friendly package for downloading, exploring, and analyzing Literary Theme Ontology (LTO) data in R.

Installation

# Install the released version of stoRy from CRAN with:
install.packages("stoRy")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("theme-ontology/stoRy")

Using stoRy

The easiest way to get started with stoRy is to use the LTO demo version data. It consists of the themes and 335 thematically annotated stories from The Twilight Zone American media franchise, taken from the latest LTO version.

Begin by loading the stoRy package:

library(stoRy)

Exploring the Demo Data

The LTO demo version is loaded by default:

which_lto()

Get a feel for the demo data by printing some basic information about it to the console:

print_lto()

See the demo data help page for a more in-depth description:

?`lto-demo`

Exploring the Demo Stories

Thematically annotated stories are initialized by story ID. For example, run

story <- Story$new(story_id = "tz1959e1x22")

to initialize a Story object representing the classic The Twilight Zone (1959) television series episode The Monsters Are Due on Maple Street.

Story thematic annotations, along with episode-identifying metadata, are printed to the console in either the default format or the standard .st.txt format:

story
story$print(canonical = TRUE)

There are two complementary ways of going about finding story IDs. First, the LTO website story search box offers a quick-and-dirty way of locating LTO developmental version story IDs of interest. Since story IDs are stable, developmental version The Twilight Zone story IDs can be expected to agree with their demo data counterparts. Alternatively, a demo data story ID is directly obtained from an episode title as follows:

# install.packages("dplyr")
library(dplyr)
title <- "The Monsters Are Due on Maple Street"
demo_stories_tbl <- clone_active_stories_tbl()
story_id <- demo_stories_tbl %>% filter(title == !!title) %>% pull(story_id)
story_id

The dplyr package is required to run the pipeline, which relies on the %>% operator.
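If dplyr is not available, the same lookup can be sketched in base R. The data frame below is a toy stand-in for the stories table, assuming only the title and story_id columns used in the pipeline above. The second row and the names toy_stories and wanted_title are made up for illustration.

```r
# Toy stand-in for the stories table; only the first row comes from this README.
toy_stories <- data.frame(
  story_id = c("tz1959e1x22", "hypothetical_id"),
  title = c("The Monsters Are Due on Maple Street", "A Hypothetical Episode"),
  stringsAsFactors = FALSE
)

# Look up the story ID whose title matches exactly.
wanted_title <- "The Monsters Are Due on Maple Street"
story_id <- toy_stories$story_id[toy_stories$title == wanted_title]
story_id
```

In the dplyr version, the variable title shares its name with the title column, which is why it is unquoted with !! inside filter(). Giving the variable a distinct name, as here, sidesteps that ambiguity.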

A tibble of thematic annotations is obtained by running:

themes <- story$themes()
themes

Exploring the Demo Themes

The Monsters Are Due on Maple Street is a story about how mass hysteria can transform otherwise normal people into an angry mob. To view the mass hysteria theme entry, initialize a Theme object with the theme_name argument set accordingly:

theme <- Theme$new(theme_name = "mass hysteria")
theme
theme$print(canonical = TRUE)

To view a tibble of all demo data stories featuring mass hysteria, run:

theme$annotations()

As with story IDs, there are two ways to look for themes of interest. Developmental version themes are searchable from the LTO website theme search box. Demo version themes are explorable in tibble format. For example, here is one way to search for mass hysteria directly in the demo themes:

# install.packages("stringr")
library(stringr)
demo_themes_tbl <- clone_active_themes_tbl()
demo_themes_tbl %>% filter(str_detect(theme_name, "mass"))

Notice that all themes containing the substring "mass" are returned.
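A substring match can return more than intended, since "mass" also occurs inside longer words. The base R sketch below contrasts a substring match with a whole-word match. Apart from "mass hysteria", the theme names are hypothetical and do not come from the LTO.

```r
# Theme names for illustration only; the last two are hypothetical.
toy_themes <- c(
  "mass hysteria",
  "hypothetical massive theme",
  "hypothetical other theme"
)

# Substring match: any name containing the characters "mass".
substring_hits <- toy_themes[grepl("mass", toy_themes, fixed = TRUE)]

# Whole-word match: "mass" delimited by word boundaries.
word_hits <- toy_themes[grepl("\\bmass\\b", toy_themes, perl = TRUE)]

substring_hits
word_hits
```

Since str_detect() interprets its pattern as a regular expression by default, the same word-boundary pattern should also work in the stringr pipeline.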

Exploring the Demo Collections

Each story belongs to at least one collection (i.e. a set of related stories). The Monsters Are Due on Maple Street, for instance, belongs to two collections:

story$collections()

To initialize a Collection object for The Twilight Zone (1959) television series, of which The Monsters Are Due on Maple Street is an episode, run:

collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")

Collection info is printed to the console in the same way as with stories and themes:

collection
collection$print(canonical = TRUE)

In general, developmental version collections can be explored from the LTO website story search box or through the package in the usual way:

demo_collections_tbl <- clone_active_collections_tbl()
demo_collections_tbl

Analyzing the Demo Data

The LTO thematically annotated story data can be analyzed in various ways.

Topmost Featured Themes

To view the top 10 most featured themes in The Twilight Zone (1959) series, run:

collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
result_tbl <- get_featured_themes(collection)
result_tbl

To view the top 10 most featured themes in the demo data as a whole, run:

result_tbl <- get_featured_themes()
result_tbl
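Conceptually, a "most featured" ranking counts how many stories each theme appears in. The base R sketch below shows that idea on made-up annotations. It is not the get_featured_themes() implementation, which may weight or filter themes differently, and the name toy_annotations and its contents are hypothetical.

```r
# Hypothetical story-to-theme annotations (one row per story/theme pair).
toy_annotations <- data.frame(
  story_id = c("s1", "s1", "s2", "s2", "s3"),
  theme = c("theme a", "theme b", "theme a", "theme c", "theme a"),
  stringsAsFactors = FALSE
)

# Count each theme at most once per story, then rank by frequency.
unique_pairs <- unique(toy_annotations)
theme_counts <- sort(table(unique_pairs$theme), decreasing = TRUE)

# Keep at most the top 10 themes.
top_themes <- head(theme_counts, 10)
top_themes
```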

Topmost Enriched Themes

To view the top 10 most enriched, or over-represented, themes in The Twilight Zone (1959) series with all The Twilight Zone stories as background, run:

test_collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
result_tbl <- get_enriched_themes(test_collection)
result_tbl

To run the same analysis without counting minor-level themes, run:

result_tbl <- get_enriched_themes(test_collection, weights = list(choice = 1, major = 1, minor = 0))
result_tbl
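One common way to score over-representation is a hypergeometric test: given how often a theme occurs in the background, how surprising is its count in the test collection? The sketch below illustrates that test in base R with hypothetical counts. Whether get_enriched_themes() scores themes exactly this way is an assumption, not something stated here.

```r
# Hypothetical counts, for illustration only.
n_background <- 100  # stories in the background collection
n_with_theme <- 10   # background stories featuring the theme
n_test <- 20         # stories in the test collection (a subset of the background)
k_observed <- 6      # test stories featuring the theme

# P(X >= k_observed) when n_test stories are drawn without replacement.
p_value <- phyper(
  k_observed - 1,
  n_with_theme,
  n_background - n_with_theme,
  n_test,
  lower.tail = FALSE
)
p_value
```

A small p-value suggests the theme occurs in the test collection more often than chance alone would explain. Under this framing, a zero weight for minor would amount to leaving minor-level annotations out of the counts, though that is an interpretation rather than a description of the package internals.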

Topmost Similar Stories

To view the 10 stories from The Twilight Zone franchise that are most thematically similar to The Monsters Are Due on Maple Street, run:

query_story <- Story$new(story_id = "tz1959e1x22")
result_tbl <- get_similar_stories(query_story)
result_tbl
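Thematic similarity can be pictured as comparing the sets of themes attached to two stories. The sketch below uses cosine similarity on binary theme vectors as one plausible measure. It is an illustration only, not necessarily the measure get_similar_stories() uses. The function cosine_similarity and all theme sets other than "mass hysteria" are hypothetical.

```r
# Cosine similarity between two stories represented as non-empty sets of themes.
cosine_similarity <- function(themes_a, themes_b) {
  vocabulary <- union(themes_a, themes_b)
  a <- as.numeric(vocabulary %in% themes_a)
  b <- as.numeric(vocabulary %in% themes_b)
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical theme sets.
query <- c("mass hysteria", "theme x", "theme y")
near <- c("mass hysteria", "theme x")
far <- c("theme z")

cosine_similarity(query, near)  # shares two themes with the query
cosine_similarity(query, far)   # shares no themes with the query
```

Stories sharing more themes score closer to 1, while stories with no themes in common score 0. Ranking all candidate stories by such a score and keeping the first 10 yields a "top 10 most similar" list.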

Similar Story Clusters

Cluster The Twilight Zone franchise stories according to thematic similarity:

# install.packages("isa2")
library(isa2)
set.seed(123)
result_tbl <- get_story_clusters()
result_tbl

The command set.seed(123) is run here for the sake of reproducibility.

Explore a cluster of stories related to traveling back in time:

cluster_id <- 3
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

Explore a cluster of stories related to mass panics:

cluster_id <- 5
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

Explore a cluster of stories related to executions:

cluster_id <- 7
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

Explore a cluster of stories related to space aliens:

cluster_id <- 10
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

Explore a cluster of stories related to old people wanting to be young:

cluster_id <- 11
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

Explore a cluster of stories related to wish making:

cluster_id <- 13
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
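The repeated pull() calls above can be wrapped in a small helper. The sketch below assumes only that the clusters table has list-columns named stories and themes, as the calls above imply. The helper name get_cluster and the toy table contents are hypothetical.

```r
# Return the stories and themes stored for one cluster.
# Assumes `clusters_tbl` has list-columns named `stories` and `themes`.
get_cluster <- function(clusters_tbl, cluster_id) {
  list(
    stories = clusters_tbl$stories[[cluster_id]],
    themes = clusters_tbl$themes[[cluster_id]]
  )
}

# Toy stand-in for the result of get_story_clusters(); contents are made up.
toy_clusters <- data.frame(cluster_id = 1:2)
toy_clusters$stories <- list(c("s1", "s2"), c("s3"))
toy_clusters$themes <- list(c("theme a"), c("theme b", "theme c"))

cluster <- get_cluster(toy_clusters, 2)
cluster$stories
cluster$themes
```

With the real clustering result, get_cluster(result_tbl, 3) would return the same two elements as the corresponding pair of pull() calls. Like those calls, the helper indexes clusters by row position.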

Downloading Data

The package works with data from these LTO versions:

lto_version_statuses()

To download and cache the latest versioned LTO release, run:

configure_lto(version = "latest")

This can take a while.

Load the newly configured LTO version as the active version in the R session:

set_lto(version = "latest")

To double-check that it has been loaded successfully, run:

which_lto()

Now that the latest LTO version is loaded into the R session, its stories and themes can be analyzed in the same way as the “demo” LTO version data shown above.

Getting Help

If you encounter a bug, please file a minimal reproducible example on GitHub issues. For questions and other discussion, please post on the GitHub discussions board.

License

All code in this repository is published with the GPL v3 license.