-
-
Notifications
You must be signed in to change notification settings - Fork 0
/
bechdel-test.Rmd
195 lines (157 loc) · 6.94 KB
/
bechdel-test.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
---
title: "Bechdel Test"
description: |
Graphs using the #TidyTuesday data set for week 11 of 2021 (9/3/2021):
"Bechdel Test"
author:
- name: Ronan Harrington
url: https://github.com/rnnh/
date: 03-21-2021
repository_url: https://github.com/rnnh/TidyTuesday/
preview: bechdel-test_files/figure-html5/figure1-1.gif
output:
distill::distill_article:
self_contained: false
toc: true
---
## Setup
Loading the `R` libraries and [data set](https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-03-09/readme.md).
```{r, code_folding=TRUE, include=TRUE}
# Loading libraries
library(gganimate)
library(tidytuesdayR)
library(tidyverse)
library(tidytext)
library(forcats)
# Loading the Bechdel Test data set
tt <- tt_load("2021-03-09")
```
## Illustrating the change in Bechdel Test results over time
The first graph we want to create is an animation showing the change in Bechdel
Test results over time. This animation shows the percentage of films each year
(from 1940 to 2020) that meet different criteria of the Bechdel Test.
```{r figure1, fig.cap="The percentage of films released each year with various Bechdel test results", include=TRUE}
# Changing "rating" from a character to a factor variable
tt$raw_bechdel$rating <- as.factor(tt$raw_bechdel$rating)
# Levels of the "rating" variable
levels(tt$raw_bechdel$rating)
# Renaming the levels of the "rating" variable
levels(tt$raw_bechdel$rating) <- c("Unscored", "It has two women...",
"...who talk to each other...",
"...about something besides a man")
# Counting the number of films with each Bechdel test rating per year
ratings_by_year <- tt$raw_bechdel %>%
group_by(year) %>%
count(year, rating)
# Counting the total number of films in each year
film_count_by_year <- ratings_by_year %>%
group_by(year) %>%
summarise(total = sum(n))
# Adding the annual film count to the Bechdel test rating count per year
ratings_by_year <- left_join(ratings_by_year, film_count_by_year)
rmarkdown::paged_table(ratings_by_year)
# Changing "year" to an integer variable
ratings_by_year$year <- as.integer(ratings_by_year$year)
# Creating an animation summarising the Bechdel test results from 1940 to 2020
p <- ratings_by_year %>%
ggplot(aes(x = fct_rev(rating), y = (n/total), group = rating, fill = rating)) +
geom_bar(stat = "identity") +
scale_y_continuous(labels = scales::percent_format(scale = 100)) +
coord_flip() +
theme_bw() +
theme(legend.position = "none") +
transition_time(year, range = c(1940L, 2020L)) +
labs(x = "Bechdel Test result", y = "Percentage of films",
subtitle = "Year: {frame_time}",
title = "Bechdel Test results over time") +
ease_aes("cubic-in-out")
# Rendering the animation as a .gif
animate(p, nframes = 400, fps = 20, renderer = magick_renderer())
```
## Plotting the directors most likely to pass/fail the Bechdel Test
In this section, a plot is produced that shows the directors most likely to
pass/fail the Bechdel Test. This is done by...
- taking all the directors in `tt$movies` as a corpus
- splitting that corpus into two documents: Bechdel Test passes and failures
- calculating [tf-idf](https://www.tidytextmining.com/tfidf.html) to find
significant directors in each document
```{r figure2, fig.cap="Directors most likely to produce films that pass/fail the Bechdel test", include=TRUE}
# Selecting directors and their Bechdel pass/fail results
results_by_director <- tt$movies %>%
select(director, binary) %>%
filter(!is.na(director)) %>%
separate_rows(director, sep = ", ")
# Changing "binary" to a factor variable
results_by_director$binary <- as.factor(results_by_director$binary)
# Renaming the levels of the "binary" factor
levels(results_by_director$binary) <- c("Bechdel Test Failed",
"Bechdel Test Passed")
# Counting the number of times each director passes/fails the Bechdel test
results_by_director <- results_by_director %>%
count(binary, director, sort = TRUE)
# Counting the number of times the Bechdel test has been passed/failed
total_results <- results_by_director %>%
group_by(binary) %>%
summarise(total = sum(n))
# Adding the total Bechdel result counts to "results_by_director"
results_by_director <- left_join(results_by_director, total_results)
# Adding tf-idf values to "results_by_director"
results_by_director <- results_by_director %>%
bind_tf_idf(director, binary, n)
rmarkdown::paged_table(results_by_director)
# Plotting the directors with the highest tf-idf values based on Bechdel
# test results
results_by_director %>%
group_by(binary) %>%
slice_max(tf_idf, n = 10) %>%
ungroup() %>%
ggplot(aes(tf_idf, fct_reorder(director, tf-idf), fill = binary)) +
geom_col(show.legend = FALSE) +
facet_wrap(~binary, ncol = 2, scales = "free") +
labs(x = "term frequency–inverse document frequency (tf-idf)", y = NULL) +
labs(title = "Directors most likely to Pass/Fail the Bechdel Test") +
theme_bw()
```
## Plotting the writers most likely to pass/fail the Bechdel Test
In this section, the same process as the previous section is applied to the
writers in `tt$movies` to find who is most likely to write for a film that
passes/fails the Bechdel Test.
```{r figure3, fig.cap="Writers most likely to produce films that pass/fail the Bechdel test", code_folding = T, include=TRUE}
# Selecting writers and their Bechdel pass/fail results
results_by_writer <- tt$movies %>%
select(writer, binary) %>%
filter(!is.na(writer)) %>%
filter(!writer == "N/A") %>%
mutate(writer = str_remove(writer, " \\(.*")) %>%
separate_rows(writer, sep = ", ")
# Changing "binary" to a factor variable
results_by_writer$binary <- as.factor(results_by_writer$binary)
# Renaming the levels of the "binary" factor
levels(results_by_writer$binary) <- c("Bechdel Test Failed",
"Bechdel Test Passed")
# Counting the number of times each writer passes/fails the Bechdel test
results_by_writer <- results_by_writer %>%
count(binary, writer, sort = TRUE)
# Counting the number of times the Bechdel test has been passed/failed
total_results <- results_by_writer %>%
group_by(binary) %>%
summarise(total = sum(n))
# Adding the total Bechdel result counts to "results_by_writer"
results_by_writer <- left_join(results_by_writer, total_results)
# Adding tf-idf values to "results_by_director"
results_by_writer <- results_by_writer %>%
bind_tf_idf(writer, binary, n)
rmarkdown::paged_table(results_by_writer)
# Plotting the directors with the highest tf-idf values based on Bechdel
# test results
results_by_writer %>%
group_by(binary) %>%
slice_max(tf_idf, n = 10) %>%
ungroup() %>%
ggplot(aes(tf_idf, fct_reorder(writer, tf-idf), fill = binary)) +
geom_col(show.legend = FALSE) +
facet_wrap(~binary, ncol = 2, scales = "free") +
labs(x = "term frequency–inverse document frequency (tf-idf)", y = NULL) +
labs(title = "Writers most likely to Pass/Fail the Bechdel Test") +
theme_bw()
```