Spring 2016. This project will visualize the story shapes of ~~a variety of 19th century novels~~ ~~about 300~~ 800+ science fiction short stories. Ideally, I'd like to build something that's extendable to a wide variety of texts. This is the final project for CS171 Spring 2016.
Final project website is here. Screencast is here.
Code | What it does | Attribution (if not me) |
---|---|---|
`html/js/main.js` | Calls in all visualizations, runs everything. | -- |
`html/js/scatterplot.js` | A scatterplot visualization (not used in final website). | -- |
`html/js/sweetalert.min.js` | Pretty alerts. | Tristan Edwards (Sweet Alerts library) |
`html/js/textChart.js` | The bubble heat map. | -- |
`html/js/timeline.js` | The average vocab over time line chart. | -- |
`html/js/wordcloud.js` | The word cloud on the right-hand side. | -- |
`data processing/cs171analysis.py` | Post-processing of Strange Horizons data. | -- |
`data processing/preparingData.py` | The initial scrape of Gutenberg texts. | -- |
`data processing/strangehorizons.py` | The scrape of Strange Horizons. | -- |
- Kurt Vonnegut's lecture: The shape of stories
- Kurt Vonnegut's actual story shapes
- indico: Exploring the shapes of stories using Python and sentiment APIs
- OpenVisConf 2015 video visualization
- Ben Fry's Origin of Species
- Popular Science archive explorer
- Everything in the Bocoup text viz exploration slides
- CS171 official website - Project
- CS171 official website - Schedule
- Book covers:
- d3 Scalability – Virtual Scrolling for Large Visualizations
- Tony Hschu - Scroll linked animations
- Jim Vallandingham - Scroller.js
- StackOverflow - Fisheye distortion on simple scatterplot
- bl.ocks - Zoomable plot
- Automated Readability Index (ARI)
- Smoothing out lines in d3.js
- Get `git` set up.
- Keep taking pictures/screenshots for Process Book.
- Find CC-licensed book covers.
- New design idea: User can search for something in the text.
- New design idea: Tooltip not on hover - but rather, once top word or user word appears, reveal that line to the right of the charts.
- New design idea: The long scroll is boring:
  - Collapse paras into chapter objects? (Using average sentence length per para?)
- `font-awesome`?
- Process book: Screencast.
- Select texts for download.
- Basic input of `txt`, output `json`.
- Top-level object for each text, capturing title, author, top words, etc.
- Paragraph-level objects capturing para text, length (sentences vs. characters?), sentences, whether they have any top words in there.
- Convert all `len(x)` into word tokenizers counting words, instead of characters.
- Fix above -- specifically, fix `cleanWordTokenize()`.
- Design 2 (SH scrape): Fix 2003 story.
- Re-run to get `word-count` objects in year object.
- General questions:
  - Should I separate out each book into its own json? What would be most efficient to pull from the server?
  - How else can I reduce the size of the json?
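The `txt` to `json` step described in these notes could look roughly like the sketch below. It is not the code in `preparingData.py`; the function names, the field names (`length`, `has_top_word`, and so on) and the naive regex tokenizer and sentence splitter are all assumptions made for illustration. Paragraph length is counted in words rather than characters, as the notes suggest.

```python
import json
import re
from collections import Counter


def word_tokenize(text):
    """Lowercase word tokens; punctuation is dropped (naive regex, an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(paragraph):
    """Naive sentence split on '.', '?' or '!' followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", paragraph.strip())
    return [p for p in parts if p]


def build_text_object(title, author, raw_text, n_top=5):
    """Build one top-level object per text, with paragraph-level objects inside."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", raw_text) if p.strip()]
    counts = Counter(word_tokenize(raw_text))
    top_words = [w for w, _ in counts.most_common(n_top)]

    para_objects = []
    for i, para in enumerate(paragraphs):
        tokens = word_tokenize(para)
        para_objects.append({
            "index": i,
            "text": para,
            "length": len(tokens),  # words, not characters
            "sentences": split_sentences(para),
            "has_top_word": any(t in top_words for t in tokens),
        })

    return {
        "title": title,
        "author": author,
        "top_words": top_words,
        "paragraphs": para_objects,
    }


if __name__ == "__main__":
    sample = "The ship landed. The ship was old!\n\nNobody came out."
    print(json.dumps(build_text_object("Sample", "Anon", sample, n_top=2), indent=2))
```

A real pipeline would also filter stopwords before picking top words; this sketch leaves them in to stay short.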
- I need full date-time objects for each story for the line chart.
- Make unique identifier for all stories.
  - Used index - but why is it 100+ stories over?! (2003 duplicates?)
- Sentence length variance.
  - Did `mean`, `std`.
- Add story `award` (`1/0`) based on this.
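As a rough illustration of the sentence length `mean` and `std` mentioned above, here is a minimal sketch using only the standard library. The function names and the regex-based sentence splitting are assumptions, and it uses the population standard deviation (`statistics.pstdev`); the actual analysis script may have used a different tokenizer or the sample standard deviation.

```python
import re
import statistics


def sentence_lengths(text):
    """Number of words in each sentence, using a naive split on ., ? and !."""
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    return [len(re.findall(r"[A-Za-z0-9']+", s)) for s in sentences]


def sentence_length_stats(text):
    """Mean and population standard deviation of sentence length, in words."""
    lengths = sentence_lengths(text)
    if not lengths:
        return {"mean": 0.0, "std": 0.0}
    return {
        "mean": statistics.mean(lengths),
        "std": statistics.pstdev(lengths),
    }


if __name__ == "__main__":
    story = "One two three. Four five! Six seven eight nine ten?"
    print(sentence_lengths(story))
    print(sentence_length_stats(story))
```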
- Add templates from best HWs and Labs.
- Set up some basic CSS and website structure.
- Where to put `#main-viz` and `#link-viz`?
- Data summary box - get that started.
- Top words ~~buttons~~ word cloud.
- Get `main.js`, `textChart.js` and `linkChart.js` set up.
- User choice covers: CSS, functions.
- How to replicate Sublime Text's brushed sidebar?
- How to determine height of what the viewer is seeing? Highest data point and lowest data point?
- Filter linkData, based on book selected. (Essentially, just remove all `sentences` objects from `text`.)
- Jane Austen `maxParaLength` seems way too high.
- Apply `d3.stack.layout()` data transformation to `paraArray` in `textChart.js`. (Did this manually.)
- Resize isn't re-loading `mainChart` in the correct way.
- Figure out `height` on `mainChart`.
- Center `mainChart`.
- Heat map of complexity (using `vocab`).
- Hyperlink each square.
- Smoother transition on `#sh-age`.
- Tooltip each story: title, author (URL?).
- Special outline for my story? (vanity button?)
- Why is `words` appearing in the story objects?
- `#scatterplot`: axes, lines, labels (use Bootstrap `.code` class).
- `#linechart`: get it started.
- Linking the charts together...! Highlights, brushed.
- `mouseout` function.
- Highlighting is not working on 2003 column (`textChart.js`). 2003 is repeating.
- Fix ordering of years in `textChart.js`.
- What's a better color scheme?
- Bookmark interesting stories (from the popup).
- Tooltip: Random image, `"science fiction " + d.top_word[0]` Google CC-license image search!?
- Brushing in `scatterplot.js`: keep scale the same, just filter data points.
- `scatterplot.js`: `.data(displayData, function(d) { return d.id; })` ... points by index.
- `scatterplot.js`: Let the user select the scale (zoom).
- User selects y-axis of scatterplot (update `y-label` in `index.html`).
- Beef up `tooltip`.
- Custom refresh for each viz.
- Show bookmarked stories on "See bookmarks" click.
- Get year-week for y-axis on `textChart.js`.
- De-mean the standard deviation.
- Excellent option for analysis: readability.py!
- Use genderize.io - $9/mo. for 100k names
- Clean up puncts in `wordcount` and `vocab` (you can use a list comprehension: `[w for w in words if w not in r'[\.\?!]']`)
- `tf-idf`: how much does a word occur in a story, divided by the number of stories with that word.
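The punctuation clean-up and the simplified `tf-idf` description above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `PUNCT`, `clean_tokens` and `word_scores` are hypothetical names, and the score follows the wording of the note (count of a word in a story divided by the number of stories containing it) rather than the textbook tf-idf, which normally takes the logarithm of the inverse document frequency.

```python
from collections import Counter

# Assumption: tokens that consist only of one of these marks count as punctuation.
PUNCT = {".", "?", "!", ",", ";", ":"}


def clean_tokens(words):
    """Drop punctuation-only tokens using set membership."""
    return [w for w in words if w not in PUNCT]


def word_scores(stories):
    """Score each word per story: count in story / number of stories with the word.

    `stories` maps a story id to its list of tokens.
    """
    cleaned = {sid: clean_tokens(tokens) for sid, tokens in stories.items()}

    # Document frequency: in how many stories does each word appear at least once?
    doc_freq = Counter()
    for tokens in cleaned.values():
        doc_freq.update(set(tokens))

    scores = {}
    for sid, tokens in cleaned.items():
        counts = Counter(tokens)
        scores[sid] = {w: c / doc_freq[w] for w, c in counts.items()}
    return scores


if __name__ == "__main__":
    demo = {
        "a": ["robot", ".", "robot", "moon", "!"],
        "b": ["moon", "?", "sea"],
    }
    print(word_scores(demo))
```

Note that `w not in r'[\.\?!]'` in the list comprehension above is a substring test against the literal string, not a regex match. It does drop `.`, `?` and `!`, but it would also drop tokens such as `[` or `\`, which is why this sketch uses a set instead.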
- less text up top and just get to the viz
- have it take up the whole screen rather than two columns of 3 different views (like each view one after the other for now)
- include some visual keys about what the size means of each dot as well as what the colors mean
- include trends across texts? Like what words are top throughout? What words are trending over time?
- What do you mean by unique word anyhow?
- have a rollover on each dot of just the most popular unique word and then the other info comes when you click it?
- when you say a word is unique, do you mean it is unique to that story? Or is it that the word is occurring once in that particular story? That is, the classic interpretation of Oelke & Keim's hapax legomena?
- It could be interesting to also see the words that are unique _in the corpus_ to each story.
- It seems like there are yellow lines in the background of the vocab over time - they are very faint, not sure what they are for.
- Though, for some reason, it took me a minute to realize I could click the title in the popup. It looks a bit like a title.
- Finally, have you considered authorship analysis? Do many authors write multiple stories? Are their stories similar or different from each other?
- What about a comparison to some canonical speculative fiction novels as a reference point?
- put it back, but align (CRAP)
- meaningful story in the title; ask a question in the title (subtitle is more technical)
- User-adjustable opacity on `scatterplot.js`? What do you think of adjusting the opacity in the scatter?
- Bin scatter plot?
- Is there any useful category that you can use to color the points in the scatter?
- Have you thought about animating the scatterplot transitions?
- Link to Github doesn't work.
- Tooltip word cloud, similar to dem/rep wordle (see Bocoup slides).
- Reveal on interaction? (see Bocoup Stereotropes)
- Reduce data points shown (gray circles with null fill, excepting award winners, similar to this).
- Remove bookmarks functionality. What is bookmarking for? hidden functionality!
- Add author filter functionality.
- Remove scatterplot.
- Add sentence length variance as user option to timeline.
- Top words tooltip: Filter "said"/"say" and names.
- Gender of protagonist?
- Change size encoding to something with better variance: Readability? Sentence variance?
- More obvious/labelled functionality.
- Bring it all up into one frame!
- Remove "says"/"said" as `stopword`.
- POS tag `top_words`, find proper nouns and remove? Or tag/highlight?
- REVERT BACK TO `0f35193` FOR `.js` FILES!
- Data summary/table in each tooltip.
- Add award blocks.
- All `timeline.js` circles are `hidden`, until found in one of the search functions.
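The stopword and proper-noun filtering of top words mentioned in these notes could be approximated as in the sketch below. It is only a stand-in: the notes propose POS tagging, while this sketch swaps in a plain capitalization heuristic (a word counts as a probable name if it is never seen in lowercase and appears capitalized at least once mid-sentence). `STOPWORDS`, `probable_names` and `top_words` are hypothetical names, and the stopword list is a tiny illustrative sample.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption), including the "said"/"says"/"say" forms.
STOPWORDS = {"the", "a", "and", "to", "of", "said", "says", "say"}


def probable_names(text):
    """Heuristic stand-in for POS tagging: always-capitalized words seen mid-sentence."""
    seen_lower = set()
    capitalized_mid = set()
    for sentence in re.split(r"(?<=[.?!])\s+", text):
        words = re.findall(r"[A-Za-z']+", sentence)
        for i, w in enumerate(words):
            if w[0].isupper():
                # Sentence-initial capitals tell us nothing, so skip position 0.
                if i > 0:
                    capitalized_mid.add(w.lower())
            else:
                seen_lower.add(w.lower())
    return capitalized_mid - seen_lower


def top_words(text, n=3):
    """Most frequent words after removing stopwords and probable names."""
    names = probable_names(text)
    tokens = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    kept = [w for w in tokens if w not in STOPWORDS and w not in names]
    return [w for w, _ in Counter(kept).most_common(n)]


if __name__ == "__main__":
    story = ("Mara said the robot was tired. The robot said nothing to Mara. "
             "Then the robot slept.")
    print(probable_names(story))
    print(top_words(story, n=1))
```

The heuristic will miss names that only appear at the start of sentences and will wrongly flag other always-capitalized words, so a real POS tagger would be the more reliable choice.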