This repository contains the code and output for the preliminary analysis of my final project for Computational Psychology of Language at Dartmouth College.
getngrams.py
: This is the only file I did not write; it is taken directly from https://github.com/econpy/google-ngrams. It generates a CSV and a plot from Google N-grams data for each binomial expression.
expressions.txt
: This is a list of common binomial expressions, some that I thought of and some taken from https://www.eslbuzz.com/40-common-binomial-expressions-in-english/
getPlots.sh
: The first loop in this script makes a call to getngrams.py for each binomial expression in expressions.txt. The second loop passes each generated CSV into calculateRatios.py.
calculateRatios.py
: This file takes the name of a CSV file generated by getngrams.py, calculates the difference in frequency between the two word orders of a binomial expression, and plots the result.
results
: This directory contains the CSV files and frequency plots that getngrams.py generated for each binomial expression in expressions.txt.
ratios
: This directory contains the plot of the ratios between the word order frequencies of a binomial expression as well as a .txt file containing the precise values.
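The ratio between the two word-order frequencies can be sketched roughly as follows. This is not the code in calculateRatios.py. It assumes a CSV whose first column is the year and whose next two columns hold the frequency of each word order, and it assumes the ratio is the first order divided by the second. The function name and the sample numbers are made up for illustration.

```python
import csv
import io


def word_order_ratios(csv_text):
    """Return (year, ratio) pairs for a two-word-order frequency table.

    Assumed layout (not confirmed by the repository):
    column 0 = year, column 1 = first word order, column 2 = second word order.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    ratios = []
    for row in reader:
        year = int(row[0])
        first, second = float(row[1]), float(row[2])
        # Skip years where the second order never occurs, to avoid dividing by zero.
        if second == 0:
            continue
        ratios.append((year, first / second))
    return ratios


# Toy data, not real Google N-grams values.
sample = """year,salt and pepper,pepper and salt
1900,8.0,2.0
1901,6.0,3.0
1902,5.0,0.0
"""

for year, ratio in word_order_ratios(sample):
    print(year, ratio)
```

A ratio above 1 means the first word order is the more frequent one in that year. Years with a zero denominator are dropped here; another reasonable choice would be to record them as missing.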
To run the code in this repository, put all your binomial expressions of interest in expressions.txt. Make sure the expressions are in alphabetical order. Then, in the terminal, use the following command to generate the CSVs and the plots of both frequencies and ratios:
bash getPlots.sh
If you are running Python 2.x, lines 22 and 34 of getPlots.sh will have to be modified. Instead of python3, simply write python.
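As an alternative to editing the interpreter name by hand, the first loop of getPlots.sh could be approximated in Python, which lets sys.executable stand in for python3 or python. The sketch below only builds the commands and does not run them. The helper names are hypothetical, the case-insensitive reading of "alphabetical order" is an assumption, and any flags the real script passes to getngrams.py are not reproduced.

```python
import sys


def load_expressions(lines):
    """Strip blank lines and check that the expressions are alphabetical."""
    expressions = [line.strip() for line in lines if line.strip()]
    # Assumption: "alphabetical" means case-insensitive ordering.
    if expressions != sorted(expressions, key=str.casefold):
        raise ValueError("expressions must be in alphabetical order")
    return expressions


def build_commands(expressions, python=sys.executable):
    """Build one getngrams.py command per expression.

    Only the script name and the expression are included; the exact
    arguments used by getPlots.sh are not shown in this README.
    """
    return [[python, "getngrams.py", expr] for expr in expressions]


# Example input standing in for the contents of expressions.txt.
example = ["bread and butter", "salt and pepper"]
for command in build_commands(load_expressions(example)):
    print(command)
```

Each command list could then be handed to subprocess.run, with the second loop over the generated CSVs built the same way for calculateRatios.py.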