English Speaking Country Text Classification Model

Data

The model was trained on the Twitter_by_Country dataset provided by Professor Jonathan Dunn, which is too large to include in this repo. The Tripadvisor review data is from Kaggle and is included in the repo (tripadvisor_hotel_reviews.csv), along with the newly country-labeled datasets.

Usage

Creating and using the model is split across two Jupyter notebooks: Project1_createModel.ipynb and Project1_useModel.ipynb. The first walks through training a model, and the second walks through applying a pretrained model to new data. The pretrained models have been serialized with Python's pickle module and are stored in unigram_tweet_classification_model.p, bigram_tweet_classification_model.p, and trigram_tweet_classification_model.p. To use them, see the code in the Project1_useModel.ipynb notebook.
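As a rough illustration of loading one of the pickled models outside the notebook, here is a minimal sketch. It assumes the unpickled object exposes a scikit-learn-style `predict` method that accepts a list of strings; that interface is an assumption, not something confirmed here, so check Project1_useModel.ipynb for the actual calls. The helper names `load_model` and `classify` are hypothetical.

```python
import pickle
from pathlib import Path


def load_model(path):
    """Load a pickled model object from disk.

    Only unpickle files you trust: unpickling can execute arbitrary code.
    The library that defined the model must also be installed.
    """
    with Path(path).open("rb") as f:
        return pickle.load(f)


def classify(model, texts):
    """Return one predicted label per input text.

    Assumption: the model has a predict(list_of_str) method.
    """
    return list(model.predict(texts))


if __name__ == "__main__":
    model_path = Path("unigram_tweet_classification_model.p")
    if model_path.exists():
        model = load_model(model_path)
        # Inspect the object first to see what interface it really offers.
        print(type(model))
    else:
        print(f"{model_path} not found; run this from the repo root.")
```

Printing the type of the loaded object before calling anything on it is a cheap way to confirm which library produced the model and which methods are available.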

Results

Test-set results for the n-gram models are provided in unigram_tweet_classification_results.txt, bigram_tweet_classification_results.txt, and trigram_tweet_classification_results.txt.
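For readers unfamiliar with the unigram, bigram, and trigram naming used for the models and result files, the sketch below shows what those terms mean for word sequences. It assumes lowercased, whitespace-split word n-grams; the tokenization the notebooks actually use may differ, and `word_ngrams` is a hypothetical helper.

```python
def word_ngrams(text, n):
    """Return the contiguous word n-grams of a text.

    Assumption: lowercase the text and split on whitespace.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "The hotel was great"
unigrams = word_ngrams(sentence, 1)  # single words
bigrams = word_ngrams(sentence, 2)   # adjacent word pairs
trigrams = word_ngrams(sentence, 3)  # adjacent word triples
```

Longer n-grams capture more local word order but occur less often, which is one reason to compare the three model variants.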

Labeling

Tripadvisor data with country labels from each model is included in the repo as unigram_tripadvisor.csv, bigram_tripadvisor.csv, and trigram_tripadvisor.csv.
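The labeling step can be pictured as adding one predicted-country column to each review row. The sketch below is an illustration under assumptions, not the notebook's code: the `Review` column name, the `country` output column, the `label_rows` helper, and the toy predictor with its placeholder labels are all hypothetical.

```python
import csv
import io


def label_rows(reader, text_column, predict, label_column="country"):
    """Yield each input row with an added predicted-label column.

    Assumption: predict takes a list of strings and returns one label each.
    """
    for row in reader:
        labeled = dict(row)
        labeled[label_column] = predict([row[text_column]])[0]
        yield labeled


def toy_predict(texts):
    # Hypothetical stand-in for a trained model; labels are placeholders.
    return ["country_A" if "harbour" in t else "country_B" for t in texts]


# Tiny in-memory stand-in for a review file (column names are assumed).
sample_csv = "Review,Rating\nlovely hotel near the harbour,5\ngreat parking lot,4\n"

labeled = list(
    label_rows(csv.DictReader(io.StringIO(sample_csv)), "Review", toy_predict)
)
```

With a real model, `toy_predict` would be replaced by the loaded model's prediction call and the in-memory sample by the opened review file.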

Graphs

All graphs were created using the code in Project1_createGraphs.ipynb.

JTSIV1/English-Speaking-Country-Text-Classification-Model
