Aayush-S/NewsMatch


NewsMatch

Description

NewsMatch: A Political Bias Recommendation and Visualization Website.

We have built a system that clusters news articles by topic and determines their political bias. Our visualization system makes it easy for users to see how different metadata is distributed across articles and to find interesting articles to read. NewsMatch also recommends articles with similar and with different ideological leanings, so users can see multiple perspectives on a given topic.
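As a rough illustration of the recommendation idea described above, the sketch below picks same-topic articles whose bias is closest to, and farthest from, a selected article. It is not the code used by NewsMatch: the `Article` fields, the numeric bias scale, and the `recommend` helper are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    topic: int    # hypothetical topic-cluster id
    bias: float   # hypothetical bias score from -1.0 (left) to 1.0 (right)


def recommend(selected, articles, k=2):
    """Return (similar, different): same-topic articles whose bias score is
    closest to and farthest from the selected article's score."""
    same_topic = [a for a in articles
                  if a.topic == selected.topic and a is not selected]
    # Sort by absolute bias distance from the selected article.
    by_distance = sorted(same_topic, key=lambda a: abs(a.bias - selected.bias))
    similar = by_distance[:k]
    different = by_distance[::-1][:k]
    return similar, different


articles = [
    Article("A", topic=0, bias=-0.8),
    Article("B", topic=0, bias=-0.6),
    Article("C", topic=0, bias=0.1),
    Article("D", topic=0, bias=0.9),
    Article("E", topic=1, bias=-0.7),
]
similar, different = recommend(articles[0], articles, k=1)
print([a.title for a in similar], [a.title for a in different])
```

With these made-up scores, selecting article A yields B as the similar recommendation and D as the different one. If `k` exceeds half the number of same-topic articles, the two lists overlap.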

Installation

Setting up the Dataset

If you only want to use the first 10k articles of the dataset, no setup is needed!

If you want to use more of the dataset (the app will run slower):

  1. Install the most recent SQLite command-line tools
  2. Download the clustered_articles_100k.csv file (or whichever size file you like) from here and save the file to the CODE/app/server directory
  3. Point the setup script at your CSV file: open CODE/app/server/setup.sql and change line 14 to the name of your file
    • This step is already complete if you are using the clustered_articles_100k.csv file
  4. Open a terminal window and navigate to the CODE/app/server directory
  5. Run the command sqlite3 full_database.db < setup.sql
    • For the clustered_articles_100k.csv file, this command will take about 60 seconds to run
  6. Confirm that the file full_database.db is now in the CODE/app/server directory
  7. Set the new database name on line 27 of CODE/app/server/app.py.
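After step 6, you may want to confirm that the import actually populated the database. The following is a minimal sketch using Python's built-in `sqlite3` module. The table names created by setup.sql are not documented here, so the script simply lists whatever tables exist. The `summarize_database` helper is a hypothetical name, not part of the repository.

```python
import os
import sqlite3


def summarize_database(path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    conn = sqlite3.connect(path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
        return {
            t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in tables
        }
    finally:
        conn.close()


if __name__ == "__main__":
    # Run from CODE/app/server. The existence check avoids creating an
    # empty database file when the name is wrong.
    if os.path.exists("full_database.db"):
        for table, rows in summarize_database("full_database.db").items():
            print(f"{table}: {rows} rows")
    else:
        print("full_database.db not found in the current directory")
```

An empty result or a row count of zero suggests that the CSV file name in setup.sql does not match the downloaded file.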

Backend

Ensure you have Python installed. Installing the required packages inside a Python virtual environment or a conda environment is recommended.

From the home directory, run the following commands:

cd CODE/app/server
pip install -r requirements.txt

Frontend

Install the most recent Node runtime.

From the home directory, run the following commands:

cd CODE/app/client
npm install

Execution

Backend

From the home directory, run the following commands:

cd CODE/app/server
python app.py

Now the backend is running on http://127.0.0.1:5000!
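To check from a script that the backend is reachable, a small probe like the one below may help. It is only a sketch: the backend's routes are not documented in this README, so the probe treats any HTTP response, including an error status, as a sign that the server is running. The `backend_is_up` helper is a hypothetical name.

```python
import urllib.error
import urllib.request


def backend_is_up(base_url="http://127.0.0.1:5000", timeout=3.0):
    """Return True if anything answers HTTP at base_url, even with an error status."""
    # Bypass any system proxy settings, since the server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        opener.open(base_url, timeout=timeout).close()
        return True
    except urllib.error.HTTPError:
        # The server answered (for example 404 if "/" has no route).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or otherwise unreachable.
        return False


if __name__ == "__main__":
    print("backend reachable:", backend_is_up())
```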

Frontend

Open a new terminal, and from the home directory, run the following commands:

cd CODE/app/client
npm install
npm start

The app is now viewable in any browser at http://localhost:3000! To use our system, first select a topic, then select an article, and finally view recommendations.

Files

Below is a description of all of the important directories and files in this repository.

| Directory | File | Description |
| --- | --- | --- |
| DOC | - | Folder containing the project report and poster |
| CODE | - | Folder containing all project code |
| CODE/app | - | Folder containing the project app visualization |
| CODE/dev | - | Folder containing all data processing and model training code |
| CODE/dev/Bias_Classification | - | Folder containing data processing and model training code for classifying bias |
| CODE/dev/Bias_Classification | Process_Data.ipynb | Google Colab notebook for pre-processing the dataset |
| CODE/dev/Bias_Classification | Sklearn_Models.ipynb | Google Colab notebook for training and saving sklearn models using the processed dataset |
| CODE/dev/Bias_Classification/sklearn | - | Folder containing saved sklearn model checkpoints for various experiments [1] |
| CODE/dev/Bias_Classification | Fine_Tune.ipynb | Google Colab notebook for fine-tuning and saving DistilBERT models using the processed dataset |
| CODE/dev/Bias_Classification/fine_tune | - | Folder containing saved DistilBERT model checkpoints for various experiments [2] |
| CODE/dev/Keywords | - | Folder containing code for keyword extraction and clustering |
| CODE/dev/Keywords | Keyword_Extraction.ipynb | Google Colab notebook for cleaning article text and extracting keywords |
| CODE/dev/Keywords | Keyword_Clustering.ipynb | Google Colab notebook for clustering extracted keywords and assigning topic labels |

[1] Provided model checkpoints are only for the last training epoch, to reduce the size of this repository; however, each experiment folder contains a logs.json file with training and validation metrics for all epochs.
[2] Each experiment folder contains a {EXPERIMENT_NAME}.txt file with a Google Drive link to the model checkpoints and the logs.json file with training and validation metrics for all epochs.
