Welcome to our final project for Intro to Computational Linguistics, Spring 2023! The code in this repo picks 7,000 songs from the dataset at random, cleans and lemmatizes the lyrics, and runs three classifiers to predict each song's genre.
This script is best run in a Jupyter Notebook against the dataset linked above (download it from the site as 'train.csv'), but it could easily be adapted to run against other data or as a standalone Python script.
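For readers adapting the code to other data, here is a minimal sketch of the sampling and cleaning steps using only the standard library. It is an illustration, not the notebook's actual code: the column names `Lyrics` and `Genre`, the helper names, and the cleaning rules are all assumptions, and lemmatization and the classifiers are left out because they depend on third-party libraries.

```python
import csv
import io
import random
import re


def load_songs(csv_file, text_col="Lyrics", label_col="Genre"):
    """Read (lyrics, genre) pairs from an open CSV file.

    The column names are assumptions; change them to match your data.
    """
    reader = csv.DictReader(csv_file)
    return [(row[text_col], row[label_col]) for row in reader]


def sample_rows(rows, k=7000, seed=0):
    """Return up to k rows chosen uniformly at random, without replacement."""
    rows = list(rows)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(rows, min(k, len(rows)))


def clean_lyrics(text):
    """Lowercase, drop bracketed tags like [Chorus], keep letters and apostrophes."""
    text = text.lower()
    text = re.sub(r"\[[^\]]*\]", " ", text)   # remove section tags
    text = re.sub(r"[^a-z' ]+", " ", text)    # remove digits and punctuation
    return " ".join(text.split())             # collapse repeated whitespace


# Tiny in-memory stand-in for train.csv (made-up rows, for illustration only).
demo_csv = io.StringIO(
    "Lyrics,Genre\n"
    "\"[Chorus] Hello, WORLD!! It's me\",Pop\n"
    "\"Riding down the highway, 24/7\",Country\n"
)

songs = load_songs(demo_csv)
sampled = sample_rows(songs, k=7000, seed=42)
cleaned = [(clean_lyrics(lyrics), genre) for lyrics, genre in sampled]
```

To run this against the real file, replace `demo_csv` with `open("train.csv", newline="", encoding="utf-8")` and adjust the column names as needed. The `cleaned` list can then be passed to whatever lemmatizer and classifiers you prefer.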
Enjoy!