Lyrics Genre Classification

For University of San Diego ADS-509 Summer 2023

Created by Hunter Blum, Kyle Esteban Dalope, & Nicholas Lee

The model was trained on a sample of approximately 1,000 well-known songs per genre. Given any set of lyrics, it analyzes the text and predicts the probability of each genre.

How to Use the Application:

Navigate to the following page: http://hunterblum.pythonanywhere.com/#
The first page gives a general overview of how the app was created. To begin making predictions, click Try the App Now (in the black box), click the App tab above, or go directly to http://hunterblum.pythonanywhere.com/prediction.html.
Enter any lyrics or other text in the text box on the page.
Click Submit and review the output, which looks like the example below.
The output shows the probability of each genre. For the most likely genre, the words are highlighted: words in green supported that prediction, and words in red weighed against it.
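As a rough illustration of the highlighting step, the sketch below assumes the per-word contributions toward the most likely genre are already available as a dictionary. The function name `highlight_lyrics` and the exact colors are hypothetical and not taken from the app's code: positive contributions are shaded green, negative ones red, and words with no contribution are left plain.

```python
import html

# Punctuation stripped before looking a word up (hypothetical choice).
PUNCTUATION = ".,!?;:\"'()"


def highlight_lyrics(text: str, contributions: dict) -> str:
    """Wrap each word in a colored <span> based on the sign of its contribution.

    `contributions` maps a lowercase word to its (assumed) contribution toward
    the most likely genre: positive supports it, negative argues against it.
    """
    pieces = []
    for word in text.split():
        weight = contributions.get(word.lower().strip(PUNCTUATION), 0.0)
        safe = html.escape(word)  # never inject raw user text into HTML
        if weight > 0:
            pieces.append(f'<span style="background-color: lightgreen">{safe}</span>')
        elif weight < 0:
            pieces.append(f'<span style="background-color: lightcoral">{safe}</span>')
        else:
            pieces.append(safe)
    return " ".join(pieces)
```

Escaping the input matters here because the lyrics come straight from the user's text box and are rendered back into the page.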

Example Output

Repository Contents:

Jupyter Notebooks:

There are five Jupyter notebooks, one for each major step in developing the final product.

API_DataPull_Jupyter Notebook
- The first notebook contains the code for pulling raw lyrical data from the Genius API.
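A minimal sketch of querying the Genius API search endpoint with only the Python standard library. The access token is a placeholder you would obtain from Genius, and the helper names are hypothetical; the notebook itself may use a client library instead. This sketch only retrieves song metadata (title, artist, page URL) from search results; how the lyric text is then collected is not shown here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

GENIUS_API = "https://api.genius.com"


def build_search_request(query: str, access_token: str) -> Request:
    """Build (but do not send) a GET request to the Genius /search endpoint."""
    url = f"{GENIUS_API}/search?{urlencode({'q': query})}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


def parse_search_hits(payload: dict) -> list:
    """Extract title, primary artist, and page URL from a /search JSON response."""
    results = []
    for hit in payload.get("response", {}).get("hits", []):
        song = hit.get("result", {})
        results.append({
            "title": song.get("title"),
            "artist": song.get("primary_artist", {}).get("name"),
            "url": song.get("url"),
        })
    return results


def search_songs(query: str, access_token: str, timeout: float = 10.0) -> list:
    """Send the request and parse the hits (requires network access and a valid token)."""
    with urlopen(build_search_request(query, access_token), timeout=timeout) as resp:
        return parse_search_hits(json.load(resp))
```

Keeping request construction and response parsing separate from the network call makes both easy to check offline.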
Preprocessing_Jupyter Notebook
- The second notebook contains the code for preprocessing the raw data into normalized, clean text and tokens.
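The kind of normalization described above can be sketched as follows. This is an assumed pipeline, not the notebook's exact steps: it drops bracketed section labels such as [Chorus] (which appear in Genius lyrics), lowercases, removes punctuation other than apostrophes, and filters a tiny hypothetical stopword list. Note that the character filter also drops non-ASCII letters.

```python
import re

SECTION_TAG = re.compile(r"\[[^\]]*\]")    # e.g. [Chorus], [Verse 1]
NON_WORD = re.compile(r"[^a-z0-9'\s]")     # anything except letters, digits, apostrophes, whitespace

# Tiny illustrative stopword list; a real pipeline would likely use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "on", "i", "you", "it"}


def normalize(lyrics: str) -> str:
    """Return lowercase, punctuation-free lyrics with section labels removed."""
    text = SECTION_TAG.sub(" ", lyrics)
    text = NON_WORD.sub(" ", text.lower())
    return " ".join(text.split())  # collapse runs of whitespace and newlines


def tokenize(clean: str, stopwords=STOPWORDS) -> list:
    """Split normalized text into tokens, dropping stopwords."""
    return [tok for tok in clean.split() if tok not in stopwords]
```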
EDA_Jupyter Notebook
- The third notebook contains the code for exploratory data analysis, covering genre distribution, song length, and other descriptive statistics of the collected songs.
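Two of the statistics mentioned above, genre distribution and song length, can be computed with the standard library alone. This sketch assumes each song is a dictionary with hypothetical `genre` and `lyrics` fields and measures length in words; the notebook may define length differently.

```python
from collections import Counter
from statistics import mean, median


def genre_distribution(songs):
    """Count how many songs belong to each genre."""
    return Counter(song["genre"] for song in songs)


def length_stats_by_genre(songs):
    """Mean, median, and maximum song length (in words) for each genre."""
    lengths = {}
    for song in songs:
        lengths.setdefault(song["genre"], []).append(len(song["lyrics"].split()))
    return {
        genre: {"mean": mean(vals), "median": median(vals), "max": max(vals)}
        for genre, vals in lengths.items()
    }
```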
Modeling_Jupyter Notebook
- The fourth notebook contains the code for training and comparing machine learning models for the multilabel genre problem. This work is the foundation of the model that powers the final application. Ultimately, a linear SVC model was selected as the best-performing model.
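To show what a linear SVC does with lyrics, here is a from-scratch sketch of a one-vs-rest linear SVM on bag-of-words counts, trained by stochastic gradient descent on the hinge loss with L2 regularization. This is a stand-in for illustration only: in practice a library implementation (for example scikit-learn's `LinearSVC`, which by default optimizes a squared hinge loss) is the usual choice, and the notebook's features, hyperparameters, and probability calibration are not reproduced here. All function names and default values below are hypothetical.

```python
from collections import Counter


def featurize(text: str) -> Counter:
    """Bag-of-words counts; stands in for the project's text vectorizer."""
    return Counter(text.lower().split())


def train_binary_svm(examples, epochs=30, lr=0.1, lam=0.01):
    """SGD on the primal linear-SVM objective: lam/2 * ||w||^2 + hinge loss.

    `examples` is a list of (features, y) pairs with y in {+1, -1}.
    """
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in examples:
            margin = y * (sum(w.get(t, 0.0) * v for t, v in x.items()) + b)
            for t in w:
                w[t] *= 1 - lr * lam  # L2 shrinkage (bias is not regularized)
            if margin < 1:  # hinge loss is active: move toward the correct side
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
                b += lr * y
    return w, b


def train_one_vs_rest(docs):
    """docs: list of (genre, lyrics). Returns {genre: (weights, bias)}."""
    data = [(genre, featurize(text)) for genre, text in docs]
    genres = sorted({genre for genre, _ in docs})
    return {
        g: train_binary_svm([(x, 1 if label == g else -1) for label, x in data])
        for g in genres
    }


def decision_scores(models, text):
    """Raw SVM score for each genre (not a probability)."""
    x = featurize(text)
    return {
        g: sum(w.get(t, 0.0) * v for t, v in x.items()) + b
        for g, (w, b) in models.items()
    }


def predict(models, text):
    """Pick the genre whose binary classifier gives the highest score."""
    scores = decision_scores(models, text)
    return max(scores, key=scores.get)
```

SVM decision scores are not probabilities; turning them into per-genre probabilities requires an extra calibration step, which this sketch leaves out.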
ModelExplanation_Jupyter Notebook
- The fifth notebook contains the code for using eli5 to examine how the model arrives at the likelihood of each genre for a given set of lyrics.
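For a linear model, eli5 explains a single prediction roughly by listing each feature's contribution (its value times the model's coefficient for that class) together with a `<BIAS>` term; these contributions add up to the decision score. The sketch below reproduces that idea by hand for one genre. It is not eli5's API, and the weights in any example are made up.

```python
from collections import Counter


def explain_linear(weights: dict, bias: float, text: str):
    """Rank each word's contribution (count * coefficient) to one genre's score.

    Positive contributions support the genre and negative ones oppose it,
    which is the same split the app shows as green and red words.
    """
    features = Counter(text.lower().split())
    contributions = {word: weights.get(word, 0.0) * count for word, count in features.items()}
    contributions["<BIAS>"] = bias
    return sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
```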

Lyric Data

Raw_Genius_API_Data
- The raw data set contains the data pulled directly from the Genius API, without any preprocessing or other alterations.
Preprocessed_Data
- The preprocessed data contains the output of a complete run of the second notebook and is ready for modeling.

Flask Application Files:

Templates Folder
- This folder contains the core files for the application, such as all HTML source code, the model and text vectorizer pickle objects, and the Python script used for predictions.
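Saving and loading the model and vectorizer pickles might look like the sketch below. The file names `model.pkl` and `vectorizer.pkl` and the helper names are hypothetical; the repository's actual pickle names and prediction script may differ. Only unpickle files you trust, because loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical file names; the repository's pickles may be named differently.
MODEL_FILE = "model.pkl"
VECTORIZER_FILE = "vectorizer.pkl"


def save_artifacts(model, vectorizer, folder):
    """Pickle a fitted model and vectorizer into `folder`."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    with open(folder / MODEL_FILE, "wb") as f:
        pickle.dump(model, f)
    with open(folder / VECTORIZER_FILE, "wb") as f:
        pickle.dump(vectorizer, f)


def load_artifacts(folder):
    """Load the pickled model and vectorizer (trusted files only)."""
    folder = Path(folder)
    with open(folder / MODEL_FILE, "rb") as f:
        model = pickle.load(f)
    with open(folder / VECTORIZER_FILE, "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


if __name__ == "__main__":
    # Round-trip placeholder objects through a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        save_artifacts({"weights": [0.1]}, {"vocabulary": ["love"]}, tmp)
        print(load_artifacts(tmp))
```

Loading both objects once when the app starts, rather than on every request, keeps predictions fast. Assuming scikit-learn objects, the prediction script would then pass `vectorizer.transform([lyrics])` to the model.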
Static Folder
- This folder contains the CSS files needed by Flask for aesthetics and application design.
App.py
- App.py is the file read by Flask that drives the application.

Acknowledgements:

References:


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lyrics Genre Classification

How to Use the Application:

Repository Contents:

Acknowledgements:

About

Releases

Packages

Contributors 3

Languages

hunterblum/Multi-Genre-Classification

Folders and files

Latest commit

History

Repository files navigation

Lyrics Genre Classification

How to Use the Application:

Repository Contents:

Acknowledgements:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages