For University of San Diego ADS-509 Summer 2023
Created by Hunter Blum, Kyle Esteban Dalope, & Nicholas Lee
The model is trained on a sample of approximately 1,000 well-known songs per genre. Given any set of lyrics, it analyzes the text and predicts the probability that the lyrics belong to each genre.
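This README does not say how the genre probabilities are computed. If the model is scikit-learn's `LinearSVC` (an assumption), it has no `predict_proba` method, so probabilities need an extra step such as calibration. One simple alternative, shown here only as an illustration, is a softmax over the per-genre decision scores. The genres and scores below are made up and are not real model output.

```python
import math

def scores_to_probabilities(scores):
    """Turn per-genre decision scores into values that sum to 1 using a softmax."""
    # Subtract the largest score before exponentiating to avoid overflow.
    top = max(scores.values())
    exps = {genre: math.exp(score - top) for genre, score in scores.items()}
    total = sum(exps.values())
    return {genre: value / total for genre, value in exps.items()}

# Hypothetical decision scores for one set of lyrics (illustrative only).
scores = {"country": 1.2, "hip-hop": -0.4, "pop": 0.3, "rock": -0.9}
probs = scores_to_probabilities(scores)
for genre, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{genre}: {p:.1%}")
```

The softmax keeps the ranking of the raw scores, so the most likely genre is unchanged. If genres are treated as independent labels, as a multilabel setup would suggest, a separate sigmoid per genre would replace the softmax.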
- Navigate to the following page: http://hunterblum.pythonanywhere.com/#
- The first page gives a general overview of how the app was created. To begin making predictions, click Try the App Now (in the black box), click the App tab at the top of the page, or go directly to http://hunterblum.pythonanywhere.com/prediction.html.
- Enter any lyrics or other text in the text box on the page.
- Click Submit and review the output; an example is shown below.
- The output shows the probability of each genre. For the most likely genre, the lyrics are highlighted: words in green supported that prediction, and words in red counted against it.
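This README does not describe how the highlighting is computed. For a linear model, each word's contribution to a genre's score is its coefficient times its feature value. Assuming non-negative features such as word counts or TF-IDF weights (the actual text vectorizer is not specified here), the sign of a word's contribution is simply the sign of its coefficient. The sketch below applies that rule to produce HTML; the function name `highlight_html` and the coefficients are hypothetical, not taken from the app.

```python
import html
import re

def highlight_html(lyrics, weights):
    """Color words that raise the predicted genre's score green and words that lower it red.

    `weights` maps a lowercase word to a (hypothetical) linear-model coefficient
    for the most likely genre. Words missing from `weights` are left unstyled.
    """
    parts = []
    # Alternate between runs of letters/apostrophes and runs of everything else,
    # so joining the pieces reproduces the original spacing and punctuation.
    for token in re.findall(r"[A-Za-z']+|[^A-Za-z']+", lyrics):
        weight = weights.get(token.lower(), 0.0)
        text = html.escape(token)
        if weight > 0:
            parts.append(f'<span style="color:green">{text}</span>')
        elif weight < 0:
            parts.append(f'<span style="color:red">{text}</span>')
        else:
            parts.append(text)
    return "".join(parts)

# Hypothetical coefficients for a "country" class; not taken from the real model.
country_weights = {"truck": 0.9, "whiskey": 0.7, "club": -0.6}
print(highlight_html("Whiskey in my truck, not the club", country_weights))
```

Escaping each token before wrapping it keeps user-supplied lyrics from injecting markup into the results page.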
Example Output
Jupyter Notebooks:
There are five Jupyter notebooks, one for each major step in developing the final product.
- API_DataPull_Jupyter Notebook
- The first notebook contains the code for pulling raw lyrical data from the Genius API.
- Preprocessing_Jupyter Notebook
- The second notebook contains the code for preprocessing the raw data into normalized, clean text and tokens.
- EDA_Jupyter Notebook
- The third notebook contains the code for exploratory data analysis, covering the genre distribution, song length, and other descriptive statistics of the pulled songs.
- Modeling_Jupyter Notebook
- The fourth notebook contains the code for building and comparing machine learning models for this multilabel problem. It is the foundation of the model that powers the final application; a linear SVC model was ultimately selected as the best model.
- ModelExplanation_Jupyter Notebook
- The fifth notebook contains the code for using eli5 to examine how the model predicted the likelihood of each genre for each set of lyrics.
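The exact preprocessing steps are not listed here. A minimal sketch of one common approach follows, assuming that the raw lyrics may contain bracketed section tags such as `[Chorus]`, and that punctuation and a small stop-word list should be removed. `STOP_WORDS`, `normalize`, and `tokenize` are hypothetical names, and the stop-word list is illustrative only.

```python
import re
import string

# Small illustrative stop-word list; the notebook's actual list is not specified.
STOP_WORDS = {"a", "an", "and", "the", "i", "you", "to", "of", "in", "my", "is"}

def normalize(lyrics):
    """Return lowercase lyrics with bracketed tags and punctuation removed."""
    text = re.sub(r"\[[^\]]*\]", " ", lyrics)  # drop tags such as "[Chorus]" (assumed format)
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())  # collapse newlines and repeated spaces

def tokenize(clean_text, stop_words=STOP_WORDS):
    """Split normalized text on whitespace and drop stop words."""
    return [tok for tok in clean_text.split() if tok not in stop_words]

raw = "[Chorus]\nI'm walkin' on the wire,\nAnd the night is on FIRE!"
clean = normalize(raw)
tokens = tokenize(clean)
print(clean)
print(tokens)
```

Removing apostrophes turns contractions like "I'm" into "im", which keeps the vocabulary small at the cost of merging some forms; the notebook may handle contractions differently.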
Lyric Data
- The raw data set contains the data pulled directly from the Genius API, without any preprocessing or other alterations.
- The preprocessed data set contains the output of a complete run of the second notebook and is ready for modeling.
Flask Application Files:
- Templates Folder
- This folder contains the files the application is built on: all HTML source code, the pickled model and text vectorizer, and the Python script used for predictions.
- Static Folder
- This folder contains the CSS files Flask uses for the application's styling and layout.
- App.py
- App.py is the file read by Flask that drives the application.
References: