This repository contains scripts for analyzing music features, training recommendation models, and building a real-time recommendation system using Apache Kafka.
- load1.py: Extracts audio features using Librosa and prints them for each audio file. Also includes plotting of normalized MFCCs, Spectral Centroid, and Zero-Crossing Rate.
- mongodb1.py: Inserts audio features into MongoDB for storage and retrieval.
- connector.py: Connects Apache Spark with MongoDB to read data into Spark DataFrames.
- PHASE2.py: Trains a music recommendation model using Annoy and performs nearest neighbor search.
- producer.py: Streams music features to Kafka for real-time processing.
- consumer.py: Consumes music recommendations from Kafka and applies them.
- app.py: A web application to upload audio files and get insights.
- index.html: A simple web interface for uploading files and displaying insights.
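To make two of the features named for load1.py concrete, the sketch below computes a zero-crossing rate and a spectral centroid with plain NumPy on a synthetic tone. It is not the code in load1.py, which uses Librosa and works frame by frame; the function names, the whole-signal formulation, and the 440 Hz test tone are assumptions chosen for illustration.

```python
import numpy as np


def zero_crossing_rate(signal: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(signal)
    return float(np.mean(signs[1:] != signs[:-1]))


def spectral_centroid(signal: np.ndarray, sample_rate: int) -> float:
    """Magnitude-weighted mean frequency (Hz) over the whole signal."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = magnitudes.sum()
    return float((freqs * magnitudes).sum() / total) if total > 0 else 0.0


# Hypothetical input: one second of a pure 440 Hz sine wave.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

zcr = zero_crossing_rate(tone)
centroid = spectral_centroid(tone, sample_rate)
print(f"zcr={zcr:.4f} centroid={centroid:.1f} Hz")
```

A pure tone crosses zero twice per cycle, so the rate lands near `2 * 440 / sample_rate`, and the centroid sits at the tone's frequency. Real music spreads energy across many frequencies, which is what makes these features useful for telling tracks apart.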
All metadata and features for all tracks are distributed in fma_metadata.zip (342 MiB).
The tables below can be used with pandas or any other data analysis tool.
See the [paper] or the [usage.ipynb] notebook for a description.
- tracks.csv: per-track metadata such as ID, title, artist, genres, tags and play counts, for all 106,574 tracks.
- genres.csv: all 163 genres with name and parent (used to infer the genre hierarchy and top-level genres).
- features.csv: common features extracted with librosa.
- echonest.csv: audio features provided by Echonest (now Spotify) for a subset of 13,129 tracks.
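As a rough illustration of reading these tables with pandas, the sketch below assumes that tracks.csv uses a two-row column header (category, then field) with the track ID in the first column. The `load_tracks` helper and the in-memory sample rows are placeholders, not data from the dataset; check the usage notebook for the exact layout before relying on this.

```python
import io

import pandas as pd


def load_tracks(source) -> pd.DataFrame:
    """Read a tracks.csv-style table with a two-level column header."""
    return pd.read_csv(source, index_col=0, header=[0, 1])


# Tiny stand-in for tracks.csv; every value below is a placeholder.
sample_csv = io.StringIO(
    ",track,track,artist\n"
    ",title,genre_top,name\n"
    "track_id,,,\n"
    "2,Title A,Rock,Artist X\n"
    "5,Title B,Rock,Artist X\n"
    "10,Title C,Pop,Artist Y\n"
)
tracks = load_tracks(sample_csv)

# Columns are addressed by (category, field) tuples.
genre_counts = tracks[("track", "genre_top")].value_counts()
print(genre_counts)
```

With the real file, passing the path to tracks.csv in place of `sample_csv` should work the same way, provided the header layout matches the assumption above.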
The MP3-encoded audio data is available in several sizes:
- fma_small.zip: 8,000 tracks of 30s, 8 balanced genres (GTZAN-like) (7.2 GiB)
- fma_medium.zip: 25,000 tracks of 30s, 16 unbalanced genres (22 GiB)
- fma_large.zip: 106,574 tracks of 30s, 161 unbalanced genres (93 GiB)
- fma_full.zip: 106,574 untrimmed tracks, 161 unbalanced genres (879 GiB)
- Clone the Repository: Clone or download the repository to your local machine.
  git clone https://github.com/tashi-2004/FMA-A-Dataset-For-Music-Analysis
- Set Up MongoDB: Ensure MongoDB is installed and running on your system. Update MongoDB connection strings in the relevant scripts.
- Set Up Kafka: Install and run Apache Kafka on your system. Update the Kafka broker address in producer.py and consumer.py.
- Run Scripts: Execute the scripts in the following order:
  - Extract and Visualize Audio Features: Run load1.py to extract and visualize audio features from your audio files.
  - Store Audio Features in MongoDB: Run mongodb1.py to store the extracted audio features in MongoDB.
  - Data Analysis with Spark: Run connector.py to connect Spark with MongoDB and perform data analysis using Spark DataFrames.
  - Train Recommendation Models: Run PHASE2.py to train music recommendation models using Annoy and perform nearest neighbor searches.
  - Stream Music Features to Kafka: Run producer.py to stream music features to Kafka for real-time processing.
  - Consume Music Recommendations from Kafka: Run consumer.py to consume music recommendations from Kafka and apply them.
  - Web Interface for Audio Files: Use app.py and index.html to upload audio files via a web interface and get insights.
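The nearest-neighbor step that PHASE2.py performs can be pictured with an exact brute-force search. The NumPy sketch below is a stand-in and not Annoy: Annoy builds an approximate tree-based index, whereas this code compares the query against every vector. The feature matrix and the `nearest_tracks` function are hypothetical.

```python
import numpy as np


def nearest_tracks(features: np.ndarray, query_index: int, k: int = 2) -> list[int]:
    """Return indices of the k rows most cosine-similar to the query row."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    similarity = unit @ unit[query_index]
    similarity[query_index] = -np.inf  # never recommend the query itself
    order = np.argsort(-similarity)  # highest similarity first
    return order[:k].tolist()


# Hypothetical feature vectors, one row per track.
features = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.1],
])
print(nearest_tracks(features, query_index=0, k=2))
```

Brute force costs one comparison per stored track for each query, which is fine for a few thousand tracks. An approximate index such as Annoy trades a little accuracy for much faster lookups on larger collections.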
You can customize the scripts according to your requirements, such as adjusting feature extraction parameters, changing MongoDB or Kafka configurations, or modifying recommendation model algorithms.
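One way to make such customization easier is to read connection settings from environment variables with sensible defaults, rather than editing each script. This is only a sketch: the variable names, the `PipelineConfig` class, the topic name, and the MFCC count are hypothetical, and the repository's scripts may hard-code these values instead.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    mongo_uri: str
    kafka_bootstrap: str
    kafka_topic: str
    n_mfcc: int


def load_config(env=None) -> PipelineConfig:
    """Build a config from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return PipelineConfig(
        # Defaults use the standard local ports for MongoDB and Kafka.
        mongo_uri=env.get("MONGO_URI", "mongodb://localhost:27017"),
        kafka_bootstrap=env.get("KAFKA_BOOTSTRAP", "localhost:9092"),
        kafka_topic=env.get("KAFKA_TOPIC", "music_features"),  # hypothetical topic
        n_mfcc=int(env.get("N_MFCC", "13")),  # hypothetical feature parameter
    )


# Override only the broker address; everything else falls back to defaults.
config = load_config({"KAFKA_BOOTSTRAP": "broker:9092"})
print(config)
```

Calling `load_config()` with no argument reads from the real environment, so the same script can run unchanged on a laptop and on a server.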
- Tashfeen Abbasi
- Laiba Mazhar
- Rafia Khan
Feel free to contribute to this project by submitting issues or pull requests. Enjoy analyzing and recommending music with this comprehensive toolkit!