The project analyses thousands of real messages sent during natural disasters and classifies them into 36 categories of required assistance. The messages were collected either via social media or sent directly to disaster response organizations. The results can help organizations detect the messages most relevant to their assistance efforts.
The project consists of three sections:
- ETL pipeline
- Machine learning pipeline
- Web application with Flask
Run the following commands in the project's root directory to set up the database and model.
- To run the ETL pipeline that cleans the data and stores it in the database:
  python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline that trains the classifier and saves it:
  python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- Go to the app directory:
  cd app
- Run your web app:
  python run.py
The requirements.txt file contains the libraries needed for the project.
You can create a virtual environment for the project with:
git clone <repo>
cd <repo>
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- app/run.py: The main Flask file for the web application to interact with disaster messages and categories.
- data/process_data.py: ETL pipeline that processes message and category data from CSV files and loads them into a SQLite database.
- models/train_classifier.py: Machine learning pipeline that reads from the database to train and save a multi-output supervised learning model that classifies disaster messages into 36 categories.
This project is part of Udacity's Data Scientist program.
Dataset credit: https://appen.com/ (formerly Figure 8)