The Disaster Response Pipelines project is part of Udacity's Data Science Nanodegree Program.
- Installation
- Project Motivation
- File Descriptions
- Instructions
- Licensing, Authors, and Acknowledgements
To run and view this project, it is recommended to have the latest versions of the following:
In this project, I applied data engineering techniques to analyze disaster data from Figure Eight and build a model for an API that classifies disaster messages.
The project consists of three main folders:
- data folder, which contains:
  - disaster_messages.csv: the dataset that includes the messages.
  - disaster_categories.csv: the dataset that includes the message categories.
  - process_data.py: ETL pipeline script that reads the two datasets, merges them, cleans the data, and saves the result into a database file.
  - DisasterResponse.db: the output of the ETL pipeline (a SQLite database containing a table that merges the messages and categories data).
- models folder, which contains:
  - train_classifier.py: machine learning pipeline script that loads data from the SQLite database, splits the dataset into training and test sets, processes the text, trains and tests the classifier, then exports the trained classifier to a pkl file.
  - classifier.pkl: the output of the machine learning pipeline (the trained classifier).
- app folder, which contains:
  - run.py: script that runs the Flask web app.
  - templates folder: contains the HTML files of the web app.
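The ETL steps described for process_data.py (read both CSV files, merge them, clean the result, save it to SQLite) can be sketched roughly as follows. This is a minimal standard-library sketch, not the project's actual script: the column names `id`, `message`, and `categories`, the `name-flag;name-flag` category encoding, the table name `messages`, and the helpers `parse_categories` and `run_etl` are all assumptions made for illustration.

```python
import csv
import io
import sqlite3


def parse_categories(raw):
    """Turn a string such as 'related-1;water-0' into {'related': 1, 'water': 0}.

    The encoding itself is an assumption, not taken from the project files.
    """
    flags = {}
    for part in raw.split(";"):
        name, _, flag = part.rpartition("-")
        flags[name] = int(flag)
    return flags


def run_etl(messages_file, categories_file, conn, table="messages"):
    """Merge the two CSV sources on 'id', drop duplicate ids, store in SQLite."""
    # Keying by id means a repeated id overwrites itself, which removes duplicates.
    messages = {row["id"]: row["message"] for row in csv.DictReader(messages_file)}
    merged = {}
    for row in csv.DictReader(categories_file):
        if row["id"] in messages:  # inner join on id
            merged[row["id"]] = parse_categories(row["categories"])

    names = sorted({name for flags in merged.values() for name in flags})
    columns = ", ".join(f'"{name}" INTEGER' for name in names)
    conn.execute(
        f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY, message TEXT, {columns})'
    )
    placeholders = ", ".join("?" for _ in range(len(names) + 2))
    for message_id, flags in merged.items():
        values = [int(message_id), messages[message_id]]
        values += [flags.get(name, 0) for name in names]
        conn.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', values)
    conn.commit()
    return names


# Tiny in-memory stand-ins for the two CSV files (hypothetical content).
messages_csv = io.StringIO(
    "id,message\n1,We need water\n2,Road is blocked\n2,Road is blocked\n"
)
categories_csv = io.StringIO(
    "id,categories\n1,related-1;water-1\n2,related-1;water-0\n2,related-1;water-0\n"
)

conn = sqlite3.connect(":memory:")
columns = run_etl(messages_csv, categories_csv, conn)
stored = conn.execute(
    "SELECT id, message, related, water FROM messages ORDER BY id"
).fetchall()
```

In this sketch each category becomes its own integer column, so a later training step can read one label per column. Passing open file objects instead of `io.StringIO` would apply the same steps to real CSV files.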
Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline that cleans the data and stores it in the database:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline that trains the classifier and saves it:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
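The stages of the ML pipeline (load from SQLite, split into training and test sets, process the text, train, test, export with pickle) can be illustrated with a small sketch. It is not the code in train_classifier.py: the classifier here is a deliberately simple keyword rule used as a stand-in for whatever model the script trains, and the table layout, the single `water` label, and the helpers `tokenize`, `split`, `train`, and `predict` are assumptions made for illustration.

```python
import io
import pickle
import random
import re
import sqlite3
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows with a fixed seed and cut them into train and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def train(messages, labels):
    """Toy stand-in for a real classifier.

    Keeps every word that appears in more positive than negative messages.
    """
    positive, negative = Counter(), Counter()
    for message, label in zip(messages, labels):
        (positive if label else negative).update(set(tokenize(message)))
    return {"keywords": {w for w, n in positive.items() if n > negative[w]}}


def predict(model, messages):
    """Predict 1 when a message contains at least one learned keyword."""
    return [int(any(t in model["keywords"] for t in tokenize(m))) for m in messages]


# Hypothetical stand-in for the database produced by the ETL step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (message TEXT, water INTEGER)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        ("We need water", 1),
        ("No drinking water here", 1),
        ("Water supply is cut", 1),
        ("Please send bottled water", 1),
        ("Road is blocked", 0),
        ("Bridge collapsed", 0),
        ("House on fire", 0),
        ("Power lines are down", 0),
    ],
)
rows = conn.execute("SELECT message, water FROM messages").fetchall()

train_rows, test_rows = split(rows)
model = train([m for m, _ in train_rows], [y for _, y in train_rows])
predicted = predict(model, [m for m, _ in test_rows])
accuracy = sum(p == y for p, (_, y) in zip(predicted, test_rows)) / len(test_rows)

# Export and reload the trained model, as the pipeline does with its pkl file.
buffer = io.BytesIO()
pickle.dump(model, buffer)
buffer.seek(0)
restored = pickle.load(buffer)
```

Writing to a file opened in binary mode instead of `io.BytesIO` would produce a pkl file on disk. A pickle file should only be loaded when it comes from a trusted source, because unpickling can execute arbitrary code.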
Run the following command in the app's directory to run your web app.
python run.py
Go to http://0.0.0.0:3001/
Credit goes to Udacity's courses for code ideas and motivation, and to Figure Eight for the data.
Author: NYRoomi