An end-to-end ETL pipeline that combines an NLP pipeline and a machine learning pipeline to power a web application: type in a disaster-related message and it is classified into categories for the various disaster relief teams.

Applying concepts and techniques of data engineering (ETL pipelines, in particular machine learning and NLP pipelines) to a disaster messages dataset by Figure Eight to build a model for an API that classifies disaster messages.
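Under the hood, the classifier is an NLP pipeline (tokenization, bag-of-words counts, TF-IDF weighting) feeding a multi-output machine learning model, so a single message can be assigned to several relief categories at once. Below is a minimal sketch of what such a pipeline can look like, assuming scikit-learn and NLTK; the tokenize helper and the choice of RandomForestClassifier are illustrative assumptions, not necessarily what train_classifier.py uses.

```python
# Minimal sketch of an NLP + ML pipeline (assumptions: scikit-learn, NLTK,
# RandomForestClassifier as the estimator).
import re

import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

nltk.download("punkt", quiet=True)
nltk.download("wordnet", quiet=True)


def tokenize(text):
    """Normalize, tokenize, and lemmatize a raw message."""
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(token) for token in word_tokenize(text)]


pipeline = Pipeline([
    ("vect", CountVectorizer(tokenizer=tokenize)),              # bag-of-words counts
    ("tfidf", TfidfTransformer()),                              # TF-IDF weighting
    ("clf", MultiOutputClassifier(RandomForestClassifier())),   # one classifier per category
])
```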
Setting up the database and model:
- Run the ETL pipeline (process_data.py), which cleans the raw data (disaster_messages.csv, disaster_categories.csv) and stores it in a database (DisasterResponse.db); a sketch of this step follows the list:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- Run the ML pipeline (train_classifier.py), which trains the classifier and saves it as classifier.pkl; a sketch of this step also follows the list:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- The pre-trained model can be downloaded from here.
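For orientation, the cleaning done by process_data.py follows the usual ETL shape: load both CSVs, merge them on id, expand the semicolon-separated categories column into one 0/1 column per category, drop duplicates, and write the result to the SQLite database. A minimal sketch, assuming pandas and SQLAlchemy; the messages table name is an assumption.

```python
# Sketch of the ETL cleaning step (assumptions: pandas, SQLAlchemy, "messages" table name).
import pandas as pd
from sqlalchemy import create_engine

# Extract: load and merge the two raw CSV files.
messages = pd.read_csv("data/disaster_messages.csv")
categories = pd.read_csv("data/disaster_categories.csv")
df = messages.merge(categories, on="id")

# Transform: expand "related-1;request-0;..." into one 0/1 column per category.
cats = df["categories"].str.split(";", expand=True)
cats.columns = [value.split("-")[0] for value in cats.iloc[0]]
for col in cats.columns:
    cats[col] = cats[col].str[-1].astype(int)
df = pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()

# Load: store the cleaned data in the SQLite database used by the training step.
engine = create_engine("sqlite:///data/DisasterResponse.db")
df.to_sql("messages", engine, index=False, if_exists="replace")
```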
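The training step then reads the cleaned table back out of the database, fits the text-processing pipeline on messages versus category labels, and pickles the fitted model. Again a sketch under assumptions: the table name, the column layout of the Figure Eight CSVs (id, message, original, genre, then one column per category), and the estimator choice.

```python
# Sketch of the training step (assumptions: table and column names, estimator choice).
import pickle

import pandas as pd
from sqlalchemy import create_engine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

engine = create_engine("sqlite:///data/DisasterResponse.db")
df = pd.read_sql_table("messages", engine)

X = df["message"]                                              # raw message text
Y = df.drop(columns=["id", "message", "original", "genre"])    # one 0/1 column per category

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultiOutputClassifier(RandomForestClassifier())),
])

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
pipeline.fit(X_train, Y_train)

# Persist the fitted model for the web app.
with open("models/classifier.pkl", "wb") as f:
    pickle.dump(pipeline, f)
```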
Running the web app:
python run.py
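run.py is a Flask app that loads the saved database and model and classifies whatever message is typed into the form. The sketch below is a stripped-down illustration of that idea, assuming Flask and the file paths above; the /classify route and JSON response are assumptions, not the actual routes in run.py.

```python
# Stripped-down sketch of the web app idea (assumptions: Flask, paths, /classify route).
import pickle

import pandas as pd
from flask import Flask, jsonify, request
from sqlalchemy import create_engine

app = Flask(__name__)

# Category names come from the cleaned table: columns after id/message/original/genre.
engine = create_engine("sqlite:///data/DisasterResponse.db")
category_names = pd.read_sql_table("messages", engine).columns[4:]

with open("models/classifier.pkl", "rb") as f:
    model = pickle.load(f)


@app.route("/classify")
def classify():
    """Return predicted categories for ?query=<message>."""
    query = request.args.get("query", "")
    labels = model.predict([query])[0]
    return jsonify({cat: int(flag) for cat, flag in zip(category_names, labels)})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3001)
```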
Go to the link: http://0.0.0.0:3001/
Below are some screenshots of how the web application looks:
Type a sample distress message: "We have a lot of problem at Delma 75 Avenue Albert Jode, those people need water and food"
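To see the predicted categories for that message without going through the UI, the saved model can also be queried directly from Python (same path and column-layout assumptions as in the sketches above).

```python
# Classify the sample message directly with the saved model (same assumptions as above).
import pickle

import pandas as pd
from sqlalchemy import create_engine

category_names = pd.read_sql_table(
    "messages", create_engine("sqlite:///data/DisasterResponse.db")
).columns[4:]

with open("models/classifier.pkl", "rb") as f:
    model = pickle.load(f)

message = "We have a lot of problem at Delma 75 Avenue Albert Jode, those people need water and food"
labels = model.predict([message])[0]
print([cat for cat, flag in zip(category_names, labels) if flag == 1])
```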
If you find this repository useful, why not give it a star and let me know about it!
Feel free to raise issues and share feedback as well, cheers!