This project is part of the UBC MDS-Vancouver 2020-2021 DSCI-525 Web and Cloud Computing course curriculum.
In this project, we sequentially take on four roles (1. Data Engineer, 2. Infrastructure, 3. Data Scientist, and 4. DevOps) to perform prediction tasks on the large daily rainfall in Australia dataset (5.7 GB). The project has the following 4 objectives:
- Get the data from the web using an API, process it, and convert it to an efficient file format;
- Move the data to the cloud, set up infrastructure in the cloud, and build a machine learning model;
- Set up distributed infrastructure (Spark) in the cloud and run the same machine learning model;
- Deploy the machine learning model in the cloud so that other consumers can use it.
The purpose is to gain experience working with a large dataset and building and deploying ensemble machine learning models in the cloud.
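
As a rough illustration of the first objective (converting data to a more efficient file format), here is a minimal sketch using only the Python standard library. It is not the project's actual pipeline. The column names (`date`, `rain_mm`, `model`) and the sample rows are hypothetical, and gzip-compressed CSV stands in for whichever format the project uses; a columnar format such as Parquet would need a third-party library.

```python
import csv
import gzip
import io


def csv_to_gzip(rows, header):
    """Serialize rows as CSV text and return (raw_bytes, gzip_bytes)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    raw = buf.getvalue().encode("utf-8")
    return raw, gzip.compress(raw)


def read_gzip_csv(blob):
    """Decompress a gzip blob and parse the CSV into a list of dicts."""
    text = gzip.decompress(blob).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical, synthetic sample data (NOT the real rainfall dataset).
header = ["date", "rain_mm", "model"]
rows = [
    [f"2020-01-{day:02d}", round(0.1 * day, 1), "model_a"]
    for day in range(1, 29)
] * 50

raw, packed = csv_to_gzip(rows, header)
restored = read_gzip_csv(packed)

# Compare sizes and confirm the round trip preserved every row.
print(len(raw), len(packed), len(restored))
```

Repetitive tabular data like this compresses well, which is the same motivation behind storing a multi-gigabyte dataset in a compact format before moving it to the cloud.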
Below is a list of current contributors:
- Yuting (Rachel) Xu
- Saule Atymtayeva
- Mai Le
- Doan Khanh Vu Tran
Current collaboration strategy: