Saksham4796/se_problem_statement

Software Engineering Problem

This repository holds a problem statement prominent in the software industry. The problem statement was partly generated using ChatGPT, a generative model. The software application that satisfies the requirements of the problem statement was also developed using ChatGPT.

The mysql directory is used to build the Docker container that hosts the database. The sales_data_sample.csv file contains the sample sales data for the problem statement and is loaded into the MySQL database. The Dockerfile builds the Docker image containing the sales data hosted on MySQL. The scripts folder contains the .sql files that build the database inside the container.

The python folder is used to build the Docker container that performs the data processing tasks on the dataset. The data_processing_spark.py file processes the dataset with PySpark, so that the work is done in a distributed fashion, and computes the total sales for each commodity for each month of the year. Because Spark writes its output as multiple files, each containing a part of the result, the totals are first stored in several CSV files inside the total_sales.csv folder. These output files are then merged into a single total_sales_merged.csv file.
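The merge step described above can be sketched with the Python standard library alone. This is a minimal illustration and not the code from data_processing_spark.py: the function name merge_part_files is hypothetical, and the sketch assumes that the part files are named part-*.csv and that each one starts with the same header row.

```python
import csv
import glob
import os


def merge_part_files(parts_dir, merged_path):
    """Concatenate part-*.csv files from parts_dir into one CSV with a single header.

    Assumption: every part file begins with the same header row.
    Returns the number of data rows written.
    """
    # Sort so the merged output follows the part numbering.
    part_paths = sorted(glob.glob(os.path.join(parts_dir, "part-*.csv")))
    header_written = False
    rows_written = 0
    with open(merged_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for part_path in part_paths:
            with open(part_path, newline="") as part_file:
                reader = csv.reader(part_file)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty part files
                if not header_written:
                    writer.writerow(header)  # keep only the first header
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Called as merge_part_files("total_sales.csv", "total_sales_merged.csv"), this would read the part files from the output folder and write the merged file next to it. Spark also writes marker files such as _SUCCESS into the output folder, and the part-*.csv pattern skips those.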

The docker-compose.yml file builds both Docker containers and sets up the connection between them.

This distributed application is executed on virtual machine instances on Google Cloud Platform. The VM Instance repo contains the code for creating the instances and running the application on them in a distributed fashion.

The docker_command.md file contains all the relevant Docker commands used in developing this application.

The ChatGPT transcript for solving this problem is available here.
