This repository provides datasets recommended for beginners who want to test their concepts and coding skills by drawing insights from data.
Contributors are expected to open an issue describing their hypothesis and then get that issue assigned to themselves.
Three datasets will be added every week, and the previous week's STAR contributor will be mentioned in the README.
- 1. Titanic - Machine Learning from Disaster
-
The sinking of the Titanic is one of the most infamous shipwrecks in history.
On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.
While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.
In this challenge, we ask you to test various hypotheses that can answer the question: “what sorts of people were more likely to survive?” using passenger data (i.e. name, age, gender, socio-economic class, etc.).
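As a starting point, a hypothesis such as "women were more likely to survive than men" can be checked by comparing survival rates across groups. The sketch below is only an illustration: the rows are toy values made up for this example (not real passenger records), the column names `Sex`, `Pclass`, and `Survived` are assumed to match the Kaggle training file, and `survival_rate_by` is a hypothetical helper name. It uses only the Python standard library; in a notebook, a pandas `groupby` would likely be the more common choice.

```python
import csv
import io
from collections import defaultdict

# Toy rows for illustration only; these are NOT real Titanic records.
# Column names (Sex, Pclass, Survived) are assumed to match the Kaggle file.
SAMPLE_CSV = """Sex,Pclass,Survived
female,1,1
female,3,1
female,3,0
male,1,1
male,2,0
male,3,0
male,3,0
female,2,1
"""


def survival_rate_by(rows, column):
    """Return {group value: fraction of rows in that group with Survived == 1}."""
    survived = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = row[column]
        total[key] += 1
        survived[key] += int(row["Survived"])
    return {key: survived[key] / total[key] for key in total}


# Parse the toy data the same way a real CSV file would be read.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

rates_by_sex = survival_rate_by(rows, "Sex")
rates_by_class = survival_rate_by(rows, "Pclass")

for group, rate in sorted(rates_by_sex.items()):
    print(f"Sex={group}: survival rate {rate:.2f}")
for group, rate in sorted(rates_by_class.items()):
    print(f"Pclass={group}: survival rate {rate:.2f}")
```

To try this on the actual dataset, replace `io.StringIO(SAMPLE_CSV)` with an open file handle to the downloaded CSV. A difference in rates alone does not confirm a hypothesis; it only suggests where to look further.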
Read more about the dataset on kaggle...
Contributors are expected to base their analysis on a stated hypothesis and maintain the notebook format.
- 2. Airplane Crashes and Fatalities
-
This dataset showcases Boeing 707 accidents that have occurred since 1948. The data includes the date, time, location, operator, flight number, route, type of aircraft, registration number, cn/In, number of persons on board, fatalities, ground fatalities, and a summary of the accident.
Read more about the dataset on kaggle...
Contributors are expected to base their analysis on a stated hypothesis and maintain the notebook format.
- 3. Student Performance Data Set
-
This dataset covers student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, and the data was collected using school reports and questionnaires. Two datasets are provided, covering performance in two distinct subjects: Mathematics (mat) and Portuguese language (por).
Read more about the dataset on kaggle...
Contributors are expected to base their analysis on a stated hypothesis and maintain the notebook format.
Maintainer: KUSHAGRA SRIVASTAVA
Visit Us: mycin.in
In case of queries, contact us at [email protected].