Before installation, please ensure you have the following tools installed:
- Git - Learn Git step-by-step by following the instructions provided here.
- Anaconda
- Jupyter Package
- Fork the project: Fork the sanjay-kv/Stackoverflow-Analysis repository. Follow these instructions on how to fork a repository.
- Clone the project:
git clone git@github.com:your-username/Stackoverflow-Analysis.git
- Download the original data from the drive link
- Place the downloaded file in the project folder, then open Jupyter Notebook. Make sure the notebook points to the correct file path.
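A wrong file path is the most common stumbling block at this step. Here is a minimal sketch of loading the data so that a bad path fails early with a clear message. The helper name `load_survey` and any file name you pass to it are assumptions for illustration, not names taken from this repository.

```python
from pathlib import Path

import pandas as pd


def load_survey(csv_path):
    """Read a survey CSV into a DataFrame, failing early on a wrong path.

    `load_survey` is a hypothetical helper, not part of this project.
    """
    path = Path(csv_path)
    if not path.is_file():
        # Show the absolute path so it is obvious where pandas was looking.
        raise FileNotFoundError(f"Survey file not found: {path.resolve()}")
    # low_memory=False parses the file in a single pass, so the dtype of each
    # column is inferred from the whole column rather than chunk by chunk.
    return pd.read_csv(path, low_memory=False)
```

In a notebook cell you could then call something like `df = load_survey("your-downloaded-file.csv")`, replacing the placeholder with the actual name of the file you downloaded.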
We welcome contributions from all levels of experience. If you think the community would benefit from being walked through the steps you're going through, please share! ❤️
- To analyze three years of Stack Overflow survey data and extract insights.
- To answer the following questions through data analysis:
  - Impact of higher education on the salary of the surveyed developers.
  - Impact of education, experience, and responsibilities on gender inequality.
  - Differences in participation rate across ethnicities.
  - Whether there is any difference between men's and women's income.
  - Whether developers' interest in a language in the previous year leads to an increase in its popularity in the current year.
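As a rough illustration of how the salary questions above could be approached, the sketch below compares median salaries across groups with pandas. The toy data and the column names `Gender`, `EdLevel`, and `Salary` are assumptions for the example and may not match the columns in the actual survey files.

```python
import pandas as pd

# Toy data standing in for the survey; column names are hypothetical.
responses = pd.DataFrame(
    {
        "Gender": ["Man", "Woman", "Man", "Woman", "Man", "Woman"],
        "EdLevel": ["Bachelor", "Bachelor", "Master", "Master", "Bachelor", "Master"],
        "Salary": [60000, 55000, 80000, 78000, 64000, 82000],
    }
)


def median_salary_by(df, group_cols, salary_col="Salary"):
    """Return the median salary for each group (hypothetical helper)."""
    return df.groupby(group_cols)[salary_col].median()


# Median salary per gender, then per gender and education level.
by_gender = median_salary_by(responses, "Gender")
by_gender_and_education = median_salary_by(responses, ["Gender", "EdLevel"])
```

The median is used here instead of the mean because salary data tends to contain extreme values. That is a design choice for this sketch, not a statement about how the project computes its results.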
Stack Overflow is a professional community for developers, conducting a survey annually. Analyzing the dataset professionally using modern tools can enable us to answer real-world questions effectively. The dataset covers 275 questions in total.
- Perform Analysis on the last 3 years' Stack Overflow Dataset to extract insights.
- Analyze the impact of higher education, experience, and responsibilities on salary and gender inequalities.
- Investigate participation rates based on ethnicity and differences in income between men and women.
- Explore the popularity of programming languages and predict their growth based on survey responses.
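One possible way to explore language popularity is to compare the share of respondents who wanted to use a language in one year with the share who actually used it in the next. The sketch below assumes that multi-select answers are stored as semicolon-separated strings. The column names `LanguageDesireNextYear` and `LanguageWorkedWith` and the toy values are likewise assumptions for illustration.

```python
import pandas as pd


def language_share(series, sep=";"):
    """Share of answering respondents who mention each language.

    Hypothetical helper: assumes one delimited string per respondent.
    """
    answered = series.dropna()
    # Split each answer into one row per language, then count occurrences.
    exploded = answered.str.split(sep).explode().str.strip()
    return exploded.value_counts() / len(answered)


# Toy stand-ins for two consecutive survey years.
last_year = pd.DataFrame(
    {"LanguageDesireNextYear": ["Python;Rust", "Python", "Go;Python", None]}
)
this_year = pd.DataFrame(
    {"LanguageWorkedWith": ["Python;Go", "Python;Rust", "Python", "Rust"]}
)

desired = language_share(last_year["LanguageDesireNextYear"])
used = language_share(this_year["LanguageWorkedWith"])

# Align both years on language name; languages missing in one year get 0.
comparison = pd.DataFrame(
    {"desired_last_year": desired, "used_this_year": used}
).fillna(0.0)
```

A table like `comparison` only shows the two shares side by side. Any prediction of growth would need a model on top of it.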
The dataset comes from the annual Stack Overflow developer survey, covering responses from developers in 180 countries. The data is available as CSV files ranging from 40 to 150 MB, with responses from about 150,000 (1.5 lakh) survey participants.
The data is in a CSV file format with 252,199 observations and 62 variables.
Data wrangling tasks include handling null values and converting data for analysis. Techniques such as ML algorithms and data visualization will be employed.
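The wrangling steps described above might look roughly like the following sketch. The column names, the sample values, and the specific cleaning rules (for example, treating "Less than 1 year" as 0) are assumptions for illustration. The actual notebooks may handle these cases differently.

```python
import pandas as pd

# Toy rows with the kinds of problems survey data often has.
# Column names and values are hypothetical.
raw = pd.DataFrame(
    {
        "Salary": ["50000", "72,000", None, "not disclosed"],
        "YearsCode": ["3", "Less than 1 year", "10", None],
        "Gender": ["Man", None, "Woman", "Woman"],
    }
)


def clean_survey(df):
    """Convert text columns to numbers and handle null values."""
    out = df.copy()
    # Remove thousands separators; anything still non-numeric becomes NaN.
    out["Salary"] = pd.to_numeric(
        out["Salary"].str.replace(",", "", regex=False), errors="coerce"
    )
    # Map a known text answer to a number before converting.
    out["YearsCode"] = pd.to_numeric(
        out["YearsCode"].replace({"Less than 1 year": "0"}), errors="coerce"
    )
    # Keep rows with a missing category, but label them explicitly.
    out["Gender"] = out["Gender"].fillna("Unknown")
    # Rows without a usable salary cannot feed the salary analysis.
    return out.dropna(subset=["Salary"])


clean = clean_survey(raw)
```

Whether to drop rows or fill them depends on the question being asked. Dropping rows with no salary is reasonable for income comparisons but would bias a participation-rate analysis.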
- Contributions are greatly appreciated. Check out our Contribution Guidelines
This project is licensed under the MIT License - see the LICENSE file for details.
Thanks to all contributors for helping this project grow! 🍻
Don't forget to leave a star ⭐️ for this project!
Crafted with ♥ by @sanjay-kv.