This was an introductory session on Data Science conducted by Tanmay Goyal (GitHub: tanmaygoyal258), a third-year B.Tech student majoring in Artificial Intelligence, on 4 February 2023.
The session was aimed at anyone who wanted a glimpse into the world of data: what the prerequisites (if any) are for entering it, and how the world of data is similar to, and different from, the world of software. It also introduced NumPy, Pandas and Matplotlib, some of the most commonly used Python libraries, from a data analysis point of view.
The session consisted of the following:
- An introduction to data and Big Data, and why data is here to stay
- Data Science, and two less frequently discussed roles: Data Analysis and Data Engineering. What are the fundamental differences and similarities between these roles? Are they mutually exclusive, or do they overlap?
- What skill sets does each role require? Can these skills be distinguished primarily by role, or do they overlap just as the roles do?
- Apart from technical and mathematical skills, what other skills (especially soft skills) are required?
- Introducing NumPy as a tool for performing extensive calculations and handling numerical data easily
- Introducing Pandas as a tool for working with large amounts of tabular data, and drawing comparisons with queries written in SQL and with pivot tables in spreadsheet software
- Introducing Matplotlib as a visualisation tool
- Using the libraries above to clean, process and visualise data, and to draw basic inferences and conclusions from it
- Working through an example dataset to draw basic inferences (reference: https://towardsdatascience.com/data-analysis-for-beginners-90a8d53fa2f9)
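To give a flavour of why NumPy is handy for numerical work, here is a minimal sketch (not material from the session itself) showing vectorised arithmetic, reshaping with per-axis statistics, and boolean masking. The temperature readings are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical daily temperature readings (°C) for two weeks; illustrative values only
temps = np.array([20, 22, 18, 24, 26, 22, 22,
                  21, 25, 27, 24, 23, 22, 26], dtype=float)

# Vectorised arithmetic: convert every reading to Fahrenheit at once, with no Python loop
temps_f = temps * 9 / 5 + 32

# Reshape into 2 weeks x 7 days, then compute statistics for each week (row) via axis=1
weekly = temps.reshape(2, 7)
weekly_mean = weekly.mean(axis=1)
weekly_max = weekly.max(axis=1)

# Boolean masking: keep only the readings above 24 °C
hot_days = temps[temps > 24]

print(weekly_mean, weekly_max, hot_days)
```

The same operations written with plain Python lists would need explicit loops; NumPy applies them element-wise over the whole array, which is both shorter and much faster on large data.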
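The comparison between Pandas, SQL and spreadsheet pivot tables can be sketched roughly as follows. This is an illustrative example under assumed data, not the session's own notebook: the `sales` table and its columns are hypothetical, and the SQL in the comments is only an approximate equivalent.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; one revenue value is missing to demonstrate basic cleaning
sales = pd.DataFrame({
    "region":  ["North", "South", "North", "East", "South", "East"],
    "product": ["A", "A", "B", "B", "B", "A"],
    "revenue": [100.0, 150.0, np.nan, 80.0, 120.0, 90.0],
})

# Cleaning: drop rows whose revenue is missing
clean = sales.dropna(subset=["revenue"])

# Roughly equivalent SQL:
#   SELECT region, SUM(revenue) FROM sales GROUP BY region;
by_region = clean.groupby("region")["revenue"].sum()

# Filtering, like a WHERE clause:
#   SELECT * FROM sales WHERE revenue > 95;
big = clean[clean["revenue"] > 95]

# Spreadsheet-style pivot table: regions as rows, products as columns, summed revenue,
# with 0 shown where a region has no sales of a product
pivot = clean.pivot_table(index="region", columns="product",
                          values="revenue", aggfunc="sum", fill_value=0)

print(by_region)
print(pivot)
```

`groupby` covers most `GROUP BY` queries, boolean indexing plays the role of `WHERE`, and `pivot_table` reproduces the row/column/aggregate layout of a spreadsheet pivot table.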
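As a minimal sketch of Matplotlib used for visualisation (assuming hypothetical monthly visitor counts, not data from the session), the snippet below draws a labelled line chart. It selects the non-interactive `Agg` backend and saves the figure to an in-memory buffer so that it runs without a display; in a notebook you would typically call `plt.show()` or save to a file name instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical monthly figures, illustrative only
months = ["Jan", "Feb", "Mar", "Apr"]
visitors = [120, 135, 160, 150]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, visitors, marker="o", label="Visitors")  # string x-values become categories
ax.set_title("Monthly visitors (hypothetical data)")
ax.set_xlabel("Month")
ax.set_ylabel("Visitors")
ax.legend()
fig.tight_layout()

# Save the chart as PNG bytes in memory; pass a file name such as "visitors.png" to write to disk
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Using the object-oriented `fig, ax` interface, rather than only module-level `plt` calls, keeps it clear which axes each title and label belongs to once a figure has several panels.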