Initial commit all project files

Pacode74 · Feb 3, 2024 · ca92e1e
commit ca92e1e
Showing 9 changed files with 45,046 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,123 @@
+# These are some examples of commonly ignored file patterns.
+# You should customize this list as applicable to your project.
+# Learn more about .gitignore:
+#     https://www.atlassian.com/git/tutorials/saving-changes/gitignore
+
+# Node artifact files
+node_modules/
+dist/
+
+# Compiled Java class files
+*.class
+
+# Compiled Python bytecode
+*.py[cod]
+
+# Log files
+*.log
+
+# Package files
+*.jar
+
+# Maven
+target/
+dist/
+
+# JetBrains IDE
+.idea/
+
+# VSC Settings:
+.vscode/
+
+# Unit test reports
+TEST*.xml
+
+# Generated by MacOS
+.DS_Store
+
+# Generated by Windows
+Thumbs.db
+
+# Applications
+*.app
+*.exe
+*.war
+
+# Large media files
+*.mp4
+*.tiff
+*.avi
+*.flv
+*.mov
+*.wmv
+
+# project solution
+#*.ipynb
+# *.xml
+
+# Ignore all .ipynb files
+#*.ipynb
+#*.csv
+*.png
+*.pdf
+*.xls
+
+
+#files:
+Playground.ipynb
+Comprehensive_Project_Challenge.ipynb
+
+# Recovery files Jupyter Notebook
+.ipynb_checkpoints/
+.ipynb_checkpoints
+
+# Unit test / coverage
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+
+# Python
+__pycache__/
+
+
+# PyCharm
+#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+#  and can be added to the global gitignore or merged into this file.  For a more nuclear
+#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
+.idea/
+
+# Exclude following folders:
+#2023 Python Data analysis & Visualization Masterclass/
+#Support_files/
+#A Gentle Introduction to Pandas Data Analysis (on Kaggle)/
+
+
+allure_reports/
+#images/
+my_jupyter_notebooks/
+/my_jupyter_notebooks/*/
+# api/coronavstech/coronavstech/settings.py
+#api/coronavstech/db.sqlite3
+#api/coronavstech/db.sqlite3_clone
+
+# Exclude pipenv virtual environment files:
+.venv/
+
+# Exclude environment variable file:
+.env
+
+# Exclude pyenv file:
+.python-version
+
diff --git a/Importance_of_resetting_setting_the_index.ipynb b/Importance_of_resetting_setting_the_index.ipynb
@@ -0,0 +1,69 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "99ece5fe-bf00-42e6-b616-93ea1fa15cc1",
+   "metadata": {},
+   "source": [
+    "Resetting the index in a DataFrame is often necessary or beneficial for several reasons, especially when you're performing operations like slicing, filtering, or concatenating DataFrames. In the context of your operations on the `summer` DataFrame, there are a few reasons why resetting the index is important:\n",
+    "\n",
+    "1. **Maintaining Data Integrity:** When you filter or slice a DataFrame, as you did to create `singles` and `team`, the original index is retained. This means that if you try to concatenate these DataFrames back together or perform other operations, the index might not be unique or sequential. Resetting the index ensures that each row gets a unique, sequential index, which is crucial for data integrity and avoiding indexing errors.\n",
+    "\n",
+    "2. **Reference to Original Data:** By resetting the index and then creating a new column from the original index, you maintain a reference to the original DataFrame. This can be particularly useful if you need to trace back the modifications or compare the new DataFrame with the original one.\n",
+    "\n",
+    "3. **Ease of Concatenation:** When concatenating the `singles` and `team` DataFrames back into `summer_new`, the labels inherited from `summer` remain unique as long as the two subsets share no rows. Resetting each subset separately would make both start at 0 and overlap, so pass `ignore_index=True` to `pd.concat` (or reset once after concatenating) when a clean, sequential index is wanted.\n",
+    "\n",
+    "4. **Avoiding Indexing Issues:** If the index is not reset, it could lead to misleading results or errors in subsequent operations, especially if the index has some inherent meaning or order in your dataset.\n",
+    "\n",
+    "In summary, resetting the index is a way to standardize and clean your DataFrame's indexing system, particularly after slicing or filtering operations, ensuring that subsequent operations on the data are based on a clear, unambiguous index. This is a common practice in data manipulation tasks to maintain data integrity and avoid potential issues related to index misalignment."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b6fec592-acc2-4171-98a2-355e53246fc6",
+   "metadata": {},
+   "source": [
+    "Resetting the index before concatenating DataFrames and then setting it back after the concatenation is not a universal standard practice, but it can be useful or necessary in specific contexts. Let's break down the potential reasons and implications for doing this in your instructor's example:\n",
+    "\n",
+    "### 1. Why Reset Index?\n",
+    "\n",
+    "- **Unique Identifier**: Before concatenation, the `summer` DataFrame's index is reset. This action turns the original index into a column named `index`. This can be helpful if the original index has a meaningful order or contains unique identifiers for each row.\n",
+    "- **Avoiding Index Conflicts**: When concatenating two DataFrames (`singles` and `team`), if they have overlapping index values, resetting the index can prevent potential conflicts. Each row from both DataFrames will receive a unique index in the concatenated DataFrame, avoiding duplicate index values.\n",
+    "\n",
+    "### 2. Why Set Index Again?\n",
+    "\n",
+    "- **Preserving Original Order**: After concatenation, setting the `index` column back as the index of `summer_new` restores the original row labels. This does not reorder the rows by itself; calling `sort_index()` afterwards returns them to their original order, which matters if the order of rows carries meaning or if later operations rely on it.\n",
+    "- **Consistency**: If the original index is important for later operations or for consistency with other DataFrames or datasets, it's necessary to revert back to it.\n",
+    "\n",
+    "### 3. Specific to This Example\n",
+    "\n",
+    "In your instructor's code, the importance of resetting and then setting back the index seems to be more about maintaining the original structure and order of the dataset. It could be a way to track the original position of each row from the `summer` DataFrame through the transformations. If, however, there's no specific requirement to maintain the original index for later use or reference, this step might not be essential.\n",
+    "\n",
+    "### Conclusion\n",
+    "\n",
+    "In summary, whether or not to reset and then set the index when concatenating DataFrames depends on the specific requirements of your analysis and dataset. It's not a standard practice applied in every scenario but can be crucial in certain contexts to maintain data integrity, order, or for specific data processing needs."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "pandas",
+   "language": "python",
+   "name": "pandas"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
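Review note on the notebook above: a minimal sketch of the reset, split, concatenate, restore pattern it describes, run on a toy frame. The column names, the toy values, and the `drop_duplicates` step used to collapse team rows are assumptions for illustration, not the instructor's code.

```python
import pandas as pd

# Toy stand-in for the `summer` DataFrame (hypothetical columns and values).
summer = pd.DataFrame(
    {
        "Event": ["100m", "Basketball", "Basketball", "200m"],
        "Medal": ["Gold", "Gold", "Gold", "Silver"],
    },
    index=[10, 11, 12, 13],
)

# Move the original index labels into a regular column named "index".
summer = summer.reset_index()

# Split into two subsets; each keeps the "index" column as a back-reference.
team = summer[summer["Event"] == "Basketball"]
singles = summer[summer["Event"] != "Basketball"]

# Collapse the team rows to one medal per event (simplified key, assumption).
team = team.drop_duplicates(subset=["Event", "Medal"])

# Concatenate, restore the original labels, then sort to recover the order.
summer_new = pd.concat([singles, team]).set_index("index").sort_index()
print(summer_new)
```

Because the original labels travel along as a column, every surviving row can still be traced back to its position in `summer`, and `sort_index()` is what actually restores the original ordering.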
diff --git a/README.md b/README.md
@@ -0,0 +1,34 @@
+This project is based on "The Complete Pandas Bootcamp 2023 - Data Science with Python," a course offered by Udemy and taught by Alexander Hagmann. It presents a Data Aggregation challenge, requiring the application and combination of various concepts and methods.
+
+This challenge is commonly used in job application processes and assessment centers within the field of Data Science. It tests candidates' abilities to work with, manipulate, and aggregate data. Even experienced Data Scientists find it challenging, not due to its coding complexity, but because it demands:
+
+- Proficient coding skills, and more importantly,
+- The capability to interpret and understand the underlying data, incorporating insights from subject matter experts—in this case, sports experts.
+
+Emphasis is placed on "Thinking in Data Structures!"
+
+**Project Goal:**
+On your first day at a Data Science advisory firm, your task is to produce the official Summer Olympic Games Medal Tables for all editions from 1896 to 2012.
+
+You are provided with a dataset containing over 31,000 medals (summer.csv) and the official Medal Tables for the 1996 and 1976 editions from Wikipedia (wik_1996.csv, wik_1976.csv). These official Medal Tables serve as a reference to verify the accuracy of your code.
+
+Your objective is to minimize the divergence between your aggregated Medal Tables and the official Medal Tables. For example, if the official count of Gold Medals for the United States in the 1996 edition is 44, and your code calculates 46, this represents an absolute divergence of 2.
+
+Calculate the total absolute divergence for the 1996 and 1976 editions (the "Score"). The optimal Score is 0!
+
+**Valuable Insights from Sports Experts:**
+- In Team Events, the medals awarded to the individual team members count together as a single medal. For instance, the United States Basketball Team winning a Gold Medal in 2012, with 12 athletes, counts as one Gold Medal in the official 2012 Medal Table.
+- Events with 5 or fewer medals are considered Singles Events, and those with more than 5 medals are Team Events. In Singles Events, all awarded medals (including shared Bronze medals, leading to 4 or 5 medals in total) count for the official Medal Table. The same applies to Team Events where two teams may share the Bronze medal, resulting in a total of 4 medals (1 Gold, 1 Silver, 2 Bronze) for the Medal Table.
+- To identify unique events, the gender category of the event is crucial. There are Men's, Women's, and Mixed Events. Mixed Events can be identified as those marked "mixed" or "pairs," all "Equestrian" events, "Sailing" events before 1988, and certain specified medals in Badminton mixed doubles.
+
+Embark on this challenge and showcase your data science skills!
+
+To identify all unique Events, the Event Gender matters! There are Men's, Women's, and Mixed Events. Assume
+that a medal was awarded in a Mixed Event if any of the following holds:
+
+- the Event is marked with "mixed" or "pairs"
+- the Event is an "Equestrian" Event
+- the Event is a "Sailing" Event before 1988 (up to and including 1984)
+- the medal's index label is one of these Badminton Mixed Doubles medals: [21773, 21782, 21776, 21785, 21770,
+21779, 23703, 23712, 23706, 23715, 23709, 23700, 25720, 25729, 25723, 25732, 25726, 25717, 27727, 27736, 27730, 27739,
+27724, 27733, 29784, 29785, 29786, 29787, 29788, 29789]
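Review note on the README above: the Mixed Event rules and the Score can be sketched as two small helpers. This is an illustration under stated assumptions, not the course solution: the column names (`Year`, `Sport`, `Event`, `Gender`), the helper names, the toy rows, and the placeholder country `XXX` are hypothetical; only the 44 versus 46 Gold example and the two Badminton index labels come from the README.

```python
import pandas as pd

MEDALS = ["Gold", "Silver", "Bronze"]


def event_gender(df: pd.DataFrame, badminton_mixed_labels) -> pd.Series:
    """Return the athlete's gender, overridden with "Mixed" per the README rules."""
    event = df["Event"].str.lower()
    mixed = (
        event.str.contains("mixed|pairs", na=False)            # marked mixed/pairs
        | (df["Sport"] == "Equestrian")                        # all Equestrian
        | ((df["Sport"] == "Sailing") & (df["Year"] <= 1984))  # Sailing before 1988
        | df.index.isin(badminton_mixed_labels)                # listed index labels
    )
    return df["Gender"].where(~mixed, "Mixed")


def divergence_score(computed: pd.DataFrame, official: pd.DataFrame) -> int:
    """Sum of absolute differences over all countries and medal columns."""
    # Align on the union of countries; a country missing from one table counts as 0.
    diff = computed[MEDALS].sub(official[MEDALS], fill_value=0)
    return int(diff.abs().to_numpy().sum())


# Toy rows (hypothetical), indexed like the medal labels in summer.csv.
summer = pd.DataFrame(
    {
        "Year": [1984, 1988, 2012, 2012, 2012, 1996],
        "Sport": ["Sailing", "Sailing", "Equestrian", "Athletics", "Tennis", "Badminton"],
        "Event": ["Finn", "Finn", "Jumping", "100m", "Mixed Doubles", "Doubles"],
        "Gender": ["Men", "Men", "Women", "Men", "Women", "Women"],
    },
    index=[0, 1, 2, 3, 4, 21773],
)
summer["Event_Gender"] = event_gender(summer, [21773, 21782])

# Toy medal tables: USA Gold is 44 officially and 46 computed, as in the README.
official = pd.DataFrame(
    {"Gold": [44, 1], "Silver": [32, 0], "Bronze": [25, 0]}, index=["USA", "XXX"]
)
computed = pd.DataFrame({"Gold": [46], "Silver": [32], "Bronze": [25]}, index=["USA"])
score = divergence_score(computed, official)
print(summer["Event_Gender"].tolist(), score)
```

Using `fill_value=0` means a country that appears in only one table still contributes its full medal count to the Score instead of being silently dropped.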