diff --git a/README.md b/README.md index f8d0424..e49afab 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,8 @@ # JupyterLite Demo -[![lite-badge](https://jupyterlite.rtfd.io/en/latest/_static/badge.svg)](https://jupyterlite.github.io/demo) +[![lite-badge](https://jupyterlite.rtfd.io/en/latest/_static/badge.svg)](https://humbledata.org/online-workshop/lab/index.html) + +> This repository holds the contents for the HumbelData workshop using JupyterLite. JupyterLite deployed as a static site to GitHub Pages, for demo purposes. diff --git a/content/notebooks/1. Beginning with Python.ipynb b/content/notebooks/1. Beginning with Python.ipynb index f393375..5d56c1e 100644 --- a/content/notebooks/1. Beginning with Python.ipynb +++ b/content/notebooks/1. Beginning with Python.ipynb @@ -1,8 +1,28 @@ { + "metadata": { + "kernelspec": { + "name": "python3", + "language": "python", + "display_name": "Python 3 (ipykernel)" + }, + "language_info": { + "codemirror_mode": { + "name": "python", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8" + } + }, + "nbformat_minor": 4, + "nbformat": 4, "cells": [ { "cell_type": "markdown", - "metadata": {}, "source": [ "
\n", "

\n", @@ -11,11 +31,11 @@ "Introduction to Python\n", "\n", "
" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "## Welcome to learning Python!\n", "\n", @@ -24,108 +44,139 @@ "We will walk you through different aspects of the Python language interactively. Take your time to experiment with it if you like and don't hesitate to ask or Google things. One of the first lessons in programming is using the documentation and information shared by other programmers. Jupyter has another trick, where you can put your cursor into the brackets of a function call such as `print()` and hit **Shift + Tab**. This will let you see the documentation of a function directly in the notebook (*pro-tip*: you can hit it up to four times for different effects).\n", "\n", "As for Python itself, one of the big strengths of Python is the extensibility. It has a surprising amount of functionality directly within the core language, however, you can `import` almost arbitrary code others make available as libraries. Below you can see a special import that shows you the \"Zen of Python\" a set of guidelines that could guide your programming journey.\"" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# We can use comments to document our code in a coding cell.\n", + "import this # Zen of Python" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# We can use comments to document our code in a coding cell.\n", - "import this # Zen of Python" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**--> Simple is better than complex.** \n", "It is really easy to print ***Hello World*** in Python:" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "print('Hello World!')" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], 
- "source": [ - "print('Hello World!')" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**PEP 8** is Python's style guide. You can find it here:\n", "https://www.python.org/dev/peps/pep-0008/\n", "\n", "It is good to know about, but when you become a professional programmer there are programs called a \"linter\" that will help you adhere to PEP 8." - ] + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "Before you get started, we just need to do a small amount of set up. We're going to load a module called NumPy which we'll need to complete this notebook. Don't stress if you don't understand this code - it's specific to the JupyterLite notebooks we're using for this course." + ], + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import pyodide_js\n", + "\n", + "# Install NumPy\n", + "await pyodide_js.loadPackage('numpy')" + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "markdown", - "metadata": {}, "source": [ "

\n", "Variables\n", "


\n", "
" - ] + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "markdown", - "metadata": {}, "source": [ "In programming it's very useful to store values. Accessing values through names is called a variable. Python is very user-friendly in that it will let you store most things in a variable, without making space in the computer's space explicitly. Moreover, computers need to differentiate between the type of data, such as, `5` being an integer and `'Hello'` being a string, but Python attempts to handle these intuitively for you. You will learn about the different types in the following sections!\n", "\n", "**Python is an object oriented programming language. You do not need to declare variables (or their types) before using them as every variable in Python is an object.** " - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "name = 'Sandrine'" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "name = 'Sandrine'" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "print(name)" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "print(name)" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "Moreover, variables can easily be updated. Try it out below!" 
- ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -139,200 +190,219 @@ "\n", "\n", "---" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Assign another string to the variable 'name' and print this variable" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_01.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_01.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**!!! Variable can can change type when re-assigned.**" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Assign a number (without quotes) to the variable 'name' and print this variable" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_02.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_02.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "

\n", "Strings\n", "


\n", "
" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "A string always begins and ends with a single ( ' ) or double ( \\\" ) quotes. There is no difference, except if there is an apostrophe ( ' ) inside the string." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "'Beginners Data Workshop'" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "'Beginners Data Workshop'" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "\"Beginner's Data Workshop\"" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "\"Beginner's Data Workshop\"" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Write *I'm enjoying this workshop!* using double quotes, and then single quotes." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# double quotes\n" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# double quotes\n" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_03.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_03.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# single quotes\n" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# single quotes\n" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_04.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_04.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**Oh no, an error!** Errors are nothing to be afraid of. Think of them as friendly messages trying to help you understand what's gone wrong. Here's how to read this:\n", "- Read errors backwards, so start at the bottom! 
It's a \"SyntaxError\" which means there's something wrong with our Python code.\n", @@ -342,294 +412,321 @@ "**It turns out, if you want to use single quotes inside single quotes, you need to \"escape\" the quote with a backslash ( \\\\ ).**\n", "\n", ">Let's try that!" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# single quotes, second try\n" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# single quotes, second try\n" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_05.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_05.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "As mentioned before, Python tries to handle everything as intuitively as possible. That means strings can be added together." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "'We are ' + 'everywhere around the world.'" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "'We are ' + 'everywhere around the world.'" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "Also, strings can even be multiplied by a number." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "'Great! 🎉' * 3" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "'Great! 
🎉' * 3" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "'😂' * 50" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "'😂' * 50" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can access parts of strings by slicing the string. A character of a string (which is considered as a string of length 1 by Python) using slice, and a substring using slice range. Something important to note:\n", "\n", "**!!! Like birthdays, Python starts counting from 0! (It is \"zero-indexed\")** \n", "\n", "(If you're interested why, here's a historical [letter from Dijkstra](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html) about it.)" - ] + ], + "metadata": {} }, { + "cell_type": "markdown", + "source": [ + "![image.png](attachment:image.png)" + ], + "metadata": {}, "attachments": { "image.png": { "image/png": "" } - }, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![image.png](attachment:image.png)" - ] + } }, { "cell_type": "code", - "execution_count": null, + "source": [ + "s = 'I am a Pythonista.'" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "s = 'I am a Pythonista.'" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "***Slice*** \n", "We can access characters of a string by referencing the position (\"index\") numbers within square brackets." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# Selecting the first character of the string s\n", + "s[0]" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# Selecting the first character of the string s\n", - "s[0]" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "> Select the last character of the string s" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_06.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_06.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "***Slice range*** \n", "We can get a range of characters of a string by using a slicing range." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# Select from position 2 up to but not including position 6 of the string s\n", + "s[2:6] # 6 is excluded" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# Select from position 2 up to but not including position 6 of the string s\n", - "s[2:6] # 6 is excluded" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can skip the start (resp. stop) number `s[:6]`. Then it start form index 0 (resp. end at the highest index)." 
- ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Select the last 3 characters of the string s" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_07.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_07.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can check if a string contains another string inside of it (a \"substring\"), using *in* and *not in*.\n", "\n", "**!!! Python is case-sensitive! The string `'a'` is not equal to `'A'`***" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], "source": [ "# Checking if s contains 'python'\n", "print(s)\n", "'python' in s" - ] + ], + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "> Check if s does not contain 'I' (capital i)" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_08.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_08.py" - 
] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -637,409 +734,442 @@ "Numbers\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "Python has different numerical types. Two of which are used more often than others:\n", "- **Integers:** These are whole numbers, i.e. `1`, `2`, `-5`\n", "- **Floats:** \"Floating point\" numbers are those with a decimal point, i.e. `3.14158`, `2.5`, `0.1` and even `3.0`. \n", "\n", "Python attempts to deal with these numbers intuitively when integers and floats are mixed." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "### Basic Operators" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**Addition:** \n", - ">Try adding two numbers." - ] + ">Try adding 3 and 4 together." + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_09.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_09.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**Subtraction:**\n", - "> Try subtracting two numbers." - ] + "> Try subtracting 6 from 10.0." 
+ ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_10.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_10.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**Multiplication:** \n", "The sign for multiplication is *. \n", - ">Try multiplying two numbers." - ] + ">Try multiplying 15 and 12." + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_11.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_11.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**Exponent:** \n", "The sign for exponent (or power) is *`**`*.\n", "\n", - "> Try the power of an integer." - ] + "> Try raising 2 to the power of 6." 
+ ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_12.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_12.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ - ">Try the power of a float" - ] + ">Try raising a float to a power this time. Calculate the square of 3.1." + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_13.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_13.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ - ">Try the power of an integer written as a float (e.g. 12.0)" - ] + ">Now try something similar - raise an integer written as a float to a power. Calculate the square of 5, where 5 is written as a float." 
+ ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_14.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_14.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**Division and modulo**\n", "- The sign for division is **`/`**.\n", "- The sign for floor division is **`//`**. It returns the \"quotient\" of a division - how many times one number goes into another. (\"If six people can sit around a dinner table, and we have 99 guests coming, how many tables do we need?\")\n", "- The **`%`** sign is the [modulo](https://en.wikipedia.org/wiki/Modulo_operation), which returns the \"remainder\" after division." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ - ">Divide 6 by 2" - ] + ">Divide 6 by 2." + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_15.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_15.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ - ">Try the floor division of 6 by 2" - ] + ">Try the floor division of 6 by 2." 
+ ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_16.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_16.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ - ">Divide 19 by 5" - ] + ">Divide 19 by 5." + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_17.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_17.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "*Note*: division returns a float." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**The floor division returns an int, the non-fractional part.** \n", - ">Try the floor division of 19 by 5" - ] + ">Try the floor division of 19 by 5." 
+ ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_18.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_18.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**Modulo returns the remainder of the division**" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ - ">Calculate 19 modulo 5" - ] + ">Calculate 19 modulo 5." + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_19.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_19.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**--> 19 = 3 * 5 + 4**" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -1047,44 +1177,48 @@ "Order of operations\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "The order of operations in Python respects the usual rules of mathematics (brackets -> powers -> division/multiplication -> add/subtract). **If in doubt, use brackets** to make it clear (to yourself, and anyone who might be reading your code) what you're trying to do." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "5 + 6 * 10" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "5 + 6 * 10" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "(5 + 6) * 10" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "(5 + 6) * 10" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -1092,110 +1226,119 @@ "Booleans\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "Booleans are the two constant values `True` and `False`. Python implements the concept of \"truthiness\", that means their numerical values are 1 and 0. These values are especially important for comparisons, therefore, we'll also learn about a new operator:\n", "\n", "**`==`** is an equality operator, different from **`=`** which is the assignment operator you used to assign variables." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "True == 1 # '==' is an equality operator, different from '=' which is an assignment operator" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "True == 1 # '==' is an equality operator, different from '=' which is an assignment operator" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "False * 3" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "False * 3" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Using **`!=`** , check if False is not equal to 2.\"" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_20.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_20.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Check if 
the length of your name is greater than 8." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "len()" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_21.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_21.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can use `and` and `or` operators with booleans in Python. Type them below and see how the color of the text changes, because Python recognises the keyword! These can be used to chain together multiple comparisons. These follow mathematical logic.\n", "\n", @@ -1207,35 +1350,38 @@ "| False | False | False | False |\n", "\n", "> Check if the length of your name is greater than 5 and the length of your mentor's name is less than 7." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_22.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_22.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -1243,397 +1389,433 @@ "Lists\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "A list is a list of comma-separated values between square brackets.\n", "\n", "The items of a list can have different types." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "list_greeting = ['Hallo', 'Bonjour', 10, 'Hello', 'Ciao', False]" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "list_greeting = ['Hallo', 'Bonjour', 10, 'Hello', 'Ciao', False]" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can access a single value using a slice, and several values using slice range. Check the \"string slicing\" section above for this." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Get the first item of the list." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_23.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_23.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "Get every the items from the list, starting with the 4th one.\n", "\n", "Note that there is no need to put a number after the colon when when want to select until the end of the list." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Get the items starting with the one with index 3 until the end of the list." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_24.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_24.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**Get the items from the list until the 4th one.**" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Get the items from the beginning of the list until the value with index 4 (index 4 is excluded)." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_25.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_25.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**Advanced slicing allows us to set how the list's index will increment between the start/stop indexes we select.** \n", "The slicing then looks like this: [start:stop:step] \n", "For example, if we want to select every third items of a list, we will set the step as 3." 
- ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Get every other items from the list of greetings" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_26.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_26.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**We can update a list by re-assigning a value selected using a slice.** \n", "For example, we can replace False with 'Ave' in list_greeting." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "list_greeting[-1] = 'Ave'\n", + "print(list_greeting)" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "list_greeting[-1] = 'Ave'\n", - "print(list_greeting)" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Replace 10 with Hola in list_greetings, then print list_greeting to check it." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_27.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_27.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can have lists inside a list, these are often called \"nested lists\":" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "list_of_lists = [[1, 2, 3], [4, 5]]" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "list_of_lists = [[1, 2, 3], [4, 5]]" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "list_of_lists[0] # access the first element of the list, which is a list." + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "list_of_lists[0] # access the first element of the list, which is a list." - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "list_of_lists[0][-1] # access the last element of the first list." + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "list_of_lists[0][-1] # access the last element of the first list." - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can concatenate lists using ' + '." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "[1, 2, 3] + [4, 5, 6]" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "[1, 2, 3] + [4, 5, 6]" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can also use the multiplication to repeat values in a list (only works with integers)." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "['Hey'] * 5" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "['Hey'] * 5" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can also use `in` / `not in` with lists." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Check if `10` is in `list_greeting`." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_28.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_28.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Check if 'Ole' is not in list_greeting." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_29.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_29.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -1641,216 +1823,236 @@ "Built-in functions\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "You've seen your first function at the very beginning of this notebook, i.e. `print(\"Hello World\")`. Functions are a way for you to write reusable code. In the case before `print()` is Python's way for you to show an output you provide. However, there are many other useful functions that Python comes with, that are provided by other libraries, or written by you yourself to save you from repeating some code.\n", "\n", "The functions that are always available in Python can be found here:\n", "https://docs.python.org/3/library/functions.html" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Print *'Here we are!'*." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_30.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_30.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Compute the length of the string variable `snakes`." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "snakes = \"🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍\"" - ] + ], + "metadata": { + "trusted": true + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_31.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_31.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Compute the length of list_greeting." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_32.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_32.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Work out the maximum of 1, 2, 3, 4 and 5." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_33.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_33.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Round the number 123.45 to the nearest integer." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_34.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_34.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Round the number 123.45 to 1 decimal place." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_35.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_35.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -1858,64 +2060,69 @@ "Methods\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "A method is a function associated to an object. Basically, it provides a way for an object to know functions about themselves.\n", "\n", "For example, we can change the string `s` to upper case using the `.upper()` method." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "s.upper()" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "s.upper()" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Using the append method, add 'Aloha' to list_greeting." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_36.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_36.py" - ] + "execution_count": 3, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -1923,20 +2130,20 @@ "Importing modules\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can import modules (Python code that can define functions, classes and variables).\n", "\n", "Usually, we write all the imports at the beginning of a Python program or notebook." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**A small warning** \n", "Ensure you trust the packages you're installing!\n", @@ -1946,77 +2153,84 @@ "2. Do I trust giving their code access to my computer?\n", "3. `pip install` especially can be a little more dangerous, anyone can upload something to the Python Package Index (\"PyPI\" for short), which is where `pip` installs from. So be very careful especially as the smallest typo can be a security risk!\n", "4. `conda install` is more of a walled garden and as such better positioned for enterprise usage." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "> From the **math** library, import **sqrt** to work out the square root of **24 336**." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_37.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_37.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "Python programmers really are lazy! (Or obsessive about productivity, take your pick!)\n", "\n", "Hence, you can define aliases for imports. The `as np` is two characters long, whereas `numpy` is five. 
It's less to type, when you write long programs.\n", "\n", "> Import `numpy as np` and try **`np.sin(np.pi/4)`**." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/01_38.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/01_38.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -2024,28 +2238,8 @@ "Take a well deserved break, brew a relaxing beverage of your choice or go have a snack, maybe share a picture of your snack/drink in the #random channel, we love to see pictures from everyone attending around the world. ☕️\n", "\n", "![](https://media0.giphy.com/media/3otPoS81loriI9sO8o/200.gif)" - ] + ], + "metadata": {} } - ], - "metadata": { - "kernelspec": { - "display_name": "Python (beginners-data-workshop)", - "language": "python", - "name": "beginners-data-workshop" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.13" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + ] } diff --git a/content/notebooks/2. First steps with Pandas.ipynb b/content/notebooks/2. First steps with Pandas.ipynb index 85395c0..eb52b7c 100644 --- a/content/notebooks/2. First steps with Pandas.ipynb +++ b/content/notebooks/2. 
First steps with Pandas.ipynb @@ -1,8 +1,28 @@ { + "metadata": { + "kernelspec": { + "name": "python", + "display_name": "Python (Pyodide)", + "language": "python" + }, + "language_info": { + "codemirror_mode": { + "name": "python", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8" + } + }, + "nbformat_minor": 4, + "nbformat": 4, "cells": [ { "cell_type": "markdown", - "metadata": {}, "source": [ "
\n", "

\n", @@ -11,32 +31,32 @@ "Data Analysis with Pandas\n", "\n", "
" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "> ***Note***: This notebook contains solution cells with ***a*** solution. Remember there is not only one solution to a problem! \n", "> \n", "> You will recognise these cells as they start with **# %**. \n", "> \n", "> If you would like to see the solution, you will have to remove the **#** (which can be done by using **Ctrl** and **?**) and run the cell. If you want to run the solution code, you will have to run the cell again." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "

\n", "Data analysis packages\n", "


\n", "
" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "Data Scientists use a wide variety of libraries in Python that make working with data significantly easier. Those libraries primarily consist of:\n", "\n", @@ -55,12 +75,31 @@ "\n", "Though there are countless others available.\n", "\n", - "For today, we'll primarily focus ourselves around the library that is 99% of our work: `pandas`. Pandas is built on top of the speed and power of NumPy." - ] + "For today, we'll primarily focus ourselves around the library that is 99% of our work: `pandas`. Pandas is built on top of the speed and power of NumPy.\n", + "\n", + "Run the code below to get the imports we need for this notebook." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "import pyodide_js\n", + "\n", + "# Install NumPy\n", + "await pyodide_js.loadPackage('numpy')\n", + "\n", + "# Install Pandas\n", + "await pyodide_js.loadPackage('pandas')" + ], + "metadata": { + "trusted": true + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -68,55 +107,48 @@ "Imports\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "import pandas as pd" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "import pandas as pd" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Import numpy using the convention seen at the end of the first notebook." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_01.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "# %load ../solutions/02_01.py" - ] + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -124,121 +156,132 @@ "Loading the data\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "To see a method's documentation, you can use the help function. In Jupyter, you can also just put a question mark before the method." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "?pd.read_csv" + ], "metadata": { "jupyter": { "outputs_hidden": false }, - "scrolled": true + "scrolled": true, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "?pd.read_csv" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "To load the dataframe we are using in this notebook, we will provide the path to the file: ../data/Penguins/penguins.csv" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ - ">Load the dataframe, read it into a pandas DataFrame and assign it to df" - ] + ">Load the dataframe, read it into a pandas DataFrame and assign it to df." 
+ ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_02.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_02.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**To have a look at the first 5 rows of df, we can use the *head* method.**" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df.head()" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df.head()" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Have a look at the last 3 rows of df using the tail method" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_03.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_03.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -246,112 +289,121 @@ "General information about the dataset\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**To get the size of the datasets, we can use the *shape* attribute.** \n", "The first number is the number of row, the second one the number of columns" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ - ">Show the shape of df (do not put brackets at the end)" - ] + ">Show the shape of df (do not put brackets at the end)." + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_04.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_04.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Get the names of the columns and info about them (number of non null and type) using the info method." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_05.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_05.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Get the columns of the dataframe using the columns attribute." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_06.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_06.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -359,62 +411,55 @@ "Display settings\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can check the display option of the notebook." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "pd.set_option('display.max_rows', [number of rows])" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "pd.options.display.max_rows" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ - ">Force pandas to display 25 rows by changing the value of the above." - ] + ">Force pandas to display 25 rows by changing the value of [number of rows] above." + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_07.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "# %load ../solutions/02_07.py" - ] + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -422,270 +467,302 @@ "Subsetting data\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can subset a dataframe by label, by index or a combination of both. \n", "There are different ways to do it, using .loc, .iloc and also []. \n", "See [documentation ](https://pandas.pydata.org/pandas-docs/stable/indexing.html)." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Display the 'bill_length_mm' column" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_08.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_08.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "*Note:* We could also use `df.bill_length_mm`, but it's not the greatest idea because it could be mixed with methods and does not work for columns with spaces." 
- ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Have a look at the 12th observation:" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# using .iloc (uses positions, \"i\" stands for integer)\n" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# using .iloc (uses positions, \"i\" stands for integer)\n" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_09.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_09.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# using .loc (uses indexes and labels)\n" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# using .loc (uses indexes and labels)\n" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_10.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_10.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Display the **bill_length_mm** of the last three observations." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# using .iloc\n" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# using .iloc\n" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_11.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_11.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# using .loc\n" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# using .loc\n" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_12.py" + ], "metadata": { "jupyter": { "outputs_hidden": false }, - "scrolled": true + "scrolled": true, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_12.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "And finally look at the **flipper_length_mm** and **body_mass_g** of the 146th, the 8th and the 1rst observations:" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# using .iloc\n" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# using .iloc\n" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_13.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": 
[], - "source": [ - "# %load ../solutions/02_13.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# using .loc\n" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# using .loc\n" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_14.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_14.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**!!WARNING!!** Unlike Python and ``.iloc``, the end value in a range specified by ``.loc`` **includes** the last index specified. " - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df.iloc[5:10]" + ], "metadata": { "jupyter": { "outputs_hidden": false }, - "scrolled": true + "scrolled": true, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df.iloc[5:10]" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df.loc[5:10]" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df.loc[5:10]" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -693,85 +770,92 @@ "Filtering data on conditions\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**We can also use condition(s) to filter.** \n", "We want to display the rows of df where **body_mass_g** is greater than 4000. We will start by creating a mask with this condition." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "mask_PW = df['body_mass_g'] > 4000\n", + "mask_PW" + ], "metadata": { "jupyter": { "outputs_hidden": false }, - "scrolled": true + "scrolled": true, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "mask_PW = df['body_mass_g'] > 4000\n", - "mask_PW" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "Note that this return booleans. If we pass this mask to our dataframe, it will display only the rows where the mask is True." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df[mask_PW]" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df[mask_PW]" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Display the rows of df where **body_mass_g** is greater than 4000 and **flipper_length_mm** is less than 185." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_15.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_15.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -779,65 +863,70 @@ "Values\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can get the number of unique values from a certain column by using the `nunique` method.\n", "\n", "For example, we can get the number of unique values from the species column:" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df['species'].nunique()" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df['species'].nunique()" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can also get the list of unique values from a certain column by using the `unique` method.\n", ">Return the list of unique values from the species column" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_16.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_16.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -845,341 +934,373 @@ "Null Values and NaN\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "When you work with data, you will quickly learn that data is never \"clean\". These values are usually referred to as null value. In computation it is best practice to define a \"special number\" that is \"**N**ot **a** **N**umber\" also called NaN.\n", "\n", "We can use the `isnull` method to know if a value is null or not. It returns boolean values." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df['flipper_length_mm'].isnull()" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df['flipper_length_mm'].isnull()" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "**We can apply different methods one after the other.**. \n", "For example, we could apply to method `sum` after the method `isnull` to know the number of null observations in the **flipper_length_mm** column.\n", ">Get the total number of null values for **flipper_length_mm**." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": {}, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ - "# %load ../solutions/02_17.py" - ] + "# %run ../solutions/02_17.py" + ], + "metadata": { + "trusted": true + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "To get the count of the different values of a column, we can use the `value_counts` method.\n", "\n", "For example, for the species column:" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df['species'].value_counts()" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df['species'].value_counts()" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "If we want to know the count of NaN values, we have to pass the value `False` to the parameter **dropna** (set to `True` by default).\n", "> Return the proportion for each sex, including the NaN values.\"" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_18.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_18.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "To get the proportion instead of the count of these values, we have to pass the value `True` to the 
parameter **normalize**.\n", ">Return the proportion for each species." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_19.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_19.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Using the index attribute, get the indexes of the observation without **flipper_length_mm**" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_20.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_20.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "Use the **[dropna](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html)** method to remove the row which only has NaN values.\n", ">Get the help for the dropna method." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_21.py" + ], "metadata": { "jupyter": { "outputs_hidden": false }, - "scrolled": true + "scrolled": true, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_21.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Use the dropna method to remove the row of `df` where all of the values are NaN, and assign it to `df_2`." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_22.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_22.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can use a f-string to format a string. We have to write a `f` before the quotation mark, and write what you want to format between curly brackets." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "print(f'shape of df: {df.shape}')" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "print(f'shape of df: {df.shape}')" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "> Print the number of rows of `df_2` using a f_string. Did we lose any rows between `df` and `df_2`? If not, why not?" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_23.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_23.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Use the dropna method to remove the rows of `df_2` which contains any NaN values, and assign it to `df_3`" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_24.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_24.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Print the number of rows of `df_3` using a f_string." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_25.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_25.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -1187,64 +1308,69 @@ "Duplicates\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Remove the duplicates rows from `df_3`, and assign the new dataframe to `df_4`" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_26.py" + ], "metadata": { "jupyter": { "outputs_hidden": false }, - "scrolled": true + "scrolled": true, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_26.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# checking the shape of df_4\n", + "df_4.shape" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# checking the shape of df_4\n", - "df_4.shape" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "You should see that 4 rows have been dropped. " - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -1252,185 +1378,208 @@ "Some stats\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Use the describe method to see how the data is distributed (numerical features only!)" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_27.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_27.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can also change the **species** column to save memory space. Note: You may receive a **SettingWithCopyWarning** - you can safely ignore this error for this notebook." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df_4['species'] = df_4['species'].astype('category')" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df_4['species'] = df_4['species'].astype('category')" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Using the dtypes attribute, check the types of the columns of `df_4`" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_28.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - 
"source": [ - "# %load ../solutions/02_28.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can also use the functions count(), mean(), sum(), median(), std(), min() and max() separately if we are only interested in one of those." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ - ">Get the minimum for each numerical column of `df_4`" - ] + ">Get the minimum for each numerical column of `df_4`. Make sure to include the argument `numeric_only=True` in the function to filter results to only numeric columns." + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_29.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_29.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ - ">Calculate the maximum of the **flipper_length_mm**" - ] + ">Calculate the maximum of the **flipper_length_mm**." 
+ ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_30.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_30.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "We can also get information for each species using the `groupby` method.\n", "\n", "\n", - "> Get the median for each **species**." - ] + "> Get the median for each **species**. Again, make sure to include the argument `numeric_only=True` in the function to filter results to only numeric columns." + ], + "metadata": {} }, { "cell_type": "code", + "source": [], + "metadata": {}, "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# %run ../solutions/02_31.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "# %load ../solutions/02_31.py" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -1438,59 +1587,42 @@ "Saving the dataframe as a csv file\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ ">Save df_4 using this path: `'../data/Penguins/my_penguins.csv'`" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "# %run ../solutions/02_32.py" + ], "metadata": { "jupyter": { "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "# %load ../solutions/02_32.py" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (system-wide)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 + }, + "collapsed": false, + "trusted": true }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" + "execution_count": null, + "outputs": [] } - }, - "nbformat": 4, - "nbformat_minor": 4 + ] } diff --git a/content/notebooks/3.1 Visualization with Matplotlib.ipynb b/content/notebooks/3.1 Visualization with Matplotlib.ipynb index 5385877..567fb7f 100644 --- a/content/notebooks/3.1 Visualization with Matplotlib.ipynb +++ b/content/notebooks/3.1 Visualization with Matplotlib.ipynb @@ -1,8 +1,28 @@ { + "metadata": { + "kernelspec": { + "name": "python", + "display_name": "Python (Pyodide)", + "language": "python" + }, + "language_info": { + "codemirror_mode": { + "name": "python", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8" + } + }, + "nbformat_minor": 4, + "nbformat": 4, "cells": [ { "cell_type": "markdown", - "metadata": {}, "source": [ "
\n", "

\n", @@ -11,103 +31,119 @@ "Data visualization with matplotlib\n", "\n", "
" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "![](https://matplotlib.org/_static/logo2_compressed.svg)" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "

\n", "Import pyplot in matplotlib (and pandas)\n", "


\n", "
" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "According to the [official documentation](https://matplotlib.org/gallery/index.html):\n", "\n", "`matplotlib.pyplot` is a collection of command style functions that make Matplotlib work like MATLAB. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc.\n", "\n", "`pyplot` is mainly intended for interactive plots and simple cases of programmatic plot generation." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "![](https://miro.medium.com/max/2000/1*swPzVFGpYdijWAmbrydCDw.png)" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], "source": [ "%matplotlib inline\n", "# this is for ipython interpreter to show the plot in Jupyter\n", "\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" - ] + ], + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": 1, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "### Import the dataframe again, read it into a pandas DataFrame and assign it to df." 
- ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df = pd.read_csv('../data/Penguins/penguins_clean.csv')" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df = pd.read_csv('../data/Penguins/penguins_clean.csv')" - ] + "execution_count": 2, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "### Refresh our memory about how the data looks like" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df.head()" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df.head()" + "execution_count": 3, + "outputs": [ + { + "execution_count": 3, + "output_type": "execute_result", + "data": { + "text/plain": " species island bill_length_mm bill_depth_mm flipper_length_mm \\\n0 Adelie Torgersen 39.1 18.7 181.0 \n1 Adelie Torgersen 39.5 17.4 186.0 \n2 Adelie Torgersen 40.3 18.0 195.0 \n3 Adelie Torgersen NaN NaN NaN \n4 Adelie Torgersen 36.7 19.3 193.0 \n\n body_mass_g sex \n0 3750.0 Male \n1 3800.0 Female \n2 3250.0 Female \n3 NaN NaN \n4 3450.0 Female ", + "text/html": "
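<div>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\"><th></th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>\n</thead>\n<tbody>\n<tr><th>0</th><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181.0</td><td>3750.0</td><td>Male</td></tr>\n<tr><th>1</th><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186.0</td><td>3800.0</td><td>Female</td></tr>\n<tr><th>2</th><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195.0</td><td>3250.0</td><td>Female</td></tr>\n<tr><th>3</th><td>Adelie</td><td>Torgersen</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n<tr><th>4</th><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193.0</td><td>3450.0</td><td>Female</td></tr>\n</tbody>\n</table>\n</div>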
" + }, + "metadata": {} + } ] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "### Using DataFrame.plot() in pandas\n", "\n", @@ -117,89 +153,165 @@ "\n", "You will find this page very helpful:\n", "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "#### Example: Box plot in general" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df.plot(kind='box')" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df.plot(kind='box')" + "execution_count": 4, + "outputs": [ + { + "execution_count": 4, + "output_type": "execute_result", + "data": { + "text/plain": "" + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": "
", + "image/png": "" + }, + "metadata": {} + } ] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "The scales of our data don't align particularly well. So for the sake of plotting, we'll ignore the body mass of the penguins." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "df.drop([\"body_mass_g\"], axis=1).plot(kind='box')" + ], + "metadata": { + "trusted": true + }, + "execution_count": 5, + "outputs": [ + { + "execution_count": 5, + "output_type": "execute_result", + "data": { + "text/plain": "" + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": "
", + "image/png": "" + }, + "metadata": {} + } ] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "#### Better presentation: figure size, add title and legend" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df.drop([\"body_mass_g\"], axis=1).plot(kind='box', figsize=(10,8), title='Box plot of different measurements of species of penguin', legend=True)" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df.drop([\"body_mass_g\"], axis=1).plot(kind='box', figsize=(10,8), title='Box plot of different measurements of species of penguin', legend=True)" + "execution_count": 6, + "outputs": [ + { + "execution_count": 6, + "output_type": "execute_result", + "data": { + "text/plain": "" + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": "
", + "image/png": "" + }, + "metadata": {} + } ] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "#### Making subplots" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], "source": [ "df.plot(kind='box',\n", " subplots=True, layout=(2,2),\n", " figsize=(10,8), title='Box plot of different measurements of species of penguin', legend=True)" + ], + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": 7, + "outputs": [ + { + "execution_count": 7, + "output_type": "execute_result", + "data": { + "text/plain": "bill_length_mm AxesSubplot(0.125,0.53;0.352273x0.35)\nbill_depth_mm AxesSubplot(0.547727,0.53;0.352273x0.35)\nflipper_length_mm AxesSubplot(0.125,0.11;0.352273x0.35)\nbody_mass_g AxesSubplot(0.547727,0.11;0.352273x0.35)\ndtype: object" + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": "
", + "image/png": "" + }, + "metadata": {} + } ] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -207,193 +319,241 @@ "Exercise: Compare bill length of different species of penguin\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ - "Let's use box plot to compare the bill length of different species of penguin. We need the DataFrame to be slightly different so we can compare the different type species of penguin. We would like to pivot the data so each column are bill length of different species of penguin." - ] + "Let's use box plot to compare the bill length of different species of penguin. We need the DataFrame to be slightly different so we can compare the different species of penguin. We would like to pivot the data so each column are bill length of different species of penguin." + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "#### Prepare the data set" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df_pivot = df.pivot(index=None, columns='species', values='bill_length_mm')\n", + "# tell the pivot() method to make the 'species' as columns, and using the 'bill_length_mm' as the value" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df_pivot = df.pivot(index=None, columns='species', values='bill_length_mm')\n", - "# tell the pivot() method to make the 'species' as columns, and using the 'bill_length_mm' as the value" - ] + "execution_count": 8, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df_pivot.sample(10)" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df_pivot.sample(10)" + "execution_count": 9, + "outputs": [ + { + "execution_count": 9, + "output_type": "execute_result", + "data": { + "text/plain": "species Adelie Chinstrap Gentoo\n200 NaN 51.5 NaN\n180 NaN 46.4 NaN\n136 35.6 NaN NaN\n30 39.5 NaN NaN\n199 NaN 49.0 NaN\n337 NaN NaN 48.8\n165 NaN 52.0 NaN\n129 44.1 NaN NaN\n194 NaN 
50.9 NaN\n155 NaN 45.4 NaN", + "text/html": "
species  Adelie  Chinstrap  Gentoo\n200  NaN  51.5  NaN\n180  NaN  46.4  NaN\n136  35.6  NaN  NaN\n30  39.5  NaN  NaN\n199  NaN  49.0  NaN\n337  NaN  NaN  48.8\n165  NaN  52.0  NaN\n129  44.1  NaN  NaN\n194  NaN  50.9  NaN\n155  NaN  45.4  NaN
" + }, + "metadata": {} + } ] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "#### Box plot of df_pivot\n", "\n", "Now we can use `plot()` on `df_pivot`. To make a box plot, remember to set the parameter `kind` to 'box'. Also make the presentation nice by setting a good `figsize` and with a good `title`. Don't forget the `legend`." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "#### Additional exercise\n", "\n", "Challenge yourself by making your own `df_pivot` pivoting on a different measure (e.g. Body Mass). Also try using a histogram (hist) instead of a boxplot. You can also try making a plot with 3 subplots, each is a histogram of a type of penguin." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "So far we are not using `matplotlib.pyplot` directly. Although it is very convenient to use `df.plot()`, sometimes we would like to have more control with what we are plotting and make more complex graphs. In the following sections, we will use `matplotlib.pyplot` (which is imported as `plt` now) directly." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ - "### Divide the data into 3 types accordingly" - ] + "### Divide the data into 3 types accordingly\n", + "\n", + "In order to create the following plots, we need to create different pandas DataFrames for each penguin species." 
+ ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df['species'].unique()" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df['species'].unique()" + "execution_count": 10, + "outputs": [ + { + "execution_count": 10, + "output_type": "execute_result", + "data": { + "text/plain": "array(['Adelie', 'Chinstrap', 'Gentoo'], dtype=object)" + }, + "metadata": {} + } ] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df_adelie = df[df['species'] == 'Adelie']" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df_adelie = df[df['species'] == 'Adelie']" - ] + "execution_count": 11, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df_chinstrap = df[df['species'] == 'Chinstrap']" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df_chinstrap = df[df['species'] == 'Chinstrap']" - ] + "execution_count": 12, + "outputs": [] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "df_gentoo = df[df['species'] == 'Gentoo']" + ], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false, + "trusted": true }, - "outputs": [], - "source": [ - "df_gentoo = df[df['species'] == 'Gentoo']" - ] + "execution_count": 13, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "### Scatter plot example: plot on Bill Length and Width" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], "source": [ "plt.scatter(df_adelie['bill_length_mm'], df_adelie['bill_depth_mm'], c='r')\n", "plt.scatter(df_chinstrap['bill_length_mm'], df_chinstrap['bill_depth_mm'], c='g')\n", 
"plt.scatter(df_gentoo['bill_length_mm'], df_gentoo['bill_depth_mm'], c='b')" + ], + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": 14, + "outputs": [ + { + "execution_count": 14, + "output_type": "execute_result", + "data": { + "text/plain": "" + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": "
", + "image/png": "" + }, + "metadata": {} + } ] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "#### Better presentation: figure size, add labels and legend" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], "source": [ "plt.figure(figsize=(10,8)) # set the size of the plot\n", "\n", @@ -408,44 +568,64 @@ "ax.set_title('Bill Length and Width for Different Species of Penguin')\n", "\n", "ax.legend(('adelie', 'chinstrap', 'gentoo'))" + ], + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": 16, + "outputs": [ + { + "execution_count": 16, + "output_type": "execute_result", + "data": { + "text/plain": "" + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": "
", + "image/png": "" + }, + "metadata": {} + } ] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "### Scatter plot exercise: plot on Flipper Length and Body Mass\n", "\n", - "Now is your turn to make your own plot. Make sure you have also set the labels and legend" - ] + "Now is your turn to make your own plot. Make sure you have also set the labels and legend." + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "### Histogram example: plot on Bill Length" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], "source": [ "plt.figure(figsize=(10,8))\n", "\n", @@ -459,48 +639,68 @@ "ax.set_title('Histogram of Bill Length for Different Species of Penguin')\n", "\n", "ax.legend(('adelie', 'chinstrap', 'gentoo'))" + ], + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": 17, + "outputs": [ + { + "execution_count": 17, + "output_type": "execute_result", + "data": { + "text/plain": "" + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": "
", + "image/png": "" + }, + "metadata": {} + } ] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "### Histogram exercise: plot on Body Mass\n", "\n", "Now is your turn to make your own plot. Make sure you set the alpha to a proper value and have the right the labels and legend." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { "jupyter": { "outputs_hidden": false - } + }, + "collapsed": false }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "### Making subplots example\n", "\n", - "To make subplots with just `plt` is a bit more complicated. It is considered more advance and require some understanding of what the building blocks are in a plot. Don't feel bad if you find it challenging, you can always follow the example and try it yourself to understand more what is going on.\n", + "To make subplots with just `plt` is a bit more complicated. It is considered more advanced and requires some understanding of what the building blocks are in a plot. Don't feel bad if you find it challenging, you can always follow the example and try it yourself to understand more of what is going on.\n", "\n", - "The example below plot the histogram of Bill Length and Bill Width side by side" - ] + "The example below plots the histogram of Bill Length and Bill Width side by side." + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], "source": [ "# First, we have to decide how many subplots we want and how they are orientated\n", "# say we want them side by side (i.e. 
1 row 2 columns)\n", @@ -535,11 +735,28 @@ "ax1.legend(('adelie', 'chinstrap', 'gentoo'))\n", "\n", "plt.show() # after building what we want for both axes, use show() method to show plots" + ], + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": 18, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": "
", + "image/png": "" + }, + "metadata": {} + } ] }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -547,18 +764,18 @@ "Making subplots exercise\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "Make 2 subplots, one on top of another. They are scatter plots of Flipper Length and Body Mass (with different type of penguin). After you have done it, try also other orientation and plots. See if you can make 4 subplots together. Always make sure the presentation is good." - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ "---\n", "\n", @@ -566,13 +783,13 @@ "More matplotlib!\n", "
\n", "" - ] + ], + "metadata": {} }, { "cell_type": "markdown", - "metadata": {}, "source": [ - "Check out more example of histogram with multiple data sets: https://matplotlib.org/gallery/statistics/histogram_multihist.html#sphx-glr-gallery-statistics-histogram-multihist-py\n", + "Check out more examples of histogram with multiple data sets: https://matplotlib.org/gallery/statistics/histogram_multihist.html#sphx-glr-gallery-statistics-histogram-multihist-py\n", "\n", "Example: Creates histogram from scatter plot and adds them to the sides of the plot\n", "https://matplotlib.org/gallery/lines_bars_and_markers/scatter_hist.html#sphx-glr-gallery-lines-bars-and-markers-scatter-hist-py\n", @@ -582,28 +799,8 @@ "Also, if you are stuck, always check the documentation: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot\n", "\n", "![](https://media0.giphy.com/media/l3nF8lOW9D0ZElDvG/200.gif)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (system-wide)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" + ], + "metadata": {} } - }, - "nbformat": 4, - "nbformat_minor": 4 + ] } diff --git a/content/notebooks/3.2 Visualization with Seaborn.ipynb b/content/notebooks/3.2 Visualization with Seaborn.ipynb index 574cd4a..de829e9 100644 --- a/content/notebooks/3.2 Visualization with Seaborn.ipynb +++ b/content/notebooks/3.2 Visualization with Seaborn.ipynb @@ -1,420 +1,335 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "

\n", - "

\n", - "

\n", - "Data visualization with Seaborn\n", - "

\n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## About seaborn\n", - "Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics, which is very powerful for visualizing categorical data.\n", - "\n", - "![](https://d1rwhvwstyk9gu.cloudfront.net/2017/07/seaburn-1.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We will be using the [Pokemon.csv](https://gist.github.com/armgilles/194bcff35001e7eb53a2a8b441e8b2c6). Let's have a look at the data:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "import pandas as pd\n", - "\n", - "pokemon_df = pd.read_csv('../data/Pokemon/pokemon.csv', index_col=0)\n", - "pokemon_df.head(10)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "

\n", - "Categorical scatterplots\n", - "


\n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For example, we want to compare the Attack of different type of Pokemon, to see if any type is generally more powerful than the others:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "import seaborn as sns\n", - "import matplotlib.pyplot as plt\n", - "\n", - "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When import, we usually simplify 'seaborn' as 'sns'. (It's a [West Wing / Rob Lowe](https://en.wikipedia.org/wiki/Sam_Seaborn) reference!) Note that we have to also have to import matplotlib.pyplot because Seaborn is a library that sit on top of matplotlib. We got a plot but it looks ugly and not readable, let's add some configuration to make it nicer." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Try: adding `aspect=2.5` as the last arguments in the following `sns.catplot`**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "So you can see that by adding 'aspect' we make the plot wider. The width of the plot is equal to 'aspect * height' so by adding 'aspect' we increase the width of the plot. It is one of the configuration we can add to the plot. For the whole list and their details, we can refer to the [official documentation](https://seaborn.pydata.org/generated/seaborn.catplot.html#seaborn.catplot) but we will give an introduction to a few common ones." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For example, here we see that there's a random x-axis offset for all the points so we can see them without dots overlapping each other. This is done by the 'jitter' setting which is default to True. Let's turn it off and see how it looks like:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Try: adding `jitter=False` as the last arguments in the following `sns.catplot`**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df, aspect=2.5);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "So we now have a plot that points are align according to their catagories without the x-axis offsets. Which one to use is depending on if the population of the value (e.g. Attack) is important. In our case, we want to know how the Attack is distributed in each Type so many be it's good to have 'jitter' on, or even better if we can spread it out even more and show the distribution:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Try: adding `kind=\"swarm\"` as the last arguments in the following `sns.catplot`**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df, aspect=2.5);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Here we can do it by setting 'kind' to 'swarm' so the points are not overlapping. The disadvantage is that this ploy will need more space horizontally. Imagine we don't want to make the plot super wide due to the limitation of the paper. 
We can turn it 90 degrees by flipping the x and the y,also we would adjust the aspect and the height:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Try: swap `x` and `y`, and add `height=12, aspect=0.6, kind=\"swarm\"` in the arguments of the following `sns.catplot`**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "There are a few thing we can observe so far:\n", - "\n", - "1. For some Types, like Psychic has a very large range of Attack with a long tail the end (i.e. some Physic Types has very high Attack power while most of the Psychic type does not).\n", - "\n", - "2. On the other hand, the Poison type are mostly in the range of 40-110 Attacks.\n", - "\n", - "3. In general Dragon Types have more Attack power than Fairy, but there are 2 Fairy type that has more attack power." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "However, we would like to look deeper: I have a theory that Legendary Pokemon are more powerful. 
let's colour code according to 'Legendary' to see if the pokemon is Legendary or not will have something to do with the Attack of the pokemon:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Try: adding `hue=\"Legendary\"` as the last arguments in the following `sns.catplot`**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false + "metadata": { + "kernelspec": { + "name": "python", + "display_name": "Python (Pyodide)", + "language": "python" + }, + "language_info": { + "codemirror_mode": { + "name": "python", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8" } - }, - "outputs": [], - "source": [ - "plt.figure(figsize=(15, 6))\n", - "sns.stripplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df, size=7)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Ah ha! We see that a lot of the Psychic Type that has higher that others in Attack is actually Legendary pokemon. That also happen to the Ground Type and the Flying type." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Exercise\n", - "Now it's your turn to do some analysis. Pick a property of the Pokemon: HP, Defense, Sp. Atk, Sp. Def or Speed and do the similar analysis as above to see if you can find any interesting facts about Pokemon." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "

\n", - "Building structured multi-plot grids\n", - "


\n", - "
" - ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Sometimes, we would have multiple plots in one graph for comparison. One way to do it in seaborn is to use FacetGrid. The FacetGrid class is useful when you want to visualize the distribution of a variable or the relationship between multiple variables separately within subsets of your dataset. In the following, we will be using FacetGrid to see if there is a difference for our analysis above across different Generations." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To make a FacetGrid, we can do the following:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "g = sns.FacetGrid(pokemon_df, col=\"Generation\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Look we have 6 plot areas which match as the number of different of Generations that we have\n", - "(we can check what are the different Generations like this):" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false + "nbformat_minor": 4, + "nbformat": 4, + "cells": [ + { + "cell_type": "markdown", + "source": "
\n

\n

\n

\nData visualization with Seaborn\n

\n
", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "## About seaborn\nSeaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics, which is very powerful for visualizing categorical data.\n\n![](https://d1rwhvwstyk9gu.cloudfront.net/2017/07/seaburn-1.png)\n\nBefore we get started, we need to install Seaborn. Run the cell below.", + "metadata": {} + }, + { + "cell_type": "code", + "source": "import piplite\nawait piplite.install('seaborn')", + "metadata": { + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "We will be using the dataset [Pokemon.csv](https://gist.github.com/armgilles/194bcff35001e7eb53a2a8b441e8b2c6). Let's have a look at the data:", + "metadata": {} + }, + { + "cell_type": "code", + "source": "import pandas as pd\n\npokemon_df = pd.read_csv('../data/Pokemon/pokemon.csv', index_col=0)\npokemon_df.head(10)", + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "---\n\n

\nCategorical scatterplots\n


\n
", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "For example, we want to compare the Attack of different types of Pokemon, to see if any type is generally more powerful than the others:", + "metadata": {} + }, + { + "cell_type": "code", + "source": "import seaborn as sns\nimport matplotlib.pyplot as plt\n\nsns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df);", + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "When import, we usually simplify 'seaborn' as 'sns'. (It's a [West Wing / Rob Lowe](https://en.wikipedia.org/wiki/Sam_Seaborn) reference!) Note that we have to also have to import `matplotlib.pyplot` because Seaborn is a library that sits on top of matplotlib. We got a plot but it looks ugly and not readable, let's add some configuration to make it nicer.", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "**Try: adding `aspect=2.5` as the last arguments in the following `sns.catplot`**", + "metadata": {} + }, + { + "cell_type": "code", + "source": "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df);", + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "So you can see that by adding 'aspect' we make the plot wider. The width of the plot is equal to 'aspect * height' so by adding 'aspect' we increase the width of the plot. It is one of the configurations that we can add to the plot. 
For the whole list and their details, we can refer to the [official documentation](https://seaborn.pydata.org/generated/seaborn.catplot.html#seaborn.catplot), but we will give an introduction to a few common ones.", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "For example, here we see that there's a random x-axis offset for all the points so we can see them without dots overlapping each other. This is done by the 'jitter' setting, which defaults to True. Let's turn it off and see what it looks like:", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "**Try: adding `jitter=False` as the last argument in the following `sns.catplot`**", + "metadata": {} + }, + { + "cell_type": "code", + "source": "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df, aspect=2.5);", + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "So we now have a plot in which the points are aligned by category, without the x-axis offsets. Which one to use depends on whether the distribution of the value (e.g. Attack) is important. In our case, we want to know how the Attack is distributed in each Type, so maybe it's good to have 'jitter' on, or even better if we can spread it out even more and show the distribution:", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "**Try: adding `kind=\"swarm\"` as the last argument in the following `sns.catplot`**", + "metadata": {} + }, + { + "cell_type": "code", + "source": "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df, aspect=2.5);", + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "Here we can do it by setting 'kind' to 'swarm' so the points are not overlapping. 
The disadvantage is that this plot will need more space horizontally. Imagine we don't want to make the plot super wide because of the limited width of the page. We can turn it 90 degrees by swapping the x and the y; we would also adjust the aspect and the height:", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "**Try: swap `x` and `y`, and add `height=12, aspect=0.6, kind=\"swarm\"` in the arguments of the following `sns.catplot`**", + "metadata": {} + }, + { + "cell_type": "code", + "source": "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df);", + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "There are a few things we can observe so far:\n\n1. Some Types, like Psychic, have a very large range of Attack with a long tail at the end (i.e. some Psychic Pokemon have very high Attack power while most of the Psychic Type do not).\n\n2. On the other hand, the Poison Type are mostly in the range of 40-110 Attack.\n\n3. In general, Dragon Types have more Attack power than Fairy, but there are 2 Fairy Types that have more Attack power.", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "However, we would like to look deeper: I have a theory that Legendary Pokemon are more powerful. 
Let's colour-code the points by 'Legendary' to see whether being Legendary has something to do with the Attack of the Pokemon:", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "**Try: adding `hue=\"Legendary\"` as the last argument in the following `sns.catplot`**", + "metadata": {} + }, + { + "cell_type": "code", + "source": "plt.figure(figsize=(15, 6))\nsns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df, size=7)", + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "Ah ha! We see that a lot of the Psychic Types with higher Attack than the others are actually Legendary Pokemon. The same is true of the Ground Type and the Flying Type.", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "### Exercise\nNow it's your turn to do some analysis. Pick a property of the Pokemon: HP, Defense, Sp. Atk, Sp. Def or Speed and do a similar analysis as above to see if you can find any interesting facts about Pokemon.", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "---\n\n

\nBuilding structured multi-plot grids\n


\n
", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "Sometimes, we would have multiple plots in one graph for comparison. One way to do it in seaborn is to use FacetGrid. The FacetGrid class is useful when you want to visualize the distribution of a variable or the relationship between multiple variables separately within subsets of your dataset. In the following, we will be using FacetGrid to see if there is a difference for our analysis above across different Generations.", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "To make a FacetGrid, we can do the following:", + "metadata": {} + }, + { + "cell_type": "code", + "source": "g = sns.FacetGrid(pokemon_df, col=\"Generation\")", + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "Look we have 6 plot areas which match as the number of different of Generations that we have\n(we can check what are the different Generations like this):", + "metadata": {} + }, + { + "cell_type": "code", + "source": "pokemon_df[\"Generation\"].unique()", + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "However, we would like to have the plots align vertically rather than horizontally.", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "**Try: replace `col` with `row` in the following `sns.FacetGrid`**", + "metadata": {} + }, + { + "cell_type": "code", + "source": "g = sns.FacetGrid(pokemon_df, col=\"Generation\")", + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "Ok, now we have the layout, how we gonna to put the plot in? 
For some plots, it could be done with the [FacetGrid.map()](https://seaborn.pydata.org/generated/seaborn.FacetGrid.map.html#seaborn.FacetGrid.map) method, for example, using sns.countplot to count how many Pokemon there are of each type:", + "metadata": {} + }, + { + "cell_type": "code", + "source": "g = sns.FacetGrid(pokemon_df, row=\"Generation\", aspect=3.5)\ng.map(sns.countplot, \"Type 1\");", + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "But with sns.catplot that we used before, this is even simpler. As catplot is already built on a FacetGrid, we can directly add the `row` or `col` setting to it.", + "metadata": {} + }, + { + "cell_type": "markdown", + "source": "**Try: adding `row=\"Generation\"` as the last argument in the following `sns.catplot`**", + "metadata": {} + }, + { + "cell_type": "code", + "source": "plt.figure(figsize=(15, 6))\nsns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df, size=7, hue=\"Legendary\", aspect=2.5)", + "metadata": { + "jupyter": { + "outputs_hidden": false + }, + "collapsed": false, + "trusted": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": "Now you see that in each generation, the Legendary Pokemon are outliers with super attack powers compared with the others within their own generation. For details on using FacetGrid, see the official documentation here: https://seaborn.pydata.org/tutorial/axis_grids.html", + "metadata": {} } - }, - "outputs": [], - "source": [ - "pokemon_df[\"Generation\"].unique()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "However, we would like to have the plots align vertically rather than horizontally." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Try: replace `col` with `row` in the following `sns.FacetGrid`**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "g = sns.FacetGrid(pokemon_df, col=\"Generation\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Ok, now we have the layout, how we gonna to put the plot in? For some plots, it could be done with the [FacetGrid.map()](https://seaborn.pydata.org/generated/seaborn.FacetGrid.map.html#seaborn.FacetGrid.map) method, for example, using sns.countplot to count how many Pokemon in different types:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "g = sns.FacetGrid(pokemon_df, row=\"Generation\", aspect=3.5)\n", - "g.map(sns.countplot, \"Type 1\");" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "But with sns.catplot that we used before, this are even simpler. As catplot is already a FacetGrid , we can directly add the `row` or `col` setting to it." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Try: adding `row=\"Generation\"` as the last arguments in the following `sns.catplot`**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "plt.figure(figsize=(15, 6))\n", - "sns.stripplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df, size=7, hue=\"Legendary\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now you see that in each generation, the Legendary Pokemon are outliers with super attack powers comparing with the others within their own generation. 
For details using FacetGrids, you can see the official documentation here: https://seaborn.pydata.org/tutorial/axis_grids.html" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (system-wide)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} + ] +} \ No newline at end of file diff --git a/content/notebooks/4. More Python basics.ipynb b/content/notebooks/4. More Python basics.ipynb index ec77f4f..5b79594 100644 --- a/content/notebooks/4. More Python basics.ipynb +++ b/content/notebooks/4. More Python basics.ipynb @@ -19,7 +19,7 @@ "source": [ "***Note***: This notebook contains solution cells with (a) solution. Remember there is not only one solution to a problem! \n", "You will recognise these cells as they start with **# %**. \n", - "If you would like to see the solution, you will have to remove the **#** (which can be done by using **Ctrl** and **?**) and run the cell. If you want to run the solution code, you will have to run the cell again." + "If you would like to see the solution, you will have to remove the **#** (which can also be done by using **Ctrl** and **?**) and run the cell. If you want to run the solution code, you will have to run the cell again." 
] }, { @@ -81,7 +81,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/04_01.py" + "# %run ../solutions/04_01.py" ] }, { @@ -106,7 +106,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/04_02.py" + "# %run ../solutions/04_02.py" ] }, { @@ -130,7 +130,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/04_03.py" + " %run ../solutions/04_03.py" ] }, { @@ -244,7 +244,7 @@ "metadata": {}, "outputs": [], "source": [ - "name = input('What is your name? ')\n", + "name = ''\n", "\n", "if len(name) > 6:\n", " print('You have a long name.')\n", @@ -280,7 +280,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/04_04.py" + "# %run ../solutions/04_04.py" ] }, { @@ -411,7 +411,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - ">Get the documentation for the is_greeting function." + ">Get the documentation for the is_greeting function. Hint: Try using a question mark." ] }, { @@ -427,7 +427,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/04_05.py" + "# %run ../solutions/04_05.py" ] }, { @@ -452,13 +452,13 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/04_06.py" + "# %run ../solutions/04_06.py" ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3 (system-wide)", + "display_name": "Python 3.10.5 64-bit", "language": "python", "name": "python3" }, @@ -472,7 +472,12 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.5" + "version": "3.10.5" + }, + "vscode": { + "interpreter": { + "hash": "97cc609b13305c559618ec78a438abc56230b9381f827f22d070313b9a1f3777" + } } }, "nbformat": 4, diff --git a/content/notebooks/5. More Pandas.ipynb b/content/notebooks/5. More Pandas.ipynb index 6cd7e11..65f1edb 100644 --- a/content/notebooks/5. More Pandas.ipynb +++ b/content/notebooks/5. 
More Pandas.ipynb @@ -24,6 +24,28 @@ "If you would like to see the solution, you will have to remove the **#** (which can be done by using **Ctrl** and **?**) and run the cell. If you want to run the solution code, you will have to run the cell again." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, let's install the libraries we need for this notebook. Run the cell below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pyodide_js\n", + "\n", + "# Install Matplotlib\n", + "await pyodide_js.loadPackage(\"matplotlib\")\n", + "\n", + "# Install pandas\n", + "await pyodide_js.loadPackage(\"pandas\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -42,8 +64,7 @@ "source": [ "> 1. Import pandas (with the mostly used convention seen in notebook 2). \n", "> 2. Import datetime. \n", - "> 3. Import matplotlib (as shown in notebook 3.1). \n", - "> 4. Add the magic command to show the plot in Jupyter." + "> 3. Import matplotlib (as shown in notebook 3.1). " ] }, { @@ -67,9 +88,25 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_01.py" + "# %run ../solutions/05_01.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> **BONUS EXERCISE**: Add the magic command to show the plot in Jupyter.\n", + "\n", + "HINT: try starting with a '%' sign." ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "markdown", "metadata": {}, @@ -105,7 +142,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_02.py" + "# %run ../solutions/05_02.py" ] }, { @@ -136,7 +173,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_03.py" + "# %run ../solutions/05_03.py" ] }, { @@ -144,7 +181,7 @@ "metadata": {}, "source": [ "You can see that the header are not at the right place. 
\n", - "If you have a look at the documentation, you will see that the *header* parameter of the read_csv method is set to *infer*. It will infer the header using the first row (which has the index *0*)." + "If you have a look at the documentation, you will see that the *header* parameter of the `read_csv` method is set to *infer*. Hence, it will infer the header using the first row, which has the index *0*." ] }, { @@ -175,7 +212,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_04.py" + "# %run ../solutions/05_04.py" ] }, { @@ -206,7 +243,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_05.py" + "# %run ../solutions/05_05.py" ] }, { @@ -237,7 +274,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_06.py" + "# %run ../solutions/05_06.py" ] }, { @@ -287,7 +324,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_07.py" + "# %run ../solutions/05_07.py" ] }, { @@ -318,7 +355,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_08.py" + "# %run ../solutions/05_08.py" ] }, { @@ -349,7 +386,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_09.py" + "# %run ../solutions/05_09.py" ] }, { @@ -382,7 +419,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_10.py" + "# %run ../solutions/05_10.py" ] }, { @@ -425,7 +462,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_11.py" + "# %run ../solutions/05_11.py" ] }, { @@ -464,7 +501,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_12.py" + "# %run ../solutions/05_12.py" ] }, { @@ -505,7 +542,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_13.py" + "# %run ../solutions/05_13.py" ] }, { @@ -563,7 +600,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_14.py" + "# %run ../solutions/05_14.py" ] }, { @@ -596,7 +633,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_15.py" + "# %run ../solutions/05_15.py" ] }, { @@ -629,7 +666,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_16.py" + "# 
%run ../solutions/05_16.py" ] }, { @@ -660,7 +697,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_17.py" + "# %run ../solutions/05_17.py" ] }, { @@ -691,7 +728,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_18.py" + "# %run ../solutions/05_18.py" ] }, { @@ -735,7 +772,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_19.py" + "# %run ../solutions/05_19.py" ] }, { @@ -766,7 +803,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_20.py" + "# %run ../solutions/05_20.py" ] }, { @@ -798,7 +835,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_21.py" + "# %run ../solutions/05_21.py" ] }, { @@ -829,7 +866,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_22.py" + "# %run ../solutions/05_22.py" ] }, { @@ -861,7 +898,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_23.py" + "# %run ../solutions/05_23.py" ] }, { @@ -892,7 +929,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_24.py" + "# %run ../solutions/05_24.py" ] }, { @@ -923,7 +960,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_25.py" + "# %run ../solutions/05_25.py" ] }, { @@ -949,7 +986,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/05_26.py" + "# %run ../solutions/05_26.py" ] }, { @@ -1012,7 +1049,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_27.py" + "# %run ../solutions/05_27.py" ] }, { @@ -1041,9 +1078,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "df['city'].value_counts()" - ] + "source": [] }, { "cell_type": "code", @@ -1051,7 +1086,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/05_28.py" + "# %run ../solutions/05_28.py" ] }, { @@ -1078,7 +1113,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/05_29.py" + "# %run ../solutions/05_29.py" ] }, { @@ -1119,7 +1154,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/05_30.py" + "# %run ../solutions/05_30.py" ] }, { 
@@ -1143,7 +1178,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/05_31.py" + "# %run ../solutions/05_31.py" ] }, { @@ -1166,7 +1201,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/05_32.py" + "# %run ../solutions/05_32.py" ] }, { @@ -1189,7 +1224,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/05_33.py" + "# %run ../solutions/05_33.py" ] }, { @@ -1239,7 +1274,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_34.py" + "# %run ../solutions/05_34.py" ] }, { @@ -1270,7 +1305,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_35.py" + "# %run ../solutions/05_35.py" ] }, { @@ -1305,7 +1340,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/05_36.py" + "# %run ../solutions/05_36.py" ] }, { @@ -1338,7 +1373,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/05_37.py" + "# %run ../solutions/05_37.py" ] }, { @@ -1361,7 +1396,7 @@ "metadata": {}, "outputs": [], "source": [ - "# %load ../solutions/05_38.py" + "# %run ../solutions/05_38.py" ] }, { @@ -1404,7 +1439,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_39.py" + "# %run ../solutions/05_39.py" ] }, { @@ -1436,7 +1471,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_40.py" + "# %run ../solutions/05_40.py" ] }, { @@ -1469,7 +1504,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_41.py" + "# %run ../solutions/05_41.py" ] }, { @@ -1500,7 +1535,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_42.py" + "# %run ../solutions/05_42.py" ] }, { @@ -1532,7 +1567,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_43.py" + "# %run ../solutions/05_43.py" ] }, { @@ -1575,7 +1610,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_44.py" + "# %run ../solutions/05_44.py" ] }, { @@ -1606,7 +1641,7 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_45.py" + "# %run ../solutions/05_45.py" ] }, { @@ -1656,7 +1691,7 @@ }, 
"outputs": [], "source": [ - "# %load ../solutions/05_46.py" + "# %run ../solutions/05_46.py" ] }, { @@ -1687,13 +1722,13 @@ }, "outputs": [], "source": [ - "# %load ../solutions/05_47.py" + "# %run ../solutions/05_47.py" ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3 (system-wide)", + "display_name": "Python 3.10.5 64-bit", "language": "python", "name": "python3" }, @@ -1707,7 +1742,12 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.5" + "version": "3.10.5" + }, + "vscode": { + "interpreter": { + "hash": "97cc609b13305c559618ec78a438abc56230b9381f827f22d070313b9a1f3777" + } } }, "nbformat": 4, diff --git a/content/solutions/01_03.py b/content/solutions/01_03.py index 18a1322..92d4af6 100644 --- a/content/solutions/01_03.py +++ b/content/solutions/01_03.py @@ -1 +1 @@ -"I'm enjoying this workshop!" \ No newline at end of file +print("I'm enjoying this workshop!") \ No newline at end of file diff --git a/content/solutions/01_05.py b/content/solutions/01_05.py index 5c81d54..4f8359b 100644 --- a/content/solutions/01_05.py +++ b/content/solutions/01_05.py @@ -1 +1 @@ -'I\'m enjoying this workshop!' \ No newline at end of file +print('I\'m enjoying this workshop!') \ No newline at end of file diff --git a/content/solutions/01_06.py b/content/solutions/01_06.py index ab3be34..e75ba31 100644 --- a/content/solutions/01_06.py +++ b/content/solutions/01_06.py @@ -1 +1,2 @@ -s[-1] \ No newline at end of file +s = 'I am a Pythonista.' +print(s[-1]) \ No newline at end of file diff --git a/content/solutions/01_07.py b/content/solutions/01_07.py index 428c4d6..a2f6622 100644 --- a/content/solutions/01_07.py +++ b/content/solutions/01_07.py @@ -1 +1,2 @@ -s[-3:] \ No newline at end of file +s = "I am a Pythonista." 
+print(s[-3:]) \ No newline at end of file diff --git a/content/solutions/01_08.py b/content/solutions/01_08.py index 929aaca..617a7b0 100644 --- a/content/solutions/01_08.py +++ b/content/solutions/01_08.py @@ -1 +1,2 @@ -"I" not in s \ No newline at end of file +s = "I am a Pythonista." +print("I" not in s) \ No newline at end of file diff --git a/content/solutions/01_09.py b/content/solutions/01_09.py index 5b947f9..1d058a2 100644 --- a/content/solutions/01_09.py +++ b/content/solutions/01_09.py @@ -1 +1 @@ -3 + 4 \ No newline at end of file +print(3 + 4) \ No newline at end of file diff --git a/content/solutions/01_10.py b/content/solutions/01_10.py index 992d1cf..508534c 100644 --- a/content/solutions/01_10.py +++ b/content/solutions/01_10.py @@ -1 +1 @@ -10.0 - 6 \ No newline at end of file +print(10.0 - 6) \ No newline at end of file diff --git a/content/solutions/01_11.py b/content/solutions/01_11.py index 9f650f4..528d11d 100644 --- a/content/solutions/01_11.py +++ b/content/solutions/01_11.py @@ -1 +1 @@ -15 * 12 \ No newline at end of file +print(15 * 12) \ No newline at end of file diff --git a/content/solutions/01_12.py b/content/solutions/01_12.py index b4db663..f519641 100644 --- a/content/solutions/01_12.py +++ b/content/solutions/01_12.py @@ -1 +1 @@ -2**6 \ No newline at end of file +print(2**6) \ No newline at end of file diff --git a/content/solutions/01_13.py b/content/solutions/01_13.py index 74eeeaf..e7680ca 100644 --- a/content/solutions/01_13.py +++ b/content/solutions/01_13.py @@ -1 +1 @@ -3.1**2 \ No newline at end of file +print(3.1**2) \ No newline at end of file diff --git a/content/solutions/01_14.py b/content/solutions/01_14.py index 7f0af36..5c8c371 100644 --- a/content/solutions/01_14.py +++ b/content/solutions/01_14.py @@ -1 +1 @@ -5.0**2 \ No newline at end of file +print(5.0**2) \ No newline at end of file diff --git a/content/solutions/01_15.py b/content/solutions/01_15.py index 512522a..dd31e59 100644 --- 
a/content/solutions/01_15.py +++ b/content/solutions/01_15.py @@ -1 +1 @@ -6 / 2 \ No newline at end of file +print(6 / 2) \ No newline at end of file diff --git a/content/solutions/01_16.py b/content/solutions/01_16.py index d5e993a..976426b 100644 --- a/content/solutions/01_16.py +++ b/content/solutions/01_16.py @@ -1 +1 @@ -6 // 2 \ No newline at end of file +print(6 // 2) \ No newline at end of file diff --git a/content/solutions/01_17.py b/content/solutions/01_17.py index 32cddde..b3bda13 100644 --- a/content/solutions/01_17.py +++ b/content/solutions/01_17.py @@ -1 +1 @@ -19 / 5 \ No newline at end of file +print(19 / 5) \ No newline at end of file diff --git a/content/solutions/01_18.py b/content/solutions/01_18.py index 9f5b26b..b15f815 100644 --- a/content/solutions/01_18.py +++ b/content/solutions/01_18.py @@ -1 +1 @@ -19 // 5 \ No newline at end of file +print(19 // 5) \ No newline at end of file diff --git a/content/solutions/01_19.py b/content/solutions/01_19.py index 275fd5a..fe18c51 100644 --- a/content/solutions/01_19.py +++ b/content/solutions/01_19.py @@ -1 +1 @@ -19 % 5 \ No newline at end of file +print(19 % 5) \ No newline at end of file diff --git a/content/solutions/01_20.py b/content/solutions/01_20.py index 086bcd1..6485aa8 100644 --- a/content/solutions/01_20.py +++ b/content/solutions/01_20.py @@ -1 +1 @@ -False != 2 \ No newline at end of file +print(False != 2) \ No newline at end of file diff --git a/content/solutions/01_21.py b/content/solutions/01_21.py index a9a3622..991d23f 100644 --- a/content/solutions/01_21.py +++ b/content/solutions/01_21.py @@ -1 +1,2 @@ -len("Sandrine") > 8 \ No newline at end of file +name = "Sandrine" +print(f"Is {name} more than 8 characters? 
{len(name) > 8}") \ No newline at end of file diff --git a/content/solutions/01_22.py b/content/solutions/01_22.py index c339d50..8b6331f 100644 --- a/content/solutions/01_22.py +++ b/content/solutions/01_22.py @@ -1 +1,4 @@ -(len("Sandrine") > 5) and (len("Cheuk") < 7) \ No newline at end of file +name_1 = "Sandrine" +name_2 = "Cheuk" + +print(f"Is {name_1} > 5 and {name_2} < 7 characters? {(len(name_1) > 5) and (len(name_2) < 7)}") \ No newline at end of file diff --git a/content/solutions/01_23.py b/content/solutions/01_23.py index ad36c1c..12a449b 100644 --- a/content/solutions/01_23.py +++ b/content/solutions/01_23.py @@ -1 +1,2 @@ -list_greeting[0] \ No newline at end of file +list_greeting = ['Hallo', 'Bonjour', 10, 'Hello', 'Ciao', False] +print(list_greeting[0]) \ No newline at end of file diff --git a/content/solutions/01_24.py b/content/solutions/01_24.py index 4aaa6d2..e87aa97 100644 --- a/content/solutions/01_24.py +++ b/content/solutions/01_24.py @@ -1 +1,2 @@ -list_greeting[3:] \ No newline at end of file +list_greeting = ['Hallo', 'Bonjour', 10, 'Hello', 'Ciao', False] +print(list_greeting[3:]) \ No newline at end of file diff --git a/content/solutions/01_25.py b/content/solutions/01_25.py index 2d29224..0958419 100644 --- a/content/solutions/01_25.py +++ b/content/solutions/01_25.py @@ -1 +1,2 @@ -list_greeting[:4] \ No newline at end of file +list_greeting = ['Hallo', 'Bonjour', 10, 'Hello', 'Ciao', False] +print(list_greeting[:4]) \ No newline at end of file diff --git a/content/solutions/01_26.py b/content/solutions/01_26.py index c8b200c..c942f85 100644 --- a/content/solutions/01_26.py +++ b/content/solutions/01_26.py @@ -1 +1,2 @@ -list_greeting[::2] \ No newline at end of file +list_greeting = ['Hallo', 'Bonjour', 10, 'Hello', 'Ciao', False] +print(list_greeting[::2]) \ No newline at end of file diff --git a/content/solutions/01_27.py b/content/solutions/01_27.py index eb29a58..66bede0 100644 --- a/content/solutions/01_27.py +++ 
b/content/solutions/01_27.py @@ -1,2 +1,4 @@ -list_greeting[2] = "Ola" +list_greeting = ['Hallo', 'Bonjour', 10, 'Hello', 'Ciao', False] +list_greeting[-1] = 'Ave' +list_greeting[2] = "Hola" print(list_greeting) \ No newline at end of file diff --git a/content/solutions/01_28.py b/content/solutions/01_28.py index d7f6573..726122a 100644 --- a/content/solutions/01_28.py +++ b/content/solutions/01_28.py @@ -1 +1,2 @@ -10 in list_greeting \ No newline at end of file +list_greeting = ['Hallo', 'Bonjour', 'Hola', 'Hello', 'Ciao', 'Ave'] +print(10 in list_greeting) \ No newline at end of file diff --git a/content/solutions/01_29.py b/content/solutions/01_29.py index a42b108..6952d6f 100644 --- a/content/solutions/01_29.py +++ b/content/solutions/01_29.py @@ -1 +1,2 @@ -"Ole" not in list_greeting \ No newline at end of file +list_greeting = ['Hallo', 'Bonjour', 'Hola', 'Hello', 'Ciao', 'Ave'] +print("Ole" not in list_greeting) \ No newline at end of file diff --git a/content/solutions/01_31.py b/content/solutions/01_31.py index 1f8c4e5..ddc0a86 100644 --- a/content/solutions/01_31.py +++ b/content/solutions/01_31.py @@ -1 +1,2 @@ -len(snakes) \ No newline at end of file +snakes = "🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍" +print(len(snakes)) \ No newline at end of file diff --git a/content/solutions/01_32.py b/content/solutions/01_32.py index 9cf9ec1..a333e1e 100644 --- a/content/solutions/01_32.py +++ b/content/solutions/01_32.py @@ -1 +1,2 @@ -len(list_greeting) \ No newline at end of file +list_greeting = ['Hallo', 'Bonjour', 'Hola', 'Hello', 'Ciao', 'Ave'] +print(len(list_greeting)) \ No newline at end of file diff --git a/content/solutions/01_33.py b/content/solutions/01_33.py index 089fab2..9b25407 100644 --- a/content/solutions/01_33.py +++ b/content/solutions/01_33.py @@ -1 +1 @@ -max(1, 2, 3, 4, 5) \ No newline at end of file +print(max(1, 2, 3, 4, 5)) \ No newline at end of file diff --git a/content/solutions/01_34.py b/content/solutions/01_34.py index 
1e307e4..5b2c7d8 100644 --- a/content/solutions/01_34.py +++ b/content/solutions/01_34.py @@ -1 +1 @@ -round(123.45) \ No newline at end of file +print(round(123.45)) \ No newline at end of file diff --git a/content/solutions/01_35.py b/content/solutions/01_35.py index bfcdf9d..a5ba06c 100644 --- a/content/solutions/01_35.py +++ b/content/solutions/01_35.py @@ -1 +1 @@ -round(123.45, 1) \ No newline at end of file +print(round(123.45, 1)) \ No newline at end of file diff --git a/content/solutions/01_36.py b/content/solutions/01_36.py index a4c7c36..e293d6d 100644 --- a/content/solutions/01_36.py +++ b/content/solutions/01_36.py @@ -1 +1,3 @@ -list_greeting.append("Aloha") \ No newline at end of file +list_greeting = ['Hallo', 'Bonjour', 'Hola', 'Hello', 'Ciao', 'Ave'] +list_greeting.append("Aloha") +print(list_greeting) \ No newline at end of file diff --git a/content/solutions/01_37.py b/content/solutions/01_37.py index 03c927c..776a811 100644 --- a/content/solutions/01_37.py +++ b/content/solutions/01_37.py @@ -1,3 +1,3 @@ from math import sqrt -sqrt(24336) \ No newline at end of file +print(sqrt(24336)) \ No newline at end of file diff --git a/content/solutions/01_38.py b/content/solutions/01_38.py index 19e2957..75aa57d 100644 --- a/content/solutions/01_38.py +++ b/content/solutions/01_38.py @@ -1,3 +1,5 @@ +#await micropip.install("numpy") + import numpy as np -np.sin(np.pi / 4) \ No newline at end of file +print(np.sin(np.pi / 4)) \ No newline at end of file diff --git a/content/solutions/02_02.py b/content/solutions/02_02.py index 06a5823..1946ebd 100644 --- a/content/solutions/02_02.py +++ b/content/solutions/02_02.py @@ -1 +1,5 @@ -df = pd.read_csv("../data/Penguins/penguins.csv") \ No newline at end of file +import pandas as pd +from IPython.display import display + +df = pd.read_csv("../data/Penguins/penguins.csv") +display(df) \ No newline at end of file diff --git a/content/solutions/02_03.py b/content/solutions/02_03.py index dca017b..50f8669 100644 
--- a/content/solutions/02_03.py +++ b/content/solutions/02_03.py @@ -1 +1,5 @@ -df.tail(3) \ No newline at end of file +import pandas as pd +from IPython.display import display +df = pd.read_csv("../data/Penguins/penguins.csv") + +display(df.tail(3)) \ No newline at end of file diff --git a/content/solutions/02_04.py b/content/solutions/02_04.py index 20bfa30..21179e4 100644 --- a/content/solutions/02_04.py +++ b/content/solutions/02_04.py @@ -1 +1,4 @@ -df.shape \ No newline at end of file +import pandas as pd +df = pd.read_csv("../data/Penguins/penguins.csv") + +print(df.shape) \ No newline at end of file diff --git a/content/solutions/02_05.py b/content/solutions/02_05.py index e1ed112..ad2647e 100644 --- a/content/solutions/02_05.py +++ b/content/solutions/02_05.py @@ -1 +1,4 @@ -df.info() \ No newline at end of file +import pandas as pd +df = pd.read_csv("../data/Penguins/penguins.csv") + +df.info() \ No newline at end of file diff --git a/content/solutions/02_06.py b/content/solutions/02_06.py index 4cc9080..eba4e66 100644 --- a/content/solutions/02_06.py +++ b/content/solutions/02_06.py @@ -1 +1,4 @@ -df.columns \ No newline at end of file +import pandas as pd +df = pd.read_csv("../data/Penguins/penguins.csv") + +print(df.columns) \ No newline at end of file diff --git a/content/solutions/02_07.py b/content/solutions/02_07.py index 2e52ab3..aef01b2 100644 --- a/content/solutions/02_07.py +++ b/content/solutions/02_07.py @@ -1 +1,3 @@ -pd.options.display.max_rows = 25 \ No newline at end of file +import pandas as pd + +pd.set_option('display.max_rows', 25) \ No newline at end of file diff --git a/content/solutions/02_08.py b/content/solutions/02_08.py index 9b1475e..575915e 100644 --- a/content/solutions/02_08.py +++ b/content/solutions/02_08.py @@ -1 +1,5 @@ -df["bill_length_mm"] \ No newline at end of file +import pandas as pd +from IPython.display import display +df = pd.read_csv("../data/Penguins/penguins.csv") + +display(df["bill_length_mm"]) \ 
No newline at end of file diff --git a/content/solutions/02_09.py b/content/solutions/02_09.py index b5e231a..b379909 100644 --- a/content/solutions/02_09.py +++ b/content/solutions/02_09.py @@ -1 +1,5 @@ -df.iloc[11] \ No newline at end of file +import pandas as pd +from IPython.display import display +df = pd.read_csv("../data/Penguins/penguins.csv") + +display(df.iloc[11]) \ No newline at end of file diff --git a/content/solutions/02_10.py b/content/solutions/02_10.py index 757e138..5bcd203 100644 --- a/content/solutions/02_10.py +++ b/content/solutions/02_10.py @@ -1 +1,5 @@ -df.loc[11] \ No newline at end of file +import pandas as pd +from IPython.display import display +df = pd.read_csv("../data/Penguins/penguins.csv") + +display(df.loc[11]) \ No newline at end of file diff --git a/content/solutions/02_11.py b/content/solutions/02_11.py index 5f6d037..138e910 100644 --- a/content/solutions/02_11.py +++ b/content/solutions/02_11.py @@ -1 +1,5 @@ -df.iloc[-3:, 2] \ No newline at end of file +import pandas as pd +from IPython.display import display +df = pd.read_csv("../data/Penguins/penguins.csv") + +display(df.iloc[-3:, 2]) \ No newline at end of file diff --git a/content/solutions/02_12.py b/content/solutions/02_12.py index ccd1060..78fd717 100644 --- a/content/solutions/02_12.py +++ b/content/solutions/02_12.py @@ -1 +1,5 @@ -df.loc[341:, "bill_length_mm"] \ No newline at end of file +import pandas as pd +from IPython.display import display +df = pd.read_csv("../data/Penguins/penguins.csv") + +display(df.loc[341:, "bill_length_mm"]) \ No newline at end of file diff --git a/content/solutions/02_13.py b/content/solutions/02_13.py index f20bfb2..6d3eaf1 100644 --- a/content/solutions/02_13.py +++ b/content/solutions/02_13.py @@ -1 +1,5 @@ -df.iloc[[145, 7, 0], [4, -2]] \ No newline at end of file +import pandas as pd +from IPython.display import display +df = pd.read_csv("../data/Penguins/penguins.csv") + +display(df.iloc[[145, 7, 0], [4, -2]]) \ No newline at 
end of file diff --git a/content/solutions/02_14.py b/content/solutions/02_14.py index b46a9a6..b93e285 100644 --- a/content/solutions/02_14.py +++ b/content/solutions/02_14.py @@ -1 +1,5 @@ -df.loc[[145, 7, 0], ["flipper_length_mm", "body_mass_g"]] \ No newline at end of file +import pandas as pd +from IPython.display import display +df = pd.read_csv("../data/Penguins/penguins.csv") + +display(df.loc[[145, 7, 0], ["flipper_length_mm", "body_mass_g"]]) \ No newline at end of file diff --git a/content/solutions/02_15.py b/content/solutions/02_15.py index 8540a2a..b58d4a7 100644 --- a/content/solutions/02_15.py +++ b/content/solutions/02_15.py @@ -1,2 +1,6 @@ +import pandas as pd +from IPython.display import display +df = pd.read_csv("../data/Penguins/penguins.csv") + mask_PW_PL = (df["body_mass_g"] > 4000) & (df["flipper_length_mm"] < 185) -df[mask_PW_PL] \ No newline at end of file +display(df[mask_PW_PL]) \ No newline at end of file diff --git a/content/solutions/02_16.py b/content/solutions/02_16.py index bfbdc0d..ef32cd5 100644 --- a/content/solutions/02_16.py +++ b/content/solutions/02_16.py @@ -1 +1,4 @@ -df["species"].unique() \ No newline at end of file +import pandas as pd +df = pd.read_csv("../data/Penguins/penguins.csv") + +print(df["species"].unique()) \ No newline at end of file diff --git a/content/solutions/02_17.py b/content/solutions/02_17.py index 0ab5391..4caf8c1 100644 --- a/content/solutions/02_17.py +++ b/content/solutions/02_17.py @@ -1 +1,4 @@ -df["flipper_length_mm"].isnull().sum() \ No newline at end of file +import pandas as pd +df = pd.read_csv("../data/Penguins/penguins.csv") + +print(df["flipper_length_mm"].isnull().sum()) \ No newline at end of file diff --git a/content/solutions/02_18.py b/content/solutions/02_18.py index 84339d0..f92629e 100644 --- a/content/solutions/02_18.py +++ b/content/solutions/02_18.py @@ -1 +1,4 @@ -df["sex"].value_counts(dropna=False) \ No newline at end of file +import pandas as pd +df = 
pd.read_csv("../data/Penguins/penguins.csv") + +print(df["sex"].value_counts(dropna=False)) \ No newline at end of file diff --git a/content/solutions/02_19.py b/content/solutions/02_19.py index 8697011..194966a 100644 --- a/content/solutions/02_19.py +++ b/content/solutions/02_19.py @@ -1 +1,4 @@ -df["species"].value_counts(normalize=True) \ No newline at end of file +import pandas as pd +df = pd.read_csv("../data/Penguins/penguins.csv") + +print(df["species"].value_counts(normalize=True)) \ No newline at end of file diff --git a/content/solutions/02_20.py b/content/solutions/02_20.py index 9a0f24b..eb1281a 100644 --- a/content/solutions/02_20.py +++ b/content/solutions/02_20.py @@ -1 +1,4 @@ -df[df["flipper_length_mm"].isnull()].index \ No newline at end of file +import pandas as pd +df = pd.read_csv("../data/Penguins/penguins.csv") + +print(df[df["flipper_length_mm"].isnull()].index) \ No newline at end of file diff --git a/content/solutions/02_21.py b/content/solutions/02_21.py index a9b1273..f4c1bc4 100644 --- a/content/solutions/02_21.py +++ b/content/solutions/02_21.py @@ -1 +1,3 @@ -?pd.DataFrame.dropna \ No newline at end of file +import pandas as pd + +print(pd.DataFrame.dropna.__doc__) \ No newline at end of file diff --git a/content/solutions/02_22.py b/content/solutions/02_22.py index 89cb40c..4703c72 100644 --- a/content/solutions/02_22.py +++ b/content/solutions/02_22.py @@ -1 +1,4 @@ +import pandas as pd +df = pd.read_csv("../data/Penguins/penguins.csv") + df_2 = df.dropna(how="all") \ No newline at end of file diff --git a/content/solutions/02_23.py b/content/solutions/02_23.py index da2f480..009c745 100644 --- a/content/solutions/02_23.py +++ b/content/solutions/02_23.py @@ -1 +1,5 @@ +import pandas as pd +df = pd.read_csv("../data/Penguins/penguins.csv") +df_2 = df.dropna(how="all") + print(f"number of rows of df_2: {df_2.shape[0]}") \ No newline at end of file diff --git a/content/solutions/02_24.py b/content/solutions/02_24.py index 
9d41559..c3a5fe7 100644 --- a/content/solutions/02_24.py +++ b/content/solutions/02_24.py @@ -1 +1,5 @@ +import pandas as pd +df = pd.read_csv("../data/Penguins/penguins.csv") +df_2 = df.dropna(how="all") + df_3 = df_2.dropna(how="any") \ No newline at end of file diff --git a/content/solutions/02_25.py b/content/solutions/02_25.py index 3deb842..8c80114 100644 --- a/content/solutions/02_25.py +++ b/content/solutions/02_25.py @@ -1 +1,6 @@ +import pandas as pd +df = pd.read_csv("../data/Penguins/penguins.csv") +df_2 = df.dropna(how="all") +df_3 = df_2.dropna(how="any") + print(f"number of rows of df_3: {df_3.shape[0]}") \ No newline at end of file diff --git a/content/solutions/02_26.py b/content/solutions/02_26.py index 8b0e889..ea80adf 100644 --- a/content/solutions/02_26.py +++ b/content/solutions/02_26.py @@ -1 +1,6 @@ +import pandas as pd +df = pd.read_csv("../data/Penguins/penguins.csv") +df_2 = df.dropna(how="all") +df_3 = df_2.dropna(how="any") + df_4 = df_3.drop_duplicates() \ No newline at end of file diff --git a/content/solutions/02_27.py b/content/solutions/02_27.py index da1df3b..4c60b12 100644 --- a/content/solutions/02_27.py +++ b/content/solutions/02_27.py @@ -1 +1,8 @@ -df_4.describe() \ No newline at end of file +import pandas as pd +from IPython.display import display +df = pd.read_csv("../data/Penguins/penguins.csv") +df_2 = df.dropna(how="all") +df_3 = df_2.dropna(how="any") +df_4 = df_3.drop_duplicates() + +display(df_4.describe()) \ No newline at end of file diff --git a/content/solutions/02_28.py b/content/solutions/02_28.py index aa9d760..d90cc69 100644 --- a/content/solutions/02_28.py +++ b/content/solutions/02_28.py @@ -1 +1,11 @@ -df_4.dtypes \ No newline at end of file +import pandas as pd +from IPython.display import display +pd.options.mode.chained_assignment = None + +df = pd.read_csv("../data/Penguins/penguins.csv") +df_2 = df.dropna(how="all") +df_3 = df_2.dropna(how="any") +df_4 = df_3.drop_duplicates() +df_4['species'] = 
df_4['species'].astype('category') + +display(df_4.dtypes) \ No newline at end of file diff --git a/content/solutions/02_29.py b/content/solutions/02_29.py index ee30853..64b19d6 100644 --- a/content/solutions/02_29.py +++ b/content/solutions/02_29.py @@ -1 +1,10 @@ -df_4.min() \ No newline at end of file +import pandas as pd +pd.options.mode.chained_assignment = None + +df = pd.read_csv("../data/Penguins/penguins.csv") +df_2 = df.dropna(how="all") +df_3 = df_2.dropna(how="any") +df_4 = df_3.drop_duplicates() +df_4['species'] = df_4['species'].astype('category') + +print(df_4.min(numeric_only=True)) \ No newline at end of file diff --git a/content/solutions/02_30.py b/content/solutions/02_30.py index 03e20a5..846deee 100644 --- a/content/solutions/02_30.py +++ b/content/solutions/02_30.py @@ -1 +1,10 @@ -df_4["flipper_length_mm"].max() \ No newline at end of file +import pandas as pd +pd.options.mode.chained_assignment = None + +df = pd.read_csv("../data/Penguins/penguins.csv") +df_2 = df.dropna(how="all") +df_3 = df_2.dropna(how="any") +df_4 = df_3.drop_duplicates() +df_4['species'] = df_4['species'].astype('category') + +print(df_4["flipper_length_mm"].max()) \ No newline at end of file diff --git a/content/solutions/02_31.py b/content/solutions/02_31.py index 9cec756..6d03644 100644 --- a/content/solutions/02_31.py +++ b/content/solutions/02_31.py @@ -1 +1,11 @@ -df_4.groupby("species").median() \ No newline at end of file +import pandas as pd +from IPython.display import display +pd.options.mode.chained_assignment = None + +df = pd.read_csv("../data/Penguins/penguins.csv") +df_2 = df.dropna(how="all") +df_3 = df_2.dropna(how="any") +df_4 = df_3.drop_duplicates() +df_4['species'] = df_4['species'].astype('category') + +display(df_4.groupby("species").median(numeric_only=True)) \ No newline at end of file diff --git a/content/solutions/02_32.py b/content/solutions/02_32.py index 853bcf7..0378a45 100644 --- a/content/solutions/02_32.py +++ 
b/content/solutions/02_32.py @@ -1 +1,10 @@ +import pandas as pd +pd.options.mode.chained_assignment = None + +df = pd.read_csv("../data/Penguins/penguins.csv") +df_2 = df.dropna(how="all") +df_3 = df_2.dropna(how="any") +df_4 = df_3.drop_duplicates() +df_4['species'] = df_4['species'].astype('category') + df_4.to_csv("../data/Penguins/my_penguins.csv") \ No newline at end of file diff --git a/content/solutions/04_01.py b/content/solutions/04_01.py index ab320c4..382824a 100644 --- a/content/solutions/04_01.py +++ b/content/solutions/04_01.py @@ -1 +1,3 @@ -dict_greeting["Italy"] \ No newline at end of file +dict_greeting = {'Namibia':'Hallo', 'France':'Bonjour', 'Spain':'Ola', 'UK':'Hello', 'Italy':'Ciao'} + +print(dict_greeting["Italy"]) \ No newline at end of file diff --git a/content/solutions/04_02.py b/content/solutions/04_02.py index 14c5ddc..6536506 100644 --- a/content/solutions/04_02.py +++ b/content/solutions/04_02.py @@ -1,2 +1,4 @@ +dict_greeting = {'Namibia':'Hallo', 'France':'Bonjour', 'Spain':'Ola', 'UK':'Hello', 'Italy':'Ciao'} + dict_greeting["UK"] = "Good Morning" print(dict_greeting) \ No newline at end of file diff --git a/content/solutions/04_03.py b/content/solutions/04_03.py index fb2851f..f0e839f 100644 --- a/content/solutions/04_03.py +++ b/content/solutions/04_03.py @@ -1,2 +1,4 @@ +dict_greeting = {'Namibia':'Hallo', 'France':'Bonjour', 'Spain':'Ola', 'UK':'Hello', 'Italy':'Ciao'} + dict_greeting["Hawaii"] = "Aloha" print(dict_greeting) \ No newline at end of file diff --git a/content/solutions/04_05.py b/content/solutions/04_05.py index 371268b..66e5a12 100644 --- a/content/solutions/04_05.py +++ b/content/solutions/04_05.py @@ -1 +1,11 @@ -?is_greeting \ No newline at end of file +list_greeting = ['Hallo', 'Bonjour', 'Ola', 'Hello', 'Ciao', 'Ave'] + +def is_greeting(s): + """Returns True if s is in list_greeting, else False.""" + if s in list_greeting: + return True + else: + return False + + +help(is_greeting) \ No newline at end of 
file diff --git a/content/solutions/04_06.py b/content/solutions/04_06.py index 7e694ea..9519c1a 100644 --- a/content/solutions/04_06.py +++ b/content/solutions/04_06.py @@ -7,4 +7,11 @@ def f(x): """Returns the argument multiplied by 3 and increased by 10.""" - return (x * 3) + 10 \ No newline at end of file + return (x * 3) + 10 + + +print('def f(x):') +print('\t"""Returns the argument multiplied by 3 and increased by 10."""') +print('\treturn (x * 3) + 10\n') + +print('f(x) = ', f(4)) # x = 4 is arbitrary. Try with other values. \ No newline at end of file diff --git a/content/solutions/04_07.py b/content/solutions/04_07.py deleted file mode 100644 index e69de29..0000000 diff --git a/content/solutions/04_08.py b/content/solutions/04_08.py deleted file mode 100644 index e69de29..0000000 diff --git a/content/solutions/04_09.py b/content/solutions/04_09.py deleted file mode 100644 index e69de29..0000000 diff --git a/content/solutions/05_01.py b/content/solutions/05_01.py index 87e5a62..eb19a21 100644 --- a/content/solutions/05_01.py +++ b/content/solutions/05_01.py @@ -1,6 +1,11 @@ +#import libraries to run notebook import datetime - import matplotlib.pyplot as plt import pandas as pd -%matplotlib inline \ No newline at end of file +print('import datetime') +print('import matplotlib.pyplot as plt') +print('import pandas as pd') + +# solution for the bonus exercise +# %matplotlib inline \ No newline at end of file diff --git a/content/solutions/05_02.py b/content/solutions/05_02.py index b9dcabb..b709281 100644 --- a/content/solutions/05_02.py +++ b/content/solutions/05_02.py @@ -1 +1,4 @@ -df_2014 = pd.read_csv("../data/food_training/training_2014.csv") \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv") +print('df_2014 = pd.read_csv("../data/food_training/training_2014.csv")') \ No newline at end of file diff --git a/content/solutions/05_03.py b/content/solutions/05_03.py index a65ecc1..79b7a68 
100644 --- a/content/solutions/05_03.py +++ b/content/solutions/05_03.py @@ -1 +1,5 @@ -df_2014.head() \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv") + +display(df_2014.head()) \ No newline at end of file diff --git a/content/solutions/05_04.py b/content/solutions/05_04.py index a6cf222..b5dbdee 100644 --- a/content/solutions/05_04.py +++ b/content/solutions/05_04.py @@ -1 +1,5 @@ -df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) + +print('df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1)') \ No newline at end of file diff --git a/content/solutions/05_05.py b/content/solutions/05_05.py index a65ecc1..cfee109 100644 --- a/content/solutions/05_05.py +++ b/content/solutions/05_05.py @@ -1 +1,5 @@ -df_2014.head() \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) + +display(df_2014.head()) \ No newline at end of file diff --git a/content/solutions/05_06.py b/content/solutions/05_06.py index 09a0c32..76d6ff4 100644 --- a/content/solutions/05_06.py +++ b/content/solutions/05_06.py @@ -1,2 +1,7 @@ +import pandas as pd + df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) -df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) \ No newline at end of file +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +print('df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1)') +print('df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1)') diff --git a/content/solutions/05_07.py b/content/solutions/05_07.py index ea505e8..05ace2c 100644 --- a/content/solutions/05_07.py +++ b/content/solutions/05_07.py @@ -1,2 +1,12 @@ +import pandas as pd + +df_2014 = 
pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + frames = [df_2014, df_2015, df_2016] -df = pd.concat(frames) \ No newline at end of file +df = pd.concat(frames) +# display(df) + +print('frames = [df_2014, df_2015, df_2016]') +print('df = pd.concat(frames)') \ No newline at end of file diff --git a/content/solutions/05_08.py b/content/solutions/05_08.py index 20bfa30..a97956d 100644 --- a/content/solutions/05_08.py +++ b/content/solutions/05_08.py @@ -1 +1,11 @@ -df.shape \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +print('df.shape') +display(df.shape) diff --git a/content/solutions/05_09.py b/content/solutions/05_09.py index 5476dda..fcc6fae 100644 --- a/content/solutions/05_09.py +++ b/content/solutions/05_09.py @@ -1 +1,11 @@ -df.index \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +print('df.index') +display(df.index) diff --git a/content/solutions/05_10.py b/content/solutions/05_10.py index cfad187..193a595 100644 --- a/content/solutions/05_10.py +++ b/content/solutions/05_10.py @@ -1,5 +1,17 @@ +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = 
pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + df = df.reset_index() df.index # We could also have done the following when concatenating: -# df = pd.concat(frames, ignore_index=True) \ No newline at end of file +# df = pd.concat(frames, ignore_index=True) + +print('df.reset_index()\ndf.index\n') +display(df.index) \ No newline at end of file diff --git a/content/solutions/05_11.py b/content/solutions/05_11.py index e1ed112..f3297fb 100644 --- a/content/solutions/05_11.py +++ b/content/solutions/05_11.py @@ -1 +1,14 @@ -df.info() \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() + +print('df.info()\n') + +display(df.info()) \ No newline at end of file diff --git a/content/solutions/05_12.py b/content/solutions/05_12.py index 1cc1893..3326a4e 100644 --- a/content/solutions/05_12.py +++ b/content/solutions/05_12.py @@ -1 +1,8 @@ -?pd.DataFrame.drop \ No newline at end of file +import pandas as pd + +print('?pd.DataFrame.drop') + +# help(pd.DataFrame.drop) + +# You can also use +# ?pd.DataFrame.drop \ No newline at end of file diff --git a/content/solutions/05_13.py b/content/solutions/05_13.py index fde2e4f..3205dbf 100644 --- a/content/solutions/05_13.py +++ b/content/solutions/05_13.py @@ -1,2 +1,17 @@ +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + cols_to_remove = 
["Unnamed: 5", "Unnamed: 6"] -df = df.drop(cols_to_remove, axis=1) \ No newline at end of file +df = df.drop(cols_to_remove, axis=1) + +print('cols_to_remove = ["Unnamed: 5", "Unnamed: 6"]') +print('df = df.drop(cols_to_remove, axis=1)') \ No newline at end of file diff --git a/content/solutions/05_14.py b/content/solutions/05_14.py index 59ca733..5975a62 100644 --- a/content/solutions/05_14.py +++ b/content/solutions/05_14.py @@ -1 +1,18 @@ -df["Location"].unique() \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +print('df["Location"].unique()\n') + +display(df["Location"].unique()) \ No newline at end of file diff --git a/content/solutions/05_15.py b/content/solutions/05_15.py index d1e771b..7d2d2ef 100644 --- a/content/solutions/05_15.py +++ b/content/solutions/05_15.py @@ -1 +1,18 @@ -df["Location"].str.split(pat=";") \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +print('df["Location"].str.split(pat=";")\n') + +display(df["Location"].str.split(pat=";")) \ No newline at end of file diff --git a/content/solutions/05_16.py b/content/solutions/05_16.py index b4c40f0..b18ccd7 100644 --- a/content/solutions/05_16.py +++ 
b/content/solutions/05_16.py @@ -1 +1,18 @@ -df["Location"].str.split(pat=";", expand=True) \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +print('df["Location"].str.split(pat=";", expand=True)\n') + +display(df["Location"].str.split(pat=";", expand=True)) \ No newline at end of file diff --git a/content/solutions/05_17.py b/content/solutions/05_17.py index ce9645a..f1c0f23 100644 --- a/content/solutions/05_17.py +++ b/content/solutions/05_17.py @@ -1 +1,20 @@ -df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +print('df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True)\n') + +display(df) \ No newline at end of file diff --git a/content/solutions/05_18.py b/content/solutions/05_18.py index 7381102..68d1bac 100644 --- a/content/solutions/05_18.py +++ b/content/solutions/05_18.py @@ -1 +1,20 @@ +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = 
pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +print('df = df.drop("Location", axis=1)\n') + df = df.drop("Location", axis=1) \ No newline at end of file diff --git a/content/solutions/05_19.py b/content/solutions/05_19.py index 0c3641f..896b59a 100644 --- a/content/solutions/05_19.py +++ b/content/solutions/05_19.py @@ -1 +1,22 @@ -df["country"].nunique() \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +print('df["country"].nunique()\n') + +display(df["country"].nunique()) \ No newline at end of file diff --git a/content/solutions/05_20.py b/content/solutions/05_20.py index 2dd0467..4b366b3 100644 --- a/content/solutions/05_20.py +++ b/content/solutions/05_20.py @@ -1 +1,22 @@ -df["country"].value_counts() \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + 
+df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +print('df["country"].value_counts()\n') + +display(df["country"].value_counts()) \ No newline at end of file diff --git a/content/solutions/05_21.py b/content/solutions/05_21.py index 4411e2d..fcfae9a 100644 --- a/content/solutions/05_21.py +++ b/content/solutions/05_21.py @@ -1,2 +1,23 @@ +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +print('df["country"] = df["country"].str.strip()\ndf["city"] = df["city"].str.strip()\n') + df["country"] = df["country"].str.strip() df["city"] = df["city"].str.strip() \ No newline at end of file diff --git a/content/solutions/05_22.py b/content/solutions/05_22.py index 0c3641f..8d64964 100644 --- a/content/solutions/05_22.py +++ b/content/solutions/05_22.py @@ -1 +1,22 @@ -df["country"].nunique() \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +print('df["country"].nunique()') + 
+display(df["country"].nunique()) \ No newline at end of file diff --git a/content/solutions/05_23.py b/content/solutions/05_23.py index 571e1d7..f15504d 100644 --- a/content/solutions/05_23.py +++ b/content/solutions/05_23.py @@ -1 +1,22 @@ -df[df["country"] == "Portugal"] \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +print('df[df["country"] == "Portugal"]\n') + +display(df[df["country"] == "Portugal"]) \ No newline at end of file diff --git a/content/solutions/05_24.py b/content/solutions/05_24.py index 1260c3f..3df1249 100644 --- a/content/solutions/05_24.py +++ b/content/solutions/05_24.py @@ -1 +1,23 @@ -df["city"] = df["city"].str.lower() \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +print('df["city"] = df["city"].str.lower()\n') + +df["city"] = df["city"].str.lower() +display(df[df["country"] == "Portugal"]) \ No newline at end of file diff --git a/content/solutions/05_25.py b/content/solutions/05_25.py index 
0db9542..e7149ba 100644 --- a/content/solutions/05_25.py +++ b/content/solutions/05_25.py @@ -1 +1,24 @@ -df["city"][df["city"].str.contains("/")] \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +print('df["city"][df["city"].str.contains("/")]\n') + +display(df["city"][df["city"].str.contains("/")]) \ No newline at end of file diff --git a/content/solutions/05_26.py b/content/solutions/05_26.py index 218e64b..84d9087 100644 --- a/content/solutions/05_26.py +++ b/content/solutions/05_26.py @@ -1 +1,26 @@ -df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +print('df["city"] = df["city"].str.replace(r"/\w*", "", regex=True)\n') + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +# display(df["city"].tail()) \ No newline at end of file diff --git a/content/solutions/05_27.py 
b/content/solutions/05_27.py index 87c78b0..7426db5 100644 --- a/content/solutions/05_27.py +++ b/content/solutions/05_27.py @@ -1,3 +1,26 @@ +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + dict_codes = { "BG": "Bulgaria", "CZ": "Czech Republic", @@ -8,4 +31,6 @@ } country_in_codes = df["country"].isin(dict_codes.keys()) -df.loc[country_in_codes, "country"] = df.loc[country_in_codes, "country"].map(dict_codes) \ No newline at end of file +df.loc[country_in_codes, "country"] = df.loc[country_in_codes, "country"].map(dict_codes) + +print('df.loc[country_in_codes, "country"] = df.loc[country_in_codes, "country"].map(dict_codes)') \ No newline at end of file diff --git a/content/solutions/05_28.py b/content/solutions/05_28.py index e4b0e00..1acc786 100644 --- a/content/solutions/05_28.py +++ b/content/solutions/05_28.py @@ -1 +1,38 @@ -df.loc[df["city"] == "unknown", "country"] \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = 
df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_codes = { + "BG": "Bulgaria", + "CZ": "Czech Republic", + "IT": "Italy", + "GR": "Greece", + "SI": "Slovenia", + "UK": "United Kingdom", +} + +country_in_codes = df["country"].isin(dict_codes.keys()) +df.loc[country_in_codes, "country"] = df.loc[country_in_codes, "country"].map(dict_codes) + +print('df.loc[df["city"] == "unknown", "country"]\n') + +display(df.loc[df["city"] == "unknown", "country"]) \ No newline at end of file diff --git a/content/solutions/05_29.py b/content/solutions/05_29.py index fc41745..a90e3df 100644 --- a/content/solutions/05_29.py +++ b/content/solutions/05_29.py @@ -1,3 +1,26 @@ +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + dict_capitals = { "Denmark": "copenhague", "France": "paris", @@ -7,4 +30,7 @@ } unknown_city = df["city"] == "unknown" -df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) \ No newline at end of file +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +print('unknown_city = df["city"] == "unknown"') +print('df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals)') \ No newline at end of file diff --git a/content/solutions/05_30.py 
b/content/solutions/05_30.py index 1f54a6c..cba83dd 100644 --- a/content/solutions/05_30.py +++ b/content/solutions/05_30.py @@ -1 +1,39 @@ -set(df["city"]) - dict_cities.keys() \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +print('set(df["city"]) - dict_cities.keys()\n') + +display(set(df["city"]) - dict_cities.keys()) diff --git a/content/solutions/05_31.py b/content/solutions/05_31.py index c4c9fec..2177ef0 100644 --- a/content/solutions/05_31.py +++ b/content/solutions/05_31.py @@ -1,3 +1,39 @@ +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = 
df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + dict_cities.update( { "bristol": "United Kingdom", @@ -7,4 +43,6 @@ "murcia": "Spain", "parma": "Italy", }, -) \ No newline at end of file +) + +print("dict_cities.update(\n{\n\t\"bristol\": \"United Kingdom\",\n\t\"gothenburg\": \"Sweden\",\n\t\"graz\": \"Austria\",\n\t\"lyon\": \"France\",\n\t \"murcia\": \"Spain\",\n\t \"parma\": \"Italy\",\n},\n)\n") \ No newline at end of file diff --git a/content/solutions/05_32.py b/content/solutions/05_32.py index 3b4a6e2..3a0a599 100644 --- a/content/solutions/05_32.py +++ b/content/solutions/05_32.py @@ -1,2 +1,52 @@ +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = 
df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + null_country = df["country"].isnull() -df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) \ No newline at end of file +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +print('null_country = df["country"].isnull()') +print('df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities)') \ No newline at end of file diff --git a/content/solutions/05_33.py b/content/solutions/05_33.py index 689a6ca..8085ccf 100644 --- a/content/solutions/05_33.py +++ b/content/solutions/05_33.py @@ -1 +1,53 @@ -df["country"].value_counts(dropna=False) \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 
'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + +null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +print('df["country"].value_counts(dropna=False)\n') + +display(df["country"].value_counts(dropna=False)) \ No newline at end of file diff --git a/content/solutions/05_34.py b/content/solutions/05_34.py index 63c70f8..eea2930 100644 --- a/content/solutions/05_34.py +++ b/content/solutions/05_34.py @@ -2,4 +2,12 @@ def f(x): if x == 1: return "single" else: - return "multiple" \ No newline at end of file + return "multiple" + +print('def f(x):') +print('\tif x == 1:') +print('\t\treturn "single"') +print('\telse:') +print('\t\treturn "multiple"\n') + +print('f(4) = ',f(4)) #x = 4 \ No newline at end of file diff --git a/content/solutions/05_35.py b/content/solutions/05_35.py index db3b33e..4e54e09 100644 --- a/content/solutions/05_35.py +++ b/content/solutions/05_35.py @@ -1 +1,62 @@ -df["Attendees"].apply(f) \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + 
+unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + +null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +df["country"].value_counts(dropna=False) + + +def f(x): + if x == 1: + return "single" + else: + return "multiple" + +print('df["Attendees"].apply(f)\n') + +display(df["Attendees"].apply(f)) diff --git a/content/solutions/05_36.py b/content/solutions/05_36.py index 82e7848..75ac002 100644 --- a/content/solutions/05_36.py +++ b/content/solutions/05_36.py @@ -1 +1,5 @@ -languages = pd.read_csv("../data/food_training/languages.csv") \ No newline at end of file +import pandas as pd + +languages = pd.read_csv("../data/food_training/languages.csv") + +print('languages = pd.read_csv("../data/food_training/languages.csv")') \ No newline at end of file diff --git a/content/solutions/05_37.py b/content/solutions/05_37.py index f712fd9..3277a91 100644 --- a/content/solutions/05_37.py +++ b/content/solutions/05_37.py @@ -1 +1,68 @@ -df = df.merge(languages, how="left", left_on="country", right_on="Country") \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = 
df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + +null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +df["country"].value_counts(dropna=False) + + +def f(x): + if x == 1: + return "single" + else: + return "multiple" + +df["Attendees"].apply(f) + + +languages = pd.read_csv("../data/food_training/languages.csv") + +df = df.merge(languages, how="left", left_on="country", right_on="Country") + +print('df = df.merge(languages, how="left", left_on="country", right_on="Country")\n') +display(df) \ No newline at end of file diff --git a/content/solutions/05_38.py b/content/solutions/05_38.py index f2614a1..4c27890 100644 --- a/content/solutions/05_38.py +++ b/content/solutions/05_38.py @@ -1,5 +1,75 @@ +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + 
+df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + +null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +df["country"].value_counts(dropna=False) + + +def f(x): + if x == 1: + return "single" + else: + return "multiple" + +df["Attendees"].apply(f) + + +languages = pd.read_csv("../data/food_training/languages.csv") + +df = df.merge(languages, how="left", left_on="country", right_on="Country") + df = df.drop("Country", axis=1) +print('df = df.drop("Country", axis=1)') + +display(df) + # N.B. You can only run this cell once! If you try run it again, it will throw an error! # Why? Because if you drop the Country column, it will be removed...so you can't # drop it a second time as the column isn't there to drop! 
\ No newline at end of file diff --git a/content/solutions/05_39.py b/content/solutions/05_39.py index d9d3e45..fc708eb 100644 --- a/content/solutions/05_39.py +++ b/content/solutions/05_39.py @@ -1 +1,71 @@ -df["DateFrom"].dtype \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + +null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +df["country"].value_counts(dropna=False) + + +def f(x): + if x == 1: + return "single" + else: + return "multiple" + +df["Attendees"].apply(f) + + +languages = pd.read_csv("../data/food_training/languages.csv") + +df = df.merge(languages, how="left", left_on="country", right_on="Country") + +df = df.drop("Country", axis=1) + +print('df["DateFrom"].dtype\n') + +display(df["DateFrom"].dtype) \ No newline at end
of file diff --git a/content/solutions/05_40.py b/content/solutions/05_40.py index eef6325..d0d8bfa 100644 --- a/content/solutions/05_40.py +++ b/content/solutions/05_40.py @@ -1,2 +1,73 @@ +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + +null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +df["country"].value_counts(dropna=False) + + +def f(x): + if x == 1: + return "single" + else: + return "multiple" + +df["Attendees"].apply(f) + + +languages = pd.read_csv("../data/food_training/languages.csv") + +df = df.merge(languages, how="left", left_on="country", right_on="Country") + +df = df.drop("Country", axis=1) + +print('df["DateFrom"] = pd.to_datetime(df["DateFrom"], format="%Y-%m-%d")\n') +print('df["DateTo"] = pd.to_datetime(df["DateTo"], format="%Y-%m-%d")') + 
df["DateFrom"] = pd.to_datetime(df["DateFrom"], format="%Y-%m-%d") df["DateTo"] = pd.to_datetime(df["DateTo"], format="%Y-%m-%d") \ No newline at end of file diff --git a/content/solutions/05_41.py b/content/solutions/05_41.py index 3a34721..213b28e 100644 --- a/content/solutions/05_41.py +++ b/content/solutions/05_41.py @@ -1 +1,73 @@ -df[df["DateFrom"] > "2017-02-01"] \ No newline at end of file +import pandas as pd + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + +null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +df["country"].value_counts(dropna=False) + + +def f(x): + if x == 1: + return "single" + else: + return "multiple" + +df["Attendees"].apply(f) + + +languages = pd.read_csv("../data/food_training/languages.csv") + +df = df.merge(languages, how="left", left_on="country", 
right_on="Country") + +df = df.drop("Country", axis=1) + +df["DateFrom"] = pd.to_datetime(df["DateFrom"], format="%Y-%m-%d") +df["DateTo"] = pd.to_datetime(df["DateTo"], format="%Y-%m-%d") + +print('df[df["DateFrom"] > "2017-02-01"]\n') +display(df[df["DateFrom"] > "2017-02-01"]) diff --git a/content/solutions/05_42.py b/content/solutions/05_42.py index 73aa5d3..4341063 100644 --- a/content/solutions/05_42.py +++ b/content/solutions/05_42.py @@ -1 +1,75 @@ -df["duration"] = df["DateTo"] - df["DateFrom"] + datetime.timedelta(days=1) \ No newline at end of file +import pandas as pd +import datetime + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + +null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +df["country"].value_counts(dropna=False) + + +def f(x): + if x == 1: + return "single" + 
else: + return "multiple" + +df["Attendees"].apply(f) + + +languages = pd.read_csv("../data/food_training/languages.csv") + +df = df.merge(languages, how="left", left_on="country", right_on="Country") + +df = df.drop("Country", axis=1) + +df["DateFrom"] = pd.to_datetime(df["DateFrom"], format="%Y-%m-%d") +df["DateTo"] = pd.to_datetime(df["DateTo"], format="%Y-%m-%d") + +print('df["duration"] = df["DateTo"] - df["DateFrom"] + datetime.timedelta(days=1)\n') +df["duration"] = df["DateTo"] - df["DateFrom"] + datetime.timedelta(days=1) +display(df.head()) \ No newline at end of file diff --git a/content/solutions/05_43.py b/content/solutions/05_43.py index 49c09b5..58cf6a7 100644 --- a/content/solutions/05_43.py +++ b/content/solutions/05_43.py @@ -1,2 +1,79 @@ +import pandas as pd +import datetime + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + 
+null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +df["country"].value_counts(dropna=False) + + +def f(x): + if x == 1: + return "single" + else: + return "multiple" + +df["Attendees"].apply(f) + +languages = pd.read_csv("../data/food_training/languages.csv") + +df = df.merge(languages, how="left", left_on="country", right_on="Country") + +df = df.drop("Country", axis=1) + +df["DateFrom"] = pd.to_datetime(df["DateFrom"], format="%Y-%m-%d") +df["DateTo"] = pd.to_datetime(df["DateTo"], format="%Y-%m-%d") + +df["duration"] = df["DateTo"] - df["DateFrom"] + datetime.timedelta(days=1) + +print('df["month"] = df["DateFrom"].dt.month\n') +print('df["month"].hist()\n') + df["month"] = df["DateFrom"].dt.month -df["month"].hist() \ No newline at end of file +# display(df) +display(df["month"].hist()) \ No newline at end of file diff --git a/content/solutions/05_44.py b/content/solutions/05_44.py index bad939c..1dedd50 100644 --- a/content/solutions/05_44.py +++ b/content/solutions/05_44.py @@ -1 +1,69 @@ -df.sort_values("city") \ No newline at end of file +import pandas as pd +import datetime + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" 
+df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + +null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +df["country"].value_counts(dropna=False) + + +def f(x): + if x == 1: + return "single" + else: + return "multiple" + +df["Attendees"].apply(f) + +languages = pd.read_csv("../data/food_training/languages.csv") + +df = df.merge(languages, how="left", left_on="country", right_on="Country") + +print('df.sort_values("city")\n') + +display(df.sort_values("city")) \ No newline at end of file diff --git a/content/solutions/05_45.py b/content/solutions/05_45.py index 3491680..b1bb085 100644 --- a/content/solutions/05_45.py +++ b/content/solutions/05_45.py @@ -1 +1,77 @@ -df.sort_values(["duration", "Attendees"], ascending=[True, False]) \ No newline at end of file +import pandas as pd +import datetime + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == 
"unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + +null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +df["country"].value_counts(dropna=False) + + +def f(x): + if x == 1: + return "single" + else: + return "multiple" + +df["Attendees"].apply(f) + +languages = pd.read_csv("../data/food_training/languages.csv") + +df = df.merge(languages, how="left", left_on="country", right_on="Country") + +df = df.drop("Country", axis=1) + +df["DateFrom"] = pd.to_datetime(df["DateFrom"], format="%Y-%m-%d") +df["DateTo"] = pd.to_datetime(df["DateTo"], format="%Y-%m-%d") + +df["duration"] = df["DateTo"] - df["DateFrom"] + datetime.timedelta(days=1) + +df["month"] = df["DateFrom"].dt.month + +print('df.sort_values(["duration", "Attendees"], ascending=[True, False])') +display(df.sort_values(["duration", "Attendees"], ascending=[True, False])) \ No newline at end of file diff --git a/content/solutions/05_46.py b/content/solutions/05_46.py index 67dd12e..f3856a6 100644 --- a/content/solutions/05_46.py +++ b/content/solutions/05_46.py @@ -1 +1,79 @@ -df_gr = df.groupby("city") \ No newline at end of file +import pandas as pd +import datetime + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = 
df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + +null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +df["country"].value_counts(dropna=False) + + +def f(x): + if x == 1: + return "single" + else: + return "multiple" + +df["Attendees"].apply(f) + +languages = pd.read_csv("../data/food_training/languages.csv") + +df = df.merge(languages, how="left", left_on="country", right_on="Country") + +df = df.drop("Country", axis=1) + +df["DateFrom"] = pd.to_datetime(df["DateFrom"], format="%Y-%m-%d") +df["DateTo"] = pd.to_datetime(df["DateTo"], format="%Y-%m-%d") + +df["duration"] = df["DateTo"] - df["DateFrom"] + datetime.timedelta(days=1) + +df["month"] = df["DateFrom"].dt.month +df.sort_values(["duration", "Attendees"], ascending=[True, False]) + +print('df_gr = df.groupby("city")') +df_gr = df.groupby("city") + diff --git a/content/solutions/05_47.py b/content/solutions/05_47.py index c7520eb..21e7bb5 100644 --- a/content/solutions/05_47.py +++ b/content/solutions/05_47.py @@ -1 +1,80 @@ -df_gr["Attendees"].mean() \ No newline at end of file +import pandas as pd +import datetime + +df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) +df_2015 = pd.read_csv("../data/food_training/training_2015.csv", 
header=1) +df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) + +frames = [df_2014, df_2015, df_2016] +df = pd.concat(frames) + +df = df.reset_index() +df.index + +cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] +df = df.drop(cols_to_remove, axis=1) + +df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) + +df = df.drop("Location", axis=1) + +df["city"] = df["city"].str.lower() + +df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) + +dict_capitals = { + "Denmark": "copenhague", + "France": "paris", + "Italy": "rome", + "Spain": "madrid", + "United Kingdom": "london", +} + +unknown_city = df["city"] == "unknown" +df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) + +dict_cities = df.loc[df['country'].notnull(), ['city', 'country']].set_index('city').to_dict()['country'] + +dict_cities.update( + { + "bristol": "United Kingdom", + "gothenburg": "Sweden", + "graz": "Austria", + "lyon": "France", + "murcia": "Spain", + "parma": "Italy", + }, +) + +null_country = df["country"].isnull() +df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) + +df["country"].value_counts(dropna=False) + + +def f(x): + if x == 1: + return "single" + else: + return "multiple" + +df["Attendees"].apply(f) + +languages = pd.read_csv("../data/food_training/languages.csv") + +df = df.merge(languages, how="left", left_on="country", right_on="Country") + +df = df.drop("Country", axis=1) + +df["DateFrom"] = pd.to_datetime(df["DateFrom"], format="%Y-%m-%d") +df["DateTo"] = pd.to_datetime(df["DateTo"], format="%Y-%m-%d") + +df["duration"] = df["DateTo"] - df["DateFrom"] + datetime.timedelta(days=1) + +df["month"] = df["DateFrom"].dt.month +df.sort_values(["duration", "Attendees"], ascending=[True, False]) + +df_gr = df.groupby("city") + +print('df_gr["Attendees"].mean()') +display(df_gr["Attendees"].mean()) \ No newline at end of file