diff --git a/README.md b/README.md
index f8d0424..e49afab 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,8 @@
# JupyterLite Demo
-[![lite-badge](https://jupyterlite.rtfd.io/en/latest/_static/badge.svg)](https://jupyterlite.github.io/demo)
+[![lite-badge](https://jupyterlite.rtfd.io/en/latest/_static/badge.svg)](https://humbledata.org/online-workshop/lab/index.html)
+
+> This repository holds the contents for the Humble Data workshop using JupyterLite.
JupyterLite deployed as a static site to GitHub Pages, for demo purposes.
diff --git a/content/notebooks/1. Beginning with Python.ipynb b/content/notebooks/1. Beginning with Python.ipynb
index f393375..5d56c1e 100644
--- a/content/notebooks/1. Beginning with Python.ipynb
+++ b/content/notebooks/1. Beginning with Python.ipynb
@@ -1,8 +1,28 @@
{
+ "metadata": {
+ "kernelspec": {
+ "name": "python3",
+ "language": "python",
+ "display_name": "Python 3 (ipykernel)"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "python",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8"
+ }
+ },
+ "nbformat_minor": 4,
+ "nbformat": 4,
"cells": [
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"
\n",
"
\n",
@@ -11,11 +31,11 @@
"Introduction to Python\n",
"\n",
"
"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"## Welcome to learning Python!\n",
"\n",
@@ -24,108 +44,139 @@
"We will walk you through different aspects of the Python language interactively. Take your time to experiment with it if you like and don't hesitate to ask or Google things. One of the first lessons in programming is using the documentation and information shared by other programmers. Jupyter has another trick, where you can put your cursor into the brackets of a function call such as `print()` and hit **Shift + Tab**. This will let you see the documentation of a function directly in the notebook (*pro-tip*: you can hit it up to four times for different effects).\n",
"\n",
"As for Python itself, one of its big strengths is its extensibility. It has a surprising amount of functionality directly within the core language; however, you can also `import` almost arbitrary code that others make available as libraries. Below you can see a special import that shows you the \"Zen of Python\", a set of guidelines that could guide your programming journey."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# We can use comments to document our code in a coding cell.\n",
+ "import this # Zen of Python"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# We can use comments to document our code in a coding cell.\n",
- "import this # Zen of Python"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**--> Simple is better than complex.** \n",
"It is really easy to print ***Hello World*** in Python:"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "print('Hello World!')"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "print('Hello World!')"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**PEP 8** is Python's style guide. You can find it here:\n",
"https://www.python.org/dev/peps/pep-0008/\n",
"\n",
"It is good to know about, but when you become a professional programmer there are programs called \"linters\" that will help you adhere to PEP 8."
- ]
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Before you get started, we just need to do a small amount of setup. We're going to load a module called NumPy, which we'll need to complete this notebook. Don't stress if you don't understand this code - it's specific to the JupyterLite notebooks we're using for this course."
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "import pyodide_js\n",
+ "\n",
+ "# Install NumPy\n",
+ "await pyodide_js.loadPackage('numpy')"
+ ],
+ "metadata": {
+ "collapsed": false
+ }
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"
\n",
"Variables\n",
"
\n",
"
"
- ]
+ ],
+ "metadata": {
+ "collapsed": false
+ }
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"In programming it's very useful to store values. A variable lets you access a stored value through a name. Python is very user-friendly in that it will let you store most things in a variable, without explicitly reserving space in the computer's memory. Moreover, computers need to differentiate between types of data, such as `5` being an integer and `'Hello'` being a string, but Python attempts to handle these intuitively for you. You will learn about the different types in the following sections!\n",
"\n",
"**Python is an object oriented programming language. You do not need to declare variables (or their types) before using them as every variable in Python is an object.** "
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "name = 'Sandrine'"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "name = 'Sandrine'"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "print(name)"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "print(name)"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"Moreover, variables can easily be updated. Try it out below!"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -139,200 +190,219 @@
"\n",
"\n",
"---"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Assign another string to the variable 'name' and print this variable"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_01.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_01.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**!!! Variables can change type when re-assigned.**"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Assign a number (without quotes) to the variable 'name' and print this variable"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_02.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_02.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"
\n",
"Strings\n",
"
\n",
"
"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"A string always begins and ends with single ( ' ) or double ( \\\" ) quotes. There is no difference, except if there is an apostrophe ( ' ) inside the string."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "'Beginners Data Workshop'"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "'Beginners Data Workshop'"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "\"Beginner's Data Workshop\""
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "\"Beginner's Data Workshop\""
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Write *I'm enjoying this workshop!* using double quotes, and then single quotes."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# double quotes\n"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# double quotes\n"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_03.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_03.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# single quotes\n"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# single quotes\n"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_04.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_04.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**Oh no, an error!** Errors are nothing to be afraid of. Think of them as friendly messages trying to help you understand what's gone wrong. Here's how to read this:\n",
"- Read errors backwards, so start at the bottom! It's a \"SyntaxError\" which means there's something wrong with our Python code.\n",
@@ -342,294 +412,321 @@
"**It turns out, if you want to use single quotes inside single quotes, you need to \"escape\" the quote with a backslash ( \\\\ ).**\n",
"\n",
">Let's try that!"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# single quotes, second try\n"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# single quotes, second try\n"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_05.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_05.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"As mentioned before, Python tries to handle everything as intuitively as possible. That means strings can be added together."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "'We are ' + 'everywhere around the world.'"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "'We are ' + 'everywhere around the world.'"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"Also, strings can even be multiplied by a number."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "'Great! 🎉' * 3"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "'Great! 🎉' * 3"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "'😂' * 50"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "'😂' * 50"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can access parts of strings by slicing the string. We get a single character (which Python considers a string of length 1) using a slice, and a substring using a slice range. Something important to note:\n",
"\n",
"**!!! Like birthdays, Python starts counting from 0! (It is \"zero-indexed\")** \n",
"\n",
"(If you're interested why, here's a historical [letter from Dijkstra](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html) about it.)"
- ]
+ ],
+ "metadata": {}
},
{
+ "cell_type": "markdown",
+ "source": [
+ "![image.png](attachment:image.png)"
+ ],
+ "metadata": {},
"attachments": {
"image.png": {
"image/png": ""
}
- },
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "![image.png](attachment:image.png)"
- ]
+ }
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "s = 'I am a Pythonista.'"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "s = 'I am a Pythonista.'"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"***Slice*** \n",
"We can access characters of a string by referencing the position (\"index\") numbers within square brackets."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# Selecting the first character of the string s\n",
+ "s[0]"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# Selecting the first character of the string s\n",
- "s[0]"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"> Select the last character of the string s"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_06.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_06.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"***Slice range*** \n",
"We can get a range of characters of a string by using a slicing range."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# Select from position 2 up to but not including position 6 of the string s\n",
+ "s[2:6] # 6 is excluded"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# Select from position 2 up to but not including position 6 of the string s\n",
- "s[2:6] # 6 is excluded"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can omit the start (resp. stop) number, as in `s[:6]`. The slice then starts from index 0 (resp. ends at the highest index)."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Select the last 3 characters of the string s"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_07.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_07.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can check if a string contains another string inside of it (a \"substring\"), using *in* and *not in*.\n",
"\n",
"**!!! Python is case-sensitive! The string `'a'` is not equal to `'A'`.**"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
- "metadata": {
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
"source": [
"# Checking if s contains 'python'\n",
"print(s)\n",
"'python' in s"
- ]
+ ],
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false
+ },
+ "collapsed": false,
+ "trusted": true
+ },
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"> Check if s does not contain 'I' (capital i)"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_08.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_08.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -637,409 +734,442 @@
"Numbers\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"Python has different numerical types. Two of them are used more often than the others:\n",
"- **Integers:** These are whole numbers, e.g. `1`, `2`, `-5`\n",
"- **Floats:** \"Floating point\" numbers are those with a decimal point, e.g. `3.14158`, `2.5`, `0.1` and even `3.0`. \n",
"\n",
"Python attempts to deal with these numbers intuitively when integers and floats are mixed."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"### Basic Operators"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**Addition:** \n",
- ">Try adding two numbers."
- ]
+ ">Try adding 3 and 4 together."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_09.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_09.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**Subtraction:**\n",
- "> Try subtracting two numbers."
- ]
+ "> Try subtracting 6 from 10.0."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_10.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_10.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**Multiplication:** \n",
"The sign for multiplication is *. \n",
- ">Try multiplying two numbers."
- ]
+ ">Try multiplying 15 and 12."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_11.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_11.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**Exponent:** \n",
"The sign for exponent (or power) is *`**`*.\n",
"\n",
- "> Try the power of an integer."
- ]
+ "> Try raising 2 to the power of 6."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_12.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_12.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- ">Try the power of a float"
- ]
+ ">Try raising a float to a power this time. Calculate the square of 3.1."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_13.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_13.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- ">Try the power of an integer written as a float (e.g. 12.0)"
- ]
+ ">Now try something similar - raise an integer written as a float to a power. Calculate the square of 5, where 5 is written as a float."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_14.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_14.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**Division and modulo**\n",
"- The sign for division is **`/`**.\n",
"- The sign for floor division is **`//`**. It returns the \"quotient\" of a division - how many whole times one number goes into another. (\"If six people can sit around a dinner table, and we have 99 guests coming, how many tables can we fill completely?\")\n",
"- The **`%`** sign is the [modulo](https://en.wikipedia.org/wiki/Modulo_operation), which returns the \"remainder\" after division."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- ">Divide 6 by 2"
- ]
+ ">Divide 6 by 2."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_15.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_15.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- ">Try the floor division of 6 by 2"
- ]
+ ">Try the floor division of 6 by 2."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_16.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_16.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- ">Divide 19 by 5"
- ]
+ ">Divide 19 by 5."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_17.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_17.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"*Note*: division returns a float."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**The floor division returns an int, the non-fractional part.** \n",
- ">Try the floor division of 19 by 5"
- ]
+ ">Try the floor division of 19 by 5."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_18.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_18.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**Modulo returns the remainder of the division**"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- ">Calculate 19 modulo 5"
- ]
+ ">Calculate 19 modulo 5."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_19.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_19.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**--> 19 = 3 * 5 + 4**"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -1047,44 +1177,48 @@
"Order of operations\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"The order of operations in Python respects the usual rules of mathematics (brackets -> powers -> division/multiplication -> add/subtract). **If in doubt, use brackets** to make it clear (to yourself, and anyone who might be reading your code) what you're trying to do."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "5 + 6 * 10"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "5 + 6 * 10"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "(5 + 6) * 10"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "(5 + 6) * 10"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -1092,110 +1226,119 @@
"Booleans\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"Booleans are the two constant values `True` and `False`. In Python they also behave like numbers: `True` has the numerical value 1 and `False` has the value 0. These values are especially important for comparisons, so we'll also learn about a new operator:\n",
"\n",
"**`==`** is an equality operator, different from **`=`** which is the assignment operator you used to assign variables."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "True == 1 # '==' is an equality operator, different from '=' which is an assignment operator"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "True == 1 # '==' is an equality operator, different from '=' which is an assignment operator"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "False * 3"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "False * 3"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Using **`!=`**, check if False is not equal to 2."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_20.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_20.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Check if the length of your name is greater than 8."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "len()"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_21.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_21.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can use `and` and `or` operators with booleans in Python. Type them below and see how the color of the text changes, because Python recognises the keyword! These can be used to chain together multiple comparisons. These follow mathematical logic.\n",
"\n",
@@ -1207,35 +1350,38 @@
"| False | False | False | False |\n",
"\n",
"> Check if the length of your name is greater than 5 and the length of your mentor's name is less than 7."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_22.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_22.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -1243,397 +1389,433 @@
"Lists\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"A list is a list of comma-separated values between square brackets.\n",
"\n",
"The items of a list can have different types."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "list_greeting = ['Hallo', 'Bonjour', 10, 'Hello', 'Ciao', False]"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "list_greeting = ['Hallo', 'Bonjour', 10, 'Hello', 'Ciao', False]"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can access a single value using a slice, and several values using slice range. Check the \"string slicing\" section above for this."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Get the first item of the list."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_23.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_23.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"Get every the items from the list, starting with the 4th one.\n",
"\n",
"Note that there is no need to put a number after the colon when when want to select until the end of the list."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Get the items starting with the one with index 3 until the end of the list."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_24.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_24.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**Get the items from the list until the 4th one.**"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Get the items from the beginning of the list until the value with index 4 (index 4 is excluded)."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_25.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_25.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**Advanced slicing allows us to set how the list's index will increment between the start/stop indexes we select.** \n",
"The slicing then looks like this: [start:stop:step] \n",
"For example, if we want to select every third items of a list, we will set the step as 3."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
Get every other items">
">Get every other item from the list of greetings."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_26.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_26.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**We can update a list by re-assigning a value selected using a slice.** \n",
"For example, we can replace False with 'Ave' in list_greeting."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "list_greeting[-1] = 'Ave'\n",
+ "print(list_greeting)"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "list_greeting[-1] = 'Ave'\n",
- "print(list_greeting)"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
Replace 10 with Hola">
">Replace 10 with 'Hola' in list_greeting, then print list_greeting to check it."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_27.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_27.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can have lists inside a list, these are often called \"nested lists\":"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "list_of_lists = [[1, 2, 3], [4, 5]]"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "list_of_lists = [[1, 2, 3], [4, 5]]"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "list_of_lists[0] # access the first element of the list, which is a list."
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "list_of_lists[0] # access the first element of the list, which is a list."
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "list_of_lists[0][-1] # access the last element of the first list."
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "list_of_lists[0][-1] # access the last element of the first list."
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can concatenate lists using ' + '."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "[1, 2, 3] + [4, 5, 6]"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "[1, 2, 3] + [4, 5, 6]"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can also use the multiplication to repeat values in a list (only works with integers)."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "['Hey'] * 5"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "['Hey'] * 5"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can also use `in` / `not in` with lists."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Check if `10` is in `list_greeting`."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_28.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_28.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Check if 'Ole' is not in list_greeting."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_29.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_29.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -1641,216 +1823,236 @@
"Built-in functions\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"You've seen your first function at the very beginning of this notebook, i.e. `print(\"Hello World\")`. Functions are a way for you to write reusable code. In the case before `print()` is Python's way for you to show an output you provide. However, there are many other useful functions that Python comes with, that are provided by other libraries, or written by you yourself to save you from repeating some code.\n",
"\n",
"The functions that are always available in Python can be found here:\n",
"https://docs.python.org/3/library/functions.html"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Print *'Here we are!'*."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_30.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_30.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Compute the length of the string variable `snakes`."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
"source": [
"snakes = \"🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍\""
- ]
+ ],
+ "metadata": {
+ "trusted": true
+ },
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_31.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_31.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Compute the length of list_greeting."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_32.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_32.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Work out the maximum of 1, 2, 3, 4 and 5."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_33.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_33.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Round the number 123.45 to the nearest integer."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_34.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_34.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Round the number 123.45 to 1 decimal place."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_35.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_35.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -1858,64 +2060,69 @@
"Methods\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"A method is a function associated to an object. Basically, it provides a way for an object to know functions about themselves.\n",
"\n",
"For example, we can change the string `s` to upper case using the `.upper()` method."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "s.upper()"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "s.upper()"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Using the append method, add 'Aloha' to list_greeting."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_36.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_36.py"
- ]
+ "execution_count": 3,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -1923,20 +2130,20 @@
"Importing modules\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can import modules (Python code that can define functions, classes and variables).\n",
"\n",
"Usually, we write all the imports at the beginning of a Python program or notebook."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**A small warning** \n",
"Ensure you trust the packages you're installing!\n",
@@ -1946,77 +2153,84 @@
"2. Do I trust giving their code access to my computer?\n",
"3. `pip install` especially can be a little more dangerous, anyone can upload something to the Python Package Index (\"PyPI\" for short), which is where `pip` installs from. So be very careful especially as the smallest typo can be a security risk!\n",
"4. `conda install` is more of a walled garden and as such better positioned for enterprise usage."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"> From the **math** library, import **sqrt** to work out the square root of **24 336**."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_37.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_37.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"Python programmers really are lazy! (Or obsessive about productivity, take your pick!)\n",
"\n",
"Hence, you can define aliases for imports. The `as np` is two characters long, whereas `numpy` is five. It's less to type, when you write long programs.\n",
"\n",
"> Import `numpy as np` and try **`np.sin(np.pi/4)`**."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/01_38.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/01_38.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -2024,28 +2238,8 @@
"Take a well deserved break, brew a relaxing beverage of your choice or go have a snack, maybe share a picture of your snack/drink in the #random channel, we love to see pictures from everyone attending around the world. ☕️\n",
"\n",
"![](https://media0.giphy.com/media/3otPoS81loriI9sO8o/200.gif)"
- ]
+ ],
+ "metadata": {}
}
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python (beginners-data-workshop)",
- "language": "python",
- "name": "beginners-data-workshop"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.9.13"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
+ ]
}
diff --git a/content/notebooks/2. First steps with Pandas.ipynb b/content/notebooks/2. First steps with Pandas.ipynb
index 85395c0..eb52b7c 100644
--- a/content/notebooks/2. First steps with Pandas.ipynb
+++ b/content/notebooks/2. First steps with Pandas.ipynb
@@ -1,8 +1,28 @@
{
+ "metadata": {
+ "kernelspec": {
+ "name": "python",
+ "display_name": "Python (Pyodide)",
+ "language": "python"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "python",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8"
+ }
+ },
+ "nbformat_minor": 4,
+ "nbformat": 4,
"cells": [
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"
"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"> ***Note***: This notebook contains solution cells with ***a*** solution. Remember there is not only one solution to a problem! \n",
"> \n",
"> You will recognise these cells as they start with **# %**. \n",
"> \n",
"> If you would like to see the solution, you will have to remove the **#** (which can be done by using **Ctrl** and **?**) and run the cell. If you want to run the solution code, you will have to run the cell again."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"
\n",
"Data analysis packages\n",
"
\n",
"
"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"Data Scientists use a wide variety of libraries in Python that make working with data significantly easier. Those libraries primarily consist of:\n",
"\n",
@@ -55,12 +75,31 @@
"\n",
"Though there are countless others available.\n",
"\n",
- "For today, we'll primarily focus ourselves around the library that is 99% of our work: `pandas`. Pandas is built on top of the speed and power of NumPy."
- ]
+ "For today, we'll primarily focus ourselves around the library that is 99% of our work: `pandas`. Pandas is built on top of the speed and power of NumPy.\n",
+ "\n",
+ "Run the code below to get the imports we need for this notebook."
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import pyodide_js\n",
+ "\n",
+ "# Install NumPy\n",
+ "await pyodide_js.loadPackage('numpy')\n",
+ "\n",
+ "# Install Pandas\n",
+ "await pyodide_js.loadPackage('pandas')"
+ ],
+ "metadata": {
+ "trusted": true
+ },
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -68,55 +107,48 @@
"Imports\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "import pandas as pd"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "import pandas as pd"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Import numpy using the convention seen at the end of the first notebook."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_01.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": []
- },
- {
- "cell_type": "code",
"execution_count": null,
- "metadata": {
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "# %load ../solutions/02_01.py"
- ]
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -124,121 +156,132 @@
"Loading the data\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"To see a method's documentation, you can use the help function. In Jupyter, you can also just put a question mark before the method."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "?pd.read_csv"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
},
- "scrolled": true
+ "scrolled": true,
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "?pd.read_csv"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"To load the dataframe we are using in this notebook, we will provide the path to the file: ../data/Penguins/penguins.csv"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- ">Load the dataframe, read it into a pandas DataFrame and assign it to df"
- ]
+ ">Load the dataframe, read it into a pandas DataFrame and assign it to df."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_02.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_02.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**To have a look at the first 5 rows of df, we can use the *head* method.**"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df.head()"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df.head()"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Have a look at the last 3 rows of df using the tail method"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_03.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_03.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -246,112 +289,121 @@
"General information about the dataset\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**To get the size of the datasets, we can use the *shape* attribute.** \n",
"The first number is the number of row, the second one the number of columns"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- ">Show the shape of df (do not put brackets at the end)"
- ]
+ ">Show the shape of df (do not put brackets at the end)."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_04.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_04.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Get the names of the columns and info about them (number of non null and type) using the info method."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_05.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_05.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Get the columns of the dataframe using the columns attribute."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_06.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_06.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -359,62 +411,55 @@
"Display settings\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can check the display option of the notebook."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "pd.set_option('display.max_rows', [number of rows])"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "pd.options.display.max_rows"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- ">Force pandas to display 25 rows by changing the value of the above."
- ]
+ ">Force pandas to display 25 rows by changing the value of [number of rows] above."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_07.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": []
- },
- {
- "cell_type": "code",
"execution_count": null,
- "metadata": {
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "# %load ../solutions/02_07.py"
- ]
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -422,270 +467,302 @@
"Subsetting data\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can subset a dataframe by label, by index or a combination of both. \n",
"There are different ways to do it, using .loc, .iloc and also []. \n",
"See [documentation ](https://pandas.pydata.org/pandas-docs/stable/indexing.html)."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Display the 'bill_length_mm' column"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_08.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_08.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"*Note:* We could also use `df.bill_length_mm`, but it's not the greatest idea because the column name could clash with a DataFrame method or attribute, and it does not work for column names that contain spaces."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Have a look at the 12th observation:"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# using .iloc (uses positions, \"i\" stands for integer)\n"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# using .iloc (uses positions, \"i\" stands for integer)\n"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_09.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_09.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# using .loc (uses indexes and labels)\n"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# using .loc (uses indexes and labels)\n"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_10.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_10.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Display the **bill_length_mm** of the last three observations."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# using .iloc\n"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# using .iloc\n"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_11.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_11.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# using .loc\n"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# using .loc\n"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_12.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
},
- "scrolled": true
+ "scrolled": true,
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_12.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"And finally look at the **flipper_length_mm** and **body_mass_g** of the 146th, the 8th and the 1st observations:"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# using .iloc\n"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# using .iloc\n"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_13.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_13.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# using .loc\n"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# using .loc\n"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_14.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_14.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**!!WARNING!!** Unlike Python slicing and ``.iloc``, a range specified with ``.loc`` **includes** the last index specified."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df.iloc[5:10]"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
},
- "scrolled": true
+ "scrolled": true,
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df.iloc[5:10]"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df.loc[5:10]"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df.loc[5:10]"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -693,85 +770,92 @@
"Filtering data on conditions\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**We can also use condition(s) to filter.** \n",
"We want to display the rows of df where **body_mass_g** is greater than 4000. We will start by creating a mask with this condition."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "mask_PW = df['body_mass_g'] > 4000\n",
+ "mask_PW"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
},
- "scrolled": true
+ "scrolled": true,
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "mask_PW = df['body_mass_g'] > 4000\n",
- "mask_PW"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"Note that this returns booleans. If we pass this mask to our dataframe, it will display only the rows where the mask is True."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df[mask_PW]"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df[mask_PW]"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Display the rows of df where **body_mass_g** is greater than 4000 and **flipper_length_mm** is less than 185."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_15.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_15.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -779,65 +863,70 @@
"Values\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can get the number of unique values from a certain column by using the `nunique` method.\n",
"\n",
"For example, we can get the number of unique values from the species column:"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df['species'].nunique()"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df['species'].nunique()"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can also get the list of unique values from a certain column by using the `unique` method.\n",
">Return the list of unique values from the species column"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_16.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_16.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -845,341 +934,373 @@
"Null Values and NaN\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"When you work with data, you will quickly learn that data is never \"clean\": some values are usually missing. These missing values are usually referred to as null values. In computation it is best practice to represent them with a \"special number\" that is \"**N**ot **a** **N**umber\", also called NaN.\n",
"\n",
"We can use the `isnull` method to know if a value is null or not. It returns boolean values."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df['flipper_length_mm'].isnull()"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df['flipper_length_mm'].isnull()"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"**We can apply different methods one after the other.** \n",
"For example, we could apply the method `sum` after the method `isnull` to know the number of null observations in the **flipper_length_mm** column.\n",
">Get the total number of null values for **flipper_length_mm**."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
"source": [
- "# %load ../solutions/02_17.py"
- ]
+ "# %run ../solutions/02_17.py"
+ ],
+ "metadata": {
+ "trusted": true
+ },
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"To get the count of the different values of a column, we can use the `value_counts` method.\n",
"\n",
"For example, for the species column:"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df['species'].value_counts()"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df['species'].value_counts()"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"If we want to know the count of NaN values, we have to pass the value `False` to the parameter **dropna** (set to `True` by default).\n",
"> Return the proportion for each sex, including the NaN values."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_18.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_18.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"To get the proportion instead of the count of these values, we have to pass the value `True` to the parameter **normalize**.\n",
">Return the proportion for each species."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_19.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_19.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Using the index attribute, get the indexes of the observation without **flipper_length_mm**"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_20.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_20.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"Use the **[dropna](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html)** method to remove the row which only has NaN values.\n",
">Get the help for the dropna method."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_21.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
},
- "scrolled": true
+ "scrolled": true,
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_21.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Use the dropna method to remove the row of `df` where all of the values are NaN, and assign it to `df_2`."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_22.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_22.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can use an f-string to format a string. We have to write an `f` before the opening quotation mark, and write what we want to format between curly brackets."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "print(f'shape of df: {df.shape}')"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "print(f'shape of df: {df.shape}')"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"> Print the number of rows of `df_2` using an f-string. Did we lose any rows between `df` and `df_2`? If not, why not?"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_23.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_23.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Use the dropna method to remove the rows of `df_2` which contain any NaN values, and assign the result to `df_3`"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_24.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_24.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Print the number of rows of `df_3` using an f-string."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_25.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_25.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -1187,64 +1308,69 @@
"Duplicates\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Remove the duplicate rows from `df_3`, and assign the new dataframe to `df_4`"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_26.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
},
- "scrolled": true
+ "scrolled": true,
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_26.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# checking the shape of df_4\n",
+ "df_4.shape"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# checking the shape of df_4\n",
- "df_4.shape"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"You should see that 4 rows have been dropped. "
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -1252,185 +1378,208 @@
"Some stats\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Use the describe method to see how the data is distributed (numerical features only!)"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_27.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_27.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can also change the type of the **species** column to save memory space. Note: You may receive a **SettingWithCopyWarning** - you can safely ignore this warning for this notebook."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df_4['species'] = df_4['species'].astype('category')"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df_4['species'] = df_4['species'].astype('category')"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Using the dtypes attribute, check the types of the columns of `df_4`"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_28.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_28.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can also use the functions count(), mean(), sum(), median(), std(), min() and max() separately if we are only interested in one of those."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- ">Get the minimum for each numerical column of `df_4`"
- ]
+ ">Get the minimum for each numerical column of `df_4`. Make sure to include the argument `numeric_only=True` in the function to filter results to only numeric columns."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_29.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_29.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- ">Calculate the maximum of the **flipper_length_mm**"
- ]
+ ">Calculate the maximum of the **flipper_length_mm**."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_30.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_30.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"We can also get information for each species using the `groupby` method.\n",
"\n",
"\n",
- "> Get the median for each **species**."
- ]
+ "> Get the median for each **species**. Again, make sure to include the argument `numeric_only=True` in the function to filter results to only numeric columns."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
+ "source": [],
+ "metadata": {},
"execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# %run ../solutions/02_31.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "# %load ../solutions/02_31.py"
- ]
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -1438,59 +1587,42 @@
"Saving the dataframe as a csv file\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
">Save df_4 using this path: `'../data/Penguins/my_penguins.csv'`"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "# %run ../solutions/02_32.py"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "# %load ../solutions/02_32.py"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (system-wide)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
+ },
+ "collapsed": false,
+ "trusted": true
},
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.5"
+ "execution_count": null,
+ "outputs": []
}
- },
- "nbformat": 4,
- "nbformat_minor": 4
+ ]
}
diff --git a/content/notebooks/3.1 Visualization with Matplotlib.ipynb b/content/notebooks/3.1 Visualization with Matplotlib.ipynb
index 5385877..567fb7f 100644
--- a/content/notebooks/3.1 Visualization with Matplotlib.ipynb
+++ b/content/notebooks/3.1 Visualization with Matplotlib.ipynb
@@ -1,8 +1,28 @@
{
+ "metadata": {
+ "kernelspec": {
+ "name": "python",
+ "display_name": "Python (Pyodide)",
+ "language": "python"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "python",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8"
+ }
+ },
+ "nbformat_minor": 4,
+ "nbformat": 4,
"cells": [
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"
",
+ "image/png": ""
+ },
+ "metadata": {}
+ }
]
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"---\n",
"\n",
@@ -207,193 +319,241 @@
"Exercise: Compare bill length of different species of penguin\n",
" \n",
""
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- "Let's use box plot to compare the bill length of different species of penguin. We need the DataFrame to be slightly different so we can compare the different type species of penguin. We would like to pivot the data so each column are bill length of different species of penguin."
- ]
+ "Let's use a box plot to compare the bill length of different species of penguin. We need the DataFrame to be slightly different so we can compare the different species of penguin. We would like to pivot the data so that each column holds the bill length of one species of penguin."
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"#### Prepare the data set"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df_pivot = df.pivot(index=None, columns='species', values='bill_length_mm')\n",
+ "# tell the pivot() method to use 'species' as the columns and 'bill_length_mm' as the values"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df_pivot = df.pivot(index=None, columns='species', values='bill_length_mm')\n",
- "# tell the pivot() method to make the 'species' as columns, and using the 'bill_length_mm' as the value"
- ]
+ "execution_count": 8,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df_pivot.sample(10)"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df_pivot.sample(10)"
+ "execution_count": 9,
+ "outputs": [
+ {
+ "execution_count": 9,
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": "species Adelie Chinstrap Gentoo\n200 NaN 51.5 NaN\n180 NaN 46.4 NaN\n136 35.6 NaN NaN\n30 39.5 NaN NaN\n199 NaN 49.0 NaN\n337 NaN NaN 48.8\n165 NaN 52.0 NaN\n129 44.1 NaN NaN\n194 NaN 50.9 NaN\n155 NaN 45.4 NaN",
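+ "text/html": "<table>\n<thead>\n<tr><th>species</th><th>Adelie</th><th>Chinstrap</th><th>Gentoo</th></tr>\n</thead>\n<tbody>\n<tr><th>200</th><td>NaN</td><td>51.5</td><td>NaN</td></tr>\n<tr><th>180</th><td>NaN</td><td>46.4</td><td>NaN</td></tr>\n<tr><th>136</th><td>35.6</td><td>NaN</td><td>NaN</td></tr>\n<tr><th>30</th><td>39.5</td><td>NaN</td><td>NaN</td></tr>\n<tr><th>199</th><td>NaN</td><td>49.0</td><td>NaN</td></tr>\n<tr><th>337</th><td>NaN</td><td>NaN</td><td>48.8</td></tr>\n<tr><th>165</th><td>NaN</td><td>52.0</td><td>NaN</td></tr>\n<tr><th>129</th><td>44.1</td><td>NaN</td><td>NaN</td></tr>\n<tr><th>194</th><td>NaN</td><td>50.9</td><td>NaN</td></tr>\n<tr><th>155</th><td>NaN</td><td>45.4</td><td>NaN</td></tr>\n</tbody>\n</table>"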
+ },
+ "metadata": {}
+ }
]
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"#### Box plot of df_pivot\n",
"\n",
"Now we can use `plot()` on `df_pivot`. To make a box plot, remember to set the parameter `kind` to 'box'. Also make the presentation nice by setting a good `figsize` and with a good `title`. Don't forget the `legend`."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"#### Additional exercise\n",
"\n",
"Challenge yourself by making your own `df_pivot` pivoting on a different measure (e.g. Body Mass). Also try using a histogram (hist) instead of a boxplot. You can also try making a plot with 3 subplots, each showing a histogram for one species of penguin."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false
},
- "outputs": [],
- "source": []
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"So far we have not used `matplotlib.pyplot` directly. Although it is very convenient to use `df.plot()`, sometimes we would like to have more control over what we are plotting and make more complex graphs. In the following sections, we will use `matplotlib.pyplot` (which is imported as `plt` now) directly."
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
- "### Divide the data into 3 types accordingly"
- ]
+ "### Divide the data into 3 types accordingly\n",
+ "\n",
+ "In order to create the following plots, we need to create different pandas DataFrames for each penguin species."
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df['species'].unique()"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df['species'].unique()"
+ "execution_count": 10,
+ "outputs": [
+ {
+ "execution_count": 10,
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": "array(['Adelie', 'Chinstrap', 'Gentoo'], dtype=object)"
+ },
+ "metadata": {}
+ }
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df_adelie = df[df['species'] == 'Adelie']"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df_adelie = df[df['species'] == 'Adelie']"
- ]
+ "execution_count": 11,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df_chinstrap = df[df['species'] == 'Chinstrap']"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df_chinstrap = df[df['species'] == 'Chinstrap']"
- ]
+ "execution_count": 12,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "source": [
+ "df_gentoo = df[df['species'] == 'Gentoo']"
+ ],
"metadata": {
"jupyter": {
"outputs_hidden": false
- }
+ },
+ "collapsed": false,
+ "trusted": true
},
- "outputs": [],
- "source": [
- "df_gentoo = df[df['species'] == 'Gentoo']"
- ]
+ "execution_count": 13,
+ "outputs": []
},
{
"cell_type": "markdown",
- "metadata": {},
"source": [
"### Scatter plot example: plot on Bill Length and Depth"
- ]
+ ],
+ "metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
- "metadata": {
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
"source": [
"plt.scatter(df_adelie['bill_length_mm'], df_adelie['bill_depth_mm'], c='r')\n",
"plt.scatter(df_chinstrap['bill_length_mm'], df_chinstrap['bill_depth_mm'], c='g')\n",
"plt.scatter(df_gentoo['bill_length_mm'], df_gentoo['bill_depth_mm'], c='b')"
+ ],
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false
+ },
+ "collapsed": false,
+ "trusted": true
+ },
+ "execution_count": 14,
+ "outputs": [
+ {
+ "execution_count": 14,
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": ""
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "