diff --git a/Teaching/JHU/DataMining-553.436/Getting Started with SQL using SciServer.CasJobs.ipynb b/Teaching/JHU/DataMining-553.436/Getting Started with SQL using SciServer.CasJobs.ipynb deleted file mode 100644 index fb822bf..0000000 --- a/Teaching/JHU/DataMining-553.436/Getting Started with SQL using SciServer.CasJobs.ipynb +++ /dev/null @@ -1,3542 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Getting Started with SQL \n", - "\n", - "#### TAMÁS BUDAVÁRI – AUGUST 2016 \n", - " \n", - "Databases are easy: they consist of tables, which in turn have columns – just like the files with which you \n", - "are probably used to working. The difference is that you don’t have to think about how to read and write \n", - "the files but can focus on the questions you try to answer. The following code snippets and exercises will \n", - "teach you the basics of expressing your questions in the Structured Query Language, or SQL for short. SQL \n", - "is a standard language and most of these commands will work on any relational database but there are \n", - "minor differences in dialects. \n", - " \n", - "The database you will be querying stores a collection of measurements of some (X,Y) quantities. There is \n", - "a relation between them but the Y measurements are noisy. Our database is synthetic but carries several \n", - "aspects of real measurements: separate instruments, multiple students and runs of measurements in \n", - "different observational domains. The relevant tables are Data, Runs, Instruments and Users. The names \n", - "should be suggestive of their contents.\n", - "\n", - "You have tried out some SQL queries using the online CasJobs web application. 
Here we use many of these same queries as examples for accessing a set of relational databases from within Jupyter notebooks running python inside a \n", - "SciServer/Compute container.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## SciServer.CasJobs\n", - "From a python notebook in SciServer/Compute you can submit SQL queries to the same service that runs queries from the CasJobs webpage. The advantage is that you can directly access the result and visualize it or analyze it in any way you want.\n", - "\n", - "To do so you must import a library that is available on all the SciServer Compute images." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import SciServer.CasJobs as cj" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## The First Queries \n", - "To get all (X,Y) values from the table Data, you could use the following SQL:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "``` sql\n", - "-- never run queries like this \n", - "select X, Y \n", - "from Data \n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "But don’t do it! First you should always think about what will happen. \n", - "\n", - "The table might have hundreds of millions of rows! Do you really want all that data dumped on you? Try \n", - "the next set of commands to see their effects and understand how they work. Consider them as \n", - "illustrations accompanying the lecture and ask questions if something is not clear! 
" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "sql=\"\"\"\n", - "-- have a quick peek \n", - "select top 5 X, Y \n", - "from Data \n", - "\"\"\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To execute this query you will use the SciServer.CasJobs library, which we aliased to 'cj' in the import statement.\n", - "Its 'executeQuery' function submits the specified sql to the specified context
\n", - "As written here this query will be executed \"synchronously\" and return the result as a pandas DataFrame that will be printed at the end of the cell." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   X         Y
0  0.145097  -1.131867
1  0.200108  -0.337185
2  0.405262  -0.246609
3  0.338955  -1.156640
4  0.089844  -1.062386
\n", - "
" - ], - "text/plain": [ - " X Y\n", - "0 0.145097 -1.131867\n", - "1 0.200108 -0.337185\n", - "2 0.405262 -0.246609\n", - "3 0.338955 -1.156640\n", - "4 0.089844 -1.062386" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df=cj.executeQuery(sql=sql,context=\"IntroSQL\")\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you only want to inspect the result you do not need to store it in a variable. And note that you do not need to be explicit about the variables. See docs for documentation on all the SciServer python modules. In particular SciServer.CasJobs for the CasJobs module." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   X         Y
0  0.001319  -1.121337
1  0.001944  -0.888723
2  0.002475  -0.770705
\n", - "
" - ], - "text/plain": [ - " X Y\n", - "0 0.001319 -1.121337\n", - "1 0.001944 -0.888723\n", - "2 0.002475 -0.770705" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# sorting to peek at the extremes \n", - "sql=\"\"\"\n", - "select top 3 X, Y \n", - "from Data \n", - "order by X \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   x
0  0.546278
1  0.434254
2  0.438969
3  0.501610
4  0.430480
\n", - "
" - ], - "text/plain": [ - " x\n", - "0 0.546278\n", - "1 0.434254\n", - "2 0.438969\n", - "3 0.501610\n", - "4 0.430480" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- filtering with formulas and functions \n", - "select top 5 x \n", - "from Data \n", - "where 2*y between -sin(x) and x \n", - "order by RunID desc, y desc \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The last query showed you that you can comment your SQL code by using the following formats \n", - "
\n",
-    "-- single-line comments go after -- \n",
-    " \n",
-    "/* multi-line comments are  \n",
-    "   like this one \n",
-    "*/ \n",
-    "
\n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# NOT TRUE:\n", - "### SAVE YOUR WORK! \n", - "The website you are using will not save your queries. If you would \n", - "like to keep them for future reference, open a text editor to cut & \n", - "paste the relevant lines into a file that you can regularly save. \n", - " \n", - "IF you save your notebook in a volume in your private /Storage/<username> area, for example /persistent, your notebook will be backed up and available in other containers as well.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Aggregation\n", - "Consider getting only the relevant information from the database and not everything. Run the following \n", - "commands and see what the different constructs achieve: " - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   N    Column1  Column2
0  546  546      10
\n", - "
" - ], - "text/plain": [ - " N Column1 Column2\n", - "0 546 546 10" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- counting \n", - "select COUNT(ID) as N, COUNT(RunID), COUNT(distinct RunID) \n", - "from Data \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   Column1  Column2     Column3   Column4
0  230      127.140058  1.509137  1.403175
\n", - "
" - ], - "text/plain": [ - " Column1 Column2 Column3 Column4\n", - "0 230 127.140058 1.509137 1.403175" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- aggregates in general \n", - "select COUNT(id), SUM(x), AVG(y), STDEV(x-y) \n", - "from Data \n", - "where Y>0 \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   RunID  Column1   Column2   Column3    Column4
0  101    0.505333  0.994198  -0.086970  2.975272
1  109    0.356064  0.879783  -1.207412  1.644311
2  104    0.255726  0.789319  -2.374107  1.774286
3  102    0.242926  0.773327  -0.935795  1.074530
4  105    0.202550  0.797702  -8.014773  6.073653
5  108    0.302871  0.599753  -0.700361  0.266618
6  106    0.202057  0.799781  -5.970850  4.230965
7  103    0.004561  0.881169  -7.480040  5.935146
8  100    0.017204  0.492729  -1.619121  0.487096
9  107    0.001319  0.097952  -1.163017  -0.693459
\n", - "
" - ], - "text/plain": [ - " RunID Column1 Column2 Column3 Column4\n", - "0 101 0.505333 0.994198 -0.086970 2.975272\n", - "1 109 0.356064 0.879783 -1.207412 1.644311\n", - "2 104 0.255726 0.789319 -2.374107 1.774286\n", - "3 102 0.242926 0.773327 -0.935795 1.074530\n", - "4 105 0.202550 0.797702 -8.014773 6.073653\n", - "5 108 0.302871 0.599753 -0.700361 0.266618\n", - "6 106 0.202057 0.799781 -5.970850 4.230965\n", - "7 103 0.004561 0.881169 -7.480040 5.935146\n", - "8 100 0.017204 0.492729 -1.619121 0.487096\n", - "9 107 0.001319 0.097952 -1.163017 -0.693459" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- grouping data \n", - "select RunID, MIN(x), MAX(x), MIN(y), MAX(Y) \n", - "from Data \n", - "group by RunID \n", - "order by AVG(X) desc \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
RunID  Column1  Column2  Column3  Column4
\n", - "
" - ], - "text/plain": [ - "Empty DataFrame\n", - "Columns: [RunID, Column1, Column2, Column3, Column4]\n", - "Index: []" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- having: contraints on aggregates \n", - "select RunID, MIN(x), MAX(x), MIN(y), MAX(Y) \n", - "from Data \n", - "where X>0.2 -- filtering on the input \n", - "group by RunID \n", - "having MAX(Y) < 0 -- filtering on aggregate \n", - "order by AVG(X) desc \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"AstroinformIntroSQLatics2018\")" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   RunID  Column1   Column2   Column3    Column4
0  103    0.207009  0.881169  -5.723222  5.935146
1  105    0.202550  0.797702  -8.014773  6.073653
2  108    0.302871  0.599753  -0.700361  0.266618
3  106    0.202057  0.799781  -5.970850  4.230965
\n", - "
" - ], - "text/plain": [ - " RunID Column1 Column2 Column3 Column4\n", - "0 103 0.207009 0.881169 -5.723222 5.935146\n", - "1 105 0.202550 0.797702 -8.014773 6.073653\n", - "2 108 0.302871 0.599753 -0.700361 0.266618\n", - "3 106 0.202057 0.799781 -5.970850 4.230965" - ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- or \n", - "select RunID, MIN(x), MAX(x), MIN(y), MAX(Y) \n", - "from Data \n", - "where X>0.2 \n", - "group by RunID \n", - "having COUNT(*) > 30 \n", - "order by AVG(X) desc \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   RunID  Column1
0  103    99
1  105    99
2  106    99
3  108    73
4  107    49
5  102    30
6  109    27
7  100    25
8  101    25
9  104    20
\n", - "
" - ], - "text/plain": [ - " RunID Column1\n", - "0 103 99\n", - "1 105 99\n", - "2 106 99\n", - "3 108 73\n", - "4 107 49\n", - "5 102 30\n", - "6 109 27\n", - "7 100 25\n", - "8 101 25\n", - "9 104 20" - ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- number of measurements in each run \n", - "select RunID, COUNT(*) \n", - "from Data \n", - "group by RunID \n", - "order by 2 desc \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " 
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
    X         Column1
0   0.145097  0.02
1   0.200108  0.04
2   0.405262  0.16
3   0.338955  0.11
4   0.089844  0.01
5   0.131127  0.02
6   0.465117  0.22
7   0.049429  0.00
8   0.391001  0.15
9   0.357276  0.13
10  0.129908  0.02
11  0.348302  0.12
12  0.330393  0.11
13  0.178024  0.03
14  0.404610  0.16
15  0.061659  0.00
16  0.492729  0.24
17  0.017204  0.00
18  0.065357  0.00
19  0.355085  0.13
20  0.369850  0.14
21  0.364192  0.13
22  0.068344  0.00
23  0.396223  0.16
24  0.104986  0.01
25  0.981793  0.96
26  0.613945  0.38
27  0.662763  0.44
28  0.767558  0.59
29  0.994198  0.99
..  ...       ...
70  0.435793  0.19
71  0.511613  0.26
72  0.738097  0.54
73  0.305846  0.09
74  0.724111  0.52
75  0.593763  0.35
76  0.371776  0.14
77  0.462956  0.21
78  0.431236  0.19
79  0.242926  0.06
80  0.839875  0.71
81  0.783815  0.61
82  0.601005  0.36
83  0.780251  0.61
84  0.652454  0.43
85  0.384627  0.15
86  0.186540  0.03
87  0.871552  0.76
88  0.740827  0.55
89  0.437515  0.19
90  0.472008  0.22
91  0.716693  0.51
92  0.157553  0.02
93  0.309754  0.10
94  0.043183  0.00
95  0.386268  0.15
96  0.737191  0.54
97  0.585478  0.34
98  0.526038  0.28
99  0.193036  0.04
\n", - "

100 rows × 2 columns

\n", - "
" - ], - "text/plain": [ - " X Column1\n", - "0 0.145097 0.02\n", - "1 0.200108 0.04\n", - "2 0.405262 0.16\n", - "3 0.338955 0.11\n", - "4 0.089844 0.01\n", - "5 0.131127 0.02\n", - "6 0.465117 0.22\n", - "7 0.049429 0.00\n", - "8 0.391001 0.15\n", - "9 0.357276 0.13\n", - "10 0.129908 0.02\n", - "11 0.348302 0.12\n", - "12 0.330393 0.11\n", - "13 0.178024 0.03\n", - "14 0.404610 0.16\n", - "15 0.061659 0.00\n", - "16 0.492729 0.24\n", - "17 0.017204 0.00\n", - "18 0.065357 0.00\n", - "19 0.355085 0.13\n", - "20 0.369850 0.14\n", - "21 0.364192 0.13\n", - "22 0.068344 0.00\n", - "23 0.396223 0.16\n", - "24 0.104986 0.01\n", - "25 0.981793 0.96\n", - "26 0.613945 0.38\n", - "27 0.662763 0.44\n", - "28 0.767558 0.59\n", - "29 0.994198 0.99\n", - ".. ... ...\n", - "70 0.435793 0.19\n", - "71 0.511613 0.26\n", - "72 0.738097 0.54\n", - "73 0.305846 0.09\n", - "74 0.724111 0.52\n", - "75 0.593763 0.35\n", - "76 0.371776 0.14\n", - "77 0.462956 0.21\n", - "78 0.431236 0.19\n", - "79 0.242926 0.06\n", - "80 0.839875 0.71\n", - "81 0.783815 0.61\n", - "82 0.601005 0.36\n", - "83 0.780251 0.61\n", - "84 0.652454 0.43\n", - "85 0.384627 0.15\n", - "86 0.186540 0.03\n", - "87 0.871552 0.76\n", - "88 0.740827 0.55\n", - "89 0.437515 0.19\n", - "90 0.472008 0.22\n", - "91 0.716693 0.51\n", - "92 0.157553 0.02\n", - "93 0.309754 0.10\n", - "94 0.043183 0.00\n", - "95 0.386268 0.15\n", - "96 0.737191 0.54\n", - "97 0.585478 0.34\n", - "98 0.526038 0.28\n", - "99 0.193036 0.04\n", - "\n", - "[100 rows x 2 columns]" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- rounding is easy \n", - "select top 100 X, ROUND(X*X,2) from Data \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " 
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
    X     N
0   0.00  4
1   0.01  8
2   0.02  6
3   0.03  6
4   0.04  6
5   0.05  8
6   0.06  8
7   0.07  5
8   0.08  5
9   0.09  11
10  0.10  5
11  0.11  3
12  0.12  1
13  0.13  2
14  0.15  2
15  0.16  1
16  0.17  1
17  0.18  1
18  0.19  3
19  0.20  4
20  0.21  12
21  0.22  4
22  0.23  3
23  0.24  5
24  0.25  3
25  0.26  5
26  0.27  5
27  0.28  8
28  0.29  7
29  0.30  10
..  ...   ..
64  0.65  6
65  0.66  8
66  0.67  2
67  0.68  6
68  0.69  7
69  0.70  6
70  0.71  5
71  0.72  8
72  0.73  6
73  0.74  4
74  0.75  4
75  0.76  3
76  0.77  11
77  0.78  5
78  0.79  4
79  0.80  5
80  0.81  1
81  0.83  2
82  0.84  2
83  0.85  1
84  0.86  5
85  0.87  3
86  0.88  3
87  0.90  1
88  0.91  1
89  0.92  1
90  0.93  1
91  0.97  1
92  0.98  2
93  0.99  1
\n", - "

94 rows × 2 columns

\n", - "
" - ], - "text/plain": [ - " X N\n", - "0 0.00 4\n", - "1 0.01 8\n", - "2 0.02 6\n", - "3 0.03 6\n", - "4 0.04 6\n", - "5 0.05 8\n", - "6 0.06 8\n", - "7 0.07 5\n", - "8 0.08 5\n", - "9 0.09 11\n", - "10 0.10 5\n", - "11 0.11 3\n", - "12 0.12 1\n", - "13 0.13 2\n", - "14 0.15 2\n", - "15 0.16 1\n", - "16 0.17 1\n", - "17 0.18 1\n", - "18 0.19 3\n", - "19 0.20 4\n", - "20 0.21 12\n", - "21 0.22 4\n", - "22 0.23 3\n", - "23 0.24 5\n", - "24 0.25 3\n", - "25 0.26 5\n", - "26 0.27 5\n", - "27 0.28 8\n", - "28 0.29 7\n", - "29 0.30 10\n", - ".. ... ..\n", - "64 0.65 6\n", - "65 0.66 8\n", - "66 0.67 2\n", - "67 0.68 6\n", - "68 0.69 7\n", - "69 0.70 6\n", - "70 0.71 5\n", - "71 0.72 8\n", - "72 0.73 6\n", - "73 0.74 4\n", - "74 0.75 4\n", - "75 0.76 3\n", - "76 0.77 11\n", - "77 0.78 5\n", - "78 0.79 4\n", - "79 0.80 5\n", - "80 0.81 1\n", - "81 0.83 2\n", - "82 0.84 2\n", - "83 0.85 1\n", - "84 0.86 5\n", - "85 0.87 3\n", - "86 0.88 3\n", - "87 0.90 1\n", - "88 0.91 1\n", - "89 0.92 1\n", - "90 0.93 1\n", - "91 0.97 1\n", - "92 0.98 2\n", - "93 0.99 1\n", - "\n", - "[94 rows x 2 columns]" - ] - }, - "execution_count": 21, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- building a histogram \n", - "select ROUND(x,2) as X, COUNT(*) as N \n", - "from Data \n", - "group by ROUND(x,2) \n", - "order by 1 \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can use the same
 SELECT ... INTO ...
pattern to save results in your MyDB. Make sure though that a table with this name does not already exist!" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   Rows Affected
0  61
\n", - "
" - ], - "text/plain": [ - " Rows Affected\n", - "0 61" - ] - }, - "execution_count": 34, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- custom bin size using a variable and save the results \n", - "declare @bin float = 0.016 \n", - "select ROUND(x/@bin,0)*@bin as X, COUNT(*) as Cts \n", - "into AHistogram -- name of new table \n", - "from Data \n", - "group by ROUND(x/@bin,0)*@bin \n", - "order by ROUND(x/@bin,0)*@bin \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Check that the table was indeed created in your MyDB in the CasJobs UI at https://skyserver.sdss.org/CasJobs/MyDB.aspx" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Asking Questions using SQL \n", - "\n", - "You’ll see that every question you have about the data will nicely translate to commands. For example, \n", - "let’s consider the following question: Who ran the first measurement? The following 3 select statements \n", - "will get you an answer. Note, you can run all three queries using a single execution. 
When more than 1 result is returned from the database, the result is a list of data frames: " - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [], - "source": [ - "sql=\"\"\"\n", - "select top 1 RunID from Data order by ID -- 100 \n", - "select UserID from Runs where RunID=100 -- 12 \n", - "select * from Users where UserID=12 \n", - "\"\"\"\n", - "dfs=cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Loop over the elements in the result, each of which is a DataFrame:" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "0 :\n", - " RunID\n", - "0 100\n", - "1 :\n", - " UserID\n", - "0 12\n", - "2 :\n", - " UserID Name AdvisorID\n", - "0 12 Hugo First 10\n" - ] - } - ], - "source": [ - "for idx, df in enumerate(dfs):\n", - " print(idx,\":\")\n", - " print(df)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "What did we really mean by “first” measurement? Any other definition to use? \n", - " \n", - "Nested queries can combine multiple searches into one request. 
Run and analyze the following queries \n", - "and their results: " - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [], - "source": [ - "sql=\"\"\"\n", - "-- nested queries \n", - "select UserID \n", - "from Runs \n", - "where RunID = (select top 1 RunID from Data order by ID) \n", - " \n", - "-- doubly so \n", - "select * from Users \n", - "where UserID = ( \n", - " select UserID from Runs \n", - " where RunID = (select top 1 RunID from Data order by ID) \n", - ") \n", - " \n", - "-- whole set of runs using the 'in' keyword \n", - "select UserID \n", - "from Runs \n", - "where RunID in (select top 1 RunID \n", - " from Data order by ID) \n", - "\"\"\"\n", - "dfs=cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "0 :\n", - " UserID\n", - "0 12\n", - "1 :\n", - " UserID Name AdvisorID\n", - "0 12 Hugo First 10\n", - "2 :\n", - " UserID\n", - "0 12\n" - ] - } - ], - "source": [ - "for idx, df in enumerate(dfs):\n", - " print(idx,\":\")\n", - " print(df)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Combining tables is where relational database engines really shine. The terminology is “joining tables”. \n", - "Here are different implementations of similar questions. Make sure you understand these queries \n", - "because they will be important: \n", - " " - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   Name          RunID  Column1
0  Hugo First    100    0.5
1  Hugo First    101    0.5
2  Hammond Eggs  102    0.6
3  Gil T. Azell  103    0.9
4  Holly Wood    104    0.6
5  Gil T. Azell  105    0.6
6  Gil T. Azell  106    0.6
7  Holly Wood    107    0.1
8  Holly Wood    108    0.3
9  Levy Tate     109    0.6
\n", - "
" - ], - "text/plain": [ - " Name RunID Column1\n", - "0 Hugo First 100 0.5\n", - "1 Hugo First 101 0.5\n", - "2 Hammond Eggs 102 0.6\n", - "3 Gil T. Azell 103 0.9\n", - "4 Holly Wood 104 0.6\n", - "5 Gil T. Azell 105 0.6\n", - "6 Gil T. Azell 106 0.6\n", - "7 Holly Wood 107 0.1\n", - "8 Holly Wood 108 0.3\n", - "9 Levy Tate 109 0.6" - ] - }, - "execution_count": 37, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- inner join (old style) \n", - "select u.Name, r.RunID, r.Xmax - r.Xmin \n", - "from Runs r, Users u -- aliases are convenient \n", - "where r.UserID=u.UserID \n", - "-- risk of forgetting a constraint when many tables \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " Name RunID Column1\n", - "0 Hugo First 100 0.5\n", - "1 Hugo First 101 0.5\n", - "2 Hammond Eggs 102 0.6\n", - "3 Gil T. Azell 103 0.9\n", - "4 Holly Wood 104 0.6\n", - "5 Gil T. Azell 105 0.6\n", - "6 Gil T. Azell 106 0.6\n", - "7 Holly Wood 107 0.1\n", - "8 Holly Wood 108 0.3\n", - "9 Levy Tate 109 0.6" - ] - }, - "execution_count": 38, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- inner join (preferred) \n", - "select u.Name, r.RunID, r.Xmax - r.Xmin \n", - "from Runs r \n", - " join Users u on u.UserID=r.UserID \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " Name RunID Column1\n", - "0 Hugo First 100 0.5\n", - "1 Hugo First 101 0.5\n", - "2 Hammond Eggs 102 0.6\n", - "3 Gil T. Azell 103 0.9\n", - "4 Holly Wood 104 0.6\n", - "5 Gil T. Azell 105 0.6\n", - "6 Gil T. Azell 106 0.6\n", - "7 Holly Wood 107 0.1\n", - "8 Holly Wood 108 0.3\n", - "9 Levy Tate 109 0.6" - ] - }, - "execution_count": 39, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"-- or explicitly \n", - "select u.Name, r.RunID, r.Xmax - r.Xmin \n", - "from Runs r \n", - " inner join Users u on u.UserID=r.UserID \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " Name\n", - "0 Gil T. Azell\n", - "1 Hammond Eggs\n", - "2 Holly Wood\n", - "3 Hugo First\n", - "4 Levy Tate" - ] - }, - "execution_count": 40, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\" \n", - "-- list of users with measurements \n", - "select distinct u.Name \n", - "from Runs r \n", - " join Users u on u.UserID=r.UserID \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A different kind of join is used less frequently to look at all combinations of rows in the specified tables. \n", - "Compare the following queries to the previous ones and run them to see the differences in the results: \n", - " " - ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " Name RunID Column1\n", - "0 Andy Structible 100 0.5\n", - "1 Andy Structible 101 0.5\n", - "2 Andy Structible 102 0.6\n", - "3 Andy Structible 103 0.9\n", - "4 Andy Structible 104 0.6\n", - "5 Andy Structible 105 0.6\n", - "6 Andy Structible 106 0.6\n", - "7 Andy Structible 107 0.1\n", - "8 Andy Structible 108 0.3\n", - "9 Andy Structible 109 0.6\n", - "10 Jack Pot 100 0.5\n", - "11 Jack Pot 101 0.5\n", - "12 Jack Pot 102 0.6\n", - "13 Jack Pot 103 0.9\n", - "14 Jack Pot 104 0.6\n", - "15 Jack Pot 105 0.6\n", - "16 Jack Pot 106 0.6\n", - "17 Jack Pot 107 0.1\n", - "18 Jack Pot 108 0.3\n", - "19 Jack Pot 109 0.6\n", - "20 Hugo First 100 0.5\n", - "21 Hugo First 101 0.5\n", - "22 Hugo First 102 0.6\n", - "23 Hugo First 103 0.9\n", - "24 Hugo First 104 0.6\n", - "25 Hugo First 105 0.6\n", - "26 Hugo First 106 0.6\n", - "27 Hugo First 107 0.1\n", - "28 Hugo First 108 0.3\n", - "29 Hugo First 109 0.6\n", - ".. ... ... ...\n", - "40 Gil T. Azell 100 0.5\n", - "41 Gil T. Azell 101 0.5\n", - "42 Gil T. Azell 102 0.6\n", - "43 Gil T. Azell 103 0.9\n", - "44 Gil T. Azell 104 0.6\n", - "45 Gil T. Azell 105 0.6\n", - "46 Gil T. Azell 106 0.6\n", - "47 Gil T. Azell 107 0.1\n", - "48 Gil T. Azell 108 0.3\n", - "49 Gil T. 
Azell 109 0.6\n", - "50 Holly Wood 100 0.5\n", - "51 Holly Wood 101 0.5\n", - "52 Holly Wood 102 0.6\n", - "53 Holly Wood 103 0.9\n", - "54 Holly Wood 104 0.6\n", - "55 Holly Wood 105 0.6\n", - "56 Holly Wood 106 0.6\n", - "57 Holly Wood 107 0.1\n", - "58 Holly Wood 108 0.3\n", - "59 Holly Wood 109 0.6\n", - "60 Levy Tate 100 0.5\n", - "61 Levy Tate 101 0.5\n", - "62 Levy Tate 102 0.6\n", - "63 Levy Tate 103 0.9\n", - "64 Levy Tate 104 0.6\n", - "65 Levy Tate 105 0.6\n", - "66 Levy Tate 106 0.6\n", - "67 Levy Tate 107 0.1\n", - "68 Levy Tate 108 0.3\n", - "69 Levy Tate 109 0.6\n", - "\n", - "[70 rows x 3 columns]" - ] - }, - "execution_count": 42, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "-- cross join (old style) \n", - "select u.Name, r.RunID, r.Xmax - r.Xmin \n", - "from Runs r, Users u \n", - "-- same as old-style inner join w/o constraint \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " Name RunID Column1\n", - "0 Andy Structible 100 0.5\n", - "1 Andy Structible 101 0.5\n", - "2 Andy Structible 102 0.6\n", - "3 Andy Structible 103 0.9\n", - "4 Andy Structible 104 0.6\n", - "5 Andy Structible 105 0.6\n", - "6 Andy Structible 106 0.6\n", - "7 Andy Structible 107 0.1\n", - "8 Andy Structible 108 0.3\n", - "9 Andy Structible 109 0.6\n", - "10 Jack Pot 100 0.5\n", - "11 Jack Pot 101 0.5\n", - "12 Jack Pot 102 0.6\n", - "13 Jack Pot 103 0.9\n", - "14 Jack Pot 104 0.6\n", - "15 Jack Pot 105 0.6\n", - "16 Jack Pot 106 0.6\n", - "17 Jack Pot 107 0.1\n", - "18 Jack Pot 108 0.3\n", - "19 Jack Pot 109 0.6\n", - "20 Hugo First 100 0.5\n", - "21 Hugo First 101 0.5\n", - "22 Hugo First 102 0.6\n", - "23 Hugo First 103 0.9\n", - "24 Hugo First 104 0.6\n", - "25 Hugo First 105 0.6\n", - "26 Hugo First 106 0.6\n", - "27 Hugo First 107 0.1\n", - "28 Hugo First 108 0.3\n", - "29 Hugo First 109 0.6\n", - ".. ... ... ...\n", - "40 Gil T. Azell 100 0.5\n", - "41 Gil T. Azell 101 0.5\n", - "42 Gil T. Azell 102 0.6\n", - "43 Gil T. Azell 103 0.9\n", - "44 Gil T. Azell 104 0.6\n", - "45 Gil T. Azell 105 0.6\n", - "46 Gil T. Azell 106 0.6\n", - "47 Gil T. Azell 107 0.1\n", - "48 Gil T. Azell 108 0.3\n", - "49 Gil T. 
Azell 109 0.6\n", - "50 Holly Wood 100 0.5\n", - "51 Holly Wood 101 0.5\n", - "52 Holly Wood 102 0.6\n", - "53 Holly Wood 103 0.9\n", - "54 Holly Wood 104 0.6\n", - "55 Holly Wood 105 0.6\n", - "56 Holly Wood 106 0.6\n", - "57 Holly Wood 107 0.1\n", - "58 Holly Wood 108 0.3\n", - "59 Holly Wood 109 0.6\n", - "60 Levy Tate 100 0.5\n", - "61 Levy Tate 101 0.5\n", - "62 Levy Tate 102 0.6\n", - "63 Levy Tate 103 0.9\n", - "64 Levy Tate 104 0.6\n", - "65 Levy Tate 105 0.6\n", - "66 Levy Tate 106 0.6\n", - "67 Levy Tate 107 0.1\n", - "68 Levy Tate 108 0.3\n", - "69 Levy Tate 109 0.6\n", - "\n", - "[70 rows x 3 columns]" - ] - }, - "execution_count": 43, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\" \n", - "-- cross join: explicitly \n", - "select u.Name, r.RunID, r.Xmax - r.Xmin \n", - "from Runs r cross join Users u \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " Column1\n", - "0 7" - ] - }, - "execution_count": 44, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\" \n", - "-- compare the size of the result set with these \n", - "select COUNT(*) from users \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " Column1\n", - "0 10" - ] - }, - "execution_count": 41, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "sql=\"\"\"\n", - "select COUNT(*) from runs \n", - "-- all combinations \n", - "\"\"\"\n", - "cj.executeQuery(sql,\"IntroSQL\")" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.8 (py38)", - "language": "python", - "name": "py38" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/Teaching/class20181128.ipynb b/Teaching/JHU/DataMining-553.436/class20181128.ipynb similarity index 100% rename from Teaching/class20181128.ipynb rename to Teaching/JHU/DataMining-553.436/class20181128.ipynb