Skip to content
This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

Latest commit

 

History

History
144 lines (114 loc) · 3.35 KB

sql101.md

File metadata and controls

144 lines (114 loc) · 3.35 KB

Docs > SQL 101

What is SQL

SQL is a declarative language for accessing, updating, and transforming data.

Key Terms

  • Table - A dataset defined by one or more columns, having zero or more rows.
  • Column - A field or attribute in a dataset which has a specific name and data type.
  • Row - A single entry in a table dataset.

The three types of SQL Commands

  • SELECT - The command for pulling one or more rows data out of a data source.
  • DELETE - The command to remove one or more rows from an existing table based upon a specific condition.
  • UPDATE - The command to modify one or more rows in an existing table.

NOTE: Modern big data applications and analytical use cases almost exclusivily use SELECT statements. For this reason, we will ignore DELETE and UPDATE in this introductory course.

The anatomy of SELECT

SELECT    ...a list of columns and expressions...
FROM      ...your primary table...
JOIN      ...to any additional tables...
WHERE     ...rows meet certain conditions...
GROUP BY  ...any summary columns...
HAVING    ...met any additional criteria after aggregations are performed...

SQL by Example

Can you tell what each of the below statements is returning?

SELECT *
FROM fact_sales
WHERE sales_date = TODAY()
SELECT DISTINCT
    customer_id
  , product_id
FROM fact_sales
WHERE sales_date = TODAY()
SELECT list_customers.customer_name
FROM fact_sales
JOIN list_customers
  ON fact_sales.customer_id = list_customers.customer_id
WHERE fact_sales.sales_date = TODAY()
SELECT list_customers.customer_name
FROM fact_sales
JOIN list_customers
  ON fact_sales.customer_id = list_customers.customer_id
WHERE fact_sales.sales_date = TODAY()
SELECT
      fact_sales.sales_date
    , sum(fact_sales.sales) as sales
FROM fact_sales
GROUP BY fact_sales.sales_date
WHERE fact_sales.sales_date = TODAY()
SELECT
      list_regions.region_name
    , sum(fact_sales.sales) as sales
FROM fact_sales
LEFT JOIN list_customers
  ON fact_sales.customer_id = list_customers.customer_id
LEFT JOIN list_regions
  ON list_customers.region_id = list_regions.region_id
WHERE fact_sales.sales_date = TODAY()
GROUP BY list_regions.region_name

Example Tables

  • fact_sales
    • sales_date
    • order_id
    • customer_id
    • product_id
    • sales
    • quantity
  • list_customers
    • customer_id
    • customer_name
    • region_id
  • list_regions
    • region_id
    • region_name
    • country_code

The "CTAS" COMMAND (CREATE TABLE AS)

The CTAS command creates a new permanent table from any valid SELECT statement.

Consider the following example.

CREATE TABLE fact_sales_summary_daily AS
SELECT
      sales_date
    , sum(sales) AS sales
FROM fact_sales
GROUP BY sales_date

The above CTAS command will create a new, smaller, and faster table which only has a single summary row per day.

NOTE: Most databases also support "CREATE OR REPLACE" to replace any existing table at the same time:

CREATE OR REPLACE TABLE ______ AS...

5 types of tables

  1. LIST
    • aka: "Type 1 Dimension"
  2. HIST
    • aka: "Type 2 Dimension"
  3. EVENTS
    • aka: "Fact Table"
  4. FACT
    • aka: "Aggregate Table"
  5. MAP
    • aka: "Factless Fact Table"

Interactive Online Tutorial