-
Notifications
You must be signed in to change notification settings - Fork 31
/
document.qmd
81 lines (65 loc) · 2.15 KB
/
document.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
---
title: "ETL Step 2 - Tidy data"
date: "`r lubridate::now(tzone = 'EST')` EST"
output: html_document
---
This notebook tidies the raw *raw_data_table* from the *Content DB* database. The tidy data is written back to the *Content DB* database to the *bike_model_data* table.
```{r setup, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(dplyr)
library(dbplyr)
library(lubridate)
```
## Get data from database
Connect to *Content DB* to get the *bike_raw_data* and *bike_station_info* table.
```{r, get_data_from_db}
con <- DBI::dbConnect(
odbc::odbc(),
Driver = "postgresql",
Server = Sys.getenv("DB_SERVER"),
Port = "5432",
Database = "soleng",
UID = Sys.getenv("DB_USER"),
PWD = Sys.getenv("DB_PASSWORD"),
BoolsAsChar = "",
timeout = 10
)
bike_raw_data <- tbl(con,
DBI::Id(schema="content", name="bike_raw_data"))
glimpse(bike_raw_data)
bike_station_info <- tbl(con, DBI::Id(schema="content", name="bike_station_info"))
glimpse(bike_station_info)
```
## Tidy the data
Apply the data tidying steps. All of the tidying steps are performed in SQL. The results are written back to *Content DB* to the *bike_model_data* table.
```{r clean_data}
if (odbc::dbExistsTable(con, DBI::Id(schema="content", name="bike_model_data"))) {
odbc::dbRemoveTable(con, DBI::Id(schema="content", name="bike_model_data"))
}
# Build a SQL query to tidy the data.
query <- bike_raw_data %>%
group_by(
id = station_id,
hour = hour(time),
date = date(time),
month = month(time),
dow = TRIM(to_char(time, "Day"))
) %>%
summarize(
n_bikes = mean(num_bikes_available, na.rm = TRUE),
.groups = "drop"
) %>%
inner_join(
select(bike_station_info, id = station_id, lat, lon)
) %>%
dbplyr::sql_render() %>%
stringr::str_replace("SELECT", "CREATE TABLE content.bike_model_data AS SELECT")
# Execute the SQL query.
odbc::dbSendQuery(con, query)
# Preview the table.
tbl(con, DBI::Id(schema="content", name="bike_model_data"))
```
```{r close_connection}
odbc::dbDisconnect(con)
print("Complete 🎉")
```