docs: enhance bq2bq plugin documentation for end user
deryrahman committed Nov 15, 2023
1 parent fe52f52 commit 67108ca
Showing 1 changed file with 41 additions and 10 deletions.
51 changes: 41 additions & 10 deletions README.md
@@ -6,16 +6,47 @@
Optimus's transformation plugins are implementations of the Task and Hook interfaces that allow
execution of arbitrary jobs in Optimus.

## To install plugins via homebrew
```shell
brew tap goto/taps
brew install optimus-plugins-goto
```

# Capabilities

- Transform data using BigQuery (BQ) SQL and store the result in a specified BQ table
- Execute the transformation in a specified GCP project
- Support various load methods, e.g. APPEND, REPLACE, MERGE
- Support the BigQuery DML MERGE statement to handle spillover cases
- Support transformation of partitioned tables, both partitioned by ingestion time (default) and partitioned by column
- Support dry runs

# Use Cases

Base configurations:
```yaml
# ./job.yaml
...
task:
  name: bq2bq
  config:
    LOAD_METHOD: REPLACE
    SQL_TYPE: STANDARD
    PROJECT: project
    DATASET: dataset
    TABLE: destination
    BQ_SERVICE_ACCOUNT: bq_secret_here
  ...
...
```

```sql
-- ./assets/query.sql
select field1, field2 from `project.dataset.source`
```

## To install plugins via shell
## Transforming data and storing it in the destination BQ table

Use the base configuration above to extract data from the `project.dataset.source` table. The query in `./assets/query.sql` selects the records to be loaded into the `project.dataset.destination` table, which is configurable through `PROJECT`, `DATASET`, and `TABLE`. The schema of the destination table should match the schema of the query result. `BQ_SERVICE_ACCOUNT` is mandatory: it provides the credentials used to access the BQ API and execute the query.

## Loading the queried records into the destination BQ table by append / replace / merge

How the query result is loaded into the destination table depends on the `LOAD_METHOD` configuration. [More about load methods](https://github.com/goto/transformers/tree/main/task/bq2bq)
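
As an illustration, the fragment below is a minimal sketch of the base configuration with only `LOAD_METHOD` changed from `REPLACE` to `APPEND`. It assumes the other keys stay as in the base configuration; the exact behavior of each method is described in the linked documentation rather than here.

```yaml
# ./job.yaml (sketch: only LOAD_METHOD differs from the base configuration)
task:
  name: bq2bq
  config:
    LOAD_METHOD: APPEND   # one of the methods listed under Capabilities: APPEND, REPLACE, MERGE
    SQL_TYPE: STANDARD
    PROJECT: project
    DATASET: dataset
    TABLE: destination
    BQ_SERVICE_ACCOUNT: bq_secret_here
```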

## Extracting data through configurable BQ EXECUTION_PROJECT

```shell
curl -sL ${PLUGIN_RELEASE_URL} | tar xvz
chmod +x optimus-*
mv optimus-* /usr/bin/
```
`EXECUTION_PROJECT` is an additional configuration that makes the job execute the query through a non-default project. It is useful for customizing the allocation of BQ slots. For example, when a job requires a lot of resources, it is better to delegate its execution to a separate, dedicated project.
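
A possible configuration might look like the sketch below. It assumes that `EXECUTION_PROJECT` is set in the same `config` block as the other keys of the base configuration; the project name `dedicated-project` is a hypothetical placeholder, not a value from this repository.

```yaml
# ./job.yaml (sketch: EXECUTION_PROJECT placement under task.config is an assumption)
task:
  name: bq2bq
  config:
    LOAD_METHOD: REPLACE
    SQL_TYPE: STANDARD
    PROJECT: project
    DATASET: dataset
    TABLE: destination
    BQ_SERVICE_ACCOUNT: bq_secret_here
    EXECUTION_PROJECT: dedicated-project   # hypothetical project that runs the query
```

With this sketch, the destination table is still `project.dataset.destination`; only the project used to execute the query changes.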
