feat: Add GenAI capabilities #272

Merged
merged 33 commits
Oct 23, 2023

Commits (33)
d24c99c
Add GenAI use case
shanecglass Oct 9, 2023
2cbeec8
Correct IAM block for BQ connection to Vertex
shanecglass Oct 9, 2023
85e4979
Typo correction
shanecglass Oct 9, 2023
6cb4038
Correcting typos
shanecglass Oct 9, 2023
fb612d4
Correcting development errors
shanecglass Oct 9, 2023
389ca04
Formatting
shanecglass Oct 9, 2023
66edef3
Reduce wait_after_apis time
shanecglass Oct 9, 2023
cc1f6ff
Update dependencies
shanecglass Oct 9, 2023
510e60c
Fixing issue of conflicting resource generation
shanecglass Oct 9, 2023
a18affe
Comment out GenAI query on routine creation
shanecglass Oct 11, 2023
ef7be60
Adding additional instructions
shanecglass Oct 11, 2023
01e0297
Merge branch 'scg-dev'
shanecglass Oct 11, 2023
3d94663
Testing BQ connection
shanecglass Oct 11, 2023
92c8414
Correcting connection reference
shanecglass Oct 11, 2023
13042dd
Correct BigQuery connection reference
shanecglass Oct 11, 2023
11ac934
Typo correction
shanecglass Oct 11, 2023
14ecbed
Merge pull request #10 from shanecglass/scg-dev
shanecglass Oct 11, 2023
758a4f5
Add BQML model creations to workflow
shanecglass Oct 11, 2023
293a0e7
Update workflow dataset reference
shanecglass Oct 11, 2023
472a2bf
Automating remote model creation
shanecglass Oct 11, 2023
6a180b4
Updating GenAI model parameters
shanecglass Oct 11, 2023
c4a9b57
Formatting
shanecglass Oct 13, 2023
4cc9814
Update architecture diagram
shanecglass Oct 18, 2023
7b90a2e
update architecture diagrams
shanecglass Oct 19, 2023
d5589d2
Merge branch 'master' of https://github.com/terraform-google-modules/…
shanecglass Oct 19, 2023
596c5e2
Updated architecture diagrams
shanecglass Oct 19, 2023
0e85b5b
Updated architecture diagram
shanecglass Oct 19, 2023
d09f553
Documentation updates
shanecglass Oct 20, 2023
2c14c62
Merge branch 'master' into scg-dev
shanecglass Oct 20, 2023
74f7cd2
Lint fix
shanecglass Oct 20, 2023
d3dfe91
Fixing lint issues
shanecglass Oct 23, 2023
d311ecf
Docs updates
shanecglass Oct 23, 2023
b20bd15
Merge branch 'master' into scg-dev
davenportjw Oct 23, 2023
27 changes: 17 additions & 10 deletions modules/data_warehouse/README.md
@@ -15,6 +15,7 @@ The resources/services/activations/deletions that this module will create/trigger
- Loads the Google Cloud Storage bucket with data from [TheLook eCommerce Public Dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-data/thelook-ecommerce)
- Provides SQL examples
- Creates and runs inference with a BigQuery ML model
- Creates a BigQuery ML remote model and uses generative AI to generate text
- Creates a Looker Studio report

### preDeploy
@@ -27,7 +28,7 @@ To deploy this blueprint you must have an active billing account and billing permissions.
## Usage

Functional examples are included in the
[examples](./examples/) directory.
[examples](../../examples/data_warehouse/README.md) directory.
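
For orientation, a minimal root-module invocation might look like the sketch below. The registry source address and every value shown are illustrative assumptions, not taken from this PR.

```hcl
# Sketch only: the source address and all values are assumptions for illustration.
module "data_warehouse" {
  source = "terraform-google-modules/bigquery/google//modules/data_warehouse"

  project_id                 = "my-project-id"       # hypothetical project
  region                     = "us-central1"         # hypothetical region
  text_generation_model_name = "text_generate_model" # default from the Inputs table
}
```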

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Inputs
@@ -40,6 +41,7 @@ Functional examples are included in the
| labels | A map of labels to apply to contained resources. | `map(string)` | <pre>{<br> "data-warehouse": true<br>}</pre> | no |
| project\_id | Google Cloud Project ID | `string` | n/a | yes |
| region | Google Cloud Region | `string` | n/a | yes |
| text\_generation\_model\_name | Name of the BigQuery ML GenAI remote model that connects to the LLM used for text generation | `string` | `"text_generate_model"` | no |

## Outputs

@@ -61,8 +63,8 @@ These sections describe requirements for using this module.

The following dependencies must be available:

- [Terraform][terraform] v0.13
- [Terraform Provider for GCP][terraform-provider-gcp] plugin v3.0
- [Terraform](https://github.com/hashicorp/terraform) v0.13
- [Terraform Provider for GCP](https://github.com/hashicorp/terraform-provider-google) plugin v3.0

### Service Account

@@ -76,22 +78,27 @@ the resources of this module:
- Pub/Sub Admin: `roles/pubsub.admin`
- Dataplex Admin: `roles/dataplex.admin`

The [Project Factory module][project-factory-module] and the
[IAM module][iam-module] may be used in combination to provision a
The [Project Factory module](./.terraform/modules/project-services/README.md) and the
[IAM module](https://github.com/terraform-google-modules/terraform-google-iam) may be used in combination to provision a
service account with the necessary roles applied.

### APIs

A project with the following APIs enabled must be used to host the
resources of this module:

- Vertex AI API: `aiplatform.googleapis.com`
- BigQuery API: `bigquery.googleapis.com`
- BigQuery Connection API: `bigqueryconnection.googleapis.com`
- BigQuery Data Policy API: `bigquerydatapolicy.googleapis.com`
- BigQuery Data Transfer Service API: `bigquerydatatransfer.googleapis.com`
- BigQuery Migration API: `bigquerymigration.googleapis.com`
- BigQuery Storage API: `bigquerystorage.googleapis.com`
- BigQuery Connection API: `bigqueryconnection.googleapis.com`
- BigQuery Reservations API: `bigqueryreservation.googleapis.com`
- BigQuery Data Transfer Service API: `bigquerydatatransfer.googleapis.com`
- BigQuery Storage API: `bigquerystorage.googleapis.com`
- Google Cloud APIs: `cloudapis.googleapis.com`
- Cloud Build API: `cloudbuild.googleapis.com`
- Compute Engine API: `compute.googleapis.com`
- Infrastructure Manager API: `config.googleapis.com`
- Data Catalog API: `datacatalog.googleapis.com`
- Data Lineage API: `datalineage.googleapis.com`
- Eventarc API: `eventarc.googleapis.com`
@@ -101,10 +108,10 @@ resources of this module:
- Google Cloud Storage JSON API: `storage-api.googleapis.com`
- Google Cloud Workflows API: `workflows.googleapis.com`

The [Project Factory module][project-factory-module] can be used to
The [Project Factory module](./.terraform/modules/project-services/README.md) can be used to
provision a project with the necessary APIs enabled.


## Security Disclosures

Please see our [security disclosure process](./SECURITY.md).
Please see our [security disclosure process](../../SECURITY.md).


Binary file modified modules/data_warehouse/assets/data-warehouse-architecture.png
2,763 changes: 2,739 additions & 24 deletions modules/data_warehouse/assets/data-warehouse-architecture.svg
145 changes: 111 additions & 34 deletions modules/data_warehouse/bigquery.tf
@@ -28,7 +28,7 @@ resource "google_bigquery_dataset" "ds_edw" {
depends_on = [time_sleep.wait_after_apis]
}

# # Create a BigQuery connection
# # Create a BigQuery connection for Cloud Storage to create BigLake tables
resource "google_bigquery_connection" "ds_connection" {
project = module.project-services.project_id
connection_id = "ds_connection"
@@ -47,6 +47,29 @@ resource "google_storage_bucket_iam_binding" "bq_connection_iam_object_viewer" {
]
}

# # Create a BigQuery connection for Vertex AI to support generative AI use cases
resource "google_bigquery_connection" "vertex_ai_connection" {
project = module.project-services.project_id
connection_id = "genai_connection"
location = var.region
friendly_name = "BigQuery ML Connection"
cloud_resource {}
depends_on = [time_sleep.wait_after_apis]
}

# # Grant IAM access to the BigQuery Connection account for Vertex AI
resource "google_project_iam_member" "bq_connection_iam_vertex_ai" {
for_each = toset([
"roles/aiplatform.user",
"roles/bigquery.connectionUser",
"roles/serviceusage.serviceUsageConsumer",
]
)
project = module.project-services.project_id
role = each.key
member = "serviceAccount:${google_bigquery_connection.vertex_ai_connection.cloud_resource[0].service_account_id}"
}
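
A design note on the block above: google_project_iam_member grants are additive (non-authoritative), so each listed role is granted to the connection's service account without disturbing any other members already bound to those roles.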

# # Create a BigLake table for events with metadata caching
resource "google_bigquery_table" "tbl_edw_events" {
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
@@ -164,22 +187,30 @@ resource "google_bigquery_table" "tbl_edw_users" {
# Load Queries for Stored Procedure Execution
# # Load Distribution Center Lookup Data Tables
resource "google_bigquery_routine" "sp_provision_lookup_tables" {
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_provision_lookup_tables"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_provision_lookup_tables.sql", { project_id = module.project-services.project_id, dataset_id = google_bigquery_dataset.ds_edw.dataset_id })
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_provision_lookup_tables"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_provision_lookup_tables.sql", {
project_id = module.project-services.project_id,
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
}
)
}

# Add Looker Studio Data Report Procedure
resource "google_bigquery_routine" "sproc_sp_demo_lookerstudio_report" {
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_lookerstudio_report"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_lookerstudio_report.sql", { project_id = module.project-services.project_id, dataset_id = google_bigquery_dataset.ds_edw.dataset_id })
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_lookerstudio_report"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_lookerstudio_report.sql", {
project_id = module.project-services.project_id,
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
}
)

depends_on = [
google_bigquery_table.tbl_edw_inventory_items,
@@ -190,12 +221,16 @@ resource "google_bigquery_routine" "sproc_sp_demo_lookerstudio_report" {

# # Add Sample Queries
resource "google_bigquery_routine" "sp_sample_queries" {
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_sample_queries"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_sample_queries.sql", { project_id = module.project-services.project_id, dataset_id = google_bigquery_dataset.ds_edw.dataset_id })
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_sample_queries"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_sample_queries.sql", {
project_id = module.project-services.project_id,
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
}
)

depends_on = [
google_bigquery_table.tbl_edw_inventory_items,
@@ -204,29 +239,71 @@ resource "google_bigquery_routine" "sp_sample_queries" {
}


# Add BigQuery ML model
# Add BigQuery ML model for clustering
resource "google_bigquery_routine" "sp_bigqueryml_model" {
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_bigqueryml_model"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_bigqueryml_model.sql", { project_id = module.project-services.project_id, dataset_id = google_bigquery_dataset.ds_edw.dataset_id })

project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_bigqueryml_model"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_bigqueryml_model.sql", {
project_id = module.project-services.project_id,
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
}
)
depends_on = [
google_bigquery_table.tbl_edw_order_items,
]
}
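
The SQL template behind this routine is not part of the diff. As context, a minimal sketch of what a clustering model definition could render to, assuming hypothetical project, table, and column names:

```hcl
# Sketch only: assumed shape of the SQL that sp_bigqueryml_model.sql might
# render to; every identifier below is hypothetical.
locals {
  sp_bigqueryml_model_sketch = <<-SQL
    CREATE OR REPLACE MODEL `my-project.ds_edw.customer_segment_clustering`
      OPTIONS (model_type = 'kmeans', num_clusters = 4) AS
    SELECT
      SUM(sale_price) AS total_spend,
      COUNT(order_id) AS order_count
    FROM `my-project.ds_edw.order_items`
    GROUP BY user_id;  -- group per user without feeding the id in as a feature
  SQL
}
```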

# Create a BigQuery ML remote model for text generation
resource "google_bigquery_routine" "sp_bigqueryml_generate_create" {
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_bigqueryml_generate_create"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_bigqueryml_generate_create.sql", {
project_id = module.project-services.project_id,
dataset_id = google_bigquery_dataset.ds_edw.dataset_id,
connection_id = google_bigquery_connection.vertex_ai_connection.id,
model_name = var.text_generation_model_name,
region = var.region
}
)
}
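
The rendered template is likewise absent from the diff. Given the parameters passed above (connection_id, model_name, region), a plausible sketch of the remote-model DDL — the endpoint name is a hypothetical stand-in:

```hcl
# Sketch only: assumed rendering of sp_bigqueryml_generate_create.sql.
# The connection reference follows BigQuery's `project.location.connection_id`
# form; 'text-bison' is an assumed endpoint, not confirmed by this PR.
locals {
  generate_create_sketch = <<-SQL
    CREATE OR REPLACE MODEL `my-project.ds_edw.text_generate_model`
      REMOTE WITH CONNECTION `my-project.us-central1.genai_connection`
      OPTIONS (ENDPOINT = 'text-bison');
  SQL
}
```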

# Query the BigQuery ML remote model to describe customer clusters
resource "google_bigquery_routine" "sp_bigqueryml_generate_describe" {
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_bigqueryml_generate_describe"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_bigqueryml_generate_describe.sql", {
project_id = module.project-services.project_id,
dataset_id = google_bigquery_dataset.ds_edw.dataset_id,
model_name = var.text_generation_model_name
}
)

depends_on = [
google_bigquery_routine.sp_bigqueryml_generate_create
]
}
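
The describe template's SQL is also not shown. A hedged sketch of how a procedure might call the remote model through ML.GENERATE_TEXT, with a hypothetical prompt source and illustrative sampling parameters:

```hcl
# Sketch only: assumed rendering of sp_bigqueryml_generate_describe.sql.
# The source table and prompt are hypothetical; ml_generate_text_result is the
# JSON output column that ML.GENERATE_TEXT returns.
locals {
  generate_describe_sketch = <<-SQL
    SELECT ml_generate_text_result
    FROM ML.GENERATE_TEXT(
      MODEL `my-project.ds_edw.text_generate_model`,
      (SELECT CONCAT('Describe this customer segment: ', segment_summary) AS prompt
         FROM `my-project.ds_edw.customer_segments`),
      STRUCT(0.2 AS temperature, 256 AS max_output_tokens));
  SQL
}
```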

# # Add Translation Scripts
resource "google_bigquery_routine" "sp_sample_translation_queries" {
project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_sample_translation_queries"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_sample_translation_queries.sql", { project_id = module.project-services.project_id, dataset_id = google_bigquery_dataset.ds_edw.dataset_id })

project = module.project-services.project_id
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
routine_id = "sp_sample_translation_queries"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = templatefile("${path.module}/src/sql/sp_sample_translation_queries.sql", {
project_id = module.project-services.project_id,
dataset_id = google_bigquery_dataset.ds_edw.dataset_id
}
)
depends_on = [
google_bigquery_table.tbl_edw_inventory_items,
]
3 changes: 2 additions & 1 deletion modules/data_warehouse/main.tf
@@ -23,6 +23,7 @@ module "project-services" {
enable_apis = var.enable_apis

activate_apis = [
"aiplatform.googleapis.com",
"bigquery.googleapis.com",
"bigqueryconnection.googleapis.com",
"bigquerydatatransfer.googleapis.com",
@@ -61,7 +62,7 @@ }
}

resource "time_sleep" "wait_after_apis" {
create_duration = "120s"
create_duration = "90s"
depends_on = [module.project-services]
}
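
A note on the change above: the sleep gives newly enabled APIs time to propagate before dependent resources are created, and this PR trims the wait from 120s to 90s (per the "Reduce wait_after_apis time" commit).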

3 changes: 3 additions & 0 deletions modules/data_warehouse/metadata.display.yaml
@@ -46,3 +46,6 @@ spec:
region:
name: region
title: Region
text_generation_model_name:
name: text_generation_model_name
title: Text Generation Model Name