cloudera-labs/terraform-cdp-modules

Terraform Modules for CDP Prerequisites

This repository contains Terraform modules for creating the prerequisite cloud resources on AWS, Azure and GCP, and for deploying Cloudera Data Platform (CDP) Public Cloud.

Modules

Module name | Description
--- | ---
terraform-cdp-aws-pre-reqs | For all AWS prerequisite cloud resources.
terraform-cdp-azure-pre-reqs | For all Azure prerequisite cloud resources.
terraform-cdp-gcp-pre-reqs | For all GCP prerequisite cloud resources.
terraform-cdp-deploy | For deployment of CDP on AWS, Azure or GCP.
terraform-aws-cred-permissions | Creates the Cross Account Credential prerequisite on AWS. Note that this module is called from the terraform-cdp-aws-pre-reqs module.
terraform-aws-permissions | Creates the AWS IAM permissions required by the CDP Public Cloud environment and Data Lake deployment. Note that this module is called from the terraform-cdp-aws-pre-reqs module.
terraform-aws-vpc | Creates VPC networking resources on AWS suitable for CDP; can be used to create the CDP VPC and subnets. Note that this module is called from the terraform-cdp-aws-pre-reqs module.
terraform-aws-tgw | Creates an AWS Transit Gateway (TGW) and attaches a specified list of VPCs to it. Can be used to deploy CDP Public Cloud in a fully private networking configuration, where a CDP VPC and a networking VPC are connected through the Transit Gateway.
terraform-aws-proxy | Creates and configures an EC2 Auto Scaling group for a highly available Squid proxy service, with a Network Load Balancer (NLB) that forwards traffic to the proxy instances. Can be used to deploy CDP Public Cloud in a fully private networking configuration where the CDP environment uses a proxy configuration through the NLB.
terraform-azure-nfs | Creates the Azure NFS file share required for Cloudera Machine Learning (CML) Public Cloud. Optionally also creates a virtual machine that can be used to mount the share and set the required ownership on the CML workspace's projects folder.
terraform-azure-cdw-permissions | Creates the Azure Kubernetes Service (AKS) managed identity required for the Cloudera Data Warehouse (CDW) service.
terraform-azure-storage-endpoints | Creates Azure private endpoints between specified storage accounts and VNet subnets.

Each module contains Terraform resource configuration and example variable definition files.

Usage

The cdp-tf-quickstarts repository demonstrates how to use the modules together to deploy CDP on different cloud environments.

Each module also has a set of examples to show different configuration options for that module.

Deployment

Create infrastructure

Note that the instructions below create the prerequisite resources and the CDP deployment together in a single run. Each module can also be used on its own to allow further customization.

  1. Clone this repository using the following commands:
git clone https://github.com/cloudera-labs/terraform-cdp-modules.git
cd terraform-cdp-modules
  2. To create the cloud prerequisite resources and the CDP deployment together, change to the examples directory of the terraform-cdp-deploy module and select the example for your cloud provider:
cd modules/terraform-cdp-deploy/examples/ex<deployment_type>/
  3. Create a terraform.tfvars file with the variable definitions needed to run the module. Use the terraform.tfvars.sample file in each example folder as a template for this file.

  4. Run the Terraform module for the chosen deployment type:

terraform init
terraform apply
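The init and apply steps can also be split so that the planned changes are reviewed before any resources are created. The following is a minimal sketch, assuming a POSIX-compatible shell such as Bash, run from the chosen example directory; the cdp_deploy function and the tfplan file name are hypothetical and not part of the modules.

```shell
# Hypothetical wrapper around the deployment steps (not shipped with the modules).
# Run it from inside the chosen examples/ex<deployment_type>/ directory.
cdp_deploy() {
  # The example reads its inputs from terraform.tfvars.
  if [ ! -f terraform.tfvars ]; then
    echo "terraform.tfvars not found; create it from terraform.tfvars.sample" >&2
    return 1
  fi
  # Install the providers and download the modules referenced by the example.
  terraform init || return 1
  # Write the proposed changes to a plan file and print them for review.
  terraform plan -out=tfplan || return 1
  printf 'Apply this plan? [y/N] ' >&2
  read -r answer
  if [ "$answer" != "y" ]; then
    echo "Aborted; nothing was applied" >&2
    return 1
  fi
  # Applying a saved plan runs exactly the reviewed changes, without a second prompt.
  terraform apply tfplan
}
```

Source the snippet and run cdp_deploy in the example directory. Saving the plan with -out means terraform apply tfplan executes the changes that were shown, rather than computing a fresh plan.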

Once the deployment completes, you can create CDP Data Hubs and Data Services from the CDP Management Console (https://cdp.cloudera.com/).

Clean up the infrastructure

If you no longer need the infrastructure provisioned by the Terraform module, run the following command from the same example directory to remove the deployment and terminate all of its resources.

terraform destroy
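Running terraform plan -destroy first shows which resources would be removed without changing anything; terraform destroy then asks for confirmation before deleting. A minimal sketch, assuming it is run from the same example directory as the original apply so that Terraform uses the same state; the cdp_destroy function name is hypothetical.

```shell
# Hypothetical teardown helper (not shipped with the modules).
# Run it from the example directory used for the original terraform apply.
cdp_destroy() {
  # Preview the resources that would be removed; nothing is changed here.
  terraform plan -destroy || return 1
  # terraform destroy prompts for confirmation before deleting anything.
  terraform destroy
}
```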

Dependencies

To set up CDP via deployment automation using this guide, the following dependencies must be installed in your local environment:

Configure Terraform Provider for AWS, Azure or GCP

Notes on Azure authentication

  • If you have more than one Azure subscription, the ID of the subscription to use can be passed with the ARM_SUBSCRIPTION_ID environment variable.

  • When using a Service Principal (SP) to authenticate with Azure, the azuread Terraform Provider (the provider used to create the Azure Cross Account AD Application) cannot authenticate using the az login --service-principal command. We found that the best way to authenticate with an SP is to set environment variables. The required environment variables are described in the azuread and azurerm provider docs and summarized below:

export ARM_CLIENT_ID="<sp_client_id>"
export ARM_CLIENT_SECRET="<sp_client_secret>"
export ARM_TENANT_ID="<sp_tenant_id>"
export ARM_SUBSCRIPTION_ID="<sp_subscription_id>" 
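Before running Terraform, it can help to confirm that all four variables are exported in the current shell. A minimal sketch, assuming Bash or a similar shell; the check_arm_env function name is hypothetical. It uses printenv, which only sees exported variables, so it reports what the Terraform providers will actually inherit.

```shell
# Hypothetical pre-flight check for Azure Service Principal authentication.
# printenv only reports exported variables, which is what Terraform inherits.
check_arm_env() {
  local missing=""
  for var in ARM_CLIENT_ID ARM_CLIENT_SECRET ARM_TENANT_ID ARM_SUBSCRIPTION_ID; do
    # Treat unset and empty values the same way.
    if [ -z "$(printenv "$var")" ]; then
      missing="$missing $var"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing or empty:$missing" >&2
    return 1
  fi
  echo "All ARM_* variables for Service Principal authentication are set"
}
```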

Notes on GCP authentication

As outlined in the Getting Started docs for the Google Terraform Provider, there are two recommended ways to authenticate with the GCP API:

  1. The Google Cloud SDK (gcloud) can be installed and user Application Default Credentials (ADC) created by running the command gcloud auth application-default login.

  2. A Google Cloud Service Account key file can be generated and downloaded. The GOOGLE_APPLICATION_CREDENTIALS environment variable can then be set to the location of the file.

export GOOGLE_APPLICATION_CREDENTIALS=<location_of_gcp_sa_json_file>

The Google project ID can be specified with the project argument in the Google provider configuration or with the GOOGLE_PROJECT environment variable, as described in the Google Provider Default Values Configuration documentation.
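The two authentication methods above can be checked before running Terraform. A minimal sketch, assuming a Linux or macOS shell where gcloud stores user ADC at its default path ~/.config/gcloud/application_default_credentials.json; the check_gcp_auth function name is hypothetical, and the Google provider supports further credential sources that this simplified check ignores.

```shell
# Hypothetical helper: report which of the two methods above is configured.
# Assumes the default gcloud ADC location on Linux/macOS.
check_gcp_auth() {
  local adc_file="${HOME}/.config/gcloud/application_default_credentials.json"
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    # Method 2: service account key file named by the environment variable.
    if [ -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
      echo "Using service account key: $GOOGLE_APPLICATION_CREDENTIALS"
    else
      echo "GOOGLE_APPLICATION_CREDENTIALS is not a readable file: $GOOGLE_APPLICATION_CREDENTIALS" >&2
      return 1
    fi
  elif [ -r "$adc_file" ]; then
    # Method 1: user ADC created by gcloud auth application-default login.
    echo "Using user ADC file: $adc_file"
  else
    echo "No credentials found; run gcloud auth application-default login or set GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
  # The project can still come from the provider's project argument instead.
  if [ -z "${GOOGLE_PROJECT:-}" ]; then
    echo "Note: GOOGLE_PROJECT is not set; set project in the provider configuration" >&2
  fi
  return 0
}
```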

Local Development Environment

See the DEVELOPMENT.md file for instructions on how to set up an environment for local development of modules.