Terraform module to create AWS Cloud infrastructure resources required to run DataRobot.
module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/aws"

  name        = "datarobot"
  domain_name = "yourdomain.com"

  create_network        = true
  network_address_space = "10.7.0.0/16"

  create_dns_zones                = false
  existing_public_route53_zone_id = "Z06110132R7HO9BLI64XY"

  create_acm_certificate       = false
  existing_acm_certificate_arn = "arn:aws:acm:us-east-1:000000000000:certificate/00000000-0000-0000-0000-000000000000"

  create_encryption_key     = true
  create_storage            = true
  create_container_registry = true
  create_kubernetes_cluster = true
  create_app_identity       = true

  cluster_autoscaler           = true
  descheduler                  = true
  ebs_csi_driver               = true
  aws_load_balancer_controller = true
  ingress_nginx                = true
  internet_facing_ingress_lb   = true
  cert_manager                 = true
  external_dns                 = true
  nvidia_device_plugin         = true
  metrics_server               = true

  tags = {
    application = "datarobot"
    environment = "dev"
    managed-by  = "terraform"
  }
}
- Complete - Demonstrates all input variables
- Partial - Demonstrates the use of existing resources
- Minimal - Demonstrates the minimum set of input variables needed to deploy all infrastructure
- Clone the repo
git clone https://github.com/datarobot-oss/terraform-aws-dr-infra.git
- Change directories into the example that best suits your needs
cd terraform-aws-dr-infra/examples/minimal
- Modify main.tf as needed with any changes to the input variables passed to the datarobot_infra module
- Run terraform commands
terraform init
terraform plan
terraform apply
terraform destroy
- create_network to create a new VPC
- existing_vpc_id to use an existing VPC
Uses the terraform-aws-vpc module to create a new VPC with one public and private subnet per Availability Zone, a NAT gateway with an Elastic IP, and an Internet Gateway.
An interface VPC endpoint for the S3 service is created by default. More can be specified by updating the network_private_endpoints input variable.
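For the existing-VPC path, a module call might look like the sketch below. The VPC and subnet IDs are placeholders, and pairing existing_vpc_id with existing_kubernetes_nodes_subnet_id is inferred from the input descriptions in this document rather than taken from a tested deployment.

```hcl
module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/aws"

  name        = "datarobot"
  domain_name = "yourdomain.com"

  # Placeholder IDs (assumptions): replace with values from your own account.
  # When existing_vpc_id is set, the other network variables are ignored.
  existing_vpc_id = "vpc-0123456789abcdef0"

  # Subnets for the EKS nodes inside that VPC (placeholder IDs).
  existing_kubernetes_nodes_subnet_id = [
    "subnet-0123456789abcdef0",
    "subnet-0fedcba9876543210",
  ]
}
```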
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowVPCActions",
"Effect": "Allow",
"Action": [
"ec2:DescribeAvailabilityZones",
"ec2:CreateVpc",
"ec2:DescribeVpcs",
"ec2:DescribeVpcAttribute",
"ec2:ModifyVpcAttribute",
"ec2:DeleteVpc",
"ec2:CreateSubnet",
"ec2:DescribeSubnets",
"ec2:DeleteSubnet",
"ec2:CreateRouteTable",
"ec2:DescribeRouteTables",
"ec2:AssociateRouteTable",
"ec2:DisassociateRouteTable",
"ec2:DeleteRouteTable",
"ec2:CreateRoute",
"ec2:DeleteRoute",
"ec2:CreateInternetGateway",
"ec2:DescribeInternetGateways",
"ec2:AttachInternetGateway",
"ec2:DetachInternetGateway",
"ec2:DeleteInternetGateway",
"ec2:CreateNatGateway",
"ec2:DescribeNatGateways",
"ec2:DeleteNatGateway",
"ec2:AllocateAddress",
"ec2:DescribeAddresses",
"ec2:DescribeAddressesAttribute",
"ec2:DisassociateAddress",
"ec2:ReleaseAddress",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSecurityGroupRules",
"ec2:RevokeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress",
"ec2:CreateNetworkAclEntry",
"ec2:DescribeNetworkAcls",
"ec2:DeleteNetworkAclEntry",
"ec2:DescribeNetworkInterfaces",
"ec2:CreateTags"
],
"Resource": "*"
}
]
}
- create_dns_zones to create new Route53 zones
- existing_public_route53_zone_id / existing_private_route53_zone_id to use an existing Route53 zone
Uses the terraform-aws-route53 module to create new public and/or private Route53 hosted zones with the name domain_name.
A public Route53 zone is used by external_dns to create records for the DataRobot ingress resources when internet_facing_ingress_lb is true. It is also used for DNS validation when creating a new ACM certificate.
A private Route53 zone is used by external_dns to create records for the DataRobot ingress resources when internet_facing_ingress_lb is false.
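To make the public/private distinction concrete, here is a minimal sketch of an internal-only deployment that reuses an existing private zone. The zone ID is a placeholder, and the certificate ARN is the dummy value from the usage example at the top of this document; neither refers to a real resource.

```hcl
module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/aws"

  name        = "datarobot"
  domain_name = "yourdomain.com"

  # Reuse an existing private hosted zone instead of creating new zones.
  create_dns_zones                 = false
  existing_private_route53_zone_id = "Z0000000000000000000" # placeholder (assumption)

  # With an internal NLB, external_dns writes its records to the private zone.
  ingress_nginx              = true
  external_dns               = true
  internet_facing_ingress_lb = false

  # No public zone is supplied, so an existing certificate is used
  # rather than a new DNS-validated ACM certificate.
  create_acm_certificate       = false
  existing_acm_certificate_arn = "arn:aws:acm:us-east-1:000000000000:certificate/00000000-0000-0000-0000-000000000000"
}
```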
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowRoute53Actions",
"Effect": "Allow",
"Action": [
"route53:CreateHostedZone",
"route53:GetHostedZone",
"route53:DeleteHostedZone",
"route53:ListResourceRecordSets",
"route53:GetChange",
"route53:GetDNSSEC",
"route53:ListTagsForResource",
"route53:ChangeTagsForResource"
],
"Resource": "*"
}
]
}
- create_acm_certificate to create a new ACM certificate
- existing_acm_certificate_arn to use an existing ACM certificate
Uses the terraform-aws-acm module to create a new ACM certificate with SANs of domain_name and *.domain_name. Validation is performed against either an existing Route53 hosted zone ID specified in the existing_public_route53_zone_id input variable or the public zone created by the dns module.
This certificate will be used on the NLB deployed by the ingress-nginx helm chart.
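As an illustration of the first validation path, the sketch below requests a new certificate and validates it against an existing public zone. The zone ID is the sample value from the usage example earlier in this document and should be replaced with your own.

```hcl
module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/aws"

  name        = "datarobot"
  domain_name = "yourdomain.com" # certificate SANs: yourdomain.com and *.yourdomain.com

  # Validate against an existing public hosted zone rather than a new one.
  create_dns_zones                = false
  existing_public_route53_zone_id = "Z06110132R7HO9BLI64XY" # sample ID from the usage example

  # Request a new ACM certificate for the ingress NLB.
  create_acm_certificate = true
}
```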
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowACMActions",
"Effect": "Allow",
"Action": [
"acm:RequestCertificate",
"acm:DescribeCertificate",
"acm:DeleteCertificate",
"acm:AddTagsToCertificate",
"acm:ListTagsForCertificate",
"route53:ChangeResourceRecordSets"
],
"Resource": "*"
}
]
}
- create_encryption_key to create a new KMS key
- existing_kms_key_arn to use an existing KMS key
Uses the terraform-aws-kms module to create a new KMS encryption key with the current caller identity as a key administrator and the autoscaling service role (autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling). The key is used to encrypt EBS volumes in the EKS cluster.
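If a key already exists, the toggle above can be replaced by a key ARN, roughly as follows. The ARN is a placeholder. Whether an existing key's policy must grant the autoscaling service role comparable access is an assumption worth verifying before relying on this.

```hcl
module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/aws"

  name        = "datarobot"
  domain_name = "yourdomain.com"

  # Placeholder ARN (assumption). When this is set, create_encryption_key is ignored.
  existing_kms_key_arn = "arn:aws:kms:us-east-1:000000000000:key/00000000-0000-0000-0000-000000000000"
}
```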
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowKMSActions",
"Effect": "Allow",
"Action": [
"kms:TagResource",
"kms:CreateKey",
"kms:CreateAlias",
"kms:ListAliases",
"kms:DeleteAlias"
],
"Resource": "*"
}
]
}
- create_storage to create a new S3 bucket
- existing_s3_bucket_id to use an existing S3 bucket
Uses the terraform-aws-s3 module to create a new S3 storage bucket.
The DataRobot application will use this storage bucket for persistent file storage.
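A minimal sketch of pointing the module at an existing bucket instead of creating one. The bucket name is hypothetical.

```hcl
module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/aws"

  name        = "datarobot"
  domain_name = "yourdomain.com"

  # Hypothetical bucket name. When this is set, the other storage variables are ignored.
  existing_s3_bucket_id = "my-datarobot-file-storage"
}
```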
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowS3Actions",
"Effect": "Allow",
"Action": [
"s3:CreateBucket",
"s3:ListBucket",
"s3:ListBucketVersions",
"s3:GetBucketPolicy",
"s3:GetBucketAcl",
"s3:GetBucketCORS",
"s3:GetBucketWebsite",
"s3:GetBucketVersioning",
"s3:GetBucketLogging",
"s3:GetBucketRequestPayment",
"s3:GetBucketTagging",
"s3:PutBucketTagging",
"s3:GetBucketPublicAccessBlock",
"s3:PutBucketPublicAccessBlock",
"s3:GetBucketObjectLockConfiguration",
"s3:GetAccelerateConfiguration",
"s3:GetLifecycleConfiguration",
"s3:GetReplicationConfiguration",
"s3:GetEncryptionConfiguration",
"s3:DeleteObjectVersion",
"s3:DeleteBucket"
],
"Resource": "*"
}
]
}
- create_container_registry to create a new Amazon Elastic Container Registry
Uses the terraform-aws-ecr module to create new ECR repositories used by the DataRobot application to host custom images created by various services.
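The repository set can be overridden through ecr_repositories. The sketch below is illustrative only: the repository names are invented for the example and are not the module defaults, which are not shown in this document.

```hcl
module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/aws"

  name        = "datarobot"
  domain_name = "yourdomain.com"

  create_container_registry = true

  # Hypothetical repository names (assumptions), not the module defaults.
  ecr_repositories = ["example-base-image", "example-custom-apps"]

  # Convenient for short-lived dev environments; leave false to protect images.
  ecr_repositories_force_destroy = true
}
```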
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowECRActions",
"Effect": "Allow",
"Action": [
"ecr:CreateRepository",
"ecr:DescribeRepositories",
"ecr:DeleteRepository",
"ecr:TagResource",
"ecr:ListTagsForResource"
],
"Resource": "*"
}
]
}
- create_kubernetes_cluster to create a new Amazon Elastic Kubernetes Service cluster
- existing_eks_cluster_name to use an existing EKS cluster
Uses the terraform-aws-eks module to create a new EKS cluster to host the DataRobot application and any other helm charts installed by this module.
Included EKS addons:
- coredns
- eks-pod-identity-agent
- kube-proxy
- vpc-cni
An access entry for the identity of the cluster creator is added as a cluster admin. More access entries can be created via the kubernetes_cluster_access_entries variable.
Network access to the cluster's public API endpoint (via the public internet) is enabled by default. This access can be restricted to a specific set of public IP addresses using the kubernetes_cluster_endpoint_public_access_cidrs variable or disabled completely by setting the kubernetes_cluster_endpoint_public_access variable to false.
Network access to the cluster's private API endpoint is only allowed for the Kubernetes nodes by default. If the private API endpoint needs to be accessed from other hosts (such as a provisioner or bastion within the same VPC), the IP address of that host must be specified as a CIDR block in the kubernetes_cluster_endpoint_private_access_cidrs variable.
Two node groups are created:
- A primary node group intended to host the majority of the DataRobot pods
- A gpu node group intended to host GPU workload pods
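The endpoint and node group settings described above could be combined along these lines. All CIDR blocks and sizes are illustrative assumptions: 203.0.113.0/24 is a documentation-only range standing in for an office network, and 10.7.0.10/32 stands in for a bastion host inside the VPC.

```hcl
module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/aws"

  name                  = "datarobot"
  domain_name           = "yourdomain.com"
  network_address_space = "10.7.0.0/16"

  create_kubernetes_cluster = true

  # Keep the public API endpoint but limit who can reach it (example range).
  kubernetes_cluster_endpoint_public_access       = true
  kubernetes_cluster_endpoint_public_access_cidrs = ["203.0.113.0/24"]

  # Allow one extra host in the VPC to reach the private endpoint (example address).
  kubernetes_cluster_endpoint_private_access_cidrs = ["10.7.0.10/32"]

  # Example sizing for the primary node group.
  kubernetes_primary_nodegroup_min_size     = 1
  kubernetes_primary_nodegroup_desired_size = 3
  kubernetes_primary_nodegroup_max_size     = 10
}
```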
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowEKSActions",
"Effect": "Allow",
"Action": [
"ec2:CreateSecurityGroup",
"ec2:DeleteSecurityGroup",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:CreateLaunchTemplate",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions",
"ec2:DeleteLaunchTemplate",
"ec2:RunInstances",
"ec2:DescribeTags",
"ec2:DeleteTags",
"eks:CreateCluster",
"eks:DescribeCluster",
"eks:DeleteCluster",
"eks:CreateAccessEntry",
"eks:DescribeAccessEntry",
"eks:DeleteAccessEntry",
"eks:CreateNodegroup",
"eks:DescribeNodegroup",
"eks:DeleteNodegroup",
"eks:AssociateAccessPolicy",
"eks:ListAssociatedAccessPolicies",
"eks:DisassociateAccessPolicy",
"eks:CreateAddon",
"eks:DescribeAddon",
"eks:DescribeAddonVersions",
"eks:DeleteAddon",
"eks:TagResource",
"iam:CreateRole",
"iam:GetRole",
"iam:GetRolePolicy",
"iam:TagRole",
"iam:PassRole",
"iam:DeleteRole",
"iam:CreatePolicy",
"iam:GetPolicy",
"iam:TagPolicy",
"iam:GetPolicyVersion",
"iam:ListPolicyVersions",
"iam:DeletePolicy",
"iam:AttachRolePolicy",
"iam:ListRolePolicies",
"iam:ListAttachedRolePolicies",
"iam:PutRolePolicy",
"iam:DetachRolePolicy",
"iam:DeleteRolePolicy",
"iam:ListInstanceProfilesForRole",
"iam:CreateOpenIDConnectProvider",
"iam:GetOpenIDConnectProvider",
"iam:TagOpenIDConnectProvider",
"iam:DeleteOpenIDConnectProvider",
"logs:CreateLogGroup",
"logs:DescribeLogGroups",
"logs:DeleteLogGroup",
"logs:PutRetentionPolicy",
"logs:TagResource",
"logs:ListTagsForResource"
],
"Resource": "*"
}
]
}
- aws_load_balancer_controller to install the aws-load-balancer-controller helm chart
Uses the terraform-aws-eks-pod-identity module to create a pod identity for the aws-load-balancer-controller service account in the aws-load-balancer-controller namespace with an IAM policy that allows the management of AWS load balancers.
Uses the terraform-helm-release module to install the https://aws.github.io/eks-charts/aws-load-balancer-controller helm chart into the aws-load-balancer-controller namespace.
This helm chart provisions Network Load Balancers for Kubernetes Service resources. In the default use case, the AWS Load Balancer Controller will create an NLB directing traffic to the ingress-nginx Kubernetes services.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPodIdentityActions",
"Effect": "Allow",
"Action": [
"eks:CreatePodIdentityAssociation",
"eks:DescribePodIdentityAssociation",
"eks:DeletePodIdentityAssociation"
],
"Resource": "*"
}
]
}
- cluster_autoscaler to install the cluster-autoscaler helm chart
Uses the terraform-aws-eks-pod-identity module to create a pod identity for the cluster-autoscaler-aws-cluster-autoscaler service account in the cluster-autoscaler namespace with an IAM policy that allows the creation and management of EC2 instances.
Uses the terraform-helm-release module to install the cluster-autoscaler helm chart from the https://kubernetes.github.io/autoscaler helm repo into the cluster-autoscaler namespace.
This helm chart allows for automatic horizontal scaling of EKS cluster nodes.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPodIdentityActions",
"Effect": "Allow",
"Action": [
"eks:CreatePodIdentityAssociation",
"eks:DescribePodIdentityAssociation",
"eks:DeletePodIdentityAssociation"
],
"Resource": "*"
}
]
}
- descheduler to install the descheduler helm chart
Uses the terraform-helm-release module to install the descheduler helm chart from the https://kubernetes-sigs.github.io/descheduler/ helm repo into the descheduler namespace.
This helm chart allows for automatic rescheduling of pods for optimizing resource consumption.
No additional IAM permissions are required.
- ebs_csi_driver to install the aws-ebs-csi-driver helm chart
Uses the terraform-aws-eks-pod-identity module to create a pod identity for the ebs-csi-controller-sa service account in the aws-ebs-csi-driver namespace with an IAM policy that allows the creation and management of EBS volumes.
Uses the terraform-helm-release module to install the aws-ebs-csi-driver helm chart from the https://kubernetes-sigs.github.io/aws-ebs-csi-driver/ repo into the aws-ebs-csi-driver namespace.
This helm chart creates default Delete and Retain storage classes called ebs-standard and ebs-standard-retain, respectively, of type gp3 using the encryption key passed in from the existing_kms_key_arn variable or the KMS key created in the encryption_key module. These storage classes are used by the DataRobot application Persistent Volume Claims.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPodIdentityActions",
"Effect": "Allow",
"Action": [
"eks:CreatePodIdentityAssociation",
"eks:DescribePodIdentityAssociation",
"eks:DeletePodIdentityAssociation"
],
"Resource": "*"
}
]
}
- ingress_nginx to install the ingress-nginx helm chart
Uses the terraform-helm-release module to install the ingress-nginx helm chart from the https://kubernetes.github.io/ingress-nginx repo into the ingress-nginx namespace.
The ingress-nginx helm chart will trigger the deployment of an AWS Network Load Balancer to act as ingress for the DataRobot application. When internet_facing_ingress_lb is true, the NLB will be of type internet-facing. When internet_facing_ingress_lb is false, the NLB will be of type internal.
By default this NLB will terminate TLS using either the certificate specified with the existing_acm_certificate_arn variable or the certificate created in the ACM module if create_acm_certificate is true. It is possible not to use ACM at all by setting create_acm_certificate to false and overriding the controller.service.targetPorts.https setting as demonstrated in the complete example.
No additional IAM permissions are required.
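Chart settings such as controller.service.targetPorts.https are overridden through the ingress_nginx_values and ingress_nginx_variables inputs. The sketch below shows the general shape only: the template path, the variable name https_target_port, and the template contents in the comments are hypothetical, so refer to the complete example for the values the module authors actually use.

```hcl
module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/aws"

  name        = "datarobot"
  domain_name = "yourdomain.com"

  # Skip ACM entirely; TLS is then handled according to the chart overrides below.
  create_acm_certificate = false

  ingress_nginx = true

  # Hypothetical path to a values template kept next to this configuration.
  # The file might contain YAML such as:
  #   controller:
  #     service:
  #       targetPorts:
  #         https: ${https_target_port}
  ingress_nginx_values = "${path.module}/templates/ingress_nginx.tftpl"

  # Hypothetical variable consumed by the template above.
  ingress_nginx_variables = {
    https_target_port = "https"
  }
}
```

The other helm charts expose the same pair of inputs (for example cert_manager_values and cert_manager_variables), so the same pattern should apply to them.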
- cert_manager to install the cert-manager helm chart
Uses the terraform-aws-eks-pod-identity module to create a pod identity for the cert-manager service account in the cert-manager namespace with an IAM policy that allows the creation of DNS resources within the specified DNS zone.
Uses the terraform-helm-release module to install the cert-manager helm chart from the https://charts.jetstack.io repo into the cert-manager namespace.
cert-manager can be used by the DataRobot application to create and manage various certificates. When an ACM certificate is used in the ingress load balancer, cert-manager is typically just used to generate self-signed certificates that can be used for service-to-service communications.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPodIdentityActions",
"Effect": "Allow",
"Action": [
"eks:CreatePodIdentityAssociation",
"eks:DescribePodIdentityAssociation",
"eks:DeletePodIdentityAssociation"
],
"Resource": "*"
}
]
}
- external_dns to install the external-dns helm chart
Uses the terraform-aws-eks-pod-identity module to create a pod identity for the external-dns service account in the external-dns namespace with an IAM policy that allows the creation of DNS resources within the specified DNS zone.
Uses the terraform-helm-release module to install the external-dns helm chart from the https://charts.bitnami.com/bitnami repo into the external-dns namespace.
external-dns is used to automatically create DNS records for ingress resources in the Kubernetes cluster. When the DataRobot application is installed and the ingress resources are created, external-dns will automatically create a DNS record pointing at the ingress resource.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPodIdentityActions",
"Effect": "Allow",
"Action": [
"eks:CreatePodIdentityAssociation",
"eks:DescribePodIdentityAssociation",
"eks:DeletePodIdentityAssociation"
],
"Resource": "*"
}
]
}
- nvidia_device_plugin to install the nvidia-device-plugin helm chart
Uses the terraform-helm-release module to install the nvidia-device-plugin helm chart from the https://nvidia.github.io/k8s-device-plugin repo into the nvidia-device-plugin namespace.
This helm chart is used to expose GPU resources on nodes intended for GPU workloads such as the default gpu node group.
No additional IAM permissions are required.
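A GPU-enabled configuration might pair the device plugin with the gpu node group inputs, as in this sketch. The instance type and sizes are example choices, not recommendations from the module authors.

```hcl
module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/aws"

  name        = "datarobot"
  domain_name = "yourdomain.com"

  # Expose GPUs on the nodes to the Kubernetes scheduler.
  nvidia_device_plugin = true

  # Example GPU node group settings (assumptions): pick types and sizes for your workload.
  kubernetes_gpu_nodegroup_instance_types = ["g4dn.2xlarge"]
  kubernetes_gpu_nodegroup_min_size       = 0
  kubernetes_gpu_nodegroup_desired_size   = 1
  kubernetes_gpu_nodegroup_max_size       = 4
}
```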
- metrics_server to install the metrics-server helm chart
Uses the terraform-helm-release module to install the metrics-server helm chart from the https://kubernetes-sigs.github.io/metrics-server repo into the metrics-server namespace.
This helm chart is used to expose CPU and memory metrics to the Kubernetes cluster.
No additional IAM permissions are required.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowVPCActions",
"Effect": "Allow",
"Action": [
"ec2:DescribeAvailabilityZones",
"ec2:CreateVpc",
"ec2:DescribeVpcs",
"ec2:DescribeVpcAttribute",
"ec2:ModifyVpcAttribute",
"ec2:DeleteVpc",
"ec2:CreateSubnet",
"ec2:DescribeSubnets",
"ec2:DeleteSubnet",
"ec2:CreateRouteTable",
"ec2:DescribeRouteTables",
"ec2:AssociateRouteTable",
"ec2:DisassociateRouteTable",
"ec2:DeleteRouteTable",
"ec2:CreateRoute",
"ec2:DeleteRoute",
"ec2:CreateInternetGateway",
"ec2:DescribeInternetGateways",
"ec2:AttachInternetGateway",
"ec2:DetachInternetGateway",
"ec2:DeleteInternetGateway",
"ec2:CreateNatGateway",
"ec2:DescribeNatGateways",
"ec2:DeleteNatGateway",
"ec2:AllocateAddress",
"ec2:DescribeAddresses",
"ec2:DescribeAddressesAttribute",
"ec2:DisassociateAddress",
"ec2:ReleaseAddress",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSecurityGroupRules",
"ec2:RevokeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress",
"ec2:CreateNetworkAclEntry",
"ec2:DescribeNetworkAcls",
"ec2:DeleteNetworkAclEntry",
"ec2:DescribeNetworkInterfaces",
"ec2:CreateTags"
],
"Resource": "*"
},
{
"Sid": "AllowRoute53Actions",
"Effect": "Allow",
"Action": [
"route53:CreateHostedZone",
"route53:GetHostedZone",
"route53:DeleteHostedZone",
"route53:ListResourceRecordSets",
"route53:GetChange",
"route53:GetDNSSEC",
"route53:ListTagsForResource",
"route53:ChangeTagsForResource"
],
"Resource": "*"
},
{
"Sid": "AllowACMActions",
"Effect": "Allow",
"Action": [
"acm:RequestCertificate",
"acm:DescribeCertificate",
"acm:DeleteCertificate",
"acm:AddTagsToCertificate",
"acm:ListTagsForCertificate",
"route53:ChangeResourceRecordSets"
],
"Resource": "*"
},
{
"Sid": "AllowKMSActions",
"Effect": "Allow",
"Action": [
"kms:TagResource",
"kms:CreateKey",
"kms:CreateAlias",
"kms:ListAliases",
"kms:DeleteAlias"
],
"Resource": "*"
},
{
"Sid": "AllowS3Actions",
"Effect": "Allow",
"Action": [
"s3:CreateBucket",
"s3:ListBucket",
"s3:ListBucketVersions",
"s3:GetBucketPolicy",
"s3:GetBucketAcl",
"s3:GetBucketCORS",
"s3:GetBucketWebsite",
"s3:GetBucketVersioning",
"s3:GetBucketLogging",
"s3:GetBucketRequestPayment",
"s3:GetBucketTagging",
"s3:PutBucketTagging",
"s3:GetBucketPublicAccessBlock",
"s3:PutBucketPublicAccessBlock",
"s3:GetBucketObjectLockConfiguration",
"s3:GetAccelerateConfiguration",
"s3:GetLifecycleConfiguration",
"s3:GetReplicationConfiguration",
"s3:GetEncryptionConfiguration",
"s3:DeleteObjectVersion",
"s3:DeleteBucket"
],
"Resource": "*"
},
{
"Sid": "AllowECRActions",
"Effect": "Allow",
"Action": [
"ecr:CreateRepository",
"ecr:DescribeRepositories",
"ecr:DeleteRepository",
"ecr:TagResource",
"ecr:ListTagsForResource"
],
"Resource": "*"
},
{
"Sid": "AllowEKSActions",
"Effect": "Allow",
"Action": [
"ec2:CreateSecurityGroup",
"ec2:DeleteSecurityGroup",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:CreateLaunchTemplate",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions",
"ec2:DeleteLaunchTemplate",
"ec2:RunInstances",
"ec2:DescribeTags",
"ec2:DeleteTags",
"eks:CreateCluster",
"eks:DescribeCluster",
"eks:DeleteCluster",
"eks:CreateAccessEntry",
"eks:DescribeAccessEntry",
"eks:DeleteAccessEntry",
"eks:CreateNodegroup",
"eks:DescribeNodegroup",
"eks:DeleteNodegroup",
"eks:AssociateAccessPolicy",
"eks:ListAssociatedAccessPolicies",
"eks:DisassociateAccessPolicy",
"eks:CreateAddon",
"eks:DescribeAddon",
"eks:DescribeAddonVersions",
"eks:DeleteAddon",
"eks:TagResource",
"iam:CreateRole",
"iam:GetRole",
"iam:GetRolePolicy",
"iam:TagRole",
"iam:PassRole",
"iam:DeleteRole",
"iam:CreatePolicy",
"iam:GetPolicy",
"iam:TagPolicy",
"iam:GetPolicyVersion",
"iam:ListPolicyVersions",
"iam:DeletePolicy",
"iam:AttachRolePolicy",
"iam:ListRolePolicies",
"iam:ListAttachedRolePolicies",
"iam:PutRolePolicy",
"iam:DetachRolePolicy",
"iam:DeleteRolePolicy",
"iam:ListInstanceProfilesForRole",
"iam:CreateOpenIDConnectProvider",
"iam:GetOpenIDConnectProvider",
"iam:TagOpenIDConnectProvider",
"iam:DeleteOpenIDConnectProvider",
"logs:CreateLogGroup",
"logs:DescribeLogGroups",
"logs:DeleteLogGroup",
"logs:PutRetentionPolicy",
"logs:TagResource",
"logs:ListTagsForResource"
],
"Resource": "*"
},
{
"Sid": "AllowPodIdentityActions",
"Effect": "Allow",
"Action": [
"eks:CreatePodIdentityAssociation",
"eks:DescribePodIdentityAssociation",
"eks:DeletePodIdentityAssociation"
],
"Resource": "*"
}
]
}
Release | Supported DR Versions |
---|---|
~> 1.0 | ~> 10.1 |
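To stay within a release line from the table above, the module call can carry a version constraint. This is a minimal sketch in which only the constraint string is taken from the table.

```hcl
module "datarobot_infra" {
  source  = "datarobot-oss/dr-infra/aws"
  version = "~> 1.0" # module releases listed as supporting DataRobot ~> 10.1

  name = "datarobot"
}
```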
Name | Version |
---|---|
terraform | >= 1.3.2 |
aws | >= 5.61 |
helm | >= 2.15 |
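In a root configuration, these requirements could be declared as follows. The provider source addresses hashicorp/aws and hashicorp/helm are assumed here, since the table lists only the provider names.

```hcl
terraform {
  required_version = ">= 1.3.2"

  required_providers {
    aws = {
      source  = "hashicorp/aws" # assumed source address
      version = ">= 5.61"
    }
    helm = {
      source  = "hashicorp/helm" # assumed source address
      version = ">= 2.15"
    }
  }
}
```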
Name | Version |
---|---|
aws | >= 5.61 |
Name | Source | Version |
---|---|---|
acm | terraform-aws-modules/acm/aws | ~> 4.0 |
app_identity | terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc | ~> 5.0 |
aws_load_balancer_controller | ./modules/aws-load-balancer-controller | n/a |
aws_vpc_cni_ipv4_pod_identity | terraform-aws-modules/eks-pod-identity/aws | ~> 1.0 |
cert_manager | ./modules/cert-manager | n/a |
cluster_autoscaler | ./modules/cluster-autoscaler | n/a |
container_registry | terraform-aws-modules/ecr/aws | ~> 2.0 |
descheduler | ./modules/descheduler | n/a |
dns | terraform-aws-modules/route53/aws//modules/zones | ~> 3.0 |
ebs_csi_driver | ./modules/ebs-csi-driver | n/a |
encryption_key | terraform-aws-modules/kms/aws | ~> 3.0 |
endpoints | terraform-aws-modules/vpc/aws//modules/vpc-endpoints | ~> 5.0 |
external_dns | ./modules/external-dns | n/a |
ingress_nginx | ./modules/ingress-nginx | n/a |
kubernetes | terraform-aws-modules/eks/aws | ~> 20.0 |
metrics_server | ./modules/metrics-server | n/a |
network | terraform-aws-modules/vpc/aws | ~> 5.0 |
nvidia_device_plugin | ./modules/nvidia-device-plugin | n/a |
storage | terraform-aws-modules/s3-bucket/aws | ~> 4.0 |
Name | Type |
---|---|
aws_autoscaling_group_tag.gpu | resource |
aws_autoscaling_group_tag.primary | resource |
aws_availability_zones.available | data source |
aws_caller_identity.current | data source |
aws_eks_cluster.existing | data source |
aws_eks_cluster_auth.this | data source |
aws_route53_zone.private | data source |
aws_route53_zone.public | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
aws_load_balancer_controller | Install the aws-load-balancer-controller helm chart to use AWS Network Load Balancers as ingress to the EKS cluster. All other aws_load_balancer_controller variables are ignored if this variable is false. | bool | true | no |
aws_load_balancer_controller_values | Path to templatefile containing custom values for the aws-load-balancer-controller helm chart | string | "" | no |
aws_load_balancer_controller_variables | Variables passed to the aws_load_balancer_controller_values templatefile | any | {} | no |
cert_manager | Install the cert-manager helm chart. All other cert_manager variables are ignored if this variable is false. | bool | true | no |
cert_manager_values | Path to templatefile containing custom values for the cert-manager helm chart | string | "" | no |
cert_manager_variables | Variables passed to the cert_manager_values templatefile | any | {} | no |
cluster_autoscaler | Install the cluster-autoscaler helm chart to enable horizontal autoscaling of the EKS cluster nodes. All other cluster_autoscaler variables are ignored if this variable is false. | bool | true | no |
cluster_autoscaler_values | Path to templatefile containing custom values for the cluster-autoscaler helm chart | string | "" | no |
cluster_autoscaler_variables | Variables passed to the cluster_autoscaler_values templatefile | any | {} | no |
create_acm_certificate | Create a new ACM certificate for the ingress load balancer to use. Ignored if existing_acm_certificate_arn is specified. | bool | true | no |
create_app_identity | Create an IAM role for the DataRobot application service accounts | bool | true | no |
create_container_registry | Create DataRobot image builder container repositories in Amazon Elastic Container Registry | bool | true | no |
create_dns_zones | Create DNS zones for domain_name. Ignored if existing_public_route53_zone_id and existing_private_route53_zone_id are specified. | bool | true | no |
create_encryption_key | Create a new KMS key used for EBS volume encryption on EKS nodes. Ignored if existing_kms_key_arn is specified. | bool | true | no |
create_kubernetes_cluster | Create a new Amazon Elastic Kubernetes Service cluster. All kubernetes and helm chart variables are ignored if this variable is false. | bool | true | no |
create_network | Create a new Virtual Private Cloud. Ignored if existing_vpc_id is specified. | bool | true | no |
create_storage | Create a new S3 storage bucket to use for DataRobot application file storage. Ignored if existing_s3_bucket_id is specified. | bool | true | no |
datarobot_namespace | Kubernetes namespace in which the DataRobot application will be installed | string | "dr-app" | no |
descheduler | Install the descheduler helm chart to enable rescheduling of pods. All other descheduler variables are ignored if this variable is false. | bool | true | no |
descheduler_values | Path to templatefile containing custom values for the descheduler helm chart | string | "" | no |
descheduler_variables | Variables passed to the descheduler_values templatefile | any | {} | no |
dns_zones_force_destroy | Force destroy the public and private Route53 zones. Ignored if an existing route53_zone_id is specified or create_dns_zones is false. | bool | false | no |
domain_name | Name of the domain to use for the DataRobot application. If create_dns_zones is true then zones will be created for this domain. It is also used by ACM for DNS validation and as a domain filter by the external-dns helm chart. | string | "" | no |
ebs_csi_driver | Install the aws-ebs-csi-driver helm chart to enable use of EBS for Kubernetes persistent volumes. All other ebs_csi_driver variables are ignored if this variable is false. | bool | true | no |
ebs_csi_driver_values | Path to templatefile containing custom values for the aws-ebs-csi-driver helm chart | string | "" | no |
ebs_csi_driver_variables | Variables passed to the ebs_csi_driver_values templatefile | any | {} | no |
ecr_repositories | Repositories to create | set(string) | [ | no |
ecr_repositories_force_destroy | Force destroy the ECR repositories. Ignored if create_container_registry is false. | bool | false | no |
existing_acm_certificate_arn | ARN of existing ACM certificate to use with the ingress load balancer created by the ingress_nginx module. When specified, create_acm_certificate will be ignored. | string | "" | no |
existing_eks_cluster_name | Name of existing EKS cluster to use. When specified, all other kubernetes variables will be ignored. | string | null | no |
existing_kms_key_arn | ARN of existing KMS key used for EBS volume encryption on EKS nodes. When specified, create_encryption_key will be ignored. | string | "" | no |
existing_kubernetes_nodes_subnet_id | List of existing subnet IDs to be used for the EKS cluster. Required when an existing_vpc_id is specified. Ignored if create_network is true and no existing_vpc_id is specified. Subnets must adhere to VPC requirements and considerations https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html. | list(string) | [] | no |
existing_private_route53_zone_id | ID of existing private Route53 hosted zone to use for private DNS records created by external-dns. This is required when create_dns_zones is false and ingress_nginx is true with internet_facing_ingress_lb false. | string | "" | no |
existing_public_route53_zone_id | ID of existing public Route53 hosted zone to use for public DNS records created by external-dns and ACM certificate validation. This is required when create_dns_zones is false and ingress_nginx and internet_facing_ingress_lb are true or when create_acm_certificate is true. | string | "" | no |
existing_s3_bucket_id | ID of existing S3 storage bucket to use for DataRobot application file storage. When specified, all other storage variables will be ignored. | string | "" | no |
existing_vpc_id | ID of an existing VPC to use. When specified, other network variables are ignored. | string | "" | no |
external_dns | Install the external-dns helm chart to create DNS records for ingress resources matching the domain_name variable. All other external_dns variables are ignored if this variable is false. | bool | true | no |
external_dns_values | Path to templatefile containing custom values for the external-dns helm chart | string | "" | no |
external_dns_variables | Variables passed to the external_dns_values templatefile | any | {} | no |
ingress_nginx | Install the ingress-nginx helm chart to use as the ingress controller for the EKS cluster. All other ingress_nginx variables are ignored if this variable is false. | bool | true | no |
ingress_nginx_values | Path to templatefile containing custom values for the ingress-nginx helm chart | string | "" | no |
ingress_nginx_variables | Variables passed to the ingress_nginx_values templatefile | any | {} | no |
internet_facing_ingress_lb | Determines the type of NLB created for EKS ingress. If true, an internet-facing NLB will be created. If false, an internal NLB will be created. Ignored when ingress_nginx is false. | bool | true | no |
kubernetes_cluster_access_entries | Map of access entries to add to the cluster | any | {} | no |
kubernetes_cluster_endpoint_private_access_cidrs | List of additional CIDR blocks allowed to access the Amazon EKS private API server endpoint. By default only the Kubernetes nodes are allowed; any other hosts, such as a provisioner, that need to access the EKS private API endpoint must be added here. | list(string) | [] | no |
kubernetes_cluster_endpoint_public_access | Indicates whether or not the Amazon EKS public API server endpoint is enabled | bool | true | no |
kubernetes_cluster_endpoint_public_access_cidrs | List of CIDR blocks which can access the Amazon EKS public API server endpoint | list(string) | [ | no |
kubernetes_cluster_version | EKS cluster version | string | null | no |
kubernetes_gpu_nodegroup_ami_type | Type of Amazon Machine Image (AMI) associated with the EKS GPU Node Group. See the AWS documentation for valid values | string | "AL2_x86_64_GPU" | no |
kubernetes_gpu_nodegroup_desired_size | Desired number of nodes in the GPU node group | number | 0 | no |
kubernetes_gpu_nodegroup_instance_types | Instance types used for the GPU node group | list(string) | [ | no |
kubernetes_gpu_nodegroup_labels | Key-value map of Kubernetes labels to be applied to the nodes in the GPU node group. Only labels that are applied with the EKS API are managed by this argument. Other Kubernetes labels applied to the EKS Node Group will not be managed. | map(string) | { | no |
kubernetes_gpu_nodegroup_max_size | Maximum number of nodes in the GPU node group | number | 10 | no |
kubernetes_gpu_nodegroup_min_size | Minimum number of nodes in the GPU node group | number | 0 | no |
kubernetes_gpu_nodegroup_name | Name of the GPU node group | string | "gpu" | no |
kubernetes_gpu_nodegroup_taints | The Kubernetes taints to be applied to the nodes in the GPU node group. Maximum of 50 taints per node group | any | { | no |
kubernetes_primary_nodegroup_ami_type | Type of Amazon Machine Image (AMI) associated with the EKS Primary Node Group. See the AWS documentation for valid values | string | "AL2023_x86_64_STANDARD" | no |
kubernetes_primary_nodegroup_desired_size | Desired number of nodes in the primary node group | number | 1 | no |
kubernetes_primary_nodegroup_instance_types | Instance types used for the primary node group | list(string) | [ | no |
kubernetes_primary_nodegroup_labels | Key-value map of Kubernetes labels to be applied to the nodes in the primary node group. Only labels that are applied with the EKS API are managed by this argument. Other Kubernetes labels applied to the EKS Node Group will not be managed. | map(string) | { | no |
kubernetes_primary_nodegroup_max_size | Maximum number of nodes in the primary node group | number | 10 | no |
kubernetes_primary_nodegroup_min_size | Minimum number of nodes in the primary node group | number | 0 | no |
kubernetes_primary_nodegroup_name | Name of the primary EKS node group | string | "primary" | no |
kubernetes_primary_nodegroup_taints | The Kubernetes taints to be applied to the nodes in the primary node group. Maximum of 50 taints per node group | any | {} | no |
metrics_server | Install the metrics-server helm chart to expose resource metrics for Kubernetes built-in autoscaling pipelines. All other metrics_server variables are ignored if this variable is false. | bool | true | no |
metrics_server_values | Path to templatefile containing custom values for the metrics-server helm chart | string | "" | no |
metrics_server_variables | Variables passed to the metrics_server_values templatefile | any | {} | no |
name | Name to use as a prefix for created resources | string | n/a | yes |
network_address_space | CIDR block to be used for the new VPC | string | "10.0.0.0/16" | no |
network_private_endpoints | List of AWS services to create interface VPC endpoints for | list(string) | [ | no |
nvidia_device_plugin | Install the nvidia-device-plugin helm chart to expose node GPU resources to the EKS cluster. All other nvidia_device_plugin variables are ignored if this variable is false. | bool | true | no |
nvidia_device_plugin_values | Path to templatefile containing custom values for the nvidia-device-plugin helm chart | string | "" | no |
nvidia_device_plugin_variables | Variables passed to the nvidia_device_plugin_values templatefile | any | {} | no |
s3_bucket_force_destroy | Force destroy the S3 bucket | bool | false | no |
tags | A map of tags to add to all created resources | map(string) | { | no |
Name | Description |
---|---|
acm_certificate_arn | ARN of the ACM certificate |
app_role_arn | ARN of the IAM role to be assumed by the DataRobot app service accounts |
ebs_encryption_key_id | ARN of the EBS KMS key |
ecr_repository_urls | URLs of the image builder repositories |
kubernetes_cluster_certificate_authority_data | Base64 encoded certificate data required to communicate with the cluster |
kubernetes_cluster_endpoint | Endpoint for your Kubernetes API server |
kubernetes_cluster_name | Name of the EKS cluster |
private_route53_zone_arn | Zone ARN of the private Route53 zone |
private_route53_zone_id | Zone ID of the private Route53 zone |
public_route53_zone_arn | Zone ARN of the public Route53 zone |
public_route53_zone_id | Zone ID of the public Route53 zone |
s3_bucket_id | Name of the S3 bucket |
vpc_id | The ID of the VPC |
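These outputs can feed other parts of a configuration. As one possible use, the sketch below wires the cluster outputs into a helm provider so that further charts can be installed. It assumes the block-style kubernetes syntax of the 2.x helm provider, and the data source label datarobot is an arbitrary name chosen for the example.

```hcl
module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/aws"

  name        = "datarobot"
  domain_name = "yourdomain.com"
}

# Short-lived authentication token for the cluster exposed by the module.
data "aws_eks_cluster_auth" "datarobot" {
  name = module.datarobot_infra.kubernetes_cluster_name
}

# Assumes helm provider 2.x, where the kubernetes settings are a nested block.
provider "helm" {
  kubernetes {
    host                   = module.datarobot_infra.kubernetes_cluster_endpoint
    cluster_ca_certificate = base64decode(module.datarobot_infra.kubernetes_cluster_certificate_authority_data)
    token                  = data.aws_eks_cluster_auth.datarobot.token
  }
}
```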
If you'd like to report an issue or bug, suggest improvements, or contribute code to this project, please refer to CONTRIBUTING.md.
This project has adopted the Contributor Covenant for its Code of Conduct. See CODE_OF_CONDUCT.md to read it in full.
Licensed under the Apache License 2.0. See LICENSE to read it in full.