Merge remote-tracking branch 'origin/main' into marcin/use-official-capz-image

maciaszczykm committed Oct 10, 2023
2 parents 8ec01f5 + 4bb87b2 commit 26f0663
Showing 38 changed files with 771 additions and 97 deletions.
61 changes: 61 additions & 0 deletions airflow/plural/docs/aws-secrets-backend.md
@@ -0,0 +1,61 @@
## Connecting to AWS Secrets Backend

Airflow lets you connect to various services as a Secrets Backend, as an alternative to using the
Airflow UI to manage connections. One of these services is [AWS Secrets Manager](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/secrets-backends/aws-secrets-manager.html).
Once you add the configurations below, Airflow will be able to retrieve secrets from AWS Secrets Manager (provided
they use the prefixes specified in the `BACKEND_KWARGS` config).

In this scenario, the prefixes are `airflow/connections` & `airflow/variables`, so any value stored under the
`airflow/connections` prefix is treated the same as an entry in the `Admin >> Connections` menu of the
Airflow UI, and any value stored under the `airflow/variables` prefix is treated the same as an entry in the
`Admin >> Variables` menu of the Airflow UI.
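
For example, you could seed a connection and a variable under those prefixes with `boto3` (a minimal sketch; the secret names, region, and connection URI here are hypothetical placeholders):

```python
import boto3

# hypothetical region and names -- adjust for your account
client = boto3.client("secretsmanager", region_name="us-east-1")

# surfaces in Airflow as conn_id "my_postgres" (stored as a connection URI)
client.create_secret(
    Name="airflow/connections/my_postgres",
    SecretString="postgresql://user:password@my-host:5432/mydb",
)

# surfaces in Airflow as the variable "my_var"
client.create_secret(
    Name="airflow/variables/my_var",
    SecretString="some-value",
)
```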

### edit values.yaml

You'll then want to edit `airflow/helm/airflow/values.yaml` in your installation repo with something like:

```yaml
airflow:
  airflow:
    airflow:
      config:
        AIRFLOW__SECRETS__BACKEND: airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
        AIRFLOW__SECRETS__BACKEND_KWARGS: '{"connections_prefix": "airflow/connections","variables_prefix": "airflow/variables"}'
```
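
Once this is in place, DAG code reads those values through the normal Airflow interfaces and doesn't need to know a secrets backend is involved. A minimal sketch, using the hypothetical names from the example above:

```python
from airflow.hooks.base import BaseHook
from airflow.models import Variable

# resolved from the secret "airflow/connections/my_postgres"
conn = BaseHook.get_connection("my_postgres")
print(conn.host, conn.login)

# resolved from the secret "airflow/variables/my_var"
my_var = Variable.get("my_var")
```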
Alternatively, you should be able to set these values in the configuration section for Airflow in your Plural console as well.

### add policy to AWS role

When installing the Airflow application, Plural created a default IAM role for Airflow called
`<your-cluster-name>-airflow`. You will need to add a policy to that role to allow it to access AWS Secrets Manager. You
can use this policy as a starting point:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetRandomPassword",
        "secretsmanager:ListSecrets"
      ],
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "secretsmanager:*",
      "Resource": "arn:aws:secretsmanager:<insert-aws-region>:<insert-aws-account-number>:secret:airflow/*"
    }
  ]
}
```
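
If you'd rather script this step than use the AWS console, one option is attaching it as an inline policy with `boto3` (a sketch; the inline policy name is made up, and the role name placeholder should match the role Plural created):

```python
import json

import boto3

# the policy document from above, as a Python dict
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": ["secretsmanager:GetRandomPassword", "secretsmanager:ListSecrets"],
            "Resource": "*",
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "secretsmanager:*",
            "Resource": "arn:aws:secretsmanager:<insert-aws-region>:<insert-aws-account-number>:secret:airflow/*",
        },
    ],
}

iam = boto3.client("iam")
iam.put_role_policy(
    RoleName="<your-cluster-name>-airflow",  # the role Plural created for Airflow
    PolicyName="airflow-secrets-manager",  # hypothetical name for the inline policy
    PolicyDocument=json.dumps(policy),
)
```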

### redeploy

From there, you should be able to run `plural build --only airflow && plural deploy --commit "use aws secrets manager backend"` to start using the secrets backend.
146 changes: 146 additions & 0 deletions airflow/plural/docs/running-dbt-via-cosmos.md
@@ -0,0 +1,146 @@
## Running dbt core in Airflow via Cosmos

[Cosmos](https://github.com/astronomer/astronomer-cosmos) is an open-source project that lets you run dbt core
projects natively in Airflow. To date, it is probably one of the best ways to run dbt core in Airflow.

### custom dockerfile

To run dbt core effectively, we recommend baking a new Docker image on top of ours and then wiring it into your
installation. Please follow the [pip-packages](./pip-packages.md) guide for instructions on baking your own image.

Airflow and dbt share common dependencies (e.g. Jinja). This can cause dependency clashes between Airflow and your
dbt adapter when you upgrade either of them. To solve this, we can put the dbt adapter in its own Python virtual
environment. This is possible by adding the following steps to your custom `Dockerfile`:

```dockerfile
FROM docker.io/apache/airflow:2.6.3-python3.10

USER root
RUN apt-get -yq update \
    && apt-get -yq install --no-install-recommends \
        git \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

USER airflow

COPY requirements.txt requirements.txt
RUN pip freeze | grep -i apache-airflow > protected-packages.txt \
    && pip install --constraint ./protected-packages.txt --no-cache-dir -r ./requirements.txt \
    && rm -rf ./protected-packages.txt ./requirements.txt

# create a virtual environment for dbt
RUN export PIP_USER=false && python -m venv dbt_venv && source dbt_venv/bin/activate && \
    pip install --no-cache-dir dbt-redshift==1.6.1 && deactivate && export PIP_USER=true
```

In this example, we've installed the `dbt-redshift` adapter into the Python virtual environment. However, you can swap
the adapter for the one that suits your needs (e.g. `dbt-bigquery`, `dbt-snowflake`, etc.).
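
To sanity-check the resulting image, you can exec into a running container (or `docker run` the image) and confirm the venv's dbt executable works. A small sketch, assuming the venv lives under `AIRFLOW_HOME` as in the DAG example further below:

```python
import os
import subprocess

# assumption: the Dockerfile above created the venv in AIRFLOW_HOME
airflow_home = os.getenv("AIRFLOW_HOME", "/usr/local/airflow")
dbt_executable = f"{airflow_home}/dbt_venv/bin/dbt"

# prints the dbt core and adapter versions if the install worked
result = subprocess.run([dbt_executable, "--version"], capture_output=True, text=True, check=True)
print(result.stdout)
```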

### add dbt project to your dags directory

In your dags directory, add a folder called `dbt`. Within that folder, copy your dbt project. For example, if you were
going to add Fishtown Analytics' classic [Jaffle Shop project](https://github.com/dbt-labs/jaffle_shop), your project
directory would look something like this:

```text
plural-airflow-repo
└── dags
    └── dbt
        └── jaffle_shop
            ├── LICENSE
            ├── README.md
            ├── dbt_project.yml
            ├── etc
            │   ├── dbdiagram_definition.txt
            │   └── jaffle_shop_erd.png
            ├── models
            │   ├── customers.sql
            │   ├── docs.md
            │   ├── orders.sql
            │   ├── overview.md
            │   ├── schema.yml
            │   └── staging
            │       ├── schema.yml
            │       ├── stg_customers.sql
            │       ├── stg_orders.sql
            │       └── stg_payments.sql
            └── seeds
                ├── raw_customers.csv
                ├── raw_orders.csv
                └── raw_payments.csv
```

### point Cosmos class to the nested dbt project directory

In your dags directory, add a `jaffle_shop.py` file to create a DAG, and add the following contents to it:

```python
"""
## Jaffle Shop
Example of using cosmos to run the jaffle shop dbt project
"""
import os
from datetime import datetime

from airflow import DAG
from cosmos import DbtTaskGroup, ExecutionConfig, ProfileConfig, ProjectConfig
from cosmos.profiles.redshift.user_pass import RedshiftUserPasswordProfileMapping

# these next lines help to resolve the path to your dbt project in the plural airflow instance vs. local development

# dynamically retrieve the Airflow Home directory
airflow_home = os.getenv("AIRFLOW_HOME", "/usr/local/airflow")

# I've set a local env variable ENVIRONMENT=DEV to determine if the dag is running in plural airflow or local airflow
if os.getenv("ENVIRONMENT", "PROD") == "DEV":
    # the project path when running Airflow locally
    dbt_project_path = f"{airflow_home}/dags/dbt/jaffle_shop"
else:
    # the project path in the plural cluster
    dbt_project_path = f"{airflow_home}/dags/repo/dags/dbt/jaffle_shop"

# the path to the dbt executable that's within the venv created in the Dockerfile
dbt_executable_path = f"{airflow_home}/dbt_venv/bin/dbt"

# profile mapping to connect dbt to a target
profile_mapping = RedshiftUserPasswordProfileMapping(
    # airflow connection id to use for the dbt target
    conn_id="redshift_default",
    profile_args={
        # my redshift database name
        "dbname": "dev",
        # default schema to write to if one isn't specified in .yml or .sql dbt files
        "schema": "a_default_schema_name",
    },
)

with DAG(
    dag_id="jaffle_shop",
    start_date=datetime(2023, 10, 6),
    schedule=None,
    doc_md=__doc__,
    tags=["dbt", "redshift"],
):
    DbtTaskGroup(
        project_config=ProjectConfig(
            dbt_project_path=dbt_project_path,
        ),
        execution_config=ExecutionConfig(
            dbt_executable_path=dbt_executable_path,
        ),
        profile_config=ProfileConfig(
            profile_name="jaffle_shop",  # the default profile - recommended to be your dbt project name
            target_name="cosmos_target",  # the default target - recommended to just leave as cosmos_target
            profile_mapping=profile_mapping,
        ),
    )
```

This example uses a Redshift data warehouse as the target, but you can also configure profiles for other targets (e.g.
Snowflake, BigQuery). For more information, please review the Cosmos docs [here](https://astronomer.github.io/astronomer-cosmos/profiles/index.html).
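
For instance, switching the target to Snowflake should only require a different profile mapping. A hedged sketch: `SnowflakeUserPasswordProfileMapping` is importable from `cosmos.profiles`, while the connection ID and `profile_args` below are assumptions to adapt to your setup:

```python
from cosmos.profiles import SnowflakeUserPasswordProfileMapping

# assumes an Airflow connection "snowflake_default" holding account, user, and password
profile_mapping = SnowflakeUserPasswordProfileMapping(
    conn_id="snowflake_default",
    profile_args={
        # hypothetical database and schema names
        "database": "DEV",
        "schema": "A_DEFAULT_SCHEMA_NAME",
    },
)
```
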
After making these changes, you should see the DAG parse like so:

![jaffle_shop_dag.png](https://github.com/astronomer/astronomer-cosmos/blob/main/docs/_static/jaffle_shop_task_group.png)
6 changes: 3 additions & 3 deletions bootstrap/helm/bootstrap/Chart.lock
@@ -13,7 +13,7 @@ dependencies:
version: 9.25.0
- name: aws-load-balancer-controller
repository: https://aws.github.io/eks-charts
version: 1.4.8
version: 1.6.1
- name: aws-ebs-csi-driver
repository: https://kubernetes-sigs.github.io/aws-ebs-csi-driver
version: 2.17.1
@@ -26,5 +26,5 @@ dependencies:
- name: tigera-operator
repository: https://docs.tigera.io/calico/charts
version: v3.25.0
digest: sha256:1d124ca9acb4e93009dfeb4273d149d075616babbad1fe3e5bb6c88540b5f96d
generated: "2023-03-07T15:21:37.729265+01:00"
digest: sha256:b7ee91be180afabfb812e9c8e7f7bfdfd2a1c4ebef9592ccd37e6eadd65409a2
generated: "2023-10-05T13:27:49.248688-04:00"
4 changes: 2 additions & 2 deletions bootstrap/helm/bootstrap/Chart.yaml
@@ -10,7 +10,7 @@ maintainers:
email: [email protected]
- name: David van der Spek
email: [email protected]
version: 0.8.75
version: 0.8.77
dependencies:
- name: external-dns
version: 6.14.1
@@ -30,7 +30,7 @@ dependencies:
repository: https://kubernetes.github.io/autoscaler
- name: aws-load-balancer-controller
condition: aws-load-balancer-controller.enabled
version: 1.4.8
version: 1.6.1
repository: https://aws.github.io/eks-charts
- name: aws-ebs-csi-driver
condition: aws-ebs-csi-driver.enabled
2 changes: 1 addition & 1 deletion bootstrap/helm/bootstrap/values.yaml
@@ -64,7 +64,7 @@ aws-load-balancer-controller:
enabled: false
image:
repository: public.ecr.aws/eks/aws-load-balancer-controller # TODO: this should be migrated to our vendored images
tag: v2.4.7
tag: v2.6.1

snapshot-validation-webhook:
enabled: false
2 changes: 1 addition & 1 deletion bootstrap/terraform/aws-bootstrap/deps.yaml
@@ -2,7 +2,7 @@ apiVersion: plural.sh/v1alpha1
kind: Dependencies
metadata:
description: Creates an EKS cluster and prepares it for bootstrapping
version: 0.1.54
version: 0.1.55
spec:
breaking: false
dependencies: []
30 changes: 17 additions & 13 deletions bootstrap/terraform/aws-bootstrap/main.tf
@@ -37,19 +37,23 @@ module "vpc" {
}

module "cluster" {
source = "github.com/pluralsh/terraform-aws-eks?ref=output-service-cidr"
cluster_name = var.cluster_name
cluster_version = var.kubernetes_version
private_subnets = local.private_subnet_ids
public_subnets = local.public_subnet_ids
worker_private_subnets = local.worker_private_subnet_ids
vpc_id = local.vpc_id
enable_irsa = true
write_kubeconfig = false
create_eks = var.create_cluster
cluster_enabled_log_types = var.cluster_enabled_log_types
cluster_log_retention_in_days = var.cluster_log_retention_in_days
cluster_log_kms_key_id = var.cluster_log_kms_key_id
source = "github.com/pluralsh/terraform-aws-eks?ref=output-service-cidr"
cluster_name = var.cluster_name
cluster_version = var.kubernetes_version
private_subnets = local.private_subnet_ids
public_subnets = local.public_subnet_ids
worker_private_subnets = local.worker_private_subnet_ids
vpc_id = local.vpc_id
enable_irsa = true
write_kubeconfig = false
create_eks = var.create_cluster
cluster_enabled_log_types = var.cluster_enabled_log_types
cluster_log_retention_in_days = var.cluster_log_retention_in_days
cluster_log_kms_key_id = var.cluster_log_kms_key_id
cluster_endpoint_public_access = var.cluster_endpoint_public_access
cluster_endpoint_private_access = var.cluster_endpoint_private_access
cluster_encryption_config = var.cluster_encryption_config
cluster_endpoint_public_access_cidrs = var.cluster_endpoint_public_access_cidrs

node_groups_defaults = {}

28 changes: 28 additions & 0 deletions bootstrap/terraform/aws-bootstrap/variables.tf
@@ -7,6 +7,34 @@ Name for the vpc for the cluster
EOF
}


variable "cluster_endpoint_private_access" {
description = "Indicates whether or not the Amazon EKS private API server endpoint is enabled."
type = bool
default = false
}

variable "cluster_endpoint_public_access" {
description = "Indicates whether or not the Amazon EKS public API server endpoint is enabled."
type = bool
default = true
}

variable "cluster_endpoint_public_access_cidrs" {
description = "List of CIDR blocks which can access the Amazon EKS public API server endpoint."
type = list(string)
default = ["0.0.0.0/0"]
}

variable "cluster_encryption_config" {
description = "Configuration block with encryption configuration for the cluster. See examples/secrets_encryption/main.tf for example format"
type = list(object({
provider_key_arn = string
resources = list(string)
}))
default = []
}

variable "cluster_enabled_log_types" {
default = []
description = "A list of the desired control plane logging to enable. Supported options are: api, audit, authenticator, controllerManager, scheduler. For more information, see Amazon EKS Control Plane Logging documentation (https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html)"
6 changes: 3 additions & 3 deletions mage/helm/mage/Chart.lock
@@ -4,6 +4,6 @@ dependencies:
version: 0.1.5
- name: mageai
repository: https://mage-ai.github.io/helm-charts
version: 0.1.2
digest: sha256:dd45698821c408ea2e7d4092759b3349b81560831eecc5ae62585864c453f6c5
generated: "2023-06-26T13:31:32.353565+02:00"
version: 0.1.4
digest: sha256:85b3a36cafc18f811f99fceabc7ce5ee791d0994a10d03213979321a57a6e1e7
generated: "2023-10-04T22:26:18.459861143-07:00"
5 changes: 3 additions & 2 deletions mage/helm/mage/Chart.yaml
@@ -3,10 +3,11 @@ name: mage
description: helm chart for mageai
type: application
version: 0.1.10
appVersion: 0.8.102
appVersion: 0.9.31
dependencies:
- name: postgres
version: 0.1.5
repository: https://pluralsh.github.io/module-library
- name: mageai
version: 0.1.2
version: 0.1.4
repository: https://mage-ai.github.io/helm-charts
Binary file added mage/helm/mage/charts/mageai-0.1.4.tgz
4 changes: 2 additions & 2 deletions mage/helm/mage/values.yaml
@@ -15,9 +15,9 @@ mageai:
cert-manager.io/cluster-issuer: letsencrypt-prod
image:
repository: dkr.plural.sh/mage/mageai/mageai
tag: 0.8.102
tag: 0.9.31

volumes:
- name: mage-fs
persistentVolumeClaim:
claimName: mageai-pvc
claimName: mageai-pvc
3 changes: 2 additions & 1 deletion mage/repository.yaml
@@ -9,4 +9,5 @@ homepage: https://www.mage.ai/
gitUrl: https://github.com/mage-ai/mage-ai
contributors:
- [email protected]
- [email protected]
- [email protected]
- [email protected]
8 changes: 4 additions & 4 deletions ray/helm/kuberay-operator/Chart.lock
@@ -1,6 +1,6 @@
dependencies:
- name: kuberay-operator
repository: https://kevin85421.github.io/kuberay
version: 0.3.0
digest: sha256:abb43e05246ec58ef6137df26d2d1692f59066058695b84d17986e5622d82690
generated: "2022-09-21T15:06:07.583379+02:00"
repository: https://ray-project.github.io/kuberay-helm/
version: 0.6.0
digest: sha256:68767f4de687430221785f64d5b752285141d2192cae4c91a55b13d40106d063
generated: "2023-10-05T14:25:03.985572273-07:00"