Replace architecture diagram of Airflow with diagrams-generated one (apache#36035)

The architecture diagram of Airflow has been outdated for a long time.

This is an attempt to regenerate it using Python's diagrams library
(already used by some tools in our ecosystem).
potiuk authored Dec 5, 2023
1 parent acf91af commit 5dfee8b
Showing 14 changed files with 241 additions and 55 deletions.
7 changes: 7 additions & 0 deletions .pre-commit-config.yaml
@@ -405,6 +405,13 @@ repos:
files: ^Dockerfile$
pass_filenames: false
additional_dependencies: ['rich>=12.4.4']
- id: generate-airflow-diagrams
name: Generate airflow diagrams
entry: ./scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
language: python
files: ^scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
pass_filenames: false
additional_dependencies: ['rich>=12.4.4', "diagrams>=0.23.4"]
- id: update-supported-versions
name: Updates supported versions in documentation
entry: ./scripts/ci/pre_commit/pre_commit_supported_versions.py
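The new hook sets ``pass_filenames: false`` and restricts itself with a ``files:`` pattern. A minimal sketch of how such a pattern selects changed paths, assuming pre-commit applies ``files`` as a regular-expression search against each repo-relative path. The helper name ``hook_would_run`` is hypothetical and not part of pre-commit or Airflow:

```python
import re

# Pattern copied verbatim from the hook's `files:` key in the hunk above.
FILES_PATTERN = r"^scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py"


def hook_would_run(changed_paths):
    """Return True if at least one changed path matches the hook's pattern.

    Assumption: the pattern is used as a regex search on each path, so the
    leading `^` anchors the start, while the unescaped `.` matches any
    character and the missing `$` leaves the end of the path open.
    """
    pattern = re.compile(FILES_PATTERN)
    return any(pattern.search(path) is not None for path in changed_paths)


if __name__ == "__main__":
    script = "scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py"
    print(hook_would_run([script]))
    print(hook_would_run(["docs/apache-airflow/core-concepts/overview.rst"]))
```

Under this assumption the hook is triggered only by changes to the generator script itself. Because ``pass_filenames`` is false, the matched paths are not handed to the entry point.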
2 changes: 2 additions & 0 deletions STATIC_CODE_CHECKS.rst
@@ -272,6 +272,8 @@ require Breeze Docker image to be built locally.
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| flynt | Run flynt string format converter for Python | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| generate-airflow-diagrams | Generate airflow diagrams | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| generate-pypi-readme | Generate PyPI README | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| identity | Print input to the static check hooks for troubleshooting | |
1 change: 1 addition & 0 deletions dev/breeze/src/airflow_breeze/pre_commit_ids.py
@@ -89,6 +89,7 @@
"end-of-file-fixer",
"fix-encoding-pragma",
"flynt",
"generate-airflow-diagrams",
"generate-pypi-readme",
"identity",
"insert-license",
21 changes: 20 additions & 1 deletion docs/apache-airflow/core-concepts/overview.rst
@@ -31,18 +31,37 @@ An Airflow installation generally consists of the following components:

* An :doc:`executor <executor/index>`, which handles running tasks. In the default Airflow installation, this runs everything *inside* the scheduler, but most production-suitable executors actually push task execution out to *workers*.

* A *triggerer*, which runs deferred tasks in an asyncio event loop.

* A *webserver*, which presents a handy user interface to inspect, trigger and debug the behaviour of DAGs and tasks.

* A folder of *DAG files*, read by the scheduler and executor (and any workers the executor has)

* A *metadata database*, used by the scheduler, executor and webserver to store state.
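The relationships in the list above can be written down as a small graph. The following is a rough sketch that uses only the standard library and emits Graphviz DOT text. It is not the generator script added by this commit, and the node names, the ``EDGES`` list and the ``to_dot`` helper are illustrative assumptions:

```python
# Components named in the list above. Node names are illustrative.
COMPONENTS = [
    "scheduler", "executor", "triggerer",
    "webserver", "dag_files", "metadata_db",
]

# Only the relationships the list states explicitly. The triggerer gets
# no edge because the text does not say what it connects to.
EDGES = [
    ("scheduler", "dag_files", "reads"),
    ("executor", "dag_files", "reads"),
    ("scheduler", "metadata_db", "stores state in"),
    ("executor", "metadata_db", "stores state in"),
    ("webserver", "metadata_db", "stores state in"),
]


def to_dot(components, edges, name="basic_airflow"):
    """Render the components and edges as Graphviz DOT source text."""
    lines = [f"digraph {name} {{"]
    for component in components:
        lines.append(f"    {component};")
    for source, target, label in edges:
        lines.append(f'    {source} -> {target} [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(COMPONENTS, EDGES))
```

The printed text can be fed to any DOT renderer. Keeping the edges as plain data makes it easy to check the diagram against the prose.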

.. image:: ../img/arch-diag-basic.png

Basic Airflow architecture
--------------------------

This is the basic architecture of Airflow that you'll see in simple installations:

.. image:: ../img/diagram_basic_airflow_architecture.png

Most executors will generally also introduce other components to let them talk to their workers - like a task queue - but you can still think of the executor and its workers as a single logical component in Airflow overall, handling the actual task execution.

Airflow itself is agnostic to what you're running - it will happily orchestrate and run anything, either with high-level support from one of our providers, or directly as a command using the shell or Python :doc:`operators`.

Separate DAG processing architecture
------------------------------------

In a more complex installation where security and isolation are important, you'll also see the standalone **dag file processor** component, which removes the scheduler's need to access DAG files. This is suitable if the
deployment focus is on isolation between parsed tasks. While Airflow does not yet support full multi-tenant features, this component can be used to make sure that DAG-author provided code is never executed in the context of the scheduler.

.. image:: ../img/diagram_dag_processor_airflow_architecture.png
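The difference between the two layouts can be sketched as a change in which components read DAG files. This is a simplified model based only on the overview text, assuming the standalone dag file processor changes nothing except the scheduler's access. The names ``BASIC_DAG_FILE_READERS`` and ``dag_file_readers`` are hypothetical:

```python
# Readers of DAG files in the basic layout, per the component list in the
# overview text (a simplification for illustration).
BASIC_DAG_FILE_READERS = frozenset({"scheduler", "executor", "workers"})


def dag_file_readers(standalone_dag_processor):
    """Return the set of components that read DAG files.

    Assumption: enabling the standalone dag file processor only moves the
    scheduler's DAG file access to the new component; every other reader
    stays as in the basic layout.
    """
    readers = set(BASIC_DAG_FILE_READERS)
    if standalone_dag_processor:
        readers.discard("scheduler")
        readers.add("dag_processor")
    return readers


if __name__ == "__main__":
    for enabled in (False, True):
        print(enabled, sorted(dag_file_readers(enabled)))
```

In this model the scheduler drops out of the reader set once the dag processor is enabled, which is the isolation property the paragraph above describes.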

You can read more about the different types of users, how they interact with Airflow, and what the
security model of Airflow looks like in the :doc:`/security/security_model`.

Workloads
---------

Binary file removed docs/apache-airflow/img/arch-diag-basic.png
Binary file not shown.
22 changes: 17 additions & 5 deletions docs/apache-airflow/security/security_model.rst
@@ -15,9 +15,6 @@
specific language governing permissions and limitations
under the License.
.. contents::
:local:

Airflow Security Model
======================

@@ -32,8 +29,23 @@ security reports are handled by the security team of Airflow, head to
Airflow security model - user types
-----------------------------------

The Airflow security model involves different types of users with
varying access and capabilities:
The Airflow security model involves different types of users with varying access and capabilities:

While - in smaller installations - all the actions related to Airflow can be performed by a single user,
in larger installations it is apparent that there are different responsibilities, roles and
capabilities that need to be separated.

This is why Airflow has the following user types:

* Deployment Managers - overall responsible for the Airflow installation, security and configuration
* Authenticated UI users - users that can access Airflow UI and API and interact with it
* DAG Authors - responsible for creating DAGs and submitting them to Airflow

You can see more on how the user types influence Airflow's architecture in :doc:`/core-concepts/overview`,
including diagrams of less and more complex deployments.




Deployment Managers
...................