These are instructions for running data ingestion tasks through Apache Airflow as an orchestrator, to improve the transparency and traceability of the ingestion process.
- Airflow: Rory Sie ([email protected])
- Event Hubs producer/consumer: Bas Jongewaard ([email protected])
- Install Apache Airflow, following the quick start at https://airflow.apache.org/docs/apache-airflow/stable/start.html:
pip install apache-airflow
- If you keep your DAGs in a folder other than ~/airflow/dags, follow these steps to register several DAG folders: https://medium.com/@xnuinside/how-to-load-use-several-dag-folders-airflow-dagbags-b93e4ef4663c (or edit misc\add_dagbags.py, move it to ~/airflow/dags, and run python3 ~/airflow/dags/add_dagbags.py). A minimal sketch of the loader pattern follows below.
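For reference, this is the pattern the linked article describes and that a script like add_dagbags.py builds on; the extra folder path below is a hypothetical placeholder:

```python
# Minimal sketch of a DAG-folder loader, placed in ~/airflow/dags.
# The extra folder path is a hypothetical placeholder.
from airflow.models import DagBag

# Parse DAG files from an additional folder...
extra_dag_bag = DagBag(dag_folder="/path/to/extra/dags")

# ...and expose each parsed DAG in this module's globals, so the
# scheduler (which only scans ~/airflow/dags) can pick them up.
for dag_id, dag in extra_dag_bag.dags.items():
    globals()[dag_id] = dag
```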
- Start the Gremlin server:
- Open a cmd window and start the Azure Cosmos DB Emulator with the Gremlin endpoint enabled:
cd C:\Program Files\Azure Cosmos DB Emulator
Microsoft.Azure.Cosmos.Emulator.exe /EnableGremlinEndpoint
- Then start the Gremlin console and connect it to the server:
cd /d C:\sdk\apache-tinkerpop-gremlin-console-3.3.4-bin\apache-tinkerpop-gremlin-console-3.3.4
bin\gremlin.bat
:remote connect tinkerpop.server conf/remote-localcompute.yaml
:remote console
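If you want to talk to the Gremlin endpoint from Python rather than the console, a client along these lines should work. This is a minimal sketch: the port (8901 is the emulator's default Gremlin port) is an assumption, and the database, graph, and key values are placeholders:

```python
from gremlin_python.driver import client, serializer

# Minimal sketch: connect to the local Cosmos DB emulator's Gremlin endpoint.
# Port 8901 is assumed; database/graph names and key are placeholders.
gremlin_client = client.Client(
    "ws://localhost:8901/",
    "g",
    username="/dbs/<your-database>/colls/<your-graph>",
    password="<your-emulator-key>",
    message_serializer=serializer.GraphSONSerializersV2d0(),  # Cosmos DB expects GraphSON 2.0
)

# Simple smoke test: count the vertices in the graph.
print(gremlin_client.submit("g.V().count()").all().result())
gremlin_client.close()
```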
- Ensure you have a DAG specified, for instance dags\airflow-dag.py. A minimal sketch of such a file is shown below.
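A minimal sketch of what such a DAG file might contain, assuming Airflow 2.x import paths; the dag_id, schedule, and task are illustrative placeholders:

```python
# dags/airflow-dag.py -- minimal sketch, assuming Airflow 2.x import paths.
# The dag_id, schedule, and task below are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="data_ingestion",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # run only when triggered from the UI or CLI
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest",
        bash_command="echo 'ingest data here'",
    )
```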
- Start the webserver in a terminal:
$ airflow webserver -p 8080
- In a second terminal, start the scheduler (without it, triggered DAG runs will not actually execute):
$ airflow scheduler
- Open the UI at http://localhost:8080, select the DAG, and click Trigger (the 'play' button). Now watch your workflow run.
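- Alternatively, a DAG run can be triggered from the command line: airflow dags trigger <dag_id> on Airflow 2.x, or airflow trigger_dag <dag_id> on older 1.10.x releases.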