Airflow Tutorial and Workflow Development #9

Open
13 tasks
Jan-Willem opened this issue Dec 3, 2024 · 0 comments

Objective

Familiarize the team with Apache Airflow by completing tutorials, developing a simple Airflow workflow based on the structure in Figure 1 of "An example RADPS Workflow Decomposition", testing its reliability and fault tolerance, and summarizing key learnings in a presentation.


Requirements

  1. Explore and decide on suitable Airflow tutorials from the official Airflow documentation or community resources.
  2. Complete the selected tutorials to understand key concepts, including:
    • Tasks and DAGs (Directed Acyclic Graphs)
    • Task dependencies
    • Scheduling
    • Error handling and retries
  3. Develop an Airflow DAG replicating the structure of Figure 1 in "An example RADPS Workflow Decomposition".
    • Tasks can be no-ops or simple sleep operations.
    • The DAG should be run locally to verify execution.
    • Include at least one conditional loop or branching task to demonstrate dynamic behavior.
  4. Implement testing for:
    • Reliability: Ensure the DAG is robust by testing its ability to recover from failures (e.g., retries, alerting).
    • Fault Tolerance: Introduce simulated failures (e.g., task failures) to verify how the DAG handles and recovers from errors.
  5. Create slides summarizing:
    • Key learnings from the tutorials.
    • Implementation details of the DAG, including how reliability and fault tolerance were incorporated.
    • How conditional loops or task branching were implemented and their impact.

Definition of Done

Tutorial Completion

  • Airflow tutorials are identified, agreed upon, and completed by all team members.
  • Key learnings and notes from the tutorials are shared with the team.

DAG Development

  • An Airflow DAG mimicking the structure in Figure 1 is created.
  • Tasks in the DAG use placeholders (e.g., no-op or sleep tasks) to demonstrate flow structure.
  • At least one conditional loop or branching task is added to the DAG to illustrate dynamic behavior.
  • The DAG is successfully executed in a local environment.
  • The DAG includes basic reliability and fault tolerance features, such as retries and failure notifications.
  • Simulated task failures are tested, and the DAG handles them as expected (e.g., retrying or continuing on error).
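
One possible way to simulate task failures is a callable that fails on its first few tries and then succeeds. The sketch below is plain Python with no Airflow dependency, so the retry behavior can be checked in isolation; `fail_until` and `run_with_retries` are hypothetical helper names. The failure is keyed on the try number rather than an in-memory counter, on the assumption that each Airflow retry may run in a fresh process where a counter would be reset.

```python
def fail_until(try_number, succeed_on):
    """Raise on every try before `succeed_on`, then return a success message."""
    if try_number < succeed_on:
        raise RuntimeError(f"simulated failure on try {try_number}")
    return f"succeeded on try {try_number}"


def run_with_retries(task, retries):
    """Call `task(try_number)` once, plus up to `retries` further tries.

    This loop is only a local stand-in for a scheduler's retry handling.
    """
    last_error = None
    for try_number in range(1, retries + 2):
        try:
            return task(try_number)
        except RuntimeError as err:
            last_error = err
    raise last_error


# Fails on tries 1 and 2, succeeds on try 3, so two retries are enough.
result = run_with_retries(lambda n: fail_until(n, succeed_on=3), retries=2)
print(result)
```

Inside an Airflow task, the same function could presumably be driven by the task instance's try number from the task context, with the task's `retries` setting taking the place of `run_with_retries`.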

Presentation

  • Slides are created summarizing:
    • Core concepts learned during the tutorials.
    • The design and implementation of the DAG, including reliability and fault tolerance strategies.
    • Insights on using conditional loops or task branching in Airflow.

Key Decision Points

  1. Tutorial Selection:
    • Choose relevant tutorials from Airflow's documentation or community guides.
  2. Figure 1 Workflow Interpretation:
    • Agree on the structure and task definitions for the DAG based on Figure 1.
  3. Conditional Task Design:
    • Decide how and where conditional tasks or task branching should be integrated into the DAG.
  4. Testing Reliability and Fault Tolerance:
    • Determine how to test and incorporate failure recovery (e.g., retries) and fault tolerance (e.g., notifications) in the DAG.
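
When agreeing on the structure (decision point 2), it may help to write the task dependencies down as data before any Airflow code exists. The sketch below uses the standard-library `graphlib` module (Python 3.9+) to confirm that a proposed structure is acyclic and to print one valid execution order. The task names are hypothetical placeholders, not the actual nodes of Figure 1.

```python
from graphlib import CycleError, TopologicalSorter

# Maps each task to the set of tasks it depends on (hypothetical names).
dependencies = {
    "start": set(),
    "branch": {"start"},
    "fast_path": {"branch"},
    "slow_path": {"branch"},
    "join": {"fast_path", "slow_path"},
}


def execution_order(deps):
    """Return one valid execution order; raises CycleError if deps contain a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)

# A back edge (for example, an attempted loop) is rejected as a cycle.
looped = dict(dependencies, start={"join"})
try:
    execution_order(looped)
except CycleError:
    print("proposed structure contains a cycle and cannot be a DAG")
```

A check like this makes it visible early that any "conditional loop" in the workflow has to be expressed without a back edge, for example as a branch or as a repeated run.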

Artifacts

  • Notes and resources from completed tutorials.
  • A functional Airflow DAG script with reliability and fault tolerance features.
  • Logs and outputs from local DAG execution, including tests for failure handling.
  • Slide deck summarizing key learnings and implementation details.

Success Criteria

  • The team has a foundational understanding of Airflow's core features, including reliability and fault tolerance.
  • A working Airflow DAG, including conditional loops or task branching and failure handling, is successfully run locally.
  • A comprehensive slide deck effectively communicates the learnings and DAG design.
  • The team is equipped to design and implement more complex workflows in the future, with robust testing for reliability and fault tolerance.

Potential Challenges

  • Identifying tutorials that are both comprehensive and time-efficient.
  • Correctly interpreting and implementing Figure 1's DAG structure.
  • Troubleshooting environment or execution issues during local DAG testing.
  • Implementing conditional task branching correctly and ensuring its functionality in the DAG.
  • Simulating task failures and ensuring that the DAG can recover as expected.