A curated list of awesome open-source tools and commercial products that help you manage machine learning and data science workflows and pipelines.
- Argo: Open-source, container-native workflow engine for orchestrating parallel jobs on Kubernetes.
- Airflow: A platform created by the community to programmatically author, schedule and monitor workflows.
- Beam: A unified programming model for batch and streaming data processing.
- ClearML: Auto-Magical CI/CD to streamline your ML workflow.
- CML: Open-source library for implementing CI/CD in machine learning projects.
- Couler: Unified interface for constructing and managing workflows on different workflow engines.
- Dagster: A data orchestrator for machine learning, analytics, and ETL.
- Flyte: Makes it easy to create concurrent, scalable, and maintainable workflows for machine learning.
- Kale: Aims to simplify the data science experience of deploying Kubeflow Pipelines workflows.
- Kedro: Library that implements software engineering best practices for data and ML pipelines.
- Kubeflow Pipelines: Machine learning pipelines for Kubeflow.
- Luigi: Python module that helps you build complex pipelines of batch jobs.
- Metaflow: Human-friendly library that helps scientists and engineers build and manage data science projects.
- MLRun: Generic mechanism for data scientists to build, run, and monitor ML tasks and pipelines.
- Orchest: Build data pipelines, the easy way.
- Ploomber: Write maintainable, production-ready pipelines. Develop locally, deploy to the cloud.
- Polyaxon Automation: Container-native engine for running machine learning pipelines.
- Prefect: A workflow management system designed for modern infrastructure.
- ZenML: An extensible open-source MLOps framework to create reproducible pipelines.