diff --git a/contents/core/ops/ops.qmd b/contents/core/ops/ops.qmd index 1018d76a..14b49cce 100644 --- a/contents/core/ops/ops.qmd +++ b/contents/core/ops/ops.qmd @@ -10,7 +10,7 @@ Resources: [Slides](#sec-embedded-aiops-resource), [Videos](#sec-embedded-aiops- ![_DALL·E 3 Prompt: Create a detailed, wide rectangular illustration of an AI workflow. The image should showcase the process across six stages, with a flow from left to right: 1. Data collection, with diverse individuals of different genders and descents using a variety of devices like laptops, smartphones, and sensors to gather data. 2. Data processing, displaying a data center with active servers and databases with glowing lights. 3. Model training, represented by a computer screen with code, neural network diagrams, and progress indicators. 4. Model evaluation, featuring people examining data analytics on large monitors. 5. Deployment, where the AI is integrated into robotics, mobile apps, and industrial equipment. 6. Monitoring, showing professionals tracking AI performance metrics on dashboards to check for accuracy and concept drift over time. Each stage should be distinctly marked and the style should be clean, sleek, and modern with a dynamic and informative color scheme._](images/png/cover_ml_ops.png) -This chapter explores the practices and architectures needed to effectively develop, deploy, and manage ML models across their entire lifecycle. We examine the various phases of the ML process, including data collection, model training, evaluation, deployment, and monitoring. The importance of automation, collaboration, and continuous improvement is also something we discuss. We contrast different environments for ML model deployment, from cloud servers to embedded edge devices, and analyze their distinct constraints. We demonstrate how to tailor ML system design and operations through concrete examples for reliable and optimized model performance in any target environment. The goal is to provide readers with a comprehensive understanding of ML model management so they can successfully build and run ML applications that sustainably deliver value. +In this chapter, we will dive into the practices and frameworks needed to successfully develop, deploy, and manage machine learning models from start to finish. You will learn about each stage in the ML workflow, from data collection and model training to evaluation, deployment, and ongoing monitoring. We will discuss the role of automation, collaboration, and continuous improvement, highlighting why they are essential for keeping ML systems efficient and reliable. We will also explore different deployment environments, from powerful cloud servers to resource-limited edge devices, looking at the unique challenges each presents. Through concrete examples, you will see how to design and operate ML systems that deliver consistent, reliable performance, no matter where they are deployed. By the end of this chapter, you will have a solid grasp of ML model management and be ready to build and maintain ML applications that provide lasting value. ::: {.callout-tip} @@ -111,7 +111,7 @@ Learn more about ML Lifecycles through a case study featuring speech recognition ## Key Components of MLOps -In this chapter, we will provide an overview of the core components of MLOps, an emerging set of practices that enables robust delivery and lifecycle management of ML models in production.
While some MLOps elements like automation and monitoring were covered in previous chapters, we will integrate them into a framework and expand on additional capabilities like governance. Additionally, we will describe and link to popular tools used within each component, such as [LabelStudio](https://labelstud.io/) for data labeling. By the end, we hope that you will understand the end-to-end MLOps methodology that takes models from ideation to sustainable value creation within organizations. +The core components of MLOps form a comprehensive framework that supports the end-to-end lifecycle of ML models in production, from initial development to deployment and ongoing management. In this section, we build on topics like automation and monitoring from previous chapters, integrating them into a broader framework while also introducing additional key practices like governance. Each component contributes to smoother, more streamlined ML operations, with popular tools helping teams tackle specific tasks within this ecosystem. Together, these elements make MLOps a robust approach to managing ML models and creating long-term value within organizations. @fig-ops-layers illustrates the comprehensive MLOps system stack. It shows the various layers involved in machine learning operations. At the top of the stack are ML Models/Applications, such as BERT, followed by ML Frameworks/Platforms like PyTorch. The core MLOps layer, labeled as Model Orchestration, encompasses several key components: Data Management, CI/CD, Model Training, Model Evaluation, Deployment, and Model Serving. Underpinning the MLOps layer is the Infrastructure layer, represented by technologies such as Kubernetes. This layer manages aspects such as Job Scheduling, Resource Management, Capacity Management, and Monitoring, among others. Holding it all together is the Hardware layer, which provides the necessary computational resources for ML operations. @@ -121,19 +121,15 @@ This layered approach in @fig-ops-layers demonstrates how MLOps integrates vario ### Data Management {#sec-ops-data-mgmt} -Robust data management and data engineering actively empower successful [MLOps](https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning) implementations. Teams properly ingest, store, and prepare raw data from sensors, databases, apps, and other systems for model training and deployment. +Data in its raw form, whether collected from sensors, databases, apps, or other systems, often requires significant preparation before it can be used for training or inference. Issues like inconsistent formats, missing values, and evolving labeling conventions can lead to inefficiencies and poor model performance if not systematically addressed. Robust data management practices ensure that data remains high quality, traceable, and readily accessible throughout the ML lifecycle, forming the foundation of scalable machine learning systems. -Teams actively track changes to datasets over time using version control with [Git](https://git-scm.com/) and tools like [GitHub](https://github.com/) or [GitLab](https://about.gitlab.com/). Data scientists collaborate on curating datasets by merging changes from multiple contributors. Teams can review or roll back each iteration of a dataset if needed. +One key aspect of data management is version control. 
Tools like [Git](https://git-scm.com/), [GitHub](https://github.com/), and [GitLab](https://about.gitlab.com/) enable teams to track changes to datasets, collaborate on curation, and revert to earlier versions when necessary. Alongside versioning, annotating and labeling datasets is crucial for supervised learning tasks. Software like [LabelStudio](https://labelstud.io/) helps distributed teams tag data consistently across large-scale datasets while maintaining access to earlier versions as labeling conventions evolve. These practices not only enhance collaboration but also ensure that models are trained on reliable, well-organized data. -Teams meticulously label and annotate data using labeling software like [LabelStudio](https://labelstud.io/), which enables distributed teams to work on tagging datasets together. As the target variables and labeling conventions evolve, teams maintain accessibility to earlier versions. +Once prepared, datasets are typically stored on scalable cloud storage solutions like [Amazon S3](https://aws.amazon.com/s3/) or [Google Cloud Storage](https://cloud.google.com/storage). These services provide versioning, resilience, and granular access controls, safeguarding sensitive data while maintaining flexibility for analysis and modeling. To streamline the transition from raw data to analysis-ready formats, teams build automated pipelines using tools such as [Prefect](https://www.prefect.io/), [Apache Airflow](https://airflow.apache.org/), and [dbt](https://www.getdbt.com/). These pipelines automate tasks like data extraction, cleaning, deduplication, and transformation, reducing manual overhead and improving efficiency. -Teams store the raw dataset and all derived assets on cloud storage services like [Amazon S3](https://aws.amazon.com/s3/) or [Google Cloud Storage](https://cloud.google.com/storage). These services provide scalable, resilient storage with versioning capabilities. Teams can set granular access permissions. +For example, a data pipeline might ingest information from [PostgreSQL](https://www.postgresql.org/) databases, REST APIs, and CSV files stored in S3, applying transformations to produce clean, aggregated datasets. The output can be stored in feature stores like [Tecton](https://www.tecton.ai/) or [Feast](https://feast.dev/), which provide low-latency access for both training and predictions. In an industrial predictive maintenance scenario, sensor data could be processed alongside maintenance records, resulting in enriched datasets stored in Feast so that models can seamlessly access the latest information; a minimal sketch of such a pipeline is shown below. -Robust data pipelines created by teams automate raw data extraction, joining, cleansing, and transformation into analysis-ready datasets. [Prefect](https://www.prefect.io/), [Apache Airflow](https://airflow.apache.org/), and [dbt](https://www.getdbt.com/) are workflow orchestrators that allow engineers to develop flexible, reusable data processing pipelines. - -For instance, a pipeline may ingest data from [PostgreSQL](https://www.postgresql.org/) databases, REST APIs, and CSVs stored on S3. It can filter, deduplicate, and aggregate the data, handle errors, and save the output to S3. The pipeline can also push the transformed data into a feature store like [Tecton](https://www.tecton.ai/) or [Feast](https://feast.dev/) for low-latency access. - -In an industrial predictive maintenance use case, sensor data is ingested from devices into S3. A perfect pipeline processes the sensor data, joining it with maintenance records. The enriched dataset is stored in Feast so models can easily retrieve the latest data for training and predictions.
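+A minimal sketch of such a pipeline, written with [Prefect](https://www.prefect.io/) (one of the orchestrators named above), might look like the following. The file paths, column names, and aggregation logic are illustrative assumptions, not a prescribed design:
+
+```python
+from pathlib import Path
+
+import pandas as pd
+from prefect import flow, task
+
+
+@task
+def extract(raw_path: str) -> pd.DataFrame:
+    # Read raw sensor readings exported as CSV (hypothetical schema).
+    return pd.read_csv(raw_path)
+
+
+@task
+def transform(df: pd.DataFrame) -> pd.DataFrame:
+    # Drop incomplete rows, remove duplicate readings, and aggregate per device.
+    df = df.dropna().drop_duplicates(subset=["device_id", "timestamp"])
+    return df.groupby("device_id", as_index=False)["vibration"].mean()
+
+
+@task
+def load(df: pd.DataFrame, out_path: str) -> None:
+    # Persist the analysis-ready dataset; a production pipeline might instead
+    # write to S3 or materialize features into a feature store like Feast.
+    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
+    df.to_csv(out_path, index=False)
+
+
+@flow
+def sensor_pipeline(raw_path: str = "data/raw/sensors.csv",
+                    out_path: str = "data/clean/sensor_features.csv") -> None:
+    load(transform(extract(raw_path)), out_path)
+
+
+if __name__ == "__main__":
+    sensor_pipeline()
+```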
+By integrating version control, annotation tools, storage solutions, and automated pipelines, data management becomes a critical enabler for effective [MLOps](https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning). These practices ensure that data is not only clean and accessible but also consistently aligned with evolving project needs, allowing machine learning systems to deliver reliable and scalable performance in production environments. @vid-datapipe below is a short overview of data pipelines. @@ -161,55 +157,43 @@ CI/CD pipelines empower teams to iterate and deliver ML models rapidly by connec ### Model Training -In the model training phase, data scientists actively experiment with different ML architectures and algorithms to create optimized models that extract insights and patterns from data. MLOps introduces best practices and automation to make this iterative process more efficient and reproducible. - -Modern ML frameworks like [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/) and [Keras](https://keras.io/) provide pre-built components that simplify designing neural networks and other model architectures. Data scientists leverage built-in modules for layers, activations, losses, etc., and high-level APIs like Keras to focus more on model architecture. - -MLOps enables teams to package model training code into reusable, tracked scripts and notebooks. As models are developed, capabilities like [hyperparameter tuning](https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview), [neural architecture search](https://arxiv.org/abs/1808.05377) and [automatic feature selection](https://scikit-learn.org/stable/modules/feature_selection.html) rapidly iterate to find the best-performing configurations. +Model training is a critical phase where data scientists experiment with various ML architectures and algorithms to build optimized models that extract insights and patterns from data. MLOps introduces best practices and automation to make this iterative process more efficient and reproducible. Modern ML frameworks like [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), and [Keras](https://keras.io/) provide pre-built components that simplify designing neural networks and other model architectures. These tools allow data scientists to focus on creating high-performing models using built-in modules for layers, activations, and loss functions. -Teams use Git to version control training code and host it in repositories like GitHub to track changes over time. This allows seamless collaboration between data scientists. +To make the training process efficient and reproducible, MLOps introduces best practices such as version-controlling training code using Git and hosting it in repositories like GitHub. Reproducible environments, often managed through interactive tools like [Jupyter](https://jupyter.org/) notebooks, allow teams to bundle data ingestion, preprocessing, model development, and evaluation in a single document. These notebooks are not only version-controlled but can also be integrated into automated pipelines for continuous retraining, as the sketch below illustrates. -Notebooks like [Jupyter](https://jupyter.org/) create an excellent interactive model development environment. The notebooks contain data ingestion, preprocessing, model declaration, training loop, evaluation, and export code in one reproducible document.
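+To make this concrete, here is a minimal, hedged sketch of the kind of training script such a pipeline might version and re-run. The seeded PyTorch loop and the synthetic two-feature dataset are assumptions for illustration; real projects would load their own data and model:
+
+```python
+import torch
+from torch import nn
+
+
+def train(seed: int = 42, epochs: int = 20) -> nn.Module:
+    # Fixing the seed makes re-runs of the script reproduce the same weights.
+    torch.manual_seed(seed)
+
+    # Synthetic stand-in for real training data (hypothetical).
+    X = torch.randn(256, 2)
+    y = (X.sum(dim=1, keepdim=True) > 0).float()
+
+    model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
+    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
+    loss_fn = nn.BCEWithLogitsLoss()
+
+    for _ in range(epochs):
+        optimizer.zero_grad()
+        loss = loss_fn(model(X), y)
+        loss.backward()
+        optimizer.step()
+
+    # Persist the weights so a retraining pipeline can pick up the artifact.
+    torch.save(model.state_dict(), "model.pt")
+    return model
+
+
+if __name__ == "__main__":
+    train()
+```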
+Automation plays a significant role in standardizing training workflows. Capabilities such as [hyperparameter tuning](https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview), [neural architecture search](https://arxiv.org/abs/1808.05377), and [automatic feature selection](https://scikit-learn.org/stable/modules/feature_selection.html) are commonly integrated into MLOps pipelines to iterate rapidly and find optimal configurations. CI/CD pipelines orchestrate training workflows by automating tasks like data preprocessing, model training, evaluation, and registration. For example, a Jenkins pipeline can trigger a Python script to retrain a TensorFlow model, validate its performance against pre-defined metrics, and deploy it if thresholds are met. -Finally, teams orchestrate model training as part of a CI/CD pipeline for automation. For instance, a Jenkins pipeline can trigger a Python script to load new training data, retrain a TensorFlow classifier, evaluate model metrics, and automatically register the model if performance thresholds are met. +Cloud-managed training services have revolutionized the accessibility of high-performance hardware for training models. These services provide on-demand access to GPU-accelerated infrastructure, making advanced training feasible even for small teams. Depending on the provider, developers may manage the training workflow themselves or rely on fully managed options like [Vertex AI Fine Tuning](https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models), which can automatically fine-tune a base model using a labeled dataset. However, it is important to note that GPU hardware demand often exceeds supply, and availability may vary based on region or contractual agreements, posing potential bottlenecks for teams relying on cloud services. An example workflow has a data scientist using a PyTorch notebook to develop a CNN model for image classification. The [fastai](https://www.fast.ai/) library provides high-level APIs to simplify training CNNs on image datasets. The notebook trains the model on sample data, evaluates accuracy metrics, and tunes hyperparameters like learning rate and layers to optimize performance. This reproducible notebook is version-controlled and integrated into a retraining pipeline. -Automating and standardizing model training empowers teams to accelerate experimentation and achieve the rigor needed to produce ML systems. +By automating and standardizing model training, leveraging managed cloud services, and integrating modern frameworks, teams can accelerate experimentation and build robust, production-ready ML models. ### Model Evaluation -Before deploying models, teams perform rigorous evaluation and testing to validate meeting performance benchmarks and readiness for release. MLOps introduces best practices around model validation, auditing, and [canary testing](https://martinfowler.com/bliki/CanaryRelease.html). +Before deploying models, teams perform rigorous evaluation and testing to validate that they meet performance benchmarks and are ready for release. MLOps provides best practices for model validation, auditing, and controlled testing methods to minimize risks during deployment. -Teams typically evaluate models against holdout [test datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) that are not used during training.
The test data originates from the same distribution as production data. Teams calculate metrics like [accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision), [AUC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve), [precision](https://en.wikipedia.org/wiki/Precision_and_recall), [recall](https://en.wikipedia.org/wiki/Precision_and_recall), and [F1 score](https://en.wikipedia.org/wiki/F1_score). +The evaluation process begins with testing models against holdout [test datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) that are independent of the training data but originate from the same distribution as production data. Key metrics such as [accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision), [AUC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve), [precision](https://en.wikipedia.org/wiki/Precision_and_recall), [recall](https://en.wikipedia.org/wiki/Precision_and_recall), and [F1 score](https://en.wikipedia.org/wiki/F1_score) are calculated to quantify model performance. Tracking these metrics over time helps teams identify trends and potential degradation in model behavior, particularly when evaluation data comes from live production streams. This is vital for detecting [data drift](https://www.ibm.com/cloud/learn/data-drift), where changes in input data distributions can erode model accuracy. -Teams also track the same metrics over time against test data samples. If evaluation data comes from live production streams, this catches [data drifts](https://www.ibm.com/cloud/learn/data-drift) that degrade model performance over time. +To validate real-world performance, [canary testing](https://martinfowler.com/bliki/CanaryRelease.html) deploys the model to a small subset of users. This gradual rollout allows teams to monitor metrics in a live environment and catch potential issues before full-scale deployment. By incrementally increasing traffic to the new model, teams can confidently evaluate its impact on end-user experience. For instance, a retailer might test a personalized recommendation model by comparing its accuracy and diversity metrics against historical data. During the testing phase, the team tracks live performance metrics and identifies a slight accuracy decline over two weeks. To ensure stability, the model is initially deployed to 5% of web traffic, monitored for potential issues, and only rolled out widely after proving robust in production. -Human oversight for model release remains important. Data scientists review performance across key segments and slices. Error analysis helps identify model weaknesses to guide enhancement. Teams apply [fairness](https://developers.google.com/machine-learning/fairness-overview) and [bias detection](https://developers.google.com/machine-learning/fairness-overview) techniques. +ML models deployed to the cloud benefit from constant internet connectivity and the ability to log every request and response. This makes it feasible to replay or generate synthetic requests for comparing different models and versions. Some providers offer tools that automate parts of the evaluation process, such as tracking hyperparameter experiments or comparing model runs. For instance, platforms like [Weights and Biases](https://wandb.ai/) streamline this process by automating experiment tracking and generating artifacts from training runs. 
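+As a concrete sketch of the metric tracking described above, the snippet below scores a model's holdout predictions with scikit-learn (an assumed library choice; the labels and scores are placeholder values):
+
+```python
+from sklearn.metrics import (accuracy_score, f1_score, precision_score,
+                             recall_score, roc_auc_score)
+
+# Placeholder holdout labels, hard predictions, and predicted probabilities.
+y_true = [0, 1, 1, 0, 1, 0, 1, 1]
+y_pred = [0, 1, 0, 0, 1, 0, 1, 1]
+y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.3, 0.7, 0.6]
+
+metrics = {
+    "accuracy": accuracy_score(y_true, y_pred),
+    "precision": precision_score(y_true, y_pred),
+    "recall": recall_score(y_true, y_pred),
+    "f1": f1_score(y_true, y_pred),
+    # AUC is computed from scores or probabilities, not hard labels.
+    "auc": roc_auc_score(y_true, y_prob),
+}
+
+# Logging these values per release, or per day on live traffic,
+# is what makes gradual drift in model performance visible.
+for name, value in metrics.items():
+    print(f"{name}: {value:.3f}")
+```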
-Canary testing releases a model to a small subset of users to evaluate real-world performance before wide deployment. Teams incrementally route traffic to the canary release while monitoring for issues. - -For example, a retailer evaluates a personalized product recommendation model against historical test data, reviewing accuracy and diversity metrics. Teams also calculate metrics on live customer data over time, detecting decreased accuracy over the last 2 weeks. Before full rollout, the new model is released to 5% of web traffic to ensure no degradation. - -Automating evaluation and canary releases reduces deployment risks. However, human review still needs to be more critical to assess less quantifiable dynamics of model behavior. Rigorous pre-deployment validation provides confidence in putting models into production. +Automating evaluation and testing processes, combined with careful canary testing, reduces deployment risks. While automated evaluation processes catch many issues, human oversight remains essential for reviewing performance across specific data segments and identifying subtle weaknesses. This combination of rigorous pre-deployment validation and real-world testing provides teams with confidence when putting models into production. ### Model Deployment Teams need to properly package, test, and track ML models to reliably deploy them to production. MLOps introduces frameworks and procedures for actively versioning, deploying, monitoring, and updating models in sustainable ways. -Teams containerize models using [Docker](https://www.docker.com/), which bundles code, libraries, and dependencies into a standardized unit. Containers enable smooth portability across environments. - -Frameworks like [TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving) and [BentoML](https://bentoml.org/) help serve predictions from deployed models via performance-optimized APIs. These frameworks handle versioning, scaling, and monitoring. +One common approach to deployment involves containerizing models using tools like [Docker](https://www.docker.com/), which package code, libraries, and dependencies into standardized units. Containers ensure smooth portability across environments, making deployment consistent and predictable. Frameworks like [TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving) and [BentoML](https://bentoml.org/) help serve predictions from deployed models via performance-optimized APIs. These frameworks handle versioning, scaling, and monitoring. -Teams first deploy updated models to staging or QA environments for testing before full production rollout. Shadow or canary deployments route a sample of traffic to test model variants. Teams incrementally increase access to new models. +Before full-scale rollout, teams deploy updated models to staging or QA environments to rigorously test performance. Techniques such as shadow or canary deployments are used to validate new models incrementally. For instance, canary deployments route a small percentage of traffic to the new model while closely monitoring performance. If no issues arise, traffic to the new model gradually increases. Robust rollback procedures are essential to handle unexpected issues, reverting systems to the previous stable model version to ensure minimal disruption. Integration with CI/CD pipelines further automates the deployment and rollback process, enabling efficient iteration cycles. -Teams build robust rollback procedures in case issues emerge. 
Rollbacks revert to the last known good model version. Integration with CI/CD pipelines simplifies redeployment if needed. +To maintain lineage and auditability, teams track model artifacts, including scripts, weights, logs, and metrics, using tools like [MLflow](https://mlflow.org/). Model registries, such as [Vertex AI's model registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction), act as centralized repositories for storing and managing trained models. These registries not only facilitate version comparisons but also often include access to base models, which may be open source, proprietary, or a hybrid (e.g., [LLAMA](https://ai.meta.com/llama/)). Deploying a model from the registry to an inference endpoint is streamlined, handling resource provisioning, model weight downloads, and hosting. -Teams carefully track model artifacts, such as scripts, weights, logs, and metrics, for each version with ML metadata tools like [MLflow](https://mlflow.org/). This maintains lineage and auditability. +Inference endpoints typically expose the deployed model via REST APIs for real-time predictions. Depending on performance requirements, teams can configure resources, such as GPU accelerators, to meet latency and throughput targets. Some providers also offer flexible options like serverless or batch inference, eliminating the need for persistent endpoints and enabling cost-efficient, scalable deployments. For example, [AWS SageMaker Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html) supports such configurations. -For example, a retailer containerizes a product recommendation model in [TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving) and deploys it to a [Kubernetes](https://kubernetes.io/) staging cluster. After monitoring and approving performance on sample traffic, Kubernetes shifts 10% of production traffic to the new model. If no issues are detected after a few days, the new model takes over 100% of traffic. However, teams should keep the previous version accessible for rollback if needed. - -Model deployment processes enable teams to make ML systems resilient in production by accounting for all transition states. +By leveraging these tools and practices, teams can deploy ML models resiliently, ensuring smooth transitions between versions, maintaining production stability, and optimizing performance across diverse use cases. ### Model Serving @@ -248,7 +232,7 @@ Containers and orchestrators like Docker and Kubernetes allow teams to package m By leveraging cloud elasticity, teams scale resources up and down to meet spikes in workloads like hyperparameter tuning jobs or spikes in prediction requests. [Auto-scaling](https://aws.amazon.com/autoscaling/) enables optimized cost efficiency. -Infrastructure spans on-prem, cloud, and edge devices. A robust technology stack provides flexibility and resilience. Monitoring tools allow teams to observe resource utilization. +Infrastructure spans on-premises (on-prem), cloud, and edge devices. A robust technology stack provides flexibility and resilience. Monitoring tools allow teams to observe resource utilization. For example, a Terraform config may deploy a GCP Kubernetes cluster to host trained TensorFlow models exposed as prediction microservices. The cluster scales up pods to handle increased traffic. CI/CD integration seamlessly rolls out new model containers. 
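+To illustrate how applications consume such a prediction microservice, the sketch below calls the TensorFlow Serving REST API with Python's requests library; the host name, model name, and feature values are hypothetical:
+
+```python
+import requests
+
+# TensorFlow Serving exposes deployed models at /v1/models/<name>:predict.
+# "models.example.com" and "recommender" are placeholders for this sketch.
+url = "http://models.example.com:8501/v1/models/recommender:predict"
+payload = {"instances": [[0.2, 1.4, 3.1, 0.7]]}
+
+response = requests.post(url, json=payload, timeout=5.0)
+response.raise_for_status()
+
+# The service returns one prediction per instance submitted.
+predictions = response.json()["predictions"]
+print(predictions)
+```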
@@ -410,7 +394,7 @@ Although financial debt is a good metaphor for understanding tradeoffs, it diffe The [Hidden Technical Debt of Machine Learning Systems](https://papers.nips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf) paper spreads awareness of the nuances of ML system-specific tech debt. It encourages additional development in the broad area of maintainable ML. -## Roles and Responsibilities +## Roles and Responsibilities {#sec-roles-and_resp-ops} Given the vastness of MLOps, successfully implementing ML systems requires diverse skills and close collaboration between people with different areas of expertise. While data scientists build the core ML models, it takes cross-functional teamwork to successfully deploy these models into production environments and enable them to deliver sustainable business value. @@ -464,7 +448,7 @@ The ML engineering team enables data science models to progress smoothly into su ### DevOps Engineers -DevOps engineers enable MLOps by building and managing the underlying infrastructure for developing, deploying, and monitoring ML models. They provide the cloud architecture and automation pipelines. Their main responsibilities include: +DevOps engineers enable MLOps by building and managing the underlying infrastructure for developing, deploying, and monitoring ML models. As a specialized branch of software engineering, DevOps focuses on creating automation pipelines, cloud architecture, and operational frameworks. Their main responsibilities include: * Provisioning and managing cloud infrastructure for ML workflows using IaC tools like Terraform, Docker, and Kubernetes. * Developing CI/CD pipelines for model retraining, validation, and deployment. Integrating ML tools into the pipeline, such as MLflow and Kubeflow. @@ -492,56 +476,10 @@ For example, a project manager would create a project plan for developing and en Skilled project managers enable MLOps teams to work synergistically to rapidly deliver maximum business value from ML investments. Their leadership and organization align with diverse teams. -## Embedded System Challenges - -Building on our discussion of [On-device Learning](../ondevice_learning/ondevice_learning.qmd) in the previous chapter, we now turn our attention to the broader context of embedded systems in MLOps. The unique constraints and requirements of embedded environments significantly impact the implementation of machine learning models and operations. To set the stage for the specific challenges that emerge with embedded MLOps, it is important to first review the general challenges associated with embedded systems. This overview will provide a foundation for understanding how these constraints intersect with and shape the practices of MLOps in resource-limited, edge computing scenarios. - -### Limited Compute Resources - -Embedded devices like microcontrollers and mobile phones have much more constrained computing power than data center machines or GPUs. A typical microcontroller may have only KB of RAM, MHz CPU speed, and no GPU. For example, a microcontroller in a smartwatch may only have a 32-bit processor running at 120MHz with 320KB of RAM [@stm2021l4]. This allows simple ML models like small linear regressions or random forests, but more complex deep neural networks would be infeasible. Strategies to mitigate this include quantization, pruning, efficient model architectures, and offloading certain computations to the cloud when connectivity allows. 
- -### Constrained Memory - -Storing large ML models and datasets directly on embedded devices is often infeasible with limited memory. For example, a deep neural network model can easily take hundreds of MB, which exceeds the storage capacity of many embedded systems. Consider this example. A wildlife camera that captures images to detect animals may have only a 2GB memory card. More is needed to store a deep learning model for image classification that is often hundreds of MB in size. Consequently, this requires optimization of memory usage through weights compression, lower-precision numerics, and streaming inference pipelines. - -### Intermittent Connectivity - -Many embedded devices operate in remote environments without reliable internet connectivity. We must rely on something other than constant cloud access for convenient retraining, monitoring, and deployment. Instead, we need smart scheduling and caching strategies to optimize for intermittent connections. For example, a model predicting crop yield on a remote farm may need to make predictions daily but only have connectivity to the cloud once a week when the farmer drives into town. The model needs to operate independently in between connections. - -### Power Limitations - -Embedded devices like phones, wearables, and remote sensors are battery-powered. Continual inference and communication can quickly drain those batteries, limiting functionality. For example, a smart collar tagging endangered animals runs on a small battery. Continuously running a GPS tracking model would drain the battery within days. The collar has to schedule when to activate the model carefully. Thus, embedded ML has to manage tasks carefully to conserve power. Techniques include optimized hardware accelerators, prediction caching, and adaptive model execution. - -### Fleet Management - -For mass-produced embedded devices, millions of units can be deployed in the field to orchestrate updates. Hypothetically, updating a fraud detection model on 100 million (future smart) credit cards requires securely pushing updates to each distributed device rather than a centralized data center. Such a distributed scale makes fleet-wide management much harder than a centralized server cluster. It requires intelligent protocols for over-the-air updates, handling connectivity issues, and monitoring resource constraints across devices. - -### On-Device Data Collection - -Collecting useful training data requires engineering both the sensors on the device and the software pipelines. This is unlike servers, where we can pull data from external sources. Challenges include handling sensor noise. Sensors on an industrial machine detect vibrations and temperature to predict maintenance needs. This requires tuning the sensors and sampling rates to capture useful data. - -### Device-Specific Personalization - -A smart speaker learns an individual user's voice patterns and speech cadence to improve recognition accuracy while protecting privacy. Adapting ML models to specific devices and users is important, but this poses privacy challenges. On-device learning allows personalization without transmitting as much private data. However, balancing model improvement, privacy preservation, and constraints requires novel techniques. - -### Safety Considerations - -If extremely large embedded ML in systems like self-driving vehicles is not engineered carefully, there are serious safety risks. 
To ensure safe operation before deployment, self-driving cars must undergo extensive track testing in simulated rain, snow, and obstacle scenarios. This requires extensive validation, fail-safes, simulators, and standards compliance before deployment. - -### Diverse Hardware Targets - -There is a diverse range of embedded processors, including ARM, x86, specialized AI accelerators, FPGAs, etc. Supporting this heterogeneity makes deployment challenging. We need strategies like standardized frameworks, extensive testing, and model tuning for each platform. For example, an object detection model needs efficient implementations across embedded devices like a Raspberry Pi, Nvidia Jetson, and Google Edge TPU. - -### Testing Coverage - -Rigorously testing edge cases is difficult with constrained embedded simulation resources, but exhaustive testing is critical in systems like self-driving cars. Exhaustively testing an autopilot model requires millions of simulated kilometers, exposing it to rare events like sensor failures. Therefore, strategies like synthetic data generation, distributed simulation, and chaos engineering help improve coverage. - -### Concept Drift Detection - -With limited monitoring data from each remote device, detecting changes in the input data over time is much harder. Drift can lead to degraded model performance. Lightweight methods are needed to identify when retraining is necessary. A model predicting power grid loads shows declining performance as usage patterns change over time. With only local device data, this trend is difficult to spot. - ## Traditional MLOps vs. Embedded MLOps +Building on our discussion of [On-device Learning](../ondevice_learning/ondevice_learning.qmd) in the previous chapter, we now turn our attention to the broader context of embedded systems in MLOps. Embedded systems introduce unique challenges to MLOps due to their constrained resources, intermittent connectivity, and the need for efficient, power-aware computation. Unlike cloud environments with abundant compute and storage, embedded devices often operate with limited memory, power, and processing capabilities, requiring careful optimization of workflows. These limitations influence all aspects of MLOps, from deployment and data collection to monitoring and updates. + In traditional MLOps, ML models are typically deployed in cloud-based or server environments, with abundant resources like computing power and memory. These environments facilitate the smooth operation of complex models that require significant computational resources. For instance, a cloud-based image recognition model might be used by a social media platform to tag photos with relevant labels automatically. In this case, the model can leverage the extensive resources available in the cloud to efficiently process vast amounts of data. On the other hand, embedded MLOps involves deploying ML models on embedded systems, specialized computing systems designed to perform specific functions within larger systems. Embedded systems are typically characterized by their limited computational resources and power. For example, an ML model might be embedded in a smart thermostat to optimize heating and cooling based on the user's preferences and habits. The model must be optimized to run efficiently on the thermostat's limited hardware without compromising its performance or accuracy.
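+As a hedged sketch of what such optimization can look like, the snippet below applies post-training quantization with TensorFlow Lite, one common technique for shrinking models for constrained hardware; the tiny Keras network is a stand-in for a real thermostat model:
+
+```python
+import tensorflow as tf
+
+# A small stand-in model; a real workflow would load trained weights.
+model = tf.keras.Sequential([
+    tf.keras.Input(shape=(4,)),
+    tf.keras.layers.Dense(8, activation="relu"),
+    tf.keras.layers.Dense(1),
+])
+
+# Post-training quantization reduces model size and speeds up inference
+# on constrained devices, at the cost of some numerical precision.
+converter = tf.lite.TFLiteConverter.from_keras_model(model)
+converter.optimizations = [tf.lite.Optimize.DEFAULT]
+tflite_model = converter.convert()
+
+# The resulting flatbuffer is the artifact that would ship to the device.
+with open("thermostat_model.tflite", "wb") as f:
+    f.write(tflite_model)
+```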
@@ -589,7 +527,7 @@ The volume of aggregated data is much lower, often requiring techniques like fed Furthermore, the models must use simplified architectures optimized for low-power edge hardware. Given the computing limitations, high-end GPUs are inaccessible for intensive deep learning. Training leverages lower-powered edge servers and clusters with distributed approaches to spread load. -Transfer learning emerges as a crucial strategy to address data scarcity and irregularity in machine learning, particularly in edge computing scenarios. As illustrated in @fig-transfer-learning-mlops, this approach involves pre-training models on large public datasets and then fine-tuning them on limited domain-specific edge data. The figure depicts a neural network where initial layers (W_{A1} to W_{A4}), responsible for general feature extraction, are frozen (indicated by a green dashed line). These layers retain knowledge from previous tasks, accelerating learning and reducing resource requirements. The latter layers (W_{A5} to W_{A7}), beyond the blue dashed line, are fine-tuned for the specific task, focusing on task-specific feature learning. +Transfer learning emerges as a crucial strategy to address data scarcity and irregularity in machine learning, particularly in edge computing scenarios. As illustrated in @fig-transfer-learning-mlops, this approach involves pre-training models on large public datasets and then fine-tuning them on limited domain-specific edge data. The figure depicts a neural network where initial layers ($W_{A1}$ to $W_{A4}$), responsible for general feature extraction, are frozen (indicated by a green dashed line). These layers retain knowledge from previous tasks, accelerating learning and reducing resource requirements. The latter layers ($W_{A5}$ to $W_{A7}$), beyond the blue dashed line, are fine-tuned for the specific task, focusing on task-specific feature learning. ![Transfer learning in MLOps. Source: HarvardX.](images/png/transfer_learning.png){#fig-transfer-learning-mlops} @@ -721,35 +659,7 @@ Embedded MLOps governance must encompass privacy, security, safety, transparency So, while Embedded MLOps shares foundational MLOps principles, it faces unique constraints in tailoring workflows and infrastructure specifically for resource-constrained edge devices. -### Traditional MLOps - -Google, Microsoft, and Amazon offer their version of managed ML services. These include services that manage model training and experimentation, model hosting and scaling, and monitoring. These offerings are available via an API and client SDKs, as well as through web UIs. While it is possible to build your own end-to-end MLOps solutions using pieces from each, the greatest ease of use benefits come by staying within a single provider ecosystem to take advantage of interservice integrations. - -The following sections present a quick overview of the services that fit into each part of the MLOps life cycle described above, providing examples of offerings from different providers. It's important to note that the MLOps space is evolving rapidly; new companies and products are entering the scene at a swift pace. The examples mentioned are not meant to serve as endorsements of particular companies' offerings but rather to illustrate the types of solutions available in the market.
- -#### Data Management - -Data storage and versioning are table stakes for any commercial offering, and most take advantage of existing general-purpose storage solutions such as S3. Others use more specialized options such as git-based storage (Example: [Hugging Face's Dataset Hub](https://huggingface.co/datasets)). This is an area where providers make it easy to support their competitors' data storage options, as they don't want this to be a barrier for adoptions of the rest of their MLOps services. For example, Vertex AI's training pipeline seamlessly supports datasets stored in S3, Google Cloud Buckets, or Hugging Face's Dataset Hub. - -#### Model Training - -Managed training services are where cloud providers shine, as they provide on-demand access to hardware that is out of reach for most smaller companies. They bill only for hardware during training time, putting GPU-accelerated training within reach of even the smallest developer teams. The control developers have over their training workflow can vary widely depending on their needs. Some providers have services that provide little more than access to the resources and rely on the developer to manage the training loop, logging, and model storage themselves. Other services are as simple as pointing to a base model and a labeled data set to kick off a fully managed finetuning job (example: [Vertex AI Fine Tuning](https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models)). - -A word of warning: As of 2023, GPU hardware demand well exceeds supply, and as a result, cloud providers are rationing access to their GPUs. In some data center regions, GPUs may be unavailable or require long-term contracts. - -#### Model Evaluation - -Model evaluation tasks typically involve monitoring models' accuracy, latency, and resource usage in both the testing and production phases. Unlike embedded systems, ML models deployed to the cloud benefit from constant internet connectivity and unlimited logging capacities. As a result, it is often feasible to capture and log every request and response. This makes replaying or generating synthetic requests to compare different models and versions tractable. - -Some providers also offer services that automate the experiment tracking of modifying model hyperparameters. They track the runs and performance and generate artifacts from these model training runs. Example: [WeightsAndBiases](https://wandb.ai/) - -#### Model Deployment - -Each provider typically has a service referred to as a "model registry," where training models are stored and accessed. Often, these registries may also provide access to base models that are either open source or provided by larger technology companies (or, in some cases, like [LLAMA](https://ai.meta.com/llama/), both!). These model registries are a common place to compare all the models and their versions to allow easy decision-making on which to pick for a given use case. Example: [Vertex AI's model registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction) - -From the model registry, deploying a model to an inference endpoint is quick and simple, and it handles the resource provisioning, model weight downloading, and hosting of a given model. These services typically give access to the model via a REST API where inference requests can be sent. Depending on the model type, specific resources can be configured, such as which type of GPU accelerator may be needed to hit the desired performance. 
Some providers may also offer serverless inference or batch inference options that do not need a persistent endpoint to access the model. Example: [AWS SageMaker Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html) - -### Embedded MLOps +### Embedded MLOps Services Despite the proliferation of new MLOps tools in response to the increase in demand, the challenges described earlier have constrained the availability of such tools in embedded systems environments. More recently, new tools such as Edge Impulse [@janapa2023edge] have made the development process somewhat easier, as described below. diff --git a/contents/core/workflow/workflow.qmd b/contents/core/workflow/workflow.qmd index 25fd6cd2..41bf1b73 100644 --- a/contents/core/workflow/workflow.qmd +++ b/contents/core/workflow/workflow.qmd @@ -128,7 +128,7 @@ Understanding the various roles involved in an ML project is crucial for its suc : Roles and responsibilities of people involved in MLOps. {#tbl-mlops_roles .striped .hover} -As we proceed through the upcoming chapters, we will explore each role's essence and expertise and foster a deeper understanding of the complexities involved in AI projects. This holistic view facilitates seamless collaboration and nurtures an environment ripe for innovation and breakthroughs. +This holistic view facilitates seamless collaboration and nurtures an environment ripe for innovation and breakthroughs. As we proceed through the upcoming chapters, we will explore each role's essence and expertise and foster a deeper understanding of the complexities involved in AI projects. For a more detailed discussion of the specific tools and techniques these roles use, as well as an in-depth exploration of their responsibilities, refer to @sec-roles-and_resp-ops. ## Conclusion