fixed figure captions and references #205

Merged 2 commits on May 17, 2024
44 changes: 22 additions & 22 deletions contents/data_engineering/data_engineering.qmd

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions contents/dl_primer/dl_primer.qmd
@@ -31,9 +31,9 @@ This section briefly introduces deep learning, starting with an overview of its

### Definition and Importance

Deep learning, a specialized area within machine learning and artificial intelligence (AI), utilizes algorithms modeled after the structure and function of the human brain, known as artificial neural networks. This field is a foundational element in AI, driving progress in diverse sectors such as computer vision, natural language processing, and self-driving vehicles. Its significance in embedded AI systems is highlighted by its capability to handle intricate calculations and predictions, optimizing the limited resources in embedded settings.
Deep learning, a specialized area within machine learning and artificial intelligence (AI), utilizes algorithms modeled after the structure and function of the human brain, known as artificial neural networks. This field is a foundational element in AI, driving progress in diverse sectors such as computer vision, natural language processing, and self-driving vehicles. Its significance in embedded AI systems is highlighted by its capability to handle intricate calculations and predictions, optimizing the limited resources in embedded settings. @fig-ai-ml-dl illustrates the chronological development and relative segmentation of the three fields.

![[Source](https://1394217531-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LvBP1svpACTB1R1x_U4%2F-LvCh0IFvnfX-S1za_GI%2F-LvD0gbfAKEIMXcVxdqQ%2Fimage.png?alt=media&token=d6ca58f0-ebe3-4188-a90a-dc68256e1b0a)](images/png/ai_dl_progress_nvidia.png)
![Artificial intelligence subfields. Credit: NVIDIA.](images/png/ai_dl_progress_nvidia.png){#fig-ai-ml-dl}

### Brief History of Deep Learning

@@ -75,17 +75,17 @@ Neural networks serve as the foundation of deep learning, inspired by the biolog

### Perceptrons

The Perceptron is the basic unit or node that is the foundation for more complex structures. It takes various inputs, applies weights and biases to them, and then uses an activation function to produce an output.
The Perceptron is the basic unit or node that forms the foundation for more complex structures. It takes various inputs, applies weights and biases to them, and then uses an activation function to produce an output. @fig-perceptron illustrates the building blocks of a perceptron. In simple terms, think of a perceptron as a tiny decision-maker that learns to make a binary decision (e.g., 'yes' or 'no'). It takes in numbers as inputs (`x_1, x_2, ...`), each representing a feature of the object we wish to analyze (an image, for example). It then multiplies each input by a weight, adds them up, and if the total is high enough (crosses a certain threshold), it returns "yes" as an answer; otherwise, it outputs "no."

![Perceptron ([source](https://upload.wikimedia.org/wikipedia/commons/thumb/f/ff/Rosenblattperceptron.png/500px-Rosenblattperceptron.png))](images/png/Rosenblattperceptron.png){#fig-perceptron}
![Perceptron. Credit: Wikimedia - Chrislb.](images/png/Rosenblattperceptron.png){#fig-perceptron}

Conceived in the 1950s, perceptrons paved the way for developing more intricate neural networks and have been a fundamental building block in deep learning.
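
To make this concrete, the following is a minimal sketch of a single perceptron in NumPy. The inputs, weights, bias, and threshold are purely illustrative values, not taken from the figure above.

```python
import numpy as np

def perceptron(x, w, b):
    """A single perceptron: weighted sum of inputs plus a bias, then a step activation."""
    z = np.dot(w, x) + b        # weighted sum of the inputs
    return 1 if z > 0 else 0    # step activation: 1 ("yes") above the threshold, else 0 ("no")

# Illustrative example: two input features with hand-picked weights and bias.
x = np.array([0.7, 0.2])    # input features (e.g., measurements of an object)
w = np.array([0.9, -0.4])   # weights the perceptron would normally learn
b = -0.3                    # bias term

print(perceptron(x, w, b))  # prints 1, i.e., the perceptron answers "yes"
```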

### Multilayer Perceptrons

Multilayer perceptrons (MLPs) are an evolution of the single-layer perceptron model, featuring multiple layers of nodes connected in a feedforward manner. These layers include an input layer for data reception, several hidden layers for data processing, and an output layer for final result generation. MLPs are skilled at identifying non-linear relationships and use a backpropagation technique for training, where weights are optimized through a gradient descent algorithm.
Multilayer perceptrons (MLPs) are an evolution of the single-layer perceptron model, featuring multiple layers of nodes connected in a feedforward manner, as shown in @fig-mlp. These layers include an input layer for data reception, several hidden layers for data processing, and an output layer for final result generation. MLPs are skilled at identifying non-linear relationships and use a backpropagation technique for training, where weights are optimized through a gradient descent algorithm.

![Multilayer Perceptron](https://www.nomidl.com/wp-content/uploads/2022/04/image-7.png){width=70%}
![Multilayer Perceptron. Credit: Wikimedia - Charlie.](https://www.nomidl.com/wp-content/uploads/2022/04/image-7.png){#fig-mlp width=70%}
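
As a rough sketch of how such a network computes its output (a single forward pass, which the next subsection discusses), the layer sizes and random weights below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """Non-linear activation applied in the hidden layer."""
    return np.maximum(0, z)

# Illustrative sizes: 4 input features, 8 hidden units, 3 output classes.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)   # hidden -> output

def mlp_forward(x):
    h = relu(W1 @ x + b1)   # hidden layer: affine transform followed by non-linearity
    return W2 @ h + b2      # output layer: raw scores (logits)

x = rng.normal(size=4)      # one input example
print(mlp_forward(x))       # three output scores
```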

#### Forward Pass

20 changes: 11 additions & 9 deletions contents/efficient_ai/efficient_ai.qmd
@@ -40,9 +40,9 @@ Training models can consume significant energy, sometimes equivalent to the carb

## The Need for Efficient AI

Efficiency takes on different connotations depending on where AI computations occur. Let's revisit and differentiate between Cloud, Edge, and TinyML in terms of efficiency.
Efficiency takes on different connotations depending on where AI computations occur. Let's revisit and differentiate between Cloud, Edge, and TinyML in terms of efficiency. @fig-platforms provides a big picture comparison of the three different platforms.

![Cloud, Mobile and TinyML.](https://www.mdpi.com/futureinternet/futureinternet-14-00363/article_deploy/html/images/futureinternet-14-00363-g001-550.jpg){#fig-platforms}
![Cloud, Mobile and TinyML. Credit: @schizas2022tinyml.](https://www.mdpi.com/futureinternet/futureinternet-14-00363/article_deploy/html/images/futureinternet-14-00363-g001-550.jpg){#fig-platforms}

For cloud AI, traditional AI models often run in large-scale data centers equipped with powerful GPUs and TPUs [@barroso2019datacenter]. Here, efficiency pertains to optimizing computational resources, reducing costs, and ensuring timely data processing and return. However, relying on the cloud introduced latency, especially when dealing with large data streams that must be uploaded, processed, and downloaded.

@@ -66,13 +66,13 @@ Choosing the right model architecture is as crucial as optimizing it. In recent

Model compression methods are very important for bringing deep learning models to devices with limited resources. These techniques reduce models' size, energy consumption, and computational demands without significantly losing accuracy. At a high level, the methods can briefly be binned into the following fundamental methods:

**Pruning:** This is akin to trimming the branches of a tree. This was first thought of in the [Optimal Brain Damage](https://proceedings.neurips.cc/paper/1989/file/6c9882bbac1c7093bd25041881277658-Paper.pdf) paper [@lecun1989optimal]. This was later popularized in the context of deep learning by @han2016deep. Certain weights or even entire neurons are removed from the network in pruning based on specific criteria. This can significantly reduce the model size. Various strategies include weight pruning, neuron pruning, and structured pruning. We will explore these in more detail in @sec-pruning. In the example in @fig-pruning, removing some of the nodes in the inner layers reduces the number of edges between the nodes and, in turn, the model's size.
**Pruning:** This is akin to trimming the branches of a tree. The idea was first proposed in the [Optimal Brain Damage](https://proceedings.neurips.cc/paper/1989/file/6c9882bbac1c7093bd25041881277658-Paper.pdf) paper [@lecun1989optimal] and was later popularized in the context of deep learning by @han2016deep. In pruning, certain weights or even entire neurons are removed from the network based on specific criteria. This can significantly reduce the model size. Various strategies include weight pruning, neuron pruning, and structured pruning. We will explore these in more detail in @sec-pruning. @fig-pruning is an example of neural network pruning: removing some of the nodes in the inner layers (based on specific criteria) reduces the number of edges between the nodes and, in turn, the size of the model.

![Pruning applies different criteria that determine which nodes and/or weights can be removed without having significant impact on the model's performance.](images/jpg/pruning.jpeg){#fig-pruning}
![Neural Network Pruning.](images/jpg/pruning.jpeg){#fig-pruning}
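
As a concrete illustration, the sketch below applies simple magnitude-based weight pruning to a small random matrix; the weight values and the target sparsity are illustrative, and @sec-pruning covers the actual criteria in depth.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights until `sparsity` fraction are removed."""
    threshold = np.quantile(np.abs(weights), sparsity)   # cut-off below which weights are dropped
    mask = np.abs(weights) >= threshold                  # True for weights that survive
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))                       # a small, illustrative weight matrix
W_pruned, mask = magnitude_prune(W, sparsity=0.75)
print(f"Kept {mask.mean():.0%} of the weights")   # roughly 25% of the weights remain
```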

**Quantization:** Quantization is the process of constraining an input from a large set to an output from a smaller set; in deep learning, this primarily means reducing the number of bits that represent the model's weights and biases. For example, using 16-bit or 8-bit representations instead of 32-bit can reduce the model size and speed up computations, with a minor trade-off in accuracy. We will explore these in more detail in @sec-quant. @fig-quantization shows an example of quantization by rounding to the closest representable number. Converting from 32-bit floating point to 16-bit reduces memory usage by 50%, and going from 32-bit to an 8-bit integer reduces it by 75%. While the loss in numeric precision, and consequently model performance, is minor, the gain in memory efficiency is significant.

![One method of quantization involves rounding to the nearest representable number. Quantization helps save on memory while minimizing performance loss.](images/jpg/quantization.jpeg){#fig-quantization}
![Different forms of quantization.](images/jpg/quantization.jpeg){#fig-quantization}
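
The sketch below shows the round-to-nearest idea for 8-bit integer quantization with a single scale factor; the values are illustrative, and real frameworks handle calibration and per-channel scales internally.

```python
import numpy as np

def quantize_int8(x):
    """Map float32 values to int8 by scaling and rounding to the nearest integer."""
    scale = np.abs(x).max() / 127.0                              # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.array([0.02, -1.3, 0.57, 0.91], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
print(q)                          # int8 values: 1 byte each vs. 4 bytes for float32 (75% smaller)
print(np.abs(x - x_hat).max())    # small rounding error introduced by quantization
```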

**Knowledge Distillation:** Knowledge distillation involves training a smaller model (student) to replicate the behavior of a larger model (teacher). The idea is to transfer the knowledge from the cumbersome model to the lightweight one. Hence, the smaller model attains performance close to its larger counterpart but with significantly fewer parameters. We will explore knowledge distillation in more detail in @sec-kd.
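
A minimal sketch of the idea is shown below: the student is trained against the teacher's softened output distribution. The temperature, logits, and the plain cross-entropy form of the loss are illustrative simplifications of the common formulation, not a recipe from this chapter.

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T produces a softer distribution."""
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -np.sum(p_teacher * np.log(p_student + 1e-9))

teacher = np.array([8.0, 2.0, 1.0])   # confident logits from a large teacher (illustrative)
student = np.array([2.5, 1.0, 0.5])   # logits from a smaller student model
print(distillation_loss(student, teacher))
```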

@@ -84,7 +84,7 @@ Model compression methods are very important for bringing deep learning models t

[Edge TPUs](https://cloud.google.com/edge-tpu) are a smaller, power-efficient version of Google's TPUs tailored for edge devices. They provide fast on-device ML inferencing for TensorFlow Lite models. Edge TPUs allow for low-latency, high-efficiency inference on edge devices like smartphones, IoT devices, and embedded systems. AI capabilities can be deployed in real-time applications without communicating with a central server, thus saving bandwidth and reducing latency. Consider the table in @fig-edge-tpu-perf. It shows the performance differences between running different models on CPUs versus a Coral USB accelerator. The Coral USB accelerator is an accessory by Google's Coral AI platform that lets developers connect Edge TPUs to Linux computers. Running inference on the Edge TPUs was 70 to 100 times faster than on CPUs.

![Many applications require high-performance inference, which can be achieved with on-device accelerators like Edge TPUs. Source: [TensorFlow Blog](https://blog.tensorflow.org/2019/03/build-ai-that-works-offline-with-coral.html)](images/png/tflite_edge_tpu_perf.png){#fig-edge-tpu-perf}
![Accelerator vs CPU performance comparison. Credit: [TensorFlow Blog.](https://blog.tensorflow.org/2019/03/build-ai-that-works-offline-with-coral.html)](images/png/tflite_edge_tpu_perf.png){#fig-edge-tpu-perf}
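
For context, running a model compiled for the Edge TPU from Python typically looks like the sketch below, which uses the `tflite_runtime` interpreter with the Edge TPU delegate. The model path is a placeholder, and the delegate library name (`libedgetpu.so.1`) is the Linux convention and differs on other platforms.

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# "model_edgetpu.tflite" is a placeholder for a model compiled with the Edge TPU compiler.
interpreter = tflite.Interpreter(
    model_path="model_edgetpu.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input with the model's expected shape and dtype, then run inference.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]["index"]))
```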

**NN Accelerators:** Fixed-function neural network accelerators are hardware accelerators designed explicitly for neural network computations. They can be standalone chips or part of a larger system-on-chip (SoC) solution. By optimizing the hardware for the specific operations that neural networks require, such as matrix multiplications and convolutions, NN accelerators can achieve faster inference times and lower power consumption than general-purpose CPUs and GPUs. They are especially beneficial in TinyML devices with power or thermal constraints, such as smartwatches, micro-drones, or robotics.

@@ -108,7 +108,9 @@ Several other numerical formats fall into an exotic class. An exotic example is

By retaining the 8-bit exponent of FP32, BF16 offers a similar range, which is crucial for deep learning tasks where certain operations can result in very large or very small numbers. At the same time, by truncating precision, BF16 allows for reduced memory and computational requirements compared to FP32. BF16 has emerged as a promising middle ground in the landscape of numerical formats for deep learning, providing an efficient and effective alternative to the more traditional FP32 and FP16 formats.

![Three floating-point formats. Source: [Google blog](google.com)](https://storage.googleapis.com/gweb-cloudblog-publish/images/Three_floating-point_formats.max-624x261.png){#fig-fp-formats}
@fig-float-point-formats shows three different floating-point formats: Float32, Float16, and BFloat16.

![Three floating-point formats.](images/jpg/three_float_types.jpeg){#fig-float-point-formats width=90%}
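
To see the memory and precision trade-off concretely, the short sketch below casts the same array to each format. NumPy has no built-in `bfloat16`, so the optional `ml_dtypes` package is assumed here purely for illustration.

```python
import numpy as np
from ml_dtypes import bfloat16   # assumed extra dependency: pip install ml_dtypes

x = np.random.randn(1024).astype(np.float32)

for dtype in (np.float32, np.float16, bfloat16):
    y = x.astype(dtype)
    err = np.abs(x - y.astype(np.float32)).max()   # rounding error vs. full precision
    print(f"{np.dtype(dtype).name:>8}: {y.nbytes:5d} bytes, max rounding error {err:.2e}")
```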

**Integer:** These are integer representations using 8, 4, and 2 bits. They are often used during the inference phase of neural networks, where the weights and activations of the model are quantized to these lower precisions. Integer representations are deterministic and offer significant speed and memory advantages over floating-point representations. For many inference tasks, especially on edge devices, the slight loss in accuracy due to quantization is often acceptable, given the efficiency gains. An extreme form of integer numerics is for binary neural networks (BNNs), where weights and activations are constrained to one of two values: +1 or -1.

@@ -169,9 +171,9 @@ Moreover, the optimal model choice is only sometimes universal but often depends

Another important consideration is the relationship between model complexity and its practical benefits. Take voice-activated assistants, such as "Alexa" or "OK Google." While a complex model might demonstrate a marginally superior understanding of user speech, if it is slower to respond than a simpler counterpart, the user experience could be compromised. Thus, adding layers or parameters does not always translate into better real-world outcomes.

Furthermore, while benchmark datasets, such as ImageNet [@russakovsky2015imagenet], COCO [@lin2014microsoft], Visual Wake Words [@chowdhery2019visual], Google Speech Commands [@warden2018speech], etc. provide a standardized performance metric, they might not capture the diversity and unpredictability of real-world data. Two facial recognition models with similar benchmark scores might exhibit varied competencies when faced with diverse ethnic backgrounds or challenging lighting conditions. Such disparities underscore the importance of robustness and consistency across varied data. For example, @fig-stoves from the Dollar Street dataset shows stove images across extreme monthly incomes. So, if a model was trained on pictures of stoves found in wealthy countries only, it would fail to recognize stoves from poorer regions.
Furthermore, while benchmark datasets, such as ImageNet [@russakovsky2015imagenet], COCO [@lin2014microsoft], Visual Wake Words [@chowdhery2019visual], Google Speech Commands [@warden2018speech], etc., provide a standardized performance metric, they might not capture the diversity and unpredictability of real-world data. Two facial recognition models with similar benchmark scores might exhibit varied competencies when faced with diverse ethnic backgrounds or challenging lighting conditions. Such disparities underscore the importance of robustness and consistency across varied data. For example, @fig-stoves from the Dollar Street dataset shows stove images across extreme monthly incomes. Stoves have different shapes and technological levels across regions and income levels. A model that is not trained on diverse datasets might perform well on a benchmark but fail in real-world applications. So, if a model were trained only on pictures of stoves found in wealthy countries, it would fail to recognize stoves from poorer regions.

![Objects like stoves have different shapes and technological levels in different regions. A model not trained on diverse datasets might perform well on a benchmark but fail in real-world applications. Source: Dollar Street stove images.](https://pbs.twimg.com/media/DmUyPSSW0AAChGa.jpg){#fig-stoves}
![Different types of stoves. Credit: Dollar Street stove images.](https://pbs.twimg.com/media/DmUyPSSW0AAChGa.jpg){#fig-stoves}

In essence, a thorough comparative analysis transcends numerical metrics. It's a holistic assessment intertwined with real-world applications, costs, and the intricate subtleties that each model brings to the table. This is why having standard benchmarks and metrics widely established and adopted by the community becomes important.
