diff --git a/dl_primer.qmd b/dl_primer.qmd
index b292dba3..32453600 100644
--- a/dl_primer.qmd
+++ b/dl_primer.qmd
@@ -10,15 +10,15 @@ Deep learning, a specialized area within machine learning and artificial intelli
 
 ### Brief History of Deep Learning
 
-The idea of deep learning has origins in early artificial neural networks. It has experienced several cycles of interest, starting with the introduction of the Perceptron in the 1950s, followed by the invention of backpropagation algorithms in the 1980s.
+The idea of deep learning has origins in early artificial neural networks. It has experienced several cycles of interest, starting with the introduction of the Perceptron in the 1950s [@rosenblatt1957perceptron], followed by the invention of backpropagation algorithms in the 1980s [@rumelhart1986learning].
 
-The term "deep learning" became prominent in the 2000s, characterized by advances in computational power and data accessibility. Important milestones include the successful training of deep networks like AlexNet by Geoffrey Hinton, a leading figure in AI, and the renewed focus on neural networks as effective tools for data analysis and modeling.
+The term "deep learning" became prominent in the 2000s, characterized by advances in computational power and data accessibility. Important milestones include the successful training of deep networks like AlexNet [@krizhevsky2012imagenet] by [Geoffrey Hinton](https://amturing.acm.org/award_winners/hinton_4791679.cfm), a leading figure in AI, and the renewed focus on neural networks as effective tools for data analysis and modeling.
 
-In recent times, deep learning has seen exponential growth, transforming various industries. Computational growth followed an 18-month doubling pattern from 1952 to 2010, which then accelerated to a 6-month cycle from 2010 to 2022. Concurrently, we saw the emergence of large-scale models between 2015 and 2022, appearing 2 to 3 orders of magnitude faster and following a 10-month doubling cycle.
+In recent times, deep learning has seen exponential growth, transforming various industries. Computational growth followed an 18-month doubling pattern from 1952 to 2010, which then accelerated to a 6-month cycle from 2010 to 2022, as shown in @fig-trends. Concurrently, we saw the emergence of large-scale models between 2015 and 2022, appearing 2 to 3 orders of magnitude faster and following a 10-month doubling cycle.
 
 ![Growth of deep learning models.](https://epochai.org/assets/images/posts/2022/compute-trends.png){#fig-trends}
 
-Multiple factors have contributed to this surge, including advancements in computational power, the abundance of big data, and improvements in algorithmic designs. First, the growth of computational capabilities, especially the arrival of Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), has significantly sped up the training and inference times of deep learning models. These hardware improvements have enabled the construction and training of more complex, deeper networks than what was possible in earlier years.
+Multiple factors have contributed to this surge, including advancements in computational power, the abundance of big data, and improvements in algorithmic designs. First, the growth of computational capabilities, especially the arrival of Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) [@jouppi2017datacenter], has significantly sped up the training and inference times of deep learning models. These hardware improvements have enabled the construction and training of more complex, deeper networks than what was possible in earlier years.
 
 Second, the digital revolution has yielded a wealth of big data, offering rich material for deep learning models to learn from and excel in tasks such as image and speech recognition, language translation, and game playing. The presence of large, labeled datasets has been key in refining and successfully deploying deep learning applications in real-world settings.
 
@@ -78,8 +78,11 @@ The forward pass is the initial phase where data moves through the network from
 
 #### Backward Pass (Backpropagation)
 
-Backpropagation is a key algorithm in training deep neural networks. This phase involves calculating the gradient of the loss function concerning each weight by using the chain rule, effectively moving backward through the network. The gradients calculated
+Backpropagation is a key algorithm in training deep neural networks. This phase involves calculating the gradient of the loss function concerning each weight by using the chain rule, effectively moving backward through the network. The gradients calculated in this step guide the adjustment of weights with the objective of minimizing the loss function, thereby enhancing the network's performance with each iteration of training.
 
+Grasping these foundational concepts paves the way to understanding more intricate deep learning architectures and techniques, fostering the development of more sophisticated and efficacious applications, especially within the realm of embedded AI systems.
+
+{{< video https://www.youtube.com/embed/aircAruvnKk?si=qfkBf8MJjC2WSyw3 >}}
 
 ### Model Architectures
 
@@ -106,19 +109,19 @@ In embedded systems, these networks can be used in voice recognition systems, pr
 
 #### Generative Adversarial Networks (GANs)
 
-GANs consist of two networks, a generator and a discriminator, trained simultaneously through adversarial training. The generator produces data that tries to mimic the real data distribution, while the discriminator aims to distinguish between real and generated data. GANs are widely used in image generation, style transfer, and data augmentation.
+GANs consist of two networks, a generator and a discriminator, trained simultaneously through adversarial training [@goodfellow2020generative]. The generator produces data that tries to mimic the real data distribution, while the discriminator aims to distinguish between real and generated data. GANs are widely used in image generation, style transfer, and data augmentation.
 
 In embedded settings, GANs could be used for on-device data augmentation to enhance the training of models directly on the embedded device, enabling continual learning and adaptation to new data without the need for cloud computing resources.
 
 #### Autoencoders
 
-Autoencoders are neural networks used for data compression and noise reduction. They are structured to encode input data into a lower-dimensional representation and then decode it back to its original form. Variants like Variational Autoencoders (VAEs) introduce probabilistic layers that allow for generative properties, finding applications in image generation and anomaly detection.
+Autoencoders are neural networks used for data compression and noise reduction [@bank2023autoencoders]. They are structured to encode input data into a lower-dimensional representation and then decode it back to its original form. Variants like Variational Autoencoders (VAEs) introduce probabilistic layers that allow for generative properties, finding applications in image generation and anomaly detection.
 
 Using autoencoders can help in efficient data transmission and storage, improving the overall performance of embedded systems with limited computational and memory resources.
 
 #### Transformer Networks
 
-Transformer networks have emerged as a powerful architecture, especially in natural language processing. These networks use self-attention mechanisms to weigh the influence of different input words on each output word, enabling parallel computation and capturing intricate patterns in data. Transformer networks have led to state-of-the-art results in tasks like language translation, summarization, and text generation.
+Transformer networks have emerged as a powerful architecture, especially in natural language processing [@vaswani2017attention]. These networks use self-attention mechanisms to weigh the influence of different input words on each output word, enabling parallel computation and capturing intricate patterns in data. Transformer networks have led to state-of-the-art results in tasks like language translation, summarization, and text generation.
 
 These networks can be optimized to perform language-related tasks directly on-device. For example, transformers can be used in embedded systems for real-time translation services or voice-assisted interfaces, where latency and computational efficiency are crucial. Techniques such as model distillation can be employed to deploy these networks on embedded devices with limited resources.
 
@@ -200,4 +203,3 @@ Furthermore, we delved into an examination of the limitations of deep learning.
 In this primer, we have equipped you with the knowledge to make informed choices between deploying traditional machine learning or deep learning techniques, depending on the unique demands and constraints of a specific problem.
 
 As we conclude this chapter, we hope you are now well-equipped with the basic "language" of deep learning, prepared to delve deeper into the subsequent chapters with a solid understanding and critical perspective. The journey ahead is filled with exciting opportunities and challenges in embedding AI within systems.
-
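The forward/backward pass wording completed in the second hunk can be made concrete with a short sketch. The following NumPy example is illustrative only and is not part of the chapter or the commit; the one-hidden-layer network, toy data, and mean-squared-error loss are assumptions chosen for brevity.

```python
# Minimal sketch of a forward pass followed by backpropagation (chain rule)
# and a gradient-descent weight update. All sizes and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))            # toy batch: 32 samples, 4 features
y = rng.normal(size=(32, 1))            # toy regression targets

W1, b1 = rng.normal(size=(4, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)
lr = 0.01

for step in range(100):
    # Forward pass: data flows input -> hidden -> output.
    h_pre = X @ W1 + b1
    h = np.maximum(h_pre, 0.0)          # ReLU activation
    y_hat = h @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)    # mean squared error

    # Backward pass: apply the chain rule layer by layer, output -> input.
    d_yhat = 2.0 * (y_hat - y) / len(X)
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T
    d_hpre = d_h * (h_pre > 0)          # ReLU derivative
    dW1 = X.T @ d_hpre
    db1 = d_hpre.sum(axis=0)

    # Update: gradients guide the weights toward a lower loss.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final loss: {loss:.4f}")
```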
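The generator/discriminator setup described in the GAN hunk can likewise be sketched in a few lines. This PyTorch fragment is a minimal illustration under assumed sizes and a stand-in data source, not the chapter's or any library's reference implementation.

```python
# One adversarial training step: the discriminator learns to separate real
# from generated samples; the generator learns to fool it.
import torch
import torch.nn as nn

latent_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real = torch.randn(64, data_dim) * 0.5 + 2.0   # stand-in for real data
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

# Discriminator step: label real samples 1 and generated samples 0.
fake = G(torch.randn(64, latent_dim))
d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: push the discriminator's output on fakes toward 1.
fake = G(torch.randn(64, latent_dim))
g_loss = bce(D(fake), ones)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In practice these two steps alternate over many batches of real data rather than running once.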
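The encode/decode structure described in the autoencoder hunk reduces to an encoder, a decoder, and a reconstruction loss. The sketch below assumes a 32-dimensional input compressed to a 4-dimensional code; the layer sizes and random stand-in data are illustrative, not from the chapter.

```python
# A plain autoencoder: compress inputs to a low-dimensional code, then
# reconstruct them and minimize the reconstruction error.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 4))
decoder = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 32))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(128, 32)                 # stand-in for sensor readings
for _ in range(200):
    code = encoder(x)                    # lower-dimensional representation
    x_hat = decoder(code)                # reconstruction of the input
    loss = nn.functional.mse_loss(x_hat, x)
    opt.zero_grad(); loss.backward(); opt.step()
```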
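Finally, the self-attention mechanism referenced in the transformer hunk, where each output position is a weighted mix of all input positions, can be written as a single scaled dot-product. The dimensions and random weight matrices below are assumptions for illustration; this is a single attention head without masking or multi-head projection.

```python
# Scaled dot-product self-attention: weights come from query/key similarity
# and are applied to the values, so every position attends to every other.
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ v                               # weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                         # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)                  # shape (5, 8)
```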