From 7b03f432f794b3789e56dcb89a9150b57f97484a Mon Sep 17 00:00:00 2001
From: Vijay Janapa Reddi
Date: Sat, 9 Dec 2023 14:41:07 -0500
Subject: [PATCH] Minor spacing fix before reference

---
 hw_acceleration.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw_acceleration.qmd b/hw_acceleration.qmd
index e47b7451..302284e8 100644
--- a/hw_acceleration.qmd
+++ b/hw_acceleration.qmd
@@ -530,7 +530,7 @@ efficiency gains compared to isolated efforts.
 
 Jointly exploring innovations in neural network architectures along with custom hardware design is a powerful co-design technique. This allows finding ideal pairings tailored to each other's strengths [@sze2017efficient].
 
-For instance, the shift to mobile architectures like MobileNets [@howard_mobilenets_2017] was guided by edge device constraints like model size and latency. The quantization [@jacob2018quantization] and pruning techniques [@gale2019state] that unlocked these efficient models became possible thanks to hardware accelerators with native low-precision integer support and pruning support[@mishrapruning].
+For instance, the shift to mobile architectures like MobileNets [@howard_mobilenets_2017] was guided by edge device constraints like model size and latency. The quantization [@jacob2018quantization] and pruning techniques [@gale2019state] that unlocked these efficient models became possible thanks to hardware accelerators with native low-precision integer support and pruning support [@mishrapruning].
 
 Attention-based models have thrived on massively parallel GPUs and ASICs where their computation maps well spatially, as opposed to RNN architectures reliant on sequential processing. Co-evolution of algorithms and hardware unlocked new capabilities.