
Commit

adding references for 2:4 pruning
srivatsankrishnan committed Dec 9, 2023
1 parent b1e8d6b commit 4c1a3ae
Showing 2 changed files with 23 additions and 2 deletions.
4 changes: 2 additions & 2 deletions hw_acceleration.qmd
@@ -526,11 +526,11 @@ guide software optimizations, while algorithmic advances inform hardware
specialization. This mutual enhancement provides multiplicative
efficiency gains compared to isolated efforts.

-#### Algorithm-Hardare Co-exploration
+#### Algorithm-Hardware Co-exploration

Jointly exploring innovations in neural network architectures along with custom hardware design is a powerful co-design technique. This allows finding ideal pairings tailored to each other's strengths [@sze2017efficient].

-For instance, the shift to mobile architectures like MobileNets [@howard_mobilenets_2017] was guided by edge device constraints like model size and latency. The quantization [@jacob2018quantization] and pruning techniques [@gale2019state] that unlocked these efficient models became possible thanks to hardware accelerators with native low-precision integer support.
+For instance, the shift to mobile architectures like MobileNets [@howard_mobilenets_2017] was guided by edge device constraints like model size and latency. The quantization [@jacob2018quantization] and pruning techniques [@gale2019state] that unlocked these efficient models became possible thanks to hardware accelerators with native low-precision integer support and pruning support [@mishrapruning].

Attention-based models have thrived on massively parallel GPUs and ASICs where their computation maps well spatially, as opposed to RNN architectures reliant on sequential processing. Co-evolution of algorithms and hardware unlocked new capabilities.
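
The newly cited work [@mishrapruning] describes 2:4 structured sparsity, in which every contiguous group of four weights keeps at most two nonzero values so that hardware with sparse tensor core support can skip the zeroed entries. A minimal NumPy sketch of that pruning pattern (illustrative only; the function name and the largest-magnitude selection rule are assumptions, not code from the paper):

```python
import numpy as np

def prune_2_of_4(weights: np.ndarray) -> np.ndarray:
    """Apply a 2:4 sparsity pattern: in every contiguous group of four
    weights, keep the two largest-magnitude values and zero the rest.
    Illustrative sketch only, not the cited paper's implementation."""
    flat = weights.reshape(-1, 4)                    # groups of 4 weights
    keep = np.argsort(np.abs(flat), axis=1)[:, 2:]   # indices of the 2 largest per group
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)      # True where weights are kept
    return (flat * mask).reshape(weights.shape)

# Example: a small weight matrix whose size is a multiple of 4.
w = np.random.randn(4, 8)
w_sparse = prune_2_of_4(w)
assert (w_sparse.reshape(-1, 4) != 0).sum(axis=1).max() <= 2
```

In practice such a mask is typically applied to an already-trained model and followed by fine-tuning to recover accuracy before deployment on sparsity-aware hardware.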

21 changes: 21 additions & 0 deletions references.bib
@@ -4669,3 +4669,24 @@ @article{zhuang2020comprehensive
number = 1,
pages = {43--76}
}

@article{mishrapruning,
  author     = {Asit K. Mishra and
                Jorge Albericio Latorre and
                Jeff Pool and
                Darko Stosic and
                Dusan Stosic and
                Ganesh Venkatesh and
                Chong Yu and
                Paulius Micikevicius},
  title      = {Accelerating Sparse Deep Neural Networks},
  journal    = {CoRR},
  volume     = {abs/2104.08378},
  year       = {2021},
  url        = {https://arxiv.org/abs/2104.08378},
  eprinttype = {arXiv},
  eprint     = {2104.08378},
  timestamp  = {Mon, 26 Apr 2021 17:25:10 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2104-08378.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
