awesome-efficient-transformer

Introduction

A curated list of papers on efficient Transformers, organized by technique: pruning, quantization, decomposition, knowledge distillation, and architecture design.

Papers

Surveys

  • 2022 | IEEE | Han, K., Wang, Y., Chen, H., Chen, X., Guo, J., Liu, Z., Tang, Y., Xiao, A., Xu, C., Xu, Y. and Yang, Z., 2022. A survey on vision transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • 2021 | arXiv | Fournier, Q., Caron, G.M. and Aloise, D., 2021. A practical survey on faster and lighter transformers. arXiv preprint arXiv:2103.14636.

  • 2020 | ACM | Tay, Y., Dehghani, M., Bahri, D. and Metzler, D., 2020. Efficient transformers: A survey. ACM Computing Surveys (CSUR).

Pruning

  • 2022 | CVPR | Chavan, A., Shen, Z., Liu, Z., Liu, Z., Cheng, K.T. and Xing, E.P., 2022. Vision transformer slimming: Multi-dimension searching in continuous optimization space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4931-4941).

  • 2022 | ACL | Xia, M., Zhong, Z. and Chen, D., 2022. Structured pruning learns compact and accurate models. arXiv preprint arXiv:2204.00408.

  • 2022 | CSL | Sajjad, H., Dalvi, F., Durrani, N. and Nakov, P., 2022. On the effect of dropping layers of pre-trained transformer models. Computer Speech & Language, p.101429.

  • 2022 | AAAI | Xu, Y., Zhang, Z., Zhang, M., Sheng, K., Li, K., Dong, W., Zhang, L., Xu, C. and Sun, X., 2022, June. Evo-vit: Slow-fast token evolution for dynamic vision transformer. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 3, pp. 2964-2972).

  • 2021 | KDD | Zhu, M., Tang, Y. and Han, K., 2021. Vision transformer pruning. arXiv preprint arXiv:2104.08500.

  • 2020 | NIPS | Hou, L., Huang, Z., Shang, L., Jiang, X., Chen, X. and Liu, Q., 2020. Dynabert: Dynamic bert with adaptive width and depth. Advances in Neural Information Processing Systems, 33, pp.9782-9793.

  • 2020 | ICLR | Fan, A., Grave, E. and Joulin, A., 2019. Reducing transformer depth on demand with structured dropout. arXiv preprint arXiv:1909.11556.

  • 2020 | EMNLP | Prasanna, S., Rogers, A. and Rumshisky, A., 2020. When bert plays the lottery, all tickets are winning. arXiv preprint arXiv:2005.00561.

  • 2019 | NIPS | Michel, P., Levy, O. and Neubig, G., 2019. Are sixteen heads really better than one?. Advances in neural information processing systems, 32.
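
For orientation, the head-pruning line above (e.g. Michel et al., 2019) scores attention heads and drops the least important ones. Below is a minimal PyTorch sketch of that generic idea, assuming per-head outputs are available as a [batch, heads, seq, dim] tensor; the importance scores and keep ratio are illustrative placeholders, not any paper's exact criterion.

```python
import torch


def head_mask_from_importance(importance: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep the top-`keep_ratio` fraction of heads by importance score.

    `importance` has shape [num_heads]; the scores themselves are assumed to
    come from elsewhere (e.g. a gradient-based sensitivity estimate).
    """
    num_heads = importance.numel()
    num_keep = max(1, int(round(num_heads * keep_ratio)))
    top = torch.topk(importance, num_keep).indices
    mask = torch.zeros(num_heads)
    mask[top] = 1.0
    return mask


def mask_attention_heads(per_head_output: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Zero out pruned heads before the attention output projection.

    `per_head_output` has shape [batch, num_heads, seq_len, head_dim].
    """
    return per_head_output * mask.view(1, -1, 1, 1)


if __name__ == "__main__":
    torch.manual_seed(0)
    per_head = torch.randn(2, 12, 16, 64)   # batch=2, 12 heads, seq=16, head_dim=64
    importance = torch.rand(12)              # placeholder importance scores
    mask = head_mask_from_importance(importance, keep_ratio=0.5)
    print(mask, mask_attention_heads(per_head, mask).shape)
```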

Quantization

  • 2021 | NIPS | Liu, Z., Wang, Y., Han, K., Zhang, W., Ma, S. and Gao, W., 2021. Post-training quantization for vision transformer. Advances in Neural Information Processing Systems, 34, pp.28092-28103.

  • 2020 | EMNLP | Shridhar, K., Jain, H., Agarwal, A. and Kleyko, D., 2020, November. End to end binarized neural networks for text classification. In Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing (pp. 29-34).

  • 2020 | EMNLP | Prato, G., Charlaix, E. and Rezagholizadeh, M., 2019. Fully quantized transformer for machine translation. arXiv preprint arXiv:1910.10485.

  • 2020 | NLPCC | Zhao, Z., Liu, Y., Chen, L., Liu, Q., Ma, R. and Yu, K., 2020, October. An investigation on different underlying quantization schemes for pre-trained language models. In CCF International Conference on Natural Language Processing and Chinese Computing (pp. 359-371). Springer, Cham.

  • 2019 | Tech Report | Cheong, R. and Daniel, R., 2019. transformers.zip: Compressing Transformers with Pruning and Quantization. Technical report, Stanford University, Stanford, California.
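
As a rough baseline for the post-training quantization idea studied above, the sketch below applies PyTorch's built-in dynamic INT8 quantization to a Transformer-style feed-forward block; this is a generic recipe, not the calibration scheme of any particular paper in this list.

```python
import torch
import torch.nn as nn


class FeedForward(nn.Module):
    """The position-wise feed-forward block of a Transformer layer."""

    def __init__(self, d_model: int = 256, d_ff: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):
        return self.net(x)


model = FeedForward().eval()

# Dynamic post-training quantization: nn.Linear weights are stored in INT8 and
# dequantized on the fly at inference time; no retraining or calibration data needed.
quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 32, 256)  # [batch, seq_len, d_model]
with torch.no_grad():
    print(quantized(x).shape)  # torch.Size([1, 32, 256])
```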

Decomposition

  • 2020 | EMNLP | Wang, Z., Wohlwend, J. and Lei, T., 2019. Structured pruning of large language models. arXiv preprint arXiv:1910.04732.
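
The decomposition approach replaces large weight matrices with low-rank factors. A minimal sketch of that generic idea, assuming a plain truncated-SVD factorization of a single linear layer (the rank and layer sizes below are arbitrary placeholders).

```python
import torch
import torch.nn as nn


def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace W (out x in) with factors B (out x r) and A (r x in) via truncated SVD,
    reducing the cost from O(in * out) to O(r * (in + out))."""
    W = layer.weight.data                      # [out_features, in_features]
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]               # absorb singular values into the left factor
    V_r = Vh[:rank, :]

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data.copy_(V_r)
    second.weight.data.copy_(U_r)
    if layer.bias is not None:
        second.bias.data.copy_(layer.bias.data)
    return nn.Sequential(first, second)


if __name__ == "__main__":
    torch.manual_seed(0)
    dense = nn.Linear(768, 3072)
    low_rank = factorize_linear(dense, rank=128)
    x = torch.randn(4, 768)
    # The factorized layer approximates the original output.
    print((dense(x) - low_rank(x)).abs().max())
```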

Knowledge Distillation

  • 2020 | ACL | Mukherjee, S. and Awadallah, A., 2020. XtremeDistil: Multi-stage distillation for massive multilingual models. arXiv preprint arXiv:2004.05686.

  • 2020 | NIPS | Wang, W., Wei, F., Dong, L., Bao, H., Yang, N. and Zhou, M., 2020. Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. Advances in Neural Information Processing Systems, 33, pp.5776-5788.

  • 2021 | arXiv | Jia, D., Han, K., Wang, Y., Tang, Y., Guo, J., Zhang, C. and Tao, D., 2021. Efficient vision transformers via fine-grained manifold distillation. arXiv preprint arXiv:2107.01378.
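
For context, these methods build on the standard soft-label distillation objective, which matches the student's temperature-softened output distribution to the teacher's. A minimal sketch of that generic loss is below; the temperature and weighting are illustrative, and MiniLM-style attention-relation distillation is more involved.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Weighted sum of cross-entropy on hard labels and KL divergence between
    temperature-softened teacher and student distributions."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


if __name__ == "__main__":
    torch.manual_seed(0)
    student_logits = torch.randn(8, 10, requires_grad=True)  # [batch, num_classes]
    teacher_logits = torch.randn(8, 10)                       # frozen teacher outputs
    labels = torch.randint(0, 10, (8,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(loss.item())
```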

Architecture Design

  • 2022 | CVPR | Chen, Y., Dai, X., Chen, D., Liu, M., Dong, X., Yuan, L. and Liu, Z., 2022. Mobile-former: Bridging mobilenet and transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5270-5279).
  • 2022 | CVPR | Yu, W., Luo, M., Zhou, P., Si, C., Zhou, Y., Wang, X., Feng, J. and Yan, S., 2022. Metaformer is actually what you need for vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10819-10829).
  • 2022 | CVPR | Arar, M., Shamir, A. and Bermano, A.H., 2022. Learned Queries for Efficient Local Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10841-10852).
  • 2021 | ICLR | Geng, Z., Guo, M.H., Chen, H., Li, X., Wei, K. and Lin, Z., 2021. Is attention better than matrix decomposition?. arXiv preprint arXiv:2109.04553.
  • 2021 | ICCV | Li, C., Tang, T., Wang, G., Peng, J., Wang, B., Liang, X. and Chang, X., 2021. Bossnas: Exploring hybrid cnn-transformers with block-wisely self-supervised neural architecture search. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 12281-12291).
  • 2021 | arXiv | Su, X., You, S., Xie, J., Zheng, M., Wang, F., Qian, C., Zhang, C., Wang, X. and Xu, C., 2021. Vision transformer architecture search. arXiv preprint, 2021.
  • 2020 | NIPS | Jiang, Z.H., Yu, W., Zhou, D., Chen, Y., Feng, J. and Yan, S., 2020. Convbert: Improving bert with span-based dynamic convolution. Advances in Neural Information Processing Systems, 33, pp.12837-12848.
  • 2020 | NIPS | Yun, C., Chang, Y.W., Bhojanapalli, S., Rawat, A.S., Reddi, S. and Kumar, S., 2020. O(n) connections are expressive enough: Universal approximability of sparse transformers. Advances in Neural Information Processing Systems, 33, pp.13783-13794.
  • 2020 | NIPS | Zaheer, M., Guruganesh, G., Dubey, K.A., Ainslie, J., Alberti, C., Ontanon, S., Pham, P., Ravula, A., Wang, Q., Yang, L. and Ahmed, A., 2020. Big bird: Transformers for longer sequences. Advances in Neural Information Processing Systems, 33, pp.17283-17297.
  • 2020 | ICML | Katharopoulos, A., Vyas, A., Pappas, N. and Fleuret, F., 2020, November. Transformers are rnns: Fast autoregressive transformers with linear attention. In International Conference on Machine Learning (pp. 5156-5165). PMLR.
  • 2019 | NIPS | Guo, Y., Zheng, Y., Tan, M., Chen, Q., Chen, J., Zhao, P. and Huang, J., 2019. Nat: Neural architecture transformer for accurate and compact architectures. Advances in Neural Information Processing Systems, 32.
  • 2019 | ICML | So, D., Le, Q. and Liang, C., 2019, May. The evolved transformer. In International Conference on Machine Learning (pp. 5877-5886). PMLR.
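
Several of the papers above replace the quadratic softmax attention with cheaper variants; Katharopoulos et al. (2020), for example, use a kernel feature map so attention can be computed in time linear in the sequence length. Below is a minimal sketch of that non-causal linear attention with the elu(x) + 1 feature map from the paper; the tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F


def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention (Katharopoulos et al., 2020).

    Shapes: q, k are [batch, heads, seq, dim_k], v is [batch, heads, seq, dim_v];
    the cost is linear in the sequence length.
    """
    q = F.elu(q) + 1.0
    k = F.elu(k) + 1.0
    # Sum over the sequence once: kv has shape [batch, heads, dim_k, dim_v].
    kv = torch.einsum("bhsd,bhse->bhde", k, v)
    # Per-query normalizer: [batch, heads, seq].
    z = torch.einsum("bhsd,bhd->bhs", q, k.sum(dim=2)) + eps
    return torch.einsum("bhsd,bhde->bhse", q, kv) / z.unsqueeze(-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    q = torch.randn(2, 8, 128, 64)
    k = torch.randn(2, 8, 128, 64)
    v = torch.randn(2, 8, 128, 64)
    print(linear_attention(q, k, v).shape)  # torch.Size([2, 8, 128, 64])
```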

Project
