awesome-se4ai

A curated list of literature on software engineering for AI-enabled systems (SE4AI).

Web Resources:

  1. Software Engineering for AI-Enabled Systems (SE4AI) - CMU 17-445/645, Summer 2020.
  2. Introducing the Data Validation Tool - Google. 2021.
  3. Datamations: Animated Explanations of Data Analysis Pipelines - CHI 2021.
  4. Can AI Replace Lawyers? Researchers Say Machine Learning Can Help Predict Summary Judgment Outcomes
  5. Underspecification Presents Challenges in Modern ML
  6. Assessment List for Trustworthy Artificial Intelligence (ALTAI) for self-assessment
  7. The reusable holdout: Preserving validity in adaptive data analysis
  8. mllint — Linter for Machine Learning projects
  9. PE_for_ML
  10. Operationalizing Machine Learning
  11. Versioning, Provenance, and Reproducibility in Production Machine Learning
  12. Time Travel and Provenance for Machine Learning Pipelines
  13. Automating Entity Matching Model Development
  14. DataPrep - The easiest way to prepare data in Python
  15. Using PyTorch + NumPy? You're making a mistake. (See the seeding sketch after this list.)
  16. MLOps: Continuous delivery and automation pipelines in machine learning
  17. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A Guide for Authors and Reviewers
  18. Versioning ML Models & Data in Time and Space
  19. Overton: A Data System for Monitoring and Improving Machine-Learned Products
  20. Introducing Ludwig, a Code-Free Deep Learning Toolbox
  21. Awesome-mlops
  22. Git for data
  23. MLCommons
  24. Machine Learning at Industrial Scale: Lessons from the MLflow Project
  25. MediaPipe
  26. Netflix's Metaflow: Reproducible machine learning pipelines
  27. Traceability for Trustworthy AI: A Review of Models and Tools
  28. Finding duplicate images made easy!
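
Entry 15 above points at a well-known pitfall: each PyTorch DataLoader worker process inherits the same NumPy RNG state, so NumPy-based augmentations can repeat across workers. Below is a minimal sketch of the usual fix via a worker_init_fn; the dataset and "augmentation" are made-up placeholders, only the DataLoader hook is standard PyTorch.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class NoisyDataset(Dataset):
    """Hypothetical dataset whose augmentation draws from NumPy's global RNG."""

    def __init__(self, n=64):
        self.data = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Without per-worker seeding, parallel workers can draw identical "random" noise.
        noise = np.random.rand()
        return self.data[idx] + noise


def seed_worker(worker_id):
    # PyTorch gives each worker process a distinct base seed; propagate it to NumPy.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)


loader = DataLoader(NoisyDataset(), batch_size=8, num_workers=2,
                    worker_init_fn=seed_worker)

for batch in loader:
    print(batch)
```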

Papers:

SOSP / ICSE (2017-2020)

  1. DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017: 1-18. (See the neuron-coverage sketch after this list.)
  2. DeepTest: automated testing of deep-neural-network-driven autonomous cars. Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray. ICSE 2018: 303-314.
  3. CRADLE: cross-backend validation to detect and localize bugs in deep learning libraries. Hung Viet Pham, Thibaud Lutellier, Weizhen Qi, Lin Tan. ICSE 2019: 1027-1038.
  4. Guiding deep learning system testing using surprise adequacy. Jinhan Kim, Robert Feldt, Shin Yoo. ICSE 2019: 1039-1049.
  5. Adversarial sample detection for deep neural network through model mutation testing. Jingyi Wang, Guoliang Dong, Jun Sun, Xinyu Wang, Peixin Zhang. ICSE 2019: 1245-1256.
  6. SLEMI: equivalence modulo input (EMI) based mutation of CPS models for finding compiler bugs in Simulink. Shafiul Azam Chowdhury, Sohil Lal Shrestha, Taylor T. Johnson, Christoph Csallner. ICSE 2020: 335-346.
  7. DeepBillboard: systematic physical-world testing of autonomous driving systems. Husheng Zhou, Wei Li, Zelun Kong, Junfeng Guo, Yuqun Zhang, Bei Yu, Lingming Zhang, Cong Liu. ICSE 2020: 347-358.
  8. Misbehaviour prediction for autonomous driving systems. Andrea Stocco, Michael Weiss, Marco Calzana, Paolo Tonella. ICSE 2020: 359-371.
  9. Approximation-refinement testing of compute-intensive cyber-physical models: an approach based on system identification. Claudio Menghi, Shiva Nejati, Lionel C. Briand, Yago Isasi Parache. ICSE 2020: 372-384.
  10. A comprehensive study of autonomous vehicle bugs. Joshua Garcia, Yang Feng, Junjie Shen, Sumaya Almanee, Yuan Xia, Qi Alfred Chen. ICSE 2020: 385-396.
  11. Importance-driven deep learning system testing. Simos Gerasimou, Hasan Ferit Eniser, Alper Sen, Alper Cakan. ICSE 2020: 702-713.
  12. ReluDiff: differential verification of deep neural networks. Brandon Paulsen, Jingbo Wang, Chao Wang. ICSE 2020: 714-726.
  13. Dissector: input validation for deep learning applications by crossing-layer dissection. Huiyan Wang, Jingwei Xu, Chang Xu, Xiaoxing Ma, Jian Lu. ICSE 2020: 727-738.
  14. Towards characterizing adversarial defects of deep learning software from the lens of uncertainty. Xiyue Zhang, Xiaofei Xie, Lei Ma, Xiaoning Du, Qiang Hu, Yang Liu, Jianjun Zhao, Meng Sun. ICSE 2020: 739-751.
  15. White-box fairness testing through adversarial sampling. Peixin Zhang, Jingyi Wang, Jun Sun, Guoliang Dong, Xinyu Wang, Xingen Wang, Jin Song Dong, Ting Dai. ICSE 2020: 949-960.
  16. Structure-invariant testing for machine translation. Pinjia He, Clara Meister, Zhendong Su. ICSE 2020: 961-973.
  17. Automatic testing and improvement of machine translation. Zeyu Sun, Jie M. Zhang, Mark Harman, Mike Papadakis, Lu Zhang. ICSE 2020: 974-985.
  18. TRADER: trace divergence analysis and embedding regulation for debugging recurrent neural networks. Guanhong Tao, Shiqing Ma, Yingqi Liu, Qiuling Xu, Xiangyu Zhang. ICSE 2020: 986-998.
  19. Taxonomy of real faults in deep learning systems. Nargiz Humbatova, Gunel Jahangirova, Gabriele Bavota, Vincenzo Riccio, Andrea Stocco, Paolo Tonella. ICSE 2020: 1110-1121.
  20. Testing DNN image classifiers for confusion & bias errors. Yuchi Tian, Ziyuan Zhong, Vicente Ordonez, Gail E. Kaiser, Baishakhi Ray. ICSE 2020: 1122-1134.
  21. Repairing deep neural networks: fix patterns and challenges. Md Johirul Islam, Rangeet Pan, Giang Nguyen, Hridesh Rajan. ICSE 2020: 1135-1146.
  22. Fuzz testing based data augmentation to improve robustness of deep neural networks. Xiang Gao, Ripon K. Saha, Mukul R. Prasad, Abhik Roychoudhury. ICSE 2020: 1147-1158. Video.
  23. An empirical study on program failures of deep learning jobs. Ru Zhang, Wencong Xiao, Hongyu Zhang, Yu Liu, Haoxiang Lin, Mao Yang. ICSE 2020: 1159-1170.
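
Many of the testing papers above, starting with DeepXplore (entry 1), build on neuron coverage: the fraction of neurons driven above an activation threshold by at least one test input. The following is a rough PyTorch sketch of that metric, assuming a toy model and random data; neither the model nor the threshold is taken from any of the papers.

```python
import torch
import torch.nn as nn

# Toy model and test inputs -- illustrative only.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
inputs = torch.randn(100, 10)

activations = {}

def record(name):
    def hook(_module, _inp, out):
        activations[name] = out.detach()
    return hook

# Hook the hidden ReLU layer(s) whose neurons we want to count as covered.
for name, module in model.named_modules():
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(record(name))

model.eval()
with torch.no_grad():
    model(inputs)

threshold = 0.0  # a neuron is "activated" if its output exceeds this value
covered, total = 0, 0
for out in activations.values():
    # A neuron is covered if at least one test input activates it above the threshold.
    covered += (out > threshold).any(dim=0).sum().item()
    total += out.shape[1]

print(f"neuron coverage: {covered}/{total} = {covered / total:.2%}")
```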

ICSE 2021

  1. Are Machine Learning Cloud APIs Used Correctly? Chengcheng Wan, Shicheng Liu, Henry Hoffmann, Michael Maire, Shan Lu. ICSE 2021: 125-137.
  2. Resource-Guided Configuration Space Reduction for Deep Learning Models. Yanjie Gao, Yonghao Zhu, Hongyu Zhang, Haoxiang Lin, Mao Yang. ICSE 2021: 175-187.
  3. Distribution-Aware Testing of Neural Networks Using Generative Models. Swaroopa Dola, Matthew B. Dwyer, Mary Lou Soffa. ICSE 2021: 226-237.
  4. An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems. Yiming Tang, Raffi Khatchadourian, Mehdi Bagherzadeh, Rhia Singh, Ajani Stewart, Anita Raja. ICSE 2021: 238-250.
  5. DeepLocalize: Fault Localization for Deep Neural Networks. Mohammad Wardat, Wei Le, Hridesh Rajan. ICSE 2021: 251-262.
  6. DeepPayload: Black-box Backdoor Attack on Deep Learning Models through Neural Payload Injection. Yuanchun Li, Jiayi Hua, Haoyu Wang, Chunyang Chen, Yunxin Liu. ICSE 2021: 263-274.
  7. Reducing DNN Properties to Enable Falsification with Adversarial Attacks. David Shriver, Sebastian G. Elbaum, Matthew B. Dwyer. ICSE 2021: 275-287.
  8. Graph-based Fuzz Testing for Deep Learning Inference Engines. Weisi Luo, Dong Chai, Xiaoyue Run, Jiang Wang, Chunrong Fang, Zhenyu Chen. ICSE 2021: 288-299.
  9. RobOT: Robustness-Oriented Testing for Deep Learning Systems. Jingyi Wang, Jialuo Chen, Youcheng Sun, Xingjun Ma, Dongxia Wang, Jun Sun, Peng Cheng. ICSE 2021: 300-311.
  10. Scalable Quantitative Verification For Deep Neural Networks. Teodora Baluta, Zheng Leong Chua, Kuldeep S. Meel, Prateek Saxena. ICSE 2021: 312-323.
  11. Operation is the hardest teacher: estimating DNN accuracy looking for mispredictions. Antonio Guerriero, Roberto Pietrantuono, Stefano Russo. ICSE 2021: 348-358.
  12. AUTOTRAINER: An Automatic DNN Training Problem Detection and Repair System. Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen. ICSE 2021: 359-371.
  13. Self-Checking Deep Neural Networks in Deployment. Yan Xiao, Ivan Beschastnikh, David S. Rosenblum, Changsheng Sun, Sebastian G. Elbaum, Yun Lin, Jin Song Dong. ICSE 2021: 372-384.
  14. Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models. Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, Baowen Xu. ICSE 2021: 385-396.
  15. Prioritizing Test Inputs for Deep Neural Networks via Mutation Analysis. Zan Wang, Hanmo You, Junjie Chen, Yingyi Zhang, Xuyuan Dong, Wenbin Zhang. ICSE 2021: 397-409.
  16. Testing Machine Translation via Referential Transparency. Pinjia He, Clara Meister, Zhendong Su. ICSE 2021: 410-422.
  17. An Empirical Study on Deployment Faults of Deep Learning Based Mobile Applications. Zhenpeng Chen, Huihan Yao, Yiling Lou, Yanbin Cao, Yuanqiang Liu, Haoyu Wang, Xuanzhe Liu. ICSE 2021: 674-685.
  18. White-Box Analysis over Machine Learning: Modeling Performance of Configurable Systems. Miguel Velez, Pooyan Jamshidi, Norbert Siegmund, Sven Apel, Christian Kästner. ICSE 2021: 1072-1084.

FSE

  1. LAMP: data provenance for graph based machine learning algorithms through derivative computation. Shiqing Ma, Yousra Aafer, Zhaogui Xu, Wen-Chuan Lee, Juan Zhai, Yingqi Liu, Xiangyu Zhang. FSE 2017: 786-797.
  2. MODE: automated neural network model debugging via state differential analysis and input selection. Shiqing Ma, Yingqi Liu, Wen-Chuan Lee, Xiangyu Zhang, Ananth Grama. FSE 2018: 175-186.
  3. DeepStellar: model-based quantitative analysis of stateful deep learning systems. Xiaoning Du, Xiaofei Xie, Yi Li, Lei Ma, Yang Liu, Jianjun Zhao. FSE 2019: 477-487.
  4. Bridging the gap between ML solutions and their business requirements using feature interactions. Guy Barash, Eitan Farchi, Ilan Jayaraman, Orna Raz, Rachel Tzoref-Brill, Marcel Zalmanovici. FSE 2019: 1048-1058.
  5. Do the machine learning models on a crowd sourced platform exhibit bias? an empirical study on model fairness. Sumon Biswas, Hridesh Rajan. FSE 2020: 642-653.
  6. Fairway: a way to build fair ML software. Joymallya Chakraborty, Suvodeep Majumder, Zhe Yu, Tim Menzies. FSE 2020: 654-665.
  7. A comprehensive study on challenges in deploying deep learning based software. Zhenpeng Chen, Yanbin Cao, Yuanqiang Liu, Haoyu Wang, Tao Xie, Xuanzhe Liu. FSE 2020: 750-762.
  8. AMS: generating AutoML search spaces from weak specifications. José Pablo Cambronero, Jürgen Cito, Martin C. Rinard. FSE 2020: 763-774.
  9. Correlations between deep neural network model coverage criteria and model quality. Shenao Yan, Guanhong Tao, Xuwei Liu, Juan Zhai, Shiqing Ma, Lei Xu, Xiangyu Zhang. FSE 2020: 775-787.
  10. Deep learning library testing via effective model generation. Zan Wang, Ming Yan, Junjie Chen, Shuang Liu, Dongdi Zhang. FSE 2020: 788-799.
  11. DeepSearch: a simple and effective blackbox attack for deep neural networks. Fuyuan Zhang, Sankalan Pal Chowdhury, Maria Christakis. FSE 2020: 800-812.
  12. Detecting numerical bugs in neural network architectures. Yuhao Zhang, Luyao Ren, Liqian Chen, Yingfei Xiong, Shing-Chi Cheung, Tao Xie. FSE 2020: 826-837.
  13. Dynamic slicing for deep neural networks. Ziqi Zhang, Yuanchun Li, Yao Guo, Xiangqun Chen, Yunxin Liu. FSE 2020: 838-850.
  14. Is neuron coverage a meaningful measure for testing deep neural networks? Fabrice Harel-Canada, Lingxiao Wang, Muhammad Ali Gulzar, Quanquan Gu, Miryung Kim. FSE 2020: 851-862.
  15. Machine translation testing via pathological invariance. Shashij Gupta, Pinjia He, Clara Meister, Zhendong Su. FSE 2020: 863-875.
  16. Model-based exploration of the frontier of behaviours for deep learning system testing. Vincenzo Riccio, Paolo Tonella. FSE 2020: 876-888.
  17. On decomposing a deep neural network into modules. Rangeet Pan, Hridesh Rajan. FSE 2020: 889-900.
  18. Operational calibration: debugging confidence errors for DNNs in the field. Zenan Li, Xiaoxing Ma, Chang Xu, Jingwei Xu, Chun Cao, Jian Lu. FSE 2020: 901-913.
  19. A first look at the integration of machine learning models in complex autonomous driving systems: a case study on Apollo. Zi Peng, Jinqiu Yang, Tse-Hsun (Peter) Chen, Lei Ma. FSE 2020: 1240-1250.
  20. Enhancing the interoperability between deep learning frameworks by model conversion. Yu Liu, Cheng Chen, Ru Zhang, Tingting Qin, Xiang Ji, Haoxiang Lin, Mao Yang. FSE 2020: 1320-1330.
  21. Estimating GPU memory consumption of deep learning models. Yanjie Gao, Yu Liu, Hongyu Zhang, Zhengxian Li, Yonghao Zhu, Haoxiang Lin, Mao Yang. FSE 2020: 1342-1352.
  22. Reducing DNN labelling cost using surprise adequacy: an industrial case study for autonomous driving. Jinhan Kim, Jeongil Ju, Robert Feldt, Shin Yoo. FSE 2020: 1466-1476.
  23. Bias in machine learning software: why? how? what to do? Joymallya Chakraborty, Suvodeep Majumder, Tim Menzies. FSE 2021: 429-440.
  24. Validation on machine reading comprehension software without annotated labels: a property-based method. Songqiang Chen, Shuo Jin, Xiaoyuan Xie. FSE 2021: 590-602.
  25. FLEX: fixing flaky tests in machine learning projects by updating assertion bounds. Saikat Dutta, August Shi, Sasa Misailovic. FSE 2021: 603-614.
  26. Exposing numerical bugs in deep learning via gradient back-propagation. Ming Yan, Junjie Chen, Xiangyu Zhang, Lin Tan, Gan Wang, Zan Wang. FSE 2021: 627-638.
  27. Probing model signal-awareness via prediction-preserving input minimization. Sahil Suneja, Yunhui Zheng, Yufan Zhuang, Jim Alain Laredo, Alessandro Morari. FSE 2021: 945-955.
  28. Generating efficient solvers from constraint models. Shu Lin, Na Meng, Wenxin Li. FSE 2021: 956-967.
  29. A comprehensive study of deep learning compiler bugs. Qingchao Shen, Haoyang Ma, Junjie Chen, Yongqiang Tian, Shing-Chi Cheung, Xiang Chen. FSE 2021: 968-980.
  30. Fair preprocessing: towards understanding compositional fairness of data transformers in machine learning pipeline. Sumon Biswas, Hridesh Rajan. FSE 2021: 981-993.
  31. Fairea: a model behaviour mutation approach to benchmarking bias mitigation methods. Max Hort, Jie M. Zhang, Federica Sarro, Mark Harman. FSE 2021: 994-1006.

ASE

  1. Automated directed fairness testing. Sakshi Udeshi, Pryanshu Arora, Sudipta Chattopadhyay. ASE 2018: 98-108.
  2. Concolic testing for deep neural networks. Youcheng Sun, Min Wu, Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska, Daniel Kroening. ASE 2018: 109-119.
  3. DeepGauge: multi-granularity testing criteria for deep learning systems. Lei Ma, Felix Juefei-Xu, Fuyuan Zhang, Jiyuan Sun, Minhui Xue, Bo Li, Chunyang Chen, Ting Su, Li Li, Yang Liu, Jianjun Zhao, Yadong Wang. ASE 2018: 120-131.
  4. DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems. Mengshi Zhang, Yuqun Zhang, Lingming Zhang, Cong Liu, Sarfraz Khurshid. ASE 2018: 132-142.
  5. AutoFocus: Interpreting Attention-Based Neural Networks by Code Perturbation. Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang. ASE 2019: 38-41.
  6. Wuji: Automatic Online Combat Game Testing Using Evolutionary Deep Reinforcement Learning. Yan Zheng, Changjie Fan, Xiaofei Xie, Ting Su, Lei Ma, Jianye Hao, Zhaopeng Meng, Yang Liu, Ruimin Shen, Yingfeng Chen. ASE 2019: 772-784.
  7. A Study of Oracle Approximations in Testing Deep Learning Libraries. Mahdi Nejadgholi, Jinqiu Yang. ASE 2019: 785-796.
  8. Property Inference for Deep Neural Networks. Divya Gopinath, Hayes Converse, Corina S. Pasareanu, Ankur Taly. ASE 2019: 797-809.
  9. An Empirical Study Towards Characterizing Deep Learning Development and Deployment Across Different Frameworks and Platforms. Qianyu Guo, Sen Chen, Xiaofei Xie, Lei Ma, Qiang Hu, Hongtao Liu, Yang Liu, Jianjun Zhao, Xiaohong Li. ASE 2019: 810-822.
  10. Audee: Automated Testing for Deep Learning Frameworks. Qianyu Guo, Xiaofei Xie, Yi Li, Xiaoyu Zhang, Yang Liu, Xiaohong Li, Chao Shen. ASE 2020: 486-498.
  11. Problems and Opportunities in Training Deep Learning Software Systems: An Analysis of Variance. Hung Viet Pham, Shangshu Qian, Jiannan Wang, Thibaud Lutellier, Jonathan Rosenthal, Lin Tan, Yaoliang Yu, Nachiappan Nagappan. ASE 2020: 771-783.
  12. NEURODIFF: Scalable Differential Verification of Neural Networks using Fine-Grained Approximation. Brandon Paulsen, Jingbo Wang, Jiawei Wang, Chao Wang. ASE 2020: 784-796.

TOSEM

  1. An Empirical Study of the Impact of Data Splitting Decisions on the Performance of AIOps Solutions. Yingzhe Lyu, Heng Li, Mohammed Sayagh, Zhen Ming (Jack) Jiang, Ahmed E. Hassan. TOSEM 2021. 54:1-54:38
  2. Understanding Software-2.0: A Study of Machine Learning Library Usage and Evolution. Malinda Dilhara, Ameya Ketkar, Danny Dig. TOSEM 2021. 55:1-55:42
  3. Interpreting Deep Learning-based Vulnerability Detector Predictions Based on Heuristic Searching. Deqing Zou, Yawei Zhu, Shouhuai Xu, Zhen Li, Hai Jin, Hengkai Ye. TOSEM 2021. 23:1-23:31
  4. Why an Android App Is Classified as Malware: Toward Malware Classification Interpretation. Bozhi Wu, Sen Chen, Cuiyun Gao, Lingling Fan, Yang Liu, Weiping Wen, Michael R. Lyu. TOSEM 2021. 21:1-21:29.

OTHERS

  1. Doing More with Less: Characterizing Dataset Downsampling for AutoML. Fatjon Zogaj, José Pablo Cambronero, Martin Rinard, Jürgen Cito. Proc. VLDB Endow. 14(11): 2059-2072 (2021)
  2. An Experience Report on Machine Learning Reproducibility: Guidance for Practitioners and TensorFlow Model Garden Contributors. Vishnu Banna, Akhil Chinnakotla, Zhengxin Yan, Anirudh Vegesana, Naveen Vivek, Kruthi Krishnappa, Wenxin Jiang, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis.
  3. Provenance in Databases: Why, How, and Where
  4. DataHub: Collaborative Data Science & Dataset Version Management at Scale
  5. Ensuring Dataset Quality for Machine Learning Certification
  6. On the experiences of adopting automated data validation in an industrial machine learning project
  7. Asset Management in Machine Learning: A Survey
  8. MSR4ML: Reconstructing Artifact Traceability in Machine Learning Repositories. Aquilas Tchanjou Njomou, Alexandra Johanne Bifona Africa, Bram Adams, Marios Fokaefs. SANER 2021: 536-540
  9. Unveiling the Mystery of API Evolution in Deep Learning Frameworks: A Case Study of Tensorflow 2. Zejun Zhang, Yanming Yang, Xin Xia, David Lo, Xiaoxue Ren, John C. Grundy. ICSE (SEIP) 2021: 238-247
  10. Underspecification Presents Challenges for Credibility in Modern Machine Learning
  11. A Benchmark for Interpretability Methods in Deep Neural Networks. Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim. NeurIPS 2019: 9734-9745.
  12. A Data Quality-Driven View of MLOps
  13. Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities
  14. Exploring the Assessment List for Trustworthy AI in the Context of Advanced Driver-Assistance Systems
  15. Automated end-to-end management of the modeling lifecycle in deep learning
  16. Practices for Engineering Trustworthy Machine Learning Applications
  17. How are Deep Learning Models Similar?: An Empirical Study on Clone Analysis of Deep Learning Software
  18. CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks. Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang. ICDE 2021: 13-24.
  19. Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale
  20. On the Co-evolution of ML Pipelines and Source Code - Empirical Study of DVC Projects. Amine Barrak, Ellis E. Eghan, Bram Adams. SANER 2021: 422-433.
  21. Model Assertions for Monitoring and Improving ML Models. Daniel Kang, Deepti Raghavan, Peter Bailis, Matei Zaharia. MLSys 2020. (See the sketch after this list.)
  22. Towards Federated Learning at Scale: System Design
  23. Quality Assurance for AI-based Systems: Overview and Challenges
  24. The Collective Knowledge project: making ML models more portable and reproducible with open APIs, reusable best practices and MLOps
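
Entry 21 above (Model Assertions) treats monitoring as writing predicates over model outputs that flag likely mispredictions for review or relabeling. The sketch below is a plain-Python illustration of that idea, not the paper's actual API; the detection format and the "flicker" rule are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    """One detection from a hypothetical object detector on a video frame."""
    frame: int
    label: str
    confidence: float


# A model assertion is a predicate over a window of model outputs that
# returns the outputs it considers suspicious.
Assertion = Callable[[List[Detection]], List[Detection]]


def flicker_assertion(window: List[Detection]) -> List[Detection]:
    """Flag labels that appear in one frame but vanish in the next: real
    objects rarely disappear for a single frame, so these are likely errors."""
    frames = sorted({d.frame for d in window})
    flagged = []
    for d in window:
        next_frame = d.frame + 1
        if next_frame in frames and not any(
            o.label == d.label and o.frame == next_frame for o in window
        ):
            flagged.append(d)
    return flagged


def monitor(outputs: List[Detection], assertions: List[Assertion]) -> List[Detection]:
    """Run every assertion over the output window and collect flagged outputs."""
    flagged: List[Detection] = []
    for check in assertions:
        flagged.extend(check(outputs))
    return flagged


window = [
    Detection(0, "car", 0.9),
    Detection(1, "car", 0.8),
    Detection(2, "car", 0.85),
    Detection(1, "pedestrian", 0.7),  # appears in frame 1 only
]
print(monitor(window, [flicker_assertion]))
```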
