awesome-se4ai

A curated list of literature on software engineering for AI (SE4AI).

Web Resources:

Papers:

DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017: 1-18.
DeepTest: automated testing of deep-neural-network-driven autonomous cars. Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray. ICSE 2018: 303-314.
CRADLE: cross-backend validation to detect and localize bugs in deep learning libraries. Hung Viet Pham, Thibaud Lutellier, Weizhen Qi, Lin Tan. ICSE 2019: 1027-1038.
Guiding deep learning system testing using surprise adequacy. Jinhan Kim, Robert Feldt, Shin Yoo. ICSE 2019: 1039-1049.
Adversarial sample detection for deep neural network through model mutation testing. Jingyi Wang, Guoliang Dong, Jun Sun, Xinyu Wang, Peixin Zhang. ICSE 2019: 1245-1256.
SLEMI: equivalence modulo input (EMI) based mutation of CPS models for finding compiler bugs in Simulink. Shafiul Azam Chowdhury, Sohil Lal Shrestha, Taylor T. Johnson, Christoph Csallner. ICSE 2020: 335-346.
DeepBillboard: systematic physical-world testing of autonomous driving systems. Husheng Zhou, Wei Li, Zelun Kong, Junfeng Guo, Yuqun Zhang, Bei Yu, Lingming Zhang, Cong Liu. ICSE 2020: 347-358.
Misbehaviour prediction for autonomous driving systems. Andrea Stocco, Michael Weiss, Marco Calzana, Paolo Tonella. ICSE 2020: 359-371.
Approximation-refinement testing of compute-intensive cyber-physical models: an approach based on system identification. Claudio Menghi, Shiva Nejati, Lionel C. Briand, Yago Isasi Parache. ICSE 2020: 372-384.
A comprehensive study of autonomous vehicle bugs. Joshua Garcia, Yang Feng, Junjie Shen, Sumaya Almanee, Yuan Xia, Qi Alfred Chen. ICSE 2020: 385-396.
Importance-driven deep learning system testing. Simos Gerasimou, Hasan Ferit Eniser, Alper Sen, Alper Cakan. ICSE 2020: 702-713
ReluDiff: differential verification of deep neural networks. Brandon Paulsen, Jingbo Wang, Chao Wang. ICSE 2020:714-726
Dissector: input validation for deep learning applications by crossing-layer dissection. Huiyan Wang, Jingwei Xu, Chang Xu, Xiaoxing Ma, Jian Lu. ICSE 2020:727-738
Towards characterizing adversarial defects of deep learning software from the lens of uncertainty. Xiyue Zhang, Xiaofei Xie, Lei Ma, Xiaoning Du, Qiang Hu, Yang Liu, Jianjun Zhao, Meng Sun. ICSE 2020:739-751
White-box fairness testing through adversarial sampling. Peixin Zhang, Jingyi Wang, Jun Sun, Guoliang Dong, Xinyu Wang, Xingen Wang, Jin Song Dong, Ting Dai. ICSE 2020:949-960
Structure-invariant testing for machine translation. Pinjia He, Clara Meister, Zhendong Su. ICSE 2020:961-973
Automatic testing and improvement of machine translation. Zeyu Sun, Jie M. Zhang, Mark Harman, Mike Papadakis, Lu Zhang. ICSE 2020:974-985
TRADER: trace divergence analysis and embedding regulation for debugging recurrent neural networks. Guanhong Tao, Shiqing Ma, Yingqi Liu, Qiuling Xu, Xiangyu Zhang. ICSE 2020:986-998
Taxonomy of real faults in deep learning systems. Nargiz Humbatova, Gunel Jahangirova, Gabriele Bavota, Vincenzo Riccio, Andrea Stocco, Paolo Tonella. ICSE 2020:1110-1121
Testing DNN image classifiers for confusion & bias errors. Yuchi Tian, Ziyuan Zhong, Vicente Ordonez, Gail E. Kaiser, Baishakhi Ray. ICSE 2020:1122-1134
Repairing deep neural networks: fix patterns and challenges. Md Johirul Islam, Rangeet Pan, Giang Nguyen, Hridesh Rajan. ICSE 2020:1135-1146
Fuzz testing based data augmentation to improve robustness of deep neural networks. Xiang Gao, Ripon K. Saha, Mukul R. Prasad, Abhik Roychoudhury. ICSE 2020:1147-1158
An empirical study on program failures of deep learning jobs. Ru Zhang, Wencong Xiao, Hongyu Zhang, Yu Liu, Haoxiang Lin, Mao Yang. ICSE 2020:1159-1170

ICSE 2021

Are Machine Learning Cloud APIs Used Correctly? Chengcheng Wan, Shicheng Liu, Henry Hoffmann, Michael Maire, Shan Lu. ICSE 2021:125-137.
Resource-Guided Configuration Space Reduction for Deep Learning Models. Yanjie Gao, Yonghao Zhu, Hongyu Zhang, Haoxiang Lin, Mao Yang. ICSE 2021:175-187.
Distribution-Aware Testing of Neural Networks Using Generative Models. Swaroopa Dola, Matthew B. Dwyer, Mary Lou Soffa. ICSE 2021:226-237.
An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems. Yiming Tang, Raffi Khatchadourian, Mehdi Bagherzadeh, Rhia Singh, Ajani Stewart, Anita Raja. ICSE 2021: 238-250.
DeepLocalize: Fault Localization for Deep Neural Networks. Mohammad Wardat, Wei Le, Hridesh Rajan. ICSE 2021:251-262
DeepPayload: Black-box Backdoor Attack on Deep Learning Models through Neural Payload Injection. Yuanchun Li, Jiayi Hua, Haoyu Wang, Chunyang Chen, Yunxin Liu. ICSE 2021: 263-274
Reducing DNN Properties to Enable Falsification with Adversarial Attacks. David Shriver, Sebastian G. Elbaum, Matthew B. Dwyer. ICSE 2021: 275-287
Graph-based Fuzz Testing for Deep Learning Inference Engines. Weisi Luo, Dong Chai, Xiaoyue Run, Jiang Wang, Chunrong Fang, Zhenyu Chen. ICSE 2021: 288-299
RobOT: Robustness-Oriented Testing for Deep Learning Systems. Jingyi Wang, Jialuo Chen, Youcheng Sun, Xingjun Ma, Dongxia Wang, Jun Sun, Peng Cheng. ICSE 2021: 300-311
Scalable Quantitative Verification For Deep Neural Networks. Teodora Baluta, Zheng Leong Chua, Kuldeep S. Meel, Prateek Saxena. ICSE 2021:312-323
Operation is the hardest teacher: estimating DNN accuracy looking for mispredictions. Antonio Guerriero, Roberto Pietrantuono, Stefano Russo. ICSE 2021: 348-358
AUTOTRAINER: An Automatic DNN Training Problem Detection and Repair System. Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen. ICSE 2021: 359-371
Self-Checking Deep Neural Networks in Deployment. Yan Xiao, Ivan Beschastnikh, David S. Rosenblum, Changsheng Sun, Sebastian G. Elbaum, Yun Lin, Jin Song Dong. ICSE 2021: 372-384
Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models. Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, Baowen Xu. ICSE 2021: 385-396
Prioritizing Test Inputs for Deep Neural Networks via Mutation Analysis. Zan Wang, Hanmo You, Junjie Chen, Yingyi Zhang, Xuyuan Dong, Wenbin Zhang. ICSE 2021:397-409
Testing Machine Translation via Referential Transparency. Pinjia He, Clara Meister, Zhendong Su. ICSE 2021: 410-422
An Empirical Study on Deployment Faults of Deep Learning Based Mobile Applications. Zhenpeng Chen, Huihan Yao, Yiling Lou, Yanbin Cao, Yuanqiang Liu, Haoyu Wang, Xuanzhe Liu. ICSE 2021: 674-685
White-Box Analysis over Machine Learning: Modeling Performance of Configurable Systems. Miguel Velez, Pooyan Jamshidi, Norbert Siegmund, Sven Apel, Christian Kästner. ICSE 2021: 1072-1084

FSE

LAMP: data provenance for graph based machine learning algorithms through derivative computation. Shiqing Ma, Yousra Aafer, Zhaogui Xu, Wen-Chuan Lee, Juan Zhai, Yingqi Liu, Xiangyu Zhang. FSE 2017:786-797
MODE: automated neural network model debugging via state differential analysis and input selection. Shiqing Ma, Yingqi Liu, Wen-Chuan Lee, Xiangyu Zhang, Ananth Grama. FSE 2018:175-186
DeepStellar: model-based quantitative analysis of stateful deep learning systems. Xiaoning Du, Xiaofei Xie, Yi Li, Lei Ma, Yang Liu, Jianjun Zhao. FSE 2019:477-487
Bridging the gap between ML solutions and their business requirements using feature interactions. Guy Barash, Eitan Farchi, Ilan Jayaraman, Orna Raz, Rachel Tzoref-Brill, Marcel Zalmanovici. FSE 2019:1048-1058
Do the machine learning models on a crowd sourced platform exhibit bias? an empirical study on model fairness. Sumon Biswas, Hridesh Rajan. FSE 2020:642-653
Fairway: a way to build fair ML software. Joymallya Chakraborty, Suvodeep Majumder, Zhe Yu, Tim Menzies. FSE 2020:654-665
A comprehensive study on challenges in deploying deep learning based software. Zhenpeng Chen, Yanbin Cao, Yuanqiang Liu, Haoyu Wang, Tao Xie, Xuanzhe Liu. FSE 2020:750-762
AMS: generating AutoML search spaces from weak specifications. José Pablo Cambronero, Jürgen Cito, Martin C. Rinard. FSE 2020:763-774
Correlations between deep neural network model coverage criteria and model quality. Shenao Yan, Guanhong Tao, Xuwei Liu, Juan Zhai, Shiqing Ma, Lei Xu, Xiangyu Zhang. FSE 2020:775-787
Deep learning library testing via effective model generation. Zan Wang, Ming Yan, Junjie Chen, Shuang Liu, Dongdi Zhang. FSE 2020:788-799
DeepSearch: a simple and effective blackbox attack for deep neural networks. Fuyuan Zhang, Sankalan Pal Chowdhury, Maria Christakis. FSE 2020:800-812
Detecting numerical bugs in neural network architectures. Yuhao Zhang, Luyao Ren, Liqian Chen, Yingfei Xiong, Shing-Chi Cheung, Tao Xie. FSE 2020:826-837
Dynamic slicing for deep neural networks. Ziqi Zhang, Yuanchun Li, Yao Guo, Xiangqun Chen, Yunxin Liu. FSE 2020:838-850
Is neuron coverage a meaningful measure for testing deep neural networks? Fabrice Harel-Canada, Lingxiao Wang, Muhammad Ali Gulzar, Quanquan Gu, Miryung Kim. FSE 2020:851-862
Machine translation testing via pathological invariance. Shashij Gupta, Pinjia He, Clara Meister, Zhendong Su. FSE 2020:863-875
Model-based exploration of the frontier of behaviours for deep learning system testing. Vincenzo Riccio, Paolo Tonella. FSE 2020:876-888
On decomposing a deep neural network into modules. Rangeet Pan, Hridesh Rajan. FSE 2020:889-900
Operational calibration: debugging confidence errors for DNNs in the field. Zenan Li, Xiaoxing Ma, Chang Xu, Jingwei Xu, Chun Cao, Jian Lu. FSE 2020:901-913
A first look at the integration of machine learning models in complex autonomous driving systems: a case study on Apollo. Zi Peng, Jinqiu Yang, Tse-Hsun (Peter) Chen, Lei Ma. FSE 2020:1240-1250
Enhancing the interoperability between deep learning frameworks by model conversion. Yu Liu, Cheng Chen, Ru Zhang, Tingting Qin, Xiang Ji, Haoxiang Lin, Mao Yang. FSE 2020:1320-1330
Estimating GPU memory consumption of deep learning models. Yanjie Gao, Yu Liu, Hongyu Zhang, Zhengxian Li, Yonghao Zhu, Haoxiang Lin, Mao Yang. FSE 2020:1342-1352
Reducing DNN labelling cost using surprise adequacy: an industrial case study for autonomous driving. Jinhan Kim, Jeongil Ju, Robert Feldt, Shin Yoo. FSE 2020:1466-1476
Bias in machine learning software: why? how? what to do? Joymallya Chakraborty, Suvodeep Majumder, Tim Menzies. FSE 2021:429-440
Validation on machine reading comprehension software without annotated labels: a property-based method. Songqiang Chen, Shuo Jin, Xiaoyuan Xie. FSE 2021:590-602
FLEX: fixing flaky tests in machine learning projects by updating assertion bounds. Saikat Dutta, August Shi, Sasa Misailovic. FSE 2021:603-614
Exposing numerical bugs in deep learning via gradient back-propagation. Ming Yan, Junjie Chen, Xiangyu Zhang, Lin Tan, Gan Wang, Zan Wang. FSE 2021:627-638
Probing model signal-awareness via prediction-preserving input minimization. Sahil Suneja, Yunhui Zheng, Yufan Zhuang, Jim Alain Laredo, Alessandro Morari. FSE 2021: 945-955
~~Shu Lin, Na Meng, Wenxin Li: Generating efficient solvers from constraint models. 956-967~~
A comprehensive study of deep learning compiler bugs. Qingchao Shen, Haoyang Ma, Junjie Chen, Yongqiang Tian, Shing-Chi Cheung, Xiang Chen. FSE 2021:968-980
Fair preprocessing: towards understanding compositional fairness of data transformers in machine learning pipeline. Sumon Biswas, Hridesh Rajan. FSE 2021: 981-993
Fairea: a model behaviour mutation approach to benchmarking bias mitigation methods. Max Hort, Jie M. Zhang, Federica Sarro, Mark Harman. FSE 2021:994-1006

ASE

Automated directed fairness testing. Sakshi Udeshi, Pryanshu Arora, Sudipta Chattopadhyay. ASE 2018:98-108
Concolic testing for deep neural networks. Youcheng Sun, Min Wu, Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska, Daniel Kroening. ASE 2018:109-119
DeepGauge: multi-granularity testing criteria for deep learning systems. Lei Ma, Felix Juefei-Xu, Fuyuan Zhang, Jiyuan Sun, Minhui Xue, Bo Li, Chunyang Chen, Ting Su, Li Li, Yang Liu, Jianjun Zhao, Yadong Wang. ASE 2018:120-131
DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems. Mengshi Zhang, Yuqun Zhang, Lingming Zhang, Cong Liu, Sarfraz Khurshid. ASE 2018:132-142
AutoFocus: Interpreting Attention-Based Neural Networks by Code Perturbation. Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang. ASE 2019:38-41
Wuji: Automatic Online Combat Game Testing Using Evolutionary Deep Reinforcement Learning. Yan Zheng, Changjie Fan, Xiaofei Xie, Ting Su, Lei Ma, Jianye Hao, Zhaopeng Meng, Yang Liu, Ruimin Shen, Yingfeng Chen. ASE 2020: 772-784
A Study of Oracle Approximations in Testing Deep Learning Libraries. Mahdi Nejadgholi, Jinqiu Yang. ASE 2020: 785-796
Property Inference for Deep Neural Networks. Divya Gopinath, Hayes Converse, Corina S. Pasareanu, Ankur Taly. ASE 2020: 797-809
An Empirical Study Towards Characterizing Deep Learning Development and Deployment Across Different Frameworks and Platforms. Qianyu Guo, Sen Chen, Xiaofei Xie, Lei Ma, Qiang Hu, Hongtao Liu, Yang Liu, Jianjun Zhao, Xiaohong Li. ASE 2020: 810-822
Audee: Automated Testing for Deep Learning Frameworks. Qianyu Guo, Xiaofei Xie, Yi Li, Xiaoyu Zhang, Yang Liu, Xiaohong Li, Chao Shen. ASE 2020:486-498
Problems and Opportunities in Training Deep Learning Software Systems: An Analysis of Variance. Hung Viet Pham, Shangshu Qian, Jiannan Wang, Thibaud Lutellier, Jonathan Rosenthal, Lin Tan, Yaoliang Yu, Nachiappan Nagappan. ASE 2020: 771-783
NEURODIFF: Scalable Differential Verification of Neural Networks using Fine-Grained Approximation. Brandon Paulsen, Jingbo Wang, Jiawei Wang, Chao Wang. ASE 2020: 784-796

TOSEM

An Empirical Study of the Impact of Data Splitting Decisions on the Performance of AIOps Solutions. Yingzhe Lyu, Heng Li, Mohammed Sayagh, Zhen Ming (Jack) Jiang, Ahmed E. Hassan. TOSEM 2021. 54:1-54:38
Understanding Software-2.0: A Study of Machine Learning Library Usage and Evolution. Malinda Dilhara, Ameya Ketkar, Danny Dig. TOSEM 2021. 55:1-55:42
Interpreting Deep Learning-based Vulnerability Detector Predictions Based on Heuristic Searching. Deqing Zou, Yawei Zhu, Shouhuai Xu, Zhen Li, Hai Jin, Hengkai Ye. TOSEM 2021. 23:1-23:31
Why an Android App Is Classified as Malware: Toward Malware Classification Interpretation. Bozhi Wu, Sen Chen, Cuiyun Gao, Lingling Fan, Yang Liu, Weiping Wen, Michael R. Lyu. TOSEM 2021. 21:1-21:29

OTHERS

Doing More with Less: Characterizing Dataset Downsampling for AutoML. Fatjon Zogaj, José Pablo Cambronero, Martin Rinard, Jürgen Cito. Proc. VLDB Endow. 14(11): 2059-2072 (2021)
An Experience Report on Machine Learning Reproducibility: Guidance for Practitioners and TensorFlow Model Garden Contributors. Vishnu Banna, Akhil Chinnakotla, Zhengxin Yan, Anirudh Vegesana, Naveen Vivek, Kruthi Krishnappa, Wenxin Jiang, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis.
Provenance in Databases: Why, How, and Where
DataHub: Collaborative Data Science & Dataset Version Management at Scale
Ensuring Dataset Quality for Machine Learning Certification
On the experiences of adopting automated data validation in an industrial machine learning project
Asset Management in Machine Learning: A Survey
MSR4ML: Reconstructing Artifact Traceability in Machine Learning Repositories. Aquilas Tchanjou Njomou, Alexandra Johanne Bifona Africa, Bram Adams, Marios Fokaefs. SANER 2021: 536-540
Unveiling the Mystery of API Evolution in Deep Learning Frameworks: A Case Study of Tensorflow 2. Zejun Zhang, Yanming Yang, Xin Xia, David Lo, Xiaoxue Ren, John C. Grundy. ICSE (SEIP) 2021: 238-247
Underspecification Presents Challenges for Credibility in Modern Machine Learning
A Benchmark for Interpretability Methods in Deep Neural Networks. Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim. NeurIPS 2019: 9734-9745
A Data Quality-Driven View of MLOps
Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities
Exploring the Assessment List for Trustworthy AI in the Context of Advanced Driver-Assistance Systems
Automated end-to-end management of the modeling lifecycle in deep learning
Practices for Engineering Trustworthy Machine Learning Applications
How are Deep Learning Models Similar?: An Empirical Study on Clone Analysis of Deep Learning Software
CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks. Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang. ICDE 2021: 13-24
Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale
On the Co-evolution of ML Pipelines and Source Code - Empirical Study of DVC Projects. Amine Barrak, Ellis E. Eghan, Bram Adams. SANER 2021: 422-433
Model Assertions for Monitoring and Improving ML Models. Daniel Kang, Deepti Raghavan, Peter Bailis, Matei Zaharia. MLSys 2020
Towards Federated Learning at Scale: System Design
Quality Assurance for AI-based Systems: Overview and Challenges
The Collective Knowledge project: making ML models more portable and reproducible with open APIs, reusable best practices and MLOps
