A curated list of literature on SE4AI
- Software Engineering for AI-Enabled Systems (SE4AI) - CMU 17-445/645, Summer 2020.
- Introducing the Data Validation Tool - Google. 2021.
- Datamations: Animated Explanations of Data Analysis Pipelines - CHI 2021.
- Can AI Replace Lawyers? Researchers Say Machine Learning Can Help Predict Summary Judgment Outcomes
- Underspecification Presents Challenges in Modern ML
- Assessment List for Trustworthy Artificial Intelligence (ALTAI) for self-assessment
- The reusable holdout: Preserving validity in adaptive data analysis
- mllint — Linter for Machine Learning projects
- PE_for_ML
- Operationalizing Machine Learning
- Versioning, Provenance, and Reproducibility in Production Machine Learning
- Time Travel and Provenance for Machine Learning Pipelines
- Automating Entity Matching Model Development
- DataPrep - The easiest way to prepare data in Python
- Using PyTorch + NumPy? You're making a mistake.
- MLOps: Continuous delivery and automation pipelines in machine learning
- Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A Guide for Authors and Reviewers
- Versioning ML Models & Data in Time and Space
- Overton: A Data System for Monitoring and Improving Machine-Learned Products
- Introducing Ludwig, a Code-Free Deep Learning Toolbox
- Awesome-mlops
- Git for data
- MLCommons
- Machine Learning at Industrial Scale: Lessons from the MLflow Project
- MediaPipe
- Netflix's Metaflow: Reproducible machine learning pipelines
- Traceability for Trustworthy AI: A Review of Models and Tools
- Finding duplicate images made easy!
- DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017: 1-18.
- DeepTest: automated testing of deep-neural-network-driven autonomous cars. Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray. ICSE 2018: 303-314.
- CRADLE: cross-backend validation to detect and localize bugs in deep learning libraries. Hung Viet Pham, Thibaud Lutellier, Weizhen Qi, Lin Tan. ICSE 2019: 1027-1038.
- Guiding deep learning system testing using surprise adequacy. Jinhan Kim, Robert Feldt, Shin Yoo. ICSE 2019: 1039-1049.
- Adversarial sample detection for deep neural network through model mutation testing. Jingyi Wang, Guoliang Dong, Jun Sun, Xinyu Wang, Peixin Zhang. ICSE 2019: 1245-1256.
- SLEMI: equivalence modulo input (EMI) based mutation of CPS models for finding compiler bugs in Simulink. Shafiul Azam Chowdhury, Sohil Lal Shrestha, Taylor T. Johnson, Christoph Csallner. ICSE 2020: 335-346.
- DeepBillboard: systematic physical-world testing of autonomous driving systems. Husheng Zhou, Wei Li, Zelun Kong, Junfeng Guo, Yuqun Zhang, Bei Yu, Lingming Zhang, Cong Liu. ICSE 2020: 347-358.
- Misbehaviour prediction for autonomous driving systems. Andrea Stocco, Michael Weiss, Marco Calzana, Paolo Tonella. ICSE 2020: 359-371.
- Approximation-refinement testing of compute-intensive cyber-physical models: an approach based on system identification. Claudio Menghi, Shiva Nejati, Lionel C. Briand, Yago Isasi Parache. ICSE 2020: 372-384.
- A comprehensive study of autonomous vehicle bugs. Joshua Garcia, Yang Feng, Junjie Shen, Sumaya Almanee, Yuan Xia, Qi Alfred Chen. ICSE 2020: 385-396.
- Importance-driven deep learning system testing. Simos Gerasimou, Hasan Ferit Eniser, Alper Sen, Alper Cakan. ICSE 2020: 702-713
- ReluDiff: differential verification of deep neural networks. Brandon Paulsen, Jingbo Wang, Chao Wang. ICSE 2020:714-726
- Dissector: input validation for deep learning applications by crossing-layer dissection. Huiyan Wang, Jingwei Xu, Chang Xu, Xiaoxing Ma, Jian Lu. ICSE 2020:727-738
- Towards characterizing adversarial defects of deep learning software from the lens of uncertainty. Xiyue Zhang, Xiaofei Xie, Lei Ma, Xiaoning Du, Qiang Hu, Yang Liu, Jianjun Zhao, Meng Sun. ICSE 2020:739-751
- White-box fairness testing through adversarial sampling. Peixin Zhang, Jingyi Wang, Jun Sun, Guoliang Dong, Xinyu Wang, Xingen Wang, Jin Song Dong, Ting Dai. ICSE 2020:949-960
- Structure-invariant testing for machine translation. Pinjia He, Clara Meister, Zhendong Su. ICSE 2020:961-973
- Automatic testing and improvement of machine translation. Zeyu Sun, Jie M. Zhang, Mark Harman, Mike Papadakis, Lu Zhang. ICSE 2020:974-985
- TRADER: trace divergence analysis and embedding regulation for debugging recurrent neural networks. Guanhong Tao, Shiqing Ma, Yingqi Liu, Qiuling Xu, Xiangyu Zhang. ICSE 2020:986-998
- Taxonomy of real faults in deep learning systems. Nargiz Humbatova, Gunel Jahangirova, Gabriele Bavota, Vincenzo Riccio, Andrea Stocco, Paolo Tonella. ICSE 2020:1110-1121
- Testing DNN image classifiers for confusion & bias errors. Yuchi Tian, Ziyuan Zhong, Vicente Ordonez, Gail E. Kaiser, Baishakhi Ray. ICSE 2020:1122-1134
- Repairing deep neural networks: fix patterns and challenges. Md Johirul Islam, Rangeet Pan, Giang Nguyen, Hridesh Rajan. ICSE 2020:1135-1146
- Fuzz testing based data augmentation to improve robustness of deep neural networks. Xiang Gao, Ripon K. Saha, Mukul R. Prasad, Abhik Roychoudhury. ICSE 2020:1147-1158
- An empirical study on program failures of deep learning jobs. Ru Zhang, Wencong Xiao, Hongyu Zhang, Yu Liu, Haoxiang Lin, Mao Yang. ICSE 2020:1159-1170
ICSE 2021
- Are Machine Learning Cloud APIs Used Correctly? Chengcheng Wan, Shicheng Liu, Henry Hoffmann, Michael Maire, Shan Lu. ICSE 2021:125-137.
- Resource-Guided Configuration Space Reduction for Deep Learning Models. Yanjie Gao, Yonghao Zhu, Hongyu Zhang, Haoxiang Lin, Mao Yang. ICSE 2021:175-187.
- Distribution-Aware Testing of Neural Networks Using Generative Models. Swaroopa Dola, Matthew B. Dwyer, Mary Lou Soffa. ICSE 2021:226-237.
- An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems. Yiming Tang, Raffi Khatchadourian, Mehdi Bagherzadeh, Rhia Singh, Ajani Stewart, Anita Raja. ICSE 2021: 238-250.
- DeepLocalize: Fault Localization for Deep Neural Networks. Mohammad Wardat, Wei Le, Hridesh Rajan. ICSE 2021:251-262
- DeepPayload: Black-box Backdoor Attack on Deep Learning Models through Neural Payload Injection. Yuanchun Li, Jiayi Hua, Haoyu Wang, Chunyang Chen, Yunxin Liu. ICSE 2021: 263-274
- Reducing DNN Properties to Enable Falsification with Adversarial Attacks. David Shriver, Sebastian G. Elbaum, Matthew B. Dwyer. ICSE 2021: 275-287
- Graph-based Fuzz Testing for Deep Learning Inference Engines. Weisi Luo, Dong Chai, Xiaoyue Run, Jiang Wang, Chunrong Fang, Zhenyu Chen. ICSE 2021: 288-299
- RobOT: Robustness-Oriented Testing for Deep Learning Systems. Jingyi Wang, Jialuo Chen, Youcheng Sun, Xingjun Ma, Dongxia Wang, Jun Sun, Peng Cheng. ICSE 2021: 300-311
- Scalable Quantitative Verification For Deep Neural Networks. Teodora Baluta, Zheng Leong Chua, Kuldeep S. Meel, Prateek Saxena. ICSE 2021:312-323
- Operation is the hardest teacher: estimating DNN accuracy looking for mispredictions. Antonio Guerriero, Roberto Pietrantuono, Stefano Russo. ICSE 2021: 348-358
- AUTOTRAINER: An Automatic DNN Training Problem Detection and Repair System. Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen. ICSE 2021: 359-371
- Self-Checking Deep Neural Networks in Deployment. Yan Xiao, Ivan Beschastnikh, David S. Rosenblum, Changsheng Sun, Sebastian G. Elbaum, Yun Lin, Jin Song Dong. ICSE 2021: 372-384
- Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models. Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, Baowen Xu. ICSE 2021: 385-396
- Prioritizing Test Inputs for Deep Neural Networks via Mutation Analysis. Zan Wang, Hanmo You, Junjie Chen, Yingyi Zhang, Xuyuan Dong, Wenbin Zhang. ICSE 2021:397-409
- Testing Machine Translation via Referential Transparency. Pinjia He, Clara Meister, Zhendong Su. ICSE 2021: 410-422
- An Empirical Study on Deployment Faults of Deep Learning Based Mobile Applications. Zhenpeng Chen, Huihan Yao, Yiling Lou, Yanbin Cao, Yuanqiang Liu, Haoyu Wang, Xuanzhe Liu. ICSE 2021: 674-685
- White-Box Analysis over Machine Learning: Modeling Performance of Configurable Systems. Miguel Velez, Pooyan Jamshidi, Norbert Siegmund, Sven Apel, Christian Kästner. ICSE 2021: 1072-1084
FSE
- LAMP: data provenance for graph based machine learning algorithms through derivative computation. Shiqing Ma, Yousra Aafer, Zhaogui Xu, Wen-Chuan Lee, Juan Zhai, Yingqi Liu, Xiangyu Zhang. FSE 2017:786-797
- MODE: automated neural network model debugging via state differential analysis and input selection. Shiqing Ma, Yingqi Liu, Wen-Chuan Lee, Xiangyu Zhang, Ananth Grama. FSE 2018:175-186
- DeepStellar: model-based quantitative analysis of stateful deep learning systems. Xiaoning Du, Xiaofei Xie, Yi Li, Lei Ma, Yang Liu, Jianjun Zhao. FSE 2019:477-487
- Bridging the gap between ML solutions and their business requirements using feature interactions. Guy Barash, Eitan Farchi, Ilan Jayaraman, Orna Raz, Rachel Tzoref-Brill, Marcel Zalmanovici. FSE 2019:1048-1058
- Do the machine learning models on a crowd sourced platform exhibit bias? an empirical study on model fairness. Sumon Biswas, Hridesh Rajan. FSE 2020:642-653
- Fairway: a way to build fair ML software. Joymallya Chakraborty, Suvodeep Majumder, Zhe Yu, Tim Menzies. FSE 2020:654-665
- A comprehensive study on challenges in deploying deep learning based software. Zhenpeng Chen, Yanbin Cao, Yuanqiang Liu, Haoyu Wang, Tao Xie, Xuanzhe Liu. FSE 2020:750-762
- AMS: generating AutoML search spaces from weak specifications. José Pablo Cambronero, Jürgen Cito, Martin C. Rinard. FSE 2020:763-774
- Correlations between deep neural network model coverage criteria and model quality. Shenao Yan, Guanhong Tao, Xuwei Liu, Juan Zhai, Shiqing Ma, Lei Xu, Xiangyu Zhang. FSE 2020:775-787
- Deep learning library testing via effective model generation. Zan Wang, Ming Yan, Junjie Chen, Shuang Liu, Dongdi Zhang. FSE 2020:788-799
- DeepSearch: a simple and effective blackbox attack for deep neural networks. Fuyuan Zhang, Sankalan Pal Chowdhury, Maria Christakis. FSE 2020:800-812
- Detecting numerical bugs in neural network architectures. Yuhao Zhang, Luyao Ren, Liqian Chen, Yingfei Xiong, Shing-Chi Cheung, Tao Xie. FSE 2020:826-837
- Dynamic slicing for deep neural networks. Ziqi Zhang, Yuanchun Li, Yao Guo, Xiangqun Chen, Yunxin Liu. FSE 2020:838-850
- Is neuron coverage a meaningful measure for testing deep neural networks? Fabrice Harel-Canada, Lingxiao Wang, Muhammad Ali Gulzar, Quanquan Gu, Miryung Kim. FSE 2020:851-862
- Machine translation testing via pathological invariance. Shashij Gupta, Pinjia He, Clara Meister, Zhendong Su. FSE 2020:863-875
- Model-based exploration of the frontier of behaviours for deep learning system testing. Vincenzo Riccio, Paolo Tonella. FSE 2020:876-888
- On decomposing a deep neural network into modules. Rangeet Pan, Hridesh Rajan. FSE 2020:889-900
- Operational calibration: debugging confidence errors for DNNs in the field. Zenan Li, Xiaoxing Ma, Chang Xu, Jingwei Xu, Chun Cao, Jian Lu. FSE 2020:901-913
- A first look at the integration of machine learning models in complex autonomous driving systems: a case study on Apollo. Zi Peng, Jinqiu Yang, Tse-Hsun (Peter) Chen, Lei Ma. FSE 2020:1240-1250
- Enhancing the interoperability between deep learning frameworks by model conversion. Yu Liu, Cheng Chen, Ru Zhang, Tingting Qin, Xiang Ji, Haoxiang Lin, Mao Yang. FSE 2020:1320-1330
- Estimating GPU memory consumption of deep learning models. Yanjie Gao, Yu Liu, Hongyu Zhang, Zhengxian Li, Yonghao Zhu, Haoxiang Lin, Mao Yang. FSE 2020:1342-1352
- Reducing DNN labelling cost using surprise adequacy: an industrial case study for autonomous driving. Jinhan Kim, Jeongil Ju, Robert Feldt, Shin Yoo. FSE 2020:1466-1476
- Bias in machine learning software: why? how? what to do? Joymallya Chakraborty, Suvodeep Majumder, Tim Menzies. FSE 2021:429-440
- Validation on machine reading comprehension software without annotated labels: a property-based method. Songqiang Chen, Shuo Jin, Xiaoyuan Xie. FSE 2021:590-602
- FLEX: fixing flaky tests in machine learning projects by updating assertion bounds. Saikat Dutta, August Shi, Sasa Misailovic. FSE 2021:603-614
- Exposing numerical bugs in deep learning via gradient back-propagation. Ming Yan, Junjie Chen, Xiangyu Zhang, Lin Tan, Gan Wang, Zan Wang. FSE 2021:627-638
- Probing model signal-awareness via prediction-preserving input minimization. Sahil Suneja, Yunhui Zheng, Yufan Zhuang, Jim Alain Laredo, Alessandro Morari. FSE 2021: 945-955
- Generating efficient solvers from constraint models. Shu Lin, Na Meng, Wenxin Li. FSE 2021:956-967
- A comprehensive study of deep learning compiler bugs. Qingchao Shen, Haoyang Ma, Junjie Chen, Yongqiang Tian, Shing-Chi Cheung, Xiang Chen. FSE 2021:968-980
- Fair preprocessing: towards understanding compositional fairness of data transformers in machine learning pipeline. Sumon Biswas, Hridesh Rajan. FSE 2021: 981-993
- Fairea: a model behaviour mutation approach to benchmarking bias mitigation methods. Max Hort, Jie M. Zhang, Federica Sarro, Mark Harman. FSE 2021:994-1006
ASE
- Automated directed fairness testing. Sakshi Udeshi, Pryanshu Arora, Sudipta Chattopadhyay. ASE 2018:98-108
- Concolic testing for deep neural networks. Youcheng Sun, Min Wu, Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska, Daniel Kroening. ASE 2018:109-119
- DeepGauge: multi-granularity testing criteria for deep learning systems. Lei Ma, Felix Juefei-Xu, Fuyuan Zhang, Jiyuan Sun, Minhui Xue, Bo Li, Chunyang Chen, Ting Su, Li Li, Yang Liu, Jianjun Zhao, Yadong Wang. ASE 2018:120-131
- DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems. Mengshi Zhang, Yuqun Zhang, Lingming Zhang, Cong Liu, Sarfraz Khurshid. ASE 2018:132-142
- AutoFocus: Interpreting Attention-Based Neural Networks by Code Perturbation. Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang. ASE 2019:38-41
- Wuji: Automatic Online Combat Game Testing Using Evolutionary Deep Reinforcement Learning. Yan Zheng, Changjie Fan, Xiaofei Xie, Ting Su, Lei Ma, Jianye Hao, Zhaopeng Meng, Yang Liu, Ruimin Shen, Yingfeng Chen. ASE 2019: 772-784
- A Study of Oracle Approximations in Testing Deep Learning Libraries. Mahdi Nejadgholi, Jinqiu Yang. ASE 2019: 785-796
- Property Inference for Deep Neural Networks. Divya Gopinath, Hayes Converse, Corina S. Pasareanu, Ankur Taly. ASE 2019: 797-809
- An Empirical Study Towards Characterizing Deep Learning Development and Deployment Across Different Frameworks and Platforms. Qianyu Guo, Sen Chen, Xiaofei Xie, Lei Ma, Qiang Hu, Hongtao Liu, Yang Liu, Jianjun Zhao, Xiaohong Li. ASE 2019: 810-822
- Audee: Automated Testing for Deep Learning Frameworks. Qianyu Guo, Xiaofei Xie, Yi Li, Xiaoyu Zhang, Yang Liu, Xiaohong Li, Chao Shen. ASE 2020:486-498
- Problems and Opportunities in Training Deep Learning Software Systems: An Analysis of Variance. Hung Viet Pham, Shangshu Qian, Jiannan Wang, Thibaud Lutellier, Jonathan Rosenthal, Lin Tan, Yaoliang Yu, Nachiappan Nagappan. ASE 2020: 771-783
- NEURODIFF: Scalable Differential Verification of Neural Networks using Fine-Grained Approximation. Brandon Paulsen, Jingbo Wang, Jiawei Wang, Chao Wang. ASE 2020: 784-796
TOSEM
- An Empirical Study of the Impact of Data Splitting Decisions on the Performance of AIOps Solutions. Yingzhe Lyu, Heng Li, Mohammed Sayagh, Zhen Ming (Jack) Jiang, Ahmed E. Hassan. TOSEM 2021. 54:1-54:38
- Understanding Software-2.0: A Study of Machine Learning Library Usage and Evolution. Malinda Dilhara, Ameya Ketkar, Danny Dig. TOSEM 2021. 55:1-55:42
- Interpreting Deep Learning-based Vulnerability Detector Predictions Based on Heuristic Searching. Deqing Zou, Yawei Zhu, Shouhuai Xu, Zhen Li, Hai Jin, Hengkai Ye. TOSEM 2021. 23:1-23:31
- Why an Android App Is Classified as Malware: Toward Malware Classification Interpretation. Bozhi Wu, Sen Chen, Cuiyun Gao, Lingling Fan, Yang Liu, Weiping Wen, Michael R. Lyu. TOSEM 2021. 21:1-21:29
OTHERS
- Doing More with Less: Characterizing Dataset Downsampling for AutoML. Fatjon Zogaj, José Pablo Cambronero, Martin Rinard, Jürgen Cito. Proc. VLDB Endow. 14(11): 2059-2072 (2021)
- An Experience Report on Machine Learning Reproducibility: Guidance for Practitioners and TensorFlow Model Garden Contributors. Vishnu Banna, Akhil Chinnakotla, Zhengxin Yan, Anirudh Vegesana, Naveen Vivek, Kruthi Krishnappa, Wenxin Jiang, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis.
- Provenance in Databases: Why, How, and Where
- DataHub: Collaborative Data Science & Dataset Version Management at Scale
- Ensuring Dataset Quality for Machine Learning Certification
- On the experiences of adopting automated data validation in an industrial machine learning project
- Asset Management in Machine Learning: A Survey
- MSR4ML: Reconstructing Artifact Traceability in Machine Learning Repositories. Aquilas Tchanjou Njomou, Alexandra Johanne Bifona Africa, Bram Adams, Marios Fokaefs. SANER 2021: 536-540
- Unveiling the Mystery of API Evolution in Deep Learning Frameworks: A Case Study of Tensorflow 2. Zejun Zhang, Yanming Yang, Xin Xia, David Lo, Xiaoxue Ren, John C. Grundy. ICSE (SEIP) 2021: 238-247
- Underspecification Presents Challenges for Credibility in Modern Machine Learning
- A Benchmark for Interpretability Methods in Deep Neural Networks. Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim. NeurIPS 2019: 9734-9745
- A Data Quality-Driven View of MLOps
- Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities
- Exploring the Assessment List for Trustworthy AI in the Context of Advanced Driver-Assistance Systems
- Automated end-to-end management of the modeling lifecycle in deep learning
- Practices for Engineering Trustworthy Machine Learning Applications
- How are Deep Learning Models Similar?: An Empirical Study on Clone Analysis of Deep Learning Software
- CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks. Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang. ICDE 2021: 13-24
- Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale
- On the Co-evolution of ML Pipelines and Source Code - Empirical Study of DVC Projects. Amine Barrak, Ellis E. Eghan, Bram Adams. SANER 2021: 422-433
- Model Assertions for Monitoring and Improving ML Models. Daniel Kang, Deepti Raghavan, Peter Bailis, Matei Zaharia. MLSys 2020
- Towards Federated Learning at Scale: System Design
- Quality Assurance for AI-based Systems: Overview and Challenges
- The Collective Knowledge project: making ML models more portable and reproducible with open APIs, reusable best practices and MLOps