Practical Accuracy Estimation for Efficient Deep Neural Network Testing (ESEC/FSE 2021 - Journal First)

Who

Junjie Chen, Zhuo Wu, Zan Wang, Hanmo You, Lingming Zhang, Ming Yan

Track

ESEC/FSE 2021 Journal First

Time Zone

The program is currently displayed in (GMT+03:00) Athens.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 27 Aug 2021 11:20 - 11:30 - Testing—Testing of Machine Learning Models Chair(s): Chang Xu
Fri 27 Aug 2021 23:20 - 23:30 - Testing—Testing of Machine Learning Models Chair(s): Dan Hao

Abstract

Deep neural networks (DNNs) have become increasingly popular, and DNN testing is critical to guaranteeing the correctness of a DNN, i.e., its accuracy in this work. However, DNN testing suffers from a serious efficiency problem: it is costly to label each test input to determine the DNN's accuracy on the testing set, since labeling is done manually by multiple people (sometimes requiring domain-specific knowledge) and the testing set is large. To relieve this problem, we propose a novel and practical approach, called PACE (short for Practical ACcuracy Estimation), which selects a small set of test inputs that can precisely estimate the accuracy of the whole testing set. In this way, labeling costs can be largely reduced by labeling only this small set of selected test inputs. Besides achieving precise accuracy estimation, PACE must also be interpretable, deterministic, and as efficient as possible to be practical. Therefore, PACE first uses clustering to interpretably divide test inputs with different testing capabilities (i.e., testing different functionalities of a DNN model) into different groups. Then, PACE uses the MMD-critic algorithm, a state-of-the-art example-based explanation algorithm, to select prototypes (i.e., the most representative test inputs) from each group according to the group sizes, which reduces the impact of noise due to clustering. Meanwhile, PACE also borrows the idea of adaptive random testing to select test inputs from the minority space (i.e., the test inputs that are not clustered into any group) to achieve high diversity within the required number of test inputs. Together, the two parallel selection processes (i.e., selection from the groups and from the minority space) produce the final small set of selected test inputs.
We conducted an extensive study to evaluate the performance of PACE on a comprehensive benchmark (i.e., 24 pairs of DNN models and testing sets), considering different types of models (i.e., classification and regression models, high-accuracy and low-accuracy models, and CNN and RNN models) and different types of test inputs (i.e., original, mutated, and automatically generated test inputs). The results demonstrate that PACE precisely estimates the accuracy of the whole testing set, with an average deviation of only 1.181% to 2.302%, significantly outperforming the state-of-the-art approaches.
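
The selection scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the clustering step has already produced group sizes, it assumes the labeling budget is split in strict proportion to those sizes, and it uses a deterministic farthest-first rule as a stand-in for the adaptive-random-testing idea applied to the minority space. MMD-critic prototype selection is not shown. The names `allocate_budget`, `farthest_first`, and `estimate_accuracy` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def allocate_budget(group_sizes: Dict[str, int], budget: int) -> Dict[str, int]:
    """Split `budget` across groups in proportion to their sizes.

    Assumption: strict proportionality with largest-remainder rounding;
    the abstract only says selection is done "according to the group sizes".
    """
    total = sum(group_sizes.values())
    exact = {g: budget * n / total for g, n in group_sizes.items()}
    quota = {g: math.floor(v) for g, v in exact.items()}
    leftover = budget - sum(quota.values())
    # Give remaining slots to the largest fractional parts; ties are broken
    # by group name so the result is deterministic.
    order = sorted(group_sizes, key=lambda g: (-(exact[g] - quota[g]), g))
    for g in order[:leftover]:
        quota[g] += 1
    return quota


def farthest_first(points: List[Point], k: int) -> List[int]:
    """Pick up to k indices, each as far as possible from those already picked.

    Assumption: a deterministic stand-in for the diversity-seeking idea of
    adaptive random testing; it starts from index 0 instead of a random seed.
    """
    n = len(points)
    k = min(k, n)
    if k <= 0:
        return []
    chosen = [0]
    # nearest[i] = distance from point i to its closest chosen point.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        picked = set(chosen)
        nxt = max((i for i in range(n) if i not in picked),
                  key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


def estimate_accuracy(predictions: Sequence, labels: Sequence,
                      selected: Sequence[int]) -> float:
    """Estimate accuracy from the labeled subset only."""
    hits = sum(predictions[i] == labels[i] for i in selected)
    return hits / len(selected)


if __name__ == "__main__":
    # Hypothetical sizes: two clusters plus the unclustered minority space.
    print(allocate_budget({"cluster_a": 50, "cluster_b": 30, "minority": 20}, 10))
    print(farthest_first([(0, 0), (1, 0), (10, 0), (0.5, 0)], 3))
```

In this sketch only the inputs whose indices are returned by the selection step would need manual labels; `estimate_accuracy` then reports the fraction of those that the model predicts correctly as the estimate for the whole testing set.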

Junjie Chen

Tianjin University

China

Zhuo Wu

Tianjin International Engineering Institute, Tianjin University

Zan Wang

Tianjin University, China

China

Hanmo You

College of Intelligence and Computing, Tianjin University

Lingming Zhang

University of Illinois at Urbana-Champaign

United States

Ming Yan

Tianjin University

China

Session Program

Fri 27 Aug
Displayed time zone: Athens

11:00 - 12:00	Testing—Testing of Machine Learning Models. Tracks: Research Papers / Journal First. Chair(s): Chang Xu (Nanjing University)

11:00 10m Paper		Validation on Machine Reading Comprehension Software without Annotated Labels: A Property-Based Method. Track: Research Papers. Authors: Songqiang Chen (Wuhan University), Shuo Jin (Wuhan University), Xiaoyuan Xie (Wuhan University)
11:10 10m Paper		FLEX: Fixing Flaky Tests in Machine Learning Projects by Updating Assertion Bounds. Track: Research Papers. Authors: Saikat Dutta (University of Illinois at Urbana-Champaign), August Shi (University of Texas at Austin), Sasa Misailovic (University of Illinois at Urbana-Champaign)
11:20 10m Paper		Practical Accuracy Estimation for Efficient Deep Neural Network Testing. Track: Journal First. Authors: Junjie Chen (Tianjin University), Zhuo Wu (Tianjin International Engineering Institute, Tianjin University), Zan Wang (Tianjin University, China), Hanmo You (College of Intelligence and Computing, Tianjin University), Lingming Zhang (University of Illinois at Urbana-Champaign), Ming Yan (Tianjin University)
11:30 30m Live Q&A		Q&A (Testing—Testing of Machine Learning Models). Track: Research Papers

23:00 - 00:00	Testing—Testing of Machine Learning Models. Tracks: Journal First / Research Papers. Chair(s): Dan Hao (Peking University)

23:00 10m Paper		Validation on Machine Reading Comprehension Software without Annotated Labels: A Property-Based Method. Track: Research Papers. Authors: Songqiang Chen (Wuhan University), Shuo Jin (Wuhan University), Xiaoyuan Xie (Wuhan University)
23:10 10m Paper		FLEX: Fixing Flaky Tests in Machine Learning Projects by Updating Assertion Bounds. Track: Research Papers. Authors: Saikat Dutta (University of Illinois at Urbana-Champaign), August Shi (University of Texas at Austin), Sasa Misailovic (University of Illinois at Urbana-Champaign)
23:20 10m Paper		Practical Accuracy Estimation for Efficient Deep Neural Network Testing. Track: Journal First. Authors: Junjie Chen (Tianjin University), Zhuo Wu (Tianjin International Engineering Institute, Tianjin University), Zan Wang (Tianjin University, China), Hanmo You (College of Intelligence and Computing, Tianjin University), Lingming Zhang (University of Illinois at Urbana-Champaign), Ming Yan (Tianjin University)
23:30 30m Live Q&A		Q&A (Testing—Testing of Machine Learning Models). Track: Research Papers