FLEX: Fixing Flaky Tests in Machine Learning Projects by Updating Assertion Bounds
Fri 27 Aug 2021, 23:10 - 23:20 | Testing—Testing of Machine Learning Models | Chair(s): Dan Hao
Many machine learning (ML) algorithms are inherently random – multiple executions using the same inputs may produce slightly different results each time. Randomness impacts how developers write tests that check for end-to-end quality of their implementations of these ML algorithms. In particular, selecting the proper thresholds for comparing obtained quality metrics with the reference results is a non-intuitive task, which may lead to flaky test executions.
We present FLEX, the first tool for automatically fixing flaky tests due to algorithmic randomness in ML algorithms. FLEX fixes tests that use approximate assertions to compare actual and expected values that represent the quality of the outputs of ML algorithms. We present a technique for systematically identifying the acceptable bound between the actual and expected output quality that also minimizes flakiness. Our technique is based on the Peak Over Threshold method from statistical Extreme Value Theory, which estimates the tail distribution of the output values observed from several runs. Based on the tail distribution, FLEX updates the bound used in the test, or selects the number of test re-runs, based on a desired confidence level.
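The Peak Over Threshold idea can be sketched as follows. This is a simplified illustration, not FLEX's actual implementation: the function name, the method-of-moments fit of the Generalized Pareto Distribution, and the default parameters are all assumptions made for the example. The key steps match the abstract's description: collect metric values from several runs, fit a GPD to the exceedances over a high empirical threshold, and read off a bound that holds at the desired confidence level.

```python
import numpy as np

def pot_bound(samples, tail_fraction=0.1, confidence=0.999):
    """Estimate an assertion bound via Peak Over Threshold (sketch).

    Fits a Generalized Pareto Distribution (GPD), via the method of
    moments, to the exceedances over a high empirical threshold, then
    returns the value the metric stays below with the given confidence.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = max(int(tail_fraction * n), 10)   # number of tail samples
    u = x[n - k]                          # empirical threshold
    y = x[n - k:] - u                     # exceedances over u
    m, v = y.mean(), y.var()
    # Method-of-moments GPD fit: shape xi and scale sigma.
    xi = 0.5 * (1.0 - m * m / v)
    sigma = 0.5 * m * (m * m / v + 1.0)
    # P(X > u) is estimated empirically; the target tail probability
    # for the exceedance distribution follows from the confidence level.
    p_exceed = k / n
    tail_p = (1.0 - confidence) / p_exceed
    if abs(xi) < 1e-9:
        q = -sigma * np.log(tail_p)       # exponential limit (xi -> 0)
    else:
        # Invert the GPD survival function (1 + xi*q/sigma)^(-1/xi).
        q = sigma / xi * (tail_p ** (-xi) - 1.0)
    return u + q
```

For example, applied to samples from a standard normal distribution, the returned bound lands near the true 0.999 quantile (about 3.09), and almost no observed values exceed it.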
We evaluate FLEX on a corpus of 35 tests collected from the latest versions of 21 ML projects. Overall, FLEX identifies and proposes a fix for 28 tests. We sent 19 pull requests, each fixing one test, to the developers. So far, 9 have been accepted by the developers.
Fri 27 Aug (displayed time zone: Athens)
11:00 - 12:00 | Testing—Testing of Machine Learning Models (Research Papers / Journal First; mirrored +12h) | Chair(s): Chang Xu (Nanjing University)

11:00 | 10m | Paper | Validation on Machine Reading Comprehension Software without Annotated Labels: A Property-Based Method (Research Papers)
11:10 | 10m | Paper | FLEX: Fixing Flaky Tests in Machine Learning Projects by Updating Assertion Bounds (Research Papers); Saikat Dutta (University of Illinois at Urbana-Champaign), August Shi (University of Texas at Austin), Sasa Misailovic (University of Illinois at Urbana-Champaign)
11:20 | 10m | Paper | Practical Accuracy Estimation for Efficient Deep Neural Network Testing (Journal First); Junjie Chen (Tianjin University), Zhuo Wu (Tianjin International Engineering Institute, Tianjin University), Zan Wang (Tianjin University), Hanmo You (College of Intelligence and Computing, Tianjin University), Lingming Zhang (University of Illinois at Urbana-Champaign), Ming Yan (Tianjin University)
11:30 | 30m | Live Q&A | Q&A (Testing—Testing of Machine Learning Models) (Research Papers)
23:00 - 00:00 | Testing—Testing of Machine Learning Models (Journal First / Research Papers) | Chair(s): Dan Hao (Peking University)

23:00 | 10m | Paper | Validation on Machine Reading Comprehension Software without Annotated Labels: A Property-Based Method (Research Papers)
23:10 | 10m | Paper | FLEX: Fixing Flaky Tests in Machine Learning Projects by Updating Assertion Bounds (Research Papers); Saikat Dutta (University of Illinois at Urbana-Champaign), August Shi (University of Texas at Austin), Sasa Misailovic (University of Illinois at Urbana-Champaign)
23:20 | 10m | Paper | Practical Accuracy Estimation for Efficient Deep Neural Network Testing (Journal First); Junjie Chen (Tianjin University), Zhuo Wu (Tianjin International Engineering Institute, Tianjin University), Zan Wang (Tianjin University), Hanmo You (College of Intelligence and Computing, Tianjin University), Lingming Zhang (University of Illinois at Urbana-Champaign), Ming Yan (Tianjin University)
23:30 | 30m | Live Q&A | Q&A (Testing—Testing of Machine Learning Models) (Research Papers)