Does Reusing Pre-trained NLP Model Propagate Bugs?
Fri 27 Aug 2021 05:30 - 05:40 - Student Research Competition
In this digital era, textual content has become a seemingly ubiquitous part of our lives. Natural Language Processing (NLP) empowers machines to comprehend the intricacies of textual data and eases human-computer interaction. Advances in language modeling and continual learning, together with the availability of large amounts of linguistic data and large-scale computational power, have made it feasible to train models for downstream text-analysis tasks, including safety-critical ones in, e.g., the medical and airline domains. Compared to other deep learning (DL) models, NLP-based models are widely reused across tasks. However, reusing a pre-trained model in a new setting remains a complex task due to limitations of the training dataset, model structure, specification, usage, etc. With this motivation, we study BERT, a widely used language model (LM), from the perspective of reuse in client code. We mined 80 posts from Stack Overflow related to BERT and found 4 types of bugs in clients' code. Our results show that 13.75% are fairness-related, 28.75% are parameter-related, 15% are token-related, and 16.25% are version-related bugs.
Thu 26 Aug (displayed time zone: Athens)

17:00 - 18:00: Student Research Competition

- 17:00 (10m talk): PorkFuzz: Testing Stateful Software-Defined Network Applications with Property Graphs. Chaofan Shou, University of California at Santa Barbara
- 17:10 (10m talk): A Qualitative Study of Cleaning in Jupyter Notebooks. Helen Dong, Carnegie Mellon University
- 17:20 (10m talk): Contextualizing Toxicity in Open Source: A Qualitative Study. Sophie Cohen, Wesleyan University
- 17:30 (10m talk): Does Reusing Pre-trained NLP Model Propagate Bugs? Mohna Chakraborty, Iowa State University
- 17:40 (10m talk): Accelerating Redundancy-Based Program Repair via Code Representation Learning and Adaptive Patch Filtering. Chen Yang, Tianjin University
- 17:50 (10m talk): SMT Solver Testing with Type and Grammar Based Mutation. Jiwon Park, École Polytechnique
Fri 27 Aug (displayed time zone: Athens)

05:00 - 06:00: Student Research Competition

- 05:00 (10m talk): PorkFuzz: Testing Stateful Software-Defined Network Applications with Property Graphs. Chaofan Shou, University of California at Santa Barbara
- 05:10 (10m talk): A Qualitative Study of Cleaning in Jupyter Notebooks. Helen Dong, Carnegie Mellon University
- 05:20 (10m talk): Contextualizing Toxicity in Open Source: A Qualitative Study. Sophie Cohen, Wesleyan University
- 05:30 (10m talk): Does Reusing Pre-trained NLP Model Propagate Bugs? Mohna Chakraborty, Iowa State University
- 05:40 (10m talk): Accelerating Redundancy-Based Program Repair via Code Representation Learning and Adaptive Patch Filtering. Chen Yang, Tianjin University
- 05:50 (10m talk): SMT Solver Testing with Type and Grammar Based Mutation. Jiwon Park, École Polytechnique