Does Reusing Pre-trained NLP Model Propagate Bugs?
Fri 27 Aug 2021 05:30 - 05:40 - Student Research Competition
In this digital era, textual content has become a seemingly ubiquitous part of our lives. Natural Language Processing (NLP) empowers machines to comprehend the intricacies of textual data and eases human-computer interaction. Advances in language modeling and continual learning, together with the availability of large amounts of linguistic data and large-scale computational power, have made it feasible to train models for downstream text-analysis tasks, including safety-critical ones in, e.g., the medical and airline domains. Compared to other deep learning (DL) models, NLP-based models are widely reused across tasks. However, reusing a pre-trained model in a new setting remains a complex task due to limitations of the training dataset, model structure, specification, usage, etc. With this motivation, we study BERT, a widely used language model (LM), from the perspective of reuse in client code. We mined 80 posts from Stack Overflow related to BERT and found 4 types of bugs in clients' code. Our results show that 13.75% are fairness-related, 28.75% are parameter-related, 15% are token-related, and 16.25% are version-related bugs.
Thu 26 Aug (displayed time zone: Athens)

17:00 - 18:00: Student Research Competition

- 17:00 (10m talk): PorkFuzz: Testing Stateful Software-Defined Network Applications with Property Graphs. Chaofan Shou, University of California at Santa Barbara
- 17:10 (10m talk): A Qualitative Study of Cleaning in Jupyter Notebooks. Helen Dong, Carnegie Mellon University
- 17:20 (10m talk): Contextualizing Toxicity in Open Source: A Qualitative Study. Sophie Cohen, Wesleyan University
- 17:30 (10m talk): Does Reusing Pre-trained NLP Model Propagate Bugs? Mohna Chakraborty, Iowa State University
- 17:40 (10m talk): Accelerating Redundancy-Based Program Repair via Code Representation Learning and Adaptive Patch Filtering. Chen Yang, Tianjin University
- 17:50 (10m talk): SMT Solver Testing with Type and Grammar Based Mutation. Jiwon Park, École Polytechnique
Fri 27 Aug (displayed time zone: Athens)

05:00 - 06:00: Student Research Competition

- 05:00 (10m talk): PorkFuzz: Testing Stateful Software-Defined Network Applications with Property Graphs. Chaofan Shou, University of California at Santa Barbara
- 05:10 (10m talk): A Qualitative Study of Cleaning in Jupyter Notebooks. Helen Dong, Carnegie Mellon University
- 05:20 (10m talk): Contextualizing Toxicity in Open Source: A Qualitative Study. Sophie Cohen, Wesleyan University
- 05:30 (10m talk): Does Reusing Pre-trained NLP Model Propagate Bugs? Mohna Chakraborty, Iowa State University
- 05:40 (10m talk): Accelerating Redundancy-Based Program Repair via Code Representation Learning and Adaptive Patch Filtering. Chen Yang, Tianjin University
- 05:50 (10m talk): SMT Solver Testing with Type and Grammar Based Mutation. Jiwon Park, École Polytechnique