Semantic Bug Seeding: A Learning-Based Approach for Creating Realistic Bugs (ESEC/FSE 2021 - Research Papers)

Who

Jibesh Patra, Michael Pradel

Track

ESEC/FSE 2021 Research Papers

Time Zone

Program times are shown in (GMT+03:00) Athens, the conference time zone.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 25 Aug 2021 16:10 - 16:20 - Testing—Bug Characterization and Fixing Chair(s): Myra Cohen
Thu 26 Aug 2021 04:10 - 04:20 - Testing—Bug Characterization and Fixing Chair(s): Abhik Roychoudhury, Akond Rahman

Abstract

When working on techniques to address the widespread problem of software bugs, one often faces the need for a large number of realistic bugs in real-world programs. Such bugs can either help evaluate an approach, e.g., in the form of a bug benchmark or a suite of program mutations, or even help build the technique, e.g., in learning-based bug detection. Because gathering a large number of real bugs is difficult, a common approach is to rely on automatically seeded bugs. Prior work seeds bugs based on syntactic transformation patterns, which often results in unrealistic bugs and typically cannot introduce new, application-specific code tokens.

This paper presents SemSeed, a technique for automatically seeding bugs in a semantics-aware way. The key idea is to imitate what a given real-world bug would look like in other programs by semantically adapting the bug pattern to the local context. To reason about the semantics of pieces of code, our approach builds on learned token embeddings that encode the semantic similarities of identifiers and literals. Our evaluation with real-world JavaScript software shows that the approach effectively reproduces real bugs and clearly outperforms a semantics-unaware approach. The seeded bugs are useful as training data for learning-based bug detection, where they significantly improve the bug detection ability. Moreover, we show that SemSeed-created bugs complement existing mutation testing operators, and that our approach is efficient enough to seed hundreds of thousands of bugs within an hour.
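
To make the idea of adapting a bug pattern to the local context more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a token taken from an observed bug should be replaced, in a new program, by whichever locally available token is closest to it in embedding space. The embedding values, the token names, and the helper `pick_replacement` are all hypothetical and chosen only for illustration; real learned embeddings would have many more dimensions.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# A real system would load vectors learned from a code corpus.
TOY_EMBEDDINGS = {
    "width": [0.9, 0.1, 0.0],
    "height": [0.8, 0.2, 0.1],
    "length": [0.7, 0.3, 0.0],
    "name": [0.0, 0.9, 0.2],
    "callback": [0.1, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_replacement(bug_token, candidates, embeddings):
    """Hypothetical helper: return the candidate token most similar
    to bug_token, or None if nothing can be compared."""
    if bug_token not in embeddings:
        return None
    target = embeddings[bug_token]
    scored = [
        (cosine_similarity(target, embeddings[c]), c)
        for c in candidates
        if c in embeddings and c != bug_token
    ]
    if not scored:
        return None
    # Highest similarity wins; ties are broken by token name.
    return max(scored)[1]


# Tokens assumed to be visible at the location where a bug is seeded.
local_tokens = ["name", "height", "callback"]
chosen = pick_replacement("width", local_tokens, TOY_EMBEDDINGS)
print(chosen)
```

In this toy setup, a bug that involved `width` in the original program is mapped to `height` in the target program, because that is the most similar token available locally. A purely syntactic pattern would have no basis for preferring it over `name` or `callback`.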

Link to Preprint

https://software-lab.org/publications/fse2021.pdf

DOI

https://doi.org/10.1145/3468264.3468623

Jibesh Patra

University of Stuttgart

Germany

Michael Pradel

University of Stuttgart

Germany

Code and data