How to Better Distinguish Security Bug Reports (using Dual Hyperparameter Optimization) (ESEC/FSE 2021 - Journal First)

Who

Rui Shu, Tianpei Xia, Jianfeng Chen, Laurie Williams, Tim Menzies

Track

ESEC/FSE 2021 Journal First

Time Zone

The program is currently displayed in (GMT+03:00) Athens.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 27 Aug 2021 17:10 - 17:20 - Dependability—Software Security 2 Chair(s): Vaggelis Atlidakis
Sat 28 Aug 2021 05:10 - 05:20 - Dependability—Software Security 2 Chair(s): Arie Gurfinkel

Abstract

To keep the general public from being vulnerable to hackers, security bug reports need to be handled by small groups of engineers before being widely discussed. But learning how to distinguish security bug reports from other bug reports is challenging, since security bug reports may occur rarely. Data mining methods that can find such scarce targets require extensive optimization effort. The goal of this research is to aid practitioners as they struggle to optimize methods that try to distinguish between rare security bug reports and other bug reports. Our proposed method, called SWIFT, is a dual optimizer that optimizes both learner and pre-processor options. Since this is a large space of options, SWIFT uses a technique called epsilon-dominance that learns how to avoid operations that do not significantly improve performance.
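The idea of a dual search pruned by epsilon-dominance can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' SWIFT implementation: the function names, the option spaces (`PRE_SPACE`, `LEARNER_SPACE`), the random sampling strategy, the `patience` stopping rule, and the made-up scoring formula in `toy_evaluate` are all hypothetical. A real `evaluate` would train a classifier with the sampled pre-processor and learner options and return measured scores.

```python
import math
import random


def box(scores, eps):
    """Map a tuple of objective scores to its cell on an epsilon-sized grid."""
    return tuple(math.floor(s / eps) for s in scores)


def dominates(a, b):
    """True if a is no worse than b everywhere and better somewhere (maximizing)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def eps_archive_add(archive, config, scores, eps):
    """Add (config, scores) unless an archived entry already covers its grid cell.

    Returns True if the candidate counted as a significant improvement.
    """
    b = box(scores, eps)
    for _, other in archive:
        ob = box(other, eps)
        if ob == b or dominates(ob, b):
            return False  # differs by less than epsilon, or is clearly worse
    # Drop archived entries whose cell is dominated by the new candidate's cell.
    archive[:] = [(c, s) for c, s in archive if not dominates(b, box(s, eps))]
    archive.append((config, scores))
    return True


def dual_search(evaluate, pre_space, learner_space,
                eps=0.05, budget=100, patience=20, seed=0):
    """Sample pre-processor and learner options together; stop when stale."""
    rng = random.Random(seed)
    archive, stale = [], 0
    for _ in range(budget):
        config = {
            "pre": {k: rng.choice(v) for k, v in pre_space.items()},
            "learner": {k: rng.choice(v) for k, v in learner_space.items()},
        }
        if eps_archive_add(archive, config, evaluate(config), eps):
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # recent candidates made no epsilon-sized progress
    return archive


# Hypothetical option spaces, for illustration only.
PRE_SPACE = {"oversample_ratio": [1, 2, 4], "k_neighbors": [3, 5]}
LEARNER_SPACE = {"n_trees": [10, 50, 100], "max_depth": [2, 4, 8]}


def toy_evaluate(config):
    """Stand-in scorer with an invented formula; returns (recall, 1 - false alarm)."""
    ratio = config["pre"]["oversample_ratio"]
    depth = config["learner"]["max_depth"]
    recall = min(1.0, 0.3 + 0.1 * ratio + 0.02 * depth)
    specificity = max(0.0, 0.95 - 0.05 * ratio - 0.01 * depth)
    return (recall, specificity)
```

Calling `dual_search(toy_evaluate, PRE_SPACE, LEARNER_SPACE)` returns a small archive of configurations that differ from one another by at least one epsilon-sized grid cell in some objective. A larger `eps` prunes more aggressively, trading resolution for fewer evaluations.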

When compared to recent state-of-the-art results (from FARSEC, published in TSE’18), we find that SWIFT’s dual optimization of both pre-processor and learner is more useful than optimizing each of them individually. For example, in a study of security bug reports from the Chromium dataset, the median recalls of FARSEC and SWIFT were 15.7% and 77.4%, respectively. As another example, in experiments with data from the Ambari project, the median recall improved from 21.5% to 85.7% (FARSEC to SWIFT). Overall, our approach can quickly optimize models that achieve better recalls than the prior state of the art. These increases in recall are associated with moderate increases in false positive rates (from 8% to 24%, median). For future work, these results suggest that dual optimization is both practical and useful.
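For readers unfamiliar with the two measures quoted above, recall and false positive rate can be computed from a confusion matrix as in the short sketch below. The counts in the usage note are invented for illustration and are not taken from the paper or its datasets.

```python
def recall(true_pos, false_neg):
    """Fraction of actual security bug reports that the classifier flagged."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def false_positive_rate(false_pos, true_neg):
    """Fraction of non-security bug reports that were wrongly flagged."""
    total = false_pos + true_neg
    return false_pos / total if total else 0.0
```

As a hypothetical case, suppose 1,000 bug reports contain 20 security reports. A classifier that flags 16 of those 20, and also flags 196 of the 980 other reports, has `recall(16, 4)` of 0.8 and `false_positive_rate(196, 784)` of 0.2. Because security reports are rare, even a moderate false positive rate means most flagged reports are not security related, which is why the two measures are reported together.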

Rui Shu

North Carolina State University

United States

Tianpei Xia

North Carolina State University

United States

Jianfeng Chen

North Carolina State University

Laurie Williams

North Carolina State University

United States

Tim Menzies

North Carolina State University

United States

Session Program

Fri 27 Aug
Displayed time zone: Athens

17:00 - 18:00	Dependability—Software Security 2 (Research Papers / Industry Papers / Journal First) +12h Chair(s): Vaggelis Atlidakis Brown University

17:00 10m Paper		TaintStream: Fine-Grained Taint Tracking for Big Data Platforms through Dynamic Code Translation Research Papers Chengxu Yang Peking University, Yuanchun Li Microsoft Research, Mengwei Xu Beijing University of Posts and Telecommunications, Zhenpeng Chen Peking University, Yunxin Liu Tsinghua University, Gang Huang Peking University, Xuanzhe Liu Peking University
17:10 10m Paper		How to Better Distinguish Security Bug Reports (using Dual Hyperparameter Optimization) Journal First Rui Shu North Carolina State University, Tianpei Xia North Carolina State University, Jianfeng Chen North Carolina State University, Laurie Williams North Carolina State University, Tim Menzies North Carolina State University
17:20 10m Paper		A Comprehensive Study on Learning-Based PE Malware Family Classification Methods Industry Papers Yixuan Ma State Key Laboratory of Communication Content Cognition; Tianjin University, Shuang Liu Tianjin University, Jiajun Jiang Tianjin University, Guanhong Chen Tianjin University, Keqiu Li Tianjin University
17:30 30m Live Q&A		Q&A (Dependability—Software Security 2) Research Papers

Sat 28 Aug
Displayed time zone: Athens

05:00 - 06:00	Dependability—Software Security 2 (Research Papers / Industry Papers / Journal First) Chair(s): Arie Gurfinkel University of Waterloo

05:00 10m Paper		TaintStream: Fine-Grained Taint Tracking for Big Data Platforms through Dynamic Code Translation Research Papers Chengxu Yang Peking University, Yuanchun Li Microsoft Research, Mengwei Xu Beijing University of Posts and Telecommunications, Zhenpeng Chen Peking University, Yunxin Liu Tsinghua University, Gang Huang Peking University, Xuanzhe Liu Peking University
05:10 10m Paper		How to Better Distinguish Security Bug Reports (using Dual Hyperparameter Optimization) Journal First Rui Shu North Carolina State University, Tianpei Xia North Carolina State University, Jianfeng Chen North Carolina State University, Laurie Williams North Carolina State University, Tim Menzies North Carolina State University
05:20 10m Paper		A Comprehensive Study on Learning-Based PE Malware Family Classification Methods Industry Papers Yixuan Ma State Key Laboratory of Communication Content Cognition; Tianjin University, Shuang Liu Tianjin University, Jiajun Jiang Tianjin University, Guanhong Chen Tianjin University, Keqiu Li Tianjin University
05:30 30m Live Q&A		Q&A (Dependability—Software Security 2) Research Papers