ESEC/FSE 2021
Thu 19 - Sat 28 August 2021 Clowdr Platform

Binary type inference is a critical reverse engineering task supporting many security applications, including vulnerability analysis, binary hardening, forensics, and decompilation. It is a difficult task because source-level type information is often stripped during compilation, leaving only binaries with untyped memory and register accesses. Existing approaches rely on hand-coded type inference rules defined by domain experts, which are brittle and require nontrivial effort to maintain and update. Even though machine learning approaches have shown promise at automatically learning the inference rules, their accuracy is still low, especially for optimized binaries.

We present StateFormer, a new neural architecture that is adept at accurate and robust type inference. StateFormer follows a two-step transfer learning paradigm. In the pretraining step, the model is trained with Generative State Modeling (GSM), a novel task that we design to teach the model to statically approximate execution effects of assembly instructions in both forward and backward directions. In the finetuning step, the pretrained model learns to use its knowledge of operational semantics to infer types.
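The abstract does not specify how GSM's training data is constructed. Purely as intuition for what "execution effects of assembly instructions in both forward and backward directions" can mean, the Python sketch below emulates a hypothetical four-opcode register machine (not x86, and not StateFormer's actual pipeline; `step`, `trace`, and `make_examples` are illustrative names) and pairs each instruction with the concrete register states around it.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical toy ISA (NOT real x86 and NOT StateFormer's format):
# each instruction is (opcode, dst, src), where src is a register name
# or an integer immediate.
Operand = Union[str, int]
Instr = Tuple[str, str, Operand]
State = Dict[str, int]

WORD_MASK = 0xFFFFFFFF  # emulate 32-bit wraparound arithmetic


def step(state: State, instr: Instr) -> State:
    """Return the register state after executing one toy instruction."""
    op, dst, src = instr
    val = state[src] if isinstance(src, str) else src
    new = dict(state)  # copy, so earlier states in the trace are not mutated
    if op == "mov":
        new[dst] = val & WORD_MASK
    elif op == "add":
        new[dst] = (state[dst] + val) & WORD_MASK
    elif op == "sub":
        new[dst] = (state[dst] - val) & WORD_MASK
    elif op == "xor":
        new[dst] = (state[dst] ^ val) & WORD_MASK
    else:
        raise ValueError(f"unsupported opcode: {op}")
    return new


def trace(init: State, prog: List[Instr]) -> List[State]:
    """Concrete states: before the first instruction, then after each one."""
    states = [dict(init)]
    for instr in prog:
        states.append(step(states[-1], instr))
    return states


def make_examples(prog: List[Instr], states: List[State]):
    """Build (input, target) pairs around each instruction.

    forward:  (state before, instruction) -> state after
    backward: (instruction, state after)  -> state before
    """
    forward = [((states[i], prog[i]), states[i + 1]) for i in range(len(prog))]
    backward = [((prog[i], states[i + 1]), states[i]) for i in range(len(prog))]
    return forward, backward


prog: List[Instr] = [("mov", "eax", 5), ("add", "eax", "ebx"), ("xor", "ebx", "eax")]
states = trace({"eax": 0, "ebx": 3}, prog)
forward, backward = make_examples(prog, states)
print(states[-1])  # {'eax': 8, 'ebx': 11}
```

In this toy setting the backward pairs are not always uniquely determined: `mov eax, 5` discards the previous value of `eax`, so recovering the state before it needs context from elsewhere in the trace rather than simple inversion of the instruction.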

We evaluate StateFormer's performance on a corpus of 33 popular open-source software projects containing over 1.67 billion variables of different types. The programs are compiled with GCC and LLVM at four optimization levels (O0-O3) and with three LLVM-based obfuscation passes. Our model significantly outperforms state-of-the-art ML-based tools by 14.6% in recovering types for both function arguments and variables. Our ablation studies show that GSM improves type inference accuracy by 33%.

Wed 25 Aug

Displayed time zone: Athens

08:00 - 09:00
SE & AI—Machine Learning for Software Engineering 1 (Research Papers)
Chair(s): Michael Pradel (University of Stuttgart), Ivica Crnkovic (Chalmers University of Technology)
08:00 (10m) Paper
Boosting Coverage-Based Fault Localization via Graph-Based Representation Learning (Research Papers)
Yiling Lou (Purdue University), Qihao Zhu (Peking University), Jinhao Dong (Peking University), Xia Li (Kennesaw State University), Zeyu Sun (Peking University), Dan Hao (Peking University), Lu Zhang (Peking University), Lingming Zhang (University of Illinois at Urbana-Champaign)
08:10 (10m) Paper
SynGuar: Guaranteeing Generalization in Programming by Example [Artifacts Available, Artifacts Reusable] (Research Papers)
Bo Wang (National University of Singapore), Teodora Baluta (National University of Singapore), Aashish Kolluri (National University of Singapore), Prateek Saxena (National University of Singapore)
08:20 (10m) Paper
StateFormer: Fine-Grained Type Recovery from Binaries using Generative State Modeling [Artifacts Available, Artifacts Reusable] (Research Papers)
Kexin Pei (Columbia University), Jonas Guan (University of Toronto), Matthew Broughton (Columbia University), Zhongtian Chen (Columbia University), Songchen Yao (Columbia University), David Williams-King (Columbia University), Vikas Ummadisetty (Dublin High School), Junfeng Yang (Columbia University), Baishakhi Ray (Columbia University), Suman Jana (Columbia University)
08:30 (30m) Live Q&A
Q&A (SE & AI—Machine Learning for Software Engineering 1) (Research Papers)

20:00 - 21:00
SE & AI—Machine Learning for Software Engineering 1 (Research Papers)
Chair(s): Kelly Lyons (University of Toronto), Phuong T. Nguyen (University of L’Aquila)
20:00 (10m) Paper
Boosting Coverage-Based Fault Localization via Graph-Based Representation Learning (Research Papers)
Yiling Lou (Purdue University), Qihao Zhu (Peking University), Jinhao Dong (Peking University), Xia Li (Kennesaw State University), Zeyu Sun (Peking University), Dan Hao (Peking University), Lu Zhang (Peking University), Lingming Zhang (University of Illinois at Urbana-Champaign)
20:10 (10m) Paper
SynGuar: Guaranteeing Generalization in Programming by Example [Artifacts Available, Artifacts Reusable] (Research Papers)
Bo Wang (National University of Singapore), Teodora Baluta (National University of Singapore), Aashish Kolluri (National University of Singapore), Prateek Saxena (National University of Singapore)
20:20 (10m) Paper
StateFormer: Fine-Grained Type Recovery from Binaries using Generative State Modeling [Artifacts Available, Artifacts Reusable] (Research Papers)
Kexin Pei (Columbia University), Jonas Guan (University of Toronto), Matthew Broughton (Columbia University), Zhongtian Chen (Columbia University), Songchen Yao (Columbia University), David Williams-King (Columbia University), Vikas Ummadisetty (Dublin High School), Junfeng Yang (Columbia University), Baishakhi Ray (Columbia University), Suman Jana (Columbia University)
20:30 (30m) Live Q&A
Q&A (SE & AI—Machine Learning for Software Engineering 1) (Research Papers)