BinAlign: Alignment Padding Based Compiler Provenance Recovery

Maliha Ismail, Yan Lin, DongGyun Han, Debin Gao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Compiler provenance is significant in investigating the source-level indicators of binary code, such as the development environment, source compiler, and optimization settings. Compiler provenance analysis has important security applications in malware and vulnerability analysis, yet extracting useful artifacts from binaries is very challenging when high-level language constructs are missing. Previous works applied machine-learning techniques to predict the source compiler of binaries. However, most of that work was done on binaries compiled on the Linux operating system. We highlight the importance of, and the need for, exploring Windows compilers and the complicated binaries compiled with the latest versions of these compilers. We therefore construct a large dataset of real-world binaries compiled with four major compilers on Windows and the four most common optimization settings. The complexity of the optimized programs leads us to identify specific patterns in the binaries that point to the source compiler and the specific optimization level. Building on these observations, we propose an improved model based on the state of the art that incorporates streamlined alignment padding features into the existing model. Our improved model thus learns alignment instructions from the binary code of portable executables and libraries using the attention mechanism. We conduct extensive experiments on a dataset of 296,169 unique and complex pieces of binary code generated from C/C++ applications. Our findings demonstrate that our proposed model significantly outperforms the state of the art in accurately predicting the source compiler and optimization flag for complex compiled code.
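To make "alignment padding" concrete, the following is a minimal sketch that locates runs of filler bytes in a raw x86 code buffer. It is not the model or feature extractor described in the abstract. The filler set (0xCC for int3, 0x90 for nop), the `min_len` threshold, and the function names `padding_runs` and `padding_histogram` are illustrative assumptions. A real pipeline would disassemble the code first, because these byte values can also occur inside instruction operands.

```python
from collections import Counter

# Assumed filler bytes for illustration: single-byte x86 instructions that
# are commonly used to pad code up to an alignment boundary.
PADDING_BYTES = {0xCC: "int3", 0x90: "nop"}


def padding_runs(code: bytes, min_len: int = 2):
    """Return (offset, length, kind) for each run of one repeated filler byte.

    Runs shorter than min_len are ignored to reduce false positives from
    filler-valued bytes that are really part of an instruction.
    """
    runs = []
    i = 0
    while i < len(code):
        b = code[i]
        if b in PADDING_BYTES:
            j = i
            while j < len(code) and code[j] == b:
                j += 1
            if j - i >= min_len:
                runs.append((i, j - i, PADDING_BYTES[b]))
            i = j
        else:
            i += 1
    return runs


def padding_histogram(code: bytes, min_len: int = 2):
    """Count padding runs per filler kind (a crude, hypothetical feature)."""
    return Counter(kind for _, _, kind in padding_runs(code, min_len))


# Hypothetical byte sequence: a tiny function, int3 padding, another tiny
# function, then nop padding.
sample = b"\x55\x8b\xec\xc3" + b"\xcc" * 12 + b"\x55\xc3" + b"\x90" * 3 + b"\xc3"
runs = padding_runs(sample)
hist = padding_histogram(sample)
```

A histogram like `hist` is only a hand-crafted stand-in. The abstract states that the proposed model learns alignment instructions with an attention mechanism and does not describe them as counted by hand.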
Original language: English
Title of host publication: Australasian Conference on Information Security and Privacy
Editors: Leonie Simpson, Mir Ali Rezazadeh Baee
Place of Publication: Cham
Publisher: Springer Nature Switzerland AG
Pages: 609-629
Number of pages: 21
ISBN (Electronic): 978-3-031-35486-1
ISBN (Print): 978-3-031-35485-4
DOIs
Publication status: E-pub ahead of print - 15 Jun 2023