TY - GEN
T1 - SoK
T2 - 43rd IEEE Symposium on Security and Privacy, SP 2022
AU - Liu, Zhibo
AU - Yuan, Yuanyuan
AU - Wang, Shuai
AU - Bao, Yuyan
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Binary lifters convert executables into an intermediate representation (IR) of a compiler framework. The recovered IR code is generally deemed 'analysis friendly,' bridging low-level code analysis with well-established compiler infrastructures. With years of development, binary lifters are becoming increasingly popular for use in various security, systems, and software (re)-engineering applications. Recent studies have also reported highly promising results that suggest binary lifters can generate LLVM IR code with correct functionality, even for complex cases.This paper conducts an in-depth study of binary lifters from an orthogonal and highly demanding perspective. We demystify the 'expressiveness' of binary lifters, and reveal how well the lifted LLVM IR code can support critical downstream applications in security analysis scenarios. To do so, we generate two pieces of LLVM IR code by compiling C/C++ programs or by lifting the corresponding executables. We then feed these two pieces of LLVM IR code to three keystone downstream applications (pointer analysis, discriminability analysis, and decompilation) and determine whether inconsistent analysis results are generated. We study four popular static and dynamic LLVM IR lifters that were developed by the industry or academia from a total of 252,063 executables generated by various compilers and optimizations and on different architectures. Our findings show that modern binary lifters afford IR code that is highly suitable for discriminability analysis and decompilation, and suggest that such binary lifters can be applied in common similarity- or code comprehension-based security analysis (e.g., binary diffing). However, the lifted IR code appears unsuited to rigorous static analysis (e.g., pointer analysis). To obtain a more comprehensive view of the utility of binary lifters, we also compare the performance of lifter-enabled approaches with that of binary-only tools in three security tasks, i.e., sanitization, binary diffing, and C decompilation. We summarize our findings and make suggestions for the correct use and further enhancement of binary lifters. We also explored practical ways to enhance the accuracy of pointer analysis using lifted IR code, by using and augmenting Debin, a tool for predicting debug information.
AB - Binary lifters convert executables into an intermediate representation (IR) of a compiler framework. The recovered IR code is generally deemed 'analysis friendly,' bridging low-level code analysis with well-established compiler infrastructures. With years of development, binary lifters are becoming increasingly popular for use in various security, systems, and software (re)-engineering applications. Recent studies have also reported highly promising results that suggest binary lifters can generate LLVM IR code with correct functionality, even for complex cases.This paper conducts an in-depth study of binary lifters from an orthogonal and highly demanding perspective. We demystify the 'expressiveness' of binary lifters, and reveal how well the lifted LLVM IR code can support critical downstream applications in security analysis scenarios. To do so, we generate two pieces of LLVM IR code by compiling C/C++ programs or by lifting the corresponding executables. We then feed these two pieces of LLVM IR code to three keystone downstream applications (pointer analysis, discriminability analysis, and decompilation) and determine whether inconsistent analysis results are generated. We study four popular static and dynamic LLVM IR lifters that were developed by the industry or academia from a total of 252,063 executables generated by various compilers and optimizations and on different architectures. Our findings show that modern binary lifters afford IR code that is highly suitable for discriminability analysis and decompilation, and suggest that such binary lifters can be applied in common similarity- or code comprehension-based security analysis (e.g., binary diffing). However, the lifted IR code appears unsuited to rigorous static analysis (e.g., pointer analysis). To obtain a more comprehensive view of the utility of binary lifters, we also compare the performance of lifter-enabled approaches with that of binary-only tools in three security tasks, i.e., sanitization, binary diffing, and C decompilation. We summarize our findings and make suggestions for the correct use and further enhancement of binary lifters. We also explored practical ways to enhance the accuracy of pointer analysis using lifted IR code, by using and augmenting Debin, a tool for predicting debug information.
KW - reverse-engineering
KW - software-security
UR - http://www.scopus.com/inward/record.url?scp=85135909424&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85135909424&partnerID=8YFLogxK
U2 - 10.1109/SP46214.2022.9833799
DO - 10.1109/SP46214.2022.9833799
M3 - Conference contribution
AN - SCOPUS:85135909424
T3 - Proceedings - IEEE Symposium on Security and Privacy
SP - 1100
EP - 1119
BT - Proceedings - 43rd IEEE Symposium on Security and Privacy, SP 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 May 2022 through 26 May 2022
ER -