Researchers at GW have developed a system that accurately detects binary code similarity for computer vulnerability analysis. The system uses a graph neural network model to generate representative embeddings for provenance identification across binaries, which improves the accuracy of detecting vulnerabilities that may be present in binary code. In particular, the system builds a representation of a binary at run-time, called an attributed function call graph (AFCG), so that closely related executables can be accurately differentiated, a problem that had previously been left unresolved.
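To illustrate how a graph model can turn a call graph into a comparable embedding, the following is a minimal sketch only, not the disclosed model. It assumes each function node already carries a small numeric attribute vector, and it replaces the trained graph neural network with untrained neighbor averaging followed by mean pooling. The names `embed_graph` and `cosine` are hypothetical.

```python
import math


def embed_graph(attrs, edges, rounds=2):
    """Return one embedding vector for a call graph.

    attrs: dict mapping function name -> list of floats (node attributes).
    edges: list of (caller, callee) pairs.
    Assumption: plain neighbor averaging stands in for learned GNN layers.
    """
    # Treat call edges as undirected for message passing.
    neigh = {n: [] for n in attrs}
    for u, v in edges:
        neigh[u].append(v)
        neigh[v].append(u)

    h = {n: list(vec) for n, vec in attrs.items()}
    for _ in range(rounds):
        new_h = {}
        for n in h:
            # Sum the node's own state with its neighbors' states.
            agg = list(h[n])
            for m in neigh[n]:
                agg = [a + b for a, b in zip(agg, h[m])]
            scale = 1 + len(neigh[n])
            new_h[n] = [a / scale for a in agg]
        h = new_h

    # Mean-pool node states into a single graph-level vector.
    dim = len(next(iter(h.values())))
    return [sum(h[n][i] for n in h) / len(h) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

Two binaries would then be compared by embedding each graph and taking `cosine` of the results. A trained model would learn the aggregation weights so that binaries of the same provenance land close together.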
The disclosed invention can be implemented as a system or as a method. Either can include (i) a provenance identification module that uses the AFCG to identify binary codes accurately, drawing on three types of features: idiom features at the instruction level, graphlet features at the function level, and the function call graph at the binary level; and (ii) a generation module that generates the respective binary code representations in the form of AFCGs. In one embodiment, the AFCG achieves 96.1% accuracy on publicly available datasets of more than 6,000 binaries. In another embodiment, when applied to binary vulnerability detection, the invention improves the top-1 hit rate of three recent code vulnerability detection methods by up to 27%.
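The three feature levels can be pictured with a small data-structure sketch. This is an illustration under stated assumptions rather than the disclosed generation module: idioms are approximated as opcode n-grams, graphlet features are reduced to a count of one shape (directed three-node paths in a function's control-flow graph), and the `Function` type and all helper names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    opcodes: list                      # instruction mnemonics, in order
    cfg_edges: list                    # (src_block, dst_block) pairs
    callees: list = field(default_factory=list)


def idiom_features(opcodes, n=2):
    """Instruction level: count opcode n-grams (assumed idiom definition)."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def path_graphlet_count(cfg_edges):
    """Function level: count directed 3-node paths a -> b -> c in the CFG.

    Assumption: a single graphlet shape stands in for a full graphlet census.
    """
    succ = {}
    for src, dst in cfg_edges:
        succ.setdefault(src, set()).add(dst)
    return sum(
        1
        for a in succ
        for b in succ[a]
        for c in succ.get(b, ())
        if len({a, b, c}) == 3
    )


def build_afcg(functions):
    """Binary level: call graph whose nodes carry the lower-level features."""
    nodes = {
        f.name: {
            "idioms": idiom_features(f.opcodes),
            "graphlets": {"path3": path_graphlet_count(f.cfg_edges)},
        }
        for f in functions
    }
    # Keep only call edges whose target is a known function.
    edges = [(f.name, c) for f in functions for c in f.callees if c in nodes]
    return {"nodes": nodes, "edges": edges}
```

In this sketch each node's attributes would still need to be converted to fixed-length numeric vectors before a graph neural network could consume them.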
Fig. 1 – One embodiment of the disclosed invention
Applications:
Advantages: