What is it about?
This work introduces Disa, an AI-driven system that helps computers understand compiled programs when the original source code is missing. Security analysts often need to examine binary files—for example, during malware analysis or vulnerability investigation—but these files are difficult to decode. Traditional tools rely on rigid rules and can be easily confused by compiler optimizations or code obfuscation. Disa uses machine learning to identify where functions begin, which byte sequences are real instructions, and how memory is organized inside the program. It learns by looking at many examples of instructions and how they relate to each other. A key feature of Disa is its ability to detect the boundaries of memory blocks, which makes it easier to resolve indirect calls—one of the hardest parts of reconstructing a program’s control-flow graph. In simple terms, Disa teaches computers to “read” machine code more accurately, even when the program has been deliberately obscured, enabling more reliable program analysis and security work.
Why is it important?
Understanding binaries is essential for malware analysis, vulnerability detection, digital forensics, and software hardening. However, existing disassemblers often produce incomplete or incorrect results, especially when facing complex real-world binaries or advanced obfuscation techniques. Disa improves disassembly accuracy in three ways:
1. More reliable function detection, even under strong obfuscation.
2. Better instruction identification, reducing false interpretations.
3. First-of-its-kind memory-block boundary prediction using AI, which significantly improves indirect call resolution.
By integrating with block-memory-based points-to analysis, Disa reduces unnecessary indirect call targets and produces a cleaner, more accurate control-flow graph. This makes downstream security tools stronger and reduces manual reverse engineering effort. As software grows more complex and attackers increasingly use obfuscation, a learning-based approach like Disa provides a timely and scalable solution for the future of binary analysis.
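To give a rough intuition for why memory-block boundaries help with indirect calls, here is a toy sketch in Python. It is not Disa's implementation: the addresses, function names, and the helpers `block_of` and `candidate_targets` are all hypothetical, and the boundaries are simply given as input rather than predicted by a model. The idea shown is only that, once the block containing a loaded function pointer is known, the possible call targets can be limited to the pointers stored in that block instead of every address-taken function.

```python
from bisect import bisect_right

def block_of(addr, boundaries):
    """Return (start, end) of the block containing addr, or None.

    `boundaries` is a sorted list of block start addresses; the last
    entry only marks the end of the final block (an assumption of
    this toy model).
    """
    i = bisect_right(boundaries, addr) - 1
    if i < 0 or i + 1 >= len(boundaries):
        return None
    return boundaries[i], boundaries[i + 1]

def candidate_targets(load_addr, boundaries, fptr_slots):
    """Candidate targets for an indirect call whose pointer is loaded
    from `load_addr`.

    `fptr_slots` maps a memory slot address to the function whose
    address is stored there. If the block is unknown, fall back to
    every address-taken function (the conservative answer).
    """
    blk = block_of(load_addr, boundaries)
    if blk is None:
        return set(fptr_slots.values())
    start, end = blk
    return {f for a, f in fptr_slots.items() if start <= a < end}

# Hypothetical layout: two adjacent function-pointer tables.
boundaries = [0x1000, 0x1010, 0x1030]
fptr_slots = {
    0x1000: "open_file", 0x1008: "close_file",                   # table 1
    0x1010: "parse_a", 0x1018: "parse_b", 0x1020: "parse_c",     # table 2
}

conservative = set(fptr_slots.values())
pruned = candidate_targets(0x1018, boundaries, fptr_slots)
print(len(conservative), "targets without boundaries")
print(sorted(pruned), "with boundaries")
```

In this toy layout, knowing where the second table starts and ends removes the two functions of the first table from the candidate set. Fewer spurious edges of this kind are what the summary means by a cleaner control-flow graph.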
Perspectives
This project began with a practical challenge: traditional disassemblers struggle with modern optimization pipelines and advanced obfuscation, often misidentifying instructions or missing function boundaries entirely. We wanted a method that learns structural patterns directly from machine code instead of relying on brittle heuristics. A key insight was that memory structure is a rich source of information. By teaching a model to detect memory-boundary-related instructions and combining that with a lightweight value-tracking analysis, we could reconstruct memory blocks with far higher precision. This in turn transforms the accuracy of control-flow recovery. From a broader perspective, Disa demonstrates how deep learning and classical static analysis complement each other. The model handles complex pattern recognition, while the analysis grounds predictions in precise semantics. Together, they move binary understanding closer to the accuracy and robustness required in real-world security settings.
Monika Santra
Pennsylvania State University
Read the Original
This page is a summary of: Disa: Accurate Learning-based Static Disassembly with Attentions, November 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3719027.3744828.