Code Deobfuscation: Release 0.1.0 of Xyntia is out

The Xyntia black-box deobfuscator is now released. Check out its features and how you can use it for research and practical reverse engineering.

What is obfuscation and deobfuscation?

Software contains valuable assets, such as secret algorithms, business logic, or cryptographic keys, that attackers may try to retrieve. The so-called Man-At-The-End (MATE) attack scenario considers the case where software users themselves are adversarial and try to extract such information from the code. Code obfuscation aims to protect code against such attacks by transforming a sensitive program P into a functionally equivalent program P' that is more “difficult” to understand or modify. Conversely, code deobfuscation aims to extract information from obfuscated code.
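To make "functionally equivalent" concrete, here is a minimal sketch in C. It is only an illustration: the function names (add_plain, add_obfuscated, next_random, count_mismatches) are hypothetical, and the rewrite uses the well-known identity x + y == (x ^ y) + 2 * (x & y) rather than the output of any particular obfuscator.

```c
#include <stdint.h>

/* Original program P: plain 32-bit addition (wraps modulo 2^32). */
uint32_t add_plain(uint32_t x, uint32_t y) {
    return x + y;
}

/* Obfuscated program P': the same function, rewritten with the identity
   x + y == (x ^ y) + 2 * (x & y), which also holds modulo 2^32. */
uint32_t add_obfuscated(uint32_t x, uint32_t y) {
    return (x ^ y) + 2u * (x & y);
}

/* Small xorshift32 generator so the check is reproducible
   (the state must be nonzero). */
uint32_t next_random(uint32_t *state) {
    uint32_t s = *state;
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    *state = s;
    return s;
}

/* Compare P and P' on `trials` pseudo-random input pairs and
   return the number of mismatches. */
unsigned count_mismatches(unsigned trials, uint32_t seed) {
    uint32_t state = seed ? seed : 1u;
    unsigned mismatches = 0;
    for (unsigned i = 0; i < trials; i++) {
        uint32_t x = next_random(&state);
        uint32_t y = next_random(&state);
        if (add_plain(x, y) != add_obfuscated(x, y)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Calling count_mismatches with any number of trials should return 0 here, since the identity is exact. Note that random testing of this kind only gives evidence of equivalence, not a proof.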

Enough with theory, now to practice!

Reverse Engineering: An Example of Obfuscation

Consider the following code snippet.

int foo(int x, int y) {
    return ((((~y & x) - ~y) + -1) * 2 ^
            (((~y & x) * 2 - (x ^ y) & ~(((x & y) + ~y) * 2)) -
             (((x & y) + ~y) * 2 & ~((~y & x) * 2 - (x ^ y))) & 0xfffffffdU) -
            (~(((~y & x) * 2 - (x ^ y) & ~(((x & y) + ~y) * 2)) -
              (((x & y) + ~y) * 2 & ~((~y & x) * 2 - (x ^ y)))) & 2U) ^
           (((((~y & x) - ~y) + -1) * 2 |
            (((~y & x) - (~x & y) & ~(((~x | y) + x + ~y + 1) * 2)) -
             (((~x | y) + x + ~y + 1) * 2 & ~((~y & x) - (~x & y))) &
            0xfffffffdU) -
            (~(((~y & x) - (~x & y) & ~(((~x | y) + x + ~y + 1) * 2)) -
              (((~x | y) + x + ~y + 1) * 2 & ~((~y & x) - (~x & y)))) & 2U)) +
           (((~y & x) - ~y) + -1) * -2) * 2) +
           ((((((~y & x) - ~y) + -1) * 2 |
             (((~y & x) - (~x & y) & ~(((~x | y) + x + ~y + 1) * 2)) -
              (((~x | y) + x + ~y + 1) * 2 & ~((~y & x) - (~x & y))) &
             0xfffffffdU) -
             (~(((~y & x) - (~x & y) & ~(((~x | y) + x + ~y + 1) * 2)) -
               (((~x | y) + x + ~y + 1) * 2 & ~((~y & x) - (~x & y)))) & 2U)) +
            (((~y & x) - ~y) + -1) * -2) * 2 &
           ~((((~y & x) * 2 - (x ^ y) & ~(((x & y) + ~y) * 2)) -
              (((x & y) + ~y) * 2 & ~((~y & x) * 2 - (x ^ y))) & 0xfffffffdU) -
             (~(((~y & x) * 2 - (x ^ y) & ~(((x & y) + ~y) * 2)) -
               ((~y + (x & y)) * 2 & ~((~y & x) * 2 - (x ^ y)))) & 2U) ^
            (((~y & x) - ~y) + -1) * 2)) * -2;
}

Hard to read, isn't it? And we are not even talking about the binary version.
It is hard because, as you may have realized, this function has been obfuscated using Mixed-Boolean-Arithmetic (MBA) encoding.
So, do we have to reverse it by hand to understand what foo is actually computing?

Not anymore! Xyntia is here to help us.

Let us just run the following commands on the compiled version of foo.

$ python3 sample.py --bin <foo_basic_block>.bin --reg_out eax --out sample_res
$ xyntia sample_res/0.json
expression: (mem_0<32> + (1 * mem_1<32>))
simplified: (mem_0<32> + mem_1<32>)
smtlib: (bvadd mem_0<32> mem_1<32>)
output: eax_1662<32>
success: yes
synthesis time: 0.002572
simplification time: 0.000042

It takes Xyntia only a fraction of a second to figure out that foo computes x + y! (The example is compiled for 32-bit x86, so mem_0 and mem_1 represent the arguments of foo pushed onto the stack, i.e., x and y.) With this knowledge, we can go further in our reverse engineering task.

Xyntia: The Black-Box Deobfuscator

What is it?

Xyntia is a tool that aims to simplify highly obfuscated blocks of code. It is completely black-box and, as such, is not impacted by semantics-preserving transformations like MBA encoding. But how does it work? Xyntia randomly samples input-output examples to approximate the behavior of a code block. From these I/O examples, it synthesizes a simple, understandable expression that mimics the observed behavior.
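The sample-then-synthesize idea can be sketched in a few lines of C. This is a toy illustration under strong assumptions, not Xyntia's algorithm: instead of searching a space of expressions, the sketch simply tries a short, hand-written list of candidates against the recorded I/O examples. Every name in it (black_box, CANDIDATES, xorshift32, synthesize) is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SAMPLES 256

typedef uint32_t (*binop_fn)(uint32_t, uint32_t);

/* Stand-in for an obfuscated block: (x | y) + (x & y) equals x + y,
   but the synthesizer below never looks inside it. */
uint32_t black_box(uint32_t x, uint32_t y) { return (x | y) + (x & y); }

/* Hand-written candidate expressions (a toy replacement for a real search). */
uint32_t cand_sub(uint32_t x, uint32_t y) { return x - y; }
uint32_t cand_xor(uint32_t x, uint32_t y) { return x ^ y; }
uint32_t cand_or(uint32_t x, uint32_t y)  { return x | y; }
uint32_t cand_and(uint32_t x, uint32_t y) { return x & y; }
uint32_t cand_mul(uint32_t x, uint32_t y) { return x * y; }
uint32_t cand_add(uint32_t x, uint32_t y) { return x + y; }

typedef struct {
    const char *name;
    binop_fn eval;
} candidate;

const candidate CANDIDATES[] = {
    {"x - y", cand_sub}, {"x ^ y", cand_xor}, {"x | y", cand_or},
    {"x & y", cand_and}, {"x * y", cand_mul}, {"x + y", cand_add},
};

/* Reproducible pseudo-random 32-bit values (the state must be nonzero). */
uint32_t xorshift32(uint32_t *state) {
    uint32_t s = *state;
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    *state = s;
    return s;
}

/* Return the name of the first candidate that agrees with `target` on all
   sampled inputs, or NULL if none does (or if no samples were requested). */
const char *synthesize(binop_fn target, unsigned n_samples, uint32_t seed) {
    uint32_t xs[MAX_SAMPLES], ys[MAX_SAMPLES], outs[MAX_SAMPLES];
    uint32_t state = seed ? seed : 1u;

    if (n_samples == 0) return NULL;
    if (n_samples > MAX_SAMPLES) n_samples = MAX_SAMPLES;

    /* Step 1: query the black box on random inputs and record I/O pairs. */
    for (unsigned i = 0; i < n_samples; i++) {
        xs[i] = xorshift32(&state);
        ys[i] = xorshift32(&state);
        outs[i] = target(xs[i], ys[i]);
    }

    /* Step 2: keep the first candidate consistent with every recorded pair. */
    size_t n_cands = sizeof CANDIDATES / sizeof CANDIDATES[0];
    for (size_t c = 0; c < n_cands; c++) {
        unsigned i = 0;
        while (i < n_samples && CANDIDATES[c].eval(xs[i], ys[i]) == outs[i]) {
            i++;
        }
        if (i == n_samples) return CANDIDATES[c].name;
    }
    return NULL;
}
```

A call such as synthesize(black_box, 64, 42) only ever observes the inputs and outputs of black_box, which is why the syntactic complexity of the obfuscated code does not matter to this kind of approach. The result is consistent with the samples but is not guaranteed to be equivalent on all inputs.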

In our CCS’21 paper, we show that Xyntia is fast, precise, and robust. Moreover, we propose the first protections against black-box deobfuscation.

What it ships

Xyntia 0.1.0 ships with the following features:

Offers several S-metaheuristics to guide synthesis
Provides all the scripts needed to replicate the experiments from the paper
Provides scripts relying on GDB and Binsec to trace code execution, extract the semantics of each code block, and sample and synthesize them
Handles 8-, 16-, 32-, and 64-bit inputs (registers and memory reads)

Documentation explaining how to run Xyntia over your own examples is available here.

Can't wait to try it?

Do not hesitate to get the Docker image and the source to try Xyntia out. For more information about Xyntia, check the README and the paper.