Code Deobfuscation: Release 0.2.0 of Xyntia is out


Xyntia 0.2.0 is now released. Check out its features and how you can use it for research and practical reverse engineering.

What is obfuscation and deobfuscation?

Software contains valuable assets, such as secret algorithms, business logic, or cryptographic keys, that attackers may try to retrieve. The so-called Man-At-The-End-Attacks scenario (MATE) considers the case where software users themselves are adversarial and try to extract such information from the code. Code obfuscation aims at protecting codes against such attacks, by transforming a sensitive program P into a functionally equivalent program P' that is more “difficult” to understand or modify. On the flip side, code deobfuscation aims to extract information from obfuscated codes.

Enough with theory, now to practice!

Reverse Engineering: An Example of Obfuscation

Consider the following code snippet:

int foo(int x, int x) {
    return ((((~y & x) - ~y) + -1) * 2 ^
            (((~y & x) * 2 - (x ^ y) & ~(((x & y) + ~y) * 2)) -
             (((x & y) + ~y) * 2 & ~((~y & x) * 2 - (x ^ y))) & 0xfffffffdU) -
            (~(((~y & x) * 2 - (x ^ y) & ~(((x & y) + ~y) * 2)) -
              (((x & y) + ~y) * 2 & ~((~y & x) * 2 - (x ^ y)))) & 2U) ^
           (((((~y & x) - ~y) + -1) * 2 |
            (((~y & x) - (~x & y) & ~(((~x | y) + x + ~y + 1) * 2)) -
             (((~x | y) + x + ~y + 1) * 2 & ~((~y & x) - (~x & y))) &
            0xfffffffdU) -
            (~(((~y & x) - (~x & y) & ~(((~x | y) + x + ~y + 1) * 2)) -
              (((~x | y) + x + ~y + 1) * 2 & ~((~y & x) - (~x & y)))) & 2U)) +
           (((~y & x) - ~y) + -1) * -2) * 2) +
           ((((((~y & x) - ~y) + -1) * 2 |
             (((~y & x) - (~x & y) & ~(((~x | y) + x + ~y + 1) * 2)) -
              (((~x | y) + x + ~y + 1) * 2 & ~((~y & x) - (~x & y))) &
             0xfffffffdU) -
             (~(((~y & x) - (~x & y) & ~(((~x | y) + x + ~y + 1) * 2)) -
               (((~x | y) + x + ~y + 1) * 2 & ~((~y & x) - (~x & y)))) & 2U)) +
            (((~y & x) - ~y) + -1) * -2) * 2 &
           ~((((~y & x) * 2 - (x ^ y) & ~(((x & y) + ~y) * 2)) -
              (((x & y) + ~y) * 2 & ~((~y & x) * 2 - (x ^ y))) & 0xfffffffdU) -
             (~(((~y & x) * 2 - (x ^ y) & ~(((x & y) + ~y) * 2)) -
               ((~y + (x & y)) * 2 & ~((~y & x) * 2 - (x ^ y)))) & 2U) ^
            (((~y & x) - ~y) + -1) * 2)) * -2;
}

The source code, the binary and Xyntia configuration can be found here.

The foo function is hard to read, is not it? And we are not even talking about the binary version.
It is hard because, as you may have realized, this function has been obfuscated using Mixed-Boolean-Arithmetic (MBA) encoding.
So, shoud we reverse foo by hand to understand what it is actually computing?

Not anymore! Xyntia is here to help us.

All you have to do is specify Xyntia what to reverse engineer though an extension of the Binsec scripting language.

We use the following file:

$ cat mba.ini
starting from 0x117d

explore all

hook 0x14ac with 
    sample 100 eax
    halt
end

It specifies the addresses where the analysis should start (0x117d) and end (0x14ac). It also asks Xyntia to recover 100 input-output observations for the eax output, i.e., the output of the function in x86.

From this simple script, we can run Xyntia as follows:

$ xyntia -config mba.ini -bin mba
expression: (mem_1<32> + mem_0<32>)
simplified: (mem_1<32> + mem_0<32>)
smtlib: (bvadd mem_1 mem_0)
output: eax<32>
synthesized expression size: 3
original expression size: 353
success: yes
synthesis time: 0.000024
simplification time: 0.000003
equiv: ukn

Among all the outputs, Xyntia returns the expression synthesized (line 1), a simplified version if some post-process simplifications applied (line 2) and the smtlib version of the simplified expression (line 3).

Observe the equiv: ukn line. It says that we do not know if the result returned by Xyntia is equivalent to the ground-truth. This is normal as we did not asked Xyntia to check the equivalence. To do so, run:

$ xyntia -config mba.ini -bin mba -check
expression: (mem_1<32> + mem_0<32>)
simplified: (mem_1<32> + mem_0<32>)
smtlib: (bvadd mem_1 mem_0)
output: eax<32>
synthesized expression size: 3
original expression size: 353
success: yes
synthesis time: 0.000024
simplification time: 0.000003
equiv: yes

Here, the equivalence checking proved Xyntia results to be correct. Hence, Xyntia figured out that foo computes x + y (the example is compiled in x86 32 bits, thus, mem_0 and mem_1 represent the foo arguments pushed in the stack, i.e., x and y)! With this knowledge, we could certainly go further in our reverse engineering task.

Xyntia: The Black-Box Deobfuscator

What is it?

Xyntia is a tool that aims to simplify highly obfuscated blocks of code. It is completely black-box and, as such, is not impacted by semantics-preserving transformations like MBA encoding. But how does it works? Xyntia samples input-output examples randomly to approximate code block behavior. From such I/O examples, Xyntia synthesizes a simple and understandable expression that mimics observed behaviors.

Xyntia was first introduced in our CCS’21 paper.

What is new?

Xyntia 0.2.0 provides the following new features:

  • Xyntia is now linked to Binsec and handles many architectures: x86_32/64, ARMv7/v8, RISCV and PowerPC64
  • Sampling and equivalence checking against ground truth are now integrated in Xyntia
  • Xyntia includes a Counter-Example-Guided-Synthesis (CEGIS) loop through the -cegis option
  • Xyntia can output Sygus2 problems to easily compare with other synthesizers through the -sygus option. Try it and send us feedback
  • It includes new operators and new grammars, making synthesis more flexible.

Documentation explaining how to run Xyntia over your own examples is available here.

Cannot help but to try it?

Do not hesitate to get the docker image and the source to try Xyntia out. For more information about Xyntia check the README and the paper.