Xyntia 0.2.0 is now released. Check out its features and how you can use it for research and practical reverse engineering.
What is obfuscation and deobfuscation?
Software contains valuable assets, such as secret algorithms, business logic, or cryptographic keys, that attackers may try to retrieve. The so-called Man-At-The-End-Attacks scenario (MATE) considers the case where software users themselves are adversarial and try to extract such information from the code. Code obfuscation aims at protecting codes against such attacks, by
transforming a sensitive program P
into a functionally equivalent program P'
that is more “difficult” to understand or modify. On the flip side, code deobfuscation aims to extract information from obfuscated codes.
Enough with theory, now to practice!
Reverse Engineering: An Example of Obfuscation
Consider the following code snippet:
int foo(int x, int x) {
return ((((~y & x) - ~y) + -1) * 2 ^
(((~y & x) * 2 - (x ^ y) & ~(((x & y) + ~y) * 2)) -
(((x & y) + ~y) * 2 & ~((~y & x) * 2 - (x ^ y))) & 0xfffffffdU) -
(~(((~y & x) * 2 - (x ^ y) & ~(((x & y) + ~y) * 2)) -
(((x & y) + ~y) * 2 & ~((~y & x) * 2 - (x ^ y)))) & 2U) ^
(((((~y & x) - ~y) + -1) * 2 |
(((~y & x) - (~x & y) & ~(((~x | y) + x + ~y + 1) * 2)) -
(((~x | y) + x + ~y + 1) * 2 & ~((~y & x) - (~x & y))) &
0xfffffffdU) -
(~(((~y & x) - (~x & y) & ~(((~x | y) + x + ~y + 1) * 2)) -
(((~x | y) + x + ~y + 1) * 2 & ~((~y & x) - (~x & y)))) & 2U)) +
(((~y & x) - ~y) + -1) * -2) * 2) +
((((((~y & x) - ~y) + -1) * 2 |
(((~y & x) - (~x & y) & ~(((~x | y) + x + ~y + 1) * 2)) -
(((~x | y) + x + ~y + 1) * 2 & ~((~y & x) - (~x & y))) &
0xfffffffdU) -
(~(((~y & x) - (~x & y) & ~(((~x | y) + x + ~y + 1) * 2)) -
(((~x | y) + x + ~y + 1) * 2 & ~((~y & x) - (~x & y)))) & 2U)) +
(((~y & x) - ~y) + -1) * -2) * 2 &
~((((~y & x) * 2 - (x ^ y) & ~(((x & y) + ~y) * 2)) -
(((x & y) + ~y) * 2 & ~((~y & x) * 2 - (x ^ y))) & 0xfffffffdU) -
(~(((~y & x) * 2 - (x ^ y) & ~(((x & y) + ~y) * 2)) -
((~y + (x & y)) * 2 & ~((~y & x) * 2 - (x ^ y)))) & 2U) ^
(((~y & x) - ~y) + -1) * 2)) * -2;
}
The source code, the binary and Xyntia configuration can be found here.
The foo
function is hard to read, is not it? And we are not even talking about the binary version.
It is hard because, as you may have realized, this function has been obfuscated using Mixed-Boolean-Arithmetic (MBA) encoding.
So, shoud we reverse foo
by hand to understand what it is actually computing?
Not anymore! Xyntia is here to help us.
All you have to do is specify Xyntia what to reverse engineer though an extension of the Binsec scripting language.
We use the following file:
$ cat mba.ini
starting from 0x117d
explore all
hook 0x14ac with
sample 100 eax
halt
end
It specifies the addresses where the analysis should start (0x117d
) and end (0x14ac
). It also asks Xyntia to recover 100
input-output observations for the eax
output, i.e., the output of the function in x86.
From this simple script, we can run Xyntia as follows:
$ xyntia -config mba.ini -bin mba
expression: (mem_1<32> + mem_0<32>)
simplified: (mem_1<32> + mem_0<32>)
smtlib: (bvadd mem_1 mem_0)
output: eax<32>
synthesized expression size: 3
original expression size: 353
success: yes
synthesis time: 0.000024
simplification time: 0.000003
equiv: ukn
Among all the outputs, Xyntia returns the expression synthesized (line 1), a simplified version if some post-process simplifications applied (line 2) and the smtlib version of the simplified expression (line 3).
Observe the equiv: ukn
line. It says that we do not know if the result returned by Xyntia is equivalent to the ground-truth.
This is normal as we did not asked Xyntia to check the equivalence. To do so, run:
$ xyntia -config mba.ini -bin mba -check
expression: (mem_1<32> + mem_0<32>)
simplified: (mem_1<32> + mem_0<32>)
smtlib: (bvadd mem_1 mem_0)
output: eax<32>
synthesized expression size: 3
original expression size: 353
success: yes
synthesis time: 0.000024
simplification time: 0.000003
equiv: yes
Here, the equivalence checking proved Xyntia results to be correct. Hence, Xyntia figured out that foo
computes x + y
(the example is compiled in x86 32 bits, thus, mem_0
and mem_1
represent the foo
arguments pushed in the stack, i.e., x
and y
)!
With this knowledge, we could certainly go further in our reverse engineering task.
Xyntia: The Black-Box Deobfuscator
What is it?
Xyntia is a tool that aims to simplify highly obfuscated blocks of code. It is completely black-box and, as such, is not impacted by semantics-preserving transformations like MBA encoding. But how does it works? Xyntia samples input-output examples randomly to approximate code block behavior. From such I/O examples, Xyntia synthesizes a simple and understandable expression that mimics observed behaviors.
Xyntia was first introduced in our CCS’21 paper.
What is new?
Xyntia 0.2.0 provides the following new features:
- Xyntia is now linked to Binsec and handles many architectures: x86_32/64, ARMv7/v8, RISCV and PowerPC64
- Sampling and equivalence checking against ground truth are now integrated in Xyntia
- Xyntia includes a Counter-Example-Guided-Synthesis (CEGIS) loop through the
-cegis
option - Xyntia can output Sygus2 problems to easily compare with other synthesizers through the
-sygus
option. Try it and send us feedback - It includes new operators and new grammars, making synthesis more flexible.
Documentation explaining how to run Xyntia over your own examples is available here.
Cannot help but to try it?
Do not hesitate to get the docker image and the source to try Xyntia out. For more information about Xyntia check the README and the paper.