Skip to main content

Symbolic execution in a nutshell

In this chapter, we are going to learn how to configure the BINSEC symbolic execution engine to explore the different behaviors of small binary programs.

Previously on Fibonacci

Last chapter was introducing the basic reverse-engineering information we can read from the tiny ELF executable fibonacci . As a reminder, the following gives you an overview of the file content as read by BINSEC.

00000000000000100000002000000030000000400000005000000060000000700000008000000090000000a0000000b0000000c0000000d0000000e0000000f000000100000001100000012000000130000001400000015000000160000001700000018000000190000001a0000001b0000001c0000001d0000001e0000001f000000200000002100000022000000230000002400000025000000260000002700000028000000290000002a0000002b0000002c0000002d0000002e0000002f0000003000000031000000320000003300000034000000350000003600000037000000380
7f45 4c46 0201 0100 0000 0000 0000 0000 0200 3e00 0100 0000 e800 4000 0000 0000 4000 0000 0000 0000 1002 0000 0000 0000 0000 0000 4000 3800 0100 4000 0600 0500 0100 0000 0500 0000 7800 0000 0000 0000 7800 4000 0000 0000 7800 4000 0000 0000 c400 0000 0000 0000 c400 0000 0000 0000 0400 0000 0000 0000 31c0 4885 ff74 0fba 0100 0000 4892 4801 d048 ffcf 75f6 c390 31ff 0fb6 0685 c074 1c83 e830 0f88 8200 0000 3c09 777e 48ff c648 8d3c bf48 01ff 4801 c7eb ddc3 6690 6668 000a 4889 e666 5abb 0a00 0000 48ff ce31 d248 f7f3 80c2 3088 1648 85c0 75ee bf01 0000 0048 89e2 4829 f289 f80f 05c3 5f66 83ff 0275 1f48 89e7 488b 7708 e895 ffff ffe8 78ff ffff e8b3 ffff ff31 ffb8 3c00 0000 0f05 bf01 0000 0048 8d34 2529 0140 00ba 1300 0000 89f8 0f05 40b7 ffeb de55 7361 6765 3a20 6669 626f 6e61 6363 6920 4e0a 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 0000 0200 0100 9000 4000 0000 0000 2600 0000 0000 0000 0e00 0000 0200 0100 b800 4000 0000 0000 3000 0000 0000 0000 1a00 0000 1000 0100 e800 4000 0000 0000 0000 0000 0000 0000 2100 0000 1200 0100 7800 4000 0000 0000 1700 0000 0000 0000 0070 6172 7365 5f75 696e 7436 3400 6563 686f 5f75 696e 7436 3400 5f73 7461 7274 0066 6962 6f6e 6163 6369 0000 2e73 796d 7461 6200 2e73 7472 7461 6200 2e73 6873 7472 7461 6200 2e74 6578 7400 2e72 6f64 6174 6100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1b00 0000 0100 0000 0600 0000 0000 0000 7800 4000 0000 0000 7800 0000 0000 0000 b100 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 2100 0000 0100 0000 0200 0000 0000 0000 2901 4000 0000 0000 2901 0000 0000 0000 1300 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 0100 0000 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000 4001 0000 0000 0000 7800 0000 0000 0000 0400 0000 0300 0000 0800 0000 0000 0000 1800 0000 0000 0000 0900 0000 0300 0000 0000 0000 0000 0000 0000 0000 0000 0000 b801 0000 0000 0000 2b00 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 1100 0000 0300 0000 0000 0000 0000 0000 0000 0000 0000 0000 e301 0000 0000 0000 2900 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000
Legend
Headers Magic
Code Instructions
Read-Only Data Strings
Data Other Sections
ELF Header:
Class:               ELF64                        
Data:                2's complement, little endian
Type:                EXEC                         
Machine:             x86                        
Entry point address: 0x4000e8                     

Section Headers:
[Nr] Name      Type     Addr             Off    Size   ES Flg Lk Inf Al
[ 0]           NULL     0000000000000000 000000 000000 00      0   0  0
[ 1] .text     PROGBITS 0000000000400078 000078 0000b1 00  AX  0   0  4
[ 2] .rodata   PROGBITS 0000000000400129 000129 000013 00   A  0   0  1
[ 3] .symtab   SYMTAB   0000000000000000 000140 000078 18      4   3  8
[ 4] .strtab   STRTAB   0000000000000000 0001b8 00002b 00      0   0  1
[ 5] .shstrtab STRTAB   0000000000000000 0001e3 000029 00      0   0  1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), G (group), T (TLS), O (extra OS processing required)

Symbol table '.symtab' contains 5 entries:
Num:            Value Size Type   Bind   Section Name        
  0: 0000000000000000    0 NOTYPE LOCAL  UND                 
  1: 0000000000400090   38 FUNC   LOCAL  .text   parse_uint64
  2: 00000000004000b8   48 FUNC   LOCAL  .text   echo_uint64
  3: 00000000004000e8    0 NOTYPE GLOBAL .text   _start    
  4: 0000000000400078   23 FUNC   GLOBAL .text   fibonacci

Concrete emulation

Putting aside the Symbolic in Symbolic execution ends up with straight (concrete) execution: not the fastest nor the most robust, but enough to play with small machine code, even if it does not match the processor of your computer.

Here, we can evaluate the computation of the Fibonacci sequence of the eponymous function.

int fibonacci(int);
info

The x86-64 calling convention states that the first (integer'like) arguments are given, in order, via the registers rdi, rsi, rdx, rcx, r8 and r9. The result is returned in the register rax.

We can feed BINSEC with the following script.

starting from <fibonacci>
rdi := 5
reach <fibonacci> return then print dec rax
Output

Here, the script is close to the natural language:

  • the first line moves the initial instruction pointer to the value of the symbol <fibonacci> (i.e. 0x400078);
  • the second line initialize the register rdi, i.e. the function argument according to the calling convention ();
  • the last line add the address of the function return (i.e. 0x4000b5) to the goal of the analysis. If the address is reach, the engine will print the value of the register rax (return value).
tip

You can adjust the value of the argument by using the slider.

Input-Output inversion

Let us reintroduce the symbolic in the equation! Instead of choosing a concrete input value for rdi, we will give some freedom to the symbolic engine. Symbolic execution is able to provide an input witness, called "model", for each program path it covers, i.e. an assignment for the symbolic variables we can feed to the program to follow the given path. Thus, it means we can query the tool to find a set of inputs that triggers a special condition.

Here, we can use the following script to find the input number which produces a given Fibonacci sequence value.

starting from <fibonacci>
rdi := nondet as n
reach <fibonacci> return such that rax = 5 then print dec n
cut at <fibonacci> return
Output

warning

A reach does not prevent the execution to go past its address. To avoid the exploration to go outside the fibonacci code, we add the cut instruction, so the execution will stop at the fibonacci return address even if the condition is not met.

note

Any non-initialized location (registers and memory) is implicitly treated as symbolic by BINSEC. Yet, you can not reliably use or constraint them in the script without a name. Thus, it is recommended to always explicitly initialize the program variables either with concrete values or symbolic declarations.

Path enumeration

Up to now, we were looking for a precise, single goal. But, we can also use the symbolic execution to just explore and report all the way to reach a given address.

Here, we will use the reach* goal which stands for reach all.

starting from <fibonacci>
rdi := nondet as n
assume 0 <= n <= 10
reach* <fibonacci> return then print dec rax
cut at <fibonacci> return
Output

note

BINSEC symbolic execution runs until either all the goals are fulfilled or there is no other valid successor state to execute. As consequences:

  • a single goal script will stop as soon as it reach its target;
  • a script without any goal will stop immediately;
  • an infeasible or infinite goal prevents the exploration to stop before its worklist exhaustion.
tip

Here, we use an assumption to reduce the valid range of value for n . It should be the preferred way to limit the search space of the analysis.

Bounded verification

Last but not least, if we are able to exhaustively explore all the paths of a program up to a given bound, we can all along verify that nothing wrong will happen under this scope.

For instance, we may wonder what is the upper limit of n for which our fibonacci computation is correct. Indeed, the fibonacci sequence grows exponentially while the implementation use fixed 64-bit values.

info

Beside the actual result, the x86 arithmetic instructions also update the condition flags, including the carry flag cf and the overflow flag of. The addition instruction add sets these flags when the, respectively unsigned and signed results do not fit in the destination operand.

starting from <fibonacci>
rdi := nondet as n
assume 0 <= n <= 10
at 0x400089 assert !of
cut at <fibonacci> return
explore all
Output

Here, we put an assertion to test the overflow flag of just after the execution of the add instruction. If BINSEC terminates its exploration without raising an alarm, we have the guaranty the no overflow will happen up to the chosen limit. Yet, starting from 93, the assertion fails, meaning the program output is no longer reliable.

tip

The explore all command ensure the exhaustive enumeration of the program paths. It makes clear the goal of the script and will never been optimized out in the future.

In this chapter, you have learned how to setup the entrypoint and the goal of the analysis, how to initialize or symbolize program variables and how to make use of assumption and assertions. It is time to put what you have just learned into practice.