DEV Community

maximilian feldthusen

How to turn an x86 binary executable back into C source code

  • Objective: turn an x86 binary executable back into C source code.
  • Understand how the compiler turns C into assembly code.
  • Understand low-level OS structures and the executable file format.

Arithmetic Instructions

    mov eax, 2      ; eax = 2
    mov ebx, 3      ; ebx = 3
    add eax, ebx    ; eax = eax + ebx
    sub ebx, 2      ; ebx = ebx - 2
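
Read as C, the four instructions above behave like plain assignments to two integer variables. The following is a minimal sketch under that assumption; the `Regs` struct and the function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file holding only the two registers used above. */
typedef struct {
    uint32_t eax;
    uint32_t ebx;
} Regs;

/* Mirrors the four instructions, one C statement per instruction. */
Regs arithmetic_demo(void) {
    Regs r;
    r.eax = 2;              /* mov eax, 2   */
    r.ebx = 3;              /* mov ebx, 3   */
    r.eax = r.eax + r.ebx;  /* add eax, ebx */
    r.ebx = r.ebx - 2;      /* sub ebx, 2   */
    return r;
}

/* Self-check: eax ends up as 5 and ebx as 1. */
void check_arithmetic(void) {
    Regs r = arithmetic_demo();
    assert(r.eax == 5);
    assert(r.ebx == 1);
}
```

Calling `check_arithmetic()` from a `main` function is enough to exercise the sketch.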

Accessing Memory

    mov eax, [1234] ; eax = *(int*)1234
    mov ebx, 1234   ; ebx = 1234
    mov eax, [ebx]  ; eax = *ebx
    mov [ebx], eax  ; *ebx = eax
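
In C terms, a bracketed operand is a pointer dereference. A small sketch, assuming 32-bit values: the helper names `load32` and `store32` are hypothetical, and a local variable stands in for the fixed address 1234, which a normal program could not safely dereference.

```c
#include <assert.h>
#include <stdint.h>

/* mov eax, [ebx] : load the 32-bit value that ebx points to. */
uint32_t load32(const uint32_t *ebx) {
    return *ebx;
}

/* mov [ebx], eax : store eax at the address held in ebx. */
void store32(uint32_t *ebx, uint32_t eax) {
    *ebx = eax;
}

/* Self-check using a local variable as the memory cell. */
void check_memory_access(void) {
    uint32_t cell = 1234;
    assert(load32(&cell) == 1234);
    store32(&cell, 42);
    assert(cell == 42);
}
```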

Conditional Branches

    cmp eax, 2      ; compare eax with 2
    je  label1      ; if (eax == 2) goto label1
    ja  label2      ; if (eax > 2) goto label2 (unsigned)
    jb  label3      ; if (eax < 2) goto label3 (unsigned)
    jbe label4      ; if (eax <= 2) goto label4 (unsigned)
    jne label5      ; if (eax != 2) goto label5
    jmp label6      ; unconditional goto label6
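
When turning these branches back into C, the choice of jump tells you the signedness of the comparison: `ja`, `jb` and `jbe` compare unsigned values, while the signed counterparts are `jg`, `jl` and `jle`. The sketch below, with hypothetical function names, shows why the difference matters for the recovered C types.

```c
#include <assert.h>
#include <stdint.h>

/* cmp eax, 2 / ja ... : the comparison is unsigned. */
int above_2_unsigned(uint32_t eax) {
    return eax > 2u;
}

/* cmp eax, 2 / jg ... : the comparison is signed. */
int greater_2_signed(int32_t eax) {
    return eax > 2;
}

/* The bit pattern 0xFFFFFFFF is a huge unsigned value but -1 when signed. */
void check_branch_signedness(void) {
    assert(above_2_unsigned(0xFFFFFFFFu) == 1);
    assert(greater_2_signed(-1) == 0);
    assert(above_2_unsigned(2u) == 0);
}
```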

Function calls

First, calling a function:

    call func       ; store return address on the stack and jump to func

Inside the function, the first operation is to save any register the function is about to overwrite (here esi):

    push esi        ; save esi

Right before leaving the function:

    pop esi         ; restore esi
    ret             ; read return address from the stack and jump to it
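
The stack discipline behind a call can be modelled in a few lines of C. This is only a toy sketch: the `Machine` struct and the helper names are hypothetical, and the model stack grows upward for simplicity, whereas the real x86 stack grows toward lower addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy machine: a small stack plus the esi register. */
typedef struct {
    uint32_t stack[8];
    int      sp;    /* index of the next free slot */
    uint32_t esi;
} Machine;

void stack_push(Machine *m, uint32_t v) { m->stack[m->sp++] = v; }
uint32_t stack_pop(Machine *m)          { return m->stack[--m->sp]; }

/* Walks through call, register save, body, register restore and ret.
   Returns the address at which execution would resume. */
uint32_t call_and_return(Machine *m, uint32_t return_address) {
    stack_push(m, return_address);  /* call func: store the return address */
    stack_push(m, m->esi);          /* save the caller's esi               */
    m->esi = 0xDEADBEEFu;           /* function body overwrites esi        */
    m->esi = stack_pop(m);          /* pop esi: restore esi                */
    return stack_pop(m);            /* ret: read the return address        */
}

void check_call(void) {
    Machine m = { {0}, 0, 7u };
    assert(call_and_return(&m, 0x401000u) == 0x401000u);
    assert(m.esi == 7u);   /* the caller's esi survived the call */
    assert(m.sp == 0);     /* the stack is balanced again */
}
```

The saves and restores must pair up in reverse order; otherwise `ret` would read a saved register value instead of the return address.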

Modern Compiler Architecture

C code --> parsing --> intermediate representation --> optimization -->
low-level intermediate representation --> register allocation --> x86 assembly

High-level Optimizations

Inlining

For example, the function foo and its call:

    int foo(int a, int b) { return a + b; }
    c = foo(a, b + 1);

translates to

    c = a + b + 1;
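
For decompilation, the practical consequence is that an inlined function may not show up as a separate `call` target in the binary at all. A small sketch of the two forms, with the hypothetical wrappers `with_call` and `inlined` standing in for the call site before and after the transformation:

```c
#include <assert.h>

int foo(int a, int b) { return a + b; }

/* Call site as written in the source. */
int with_call(int a, int b) { return foo(a, b + 1); }

/* The same call site after foo has been inlined. */
int inlined(int a, int b) { return a + b + 1; }

/* Both forms agree on a small grid of inputs. */
void check_inlining(void) {
    for (int a = -3; a <= 3; a++) {
        for (int b = -3; b <= 3; b++) {
            assert(with_call(a, b) == inlined(a, b));
        }
    }
}
```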

Loop unrolling

The loop:

    for (i = 0; i < 2; i++) { a[i] = 0; }

becomes

    a[0] = 0;
    a[1] = 0;
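
When the trip count is not a small constant, a compiler may unroll only partially. The sketch below is an assumption-laden illustration of that variant, not something taken from the example above: it unrolls by a factor of two and handles an odd leftover element separately. The function names are hypothetical.

```c
#include <assert.h>

/* Straightforward loop: one compare-and-branch per element. */
void zero_rolled(int *a, int n) {
    for (int i = 0; i < n; i++) {
        a[i] = 0;
    }
}

/* Unrolled by two, with a tail step for an odd element count. */
void zero_unrolled(int *a, int n) {
    int i = 0;
    for (; i + 1 < n; i += 2) {
        a[i] = 0;
        a[i + 1] = 0;
    }
    if (i < n) {
        a[i] = 0;
    }
}

void check_unrolling(void) {
    int x[5] = { 1, 2, 3, 4, 5 };
    int y[5] = { 1, 2, 3, 4, 5 };
    zero_rolled(x, 5);
    zero_unrolled(y, 5);
    for (int i = 0; i < 5; i++) {
        assert(x[i] == 0 && y[i] == 0);
    }
}
```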

Loop-invariant code motion

The loop:

    for (i = 0; i < 2; i++) { a[i] = p + q; }

becomes:

    temp = p + q;
    for (i = 0; i < 2; i++) { a[i] = temp; }
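
The same transformation can be written as two compilable functions. This is a sketch with hypothetical names; the array length `n` is a parameter added here so the loop bound is not hard-coded.

```c
#include <assert.h>

/* As written: p + q is recomputed on every iteration. */
void fill_naive(int *a, int n, int p, int q) {
    for (int i = 0; i < n; i++) {
        a[i] = p + q;
    }
}

/* After hoisting: p + q does not depend on i, so it is computed once. */
void fill_hoisted(int *a, int n, int p, int q) {
    int temp = p + q;
    for (int i = 0; i < n; i++) {
        a[i] = temp;
    }
}

void check_hoisting(void) {
    int x[2], y[2];
    fill_naive(x, 2, 3, 4);
    fill_hoisted(y, 2, 3, 4);
    for (int i = 0; i < 2; i++) {
        assert(x[i] == 7 && y[i] == 7);
    }
}
```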

Common subexpression elimination

The assignments:

    a = b + (z + 1)
    p = q + (z + 1)

becomes

    temp = z + 1
    a = b + temp
    p = q + temp
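
A minimal C sketch of the transformation, in which the shared value `z + 1` is computed once and reused by both assignments. The function names and the output pointers are hypothetical, introduced only so the two versions can be compared.

```c
#include <assert.h>

/* As written: z + 1 is evaluated twice. */
void assign_naive(int b, int q, int z, int *a, int *p) {
    *a = b + (z + 1);
    *p = q + (z + 1);
}

/* After the optimization: z + 1 is evaluated once and reused. */
void assign_cse(int b, int q, int z, int *a, int *p) {
    int temp = z + 1;
    *a = b + temp;
    *p = q + temp;
}

void check_cse(void) {
    int a1, p1, a2, p2;
    assign_naive(10, 20, 4, &a1, &p1);
    assign_cse(10, 20, 4, &a2, &p2);
    assert(a1 == 15 && p1 == 25);
    assert(a2 == a1 && p2 == p1);
}
```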

Constant folding and propagation

The assignments:

    a = 3 + 5
    b = a + 1
    func(b)

become:

    func(9)

Dead code elimination

Delete unnecessary code. For example:

    a = 1
    if (a < 0) { printf("ERROR!") }

is reduced to

    a = 1

Low-Level Optimizations

Strength reduction

Code such as:

    y = x * 2
    y = x * 15

becomes:

    y = x + x
    y = (x << 4) - x
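
When you meet a shift-and-subtract sequence in a disassembly, it helps to confirm that it really is a multiplication in disguise. The sketch below does that for the two rewrites above. It assumes unsigned 32-bit operands, for which wraparound is well defined in C, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

uint32_t times2_mul(uint32_t x)    { return x * 2u; }
uint32_t times2_add(uint32_t x)    { return x + x; }
uint32_t times15_mul(uint32_t x)   { return x * 15u; }
uint32_t times15_shift(uint32_t x) { return (x << 4) - x; }  /* 16x - x */

/* The rewritten forms agree with the multiplications, even on overflow. */
void check_strength_reduction(void) {
    const uint32_t samples[] = { 0u, 1u, 7u, 1000u, 0x10000000u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t x = samples[i];
        assert(times2_mul(x) == times2_add(x));
        assert(times15_mul(x) == times15_shift(x));
    }
}
```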

Code block reordering

Code such as:

    if (a < 10) goto l1
    printf("ERROR")
    goto l2
    l1: printf("OK")
    l2: return;

becomes:

    if (a >= 10) goto l1
    printf("OK")
    l2: return
    l1: printf("ERROR")
    goto l2
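
Both layouts can be written as real C functions to check that reordering the blocks does not change behavior. In this sketch the functions return the message instead of printing it, purely so the result can be compared; that change and the function names are assumptions made for illustration. The inverted test is written as `a >= 10`, the exact complement of `a < 10`.

```c
#include <assert.h>
#include <string.h>

/* Original layout: the error block comes first in the code. */
const char *original_layout(int a) {
    const char *msg;
    if (a < 10) goto l1;
    msg = "ERROR";
    goto l2;
l1:
    msg = "OK";
l2:
    return msg;
}

/* Reordered layout: the common OK path falls straight through. */
const char *reordered_layout(int a) {
    const char *msg;
    if (a >= 10) goto l1;
    msg = "OK";
l2:
    return msg;
l1:
    msg = "ERROR";
    goto l2;
}

void check_reordering(void) {
    for (int a = 0; a <= 20; a++) {
        assert(strcmp(original_layout(a), reordered_layout(a)) == 0);
    }
    assert(strcmp(reordered_layout(9), "OK") == 0);
    assert(strcmp(reordered_layout(10), "ERROR") == 0);
}
```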

Register allocation

  • Memory access is slower than register access.
  • Try to fit as many local variables as possible in registers.
  • The mapping of local variables to stack locations and registers is not constant: the same variable may live in different places at different points in the function.

Instruction scheduling

Assembly code like:

    mov eax, [esi]
    add eax, 1
    mov ebx, [edi]
    add ebx, 1

becomes:

    mov eax, [esi]
    mov ebx, [edi]
    add eax, 1
    add ebx, 1