HYP3RV3LOCITY: Stack Buffer Overflow Primer: Stack and Assembly in x86

Introduction

A buffer overflow is a very well-known vulnerability that occurs when it is possible to put more data into a buffer than that buffer can hold. In the coming tutorials, we will learn about this vulnerability and how it can be exploited. In this tutorial, however, we will go over some concepts that will be necessary to know in order to understand the buffer overflow.

The Stack Structure

The stack is a data structure that holds information about functions while the program is executing. Whenever a function is called, a new stack frame is pushed onto the stack. When the function is finished executing, its stack frame is popped off the stack. Consider the following program

#include <stdio.h>

int first(int f1, int f2, int f3);
void second(int s1, char s2);

int main(){
 first(1,2,3);
 second (4,'a');
 return 0;
}

int first(int f1, int f2, int f3){
 printf("My numbers: %d, %d, and %d\n", f1,f2,f3);
 int sum = f1 + f2 +f3;
 return sum;
}

void second(int s1, char s2){
 s1++;
}

There are four functions: main(), first(), second(), and printf(). When the program first starts, there is one stack frame on the stack for main(). Then when the program gets to first(), it puts another stack frame on main()'s frame. Then when it gets to printf, it puts the another frame on top of the one for first. When printf finishes printing, the printf frame gets popped off leaving just main and first.
Below is a depiction of how the stack frames would look like during the execution of the program.

                      |printf|                                                  low addresses
                     --------
           |first|    |first|    |first|              |second|
          --------   --------   --------   --------   --------   
|main| --> |main| --> |main| --> |main| --> |main| --> |main| --> |main|        high addresses

Notice that the frames get put on top of the previous frame. This is because the stack data structure works in a First In Last Out (FILO) structure (aka Last In First Out - LIFO), so the first frame that gets pushed onto the stack (main) is the last frame to pop off the stack. The stack also grows from high addresses to low addresses. Thus, the address of main()'s frame is higher than the address of printf(), for example.

Detour into Registers and Assembly

Now that we understand how the stack changes during program execution, we now need to understand the stack frame. But before we can get into that, we need to talk about registers and the x86 instruction set.

Note: When talking about numerical values in assembly, we will always use base 16 numbers (ex: 0xAA, 0x14, 0x2FCD).

When we program, we have our high level C code, but when we compile it, it becomes machine code. Assembly is an interpretation of machine code (opcode) that makes it human readable. There are two main flavors/syntax of assembly: intel and At&T. We will use intel in this tutorial.

An assembly program is comprised of a series of instructions that the CPU executes. Each instruction is made of a mnemonic, a source operand, and a destination operand. The syntax for an instruction is

mnemonic destination_operand source_operand
  mov             eax,             0x55

Here, we are moving the value 0x55 into the register, eax.

A register is a data storage that the program uses to help execute the program. In the x86 processor, we have general, segment, status, and instruction registers.

For now, we only really need to worry about the general registers: EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI. Each of these registers have 32 bits of memory (or 4 bytes), and they can be referenced by their lower 16 bits, AX, BX, CX, DX, BP, SP, SI. So for example if EAX contains the 32 bit value: 0x12345678. Then AX would contain 0x5678.

For AX, BX, CX, and DX, we can also refer to their higher and lower 8 bits. For AX from the same example, the higher 8 bits, AH, would contain 0x56, and the lower 8 bits, AL, contain 0x78.

A register can hold both values and addresses.

  mov             eax,             0x080483fc
  mov             ebx,             [eax]

In the first line, we move the value 0x080483fc into eax. In the next line, we move the value at the address contained in eax to ebx. So if the address 0x080483fc contained the value 0x5, then we would be moving 0x5 into ebx.

For the purpose of understanding stacks, there are three registers we need to know about: EBP, ESP, and EIP.

EIP is the instruction pointer, and it contains the memory address of the next instruction that the program will execute. In other words, it points to the next instruction that we are executing. When we call a function, EIP gets the address of the function's first instruction. When we later talk about exploiting buffer overflows, our objective will be to control EIP so that we can tell the program what instructions we want it to execute.

ESP is the stack pointer, and it points to the top of the stack. Whenever we push things onto the stack, ESP decreases in value since the stack grows towards lower addresses. It points to the last thing that we have pushed onto the stack.

EBP is the base pointer, and it points to the bottom of the current stack frame.

Stack Frame, Function Prologue, Function Epilogue

Now that we understand some of the registers and assembly, we can get back to the stack frame. Right before a function is called, the function's arguments get pushed onto the stack in reverse order. So notice that first() has three parameters f1, f2, f3 with the values 1, 2, 3 respectively. These get pushed onto the stack like this:

                                               low address
 --------------------------------- 
1         <--ESP
                  2
                  3    
|main|                      <--EBP             high address
 ---------------------------------

(In these diagrams, the dotted lines are the stack frame boundaries.)

Now when the function is called, EIP first pushes its value (the address of the next instruction) onto the stack. This is called the return address since it is the address of the instruction that our program will return to executing right after we finish executing our function.

After pushing the return address, EIP will get the address of our function's instructions and start executing.

Every function, has a function prologue that occurs before the function carrys out what we've programmed it to do. The prologue pushes the current value of EBP onto the stack. Then it changes the value of EBP to point at this value at the top of the stack. This marks the beginning of the new stack frame.

                                          low address
|first| Old EBP   <-- EBP <--ESP
 ---------------------------------                      
        return address
             1
             2
             3
|main|                                    high address
 ---------------------------------

After this is done, the prologue allocates space for local variables by subtracting from ESP. Similarly, space is later freed on the stack by adding to ESP.

 ---------------------------------        low address
        | 24 bytes for     <--ESP
        |local variables
        |
|first| Old EBP           <-- EBP (doesn't change within the function)
 ---------------------------------                       
        return address
             1
             2
             3
|main|                                    high address
 ---------------------------------

So now that the function is set up, it carries out its instructions, and then when it finishes, the function goes through the function epilogue, which restores the stack to the calling frame. (In the case of first() and main(), the calling frame is main()). It set ESP equal to EBP and pops the old EBP value off the stack and sets EBP to it. The assembly for this is usually "mov esp, ebp" then "pop ebp", but the x86 instruction set allows you to use "leave", which will do the same thing.

      Old EBP  <--EBP <--ESP                                low address
 ------------------              -----------
 return address                 return address <--ESP 
         1                              1 
         2                              2
         3                              3
|main|                           |main|        <--EBP
 -----------------               -----------                high address
//set esp = ebp                  //pop ebp

After doing this, the program then pops the return address off the stack and sets EIP equal to it, which has the effect of returning execution to our main method instructions. In x86 instruction set, we simply use "ret" to do this.

When EIP returns to the instructions in main, we add to ESP in order to restore the memory we allocated when we pushed the function arguments.

low address 
------------------              -----------
         1   <--ESP                     
         2                              
         3                                       <--ESP
|main|       <--EBP                |main|        <--EBP
 -----------------               -----------                high address
//ret                           //free space by adding to ESP

x86 Assembly

Now that we understand what is going on in the stack, let's try to see the stack changes in assembly. Below is the assembly for the main() function. The semicolons and the words following are comments that I've added to make reading the disassembly easier.

hypervelocity ~ $ gcc -m32 -o stackexample stackexample.c
hypervelocity ~ $ gdb -q ./stackexample
(gdb) set disassembly-flavor intel
(gdb) disassemble main 
Dump of assembler code for function main:
   0x080483eb <+0>:  lea    ecx,[esp+0x4]
   0x080483ef <+4>:  and    esp,0xfffffff0
   0x080483f2 <+7>:  push   DWORD PTR [ecx-0x4]
   0x080483f5 <+10>: push   ebp                          ;function prologue
   0x080483f6 <+11>: mov    ebp,esp
   0x080483f8 <+13>: push   ecx                          
   0x080483f9 <+14>: sub    esp,0x4                      
   0x080483fc <+17>: sub    esp,0x4
   0x080483ff <+20>: push   0x3                          ;push arguments for first()
   0x08048401 <+22>: push   0x2
   0x08048403 <+24>: push   0x1
   0x08048405 <+26>: call   0x8048429                    ;call first
   0x0804840a <+31>: add    esp,0x10                     ;free 16 bytes
   0x0804840d <+34>: sub    esp,0x8                      ;allocate 8 bytes 
   0x08048410 <+37>: push   0x61                         ;push arguments, 0x61 = 'a' in ASCII
   0x08048412 <+39>: push   0x4                          
   0x08048414 <+41>: call   0x804845a                    ;call second
   0x08048419 <+46>: add    esp,0x10                     ;free 16 bytes
   0x0804841c <+49>: mov    eax,0x0                      
   0x08048421 <+54>: mov    ecx,DWORD PTR [ebp-0x4]
   0x08048424 <+57>: leave  
   0x08048425 <+58>: lea    esp,[ecx-0x4]
   0x08048428 <+61>: ret

At line 20 (0x080483ff), we can see the that the parameters for first() are being pushed onto the stack in reverse order. We call first() at line 26. Notice that it says call 0x8048429, which matches the address of the first instruction listed in the assembler code for first() .

The return address that we push on the stack before calling first() will be 0x0804840a since that is the address of the next instruction after our function call.

Below are the assembler codes for for first() and second().

(gdb) disassemble first
Dump of assembler code for function first:
   0x08048429 <+0>:  push   ebp                          ;function prologue
   0x0804842a <+1>:  mov    ebp,esp
   0x0804842c <+3>:  sub    esp,0x18                     ;allocate 24 bytes
   0x0804842f <+6>:  push   DWORD PTR [ebp+0x10]         ;push f1,f2,f3 parameters
   0x08048432 <+9>:  push   DWORD PTR [ebp+0xc]          ;   onto the stack for printf
   0x08048435 <+12>: push   DWORD PTR [ebp+0x8]
   0x08048438 <+15>: push   0x80484f0                    ;push the string argument for printf
   0x0804843d <+20>: call   0x80482c0 <printf@plt>       ;call printf
   0x08048442 <+25>: add    esp,0x10                     ;free 16 bytes from stack
   0x08048445 <+28>: mov    edx,DWORD PTR [ebp+0x8]      ;mov f1(1) to edx (1)
   0x08048448 <+31>: mov    eax,DWORD PTR [ebp+0xc]      ;mov f2 to eax (2)
   0x0804844b <+34>: add    edx,eax                      ;add eax to edx (edx = 3) 
   0x0804844d <+36>: mov    eax,DWORD PTR [ebp+0x10]     ;mov f3 to eax (3)
   0x08048450 <+39>: add    eax,edx                      ;add edx to eax (6)
   0x08048452 <+41>: mov    DWORD PTR [ebp-0xc],eax      ;prepare to return (6)
   0x08048455 <+44>: mov    eax,DWORD PTR [ebp-0xc]      ;  by moving into eax
   0x08048458 <+47>: leave  
   0x08048459 <+48>: ret

(gdb) disassemble second
Dump of assembler code for function second:
   0x0804845a <+0>:  push   ebp                          ;prologue
   0x0804845b <+1>:  mov    ebp,esp
   0x0804845d <+3>:  sub    esp,0x4                      ;allocate 4 bytes
   0x08048460 <+6>:  mov    eax,DWORD PTR [ebp+0xc]      ;move parameter value to eax ('a')
   0x08048463 <+9>:  mov    BYTE PTR [ebp-0x4],al        ;move al (lower 8 bits = 'a') to local var
   0x08048466 <+12>: add    DWORD PTR [ebp+0x8],0x1      ;add 1 to parameter (4) = 5
   0x0804846a <+16>: nop
   0x0804846b <+17>: leave  
   0x0804846c <+18>: ret

Notice that during the function's execution, variables are referenced relative to EBP by adding or subtracting offsets. Variables declared inside the function are EBP subtracted by some value, and the parameter values are referenced with EBP+some value. This is consistent with what we know about stack structure since parameter values are pushed first so they are at higher addresses than EBP, and local variables are allocated after EBP, so they are at lower addresses.

A DWORD is 4 bytes (or 32 bits) and it means "double word". A WORD therefore, is 2 bytes. The DWORD PTR is therefore a pointer to a 4 byte value. For example, in line 6 of the assembler code for second, the instruction is saying to move the value located at ebp+0xc into EAX. Notice the brackets around ebp+0xc, which tells us that we are using the value at the address ebp+0xc and not simply the value ebp+0xc.

Also, we know that this value will be the character 'a'. The positive offset from EBP means that it is a parameter value, and we can see in line 12, a DWORD PTR to ebp + 0x8, which we can infer is the other parameter value. Since 0xc is higher than 0x8, we know the [ebp+0xc] value was pushed on first, which means it's the second parameter of our function, char s2, which is 'a'.

Variations in Assembly

What you see when you disassemble your executable will depend on how the program was compiled. The assembly instructions you get from an executable compiled by different compilers will be different but not unrecognizably different.

For example, calling conventions, the way the stack is set up and restored for function calls, will vary depending on the compiler used. So even though the C source code of the program is the same, the assembly instructions will be different. For example, some compilers will not push parameters onto the stack but will move them instead. If you were trying to follow along on OSX, you will notice that the assembly you see is pretty different. That is because OSX uses Clang, which is a different compiler than GCC even though you might have typed gcc into the terminal.

You may have also noticed that not all the instructions in the assembly we have above 'makes sense'. For example, in lines 14 and 17 of the main function, we allocate 4 bytes on the stack. Why couldn't we have just allocated 8 bytes at one time? And after we return from first, why do we free 16 bytes then allocate 8 bytes especially when it seems that we only allocated 8 bytes before calling first? And also, in lines 41 and 44 of first(), why do we move eax to DWORD PTR [ebp-0xc] and then immediately move DWORD PTR [ebp-0xc] back to eax? These reasons for these discrepancies lie in compiler optimization. As computers advanced, people figured out ways to make compilers better in terms of making the resulting executable faster and more efficient. The drawback, however, is that the resulting code is not very intuitive. We can disable optimization by compiling with -O0 option for now, but keep in mind that executables in the 'real world' will not have optimization disabled. In fact, some will even have obfuscating, or anti-reversing, features that will make the code even more confusing. However, that is not something we need to worry about right now. So, if we were to compile without optimizations, then we would get the assembly results below.

hypervelocity ~ $ gcc -g -m32 -O0 -o stackexample stackexample.c
hypervelocity ~ $ gdb -q ./stackexample
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
   0x080483e4 <+0>:     push   ebp                        
   0x080483e5 <+1>:     mov    ebp,esp
   0x080483e7 <+3>:     and    esp,0xfffffff0
   0x080483ea <+6>:     sub    esp,0x10                   
   0x080483ed <+9>:     mov    DWORD PTR [esp+0x8],0x3    
   0x080483f5 <+17>:    mov    DWORD PTR [esp+0x4],0x2
   0x080483fd <+25>:    mov    DWORD PTR [esp],0x1
   0x08048404 <+32>:    call   0x8048424 <first>
   0x08048409 <+37>:    mov    DWORD PTR [esp+0x4],0x61
   0x08048411 <+45>:    mov    DWORD PTR [esp],0x4
   0x08048418 <+52>:    call   0x804845f <second>
   0x0804841d <+57>:    mov    eax,0x0
   0x08048422 <+62>:    leave
   0x08048423 <+63>:    ret
End of assembler dump.
(gdb) disassemble first
Dump of assembler code for function first:
   0x08048424 <+0>:     push   ebp
   0x08048425 <+1>:     mov    ebp,esp
   0x08048427 <+3>:     sub    esp,0x28
   0x0804842a <+6>:     mov    eax,0x8048550
   0x0804842f <+11>:    mov    edx,DWORD PTR [ebp+0x10]
   0x08048432 <+14>:    mov    DWORD PTR [esp+0xc],edx
   0x08048436 <+18>:    mov    edx,DWORD PTR [ebp+0xc]
   0x08048439 <+21>:    mov    DWORD PTR [esp+0x8],edx
   0x0804843d <+25>:    mov    edx,DWORD PTR [ebp+0x8]
   0x08048440 <+28>:    mov    DWORD PTR [esp+0x4],edx
   0x08048444 <+32>:    mov    DWORD PTR [esp],eax
   0x08048447 <+35>:    call   0x8048300 <printf@plt>
   0x0804844c <+40>:    mov    eax,DWORD PTR [ebp+0xc]
   0x0804844f <+43>:    mov    edx,DWORD PTR [ebp+0x8]
   0x08048452 <+46>:    add    eax,edx
   0x08048454 <+48>:    add    eax,DWORD PTR [ebp+0x10]
   0x08048457 <+51>:    mov    DWORD PTR [ebp-0xc],eax
   0x0804845a <+54>:    mov    eax,DWORD PTR [ebp-0xc]
   0x0804845d <+57>:    leave
   0x0804845e <+58>:    ret
End of assembler dump.
(gdb) disassemble second
Dump of assembler code for function second:
   0x0804845f <+0>:     push   ebp
   0x08048460 <+1>:     mov    ebp,esp
   0x08048462 <+3>:     sub    esp,0x4
   0x08048465 <+6>:     mov    eax,DWORD PTR [ebp+0xc]
   0x08048468 <+9>:     mov    BYTE PTR [ebp-0x4],al
   0x0804846b <+12>:    add    DWORD PTR [ebp+0x8],0x1
   0x0804846f <+16>:    leave
   0x08048470 <+17>:    ret
End of assembler dump.

HYP3RV3LOCITY

Thursday, July 21, 2016

Stack Buffer Overflow Primer: Stack and Assembly in x86