SPO Lab 2

For SPO Lab 2, our task was to familiarize ourselves with compiled binaries and explore different options for code compilation.

Compiled C

The classic ‘Hello World!’ C program was used as a starting point:

#include <stdio.h>

int main() {
    printf("Hello World!\n");
}

It was compiled using the following flags:

-g               # enable debugging information
-O0              # do not optimize (that's a capital letter and then the digit zero)
-fno-builtin     # do not use builtin function optimizations

Note that adding the flags resulted in a slightly bigger binary (the hello_p1 binary was built with all 3 flags, hello_noflags with none):

-rwxrwxr-x. 1 obelavina obelavina  8168 Sep 24 17:04 hello_noflags
-rwxrwxr-x. 1 obelavina obelavina 10600 Sep 24 17:14 hello_p1

Now we can use the ‘objdump’ command, which disassembles the binary into assembly instructions. The -f option gives us some general information:

objdump -f hello_p1

This produced a summary of the overall file header, which includes the architecture and other details:

hello_p1: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0000000000400400

Much more useful information can be retrieved with the --source flag, which interleaves the original source code with the disassembly (this helps us navigate the barely readable disassembled binary). By searching for “Hello World” or <main> you can quickly jump to the part containing user-written code:

objdump --source hello_p1
00000000004004d7 <main>:
#include <stdio.h>
 
int main() {
4004d7: 55 push %rbp
4004d8: 48 89 e5 mov %rsp,%rbp
printf("Hello World!\n");
4004db: bf 90 05 40 00 mov $0x400590,%edi
4004e0: b8 00 00 00 00 mov $0x0,%eax
4004e5: e8 06 ff ff ff callq 4003f0 <printf@plt>
4004ea: b8 00 00 00 00 mov $0x0,%eax
}
4004ef: 5d pop %rbp
4004f0: c3 retq

The <main> function sits in the .text section, which is typically used for the actual code.

Now, let’s introduce some changes and see how they affect the output binary:

  1. -static

The -static option links the program statically (meaning it will not require dynamic libraries at runtime). As you can see, the binary grew quite a bit in size (from 10600 to 931704 bytes):

-rwxrwxr-x. 1 obelavina obelavina 8168 Sep 24 17:04 hello_noflags
-rwxrwxr-x. 1 obelavina obelavina 10600 Sep 24 17:14 hello_p1
-rwxrwxr-x. 1 obelavina obelavina 931704 Sep 24 18:04 hello_p1_ch1

Objdump now produces 166494 lines of assembly, as opposed to the original 171!

  2. Removing the -fno-builtin option

Some C functions can be replaced with so-called ‘built-in’ versions that the compiler knows about. With built-ins enabled, the compiler may make such a replacement in certain cases and thereby introduce some optimization.

By navigating to our main function, we can see that the call to printf (callq 4003f0 <printf@plt>) was replaced with a call to puts:

4004e0: e8 0b ff ff ff callq 4003f0 <puts@plt>

Unlike printf, puts cannot format variables for output (using %s, %d etc.), which makes puts much simpler (and smaller!). Note that puts also appends a newline by default, so removing ‘\n’ from ‘Hello World!\n’ results in a call to printf even with built-ins enabled.
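To make the rule concrete, here is a small sketch of calls that GCC may or may not turn into puts when built-ins are enabled. The function names are made up for illustration, and the comments describe the behavior I would expect from GCC rather than a guarantee; checking with objdump is the only way to be sure for a given compiler version.

```c
#include <stdio.h>

/* Constant string ending in '\n', no format arguments, result ignored:
   with built-ins enabled GCC may emit puts("Hello World!") instead. */
void hello_newline(void) {
    printf("Hello World!\n");
}

/* No trailing newline: puts would append one, so the call stays printf. */
void hello_no_newline(void) {
    printf("Hello World!");
}

/* A conversion such as %d needs real formatting, so this stays printf. */
void hello_number(int n) {
    printf("Hello World! %d\n", n);
}

/* The return value is used here. printf returns the number of characters
   written and puts does not, so GCC is expected to keep printf. */
int hello_counted(void) {
    return printf("Hello World!\n");
}
```

Compiling these with and without -fno-builtin and searching the objdump output for puts@plt should show which calls were replaced.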

  3. Removing -g from gcc

This removes debugging information from our binary and reduces the size of the output file. The debug option is what allows you to see the actual C code alongside the assembly; without it, objdump with the --source option shows only assembly code.

  4. Adding arguments to printf produces the following objdump output:

int main() {
4004d7: 55 push %rbp
4004d8: 48 89 e5 mov %rsp,%rbp
printf("Hello World!\n%d %d %d %d %d %d %d %d %d %d \n",1,2,3,4,5,6,7,8,9,10);
4004db: 48 83 ec 08 sub $0x8,%rsp
4004df: 6a 0a pushq $0xa
4004e1: 6a 09 pushq $0x9
4004e3: 6a 08 pushq $0x8
4004e5: 6a 07 pushq $0x7
4004e7: 6a 06 pushq $0x6
4004e9: 41 b9 05 00 00 00 mov $0x5,%r9d
4004ef: 41 b8 04 00 00 00 mov $0x4,%r8d
4004f5: b9 03 00 00 00 mov $0x3,%ecx
4004fa: ba 02 00 00 00 mov $0x2,%edx
4004ff: be 01 00 00 00 mov $0x1,%esi

400504: bf b0 05 40 00 mov $0x4005b0,%edi
400509: b8 00 00 00 00 mov $0x0,%eax
40050e: e8 dd fe ff ff callq 4003f0 <printf@plt>
400513: 48 83 c4 30 add $0x30,%rsp
400517: b8 00 00 00 00 mov $0x0,%eax
}

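The first five values, $0x1 through $0x5 (1, 2, 3, 4, 5), were stored in registers %esi, %edx, %ecx, %r8d and %r9d, while %edi holds the address of the format string.

Under the x86-64 Linux calling convention (System V AMD64 ABI), a function call can pass only six integer or pointer arguments in registers. The format string takes the first of them, so the remaining numbers (6 through 10) had to be pushed onto the stack, last argument first. The add $0x30,%rsp after the call releases those 48 bytes: five 8-byte pushes plus the 8 bytes reserved by sub $0x8,%rsp, which keeps the stack 16-byte aligned at the call.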

The stack is typically used to store temporary results or values. It also comes in handy when there is a need to keep track of chained (or ‘nested’) function calls.
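The same register-then-stack split can be tried on a plain, non-variadic function. The sketch below uses a hypothetical helper, sum10, and the register assignments in its comment assume the System V AMD64 convention used on x86-64 Linux; other platforms assign arguments differently.

```c
/* Hypothetical helper: ten integer arguments, as in the printf example.
   Assuming the System V AMD64 convention used on x86-64 Linux:
   a1..a6  -> %rdi, %rsi, %rdx, %rcx, %r8, %r9
   a7..a10 -> pushed on the stack by the caller, last argument first */
long sum10(long a1, long a2, long a3, long a4, long a5,
           long a6, long a7, long a8, long a9, long a10) {
    return a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10;
}
```

Calling sum10(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) and disassembling the caller should show six register moves and four pushes. In the printf call above the format string occupies %rdi, which is why only five of the integers fit in registers there.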

  5. Let’s move the printf() call to a separate function named output():

#include <stdio.h>

void output() {
  printf("Hello World!\n");
}

int main() {
  output();
}

This results in the following output:

00000000004004d7 <output>:
#include <stdio.h>

void output() {
4004d7: 55 push %rbp
4004d8: 48 89 e5 mov %rsp,%rbp
printf("Hello World!\n");
4004db: bf a0 05 40 00 mov $0x4005a0,%edi
4004e0: b8 00 00 00 00 mov $0x0,%eax
4004e5: e8 06 ff ff ff callq 4003f0 <printf@plt>
}
4004ea: 90 nop
4004eb: 5d pop %rbp
4004ec: c3 retq

00000000004004ed <main>:

int main() {
4004ed: 55 push %rbp
4004ee: 48 89 e5 mov %rsp,%rbp
4004f1: b8 00 00 00 00 mov $0x0,%eax
4004f6: e8 dc ff ff ff callq 4004d7 <output>
4004fb: b8 00 00 00 00 mov $0x0,%eax
400500: 5d pop %rbp
400501: c3 retq
400502: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
400509: 00 00 00
40050c: 0f 1f 40 00 nopl 0x0(%rax)

It is easy to see that the printf call is no longer in <main> (our <output> function is called instead).

  6. Removing -O0 and adding -O3 to the gcc options:

The -O1, -O2 and -O3 options each turn on a group of optimization flags (the full list is in the GCC documentation), while -O0 reduces compilation time and makes debugging produce the expected results, with little to no optimization. The code below shows the differences between the non-optimized and optimized binaries.

/* ===== ORIGINAL -O0 ============= */

int main() {
4004d7: 55 push %rbp
4004d8: 48 89 e5 mov %rsp,%rbp
printf("Hello World!\n");
4004db: bf 90 05 40 00 mov $0x400590,%edi
4004e0: b8 00 00 00 00 mov $0x0,%eax
4004e5: e8 06 ff ff ff callq 4003f0 <printf@plt>
4004ea: b8 00 00 00 00 mov $0x0,%eax
}
4004ef: 5d pop %rbp
4004f0: c3 retq
4004f1: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4004f8: 00 00 00
4004fb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)


/* ===== OPTIMIZED -O3 ============= */

int main() {
400400: 48 83 ec 08 sub $0x8,%rsp
printf("Hello World!\n");
400404: bf 90 05 40 00 mov $0x400590,%edi
400409: 31 c0 xor %eax,%eax
40040b: e8 e0 ff ff ff callq 4003f0 <printf@plt>
}
400410: 31 c0 xor %eax,%eax
400412: 48 83 c4 08 add $0x8,%rsp
400416: c3 retq
400417: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
40041e: 00 00

There are a couple of differences related to the registers %rbp (the base pointer, which points to the base of the current stack frame) and %rsp (the stack pointer, which points to the top of the current stack frame).

On x86 the stack grows downward, toward lower addresses, so within a frame %rbp is never lower than %rsp.
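The direction of stack growth can be observed from C by comparing the address of a local variable in a caller with one in a function it calls. This is only a sketch: the function names are made up, noinline is a GCC/Clang attribute used to keep the two functions in separate frames, and comparing addresses of unrelated objects is done through uintptr_t because the result is platform-specific.

```c
#include <stdint.h>

/* Callee: compares the address of one of its own locals with the address
   of a local in the caller. noinline (a GCC/Clang attribute) keeps the two
   functions in separate stack frames. */
__attribute__((noinline))
static int callee_local_is_lower(uintptr_t caller_addr) {
    volatile char callee_local = 0;
    return (uintptr_t)&callee_local < caller_addr;
}

/* Returns 1 when the nested call's frame sits at a lower address,
   i.e. when the stack grows downward. */
int stack_grows_down(void) {
    volatile char caller_local = 0;
    return callee_local_is_lower((uintptr_t)&caller_local);
}
```

On x86-64 Linux stack_grows_down() should return 1, because each nested call places its frame below the caller's.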

These two lines from the non-optimized binary:

4004d7: 55 push %rbp # push old base pointer into stack
4004d8: 48 89 e5 mov %rsp,%rbp # copy the value of stack pointer to base pointer

are replaced with this line:

400400: 48 83 ec 08 sub $0x8,%rsp

and this line:

4004ef: 5d pop %rbp # remove old base pointer off the stack

is replaced with an add operation in the optimized version:

400412: 48 83 c4 08 add $0x8,%rsp

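The optimized build does not keep a frame pointer at all: at -O1 and above GCC enables -fomit-frame-pointer, so there is no old %rbp to push, copy or pop. Push stores a register to memory and pop loads it back, and since our program doesn’t need any data from the stack in this particular case, main only has to move %rsp, which a sub/add pair does without touching memory. The 8 bytes themselves hold no data; they keep the stack 16-byte aligned at the call to printf, as the x86-64 calling convention requires.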

In the x86-64 calling convention an integer return value is passed back in %eax (the low half of %rax):

4004ea: b8 00 00 00 00 mov $0x0,%eax

This, however, is replaced with a weird-looking XOR operation in the optimized version:

400410: 31 c0 xor %eax,%eax

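XORing a value with itself yields 0, so the instruction leaves the same result in %eax. I suppose the reason the compiler prefers it is that MOV can be more expensive than XOR in certain cases. The most noticeable difference is size: the MOV encoding is 2.5 times larger than the XOR one (5 bytes, b8 00 00 00 00, versus 2 bytes, 31 c0). x86 processors also recognize xor of a register with itself as a zeroing idiom.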
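Both observations are easy to double-check in C. The sketch below copies the two encodings from the objdump listings into byte arrays (the array and function names are made up for illustration) and also shows that XORing a value with itself gives 0 for any input.

```c
#include <stddef.h>

/* Machine code bytes copied from the objdump listings above. */
static const unsigned char mov_zero_eax[] = { 0xb8, 0x00, 0x00, 0x00, 0x00 };
static const unsigned char xor_eax_eax[]  = { 0x31, 0xc0 };

/* Size of each encoding in bytes. */
size_t mov_size(void) { return sizeof mov_zero_eax; }
size_t xor_size(void) { return sizeof xor_eax_eax; }

/* XOR of any value with itself is 0, which is why xor %eax,%eax
   leaves the same result in %eax as mov $0x0,%eax. */
unsigned int xor_with_self(unsigned int x) {
    return x ^ x;
}
```

Dividing mov_size() by xor_size() as floating-point numbers gives the 2.5 ratio mentioned above.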

Written on September 24, 2017