Refresher. Assembly Language

Resources

Session slides

Session's tutorials and challenges archive

Session's solutions

Tutorials

This session will serve as a quick refresher of basic computer architecture and assembly language. For the sake of brevity, we are going to focus on x86, which is also the architecture most people are familiar with.

First, we'll go through some very general computer architecture topics. In order to get started with the tools and learn how to easily get from assembly to a running binary, we'll continue with dissecting a short hello-world program. The core of this session is a general reference of assembly language. At the end we will also dive a little bit into some operating system internals and check out some tricks that might be useful later.

Let's get our hands dirty!

Computer Architecture: A Blistering Approach

A microprocessor executes, one by one, the logical, arithmetic, control, and input/output (I/O) operations specified by the instructions of a computer program that was previously loaded in the system's memory. An instruction is just a sequence of bytes that specifies the operation, or opcode (e.g., addition, multiplication, memory read/write), and the operands (e.g., numbers, memory locations). The list of supported operations is specified by an Instruction Set Architecture (ISA). ISAs can be classified into types such as CISC, RISC, VLIW and others. Each processor implements this specification in its own way, called a microarchitecture. Because the ISA stays the same, the same program is compatible with processors produced by different vendors. For example, both the Intel 80386 and the AMD K7 Athlon implement the same x86 ISA. Moreover, newer ISAs tend to be backward-compatible with older ones (e.g., 32-bit x86 code is still supported by the newer 64-bit x86-64 ISA).

An x86 program operates with and on data stored in memory along with the program itself. Besides the memory, the processor also contains a set of registers that can hold a very limited number of values for fast access. Both the memory and the registers can be referenced in an instruction as operands.

An x86 instruction in machine code might look like this:

NASM syntax: add dword [0xdeadbeef], 42
        hex:  8    3     0    5     e    f    b    e    a    d    d    e     2    a
     binary: [1000 0011][0000 0101][1110 1111 1011 1110 1010 1101 1101 1110][0010 1010]
             |          |          |                                        \- immediate: 42
             |          |          \- memory address: 0xdeadbeef (note the endianness)
             |          \- opcode modifiers:
             |               2 bits = addressing mode
             |               3 bits = register/opcode modifier
             |               3 bits = r/m field
             \- opcode: add sign-extended 8-bit immediate to a 32-bit register or memory operand
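
The byte layout above can be checked with a few lines of C. This is only a sketch: the names decode_modrm, read_le32 and add_insn are made up for this example, and the code handles just the fields shown in the diagram, not the full x86 encoding.

```c
#include <stdint.h>

/* Fields of the "opcode modifiers" (ModR/M) byte from the diagram. */
struct modrm {
    uint8_t mod; /* 2 bits: addressing mode */
    uint8_t reg; /* 3 bits: register/opcode modifier */
    uint8_t rm;  /* 3 bits: r/m field */
};

/* Split one byte into its three bit fields. */
struct modrm decode_modrm(uint8_t byte) {
    struct modrm m;
    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm  = byte & 0x7;
    return m;
}

/* Rebuild a 32-bit value from 4 bytes stored least significant byte first. */
uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Machine code of: add dword [0xdeadbeef], 42 */
const uint8_t add_insn[7] = {0x83, 0x05, 0xef, 0xbe, 0xad, 0xde, 0x2a};
```

Calling decode_modrm(add_insn[1]) yields mod = 0, reg = 0 and rm = 5, and read_le32(&add_insn[2]) rebuilds 0xdeadbeef from the little-endian bytes ef be ad de.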

The complete hierarchy of memory in a modern computer is depicted in the following picture. The ISA only deals with the registers and the main memory (RAM). The processor-level caching is invisible to programs, while the lower levels (below RAM) are managed by the operating system and accessed via system calls.

That being said, even the RAM is not directly accessible from a normal program (i.e. one running in protected mode, see the Basics section). The operating system, with support from the processor, will provide the same virtual address space to all programs but map each program to different physical sections of the RAM. Using the same mechanism, memory contents can also be spilled to disk and accessed on demand (see swapping/paging).

This Operating Systems lecture (Romanian) covers deep details regarding virtual memory.

Hello (Assembly) World

We can get right down to business and see what happens when we compile a very simple program written in C.

#include <stdio.h>
 
int main() {
  puts("Hello world!");
  return 0;
}

You can compile this with gcc -m32 -O0 hello.c -o hello. Let's take a sneak peek at the assembly generated by the GCC compiler for this basic program: objdump -M intel -d hello. We can see it looks kind of complicated and we hope you'll be able to understand every piece of it by the end of this course but, for now, let's see what bits are actually needed and write our own minimal version directly in assembly. We are going to talk more in later sessions about topics such as disassembling, executable sections, linking, reverse engineering and static analysis.

For our minimal version we need an executable that will contain 2 kinds of information:

We also need to call puts() which is a library function. This function is already assembled in an object file and sits in the libc library. It can be used by any program running on the system in 2 ways:

We are going to use the NASM assembler to convert the following mnemonics into an actual object file containing machine code.

You can find documentation (syntax, command line options, etc.) on NASM here.
extern puts
section .data
  helloStr: db 'Hello, world!',0
section .text
  global main
main:
  push helloStr
  call puts

To assemble this run: nasm -f elf32 hello.asm. This will produce an object file that we can inspect with objdump.

$ objdump -M intel -d hello.o
 
hello.o:     file format elf32-i386
 
 
Disassembly of section .text:
 
00000000 <main>:
   0:	68 00 00 00 00       	push   0x0
   5:	e8 fc ff ff ff       	call   6 <main+0x6>

As we can see, there is no reference to the puts() function but it is present in the relocation records that will be used by the linker.

$ objdump -M intel -r hello.o
 
hello.o:     file format elf32-i386
 
RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE 
00000001 R_386_32          .data
00000006 R_386_PC32        puts

To dynamically link our object file with libc we can use ld.

$ ld -s -lc -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -e main hello.o -o hello_min

You can spend a few minutes and figure out what all those options do. The most important are: -lc and -e main.

The -dynamic-linker option should not be necessary but, at least on the system used to write this session, ld could not find the correct linker.

The disassembly of the final binary also contains some code that will find the puts() function at runtime. We will learn more about the .plt section in the following sessions.

$ objdump -M intel -d hello_min
 
hello_min:     file format elf32-i386
 
 
Disassembly of section .plt:
 
08048170 <puts@plt-0x10>:
 8048170:	ff 35 40 92 04 08    	push   DWORD PTR ds:0x8049240
 8048176:	ff 25 44 92 04 08    	jmp    DWORD PTR ds:0x8049244
 804817c:	00 00                	add    BYTE PTR [eax],al
	...
 
08048180 <puts@plt>:
 8048180:	ff 25 48 92 04 08    	jmp    DWORD PTR ds:0x8049248
 8048186:	68 00 00 00 00       	push   0x0
 804818b:	e9 e0 ff ff ff       	jmp    8048170 <puts@plt-0x10>
 
Disassembly of section .text:
 
08048190 <.text>:
 8048190:	68 4c 92 04 08       	push   0x804924c
 8048195:	e8 e6 ff ff ff       	call   8048180 <puts@plt>
This binary is missing the initialization and clean-up phases. Because nobody calls exit() at the end, it will throw a segmentation fault after running our code. This should be handled by __libc_start_main which is part of the Linux Standard Base Core Specification.

Basics

As new versions of the x86 processors appeared, new features were introduced and, in order to maintain backward compatibility, the processors had to provide different operation modes. For example, the original 8086 allowed access to only 1MB of memory, with no protection and no support for virtual memory. Newer versions (80286, 80386) introduced protected mode, which overcomes the limitations of the older real mode but has to be switched to explicitly. Later processors also introduced the virtual 8086 mode and the long mode. All x86 processors start in real mode and most operating systems (e.g. Linux) will switch to 80386 protected mode at boot time.

While in protected mode, an x86 processor has access to eight 32-bit general-purpose registers (depicted below), 6 segment registers (cs, ds, ss, es, fs, gs), 1 status register (eflags), an instruction pointer (eip), as well as other control, debug, and test registers. The segment registers were initially used for the segmentation mechanism; in modern operating systems, which use paging, they usually point to the same address. Other registers were also added by different extensions to the processor (e.g. MMX, SSE). The 32-bit registers have names that start with “e” but their 16-bit and 8-bit versions are still accessible via special names, as described in the picture.

While they can be used to store any value, the 8 general registers are commonly used as follows:

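For convenience while working with complex data structures (e.g., structs in an array), the x86 ISA offers multiple addressing modes. With the simplest one, direct addressing, you only need to specify an absolute address, while the other modes compute the absolute address based on some registers. All addressing modes supported in 32-bit protected mode are summarized by this formula:

effective address = base + index * scale + displacement

Here base and index are registers, scale is 1, 2, 4 or 8, and displacement is a constant. Any of the terms may be missing.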

In Intel syntax, the previous formula translates to:

mov eax, [0xcafebab3]         ; direct (displacement)
mov eax, [esi]                ; register indirect (base)
mov eax, [ebp-8]              ; based (base + displacement)
mov eax, [ebx*4 + 0xdeadbeef] ; indexed (index*scale + displacement)
mov eax, [edx + ebx + 12]     ; based-indexed w/o scale (base + index + displacement)
mov eax, [edx + ebx*4 + 42]   ; based-indexed w/ scale (base + index*scale + displacement)
An effective address is an operand that references a memory location. In NASM (Intel) syntax this consists of an expression enclosed in square brackets.
Sometimes the assembler cannot infer the size of the operands, for example when an immediate is stored through a memory operand. The programmer removes this ambiguity with a size specification, resulting in instructions similar to mov word [ebx], 0x42 (in NASM syntax). This syntax is usually assembler-specific. Some discussions regarding this can be found here, and on StackOverflow.
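
As a rough illustration of how the addressing modes listed above combine their parts, here is a small C sketch. The function name effective_address is made up for this example, and the register values used in the note below are arbitrary.

```c
#include <stdint.h>

/*
 * Compute base + index*scale + displacement the way the processor does
 * in 32-bit mode: with 32-bit arithmetic that wraps around modulo 2^32.
 * A term that is absent from the instruction is passed as 0
 * (or as 1 for the scale).
 */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, uint32_t disp) {
    return base + index * scale + disp;
}
```

For example, with edx = 0x1000 and ebx = 3, the operand [edx + ebx*4 + 42] becomes effective_address(0x1000, 3, 4, 42), which is 0x1036. A negative displacement such as the one in [ebp-8] is passed as (uint32_t)-8 and relies on the wrap-around.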

Data Transfer

Data transfer instructions move data from memory to a register, between registers, and from a register to memory. A single mov cannot transfer data directly from memory to memory. The most common such instructions are:

The lea instruction represents <src> with square brackets, but it only computes the address and DOES NOT read the contents at that address, as mov does. Example: lea ebx, [ebx*8+ebx].

The following two instructions are equivalent:
lea eax, [ebx]
mov eax, ebx
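
The difference between mov and lea can be modelled in C with a tiny simulated memory. This is a sketch only: memory, mov_load32 and lea32 are hypothetical names, and the 64-byte array stands in for real RAM.

```c
#include <stdint.h>

/* Hypothetical flat memory, just big enough for the demo. */
uint8_t memory[64];

/* mov dst, [addr]: reads the 32-bit little-endian value stored at addr. */
uint32_t mov_load32(uint32_t addr) {
    return (uint32_t)memory[addr] |
           ((uint32_t)memory[addr + 1] << 8) |
           ((uint32_t)memory[addr + 2] << 16) |
           ((uint32_t)memory[addr + 3] << 24);
}

/* lea dst, [base + index*scale]: only the address arithmetic is done,
 * memory is never touched. */
uint32_t lea32(uint32_t base, uint32_t index, uint32_t scale) {
    return base + index * scale;
}
```

With ebx = 5, lea ebx, [ebx*8+ebx] corresponds to lea32(5, 5, 8) and leaves 45 in ebx, which makes lea a cheap way to multiply by 9.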

xchg eax, eax is equivalent to nop, an instruction that does nothing.

Control Flow

As a program executes, the address of the next instruction is stored in the eip register. Changing the value of this register allows control of the execution flow. Instructions directly influencing eip are:

Interrupts also change the execution flow of a program. This Operating Systems lecture (Romanian) covers the subject.

Conditional jumps act in the same way as jmp, but they require different combinations of flags to be set in the eflags register. Flags are set by arithmetic (e.g., add, sub), logical (e.g., xor, or), or comparison (cmp, test) instructions. The most common flags are:

Conditional jumps reference.
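
To make the link between cmp, the flags and the conditional jumps more concrete, here is a C sketch that models four eflags bits (ZF, SF, CF, OF) for a 32-bit cmp. The struct and function names are invented for this example, and only a handful of jumps are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags {
    bool zf; /* zero flag: result is 0 */
    bool sf; /* sign flag: top bit of the result */
    bool cf; /* carry flag: unsigned borrow */
    bool of; /* overflow flag: signed overflow */
};

/* Flags produced by "cmp a, b", which computes a - b and drops the result. */
struct flags cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    struct flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1;
    f.cf = a < b;
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1;
    return f;
}

/* Conditions tested by a few common conditional jumps. */
bool taken_je(struct flags f)  { return f.zf; }
bool taken_jb(struct flags f)  { return f.cf; }                  /* unsigned <  */
bool taken_jl(struct flags f)  { return f.sf != f.of; }          /* signed   <  */
bool taken_jle(struct flags f) { return f.zf || f.sf != f.of; }  /* signed   <= */
bool taken_jg(struct flags f)  { return !f.zf && f.sf == f.of; } /* signed   >  */
```

Comparing 0xffffffff with 1 shows why signed and unsigned jumps differ: jl is taken (as signed numbers, -1 < 1) while jb is not (as unsigned numbers, 0xffffffff > 1).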

Arithmetic/Logic

Arithmetic instructions (NASM/Intel syntax):

Shifts and rotations: shr, shl (logical shift right/left), sar, sal (arithmetic shift right/left), shld, shrd (double-shift), ror, rol (rotate), rcr, rcl (rotate with carry).

Logical instructions: and, or, xor, not.
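
The difference between logical shifts, arithmetic shifts and rotations is easy to miss, so here is a C sketch of their 32-bit behaviour. The function names are made up. The count is masked to 5 bits, as the processor does for 32-bit operands, and the flag effects (and therefore rcr/rcl) are left out.

```c
#include <stdint.h>

/* shr: logical shift right, zeros enter from the left. */
uint32_t shr32(uint32_t x, unsigned n) { return x >> (n & 31); }

/* shl (same encoding as sal): shift left, zeros enter from the right. */
uint32_t shl32(uint32_t x, unsigned n) { return x << (n & 31); }

/* sar: arithmetic shift right, copies of the sign bit enter from the left. */
uint32_t sar32(uint32_t x, unsigned n) {
    uint32_t shifted;
    n &= 31;
    shifted = x >> n;
    if ((x & 0x80000000u) && n != 0)
        shifted |= ~(0xffffffffu >> n);
    return shifted;
}

/* rol: bits leaving on the left come back on the right. */
uint32_t rol32(uint32_t x, unsigned n) {
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* ror: bits leaving on the right come back on the left. */
uint32_t ror32(uint32_t x, unsigned n) {
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}
```

For the value 0x80000000 and a count of 4, shr32 gives 0x08000000 while sar32 gives 0xf8000000, because sar preserves the sign.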

Function Calls

Function (subroutine) calls are nothing more than a convention on how parameters are passed, how the return value is passed back to the caller, and how the registers can be modified by the callee. The addresses to which a function needs to return after execution are stored in a stack data structure. Other values, such as the frame base pointer and the function's local variables, are also placed on the stack. Each function will thus have a corresponding stack frame that it allocates immediately after it is called (function prologue), and deallocates just before returning (function epilogue). The size of this allocation (changing the esp register) is established at compile time, and it is based on the size of the function's local variables.

Stack frames are usually aligned to 16-byte (or 8-byte) boundaries. This is required by standards (the platform ABI) in order to accommodate some vectorized SSE instructions, which will fault if unaligned addresses are used.

While esp points to the top of the stack, ebp points to the beginning of the current frame, and its previous values are also saved on the stack. This is used to conveniently navigate the call stack in debuggers, and to address local variables from a fixed address (esp might change during the function's execution, through push and pop instructions for example).
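
The interplay between esp and ebp can be simulated in C. The sketch below models a small downward-growing stack: stack_mem, push32, pop32, prologue and epilogue are hypothetical names, and the globals esp and ebp hold byte offsets into the array rather than real addresses.

```c
#include <stdint.h>

#define STACK_WORDS 64

uint32_t stack_mem[STACK_WORDS]; /* simulated stack, one slot per dword */
uint32_t esp = STACK_WORDS * 4;  /* byte offset of the top; grows downward */
uint32_t ebp = 0;                /* byte offset of the current frame base */

/* push: make room first, then store. */
void push32(uint32_t v) {
    esp -= 4;
    stack_mem[esp / 4] = v;
}

/* pop: load first, then release the slot. */
uint32_t pop32(void) {
    uint32_t v = stack_mem[esp / 4];
    esp += 4;
    return v;
}

/* push ebp ; mov ebp, esp ; sub esp, locals_bytes */
void prologue(uint32_t locals_bytes) {
    push32(ebp);
    ebp = esp;
    esp -= locals_bytes;
}

/* leave, i.e. mov esp, ebp ; pop ebp */
void epilogue(void) {
    esp = ebp;
    ebp = pop32();
}
```

After a call pushes the return address and prologue(8) runs, the saved ebp sits at [ebp], the return address at [ebp+4] and the two local dwords at [ebp-4] and [ebp-8]. epilogue() restores both registers so that a following ret finds the return address on top of the stack.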

There are multiple calling conventions, mainly classified by who (caller or callee) is responsible for removing the parameters from the stack after the function returns. The most common are:

Following the same calling convention is important in situations such as calling a function from a dynamic library that was precompiled, and is already present on the system.

The following subsections show the previous conventions in practice, for C code. Take a few minutes to dissect and understand the snippets. You can also try to reproduce the examples on your machine.

The default convention used by GCC is cdecl. Using the stdcall or fastcall function attributes will force GCC to use the specified convention.

cdecl

struct x {
    int x1, x2;
    char x3;
};
 
int func(struct x a, float b, void* c, int d) {
    return 42;
}
 
int main() {
    struct x a;
    a.x3 = '$';
    func(a, 3.14, (void*)0xdeadbeef, 1);
    return 0;
}
$ gcc -O0 -m32 -no-pie cdecl.c -o cdecl
$ objdump -M intel -d ./cdecl
080483ef <main>:
...
 8048403:       6a 01                   push   0x1
 8048405:       68 ef be ad de          push   0xdeadbeef
 804840a:       d9 80 c0 e4 ff ff       fld    DWORD PTR [eax-0x1b40]
 8048410:       8d 64 24 fc             lea    esp,[esp-0x4]
 8048414:       d9 1c 24                fstp   DWORD PTR [esp]
 8048417:       ff 75 fc                push   DWORD PTR [ebp-0x4]
 804841a:       ff 75 f8                push   DWORD PTR [ebp-0x8]
 804841d:       ff 75 f4                push   DWORD PTR [ebp-0xc]
 8048420:       e8 b6 ff ff ff          call   80483db <func>
 8048425:       83 c4 18                add    esp,0x18
...
080483db <func>:
 80483db:       55                      push   ebp
 80483dc:       89 e5                   mov    ebp,esp
 80483de:       e8 4c 00 00 00          call   804842f <__x86.get_pc_thunk.ax>
 80483e3:       05 1d 1c 00 00          add    eax,0x1c1d
 80483e8:       b8 2a 00 00 00          mov    eax,0x2a
 80483ed:       5d                      pop    ebp
 80483ee:       c3                      ret
As you can see, the arguments are placed on the stack by the caller (main), mostly using push instructions, and are also removed by the caller, using a single add esp,0x18 instruction.
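
The 0x18 in the listing is the total size of the arguments. A small C sketch can reproduce that number, assuming 4-byte int and float and using uint32_t as a stand-in for a 32-bit pointer, so the result does not depend on the machine that compiles it. The helper names are made up.

```c
#include <stddef.h>
#include <stdint.h>

struct x {
    int x1, x2;
    char x3;
};

/* On 32-bit x86 each argument takes a whole number of 4-byte stack slots. */
size_t stack_slot_bytes(size_t size) {
    return (size + 3) & ~(size_t)3;
}

/* Bytes placed on the stack for func(struct x a, float b, void *c, int d).
 * uint32_t replaces void* because pointers are 4 bytes wide under -m32. */
size_t func_arg_bytes(void) {
    return stack_slot_bytes(sizeof(struct x)) +
           stack_slot_bytes(sizeof(float)) +
           stack_slot_bytes(sizeof(uint32_t)) +
           stack_slot_bytes(sizeof(int));
}
```

With the usual type sizes, func_arg_bytes() returns 12 + 4 + 4 + 4 = 24, that is 0x18: three dwords for the struct (including padding after x3), one for the float, one for the pointer and one for the int.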

stdcall

struct x {
    int x1, x2;
    char x3;
};
 
__attribute__((stdcall))
int func(struct x a, float b, void* c, int d) {
    return 42;
}
 
int main() {
    struct x a;
    a.x3 = '$';
    func(a, 3.14, (void*)0xdeadbeef, 1);
    return 0;
}
$ gcc -O0 -m32 -no-pie stdcall.c -o stdcall
$ objdump -M intel -d ./stdcall
080483f1 <main>:
...
 8048405:       6a 01                   push   0x1
 8048407:       68 ef be ad de          push   0xdeadbeef
 804840c:       d9 80 c0 e4 ff ff       fld    DWORD PTR [eax-0x1b40]
 8048412:       8d 64 24 fc             lea    esp,[esp-0x4]
 8048416:       d9 1c 24                fstp   DWORD PTR [esp]
 8048419:       ff 75 fc                push   DWORD PTR [ebp-0x4]
 804841c:       ff 75 f8                push   DWORD PTR [ebp-0x8]
 804841f:       ff 75 f4                push   DWORD PTR [ebp-0xc]
 8048422:       e8 b4 ff ff ff          call   80483db <func>
...
080483db <func>:
 80483db:       55                      push   ebp
 80483dc:       89 e5                   mov    ebp,esp
 80483de:       e8 4b 00 00 00          call   804842e <__x86.get_pc_thunk.ax>
 80483e3:       05 1d 1c 00 00          add    eax,0x1c1d
 80483e8:       b8 2a 00 00 00          mov    eax,0x2a
 80483ed:       5d                      pop    ebp
 80483ee:       c2 18 00                ret    0x18
As you can see in the output, the arguments are placed on the stack by the caller (main) and are removed by the callee (func), using the ret 0x18 instruction.

fastcall

__attribute__((fastcall))
int func(int a, int b, int c, int d) {
    return 42;
}
 
int main() {
    func(1, 2, 3, 4);
    return 0;
}
$ gcc -O0 -m32 -no-pie fastcall.c -o fastcall
$ objdump -M intel -d ./fastcall
080483fa <main>:
 80483fa:       55                      push   ebp
 80483fb:       89 e5                   mov    ebp,esp
 80483fd:       e8 1f 00 00 00          call   8048421 <__x86.get_pc_thunk.ax>
 8048402:       05 fe 1b 00 00          add    eax,0x1bfe
 8048407:       6a 04                   push   0x4
 8048409:       6a 03                   push   0x3
 804840b:       ba 02 00 00 00          mov    edx,0x2
 8048410:       b9 01 00 00 00          mov    ecx,0x1
 8048415:       e8 c1 ff ff ff          call   80483db <func>
...
080483db <func>:
 80483db:       55                      push   ebp
 80483dc:       89 e5                   mov    ebp,esp
 80483de:       83 ec 08                sub    esp,0x8
 80483e1:       e8 3b 00 00 00          call   8048421 <__x86.get_pc_thunk.ax>
 80483e6:       05 1a 1c 00 00          add    eax,0x1c1a
 80483eb:       89 4d fc                mov    DWORD PTR [ebp-0x4],ecx
 80483ee:       89 55 f8                mov    DWORD PTR [ebp-0x8],edx
 80483f1:       b8 2a 00 00 00          mov    eax,0x2a
 80483f6:       c9                      leave  
 80483f7:       c2 08 00                ret    0x8
The first two arguments are passed in registers (ecx and edx) and the rest are pushed on the stack by the caller (main). The arguments on the stack are removed by the callee, using the ret 0x8 instruction. Other compilers might use more registers for arguments.
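
The placement rule seen in this listing can be written down as a short C sketch. It assumes that every argument is an int-sized value, as in the example, and the names arg_loc, fastcall_location and fastcall_stack_bytes are invented for illustration.

```c
#include <stddef.h>

enum arg_loc { IN_ECX, IN_EDX, ON_STACK };

/* Where the i-th (0-based) int-sized argument of a fastcall function goes. */
enum arg_loc fastcall_location(size_t i) {
    if (i == 0)
        return IN_ECX;
    if (i == 1)
        return IN_EDX;
    return ON_STACK;
}

/* Bytes of stack arguments the callee removes with "ret N". */
size_t fastcall_stack_bytes(size_t n_args) {
    return n_args > 2 ? (n_args - 2) * 4 : 0;
}
```

For func(1, 2, 3, 4) this puts 1 in ecx, 2 in edx and the remaining two arguments on the stack, so fastcall_stack_bytes(4) is 8, matching the ret 0x8 at the end of func.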

System calls

Syscalls are the interface that allows user applications to request services from the OS kernel, such as reading the disk, starting new processes, or managing existing ones. Just like function calls, syscalls are just a set of conventions on how to pass arguments to a kernel function. On 32-bit x86 Linux, the mechanism is invoked by triggering a software interrupt (int 0x80). This transfers control to the kernel's syscall dispatcher which, in turn, calls the syscall selected by the value of the eax register. The conventions for invoking a syscall on Linux are:

Syscalls are not usually invoked directly, but through wrappers in libc. You can read about how this is implemented in this LWN article.
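
One way to get a feel for the convention without writing assembly is libc's generic syscall() function, which takes the syscall number followed by its arguments. The sketch below assumes Linux with glibc. The wrapper names raw_getpid and raw_write are made up, and the instruction that actually enters the kernel depends on the architecture and on the libc.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* getpid takes no arguments: only the syscall number is passed. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}

/* write takes three arguments, passed after the syscall number.
 * On failure syscall() returns -1 and sets errno. */
long raw_write(int fd, const void *buf, size_t len) {
    return syscall(SYS_write, fd, buf, len);
}
```

raw_getpid() returns the same value as the getpid() wrapper, and raw_write(1, "hi\n", 3) prints to standard output just like write() would. In hand-written 32-bit assembly the same requests are made by loading the syscall number into eax and executing int 0x80.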

Compiler Patterns

Finally, let's take a look at some common C language constructs, and how they are compiled into machine code by GCC. You are encouraged to try other constructs too.

Compiler Explorer

You can try out the Compiler explorer at http://gcc.godbolt.org/ to see how each line is translated into instructions. Check this example out: http://goo.gl/gVeH5p

function prologue

 80483ed:	55                   	push   ebp
 80483ee:	89 e5                	mov    ebp,esp
 80483f0:	83 ec 08             	sub    esp,0x8

function epilogue

 80483fe:	c9                   	leave  
 80483ff:	c2 08 00             	ret    0x8

for loop

int main() {
    int x = 1000;
    for (int i = 1; i < 10; i++) {
        x++;
    }
    return 0;
}
$ gcc -O0 -m32 for.c -o for
$ objdump -M intel -d ./for
080483ed <main>:
...
 80483f3:	c7 45 f8 e8 03 00 00 	mov    DWORD PTR [ebp-0x8],0x3e8
 80483fa:	c7 45 fc 01 00 00 00 	mov    DWORD PTR [ebp-0x4],0x1
 8048401:	eb 08                	jmp    804840b <main+0x1e>
 8048403:	83 45 f8 01          	add    DWORD PTR [ebp-0x8],0x1
 8048407:	83 45 fc 01          	add    DWORD PTR [ebp-0x4],0x1
 804840b:	83 7d fc 09          	cmp    DWORD PTR [ebp-0x4],0x9
 804840f:	7e f2                	jle    8048403 <main+0x16>
...
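
The shape of the generated code is easier to see when the loop is rewritten in C with explicit jumps. This is a hand-made sketch of the listing above, not compiler output. The function names are made up, and both functions return x (instead of 0) so the two versions can be compared.

```c
/* The loop as written in the source. */
int for_loop_source(void) {
    int x = 1000;
    for (int i = 1; i < 10; i++) {
        x++;
    }
    return x;
}

/* The same loop with the control flow of the disassembly. */
int for_loop_lowered(void) {
    int x = 1000; /* mov DWORD PTR [ebp-0x8],0x3e8 */
    int i = 1;    /* mov DWORD PTR [ebp-0x4],0x1 */
    goto check;   /* jmp to the comparison */
body:
    x += 1;       /* add DWORD PTR [ebp-0x8],0x1 */
    i += 1;       /* add DWORD PTR [ebp-0x4],0x1 */
check:
    if (i <= 9)   /* cmp DWORD PTR [ebp-0x4],0x9 ; jle */
        goto body;
    return x;
}
```

Note how the condition i < 10 became i <= 9 and moved to the bottom of the loop, with an initial jump over the body so that the test still runs before the first iteration. Both functions return 1009.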

while loop

int main() {
    int x = 1000, i = 42;
    while (--i > 0) {
        x--;
    }
    return 0;
}
$ gcc -O0 -m32 while.c -o while
$ objdump -M intel -d ./while
080483ed <main>:
...
 80483f3:	c7 45 f8 e8 03 00 00 	mov    DWORD PTR [ebp-0x8],0x3e8
 80483fa:	c7 45 fc 2a 00 00 00 	mov    DWORD PTR [ebp-0x4],0x2a
 8048401:	eb 04                	jmp    8048407 <main+0x1a>
 8048403:	83 6d f8 01          	sub    DWORD PTR [ebp-0x8],0x1
 8048407:	83 6d fc 01          	sub    DWORD PTR [ebp-0x4],0x1
 804840b:	83 7d fc 00          	cmp    DWORD PTR [ebp-0x4],0x0
 804840f:	7f f2                	jg     8048403 <main+0x16>
...

nested fors with break and continue

int main() {
    int x = 1000, i, j;
    for (i = 1; i < 10; i++) {
        for (j = 1; j < 4; j++) {
            if (x == 42)
                break;
        }
        if (i == 3)
            continue;
    }
    return 0;
}
$ gcc -O0 -m32 nested.c -o nested
$ objdump -M intel -d ./nested
080483ed <main>:
...
 80483f3:	c7 45 fc e8 03 00 00 	mov    DWORD PTR [ebp-0x4],0x3e8
 80483fa:	c7 45 f4 01 00 00 00 	mov    DWORD PTR [ebp-0xc],0x1
 8048401:	eb 26                	jmp    8048429 <main+0x3c>
 8048403:	c7 45 f8 01 00 00 00 	mov    DWORD PTR [ebp-0x8],0x1
 804840a:	eb 0c                	jmp    8048418 <main+0x2b>
 804840c:	83 7d fc 2a          	cmp    DWORD PTR [ebp-0x4],0x2a
 8048410:	75 02                	jne    8048414 <main+0x27>
 8048412:	eb 0a                	jmp    804841e <main+0x31>
 8048414:	83 45 f8 01          	add    DWORD PTR [ebp-0x8],0x1
 8048418:	83 7d f8 03          	cmp    DWORD PTR [ebp-0x8],0x3
 804841c:	7e ee                	jle    804840c <main+0x1f>
 804841e:	83 7d f4 03          	cmp    DWORD PTR [ebp-0xc],0x3
 8048422:	75 01                	jne    8048425 <main+0x38>
 8048424:	90                   	nop
 8048425:	83 45 f4 01          	add    DWORD PTR [ebp-0xc],0x1
 8048429:	83 7d f4 09          	cmp    DWORD PTR [ebp-0xc],0x9
 804842d:	7e d4                	jle    8048403 <main+0x16>
...
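
The same exercise with explicit jumps shows what break and continue turn into. This is again a hand-written sketch that mirrors the listing above. The function names and the inner_checks counter are additions made only so the two versions can be compared, and x is taken as a parameter instead of being fixed at 1000.

```c
/* The loops as written in the source, plus a counter. */
int nested_source(int x) {
    int i, j, inner_checks = 0;
    for (i = 1; i < 10; i++) {
        for (j = 1; j < 4; j++) {
            inner_checks++;
            if (x == 42)
                break;
        }
        if (i == 3)
            continue;
    }
    return inner_checks;
}

/* The same loops with the control flow of the disassembly. */
int nested_lowered(int x) {
    int i, j, inner_checks = 0;
    i = 1;
    goto outer_check;
outer_body:
    j = 1;
    goto inner_check;
inner_body:
    inner_checks++;
    if (x != 42)          /* cmp ...,0x2a ; jne */
        goto inner_next;
    goto inner_done;      /* break: jmp past the inner loop */
inner_next:
    j += 1;
inner_check:
    if (j <= 3)           /* cmp ...,0x3 ; jle */
        goto inner_body;
inner_done:
    if (i != 3)           /* cmp ...,0x3 ; jne */
        goto outer_next;
    ;                     /* continue: a nop that falls into the increment */
outer_next:
    i += 1;
outer_check:
    if (i <= 9)           /* cmp ...,0x9 ; jle */
        goto outer_body;
    return inner_checks;
}
```

break becomes a jump to the first instruction after the inner loop, while continue, being the last statement of the outer body, degenerates into a nop. With x = 1000 both versions run the inner body 27 times, and with x = 42 they run it 9 times.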

Challenges

01. Execve

Simple printing

Use assembly to write a program that receives N command line parameters. If the 1st parameter starts with . (dot) (such as ./ping 8.8.8.8) the program prints the message FAILED. If the first parameter doesn't start with . (dot) (such as /bin/ping 8.8.8.8) the program prints the message WORKS.

You can find the skeleton for this task in 01-challenge-execve/src.
GCC will take care of the boilerplate that actually places the command line parameters on the stack before calling your main().
$ ./execve ./ping 8.8.8.8 => prints FAILED message
$ ./execve /bin/ping 8.8.8.8 => prints WORKS message

Simple syscall

Update the above program so that it receives N command line parameters and dispatches them to the execve syscall. If the 1st parameter starts with . (dot) (such as ./ping 8.8.8.8) the program should NOT call execve and instead print an error message.

You can use libc's printf() or puts() for the error message. You can assume the command line parameters are already on the stack, and you can generate the boilerplate code that takes care of this by linking with gcc as opposed to ld.

The equivalent C call would be:
execve(argv[1], argv+1, NULL);

You have to translate that into assembly.

The syscall number for execve is 11. Check the man page for the other arguments.

02. Looping math

Use assembly to write a program that iterates through a statically allocated string (use the .data section), and calls a function that replaces each letter based on the following formula: NEW_LETTER = 33 + ((OLD_LETTER * 42 / 3 + 13) % 94). Print the new string at the end.

You can find the skeleton for this task in 02-challenge-looping-math/src.
Follow the multiplication and division operations described here.
If the string you use is call denied!, the result is tX66v$2Rj2$&.
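
If you want to check the output of your assembly against a reference, the formula can be sketched in C as follows. The function names are made up, and the sketch assumes integer arithmetic evaluated left to right with truncating division, which is consistent with the expected result given above.

```c
#include <stddef.h>

/* NEW_LETTER = 33 + ((OLD_LETTER * 42 / 3 + 13) % 94) */
char replace_letter(char old) {
    return (char)(33 + (((unsigned char)old * 42 / 3 + 13) % 94));
}

/* Apply the formula in place to every character of a NUL-terminated string. */
void replace_all(char *s) {
    for (size_t i = 0; s[i] != '\0'; i++)
        s[i] = replace_letter(s[i]);
}
```

For instance, replace_letter('c') is 't' and replace_letter('!') is '&', the first and last characters of the expected output.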

03. Call secret function

The binary file 03-challenge-call-secret/src/call-secret needs to call a specific function. However, because of a nasty “voice”, the specific function doesn't get called. Please fix it and find out the flag.

You may overwrite “unwanted” content with the NOP instruction. You need to find out the NOP instruction for x86.
To edit a binary, you can use vim + xxd or Bless.

04. No exit

The binary file 04-challenge-no-exit/src/no-exit needs to call a specific function. However, because of a nasty exit, the specific function doesn't get called. Please fix it and find out the flag.

You need to call the secret() function instead of the exit() function. Find out the offset and issue the appropriate call instruction.

The secret() function will use the argument that has been “appropriately” provided to the exit() call.

05. Funny convention

The binary 05-challenge-funny-convention/src/funny is already dynamically linked with a missing library (libfunny.so), that you'll have to recreate in assembly. The library should contain a wrapper for the write syscall called leet_write(). The original library was using a funny calling convention, slightly different from the standard one. Figure out the convention, write the wrapper in NASM, and compile the library. Test by running the provided binary.

The library is position independent, and exposes 2 symbols: the function, and some global variable. You can find the skeleton for this task in the directory 05-challenge-funny-convention/src.

You should be able to run the provided binary as long as the correct library is in ./.

The library exports count_param as a global symbol, thus it will reside inside the caller's address space as opposed to the .data section of the library. Because of this, the library cannot access the data by using count_param and needs to use count_param wrt ..sym instead. A more detailed explanation can be found here.

Extra: 06. Obfuscation

Write a program that does a completely different thing than what objdump will show by jumping into the middle of an instruction. After the jump, the processor will “see” another stream of valid instructions.

You can find the skeleton for this task in the directory 06-challenge-obfuscation/src.