This session will serve as a quick refresher of basic computer architecture and assembly language. For the sake of brevity, we are going to focus on x86, which is also the architecture most people are familiar with.
First, we'll go through some very general computer architecture topics. In order to get started with the tools and learn how to easily get from assembly to a running binary, we'll continue with dissecting a short hello-world program. The core of this session is a general reference of assembly language. At the end we will also dive a little bit into some operating system internals and check out some tricks that might be useful later.
Let's get our hands dirty!
A microprocessor executes, one by one, logical, arithmetic, control, and input/output (I/O) operations specified by the instructions of a computer program that was previously loaded in the system's memory. An instruction is just a sequence of bytes that specifies the operation or opcode (e.g., addition, multiplication, memory read/write) and the operands (e.g., numbers, memory locations). The list of supported operations is specified by an Instruction Set Architecture (ISA). ISAs can be classified into types such as CISC, RISC, VLIW and others. Particular processors implement this specification in different ways. Such an implementation is called a microarchitecture, and because the ISA stays the same, the same program is compatible with processors produced by different vendors. For example, both the Intel 80386 and the AMD K7 Athlon implement the same x86 ISA. Moreover, newer ISAs tend to be backward-compatible with older ones (e.g., x86 is still supported on newer 64-bit ISAs).
An x86 program operates with and on data stored in memory along with the program itself. Besides the memory, the processor also contains a set of registers that can hold a very limited number of values for fast access. Both the memory and the registers can be referenced in an instruction as operands.
An x86 instruction in machine code might look like this:
NASM syntax: add dword [0xdeadbeef], 42
hex:         83 05 ef be ad de 2a
binary:      [1000 0011][0000 0101][1110 1111 1011 1110 1010 1101 1101 1110][0010 1010]
              |          |          |                                        \- immediate: 42
              |          |          \- memory address: 0xdeadbeef (note the endianness)
              |          \- opcode modifiers:
              |               2 bits = addressing mode
              |               3 bits = register/opcode modifier
              |               3 bits = r/m field
              \- opcode: add sign-extended 8-bit immediate to register, or 32-bit memory address
The complete hierarchy of memory in a modern computer is depicted in the following picture. The ISA is only interested in accessing the registers, and the RAM memory. The processor-level caching is invisible while the lower levels (below RAM) are managed by the operating system and accessed via system calls.
This being said, even the RAM memory is not directly accessible from a normal (i.e. in protected mode - see Basics section) program. The operating system, with support from the processor, will provide the same virtual address space to all programs but map each program to different physical sections of the RAM. Using the same mechanism, memory contents can also be spilled to disk and accessed on-demand (see swapping/paging).
We can get right down to business and see what happens when we compile a very simple program written in C.
#include <stdio.h>

int main()
{
    puts("Hello world!");
    return 0;
}

You can compile this with gcc -m32 -O0 hello.c -o hello. Let's take a sneak peek at the assembly generated by the GCC compiler for this basic program: objdump -M intel -d hello. It looks kind of complicated, and we hope you'll be able to understand every piece of it by the end of this course. For now, let's see which bits are actually needed and write our own minimal version directly in assembly. We are going to talk more in later sessions about topics such as disassembling, executable sections, linking, reverse engineering and static analysis.
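For our minimal version we need an executable that will contain 2 kinds of information: the code (the instructions to execute) and the data (the "Hello world!" string).
We also need to call puts(), which is a library function. This function is already assembled in an object file and sits in the libc library. It can be used by any program running on the system in 2 ways: by linking it statically (the code is copied into the executable) or by linking it dynamically (the code is located at run time).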
We are going to use the NASM assembler to convert the following mnemonics into an actual object file containing machine code.
extern puts

section .data
    helloStr: db 'Hello, world!',0

section .text
    global main

main:
    push helloStr
    call puts

To assemble this run: nasm -f elf32 hello.asm. This will produce an object file that we can inspect with objdump.

$ objdump -M intel -d hello.o

hello.o:     file format elf32-i386

Disassembly of section .text:

00000000 <main>:
   0:  68 00 00 00 00        push   0x0
   5:  e8 fc ff ff ff        call   6 <main+0x6>

As we can see, there is no reference to the puts() function, but it is present in the relocation records that will be used by the linker.

$ objdump -M intel -r hello.o

hello.o:     file format elf32-i386

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
00000001 R_386_32          .data
00000006 R_386_PC32        puts
To dynamically link our object file with libc we can use ld.

$ ld -s -lc -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -e main hello.o -o hello_min

You can spend a few minutes and figure out what all those options do. The most important are -lc and -e main.
The -dynamic-linker option should not be necessary but, at least on the system used to write this session, ld could not find the correct linker.
The disassembly of the final binary also contains some code that will find the puts() function at runtime. We will learn more about the .plt section in the following sessions.
$ objdump -M intel -d hello_min

hello_min:     file format elf32-i386

Disassembly of section .plt:

08048170 <puts@plt-0x10>:
 8048170:  ff 35 40 92 04 08     push   DWORD PTR ds:0x8049240
 8048176:  ff 25 44 92 04 08     jmp    DWORD PTR ds:0x8049244
 804817c:  00 00                 add    BYTE PTR [eax],al
 ...

08048180 <puts@plt>:
 8048180:  ff 25 48 92 04 08     jmp    DWORD PTR ds:0x8049248
 8048186:  68 00 00 00 00        push   0x0
 804818b:  e9 e0 ff ff ff        jmp    8048170 <puts@plt-0x10>

Disassembly of section .text:

08048190 <.text>:
 8048190:  68 4c 92 04 08        push   0x804924c
 8048195:  e8 e6 ff ff ff        call   8048180 <puts@plt>
Because our minimal program does not call exit() at the end, it will throw a segmentation fault after running our code. Exiting is normally handled by __libc_start_main, which is part of the Linux Standard Base Core Specification.
As new versions of the x86 processors appeared, new features were introduced and, in order to maintain backward compatibility, the processors had to provide different operation modes. For example, the original 8086 allowed access to 1MB of memory, with no protection and no support for virtual memory. Newer versions (80286, 80386) therefore introduced protected mode, which overcame the limitations of the older real mode but has to be switched on explicitly. Other modes, such as the virtual 8086 mode and the long mode, were introduced as well. All x86 processors start in real mode and most operating systems (e.g. Linux) will switch to 80386 protected mode at boot time.
While in protected mode, an x86 processor has access to 8 32-bit general registers (depicted below), 6 segment registers (cs, ds, ss, es, fs, gs), 1 status register (eflags), an instruction pointer (eip), as well as other control, debug, and test registers. The segment registers were initially used for the segmentation mechanism; in modern operating systems, which use paging, they usually point to the same address. Other registers were also added by different extensions to the processor (e.g. SSE, MMX). The 32-bit registers have names that start with “e”, but their 16-bit and 8-bit versions are still accessible via special names, as described in the picture.
While they can be used to store any value, the 8 general registers are commonly used as follows:
eax: accumulator, used in arithmetic operations
ebx: base pointer in memory operations (e.g., arrays)
ecx: loop counters
edx: also used in arithmetic operations
esi: source addresses in memory operations
edi: destination addresses in memory operations
ebp: frame base pointer
esp: stack pointer

For convenience while working with complex data structures (e.g., structs in an array), the x86 ISA offers multiple addressing modes. In the simplest one, direct addressing, you only need to specify an absolute address, while the other modes compute the absolute address based on some registers. All addressing modes supported in 32-bit protected mode are summarized by this formula:
address = base + index * scale + displacement (where scale is 1, 2, 4, or 8)
In Intel syntax, the previous formula translates to:
mov eax, [0xcafebab3]           ; direct (displacement)
mov eax, [esi]                  ; register indirect (base)
mov eax, [ebp-8]                ; based (base + displacement)
mov eax, [ebx*4 + 0xdeadbeef]   ; indexed (index*scale + displacement)
mov eax, [edx + ebx + 12]       ; based-indexed w/o scale (base + index + displacement)
mov eax, [edx + ebx*4 + 42]     ; based-indexed w/ scale (base + index*scale + displacement)
The operand size can also be stated explicitly, for example mov word ax, 0x42 (in NASM syntax). This syntax is usually assembler-specific. Some discussions regarding this can be found here, and on StackOverflow.
Data transfer instructions move bytes between memory-register, register-register, and register-memory. A single mov cannot transfer data from memory to memory; only the string instructions (such as movsb) do that. The most common such instructions are:
mov <dest>, <src>: move
xchg <dest>, <src>: exchange (swap)
movzx <dest>, <src>: move with zero extend
movsx <dest>, <src>: move with sign extend
movsb: move byte from the location pointed to by esi to the location pointed to by edi
movsw: similar, move word (2 bytes)
lea <dest>, <src>: load effective address (calculate address of <src> and load it to <dest>)

The lea instruction represents <src> with square brackets, but it only computes the address and DOES NOT read the contents at that address, as mov does. Example: lea ebx, [ebx*8+ebx].
The following two instructions are equivalent:
lea eax, [ebx]
mov eax, ebx
xchg eax, eax is equivalent to nop, which is an instruction that does nothing.
As a program executes, the address of the next instruction is stored in the eip register. Changing the value of this register allows control of the execution flow. Instructions directly influencing eip are:
jmp <addr>: loads <addr> into eip
call <addr>: pushes the current eip (the address of the instruction following the call) on the stack, and loads <addr> into eip
ret <val>: pops the top of the stack into eip, and then pops <val> more bytes off the stack
loop <addr>: decrements ecx, and jumps to <addr> if ecx != 0
Conditional jumps act in the same way as jmp, but they require different combinations of flags to be set in the eflags register. Flags are set by arithmetic (e.g., add, sub), logical (e.g., xor, or), or comparison (cmp, test) instructions. The most common flags are the zero flag (ZF), the sign flag (SF), the carry flag (CF), and the overflow flag (OF).
Arithmetic instructions (NASM/Intel syntax):
add <dest>, <src>: addition
sub <dest>, <src>: subtraction
mul <arg>: unsigned multiplication of <arg> with the same-sized part of eax (i.e., <arg> = dh ⇒ ax = al * dh)
imul <arg>: signed multiplication
imul <dest>, <src>: signed multiplication (dest = dest * src)
imul <dest>, <src>, <aux>: signed multiplication (dest = src * aux)
div <arg>: unsigned division
idiv <arg>: signed division
neg <arg>: 2's complement negation
Shifts and rotations: shr, shl (logical shift right/left), sar, sal (arithmetic shift right/left), shld, shrd (double-shift), ror, rol (rotate), rcr, rcl (rotate with carry).
Logical instructions: and, or, xor, not.
Function (subroutine) calls are nothing more than a convention on how parameters are passed, how the return value is passed back to the caller, and how the registers can be modified by the callee. The addresses to which a function needs to return after execution are stored in a stack data structure. Other values, such as the frame base pointer and the function's local variables, are also placed on the stack. Each function will thus have a corresponding stack frame that it allocates immediately after it is called (function prologue), and deallocates just before returning (function epilogue). The size of this allocation (changing the esp register) is established at compile time, and it is based on the size of the function's local variables.
While esp points to the top of the stack, ebp points to the beginning of the current frame, and its previous values are also saved on the stack. This is used to conveniently navigate the call stack in debuggers, and to address local variables from a fixed address (esp might change during the function's execution, by executing push and pop for example).
There are multiple calling conventions, mainly classified by who (caller or callee) is responsible for cleaning the parameters after the function has finished. The most common are cdecl, stdcall, and fastcall.
The following subsections will show the previous conventions, for C code, in real-life. Take a few minutes to dissect and understand the snippets. You can also try to reproduce the example on your machine.
By default, GCC uses cdecl. Using the stdcall or fastcall function attributes will force GCC to use the specified convention.
struct x {
    int x1, x2;
    char x3;
};

int func(struct x a, float b, void* c, int d)
{
    return 42;
}

int main()
{
    struct x a;
    a.x3 = '$';
    func(a, 3.14, (void*)0xdeadbeef, 1);
    return 0;
}

$ gcc -O0 -m32 -no-pie cdecl.c -o cdecl
$ objdump -M intel -d ./cdecl

080483ef <main>:
 ...
 8048403:  6a 01                 push   0x1
 8048405:  68 ef be ad de        push   0xdeadbeef
 804840a:  d9 80 c0 e4 ff ff     fld    DWORD PTR [eax-0x1b40]
 8048410:  8d 64 24 fc           lea    esp,[esp-0x4]
 8048414:  d9 1c 24              fstp   DWORD PTR [esp]
 8048417:  ff 75 fc              push   DWORD PTR [ebp-0x4]
 804841a:  ff 75 f8              push   DWORD PTR [ebp-0x8]
 804841d:  ff 75 f4              push   DWORD PTR [ebp-0xc]
 8048420:  e8 b6 ff ff ff        call   80483db <func>
 8048425:  83 c4 18              add    esp,0x18
 ...

080483db <func>:
 80483db:  55                    push   ebp
 80483dc:  89 e5                 mov    ebp,esp
 80483de:  e8 4c 00 00 00        call   804842f <__x86.get_pc_thunk.ax>
 80483e3:  05 1d 1c 00 00        add    eax,0x1c1d
 80483e8:  b8 2a 00 00 00        mov    eax,0x2a
 80483ed:  5d                    pop    ebp
 80483ee:  c3                    ret

The arguments are placed on the stack by the caller (main), using multiple push instructions, and are also removed by the caller, using a single add esp,0x18 instruction.
struct x {
    int x1, x2;
    char x3;
};

__attribute__((stdcall)) int func(struct x a, float b, void* c, int d)
{
    return 42;
}

int main()
{
    struct x a;
    a.x3 = '$';
    func(a, 3.14, (void*)0xdeadbeef, 1);
    return 0;
}

$ gcc -O0 -m32 -no-pie stdcall.c -o stdcall
$ objdump -M intel -d ./stdcall

080483f1 <main>:
 ...
 8048405:  6a 01                 push   0x1
 8048407:  68 ef be ad de        push   0xdeadbeef
 804840c:  d9 80 c0 e4 ff ff     fld    DWORD PTR [eax-0x1b40]
 8048412:  8d 64 24 fc           lea    esp,[esp-0x4]
 8048416:  d9 1c 24              fstp   DWORD PTR [esp]
 8048419:  ff 75 fc              push   DWORD PTR [ebp-0x4]
 804841c:  ff 75 f8              push   DWORD PTR [ebp-0x8]
 804841f:  ff 75 f4              push   DWORD PTR [ebp-0xc]
 8048422:  e8 b4 ff ff ff        call   80483db <func>
 ...

080483db <func>:
 80483db:  55                    push   ebp
 80483dc:  89 e5                 mov    ebp,esp
 80483de:  e8 4b 00 00 00        call   804842e <__x86.get_pc_thunk.ax>
 80483e3:  05 1d 1c 00 00        add    eax,0x1c1d
 80483e8:  b8 2a 00 00 00        mov    eax,0x2a
 80483ed:  5d                    pop    ebp
 80483ee:  c2 18 00              ret    0x18

The arguments are placed on the stack by the caller (main) and are removed by the callee (func) using the ret 0x18 instruction.
__attribute__((fastcall)) int func(int a, int b, int c, int d)
{
    return 42;
}

int main()
{
    func(1, 2, 3, 4);
    return 0;
}

$ gcc -O0 -m32 -no-pie fastcall.c -o fastcall
$ objdump -M intel -d ./fastcall

080483fa <main>:
 80483fa:  55                    push   ebp
 80483fb:  89 e5                 mov    ebp,esp
 80483fd:  e8 1f 00 00 00        call   8048421 <__x86.get_pc_thunk.ax>
 8048402:  05 fe 1b 00 00        add    eax,0x1bfe
 8048407:  6a 04                 push   0x4
 8048409:  6a 03                 push   0x3
 804840b:  ba 02 00 00 00        mov    edx,0x2
 8048410:  b9 01 00 00 00        mov    ecx,0x1
 8048415:  e8 c1 ff ff ff        call   80483db <func>
 ...

080483db <func>:
 80483db:  55                    push   ebp
 80483dc:  89 e5                 mov    ebp,esp
 80483de:  83 ec 08              sub    esp,0x8
 80483e1:  e8 3b 00 00 00        call   8048421 <__x86.get_pc_thunk.ax>
 80483e6:  05 1a 1c 00 00        add    eax,0x1c1a
 80483eb:  89 4d fc              mov    DWORD PTR [ebp-0x4],ecx
 80483ee:  89 55 f8              mov    DWORD PTR [ebp-0x8],edx
 80483f1:  b8 2a 00 00 00        mov    eax,0x2a
 80483f6:  c9                    leave
 80483f7:  c2 08 00              ret    0x8

The first two arguments are passed in the ecx and edx registers, and the remaining ones are pushed on the stack by the caller (main). All arguments from the stack are removed by the callee, using the ret 0x8 instruction. Other compilers might use more registers as arguments.
Syscalls are the interface that allows user applications to request services from the OS kernel, such as reading the disk, starting new processes, or managing existing ones. Just like function calls, syscalls are just a set of conventions on how to pass arguments to a kernel function. The mechanism is invoked by triggering an interrupt (int 0x80), which calls the kernel's syscall dispatcher, which, in turn, calls the requested syscall based on the eax register. The conventions for invoking a syscall on Linux are:
eax contains the syscall ID
the arguments are passed in ebx, ecx, edx, esi, edi, ebp (in this order)
libc. You can read about how this is implemented in this LWN article.
In the end, let's take a look at some common C language constructs, and how they are compiled into machine code by GCC. You are encouraged to try other constructs too.
You can try out the Compiler explorer at http://gcc.godbolt.org/ to see how each line is translated into instructions. Check this example out: http://goo.gl/gVeH5p
 80483ed:  55                    push   ebp
 80483ee:  89 e5                 mov    ebp,esp
 80483f0:  83 ec 08              sub    esp,0x8

 80483fe:  c9                    leave
 80483ff:  c2 08 00              ret    0x8
int main()
{
    int x = 1000;
    for (int i = 1; i < 10; i++) {
        x++;
    }
    return 0;
}

$ gcc -O0 -m32 for.c -o for
$ objdump -M intel -d ./for

080483ed <main>:
 ...
 80483f3:  c7 45 f8 e8 03 00 00  mov    DWORD PTR [ebp-0x8],0x3e8
 80483fa:  c7 45 fc 01 00 00 00  mov    DWORD PTR [ebp-0x4],0x1
 8048401:  eb 08                 jmp    804840b <main+0x1e>
 8048403:  83 45 f8 01           add    DWORD PTR [ebp-0x8],0x1
 8048407:  83 45 fc 01           add    DWORD PTR [ebp-0x4],0x1
 804840b:  83 7d fc 09           cmp    DWORD PTR [ebp-0x4],0x9
 804840f:  7e f2                 jle    8048403 <main+0x16>
 ...
int main()
{
    int x = 1000, i = 42;
    while (--i > 0) {
        x--;
    }
    return 0;
}

$ gcc -O0 -m32 while.c -o while
$ objdump -M intel -d ./while

080483ed <main>:
 ...
 80483f3:  c7 45 f8 e8 03 00 00  mov    DWORD PTR [ebp-0x8],0x3e8
 80483fa:  c7 45 fc 2a 00 00 00  mov    DWORD PTR [ebp-0x4],0x2a
 8048401:  eb 04                 jmp    8048407 <main+0x1a>
 8048403:  83 6d f8 01           sub    DWORD PTR [ebp-0x8],0x1
 8048407:  83 6d fc 01           sub    DWORD PTR [ebp-0x4],0x1
 804840b:  83 7d fc 00           cmp    DWORD PTR [ebp-0x4],0x0
 804840f:  7f f2                 jg     8048403 <main+0x16>
 ...
int main()
{
    int x = 1000, i, j;
    for (i = 1; i < 10; i++) {
        for (j = 1; j < 4; j++) {
            if (x == 42)
                break;
        }
        if (i == 3)
            continue;
    }
    return 0;
}

$ gcc -O0 -m32 nested.c -o nested
$ objdump -M intel -d ./nested

080483ed <main>:
 ...
 80483f3:  c7 45 fc e8 03 00 00  mov    DWORD PTR [ebp-0x4],0x3e8
 80483fa:  c7 45 f4 01 00 00 00  mov    DWORD PTR [ebp-0xc],0x1
 8048401:  eb 26                 jmp    8048429 <main+0x3c>
 8048403:  c7 45 f8 01 00 00 00  mov    DWORD PTR [ebp-0x8],0x1
 804840a:  eb 0c                 jmp    8048418 <main+0x2b>
 804840c:  83 7d fc 2a           cmp    DWORD PTR [ebp-0x4],0x2a
 8048410:  75 02                 jne    8048414 <main+0x27>
 8048412:  eb 0a                 jmp    804841e <main+0x31>
 8048414:  83 45 f8 01           add    DWORD PTR [ebp-0x8],0x1
 8048418:  83 7d f8 03           cmp    DWORD PTR [ebp-0x8],0x3
 804841c:  7e ee                 jle    804840c <main+0x1f>
 804841e:  83 7d f4 03           cmp    DWORD PTR [ebp-0xc],0x3
 8048422:  75 01                 jne    8048425 <main+0x38>
 8048424:  90                    nop
 8048425:  83 45 f4 01           add    DWORD PTR [ebp-0xc],0x1
 8048429:  83 7d f4 09           cmp    DWORD PTR [ebp-0xc],0x9
 804842d:  7e d4                 jle    8048403 <main+0x16>
 ...
Use assembly to write a program that receives N command line parameters. If the 1st parameter starts with . (dot) (such as ./ping 8.8.8.8) the program prints the message FAILED. If the first parameter doesn't start with . (dot) (such as /bin/ping 8.8.8.8) the program prints the message WORKS.
The files for this task are in 01-challenge-execve/src.
main()
.
$ ./execve ./ping 8.8.8.8
=> prints FAILED message
$ ./execve /bin/ping 8.8.8.8
=> prints WORKS message
Update the above program and use assembly to write a program that receives N command line parameters, and dispatches them to the execve syscall. If the 1st parameter starts with . (dot) (such as ./ping 8.8.8.8) the program should NOT call execve and instead print an error message.
You can use libc's printf() or puts() for the error message. You can assume the command line parameters are already on the stack, and you can generate the boilerplate code that takes care of this by linking with gcc as opposed to ld.
execve(argv[1], argv+1, NULL);
You have to translate that into assembly.
Use assembly to write a program that iterates through a statically allocated string (use the .data section), and calls a function that replaces each letter based on the following formula: NEW_LETTER = 33 + ((OLD_LETTER * 42 / 3 + 13) % 94). Print the new string at the end.
The files for this task are in 02-challenge-looping-math/src.
For the string call denied! the result is tX66v$2Rj2$&.
The binary file 03-challenge-call-secret/src/call-secret needs to call a specific function. However, because of a nasty “voice”, the specific function doesn't get called. Please fix it and find out the flag.
NOP
instruction. You need to find out the NOP
instruction for x86.
The binary file 04-challenge-no-exit/src/no-exit needs to call a specific function. However, because of a nasty exit, the specific function doesn't get called. Please fix it and find out the flag.
Call the secret() function instead of the exit() function. Find out the offset and issue the appropriate call instruction.
The secret() function will use the argument that has been “appropriately” provided to the exit() call.
The binary 05-challenge-funny-convention/src/funny is already dynamically linked with a missing library (libfunny.so), that you'll have to recreate in assembly. The library should contain a wrapper for the write syscall called leet_write(). The original library was using a funny calling convention, slightly different from the standard one. Figure out the convention, write the wrapper in NASM, and compile the library. Test by running the provided binary.
The library is position independent, and exposes 2 symbols: the function, and some global variable. You can find the skeleton for this task in the directory 05-challenge-funny-convention/src.
You should be able to run the provided binary as long as the correct library is in ./.
count_param as a global symbol, thus it will reside inside the caller's address space as opposed to the .data section of the library. Because of this, the library cannot access the data by using count_param and needs to use count_param wrt ..sym instead. A more detailed explanation can be found here.
Write a program that does a completely different thing than what objdump will show by jumping into the middle of an instruction. After the jump, the processor will “see” another stream of valid instructions.
The files for this task are in 06-challenge-obfuscation/src.