session:02

Differences

This shows you the differences between two versions of the page.

session:02 [2019/07/13 15:08]
Radu-Nicolae NICOLAU (78289)
session:02 [2020/07/19 12:49] (current)
Line 1: Line 1:
-0x02. Assembly Language+====== Refresher. Assembly Language ======
  
-== Slides+===== Resources =====
  
-[[https://security.cs.pub.ro/summer-school/res/slides/02-assembly-language.pdf|Session slides]]+[[https://security.cs.pub.ro/summer-school/res/slides/02-assembly-language.pdf|Session slides]]
  
 [[https://security.cs.pub.ro/summer-school/res/arc/02-assembly-language-skel.zip|Session's tutorials and challenges archive]] [[https://security.cs.pub.ro/summer-school/res/arc/02-assembly-language-skel.zip|Session's tutorials and challenges archive]]
Line 9: Line 9:
 [[https://security.cs.pub.ro/summer-school/res/arc/02-assembly-language-full.zip|Session's solutions]] [[https://security.cs.pub.ro/summer-school/res/arc/02-assembly-language-full.zip|Session's solutions]]
  
-== Tutorials+===== Tutorials =====
  
 This session will serve as a quick **refresher** of basic computer architecture and assembly language. For the sake of brevity, we are going to focus on x86. Also, people are generally more familiar with this one. This session will serve as a quick **refresher** of basic computer architecture and assembly language. For the sake of brevity, we are going to focus on x86. Also, people are generally more familiar with this one.
Line 16: Line 16:
  
 Let's get our hands dirty! Let's get our hands dirty!
-=== Computer Architecture: A Blistering Approach+==== Computer Architecture: A Blistering Approach ====
  
 A microprocessor executes, one by one, **logical**, **arithmetic**, **control**, and **input/output (I/O)** operations specified by the instructions of a computer program that was previously loaded in the system's memory. An instruction is just a set of bytes that specify the operation or opcode (e.g., addition, multiplication, memory read/write) and the operands (e.g. numbers, memory locations). The list of supported operations is specified by an **Instruction Set Architecture (ISA)**. ISAs can be classified into types such as [[http://en.wikipedia.org/wiki/Complex_instruction_set_computing|CISC]], [[http://en.wikipedia.org/wiki/Reduced_instruction_set_computer|RISC]], [[http://en.wikipedia.org/wiki/Very_long_instruction_word|VLIW]] and others. Particular processors implement this specification in different ways - this is called a microarchitecture, and allows the same program to be compatible with processors produced by different vendors. For example, both Intel 80386, and AMD K7 Athlon implement the same x86 ISA. Moreover, newer ISAs tend to be backward-compatible with older ones (e.g., x86 is still supported on newer 64-bit ISAs). A microprocessor executes, one by one, **logical**, **arithmetic**, **control**, and **input/output (I/O)** operations specified by the instructions of a computer program that was previously loaded in the system's memory. An instruction is just a set of bytes that specify the operation or opcode (e.g., addition, multiplication, memory read/write) and the operands (e.g. numbers, memory locations). The list of supported operations is specified by an **Instruction Set Architecture (ISA)**. ISAs can be classified into types such as [[http://en.wikipedia.org/wiki/Complex_instruction_set_computing|CISC]], [[http://en.wikipedia.org/wiki/Reduced_instruction_set_computer|RISC]], [[http://en.wikipedia.org/wiki/Very_long_instruction_word|VLIW]] and others. 
Particular processors implement this specification in different ways - this is called a microarchitecture, and allows the same program to be compatible with processors produced by different vendors. For example, both Intel 80386, and AMD K7 Athlon implement the same x86 ISA. Moreover, newer ISAs tend to be backward-compatible with older ones (e.g., x86 is still supported on newer 64-bit ISAs).
Line 55: Line 55:
 </note> </note>
  
-=== Hello (Assembly) World+==== Hello (Assembly) World ====
  
 We can get right down to business and see what happens when we compile a very simple program written in C. We can get right down to business and see what happens when we compile a very simple program written in C.
Line 175: Line 175:
 </note> </note>
  
-=== Basics+==== Basics ====
  
 As new versions of the x86 processors appeared, new features were introduced and, in order to maintain backward compatibility, the processors had to provide different operation **modes**. For example, the original 8086 allowed access to 1MB of memory, with no protection and no support for virtual memory, thus newer versions (80286, 80386) were introduced and had to be switched to **protected mode** which overcame the limitations of the older **real mode**. Other, even newer processors, also introduced the **virtual 8086 mode**, and the **long mode**. All x86 processors start in real mode and most operating systems (e.g. Linux) will switch to 80386 protected mode at boot time. As new versions of the x86 processors appeared, new features were introduced and, in order to maintain backward compatibility, the processors had to provide different operation **modes**. For example, the original 8086 allowed access to 1MB of memory, with no protection and no support for virtual memory, thus newer versions (80286, 80386) were introduced and had to be switched to **protected mode** which overcame the limitations of the older **real mode**. Other, even newer processors, also introduced the **virtual 8086 mode**, and the **long mode**. All x86 processors start in real mode and most operating systems (e.g. Linux) will switch to 80386 protected mode at boot time.
Line 216: Line 216:
 </note> </note>
  
-=== Data Transfer+==== Data Transfer ====
  
 Data transfer instructions move bytes between memory-register, register-register, and register-memory. Memory to memory data transfers are not possible. The most common such instructions are: Data transfer instructions move bytes between memory-register, register-register, and register-memory. Memory to memory data transfers are not possible. The most common such instructions are:
Line 239: Line 239:
 </note> </note>
  
-=== Control Flow+==== Control Flow ====
  
 As a program executes, the address of the next instruction is stored in the ''eip'' register. Changing the value of this register allows control of the execution flow. Instructions directly influencing ''eip'' are: As a program executes, the address of the next instruction is stored in the ''eip'' register. Changing the value of this register allows control of the execution flow. Instructions directly influencing ''eip'' are:
Line 261: Line 261:
 </note> </note>
  
-=== Arithmetic/Logic+==== Arithmetic/Logic ====
  
 Arithmetic instructions (NASM/Intel syntax): Arithmetic instructions (NASM/Intel syntax):
Line 278: Line 278:
 Logical instructions: ''and'', ''or'', ''xor'', ''not''. Logical instructions: ''and'', ''or'', ''xor'', ''not''.
  
-=== Function Calls+==== Function Calls ====
  
 Function (subroutine) calls are nothing more than a convention on how parameters are passed, how the return value is passed back to the caller, and how the registers can be modified by the callee. The addresses to which a function needs to return after execution are stored in a stack data structure. Other values, such as the frame base pointer and the function's local variables, are also placed on the stack. Each function will thus have a corresponding **stack frame** that it allocates immediately after it is called (function prologue), and deallocates just before returning (function epilogue). The size of this allocation (changing the ''esp'' register) is established at compile time, and it is based on the size of the function's local variables. Function (subroutine) calls are nothing more than a convention on how parameters are passed, how the return value is passed back to the caller, and how the registers can be modified by the callee. The addresses to which a function needs to return after execution are stored in a stack data structure. Other values, such as the frame base pointer and the function's local variables, are also placed on the stack. Each function will thus have a corresponding **stack frame** that it allocates immediately after it is called (function prologue), and deallocates just before returning (function epilogue). The size of this allocation (changing the ''esp'' register) is established at compile time, and it is based on the size of the function's local variables.
Line 302: Line 302:
 The default convention used by GCC is ''cdecl''. Using the ''stdcall'' or ''fastcall'' [[https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html|function attributes]] will force GCC to use the specified convention. The default convention used by GCC is ''cdecl''. Using the ''stdcall'' or ''fastcall'' [[https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html|function attributes]] will force GCC to use the specified convention.
 </note> </note>
-==== cdecl+=== cdecl ===
  
 <code c> <code c>
Line 359: Line 359:
 </note> </note>
  
-==== stdcall+=== stdcall ===
  
 <code c> <code c>
Line 415: Line 415:
 </note> </note>
  
-==== fastcall+=== fastcall ===
  
 <code c> <code c>
Line 466: Line 466:
 </note> </note>
  
-=== System calls+==== System calls ====
  
 Syscalls are the interface that allows user applications to request services from the OS kernel, such as reading the disk, starting new processes, or managing existing ones. Just like function calls, syscalls are just a set of conventions on how to pass arguments to a kernel function. The mechanism is invoked by triggering an interrupt (**''int 0x80''**) which will call the kernel's syscall dispatcher, which, in turn, will call the syscall based on the ''eax'' register. The conventions for invoking a syscall on Linux are: Syscalls are the interface that allows user applications to request services from the OS kernel, such as reading the disk, starting new processes, or managing existing ones. Just like function calls, syscalls are just a set of conventions on how to pass arguments to a kernel function. The mechanism is invoked by triggering an interrupt (**''int 0x80''**) which will call the kernel's syscall dispatcher, which, in turn, will call the syscall based on the ''eax'' register. The conventions for invoking a syscall on Linux are:
Line 484: Line 484:
  
 </note> </note>
-=== Compiler Patterns+==== Compiler Patterns ====
  
 In the end, let's take a look at some common C language constructs, and how they are compiled into machine code by GCC. You are encouraged to try other constructs too. In the end, let's take a look at some common C language constructs, and how they are compiled into machine code by GCC. You are encouraged to try other constructs too.
  
  
-==== Compiler Explorer+=== Compiler Explorer ===
  
 You can try out the Compiler explorer at http://gcc.godbolt.org/ to see how each line is translated into instructions. You can try out the Compiler explorer at http://gcc.godbolt.org/ to see how each line is translated into instructions.
Line 497: Line 497:
 </note> </note>
 */ */
-==== function prologue+=== function prologue ===
  
 <code objdump> <code objdump>
Line 505: Line 505:
 </code> </code>
  
-==== function epilogue+=== function epilogue ===
  
 <code objdump> <code objdump>
Line 512: Line 512:
 </code> </code>
  
-==== for loop+=== for loop ===
  
 <code c> <code c>
Line 542: Line 542:
 </code> </code>
  
-==== while loop+=== while loop ===
  
 <code c> <code c>
Line 572: Line 572:
 </code> </code>
  
-==== nested fors with break and continue+=== nested fors with break and continue ===
  
 <code c> <code c>
Line 618: Line 618:
  
  
-== Challenges+===== Challenges =====
  
-=== 01. Execve+==== 01. Execve ====
  
-==== Simple printing+=== Simple printing ===
  
 Use assembly to write a program that receives N command line parameters. If the 1st parameter starts with ''.'' (//dot//) (such as ''./ping 8.8.8.8'') the program prints the message ''FAILED''. If the first parameter **doesn't** start with ''.'' (//dot//) (such as ''/bin/ping 8.8.8.8'') the program prints the message ''WORKS''. Use assembly to write a program that receives N command line parameters. If the 1st parameter starts with ''.'' (//dot//) (such as ''./ping 8.8.8.8'') the program prints the message ''FAILED''. If the first parameter **doesn't** start with ''.'' (//dot//) (such as ''/bin/ping 8.8.8.8'') the program prints the message ''WORKS''.
Line 639: Line 639:
 </code> </code>
  
-==== Simple syscall+=== Simple syscall ===
  
 Update the above program and use assembly to write a program that receives N command line parameters, and dispatches them to the ''execve'' syscall. If the 1st parameter starts with ''.'' (//dot//) (such as ''./ping 8.8.8.8'') the program should NOT call ''execve'' and instead print an error message. Update the above program and use assembly to write a program that receives N command line parameters, and dispatches them to the ''execve'' syscall. If the 1st parameter starts with ''.'' (//dot//) (such as ''./ping 8.8.8.8'') the program should NOT call ''execve'' and instead print an error message.
Line 656: Line 656:
 The syscall number for ''execve'' is ''11''. Check the [[http://man7.org/linux/man-pages/man2/execve.2.html|man page]] for the other arguments. The syscall number for ''execve'' is ''11''. Check the [[http://man7.org/linux/man-pages/man2/execve.2.html|man page]] for the other arguments.
 </note> </note>
-=== 02. Looping math+==== 02. Looping math ====
  
 Use assembly to write a program that iterates through a statically allocated string (use the ''.data'' section), and calls a function that replaces each letter based on the following formula: ''NEW_LETTER = 33 + ((OLD_LETTER * 42 / 3 + 13) % 94)''. Print the new string at the end. Use assembly to write a program that iterates through a statically allocated string (use the ''.data'' section), and calls a function that replaces each letter based on the following formula: ''NEW_LETTER = 33 + ((OLD_LETTER * 42 / 3 + 13) % 94)''. Print the new string at the end.
Line 671: Line 671:
 If the string you use is ''call denied!'' the result is ''tX66v$2Rj2$&''. If the string you use is ''call denied!'' the result is ''tX66v$2Rj2$&''.
 </note> </note>
-=== 03. Call secret function+==== 03. Call secret function ====
  
 The binary file ''03-challenge-call-secret/src/call-secret'' needs to call a specific function. However, because of a nasty "voice", the specific function doesn't get called. Please fix it and find out the flag. The binary file ''03-challenge-call-secret/src/call-secret'' needs to call a specific function. However, because of a nasty "voice", the specific function doesn't get called. Please fix it and find out the flag.
Line 683: Line 683:
  
 </note> </note>
-=== 04. No exit+==== 04. No exit ====
  
 The binary file ''04-challenge-no-exit/src/no-exit'' needs to call a specific function. However, because of a nasty exit, the specific function doesn't get called. Please fix it and find out the flag. The binary file ''04-challenge-no-exit/src/no-exit'' needs to call a specific function. However, because of a nasty exit, the specific function doesn't get called. Please fix it and find out the flag.
Line 693: Line 693:
 The ''secret()'' function will use the argument that has been "appropriately" provided to the ''exit()'' call. The ''secret()'' function will use the argument that has been "appropriately" provided to the ''exit()'' call.
 </note> </note>
-=== 05. Funny convention+==== 05. Funny convention ====
  
 The binary ''05-challenge-funny-convention/src/funny'' is already dynamically linked with a missing library (''libfunny.so''), that you'll have to recreate in assembly. The library should contain a wrapper for the ''write'' syscall called ''leet_write()''. The original library was using a funny calling convention, slightly different from the standard one. Figure out the convention, write the wrapper in NASM, and compile the library. Test by running the provided binary. The binary ''05-challenge-funny-convention/src/funny'' is already dynamically linked with a missing library (''libfunny.so''), that you'll have to recreate in assembly. The library should contain a wrapper for the ''write'' syscall called ''leet_write()''. The original library was using a funny calling convention, slightly different from the standard one. Figure out the convention, write the wrapper in NASM, and compile the library. Test by running the provided binary.
Line 705: Line 705:
 A more detailed explanation can be found [[https://www.nasm.us/doc/nasmdoc9.html#section-9.2.4|here]] A more detailed explanation can be found [[https://www.nasm.us/doc/nasmdoc9.html#section-9.2.4|here]]
 </note> </note>
-=== Extra: 06. Obfuscation+==== Extra: 06. Obfuscation ====
  
 Write a program that does a completely different thing than what ''objdump'' will show by jumping into the middle of an instruction. After the jump, the processor will "see" another stream of valid instructions. Write a program that does a completely different thing than what ''objdump'' will show by jumping into the middle of an instruction. After the jump, the processor will "see" another stream of valid instructions.
session/02.1563019737.txt.gz · Last modified: 2019/07/13 15:08 by Radu-Nicolae NICOLAU (78289)