CSE333(0-3): Systems Programming

Apart from CS106L, CSE333 is another amazing course(@ University of Washington Computer Science & Engineering) for beginners like me.

These days it’s rainy, and everyone works from home. When people work from home, they may want a cat to work with them. However, buying a Cat is too expensive for poor fabbit T.T

Alternatively, it’s good to learn more things starting with ‘C’, like ‘C’, ‘C++’, or ‘CSE’, which all looks similar to Cat. šŸ±

2020_12_01_0

CSE333: Systems Programming

Catalog Description

Includes substantial programming experience in languages that expose machine characteristics and low-level data representation (e.g., C and C++); explicit memory management; interacting with operating-system services; and cache-aware programming. Prerequisite: CSE 351.

Compile C Program

Hello World in C

#include <stdio.h> // for printf()
#include <stdlib.h> // for EXIT_SUCCESS
int main(int argc, char** argv) {
	printf("Hello, World!\n");
	return EXIT_SUCCESS;
}

Compile:

gcc -Wall -g -std=c11 -o hello helloword.c
./hello

Lecture 1: Intro, C Refresher

  • Workflow, Variables, Functions

C to Machine Code

C .c file -> Assembly .s file -> Machine Code .o file

Errors and Exceptions (ugly and inelegant)

  • C does not have exception handling (no try/catch)
  • Errors are returned as integer error codes from functions
  • segmentation fault is the ā€œgoodā€ option

How C is different from Java

  • Casting
    • Java enforces type safety, C does not
  • Arrays
    • Not objects, donā€™t know their own length, no bounds checking
  • Memory Management
    • Manual (malloc/free), no garbage collection

Primitive Types in C

2020_12_01_1

C99 Extended Integer Types

#include <stdint.h>
void foo(void) {
int8_t a; // exactly 8 bits, signed
int16_t b; // exactly 16 bits, signed
int32_t c; // exactly 32 bits, signed
int64_t d; // exactly 64 bits, signed
uint8_t w; // exactly 8 bits, unsigned
...
}

Basic Data Structures

C does not support objects

  • Arrays are contiguous chunks of memory
    • Arrays have no methods and do not know their own length
    • Can easily run off ends of arrays in C ā€“ security bugs!!!
  • Strings are null-terminated char arrays
    • Strings have no methods, but string.h has helpful utilities
char* x = "hello\n";
	-	-	-	-	-	--	---- 
x | h | e | l | l | o | \n | \0 |
	-	-	-	-	-	--	----
  • Structs are the most object-like feature, but are just collections of fields ā€“ no ā€œmethodsā€ or functions

Function

Function Definitions

Generic format:

returnType fname(type param1, ā€¦, type paramN) {
// statements
}

Function Ordering

You shouldnā€™t call a function that hasnā€™t been declared yet.

  • Reverse Ordering
  • Function Declaration
#include <stdio.h>
int sumTo(int); // func prototype
int main(int argc, char** argv) {
printf("sumTo(5) is: %d\n", sumTo(5));
return 0;
}
// sum of integers from 1 to max
int sumTo(int max) {
int i, sum = 0;
for (i = 1; i <= max; i++) {
sum += i;
}
return sum;
}

Function Declaration vs. Definition

  • Definition:
    • Must be exactly one definition of each thing (no duplicates)
  • Declaration:
    • function prototype, external variable declaration
    • Needs to appear in all files that use that thing
void sumstore(int x, int y, int* dest);

Compiling Multi-file Programs

The linker combines multiple object files plus staticallylinked libraries to produce an executable.

  • A library is just a pre-assembled collection of .o files

Lecture 2: Memory, Arrays

OS and Processes

  • There are many processes running on OS.
  • The OS lets you run multiple applications at once.
  • An application runs within an OS “process”.
  • The OS timeslices each CPU between runnable processes

Processes and Virtual Memory

The OS gives each process the illusion of its own private memory

  • Called the processā€™ address space
  • Contains the processā€™ virtual memory, visible only to it (via translation)
  • 2^64 bytes on a 64-bit machine
  • Virtual Memory: From 0x00..00 to 0xFF..FF, contains code, data, libraries, stack, etc.

Loading

When OS loads a program, it:

  1. create an address space
  2. inspects the executable file to see what’s inside
  3. Lazily copies regions of the file into the right place in the address space
  4. Does any final linking, relocation, or other needed preparation.

0xFF..FF | Stack <- -> Shared Libraries <- Heap | Read/Write Segment .data, .bss | Read-Only Segment .text, .rodata | 0x00..00

Memory Management

  • Local variables on the Stack
    • Allocated and freed via calling conventions (push, pop, mov)
  • Global and static variables in Data
    • Allocated/freed when the process starts/exits
  • Dynamically-allocated data on the Heap
    • malloc() to request; free() to free, otherwise memory leak

2020_12_01_2

Stack

  • Used to store data associated with function calls
    • Compiler-inserted code manages stack frames for you

Stack frame (x86-64) includes:

  • Address to return to
  • Saved registers
    • Based on calling conventions
  • Local variables
  • Argument build
    • Only if > 6 used
						 %rbp                                           %rsp
						   ā†“	  										  ā†“
|-------| Arg 7+ | Rt Addr | Old %rbp | Saved Regs + Local vars | Args 7+ |
|<-Caller frame->|<----------------------Callee Frame-------------------->|

Pointers (type *p)

Pointers are Variables that store addresses.

  • It points to somewhere in the processā€™ virtual address space
  • &foo produces the virtual address of foo
  • Dereference a pointer using the unary * operator
    • Access the memory referred to by a pointer

Address Space Layout Randomization

Linux uses address space layout randomization (ASLR) for added security.

  • Randomizes:
    • Base of stack
    • Shared library (mmap) location
  • Makes Stack-based buffer overflow attacks tougher
  • Makes debugging tougher
  • Can be disabled (gdb does this by default)

Arrays

  • Definition: Allocates size * (size of type) bytes of contiguous memory
type name[size]
  • Size of an array is unknown, not stored anywhere.
  • sizeof(array) only works in variable scope of array definition
  • Initialization:
type name[size] = {val0, ..., valN};
  • Array name (by itself) evaluates to the address of the start of the array

Multi-dimensional Arrays

Generic 2D format:

type name[rows][cols] = {{values},ā€¦,{values}};
  • Still allocates a single, contiguous chunk of memory
  • C is row-major
  • 2-D arrays normally only useful if size known in advance. Otherwise use dynamically-allocated data and pointers (later)

Arrays as Parameters

  • Solution 1: Declare Array Size
int sumAll(int a[5]); // prototype
  • Solution 2: Pass Size as Parameter
int sumAll(int a[], int size); 

Returning an Array

  • Canā€™t safely return local arrays from functions

  • Solution: Output Parameter Pass it as an output parameter to copyarray()

void copyArray(int src[], int dst[], int size) {    //dst[] is the output parameter
	int i;
	for (i = 0; i < size; i++) {
		dst[i] = src[i];
	}
}

Output parameters are common in library functions:

long int strtol(char* str, char** endptr, int base);
int sscanf(char* str, char* format, ...);

Array: Call by value

Technical answer: a T[] array parameter is ā€œpromotedā€ to a pointer of type T*, and the pointer is passed by value

  • So it acts like a call-by-reference array (if callee changes the array parameter elements it changes the callerā€™s array)

  • But itā€™s really a call-by-value pointer (the callee can change the pointer parameter to point to something else)

void copyArray(int src[], int dst[], int size) {
	int i;
	dst = src; // evil!
	for (i = 0; i < size; i++) {
		dst[i] = src[i]; // copies source array to itself!
	}
}

Lecture 3: Pointers

Pointers & Pointer Arithmatic

  • Pointers are typed

    • Tells the compiler the size of the data you are pointing to
    • Exception: void* is a generic pointer (i.e. a placeholder)
  • Pointer arithmetic is scaled by sizeof(*p)

    • Works nicely for arrays
    • Does not work on void*, since void doesnā€™t have a size!
  • Valid pointer arithmetic:

    • Add/subtract an integer to a pointer
    • Subtract two pointers (within stack frame or malloc block)
    • Compare pointers (<, <=, ==, !=, >, >=), including NULL

Endianness

4-byte data 0xa1b2c3d4 at address 0x100

  • Big-Endian: a1b2c3d4
  • Little-Endian: d4c3b2a1

Pointers and Parameters

C is Call-By-Value:

  • Callee receives a local copy of the argument, Register or Stack
  • If the callee modifies a parameter, the callerā€™s copy isnā€™t modified

How to swap: Faking Call-By-Reference in C

Can use pointers to approximate call-by-reference.

Callee still receives a copy of the pointer (i.e. call-by-value), but it can modify something in the callerā€™s scope by dereferencing the pointer parameter.

ā€¢ ptr[i] is *(ptr+i) with pointer arithmetic

Pointers and Arrays

  • A pointer can point to an array element
  • You can use array indexing notation on pointers
    • ptr[i] is *(ptr+i) with pointer arithmetic ā€“ reference the data i elements forward from ptr
  • An array nameā€™s value is the beginning address of the array
    • Like a pointer to the first element of array, but canā€™t change

Array Parameters

  • Array parameters are actually passed (by value) as pointers to the first array element

These two codes are the same:

void f(int a[]);
int main( ... ) {
	int a[5];
	...
	f(a);
	return 0;
}

with

void f(int* a);
int main( ... ) {
	int a[5];
	...
	f(&a[0]);
	return 0;
}

Function Pointers

function name

Generic format:

returnType (* name)(type1, ā€¦, typeN)

Function Pointer Example

map() performs operation on each element of an array.

  • funcptr dereference
  • funcptr definition
  • funcptr assignment

C allows you to omit & on a function parameter and omit * when calling pointed-to function; both assumed implicitly.

#define LEN 4

int negate(int num) {return -num;}
int square(int num) {return num*num;}

// perform operation pointed to on each array element
void map(int a[], int len, int (* op)(int n)) {
	for (int i = 0; i < len; i++) {
		a[i] = op(a[i]); // dereference function pointer, implicit funcptr dereference (no * needed)
	}
}
int main(int argc, char** argv) {
	int arr[LEN] = {-1, 0, 1, 2};
	map(arr, LEN, square);  // no & needed for func ptr argument

Haha, congrats, the first 3 lectures are finished! Hope that I can have a British Shorthair kitten next year!! šŸ±

2020_12_01_3

References