- Header
.hfiles vs.cfiles in C - Pointers and Double Pointers in C
- More C Behaviors
- Recap: pass by value
- Array decay
- Undefined behavior (Undefined Behavior)
- Integer promotions and signed/unsigned mixing
- Evaluation order is mostly unspecified
- Strict aliasing
- Strings and string literals
- Struct padding
- Reading declarations right-to-left from the name
- Linkage:
static,extern, file scope - Preprocessor macros
charmay be signed or unsigned- How to make life easier when writing in C?
- Quick Summary
50.005 Computer System Engineering
Information Systems Technology and Design
Singapore University of Technology and Design
Natalie Agus (Summer 2025)
C Primer
This notes will be used in Week 2 and Week 3 Labs when we introduce C and finish up lecture theory in class. You should refer to this notes whenever possible to not make beginner mistakes when writing in C.
During the lab, we will use this exercises. Special thanks to Ryan Pek, a 2027 CSD graduate who created this exercise.
Header .h files vs .c files in C
In C, a header file should usually describe what other files are allowed to use. A .c file should usually contain the actual implementation.
General Rule
Put declarations in
.hfiles. Put definitions in.cfiles.
This matters because #include literally copies the header file into the .c file before compilation. If many .c files include the same header, they all receive the same header content.
So if you define a global variable inside a header, every .c file gets its own copy. That commonly causes linker errors such as:
duplicate symbol '_something'
Below we list things that belong in the header file.
Function Declaration
A function declaration tells C that this function exists somewhere.
Example header:
// math_utils.h
#ifndef MATH_UTILS_H
#define MATH_UTILS_H
int add(int a, int b);
int multiply(int a, int b);
#endif
The function bodies go in the .c file:
// math_utils.c
#include "math_utils.h"
int add(int a, int b) {
return a + b;
}
int multiply(int a, int b) {
return a * b;
}
Another file can use these functions:
// main.c
#include <stdio.h>
#include "math_utils.h"
int main(void) {
printf("%d\n", add(3, 4));
return 0;
}
The header says what functions exist and the .c file (that includes it) contains the actual implementation.
Global variables
Suppose you want to share a global counter between multiple .c files. Do not write this in the header:
// counter.h
int total_requests = 0; // This is bad
This is a definition. It creates real storage.
Why definition in header file is bad
If both
main.candserver.cinclude this header, then both files define their owntotal_requests.
That causes a duplicate symbol error. Some people however don’t care about this because they know that this header file will only be included by exactly 1 .c file so in this case that’s okay.
This is the good version:
// counter.h
#ifndef COUNTER_H
#define COUNTER_H
extern int total_requests;
void increment_requests(void);
#endif
Then define the variable in exactly one .c file:
// counter.c
#include "counter.h"
int total_requests = 0;
void increment_requests(void) {
total_requests++;
}
Now other files can use the same shared variable:
// main.c
#include <stdio.h>
#include "counter.h"
int main(void) {
increment_requests();
increment_requests();
printf("Total requests: %d\n", total_requests);
return 0;
}
The key difference is that this means this variable exists somewhere else:
extern int total_requests;
But the following means: create the actual variable here.
int total_requests = 0;
Arrays
The same rule applies to arrays. This is a bad header:
// colors.h
const char *colors[] = {
"red",
"green",
"blue"
};
This creates the actual array every time the header is included.
This is the correct header:
// colors.h
#ifndef COLORS_H
#define COLORS_H
extern const char * const colors[];
extern const int NUM_COLORS;
#endif
And the correct .c file:
// colors.c
#include "colors.h"
const char * const colors[] = {
"red",
"green",
"blue"
};
const int NUM_COLORS =
sizeof(colors) / sizeof(colors[0]);
You can use it elsewhere freely too:
// main.c
#include <stdio.h>
#include "colors.h"
int main(void) {
for (int i = 0; i < NUM_COLORS; i++) {
printf("%s\n", colors[i]);
}
return 0;
}
This is safe because only colors.c creates the real array.
Private variables should stay in the .c file
Not everything needs to be exposed through a header. Suppose logger.c has an internal log level.
// logger.c
#include <stdio.h>
static int log_level = 1;
void log_message(const char *message) {
if (log_level > 0) {
printf("[LOG] %s\n", message);
}
}
The static keyword at file scope means: this variable is private to this .c file.
Other files cannot directly access log_level.
The header only exposes the public function:
// logger.h
#ifndef LOGGER_H
#define LOGGER_H
void log_message(const char *message);
#endif
Other files do not need to know how the logger stores its internal state. They only need to call:
log_message("Program started");
This is usually better design, but you probably won’t need to care about it in this course since we don’t go really deep into C.
Structs and typedefs
Headers are good places to put shared types.
Example:
// point.h
#ifndef POINT_H
#define POINT_H
typedef struct {
int x;
int y;
} Point;
int point_distance_squared(Point p);
#endif
Implementation:
// point.c
#include "point.h"
int point_distance_squared(Point p) {
return p.x * p.x + p.y * p.y;
}
Usage:
// main.c
#include <stdio.h>
#include "point.h"
int main(void) {
Point p = {3, 4};
printf("%d\n", point_distance_squared(p));
return 0;
}
This is fine because the header defines a type, not a global variable with storage.
About typedef
typedef gives an existing type a new name.
Example:
typedef unsigned int uint;
Now instead of writing:
unsigned int age;
you can write:
uint age;
Most common use is with struct. Without typedef, you need the keyword struct to declare it:
struct Point {
int x;
int y;
};
struct Point p;
With typedef, it gets a little neater:
typedef struct {
int x;
int y;
} Point;
Point p;
So this means: create a struct type with x and y, and let me call that type Point.
typedef struct {
int x;
int y;
} Point;
It does not create a variable, only a type name.
Constants
For simple integer constants, this is okay to put in a header:
// config.h
#ifndef CONFIG_H
#define CONFIG_H
#define MAX_USERS 100
#define BUFFER_SIZE 1024
#endif
Another common style is to use enum:
// config.h
#ifndef CONFIG_H
#define CONFIG_H
enum {
MAX_USERS = 100,
BUFFER_SIZE = 1024
};
#endif
Both are fine for simple compile-time constants, but avoid putting actual global variables in the header:
// config.h
int max_users = 100; // BAD
Instead use this correct version:
// config.h
#ifndef CONFIG_H
#define CONFIG_H
extern int max_users;
#endif
// config.c
#include "config.h"
int max_users = 100;
Include guards
Every header should usually have an include guard:
#ifndef FILE_NAME_H
#define FILE_NAME_H
// declarations here
#endif
Example:
// point.h
#ifndef POINT_H
#define POINT_H
typedef struct {
int x;
int y;
} Point;
int point_distance_squared(Point p);
#endif
The guard prevents the same header from being processed multiple times in one compilation unit.
For example:
// main.c
#include "point.h"
#include "geometry.h"
If geometry.h also includes point.h, then point.h may be included twice.
The include guard prevents duplicate type definitions and repeated declarations inside the same .c file.
Meaning:
#ifndef POINT_H
means:
If POINT_H has not been defined yet, continue.
Then the following marks this header as already included so that the second time the compiler sees the same header, it skips the content.
#define POINT_H
Summary about Header files
If you don’t want to read anything else, at least read these:
// Good things for .h files
int add(int a, int b); // function declaration
extern int total_requests; // global variable declaration
typedef struct {
int x;
int y;
} Point; // shared type definition
#define BUFFER_SIZE 1024 // macro constant
And put these in .c files:
// Good things for .c files
int add(int a, int b) {
return a + b;
}
int total_requests = 0;
static int internal_state = 123;
Avoid this in headers because these create actual storage, and every .c file that includes the header will define another copy:
// Bad in .h files
int total_requests = 0;
const char *colors[] = {
"red",
"green",
"blue"
};
So basically, a header file should usually contain:
Declarations
Shared types
Constants
Function prototypes
extern variable declarations
A .c file should usually contain:
Function bodies
Actual global variable definitions
Private helper functions
Private static variables
Implementation logic
The header tells other files what they can use and the .c file contains the actual code and storage.
Pointers and Double Pointers in C
This is a quick note about T * vs T **, and what’s actually happening underneath, complete with snippets and sample code just enough to make you understand the basics. You still need to do more practice on your own to actually feel natural about it.
The one rule about C
C passes everything by value. When you pass a pointer to a function, the function gets a copy of the pointer. Assigning to that copy changes only the local parameter, not the caller’s variable.
If you want a function to modify something the caller owns, you must give the function the address of that something. Then it can write through the address.
This single rule explains every use of double pointers.
What each indirection level means
| Parameter | What it points to | What the function can modify in the caller |
|---|---|---|
int x |
a copy of x | nothing in the caller |
int *p |
an int | the int |
int **p |
a pointer to an int | the pointer, and through it, the int |
int ***p |
a pointer to a pointer to an int | the pointer to a pointer, and so on |
Rule of thumb: one more * in the parameter than the thing you want to mutate in the caller.
When int *p is the right choice
The function only needs to read or write the int(s). The caller’s pointer variable does not need to change.
Modify a single value the caller owns:
void increment(int *p) { (*p)++; }
int x = 5;
increment(&x); // x is now 6
Iterate over an array (array decays to int *):
int sum(int *a, size_t n) {
int s = 0;
for (size_t i = 0; i < n; i++) s += a[i];
return s;
}
Return multiple values via out-parameters:
void divmod(int a, int b, int *q, int *r) {
*q = a / b;
*r = a % b;
}
Fill a buffer the caller already allocated:
void read_line(char *buf, size_t cap);
When int **p is the right choice
The function needs to change the caller’s pointer itself, not just the data behind it.
Allocate on behalf of the caller:
int alloc_int(int **out) {
*out = malloc(sizeof(int));
if (!*out) return -1;
**out = 42;
return 0;
}
int *p;
alloc_int(&p); // p is now a heap pointer
Why not alloc_int(int *out) and out = malloc(...) inside? Because out is a local copy. The malloc address is written into the local slot and lost when the function returns. The caller’s p never sees it. Result: leak.
Grow a buffer that may move (realloc):
int grow(int **buf, size_t *cap) {
int *tmp = realloc(*buf, (*cap * 2) * sizeof(int));
if (!tmp) return -1;
*buf = tmp; // update caller's pointer if realloc moved the block
*cap *= 2;
return 0;
}
Free and null in one helper:
void int_free(int **pp) {
free(*pp);
*pp = NULL; // caller's pointer is nulled
}
Linked list operations without head special-casing:
void list_remove(Node **cur, int key) {
while (*cur) {
if ((*cur)->key == key) {
Node *dead = *cur;
*cur = dead->next; // patches previous node's next, or head
free(dead);
return;
}
cur = &(*cur)->next;
}
}
Array of pointers (not the same use case, just looks similar):
int main(int argc, char **argv);
Here the second * means “array of”; argv[i] is a char * (one C string). This is a different reason for the second indirection from the “modify caller’s pointer” cases above, but the syntax is the same.
Memory Leak Example
void alloc_int_BROKEN(int *out) {
out = malloc(sizeof(int)); // writes to LOCAL copy
*out = 42;
}
int *p = NULL;
alloc_int_BROKEN(p); // p is still NULL, malloc'd block leaked
Here’s what’s happening:
- Caller’s
psits at some address, say0x7ff0, holdingNULL. - The call copies
NULLinto the function’s parameter slotout, at some other address, say0x7fa0. out = malloc(...)writes the new heap address into0x7fa0. The caller’spat0x7ff0is untouched.- Function returns. Parameter slot is destroyed. No surviving pointer to the heap block. Leak.
The fix is to pass &p and take int **out, so *out = ... writes back through the address of p.
Storage class is independent of pointer level
The pointer indirection level says nothing about where the memory lives. Where the memory lives depends on how it was acquired.
| Source | Storage | Free? |
|---|---|---|
int x; local |
stack | no |
int a[10]; local |
stack | no |
int *p = malloc(...) |
heap | yes |
static or global |
static | no |
"literal string" |
static, read-only | no |
strdup("...") |
heap | yes |
Common mistake: returning a pointer to a stack local.
int *bad(void) {
int x = 42;
return &x; // x dies when bad() returns; pointer is dangling
}
C has no garbage collector
Every malloc / calloc / realloc / strdup (and friends that return heap memory) must be matched by exactly one free.
Here’s a simple guideline:
- Make ownership explicit. For every allocation, one piece of code is responsible for freeing it.
realloc(p, n): on success the oldpis invalid (even if not moved). On failure (NULLreturned) the oldpis still valid and still must be freed eventually.- After
free(p),pis dangling. Set toNULLif it might be reused. - Use AddressSanitizer (
-fsanitize=address) or valgrind to catch leaks and use-after-free.
One subtle point on const
int *p as a parameter does not promise the function won’t write through it. If you want that promise:
size_t my_strlen(const char *s);
const placement matters. const int * (pointer to const int), int *const (const pointer to int), and const int *const (const pointer to const int) all mean different things. For double pointers it gets thornier: const int **, int *const *, int **const are three different types. Worth its own note.
Quick Summary
Here’s one simple decision table to live by:
| The function needs to… | Take |
|---|---|
| Read or write one int | int *p |
| Iterate over N ints | int *p, size_t n |
| Return several ints to caller | several int * out params |
| Promise it will not write | const int *p |
| Allocate and hand back a new int | int **p |
| Reallocate (buffer may move) | int **buf |
| Null out the caller’s pointer | int **p |
Walk an array of int * |
int **p |
And if you don’t read the above and just want to know what’s important, here it is:
- Pass by value applies to pointers too. The function gets a copy.
- Use one more
*than the thing you want to mutate in the caller. int *p: change the data.int **p: change the caller’s pointer.- Storage class is orthogonal to pointer level. Stack/heap is decided by how the memory was acquired, not by how many stars the parameter has.
- No GC. Every heap allocation needs exactly one
free. Ownership must be explicit.
More C Behaviors
This section details a handful of C behaviors that explain most of the bugs students write and most of the wtf moments in production code. We do not include any niche stuffs, just stuffs that you might encounter during this term.
Recap: pass by value
As mentioned avoce, everything in C is passed by value. When you pass a pointer, the function gets a copy of the pointer. To let a function modify the caller’s variable of type T, pass a T * (the address of that variable). Inside the function, write through it with *.
This is the rule behind double pointers, output parameters, and most of what looked weird in the previous note.
Array decay
Array decay
In almost every expression, an array’s name is automatically converted to a pointer to its first element. The array’s “array-ness” (its size, the fact that it’s an array type) is dropped on the floor.
The standards literature calls this “conversion of array to pointer,” and informally people call it the array decaying into a pointer. The information loss is the key idea.
int a[10];
int *p = a; // a "decays" to &a[0], an int *
// identical to: int *p = &a[0];
int a[10];
foo(a); // foo receives an int *, not an array
int a[10];
a[3] = 7; // a decays to int *; a[3] is *(a + 3)
void f(int x[]) { ... } // looks like an array parameter
void f(int x[10]) { ... } // looks more specific
void f(int *x) { ... } // these three are all the SAME function
The compiler ignores the [] or [10] in a parameter declaration. The 10 is documentation.
Where decay does NOT happen
These three exceptions:
sizeof:
int a[10];
sizeof(a); // 10 * sizeof(int), e.g. 40 bytes. NO decay.
void f(int a[10]) {
sizeof(a); // sizeof(int *), e.g. 8. Because the parameter
// is actually int *, and the 10 was a lie.
}
This is why you cannot find an array’s length inside a function from the parameter alone. You MUST pass the length separately.
Address-of (&):
int a[10];
&a; // type: int (*)[10], pointer to array of 10 ints
// NOT int **, and NOT int *
a; // type after decay: int *
&a and a have the same numeric address, but different types. Pointer arithmetic on them advances by different amounts:
a + 1; // advances by sizeof(int)
&a + 1; // advances by sizeof(int) * 10
String literal initializing a char array:
char s[] = "hello"; // copies 6 bytes into s. No decay of the literal here.
char *p = "hello"; // p points to static read-only memory.
Consequences
These are a short summary of the consequences of array decay:
- You cannot pass an array by value. Passing
apasses a pointer. The function cannot see the size. - You cannot return an array. Same reason; functions return pointers, and returning a pointer to a local array is UB.
sizeofonly gives the array size where the compiler can still see the array type. Usually that means the same scope as the declaration.int **is NOT how you pass a 2D array.int m[3][4]decays toint (*)[4], a pointer to an array of 4 ints. The right parameter isint m[][4]orint (*m)[4].
void f(int (*m)[4], int rows); // correct
void f(int **m); // WRONG for int m[3][4]
Undefined behavior (Undefined Behavior)
UB does not mean “might crash.” It means the standard imposes no requirements, and modern compilers ASSUME UB never happens, then optimize on that assumption. Code that worked at -O0 can vanish, behave randomly, or appear to violate cause and effect at -O2.
This is a classic example:
int foo(int x) {
if (x + 1 < x) return 1; // signed overflow is UB
return 0; // compiler: "x + 1 < x cannot happen, return 0"
}
The compiler may eliminate the comparison entirely because it would only be true on signed overflow, which the standard says cannot happen.
Common sources to memorize:
- Signed integer overflow
- Dereferencing NULL, dangling, or uninitialized pointers
- Out-of-bounds array access
- Use-after-free, double-free
- Modifying a string literal
- Strict aliasing violations (see below)
- Data races
To tackle this: compiling with -fsanitize=address,undefined catches most of these at runtime.
Integer promotions and signed/unsigned mixing
Anything narrower than int gets promoted to int before arithmetic. Mixed signed/unsigned operations convert the signed operand to unsigned.
unsigned int u = 1;
int s = -1;
if (s < u) puts("yes");
else puts("no");
// Prints: no
// s is converted to unsigned, becomes UINT_MAX, which is > 1.
uint8_t a = 200, b = 200;
int c = a + b; // a and b promoted to int. c = 400. No overflow.
size_t n = some_value();
for (int i = 0; i < n; i++) { ... }
// i is signed, n is unsigned. The comparison converts i to size_t.
// If i goes negative for some reason, the loop runs nearly forever.
Never compare signed and unsigned without thinking about it. Either match types, or cast explicitly.
Evaluation order is mostly unspecified
The order in which function arguments are evaluated, and most operands of most operators, is unspecified. If two side effects on the same object happen without a sequence point between them, you get UB, which is a nightmare.
int i = 0;
printf("%d %d\n", i++, i++); // unspecified order; pre-C11 also UB
a[i] = i++; // UB
f() + g(); // order of f and g calls unspecified
Here are the list of sequenced operators (safe between their operands):
&&and||(short-circuit; also create a sequence point)?:(condition is fully evaluated before the chosen branch),(the comma operator, not the comma between function arguments)
Strict aliasing
You may access an object only through a pointer of a compatible type, or through char * / unsigned char *. Casting a pointer to an incompatible type and dereferencing it is UB, and the optimizer will reorder loads and stores around it as if the aliasing never happens.
float f = 1.0f;
int i = *(int *)&f; // UB. Strict aliasing violation.
The correct way to reinterpret bits is using memcpy:
int i;
memcpy(&i, &f, sizeof i); // legal. Compilers turn this into a register move.
Unions are also allowed for type punning in C (this is one of the C-vs-C++ differences). This confuses people most often when reading binary file formats or doing low-level networking. Use memcpy instead and stop fighting the optimizer.
Strings and string literals
char *p = "hello"; // p points to static, read-only memory.
p[0] = 'H'; // UB. Probably segfault.
char s[] = "hello"; // s is a stack array, 6 bytes, mutable.
s[0] = 'H'; // fine.
Other things to take note of:
sizeof("abc")is 4. Includes the null terminator.strlen("abc")is 3. Does not.strncpydoes NOT guarantee null termination. If the source is longer thann, no\0is written. It also pads short sources with extra zeros. It is not a safestrcpy.strncat’snis “max bytes to append,” not buffer size.snprintfreturns the number of bytes it WOULD have written, which can exceed the buffer. Useful for sizing.
Struct padding
The compiler inserts padding between members to satisfy alignment, and adds trailing padding so arrays of the struct stay aligned.
struct S {
char c; // 1 byte
int i; // 4 bytes, but needs 4-byte alignment
};
sizeof(struct S); // typically 8, not 5
Here’s the layout:
offset 0: c
offset 1..3: padding
offset 4..7: i
What consequences do this bring?
memcmpon two structs may report inequality even when all members match, because padding bytes are indeterminate. Zero-init with= {0}ormemsetfirst if you must compare withmemcmp.- Reorder members from largest to smallest alignment to save space.
- Use
offsetoffrom<stddef.h>to find member offsets portably. Never hand-compute.
Reading declarations right-to-left from the name
C declaration syntax mirrors usage. Read outward from the variable name following the precedence of (), [], and *.
int *a[10]; // a is an array of 10 pointers to int
int (*a)[10]; // a is a pointer to an array of 10 ints
int *f(void); // f is a function returning int *
int (*f)(void); // f is a pointer to a function returning int
int (*fp[5])(int); // fp is an array of 5 pointers to functions (int) -> int
When in doubt, use typedef:
typedef int (*handler_t)(int);
handler_t fp[5]; // much clearer
Linkage: static, extern, file scope
You probably see some c files containing the word static or extern. Here’s the basic rule:
staticat file scope: internal linkage. The symbol is private to this translation unit. Use this for helpers in a.cfile.staticat block scope: same name visibility, but the variable’s lifetime is the entire program and it’s zero-initialized once.extern: this is a declaration only, no storage. Used in headers to announce a variable defined in some.cfile.- File scope variable with NO
static: external linkage by default. Visible to all translation units.
The most common bug you might make as beginner is to define a variable in a header. It is kinda acceptable for class projects, but NOT anywhere near production.
// header.h
int global_counter; // every .c that includes this defines its own,
// OR you get a multiple-definition link error.
Correct pattern:
// header.h
extern int global_counter; // declaration only
// one .c file
int global_counter; // the single definition
Preprocessor macros
Some macros definition might not work the way you want them to:
#define SQ(x) x*x
SQ(1 + 2); // expands to: 1 + 2*1 + 2 = 5
Always parenthesize macro arguments AND the whole expansion:
#define SQ(x) ((x)*(x))
SQ(1 + 2); // ((1+2)*(1+2)) = 9
Beware side effects in macro arguments:
#define MAX(a,b) ((a) > (b) ? (a) : (b))
MAX(i++, j++); // the larger one gets incremented TWICE
When you can, do static inline functions. They’re type-safe, debuggable, and don’t evaluate arguments multiple times.
char may be signed or unsigned
Whether plain char is signed or unsigned is implementation-defined. On most ARM platforms it’s unsigned; on x86 it’s typically signed.
char c = 0xFF;
if (c == 0xFF) { ... } // false on platforms with signed char.
// c sign-extends to -1, compared to 255.
Here’s some generic rules to live by peacefully:
- For text, use
char. - For bytes (binary data), use
unsigned charoruint8_t. - For indexing tables by character value, cast to
unsigned charfirst:
isalpha((unsigned char)c); // the ctype functions require this
Passing a negative char to isalpha or similar is UB on systems with signed char.
How to make life easier when writing in C?
Always build with this flags at all times:
-std=c11 -Wall -Wextra -Wpedantic -Wshadow -Wconversion
-Wstrict-prototypes -Wmissing-prototypes
For test and debug builds, add sanitizers:
-fsanitize=address,undefined -fno-omit-frame-pointer -g
Most C bugs that ship to production were warnings the developer ignored, so don’t ignore the warnings and start fixing them.
Quick Summary
If you do not read anything else, hopefully this table is useful for you.
| Topic | Content |
|---|---|
| Pass by value | Function modifies a copy, caller sees nothing |
| Array decay | sizeof lies in functions; int ** is not 2D |
| Undefined behavior | Compiler assumes UB cannot happen, deletes your code |
| Integer promotions | Signed vs unsigned comparisons go wrong silently |
| Evaluation order | f(i++, i++) is unspecified or UB |
| Strict aliasing | Type-punning casts break under optimization; use memcpy |
| String literals | Read-only; modifying is UB |
| Struct padding | sizeof is not the sum of member sizes; memcmp lies |
| Declarator syntax | int *a[10] vs int (*a)[10] are different |
| Linkage | Variables in headers cause multi-def or duplication |
| Macros | Always parenthesize; never put side effects in args |
char signedness |
Implementation-defined; cast to unsigned char for bytes |
50.005 CSE