- Why Kernel and User Mode? (Basic)
- Two Protection Mechanisms (Basic)
- Adjusting
XP(Basic) - Classify and Trace (Intermediate)
- Timer Quantum, Edge Detection, and Scheduler Overhead (Challenging)
- Who Should Adjust
XP? (Basic) - Implications of
PC31as Status Bit (Intermediate) - Trap, System Call, and Context Switch Trace (Challenging)
50.002 Computation Structures
Information Systems Technology and Design
Singapore University of Technology and Design
Virtual Machine
Each topic’s questions are grouped into three categories: basic, intermediate, and challenging. You are recommended to do all basic problem set before advancing further.
Why Kernel and User Mode? (Basic)
An SUTD student claims: “User mode restrictions are unnecessary. The compiler can simply refuse to emit LD or ST whose address is in the kernel range, and refuse to emit branches that cross PC31.”
Give two distinct technical arguments rebutting this claim. Each argument should identify a class of attacks the compiler approach cannot prevent.
1. Trust boundary: The compiler is part of the user toolchain. A malicious user can use a different compiler, hand write assembly, patch the binary after compilation, or simply use a hex editor. Any safety property whose enforcement lives in software the attacker controls is not enforcement at all, it is a request. Hardware is the only place a rule can be made binding on code the attacker chose to run.
2. Computed addresses: Many memory and control instructions take an address from a register that is computed at runtime: LD(Ra, k, Rc) loads from Reg[Ra] + k, and JMP(Rc) jumps to Reg[Rc]. The compiler in general cannot know the runtime value of Reg[Ra] (it may depend on user input, on a pointer from a data structure, on the result of an arithmetic loop). Even a perfectly cooperative compiler cannot statically refuse to emit a load whose address turns out to be in the kernel range, because the compiler does not know what that address will be. The check has to happen when the value is actually formed, which is **in the datapath, in hardware**.
Two Protection Mechanisms (Basic)
A user mode process is executing at \(\text{PC} = \texttt{0x00400000}\). The register file holds \(\text{Reg}[R30] = \texttt{0x80001234}\) and \(\text{Reg}[R2] = \texttt{0x80000020}\).
The next two instructions in the user program are:
JMP(R30, R0)
LD(R2, 0x10, R3)
Answer the following questions:
- What value lands in
PCafterJMP(R30, R0)executes? What value lands inReg[R0]?Show AnswerThe candidate next PC from JMP is
Reg[R30] = 0x80001234. In user mode (PC31 = 0), the hardware masks the top bit of any value being written into PC, so the actual write is0x00001234. The link register receivesPC + 4 = 0x00400004, with no masking needed since that value already has MSB 0. - Suppose in alternate ISA, a
JMPinitiated to kernel space that failed caused the user program continues with the next instruction at the user side. What address does the memory unit see whenLD(R2, 0x10, R3)executes, and what doesReg[R3]receive?
The address adder computes Reg[R2] + 0x10 = 0x80000030. The notes specify that LD, LDR, and ST in user mode ignore the MSB of the computed address, so the memory unit sees 0x00000030. Reg[R3] receives M[0x00000030], which is in user space.
- The two questions above involve two different hardware mechanisms that both protect the kernel. Name where in the datapath each mechanism lives, and explain in one sentence why a single mechanism cannot cover both cases.
Show Answer
The ALU computes `EA`
Reg[R2] + 0x10 = 0x80000030. If we specify thatLD,LDR, andSTin user mode ignore the MSB of the computed address, we need to ensure that the datapath always mask the highest bit of `EA` to `0`, such that the Memory Unit receives0x00000030. For the `JMP` protection, we `AND` `PC31` with `Reg[Ra]31` to disallow switching from user to kernel mode via `JMP`.
Adjusting XP (Basic)
The asynchronous interrupt handler at XAddr ends with
SUBC(XP, 4, XP)
JMP(XP)
The ILLOP handler, used for SVC and exceptions, ends with:
JMP(XP)
with no SUBC. Both handlers entered kernel mode via the same hardware mechanism: the Control Unit set Reg[XP] <- PC + 4 and forced PC to a fixed entry address.
Answer the following questions:
-
Pick a concrete user address, say
PC = 0x00002040, and explain whatReg[XP]contains at the moment each handler begins executing, for the case where the user instruction at0x00002040was the one that caused the entry.Show AnswerIn both cases the hardware writes
Reg[XP] <- PC + 4 = 0x00002044at the moment of entry. The hardware does not know or care whether the trigger was async or sync; it always saves the address of the next instruction. -
For each handler, state the address at which the user program resumes after the handler returns. Show that the async handler’s resume address is correct and the ILLOP handler’s resume address is also correct, given the purpose of each.
Show AnswerAsync return computes
Reg[XP] - 4 = 0x00002040and jumps there. This re-executes the instruction at0x00002040, which is what we want: that instruction was interrupted before it could commit, so it has not yet had its effect, and we need it to run.ILLOP return jumps directly to
Reg[XP] = 0x00002044. This skips the instruction at0x00002040, which is also what we want: that instruction was anSVC(or other illegal opcode used as a supervisor call), and its entire purpose was to trap into the kernel. Re-executing it would just trap again. The kernel has now serviced the request, so we move past it. -
What would go wrong, in one sentence each, if you swapped the two endings: the async handler used
JMP(XP)alone, and the ILLOP handler usedSUBC(XP, 4, XP); JMP(XP)?Show AnswerAsync with
JMP(XP)alone: the resumed user program silently skips the instruction it was actually running when the timer fired, losing whatever side effect that instruction was supposed to produce.ILLOP with
SUBC(XP, 4, XP); JMP(XP): the resumed user program lands back on theSVCinstruction, which traps again, and the process is stuck in an infinite trap loop never making progress past the syscall.
Classify and Trace (Intermediate)
For each of the five events below, answer in three things:
- Is the event a synchronous or asynchronous interrupt?
- Which Beta entry point does the hardware jump to:
RESET,ILLOP, orXAddr? - What value does the hardware write into
Reg[XP], expressed in terms of the program counter at the moment of the event?
| Event | Explanation |
|---|---|
| A | A user program executing at PC = 0x00001000 issues SVC(0) to call getchar. |
| B | A user program executing at PC = 0x00001100 divides by zero. (Assume Beta treats this as an illegal instruction.) |
| C | The user presses a key on the keyboard while a user program is in the middle of a long MUL loop. |
| D | The system is powered on. |
| E | A user program executing at PC = 0x00001200 performs JMP(R5) with Reg[R5] = 0x00ABCD00, where 0x00ABCD00 happens to contain a word that is not a valid Beta opcode. |
Then: in which two of these events is Reg[XP] - 4 the address the kernel should resume at, and in which two is Reg[XP] itself the right resume address? Justify the leftover case.
Classification table:
| Event | Sync / Async | Entry point | Reg[XP] |
|---|---|---|---|
| A | Synchronous | ILLOP (0x80000004) | 0x00001004 |
| B | Synchronous | ILLOP (0x80000004) | 0x00001104 |
| C | Asynchronous | XAddr (0x80000008) | current PC + 4, wherever the MUL loop happened to be |
| D | Neither (reset is not an interrupt) | RESET (0x80000000) | undefined / not used |
| E | Synchronous | ILLOP (0x80000004) | 0x00ABCD04 |
The classification rule is the one stated in the notes: synchronous interrupts are caused by the instruction the CPU is currently executing (faulty opcode, divide by zero, SVC, bad branch target), while asynchronous interrupts come from hardware unrelated to the instruction stream (timer, keyboard, mouse).
Note for event E: by the time the trap fires, the CPU has already fetched from 0x00ABCD00 and discovered the opcode is invalid, so PC at the moment of the trap is 0x00ABCD00 and Reg[XP] becomes 0x00ABCD04. The fact that the user jumped to that address is irrelevant; the trap is attributed to the bad fetch, not to the JMP.
Resume addresses:
Reg[XP] - 4is correct for C only. Asynchronous interrupts pause an instruction that had not yet committed, so the kernel must re-execute it.Reg[XP]itself is correct for A and B. In A the SVC has done its job (trap into the kernel) and should not run again. In B the divide by zero is a fatal exception; the kernel will most likely terminate the process, but if for some reason it chose to resume, it must resume past the faulting instruction, otherwise the same fault recurs immediately.- D is the leftover case. Reset is not a resume at all. There is no interrupted process, no saved state, and
Reg[XP]is meaningless. The kernel boots from scratch. - E is interesting: it is an illegal opcode that the user program reached by an otherwise legal jump. The kernel will almost certainly terminate the process (this is an exception, not a deliberate SVC), so there is no resume. If for some reason the kernel did try to resume, it would have to resume at
Reg[XP]itself, since re-executing the bad opcode would just trap again.
Timer Quantum, Edge Detection, and Scheduler Overhead (Challenging)
A Beta CPU runs at \(f_\text{cpu} = 25\) MHz. The OS scheduler uses an asynchronous free running counter that is 24 bits wide, clocked at \(f_\text{tmr} = 100\) kHz. Bit \(k\) of this counter is fed through a rising edge detector that is itself clocked by the CPU clock, and the output of that edge detector drives the IRQ line of the Control Unit.
The purpose of this scheduler is to perform context switching.
Here’s an illustration of the setup:

Answer the following questions.
-
Derive an expression for the quantum \(T_k\) (the wall clock time between two consecutive
IRQpulses) as a function of \(k\) and \(f_\text{tmr}\). Compute \(T_k\) in milliseconds for \(k = 10\).Show AnswerBit
kof a free running counter toggles every2^kcounter ticks, so its period is2^(k+1)counter ticks. Each counter tick lasts1/f_tmrseconds, so:T_k = 2^(k+1) / f_tmrFor
k = 10andf_tmr = 100kHz:T_10 = 2048 / 10^5 ≈ 20.48 ms. -
The statement above stresses that the rising edge detector be clocked by the CPU clock, not the timer clock. Explain the precise failure mode that occurs if the detector is clocked by the timer clock instead. Your answer should refer to how the Control Unit samples
IRQand to the duration of the resulting pulse measured in CPU cycles.Show AnswerThe Control Unit samples
IRQonce per CPU clock edge. For correct behaviour the pulse onIRQmust be high for exactly one CPU cycle, otherwise the Control Unit will seeIRQ = 1on multiple consecutive cycles and re-trap toXAddrrepeatedly.If the edge detector is clocked by the slower timer clock (100 kHz here), its output pulse lasts one timer cycle, which is
f_cpu/f_tmr = 250CPU cycles wide. The Control Unit will therefore seeIRQ = 1on roughly 250 consecutive CPU cycles. Each of those cycles either (1) re-asserts the trap (settingReg[XP] <- PC + 4andPC <- XAddr) or (2) ifPC31is now 1 and the CPU masksIRQ(or disable interrupt), it might re-trigger interrupt again after it leaves the interrupt handler, if the interrupt handler takes less than 250 CPU cycles. In the unmasked case, `Reg[XP]` gets overwritten 250 times, and it ends up pointing into the handler itself rather than into the interrupted user program. This means the interrupted process is unrecoverable. -
Suppose one full context switch (entry to
XAddr, register save, scheduler decision, register restore, return) costs \(C = 200\) CPU cycles. This means we have an overhead of 200 CPU cycles. Using the result from the first question with \(k = 10\), compute the fraction of CPU time spent on context switching overhead. Then determine the smallest \(k\) for which this overhead is strictly below 0.1%.Show AnswerNumber of CPU cycles per quantum is
N_k = T_k x f_cpu = 2^(k+1) x f_cpu / f_tmr = 2^(k+1) x 250.For
k = 10:N_10 = 2048 x 250 = 512,000. Overhead is200 / 512,000 ≈ 0.039%.For overhead below 0.001 we need
200 / (2^(k+1) x 250) < 0.001, that is2^(k+1) > 800. Since2^10 = 1024 > 800and2^9 = 512 < 800, the *smallest* acceptable value isk+1 = 10, sok_min = 9.We can easily prove this by computing more cases: at
k = 9,N_9 = 1024 x 250 = 256,000and200/256,000 ≈ 0.078% < 0.1%. Atk = 8,N_8 = 128,000and overhead is≈ 0.156%. -
The scheduler handler return sequence ends with
SUBC(XP, 4, XP)followed byJMP(XP). A student proposes swapping them, on the grounds that “JMP is one cycle, SUBC is one cycle, and reordering saves nothing but is also harmless”. Is the student correct? Explain in two sentences, and state the semantic purpose of theSUBCstep.Show AnswerThe student is wrong.
JMP(XP)transfers control immediately, so theSUBCthat follows it never executes; the program counter is already pointing elsewhere.The
SUBCexists because the hardware savesPC + 4(the address of the instruction after the interrupted one) intoReg[XP], but on resume we want to re-execute the interrupted instruction itself, whose address isReg[XP] - 4. -
Consider an alternative ISA in which, on an asynchronous interrupt, the hardware saves the current PC (not PC + 4) into
Reg[XP]. Describe how the return code changes, and identify the new constraint this places on the hardware regarding the interrupted instruction’s side effects. Is this constraint stricter, looser, or equivalent to Beta’s?Show AnswerThe handler return becomes simply
JMP(XP)with no decrement, which is slightly cleaner.The constraint is identical to Beta's, just packaged differently. In both schemes the interrupted instruction must **not** have committed any architectural side effects (no register write, no memory store, no flag update), because both schemes resume by re-executing it. Beta saves PC + 4 and in turn, it must decrements on return (of interrupt), while the alternative ISA saves PC directly so it doesn't have to decrement upon return from interrupt handler. Either way the hardware must abort the instruction cleanly. The constraint is therefore equivalent. The only real difference is that Beta forces every handler to **remember** the decrement, which is one more place a kernel programmer can introduce a bug.
**Note**: If this alternate ISA also saves `PC` on `ILLOP`, then it must **increment** `XP` by `4` before `JMP(XP)` and returning to the calling process. Otherwise, the trap/SVC will be re-executed.
Who Should Adjust XP? (Basic)
A scheduler’s final routine should “return to user” (resume a process). This procedure needs to know whether the process being resumed was last suspended by an asynchronous interrupt or by an SVC, because the former requires SUBC(XP, 4, XP) and the latter does not. Propose a concrete way for the kernel to record this in the process table, and explain why mixing the two cases up is a correctness bug, not merely a performance bug. Describe the visible symptoms in each direction of mistake.
There are multiple viable answer to this. Here's one proposed method
Add a one bit field, call it resume_kind, to each process table entry. The async interrupt entry handler at XAddr writes resume_kind = 1 (meaning "decrement XP on resume") immediately after saving registers. The ILLOP entry handler writes resume_kind = 0 (meaning "do not decrement"). The restore stub branches on this bit and conditionally executes the SUBC.
Mixing the two up is a correctness bug:
- SVC suspended process resumed with the decrement.
Reg[XP]was set to the address of the instruction afterSVC, namely0x00002008. Decrementing gives0x00002004, which is theSVCitself. On resume, the user re-executes the supervisor call, which traps back into the kernel, which resumes again with the decrement, and so on forever. The visible symptom is an infinite trap loop. The user program never makes progress past the syscall. - Async suspended process resumed without the decrement. The hardware saved
PC + 4, wherePCwas the address of the interrupted instruction. Without the decrement, the resume jumps toPC + 4, which is the instruction after the one that was interrupted. The interrupted instruction is silently skipped. The visible symptom depends on what that instruction was: a missing register update, a missing store, an arithmetic result that never happened. Often the program does not crash immediately but produces wrong results much later, which is the worst kind of bug.
Neither symptom is "the program ran a bit slower". Both change observable behaviour, so the bit is mandatory.
Implications of PC31 as Status Bit (Intermediate)
A user mode program executing at \(\text{PC} = \texttt{0x00400000}\) contains the instruction
LD(R0, 0x1000, R1)
with Reg[R0] holding the value 0x80000000. Trace what actually happens: what address is presented to the memory unit, what value lands in Reg[R1], and whether the kernel data nominally stored at 0x80001000 was protected. Then state what additional hardware would be required for a user load of 0x80001000 to actually fault rather than silently “aliasing”.
The effective address computed by the adder is Reg[R0] + 0x1000 = 0x80001000. Beta ISA specify that for LD, LDR, and ST, the MSB of the computed address is ignored when in user mode. So the value sent to the memory unit is 0x00001000, and Reg[R1] receives M[0x00001000], a location in user space. The kernel data at 0x80001000 was not read.
So protection is achieved, but by aliasing rather than by faulting. The user sees a shadow of every kernel address 0x8000XXXX at 0x0000XXXX. The OS must therefore be careful not to store anything sensitive in low memory and must accept that user programs can read and write the low addresses freely.
To make a user access to 0x80001000 fault outright, you need a hardware check that compares the MSB of the computed address against PC31 and raises a synchronous trap (e.g. routes the PC to ILLOP) when a user mode access targets the kernel half. In practice this is one of the jobs of an **MMU**: a privilege bit on each page or region, checked by a comparator in the memory access stage.
A user mode program executes BEQ(R0, label, R1) where the computed branch target \(\text{PC} + 4 + 4 \cdot \text{SXT(literal)}\) has MSB 1. Describe the hardware mechanism in Beta that prevents this from entering kernel mode. Then contrast with JMP(R31) when Reg[R31] contains 0x80004000. Are both protected by the same mechanism, and at what stage of the datapath?
For both instructions the protection is enforced at the write to the PC register itself, not at the adder or at the register file. The general rule is: when in user mode (PC31 == 0), the bit that would land in PC[31] on the next clock edge is forced (or ANDed) to 0.
For BEQ, the branch target adder may compute any 32 bit value, including one with MSB 1. The MSB of that value is masked before it is muxed into the PC.
For JMP, the target comes from Reg[R31] which the user has free reign over (so Reg[R31] = 0x80004000 is fine to store). Again, the MSB is masked on the path from the register file output to the PC input.
So yes, both are protected by the same mechanism (masking PC[31] on the next-PC path) and at the same stage (the PC write). The masking happens after whatever computation produced the candidate next PC, which is what makes it impossible for the user to escape via a clever target value.
Our lecture notes name three legal entry points to kernel mode: RESET (0x80000000), ILLOP (0x80000004), and XAddr (0x80000008). Why is it a design choice that these addresses be hardwired into the CPU rather than being soft configurable by the kernel at boot?
Hardwiring guarantees that the very first instruction executed after any privileged transition is one chosen by the kernel author at assembly time. The kernel author places the handler at the agreed address of XAddr or ILLOP, and the CPU's hardwired entry point is chosen to match. The property the hardware needs to give the OS is: if PC31 transitions from 0 to 1, the new PC is one of three known kernel addresses. Hardwiring is the simplest way to provide that property.
There are three reasons to prefer hardwiring these entry points on Beta:
- Bootstrap necessity for
RESET.RESETmust be hardwired because it is the address the CPU fetches from at power-on, before any code has run. There is nothing that could have set it. This is a chicken and egg problem: you would need kernel code to configureRESET, butRESETis how you get to kernel code in the first place. Hardwiring breaks the dependency. - Simplicity of the datapath. Making
XAddrandILLOPwritable would add a register, a write port, and a mux for each entry point, plus the control logic to protect those registers from user mode writes. Hardwiring reduces all of this to three constant values embedded in the Control Unit. - One fewer thing for the kernel to get right. A writable entry register is a new initialisation step in the boot sequence, and therefore a new way for a buggy kernel to fail. Hardwiring eliminates that class of bug entirely: there is no initialisation to forget and no register to accidentally overwrite later.
Note that hardwiring is *not* the only valid design choice. Real architectures such as x86 (with the IDTR register) and ARM (with VBAR) make the interrupt vector base writable by privileged code. This buys flexibility such as a relocatable kernel, multiple handler tables, and virtualization support, at the cost of the extra hardware and the discipline to set the register early in boot. For a teaching CPU like Beta, the hardwired approach is the right tradeoff; for a production CPU, the writable approach usually wins.
Trap, System Call, and Context Switch Trace (Challenging)
A user process P1 is executing a C statement int c = getchar();. The relevant assembly is:
0x00002000: ADDC(R31, 3, R0) || service index 3 = read keyboard
0x00002004: SVC(0) || illegal opcode 1, traps to ILLOP
0x00002008: ST(R0, c, R31) || store returned char available at `R0` into c
SVC(0) is an instruction with an opcode that does not correspond to any real Beta instruction, used by convention to make a supervisor call (generic). Suppose P1 is at PC = 0x00002004 when the SVC is fetched. The ILLOP handler at 0x80000004 reads R0, dispatches to a keyboard read service routine because it found the value 3 in R0 (note that these values are arbitrary). The service routine marks P1 as blocked (waiting for input) and calls the scheduler. The scheduler picks a different ready process P2 to run.
Trace, in order, every write to PC and Reg[XP] from the moment the SVC is fetched until P2 begins executing its first instruction. For each write, label it as hardware (datapath does it automatically) or software (an explicit instruction in some handler does it).
- Hardware (cycle that fetches
SVC(0)at0x00002004): the Control Unit decodes the illegal opcode, assertsPCSEL = 011, writesReg[XP] <- PC + 4 = 0x00002008, and writesPC <- 0x80000004(ILLOP). The next fetch will be from the kernel, withPC31 = 1. - Software (in the ILLOP handler): Just like callee entry procedure, it should save
R0throughR30of P1 into P1's slot in the process table. The currently held valueReg[XP] = 0x00002008is saved as part of this, so P1 **remembers** where to resume. - Software (handler dispatch logic): reads
Reg[R0] = 3, calls the keyboard read service routine, e.g: `KBD_READ`. - Software (keyboard service routine): The service routine marks P1 as blocked and calls the scheduler to start P2. No
PCorXPwrites happen here, just ordinary subroutine linkage between one kernel code to another kernel code. - Software (scheduler): selects P2 from the ready queue. Note that while the scheduler is running,
PC31 = 1, so any timer interrupt that fires now is ignored by the hardware (Beta is non reentrant). - Software (scheduler restore P2's context, also known as *stub*): loads
R0throughR30for P2 from P2's process table slot. This includes loading P2's saved return address intoReg[XP]. - Software (final instruction of the restore procedure): executes
JMP(XP)to resume P2. This is assuming that `XP` has already been adjusted properly prior to this. - Hardware (the
JMP(XP)): writesPC <- Reg[XP](withPC[31]masked to 0 because the destination is in user space). On the next cycle P2 fetches its first resumed instruction withPC31 = 0.
In the lecture notes, we state that Beta disables interrupt when PC31 = 1. This is called non reentrant: while PC31 = 1, IRQ is masked (ignored) in hardware. Identify TWO specific moments in the trace you did in the previous part above at which an asynchronous timer interrupt, if not masked, would corrupt the system. For each, describe the corruption concretely (what register or memory location ends up holding what wrong value).
Moment 1: while the ILLOP handler is saving P1's registers. Suppose the handler has already executed ST(R0, save_location) and ST(R1, save_location + 4), and is about to save R2. A timer interrupt fires. The hardware traps to XAddr, writing Reg[XP] <- PC + 4, where PC is now somewhere inside the ILLOP handler. The original Reg[XP] = 0x00002008 (P1's resume address) is overwritten and lost. The new `XAddr` handler then begins saving "registers", but the register file by this point holds a mix: some entries are still P1's, others have been clobbered by handler scratch work. The state written to the process table is **meaningless**, and on the eventual resume P1 sees garbage register contents and a Reg[XP] pointing into kernel code.
Moment 2: while the scheduler is restoring P2's registers. Suppose the restore loop has loaded R0 through R10 of P2 but has not yet loaded R11 through R30. A timer interrupt fires. The hardware traps to XAddr. The new handler invocation now saves the current register file as if it were a meaningful process state. But the current register file is half P2 (R0 through R10) and half whoever ran previously, perhaps the scheduler's own working values in R11 through R30. This Frankenstein state is written into some process table slot. On its eventual resume, P2 (or whichever process the kernel believes that slot belongs to) sees registers that no real execution ever produced. If the restore loop happened to have already touched XP, the saved XP of this spurious trap points into the middle of the scheduler's restore loop, and "resuming" it would re-enter the scheduler at an arbitrary instruction.
In both cases the deeper failure is the same: the process table entry that represents a process must always reflect a **consistent** snapshot of one process at one moment in time, and the save and restore code achieves that only by virtue of running **atomically**. You might want to search what "atomic" means. We will dive deeper into this next term.
50.002 CS