- Stack and Procedures
50.002 Computation Structures
Information Systems Technology and Design
Singapore University of Technology and Design
Stack and Procedures
You can find the lecture video here. You can also click on each header to bring you to the section of the video covering the subtopic.
Overview
In the previous chapter, we learned the basics of how to naively compile C-language into \(\beta\) assembly language. \(\beta\) UASM provides a layer of abstraction such that we don’t need to bother ourselves with the details on how to load each and every bytes of instruction onto the memory unit, or keeping up with accounting matters such as physical memory addresses (we can replace these with labels instead).
In this chapter, we will learn about function call procedures, and why we need to understand another concept called the stacks. Both will allow us to have reusable code fragments that are called as needed.
Procedures and Functions
Consider the following declaration and implementation of function fact
below. It receives one argument int n
, and returns an int
.
1
2
3
4
5
6
7
8
int fact(int n){
int r = 1;
while (n > 0){
r = r*n;
n = n-1;
}
return r;
}
The function is implemented as a fixed set of procedure / instructions. We can call it anywhere afterwards:
1
2
int result_1 = fact(4);
int result_2 = fact(9);
If we were to naively assemble this, we can translate this into the following \(\beta\) assembly source code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
.include beta.uasm
|| we call fact(4) first
LD(R31, n_1, R1) | load 4 to R1
ADDC(R31, 1, R2) | instantiate r
check_while_fact_4: CMPLT(R31, R1, R0) | compute whether n_1 > 0
BNE(R0, while_true_fact_4, R31) | if R0 != 0, go to while_true_fact_4
ST(R2, result_1, R31) | store to result_1
|| then we call fact(9)
LD(R31, n_2, R1) | load 9 to R1, rewriting its old value
ADDC(R31, 1, R2) | reset r
check_while_fact_9: CMPLT(R31, R1, R0) | compute whether n_1 > 0
BNE(R0, while_true_fact_9, R31) | if R0 != 0, go to while_true_fact_9
ST(R2, result_2, R31) | store to result_2
HALT() | stop
while_true_fact_9: MUL(R1, R2, R2) | r = r*n
SUBC(R1, 1, R1) | n = n-1
BEQ(R31, check_while_fact_9, R31) | always go back to check_while_fact_9
while_true_fact_4: MUL(R1, R2, R2) | r = r*n
SUBC(R1, 1, R1) | n = n-1
BEQ(R31, check_while_fact_4, R31) | always go back to check_while_fact_4
n_1 : LONG(4)
n_2 : LONG(9)
result_1 : LONG(1)
result_2 : LONG(1)
There’s obviously a few issues here:
- The code is not scalable. What if we call the function
fact
many more times? How long the assembly code is going to be?1 2 3 4 5 6
int result_1 = fact(4); int result_2 = fact(9); int result_3 = fact(20); int result_4 = fact(39); int result_5 = fact(18); int result_6 = fact(99);
- There’s so many repeated, boilerplate code. We are wasting memory (RAM) space.
while_true_fact_4
andwhile_true_fact_9
is technically identical for all 3 instructionsMUL
,SUBC
, andBEQ
, only differ in the second argument ofBEQ
: either branching tocheck_while_fact_4
andcheck_while_fact_9
.- Same issue with
check_while_fact_4
andcheck_while_fact_9
(theCMPLT
,BNE
, andST
instructions)
- How do we know if we can overwrite the existing contents on any register
Rx
?
In principle, a function:
- Is a callable, reusable series of instructions.
- It has a single NAMED entry point. It means we know the address of first instruction for this function in memory.
- Parameterizable (can access or receive a predefined number of arguments).
- Has local variables, which cannot be accessed anymore once the function returned.
Once a function returns, the CPU needs to know how to return to the caller, i.e: to know where is the return address of the caller.
For example, given this simple code:
1
2
3
4
5
6
7
8
9
10
11
int fact(int n){ --- (2)
int r = 1; --- (3)
while (n > 0){ --- (4)
r = r*n; --- (5)
n = n-1; --- (6)
}
return r; --- (7)
}
int result_1 = fact(4); --- (1)
int result_2 = fact(9); --- (8)
We know that we need to execute fact
twice, the first run with argument 4
and the second run with argument 9
.
-
The first line of instruction that is executed is labeled with (1) above. We then enter the function, executing the lines labeled as (2) to (7) in sequence.
-
After (7), the CPU must know how to return to the caller, that is the address of instruction (1).
-
Instruction (8) tells the CPU to enter the function again for the second run, thus executing (2) to (7) for another round, before finally stopping.
Procedure Linkage and Stack
Referring to the fact
example in the previous section, notice a few accounting issues that we need to address in order to have a scalable and memory efficient function calls.
Assuming we have a single CPU, and that the instructions are loaded to the Memory Unit from address 0
and onwards sequentially,
We know that r
and n
are local variables within function fact
. We can no longer access them, e.g: print them outside of the function. This means that we need to clean them up (from any storage unit) once the function returns.
We also need to establish a location where:
- The function can always access its arguments:
4
and9
for the first and second run respectively. - The function is expected to put the return value
- The CPU can find the address of the caller, i.e: where instruction (8) is in memory.
Procedure Linkage Problem
In summary, the procedure linkage problems that we need to solve are:
- We need to find a way to pass arguments into procedures
- Procedures need their own local variables
- Procedures need to be able to use the CPU registers without worrying about overwriting their original content.
- Procedures might need to call other procedures, including themselves (recursion)
Procedure Linkage Convention
These issues can be solved by establishing a kind of procedure linkage convention. We assume that the following is always obeyed whenever we are executing instructions within a function:
- Return value is always stored at
R0
- Arguments can be found in a dedicated space in the Memory Unit, called the Stack. The address of the top of the stack (first unused location) can always be found in
R29
. The address of the base of the stack can always be found inR27
.We label
R29
asSP
(Stack Pointer), andR27
asBP
(Base Pointer). - Return address is always stored at
R28
.We label
R28
asLP
(Linkage Pointer).
A caller must obey the following sequence :
- Put arguments on the stack
- Branch to the target function, leaving return address in
LP
- Remove arguments on stack upon return
- Obtain return value from
R0
(if any)
A callee must obey the following sequence :
- Perform promised computation
- Leave result in
R0
(if any) - Leave the state of all registers in REGFILE except
R0
unchanged upon returning to the caller. - Leave stack data unchanged upon returning to the caller.
The “Stack” implementation
In general, a stack is a type of data structure where you can perform two essential operations: PUSH
and POP
. Its principle is last-in-first-out.
You always add item via PUSH
operation to the top of the stack, and can only remove the topmost item in succession via POP
operation from the top of the stack.
How do we implement the stack data structure to aid our function execution?
We reserve an arbitrary block of location in our Memory Unit to be space for the stack. Illustrated below is a block of memory unit from address 0x010C
to 0x0128
reserved as our stack:
We have conventions as mentioned:
R29 (SP)
contains the address of the top of the stack (available location to write to).- Addresses are increasing downwards, hence our stack grows downwards.
We can PUSH
(add data) to the stack via these two instructions:
ADDC(SP, 4, SP)
(increaseSP
’s content by4
so that it points to the next line address)ST(Rx, -4, SP)
(store content of chosen registerRx
to address pointed byReg[SP]-4
).
To POP
(remove data) from the stack, we can do so via these two instructions:
LD(SP , -4, Rx)
(load the content pointed byReg[SP]-4
to a chosenRx
)SUBC(SP, 4, SP)
(decreaseSP
’s content by4
so that it points to the previous line address)
In \(\beta\) uasm
, we simply create these macros for PUSH
and POP
:
PUSH(Rx)
: pushReg[Rx]
onto the stackPOP(Rx)
: pop the value on top of the stack toReg[Rx]
And also the following macros for easier stack management:
ALLOCATE(k) : ADDC(SP, 4*k, SP)
DEALLOCATE(k) : SUBC(SP, 4*k, SP)
They’re both used to move
SP
to a reserved location for the start of the stack.
Example of stack usage
Allocating the stack
Assuming the initial content of SP
is 0
, then in order for SP
to point to the address 0x010C
shown in the figure above, we can write:
1
2
3
.include beta.uasm
ALLOCATE(67)
0x10C
is 268 in decimal. Since macroALLOCATE
will multiply the input by4
before storing it at SP, we need to put268/4=67
as the input toALLOCATE
.
This is done to tell the CPU that the stack starts at address 0x010C
onwards, so that whatever content stored in the earlier addresses will not be overwritten by the stack.
Otherwise, the stack will start at address 0. Typically what is stored address
0
is the first executable instruction (i.e: the beginning of our program). We usually don’t want to overwrite any instructions.
Adding items to the stack
Suppose we put the following values: 16, 17, 18
to R1
, R2
, and R3
respectively:
1
2
3
ADDC(R31, 16, R1)
ADDC(R31, 17, R2)
ADDC(R31, 18, R3)
We can put these values into the stack using:
1
2
3
PUSH(R1)
PUSH(R2)
PUSH(R3)
Upon execution of these series of instruction, the state of the stack is:
Notice that the “top” of the stack is at the bottom, and that the stack grows downwards by convention.
Heads up: familiarize yourself and get used to tracking the “state” of the stack, PC
, and all registers in the REGFILE after executing each line of instruction .
Now suppose we use R1
, R2
, R3
for other stuffs and overwrite its values:
1
2
3
MULC(R1, 2, R1)
MULC(R2, 3, R2)
MULC(R3, 4, R3)
The new state of the registers are now written in red. The stack stays the same, and so does SP
.
We can now say that the stack contains the old value of the registers R1, R2, R3
, as shown:
Removing items from the stack
To remove items from the top of the stack, we can use POP
instruction:
1
2
3
POP(R3)
POP(R2)
POP(R1)
The state of the stack and the registers is therefore as shown. Notice how POP
instructions can be used to restore the registers state to before as in Step 1.
The remnants of data that was pushed to the stack actually stays in memory, but its rendered irrelevant because it can be overwritten again by other instructions (hence space is seen as freed). In fact, there’s no such thing as erasing contents in the memory unit unless we explicitly write a bunch of zeroes to make it disappear. We simply always overwrite them.
Resetting the stack
Finally, we reset a stack by simply changing the value SP
. For example, we can return the state of SP
to 0
as it was before Step 1.
1
DEALLOCATE(67)
Try out these instructions in
bsim
, and observe the state of the stack and registers after executing each instruction as practice.
Implementing Procedure Linkage Contract Using Stack
Recall that:
A (function) caller must obey the following calling sequence ,
- Put arguments on the stack
Put them in reverse order. Last argument is pushed first. First argument is pushed last.
- Branch to the target function, leaving return address in
LP
- Remove arguments on stack upon return
Can be done using
DEALLOCATE(N)
whereN
is the number of arguments - Obtain return value from
R0
(if any)For example:
ST(R0, location, Rx)
Let’s use our factorial function again:
1
2
3
4
5
6
7
8
9
10
11
int fact(int n){
int r = 1;
while (n > 0){
r = r*n;
n = n-1;
}
return r;
}
int result_1 = fact(4) -- (first call)
int result_2 = fact(9) -- (second call)
The function fact
is called twice, so we have two callers.
We can hand assemble the first caller following the caller procedure above.
Firstly, allocate some memory space for caller variables and instructions.
To allocate memory space for variables, we can either:
- Write them after all the instructions
- Use the
.
operator to load them at the desired memory location, e.g:0x01B0
- Define labels and its values
- Then, redirect the instruction back at address
0
using the.
operator again
To allocate memory space for instructions (i.e: make SP
point to some unused memory location), we can use ALLOCATE
.
1
2
3
4
5
6
7
.include beta.uasm
. = 0x01B0 | load values at fixed location
result_1 : LONG(0)
result_2 : LONG(0)
. = 0x0000 | load instructions from address 0 onwards
ALLOCATE(50)
Secondly, implement the calling sequence
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|| Calling sequence
ADDC(R31, 4, R1) | put 4 to R1
|| (1) put argument on the stack
PUSH(R1)
|| (2) branch to the function, storing return address at LP
BEQ(R31, fact, LP)
|| (3) remove argument from stack after function returns
DEALLOCATE(1)
|| (4) obtain return value at R0
ST(R0, result_1, R31)
HALT()
We assume that fact
is a label that contains the address of the first instruction of the factorial function.
Thirdly, implement callee entry sequence and computation (body of the function)
Recall that a callee must:
- Leave the state of all registers in REGFILE except
R0
unchanged upon returning to the caller. - Leave stack data unchanged upon returning to the caller.
This is possible by implementing two parts: callee entry sequence and callee exit/return sequence.
The entry sequence is consisted of five steps:
PUSH(LP)
: to preserve the state ofLP
which containts the return address to the caller of this function.PUSH(BP)
: to preserve the state ofReg[R27]
ADD(SP, R31, BP)
: move the content ofSP
toBP
. This is to set the stack frame base for this function call.We have macro for this:
MOVE(SP, BP)
- Push all registers that we will use for the computation (except
R0
) - Load arguments from the stack
Afterwards, we then proceed with actually implementing the function.
Continuing our assembly code, writing the entry procedure is straightforward.
1
2
3
4
|| Callee entry sequence
fact: PUSH(LP) | (1)
PUSH(BP) | (2)
MOVE(SP, BP) | (3)
Now to figure out (4), we need to know the registers that we will use for computation of fact
.
Recall the implementation of the function fact
:
1
2
3
4
5
6
7
8
|| Assume (value of) n in R1 and (value of) r in R2
check_while_fact: CMPLT(R31, R1, R0)
BNE(R0, while_true_fact, R31)
BEQ(R31, done, R31)
while_true_fact: MUL(R1, R2, R2)
SUBC(R1, 1, R1)
BEQ(R31, check_while_fact, R31)
We can see that the registers that are used for the computation (apart from R0
) are : R1
and R2
R31
is “not counted” as it is always0
and you cannot rewrite new values into it.
Therefore the next instructions for the callee should be:
1
2
3
|| (4) Push the contents of all registers we will use for the computations onto to stack to preserve its old value
PUSH(R1)
PUSH(R2)
Then, we load the argument n
onto R1
, and put 1
in R2
. R1
holds the value of n
and R2
holds the value of r
.
1
2
3
|| (5) get arguments.
LD(BP, -12, R1) | Note: the first argument is at memory address BP-12
ADDC(R31, 1, R2)
Afterwards, we can continue with the implementation of fact
above as it is.
Think carefully about why the first argument is always at address pointed by
Reg[BP]-12
. This is a direct result of caller procedure and callee entry sequence (1) to (3).
Fourthly, implement callee exit sequence.
The exit sequence is consisted of six steps:
- Put return value at
R0
. - Restore all registers’ state that were used by the function computations using
POP
, in reverse order. ADD(BP, R31, SP)
: move the content ofBP
toSP
. This is to set the stack pointer back to the stack frame base for this function call.Remember, we need to leave the stack state as per the original state when the function returns. So whatever stuffs are added to the stack by this function must be removed. We can also use the macro for this:
MOVE(BP, SP)
POP(BP)
: restoreReg[BP]
to its original valuePOP(LP)
: restoreReg[LP]
to its original value – remember, this register contains the caller’s return address.JMP(LP, R31)
: return to the caller.
Let’s write them out in assembly, continuing our fact
example above:
1
2
3
4
5
6
7
8
9
10
11
|| (1) put return value r (originally in R2) at R0
done: ADD(R2, R31, R0)
|| (2) restore register contents
POP(R2)
POP(R1)
MOVE(BP, SP) | (3)
POP(BP) | (4)
POP(LP) | (5)
JMP(LP, R31) | (6)
Now to call fact
for the second time, we can simply write another calling sequence. The fact
instructions remain intact and reusable.
Combining everything, we have:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
.include beta.uasm
||| Space allocation for variables and instructions (set start of stack)
. = 0x01B0 | load values at fixed location
result_1 : LONG(0)
result_2 : LONG(0)
. = 0x0000 | load instructions from address 0 onwards
ALLOCATE(50)
||| First caller "calling sequence" -- fact(4)
ADDC(R31, 4, R1) | put 4 to R1
PUSH(R1) | step (1)
BEQ(R31, fact, LP) | step (2)
DEALLOCATE(1) | step (3)
ST(R0, result_1, R31) | step (4)
||| Second caller "calling sequence" -- fact(9)
ADDC(R31, 9, R1) | put 9 to R1
PUSH(R1) | step (1)
BEQ(R31, fact, LP) | step (2)
DEALLOCATE(1) | step (3)
ST(R0, result_2, R31) | step (4)
HALT()
||| Callee entry sequence
fact: PUSH(LP) | step (1)
PUSH(BP) | step (2)
MOVE(SP, BP) | step (3)
| step (4)
PUSH(R1) | n
PUSH(R2) | r
LD(BP, -12, R1) | step (5)
ADDC(R31, 1, R2)
||| Computation
check_while_fact: CMPLT(R31, R1, R0)
BNE(R0, while_true_fact, R31)
BEQ(R31, done, R31)
while_true_fact: MUL(R1, R2, R2)
SUBC(R1, 1, R1)
BEQ(R31, check_while_fact, R31)
||| Callee exit (return) sequence
done: ADD(R2, R31, R0) | step (1)
POP(R2) | step (2)
POP(R1)
MOVE(BP, SP) | step (3)
POP(BP) | step (4)
POP(LP) | step (5)
JMP(LP,R31) | step (6)
Run the program in bsim
step by step. At each instruction, pay attention on the state of all registers, the stack, and PC.
In this course, you are expected to mentally run each instruction line by line and give the state of all registers involved, the PC, and the stack at all times. Lots of practice is definitely needed.
At the end of the program, you should see that you have each answer in the memory address 0x01B0
and 0x01B4
:
An example with recursion
We can also implement fact
recursively:
1
2
3
4
5
6
7
8
int fact(int n){
if (n > 1){
return n * fact(n-1);
}
return 1;
}
int result_1 = fact(3);
The recursive function fact
can be hand assembled into:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
||| callee entry procedure
fact: PUSH(LP)
PUSH(BP)
MOVE(SP, BP)
| Preserve old register values before using them
PUSH(R1) | n
PUSH(R2) | temp reg
PUSH(R3) | temp reg
PUSH(R4) | for constant
LD(BP, -12, R1) | get first argument
ADDC(R31, 1, R4) | put 1 to R4 as a constant
||| Computation Part 1
begin_fact_check: CMPLT(R4, R1, R2) | compare if 1 < n, store in R2
BNE(R2, if_true, R31)
|| leave 1 in R0
ADDC(R31, 1, R0)
||| Callee exit sequence
exit_sequence: POP(R4) | Pop registers in reverse order, to restore each old register value
POP(R3)
POP(R2)
POP(R1)
MOVE(BP, SP)
POP(BP)
POP(LP)
JMP(LP, R31)
||| Computation Part 2
if_true: SUBC(R1, 1, R3) | compute n-1, store at R3
||| Recursive calling sequence
PUSH(R3)
BEQ(R31, fact, LP) | recurse
|| remove unused argument from stack when recursion returns
DEALLOCATE(1)
|| compute n * fact(n-1), store at R0 before returning
MUL(R0, R1, R0)
BEQ(R31, exit_sequence, R31) | return
The calling sequence is simple:
1
2
3
4
5
6
7
8
9
.include beta.uasm
ALLOCATE(100)
ADDC(R31, 3, R1)
PUSH(R1)
BEQ(R31, fact, LP) | call fact(3)
DEALLOCATE(1)
HALT()
You can add .breakpoint
in the code to allow for debugging, and observe the stack frame for each function call.
In this example, calling fact(3)
will cause fact(2)
, and fact(1)
to be called recursively.
The state of the stack frame right before fact(3)
branching, right before fact(2)
branching, right before fact(1)
branching, and right before fact(1)
calls its exit_sequence
is as follows:
Notice several things:
- The stack size for each callee frame is constant.
- In this example, it is 7 words (1 word is 32 bits).
- This is caused by 1 argument
PUSH
, twoPUSH
from callee entry sequence, and fourPUSH
of registers (to be used for computation).
- The current
BP
andSP
(rendered in black) points to the base and the top of the stack of the current caller.Base of the stack is not the start of the stack frame. It is defined as the location of the first item pushed into the stack by that caller after
MOVE(SP, BP)
step. - The location of the argument for the current callee is always 3 lines above the location pointed by the current
BP
.- It is the address pointed by the current
BP
subtracted by 12 (bytes).
- It is the address pointed by the current
- The return address of the current caller is always placed after the argument in the stack.
- This is the direct result of
PUSH(LP)
as the first instruction in the callee entry sequence. - For example, the return address for the first
fact
call:fact(3)
is at0x0014
(just look at the lower 16 bits for simplicity). - This is the address of instruction:
DEALLOCATE(1)
(see next section to understand why this is so).
- This is the direct result of
Again, test your understanding and sharpen your skills by mentally running each instruction one by one from the top and be aware of the current instruction, stack state, PC state, and register states at all times.
Instruction Loading Address
To figure out the address of each instruction, we can assume (if not given) that the address of the first instruction is at 0x0000
.
- Recall that each \(\beta\) instruction is 32 bits long (4 bytes).
- Therefore, the address of subsequent instructions are always increased by 4.
For instance, if we simply load this instruction into the memory unit starting from address 0x0000
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
.include beta.uasm
ALLOCATE(100)
ADDC(R31, 3, R1)
PUSH(R1)
BEQ(R31, fact, LP) | call fact(3)
DEALLOCATE(1)
HALT()
||| callee entry procedure
fact: PUSH(LP)
PUSH(BP)
MOVE(SP, BP)
...
It will obtain the following addresses:
This screenshot is taken from
bsim
just before we branch tofact
for the first time with argument3
. That is why thePC
is pointing to address0x0010
.
Notice that as mentioned, the address of instruction DEALLOCATE(1)
(macro for SUBC(SP, 4, SP)
) is 0x0014
.
Detail: About PC
We have learned before that PC
register always contain the address of the next instruction to be executed.
There’s one subtle point to realise: when the PC
is currently pointing to an address A
, the instruction at A
is actually already computed in the current cycle, but the result (of this computation) hasn’t been stored yet. The result will only be stored (in some register or memory address) only in the next cycle.
Remember that memory write is synchronized with the clock.
Take for example this particular execution state in the \(\beta\) CPU:
-
The current content of register
PC
is0x004
, that is where instruction:ADDC(R31, 3, R1)
resides. - The output of
ADDC
has already been computed in this cycle, as denoted in the blue value at the output port of the ALU:0x00000003
.- All values in blue are computed values (in this cycle) due to this current instruction
ADDC
.
- All values in blue are computed values (in this cycle) due to this current instruction
-
However, notice
Reg[R1]
still contains0
and not the new result3
yet. In this cycle, we are trying to write3
intoR1
. - Only at the next cycle,
Reg[R1]
finally stores the new value3
. The figure below shows the state of the \(\beta\) CPU at the following cycle:
Therefore, what we actually mean when we say that the PC
always point to the next
instruction executed is precisely that the result will only be synchronized in the next cycle (but computation is done in the current cycle).
Don’t worry too much about it. It’s just some subtle detail to appreciate.
An Example with Multiple Arguments
Consider the following function with multiple arguments:
1
2
3
4
5
int y(int m, int x, int c){
return m*x + c;
}
int result = y(2, 5, 3);
In order to call this function, we need to push each argument in the reverse order:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
.include beta.uasm
. = 0x0CC
result: LONG(0)
. = 0x000
ADDC(R31, 2, R1) | first argument
ADDC(R31, 5, R2) | second argument
ADDC(R31, 3, R3) | third argument
|| Calling Sequence
|| Push arguments in the reverse order
PUSH(R3)
PUSH(R2)
PUSH(R1)
|| Branch to the function
BEQ(R31, y, LP)
|| Deallocate the arguments
DEALLOCATE(3)
|| Store return value
ST(R0, result, R31)
HALT()
Then in the function y
, we obtain the arguments sequentially, starting from BP-12
for the first argument, BP-16
for the second argument, and so on:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
||| Callee entry sequence
y : PUSH(LP)
PUSH(BP)
MOVE(SP, BP)
| Preserve old register values before using them
PUSH(R1)
PUSH(R2)
PUSH(R3)
| Load arguments
LD(BP, -12, R1) | m
LD(BP, -16, R2) | x
LD(BP, -20, R3) | c
| Computation
MUL(R1, R2, R1)
ADD(R1, R3, R0) | leave the answer at R0
||| Callee exit sequence
| Return all register values (POP in reverse order)
POP(R3)
POP(R2)
POP(R1)
MOVE(BP, SP)
POP(BP)
POP(LP)
JMP(LP, R31)
Tough Problems
The sequences that we learned don’t really solve all problems.
Problem 1: Nested procedure definitions
In Python, we can define nested procedure as follows:
1
2
3
4
def f(x):
def g(y):
return x+y;
return g
This requires g
to access non-local variable x
. We would require some kind of “static-links” in stack frames.
C avoids this problem by outlawing nested procedure declarations.
Problem 2: Dangling references
1
2
3
4
5
6
7
8
int *p; /* a pointer */
int h(x) {
int y = x*3;
p = &y; // set p to be pointing to address of y, a local variable
return 37;
}
h(10);
print(*p);
The C-code above will compile, but when executed, unexpected behavior can happen.
Crash (segmentation fault), random stuffs being printed, etc.
In C/C++, life is harsh and it is the responsibility of the programmer to ensure that mistakes like these do not happen.
Java and Python will babysit and protect us from these mistakes, as there’s language restriction that forbids constructs that could lead to dangling references (we are also given automatic storage management, and lots of things are taken care of – garbage collection, variables allocation, etc).
Summary
You may want to watch the post lecture videos here:
The calling and callee sequence is designed such that we have a fixed convention for linking procedures. The data structure needed for procedure linkage is a stack, and it can simply be implemented using macroinstructions: PUSH
and POP
on some unused memory block, which address is stored in register R29: SP
.
It is extremely important to change the content of SP
(reserved stack pointer register) to reflect a memory address where there isn’t any important instruction or data in it.
Remember that PUSH
and POP
each require two clock cycles to complete, because each of them is consisted of two atomic \(\beta\) instructions.
We also reserve two other registers: R27: BP
and R28: LP
for this purpose, so that we know where to get the function arguments from the stack, and return address (back to the caller).
We do not use these registers
R27, R28, and R29
for other purposes.
The callee has to leave stack data unchanged upon returning to the caller, that is to clear whatever data that was put in the stack during its execution. As a result, we might find dangling pointers:
e.g: pointer that points to an address that is no longer used.
when we try to access a function’s local variable long after the function has returned.