50.002 Computation Structures
Information Systems Technology and Design
Singapore University of Technology and Design

Problem Set 5

This page contains all practice questions that constitutes the topics learned in Week 5: Beta Datapath.

Each topic’s questions are grouped into three categories: basic, intermediate, and challenging. You are recommended to do all basic problem set before advancing further.

Beta Datapath

$\beta$ Trivia (Basic)

In an unpipelined Beta implementation, when is the signal RA2SEL set to 1?

Show Answer

The RA2SEL signal is set to 1 when executing a ST instruction. When RA2SEL is 1 the 5-bit Rc field of the instruction is sent to the RA2 port of the register file, causing Reg[Rc] to be sent to the write data port of main memory.
In an unpipelined Beta implementation, when executing a BR(foo,LP) instruction to call procedure foo, what should WDSEL should be set to?

Show Answer

BR(foo,LP) is a macro for BEQ(R31,foo,LP). All BNE/BEQ instructions save the address of the following instruction in the specified destination register (LP in the example instruction). So WDSEL should be set 0, selecting the output of the PC+4 logic as the data to be written into the register file.
The minimum clock period of the unpipelined Beta implementation is determined by the propagation delays of the datapath elements and the amount of time it takes for the control signals to become valid. Which of the following select signals should become valid first in order to ensure the smallest possible clock period: PCSEL, RA2SEL, ASEL, BSEL, WDSEL, WASEL?

Show Answer

To ensure the smallest possible clock period RA2SEL should become valid first. The RA2SEL mux must produce a stable register address before the register file can do its thing. All other control signals affect logic that operates after the required register values have been accessed, so they don't have to be valid until later in the cycle.

$\beta$ Assembly Language (Basic)

What does the following piece of Beta assembly do? Hand assemble the beta assembly language into machine language.

I = 0x5678
B = 0x1234

LD(I,R0) -- (1)
SHLC(R0,2,R0) --  (2)
LD(R0,B,R1) -- (3)
MULC(R1,17,R1) -- (4)
ST(R1,B,R0)  -- (5)

Finally, what is the result stored in R0?

Show Answer

The machine language is:

I = 0x5678
B = 0x1234
|| LD(R31,I,R0) -> 011000 00000 11111 0101 0110 0111 1000 
0x601F5678
|| SHLC(R0,2,R0) -> 111100 00000 00000 0000 0000 0000 0010 
0xF0000002
||LD(R0,B,R1) -> 011000 00001 00000 0001 0010 0011 0100
0x60201234
||MULC(R1,17,R1) -> 110010 00001 00001 0000 0000 0001 0001
0xC8210011
||ST(R1,B,R0) -> 011001 00001 00000 0001 0010 0011 0100
0x64201234

Explanation:

Line 1: move the content of the memory unit at EA=I to register R0
Line 2: the content of R0 is multiplied by 4 and stored back at register R0
Line 3: move the content of memory address EA: EA= B + content of register R0; to register R1.
Line 4: The content of register R1 is multiplied by 17 and stored back at register R1.
Line 5: Store / copy the content of register R1 to the memory unit with address EA: EA= B + content of register R0.

The result of R0 is the content of memory address I: Mem[I] multiplied by 4.

Non $\beta$ Architecture Benchmarking (Basic)

A local junk yard offers older CPUs with non-Beta architecture that require several clocks to execute each instruction. Here are the specifications:

\[\begin{matrix} \text{Model} & \text{Clock Rate} & \text{Avg. clocks per Instruction}\\ \hline x & 40 Mhz & 2.0\\ y & 100 Mhz & 10.0\\ z & 60 Mhz & 3.0\\ \end{matrix}\]

You are going to choose the machine which will execute your benchmark program the fastest, so you compiled and ran the benchmark on the three machines and counted the total instructions executed:

$x$: 3,600,000 instructions executed
$y$: 1,900,000 instructions executed
$z$: 4,200,000 instructions executed

Based on the above data, which machine would you choose?

Show Answer

First we find out the time taken to execute those instructions:

$x$: $\frac{3.6M}{40M / 2}$ = $0.18$ seconds
$y$: $\frac{1.9M} {100M / 10}$ = $0.19$ seconds
$z$: $\frac{4.2M}{60M / 3}$ = $0.21$ seconds

From the result above, $x$ is the fastest machine. Hence we choose $x$.

Clumsy Lab Assistant (Basic)

Notta Kalew, a somewhat fumble-fingered lab assistant, has deleted the opcode field from the following table describing the control logic of an unpipelined Beta processor.

Help Notta out by identifying which Beta instruction is implemented by each row of the table.

Show Answer

From first row to the last: SUBC, BEQ, LDR, CMPEQ, ST.
Notta notices that WASEL is always zero in this table. Explain briefly under what circumstances WASEL would be non-zero.

Show Answer

WASEL is 1 if an interrupt, an illegal opcode is trapped, or a fault occurs</strong>. When WASEL is 1, it selects XP as the write address for the register file; Reg[XP] is where we store the current PC+4whenever there is an interrupt, a fault, or an illegal opcode. </p></div>

Notta has noticed the following C code fragment appears frequently in the benchmarks:

 int *_p; /_* Pointer to integer array *_/_
 _int i,j; /_* integer variables *_/_

 _..._

 _j = p[i]; /_* access ith element of array */

The pointer variable p contains the address of a dynamically allocated array of integers. The value of p[i] is stored at the address Mem[p +4i] where p and i are locations containing the values of the corresponding C variables. On a conventional Beta this code fragment is translated to the following instruction sequence:

 LD(...,R1)     /* R1 contains p, the array base address */
 LD(...,R2)     /* R2 contains I, the array index */    ...
 SHLC(R2,2,R0)  /* compute byte-addressed offset = 4*i */
 ADD(R1,R0,R0)  /* address of indexed element */
 LD(R0,0,R3)    /* fetch p[i] into R3 */

Notta proposes the addition of an LDX instruction that shortens the last three instructions to:

 SHLC(R2,2,R0)  /* compute byte-addressed offset = 4*i */
 LDX(R0,R1,R3)  /* fetch p[i] into R3 */

Give a register-transfer language description for the LDX instruction.

Show Answer

LDX( Ra, Rb, Rc ):
     EA <- Reg[Ra] + Reg[Rb]
     Reg[Rc] <- Mem[EA]
     PC <- PC + 4

Using a table like the one above specify the control signals for the LDX opcode.

Show Answer

$$\begin{matrix} PCSEL & RA2SEL & ASEL & BSEL& WDSEL & ALUFN & WR & WERF & WASEL \\ \hline 0 & 0 & 0 & 0 & 2 & ADD & 0 & 1 & 0 \end{matrix}$$
It occurs to Notta that adding an STX instruction would probably be useful too. Using this new instruction, p[i] = j might compile into the following instruction sequence:
```
 SHLC(R2,2,R0)  /* compute byte-addressed offset = 4*i */
 STX(R3,R0,R1)  /* R3 contains j, R1 contains p */
```
Briefly describe what (hardware) modifications to the Beta datapath would be necessary to be able to execute STX in a single cycle.

Show Answer
The register transfer language description of STX would be:
```
STX(Rc, Rb, Ra)
 EA <- Reg[Ra] + Reg[Rb]
 Mem[EA] <- Reg[Rc]
 PC <- PC + 4
```
It's evident that we need to perform 3 register reads, but the Beta's register file has only 2 read ports. Thus we need to add a third read port to the register file.

Incidentally, adding a third read port would eliminate the need for the RA2SEL mux because we no longer need to choose between Rb and Rc, since each register field has its own read port.

New Beta Instruction (Basic)

Write the register transfer language below corresponds to the instruction with the following control signal:

Show Answer
```
Reg[Rc] <-- (PC+4)+4*SXT(C) 
 PC <-- PC + 4
```
Explain why the following instruction cannot be added to our Beta instruction set without further hardware modifications on the datapath:
```
 PUSH(Rc, 4, Ra):
     Mem[Reg[Ra]] <-- Reg[Rc]
     Reg[Ra] <-- Reg[Ra] + 4
```
Show Answer

To implement this PUSH, somehow the ALU would have to produce two 32-bit values instead of the original one 32-bit output. The new two 32-bit values are: Reg[Ra] to be used as the memory address and Reg[Ra]+4 to be written into the register file.
</ins>

Another New Beta Instruction (Basic)

Given the following C-code:

if (a != 0){ 
	b = 3;
}  
(other code....)

where a, b are variables that have been initialised in the earlier part of the code (not shown). If we were to implement the following C-code using the Beta instruction set, we must do this in at least two cycles:

BEQ(Ra, label_continue, R31)  
ADDC(R31, 3, Rb)  
label_continue: (other code)

where Ra, Rb are assumed to be registers containing values a and b.

The ALU in this particular Beta however, implements five new functions on top of the standard functions: “B”, “NOTA”, “NOTB”, “TRUE”, “FALSE”.

Due to this, your classmate suggested that we can actually do this in one cycle by modifying the Control Unit to accept this new instruction called MCNZ (move constant if not zero) instead:

MCNZ(Ra, literal, Rc) : 
	if(Reg[Ra] != 0)
		Reg[Rc] <-- literal 
	PC <-- PC + 4

What values should the Control Unit give for this instruction MCNZ?

Show Answer

$$\begin{matrix} PCSEL & RA2SEL & ASEL & BSEL& WDSEL & ALUFN & WR & WERF & WASEL \\ \hline 0 & - & - & 1 & 1 & "B" & 0 & Z?0:1 & 0 \end{matrix}$$
Note: Z?0:1 -- means 0 if Z==1, and 1 otherwise.

Faulty Detection in Beta (Intermediate)

You suspected that your Beta CPU is faulty, in particular, these two components:

The ASEL mux might be faulty:
- if ASEL = 0, the output is always 0.
- There’s no problem if ASEL = 1.
The part of the CU that gives out RA2SEL signal might be faulty:
- RA2SEL is always stuck at 0 (it cannot be 1 regardless of the instruction)

Your friend came up with several short test programs. You want to select one of these programs to run in the faulty Beta, but you don’t want to waste your time loading and running multiple programs and would like to select one that can detect both faults. Which of the following program(s) can detect both faults?

Meaning that :

The values in the PC / Registers in Regfile / Memory Unit will be different from a working Beta CPU if these programs were to be executed in this faulty Beta.
You can be 100% sure the discrepancy is caused by both RA2SEL signal or ASEL mux faulty.
Programs that can only detect the RA2SEL signal faulty but not ASEL multiplexer faulty (or vice versa) is not acceptable.

You can assume that the initial content of all registers are 0.

Program 1:

.=0x000  
LDR(constant, R0) 
LDR(constant + 4, R1) 
ADD(R0, R1, R2)  
ST(R2, constant + 8, R31) 
HALT()  

constant: LONG(8)
LONG(4)

Program 2:

.=0X000  
CMOVE(5, R1) 
LDR(constant, R2) 
ST(R2, answer, R31) 
MUL(R1, R2, R3) 
HALT()  

constant: LONG(0) 
.=0xFFFC  
answer: LONG(0)

Program 3:

.=0x000  
constant: LONG(8)
LONG(4)
LDR(constant, R0) 
ADD(R0, R0, R0) 
ST(R0, .+8, R31) 
HALT()

Program 4:

.=0x000  
CMOVE(5, R0)  
ST(R0, constant + 8, R31) 
LDR(constant, R1)  
ADD(R1, R1, R2)  
HALT()  

.=0xABCC  
constant: LONG(8)
LONG(4)

Show Answer

There's only one instruction: ST that requires RA2SEL to be 1. Therefore our program must have this instruction to test against a working Beta CPU. We also must ensure that we utilize instructions that results in ASEL=0 and that the output of the ASEL mux should be nonzero in a working Beta CPU. We also need to ensure that the programs need to utilize these instructions in a way that results in a different state when run on a working Beta CPU.

Program 1 and Program 4 fulfills the criteria, and the other two don't.

For Program 1: * The content store at R2 will be 4 instead of 12 if the ASEL mux is faulty.

We will end up storing 8 instead of 12 to Mem[constant + 8] if RA2SEL signal remains 0 due to the faulty CU.

For Program 4:

The content of R31 is stored to Mem[Constant+8] instead of the content of R0. Therefore, Mem[Constant+8] is 0 instead of 5.
The content of R2 is 8 instead of 16.

Program 2 and Program 3 also utilizes ST and OP instructions: MUL/ADD, etc that involve the ASEL mux but if you run them with the faulty Beta and with a working Beta, the end state is either the same or different due to one of the faulties only, and therefore can't be used to detect both faulties.

Beta Instruction Replacements (Intermediate)

For each of the statements below, indicate whether they’re True or False and provide your reasoning.

Statement 1: In the Beta, every ADDC instruction can always be replaced by a SUBC instruction that puts precisely the same value in the destination register. For example, ADDC(R0,1,R0) is equal to SUBC(R0,-1,R0) (think about all constants).
Statement 2: In a Beta program, you can use BEQ(R31, label, R31) as a substitute for JMP(Ra) where Ra stores the address of label, no matter where label is.
Statement 3: We can never perform LD and ST to any two independent addresses in a single cycle (even if the memory unit supports it) by just modifying the control unit of the Beta. In other words, we need to modify the datapath of the Beta in order to do this.

Show Answer

Statement 1 is False. We can have ADDC(R0, -65536, R0) but we cant have SUBC(R0, 65536, R0) as the most positive number that a signed 16-bit can represent is 65535.

Statement 2 is False. Ra contains 32-bit of data, so we can set PC to be pointing to any address in the memory (4GB of address space) with JMP(Ra). However, BEQ only covers 65536*4 (above PC+4) + 65535*4 (*below and inclusive of PC+4*) bytes of address space.

Statement 3 is True. The output of the ALU supplies a single address for both load and store to the memory unit.

PCSEL Fault Detection (Intermediate)

This time round, consider a Beta machine with a faulty control unit, where its PCSEL signal is always 0, meaning that the input to the PC register is always
PC+4 regardless of the instruction.

As always, we can detect this particular fault by running a simple test program written in Beta assembly language. State which of the following programs can detect this particular fault, meaning that if it was to be run on a faulty Beta machine, we will get different results (contents) on the registers in the regfiles, PC, or Memory Unit, and provide your reasoning.

Assume that all register values were 0 at the beginning of each program execution.

Program 1: (executed for two CLK cycles)

.= 0  
BEQ(R0, .+4, R31)  
ADDC(R0, 1, R0)  

Program 2: (executed for three CLK cycles)

.=0  
CMPEQ(R0, R0, R0)  
BNE(R0, .-4, R31)  
ADDC(R0, 1, R0)

Program 3: (executed for four CLK cycles)

.=0  
LD(R0, 0, R0)  
MULC(R0, 1, R0)  
BNE(R0, .+4, R31)  
CMPEQ(R0, R31, R2)

Program 4: (executed for two CLK cycles)

.=0
ST(R0, x, R31) 
x: LONG(12)

Program 5: (executed for two CLK cycles)

.=0  
JMP(R1)  
ADDC(R0, 1, R1)

Program 6: (executed for two CLK cycles)

.=0  
LDR(R31, .+8, R0)  
ADDC(R0, 1, R1)  
x : LONG(3)

Show Answer

Program 2, 4, and 5 can successfully detect this faulty. All of them forces the control unit to produce non-zero PCSEL. For example, Program 4 results in illop when the Beta attempts to execute LONG(12) because it isn't an instruction. Therefore PCSEL=3 if the control unit works properly and that the content of PC will be ILLOP (wherever the address of illegal operation handler is) instead of address 0xC.

Quality Control (Intermediate)

One Beta manufacturer is having quality-control problems with their design. In particular, they’ve had reliability issues with various device connections that are circled in the diagram below.

Your job is to write some test programs to help determine if a machine is fault-free. ==Assume that when a device connection is “faulty,” the indicated bus or signal is always producing “0” instead of the expected value.==

For each of the circled connections, write an instruction sequence that when executed for a specified number of cycles will leave the following result in R0:

1 in R0 if the connection was working.
Other values in R0 if the connection was faulty.

You can assume that all registers are reliably set to 0 before each sequence is executed.

Give your instruction sequence for each of the six indicated faults and briefly explain how each sequence detects the fault and produces something besides 1 in R0 when the fault is present:

Fault A: Input 1 of PCSEL mux has a value of 0 instead of PC+4+4*SEXT(C).
Fault B: RA2SEL multiplexer control signal is 0 instead of as per intended current instruction OPCODE.
Fault C: Z input to control logic is always 0 instead of the correct value depending on RD1.
Fault D: BSEL multiplexer control signal 0 instead of as per intended current instruction OPCODE.
Fault E: WR memory control signal is 0 instead of as per intended current instruction OPCODE.
Fault F: Input 0 of WDSEL mux has a value of 0 instead of PC+4.

Note: there’s many alternate answers. They aren’t unique.

Fault A: Input 1 of PCSEL mux has a value of 0 instead of PC+4+4*SEXT(C).

Show Answer

| starts at address 0
. = 0
BEQ(R0,.+4,R31) | 0x0
ADDC(R0,1,R0) | 0x4

Execute for 2 cycles (i.e., execute two instructions):

If fault A is not present, R0 contains 1 after the second cycle, since the second instruction is fetched from location 0x4.
If fault A is present, the second instruction is fetched from location 0 (instead of 4, since the input 1 to the PCSEL mux is 0), so the value of R0 stays 0. </ul>
Note that the label .+4 means “memory location of current instruction + 4”, which is 0+4 here. </p></div>
**Fault B:** `RA2SEL` multiplexer control signal is `0` instead of as per intended current instruction `OPCODE`.
Show Answer
```
| starts at address 0
. = 0
ADDC(R1,1,R1)
ST(R1,0,R0)
LD(R0,0,R0)
```
Execute for 3 cycles:
- If fault B is not present, the ST instruction writes the value 1 into location 0, which is then LD-ed (loaded) into R0.
- If fault B is present, the ST instruction writes the contents of R0 instead (ie, the value 0), so now the LD instruction puts 0 into R0.