Skip to main content Link Search Menu Expand Document (external link)

50.002 Computation Structures
Information Systems Technology and Design
Singapore University of Technology and Design

(Verilog) Lab 6: Beta CPU

This is a Verilog parallel of the Lucid + Alchitry Labs Lab 6. It is not part of the syllabus, and it is written for interested students only. You still need to complete all necessary checkoffs in Lucid, as stated in the original lab handout.

If you are reading this document, we assume that you have already read Lab 4 Lucid version, as some generic details are not repeated. This lab has the same objectives and related class materials so we will not paste them again here. For submission criteria, refer to the original lab 6 handout.

Introduction

The goal of this lab is to build a fully functional 32-bit Beta Processor on our FPGA so that it could simulate simple programs written in Beta Assembly Language. It is a huge device, and to make it more bearable we shall modularise it into six major components:

  • Memory Unit: the RAM or physical memory, separated into data and instruction memory.
  • (Beta CPU Part A) PC Unit: containing the PC register and all necessary components to support the ISA
  • (Beta CPU Part B) REGFILE Unit: containing 32 32-bit registers, WASEL, and RA2SEL MUX, plus circuitry to compute Z
  • (Beta CPU Part C) CONTROL Unit: containing the ROM and necessary components to produce all Beta control signals given an OPCODE
  • (Beta CPU Part D) ALU+WDSEL Unit: containing the ALU and WDSEL, ASEL, BSEL MUXes
  • Motherboard: We assemble the entire Beta CPU using all subcomponents above and connect it to I/O


The signals indicated in red refers to external INPUT to our Beta, supplied by the Memory Unit. The signals illustrated yellow refers to our Beta’s OUTPUT to the Memory Unit.

Please study each section carefully as this will be beneficial not only for your 1D Project and Exam, but also to sharpen your knowledge in basics of computer architecture which might be useful in your future career as a computer science graduate.

Memory Unit

We strongly suggest that the memory Unit is made physically separated into two sections for ease of explanation and implementation:

  • the instruction memory and
  • the data memory

In practice, the data segment and the instruction segment are only logically segregated, so it would need to support two reads in a single cycle (for both data and instruction). They still share the same physical device we call RAM, but we implement them as two RAM blocks here to keep the Beta CPU wiring simple. For more details regarding 2R1W type of RAM and possible implementation in FPGA, see the appendix.

Below is a sample implementation of the memory unit that can be used to alongside your Beta CPU. It utilises simple_ram and simple_dual_port_ram (see sections below) provided by Alchitry Labs’ component library, which will utilise the BRAMs of the FPGA to implement the memory unit instead of using the limited LUTs.

// Byte-addressed inputs, word-aligned internally (addr >> 2)
module memory_unit #(
    parameter integer WORDS = 16
)(
    input  wire clk,

    // data memory (byte addressing expected)
    input  wire [$clog2(WORDS)+2-1:0] raddr,
    input  wire [$clog2(WORDS)+2-1:0] waddr,
    input  wire [31:0]                wd,
    input  wire                       we,
    output wire [31:0]                mrd,

    // instruction memory (byte addressing expected)
    input  wire [$clog2(WORDS)+2-1:0] ia,
    input  wire                       instruction_we,
    input  wire [31:0]                instruction_wd,
    output wire [31:0]                id
);

  localparam integer AW = $clog2(WORDS);

  // Convert byte address -> word address by dropping low 2 bits
  wire [AW-1:0] ia_word = ia[AW+1:2];
  wire [AW-1:0] ra_word = raddr[AW+1:2];
  wire [AW-1:0] wa_word = waddr[AW+1:2];

  // Instruction memory: single-port RAM (sync read, 1-cycle latency)
  simple_ram #(
      .WIDTH(32),
      .ENTRIES(WORDS)
  ) instruction_memory (
      .clk          (clk),
      .address      (ia_word),
      .read_data    (id),
      .write_data   (instruction_wd),
      .write_enable (instruction_we)
  );

  // Data memory: dual-port RAM (sync read on rclk, write on wclk)
  simple_dual_port_ram #(
      .WIDTH(32),
      .ENTRIES(WORDS)
  ) data_memory (
      .wclk         (clk),
      .waddr        (wa_word),
      .write_data   (wd),
      .write_enable (we),

      .rclk         (clk),
      .raddr        (ra_word),
      .read_data    (mrd)
  );

endmodule

Instruction Memory

The instruction memory is implemented using the simple_ram component from Alchitry Labs, code featured in Appendix.

  • read_data outputs the value of the entry pointed to by address in the previous clock cycle. If you want to read address EA, you set address = EA_word and wait one FPGA clock cycle for Mem[EA] to show up.
  • If you read and write the same address, then:

    • on the next cycle you will see the old value at read_data, and
    • one cycle later you will see the newly written value at read_data.

Since we never need to write to instruction memory during program execution, we normally keep instruction_we = 0. However, the port is provided so that we can load programs into instruction memory during testing.

The interface is:

// for instruction memory (byte addressing expected)
input  ia[$clog2(WORDS)+2],
input  instruction_we,
input  instruction_wd[32],
output id[32]

id outputs the value of the entry pointed to by ia in the previous clock cycle. Also, if you read and write the same address ia and hold ia, the first clock cycle the address will be written, the second clock cycle the old value will be output on id, and on the third clock cycle the newly updated value will be output on id.

Data Memory

The Beta CPU can read or write the Data Memory. For ease of demonstration, data memory is implemented as a dual port RAM (read and write can be done independently in the same clk cycle), see appendix. That is why we have two address ports raddr and waddr.

  • read_data outputs the value of the entry pointed to by raddr in the previous clock cycle. If you want to read address EA, set raddr = EA and wait one FPGA clock cycle for Mem[EA] to show up.
  • We should avoid reading and writing to the same address simultaneously because the returned value is undefined (tool/FPGA dependent).

The interface is:

// for data memory (byte addressing expected)
input  raddr[$clog2(WORDS)+2],
input  waddr[$clog2(WORDS)+2],
input  wd[32],
input  we,
output mrd[32]

You should avoid reading and writing to the same address simultaneously. The value read in this case is undefined. Also, mrd outputs the value of the entry pointed to by raddr in the previous clock cycle. The unit is always reading based on whatever value is at raddr, so you can ignore mrd values if you don’t need it.

Memory Read

To load (read) data from memory, the Beta supplies the effective address EA on raddr. After one rising edge of clk, the memory outputs:

  • mrd[31:0] = Mem[EA]

Memory Write

To store (write) data to memory, the Beta supplies:

  • waddr = effective address EA
  • wd[31:0] = the 32-bit value to store
  • we = 1 to perform the write on the rising edge of clk

The signal we must always be a valid logic value (0 or 1) at the rising edge of clk. If we = 1, the value on wd[31:0] is written into memory at the end of the current cycle. If we = 0, wd[31:0] is ignored.

Addressing Convention (Byte Address In, Word-Aligned)

We expect byte addresses to be supplied at ia, raddr, and waddr. However, our RAM blocks store 32-bit words, so the memory unit is word-aligned internally.

This means:

  • We ignore the lower two bits of the address (addr[1:0]).
  • Internally, the word index is addr >> 2.

In Verilog, we implement this by slicing:

wire [AW-1:0] word_addr = addr[AW+1:2];  // drop addr[1:0]

As a result:

  • Addresses 0x00, 0x01, 0x02, 0x03 all map to the same word (word 0)
  • Address 0x04 maps to the next word (word 1)
  • Unaligned accesses are not supported (they are forced to the aligned word).

We could implement a strictly byte-addressable RAM by storing bytes (WIDTH=8) and using ENTRIES=WORDS*4, but then we must add additional logic to assemble 4 bytes into a 32-bit word (and handle byte enables on stores). For the Beta CPU memory unit here, we keep the design word-aligned for simplicity.

Testbench

You can use the following testbench to observe how the memory unit work:

`timescale 1ns / 1ps

module tb_memory_unit;

  // --------------------------------------------------------------------------
  // Params
  // --------------------------------------------------------------------------
  localparam integer WORDS = 16;
  localparam integer AWB = $clog2(WORDS) + 2;  // byte-address width

  // --------------------------------------------------------------------------
  // DUT I/O
  // --------------------------------------------------------------------------
  reg            clk;

  reg  [AWB-1:0] raddr;
  reg  [AWB-1:0] waddr;
  reg  [   31:0] wd;
  reg            we;
  wire [   31:0] mrd;

  reg  [AWB-1:0] ia;
  reg            instruction_we;
  reg  [   31:0] instruction_wd;
  wire [   31:0] id;

  // --------------------------------------------------------------------------
  // Instantiate DUT
  // --------------------------------------------------------------------------
  memory_unit #(
      .WORDS(WORDS)
  ) dut (
      .clk(clk),

      .raddr(raddr),
      .waddr(waddr),
      .wd   (wd),
      .we   (we),
      .mrd  (mrd),

      .ia            (ia),
      .instruction_we(instruction_we),
      .instruction_wd(instruction_wd),
      .id            (id)
  );

  // --------------------------------------------------------------------------
  // Clock
  // --------------------------------------------------------------------------
  initial clk = 1'b0;
  always #5 clk = ~clk;  // 100 MHz

  // --------------------------------------------------------------------------
  // Wave dump
  // --------------------------------------------------------------------------
  initial begin
    $dumpfile("tb_memory_unit.vcd");
    $dumpvars(0, tb_memory_unit);
  end

  // --------------------------------------------------------------------------
  // Helpers
  // --------------------------------------------------------------------------
  task tick;
    begin
      @(posedge clk);
      #1;  // small delay for signals to settle
    end
  endtask

  task tb_fatal(input [1023:0] msg);
    begin
      $display("ASSERTION FAILED at t=%0t: %0s", $time, msg);
      $fatal(1);
    end
  endtask

  // Simple "assert equal" helper
  task assert_eq32(input [31:0] got, input [31:0] exp, input [1023:0] what);
    begin
      if (got !== exp) begin
        $display("  got = 0x%08h", got);
        $display("  exp = 0x%08h", exp);
        tb_fatal(what);
      end
    end
  endtask

  task instr_write(input [31:0] byte_addr, input [31:0] data);
    begin
      ia             = byte_addr[AWB-1:0];
      instruction_wd = data;
      instruction_we = 1'b1;
      tick();
      instruction_we = 1'b0;
    end
  endtask

  // Sets ia, ticks once (read occurs), then checks id (which updates on that tick)
  task instr_read_check(input [31:0] byte_addr, input [31:0] exp);
    begin
      ia = byte_addr[AWB-1:0];
      tick();  // id <= mem[ia_word] at this edge
      assert_eq32(id, exp, {"instr read @", hex32(byte_addr), " (word-aligned)"});
    end
  endtask

  task data_write(input [31:0] byte_addr, input [31:0] data);
    begin
      waddr = byte_addr[AWB-1:0];
      wd    = data;
      we    = 1'b1;
      tick();
      we = 1'b0;
    end
  endtask

  task data_read_check(input [31:0] byte_addr, input [31:0] exp);
    begin
      raddr = byte_addr[AWB-1:0];
      tick();  // mrd <= mem[raddr_word] at this edge
      assert_eq32(mrd, exp, {"data read @", hex32(byte_addr), " (word-aligned)"});
    end
  endtask

  // Format helper: return 8-hex string for messages
  function [8*10-1:0] hex32(input [31:0] x);
    begin
      // "0x" + 8 hex digits = 10 chars
      hex32 = {
        "0x",
        nyb(x[31:28]),
        nyb(x[27:24]),
        nyb(x[23:20]),
        nyb(x[19:16]),
        nyb(x[15:12]),
        nyb(x[11:8]),
        nyb(x[7:4]),
        nyb(x[3:0])
      };
    end
  endfunction

  function [7:0] nyb(input [3:0] n);
    begin
      case (n)
        4'h0: nyb = "0";
        4'h1: nyb = "1";
        4'h2: nyb = "2";
        4'h3: nyb = "3";
        4'h4: nyb = "4";
        4'h5: nyb = "5";
        4'h6: nyb = "6";
        4'h7: nyb = "7";
        4'h8: nyb = "8";
        4'h9: nyb = "9";
        4'hA: nyb = "A";
        4'hB: nyb = "B";
        4'hC: nyb = "C";
        4'hD: nyb = "D";
        4'hE: nyb = "E";
        4'hF: nyb = "F";
      endcase
    end
  endfunction


  function [AWB-1:0] trunc_addr(input [31:0] x);
    begin
      trunc_addr = x[AWB-1:0];
    end
  endfunction

  // --------------------------------------------------------------------------
  // Stimulus + Asserts
  // --------------------------------------------------------------------------
  initial begin
    // init inputs
    raddr          = {AWB{1'b0}};
    waddr          = {AWB{1'b0}};
    wd             = 32'h0;
    we             = 1'b0;

    ia             = {AWB{1'b0}};
    instruction_we = 1'b0;
    instruction_wd = 32'h0;

    // idle cycles (first reads likely X unless RAM init elsewhere)
    tick();
    tick();

    // ------------------------------------------------------------------------
    // Instruction memory: write then read back (word-aligned)
    // ------------------------------------------------------------------------
    instr_write(32'h0000_0000, 32'h1111_0000);
    instr_write(32'h0000_0004, 32'h2222_0001);
    instr_write(32'h0000_0008, 32'h3333_0002);

    instr_read_check(32'h0000_0000, 32'h1111_0000);
    instr_read_check(32'h0000_0004, 32'h2222_0001);
    instr_read_check(32'h0000_0008, 32'h3333_0002);

    // Word-alignment: 0x0,0x1,0x2,0x3 all map to word 0
    instr_read_check(32'h0000_0001, 32'h1111_0000);
    instr_read_check(32'h0000_0003, 32'h1111_0000);

    // ------------------------------------------------------------------------
    // Data memory: write then read back (word-aligned)
    // ------------------------------------------------------------------------
    data_write(32'h0000_000C, 32'hAAAA_0003);  // word 3
    data_write(32'h0000_0010, 32'hBBBB_0004);  // word 4

    data_read_check(32'h0000_000C, 32'hAAAA_0003);
    data_read_check(32'h0000_0010, 32'hBBBB_0004);

    // Word-alignment: 0xE maps to word 3
    data_read_check(32'h0000_000E, 32'hAAAA_0003);

    // ------------------------------------------------------------------------
    // Fun: drive BOTH instruction + data read addresses together
    // (They are independent RAMs here, so both should work in parallel.)
    // ------------------------------------------------------------------------
    ia    = trunc_addr(32'h0000_0004);
    raddr = trunc_addr(32'h0000_0010);
    tick();
    assert_eq32(id, 32'h2222_0001, "parallel read: id should be instr word 1");
    assert_eq32(mrd, 32'hBBBB_0004, "parallel read: mrd should be data word 4");

    // ------------------------------------------------------------------------
    // Another write/read quick check
    // ------------------------------------------------------------------------
    data_write(32'h0000_0014, 32'hCCCC_0005);  // word 5
    data_read_check(32'h0000_0014, 32'hCCCC_0005);

    $display("All assertions passed.");
    repeat (3) tick();
    $finish;
  end

endmodule

And you will obtain the following waveform:

Few things to note:

  1. Read data comes out one cycle later after address ia/raddr/waddr is given
  2. From 0 to 55000 ps, id is x because we are technically “reading” from them as we are writing to them in each cycle here.
  3. Writing to instruction memory / data memory is only done when instruction_we or we is high
  4. Data memory was initially “empty” (giving out x)
  5. We started writing to data memory from 95000 ps onwards, on address 0x0C and 0x10 as we is high
  6. Memory data is able to produce what’s written from 125000 ps onwards, based on read address given by raddr

Appendix

Unified Memory Model

In practice, the instruction memory and data memory are only logically segregated. Architecturally, the CPU treats them as separate spaces because they serve different purposes, but physically they can reside in the same RAM device.

What the CPU actually requires in a single cycle is:

  • one instruction fetch (read),
  • one data load (read), and
  • optionally one data store (write).

This access pattern corresponds to a 2-read, 1-write (2R1W) memory.

Physical Realisation on FPGA

Most FPGA block RAMs natively support at most two ports. A true 2R1W memory is therefore not directly available as a single primitive. In practice, FPGA designs implement this behaviour using one of the following techniques:

  1. Separate instruction and data memories (what we do): Instruction memory and data memory are instantiated as two independent RAM blocks. This is simple to reason about and is commonly used in teaching designs.

  2. Replicated memory: Two identical copies of the same memory are created. This technique provides the illusion of a single unified memory with two independent read ports and one write port.

    • one copy services the instruction read port, and
    • the other copy services the data read port.
    • Any write operation updates both copies, ensuring that the two read ports always observe consistent memory contents.

Below is a sample implementation for method (2):

// Unified RAM: 2 read ports (ia + raddr) and 1 write port (waddr)
// Byte-addressed inputs, word-aligned internally (addr >> 2)
//
// Implementation: replicated RAM for the two read ports.
// Any write updates BOTH copies so both reads see the same memory contents.
module memory_unit_2r1w #(
    parameter integer WORDS = 16
)(
    input  wire clk,

    // instruction fetch (byte addressing expected)
    input  wire [$clog2(WORDS)+2-1:0] ia,
    output reg  [31:0]                id,

    // data read (byte addressing expected)
    input  wire [$clog2(WORDS)+2-1:0] raddr,
    output reg  [31:0]                mrd,

    // data write (byte addressing expected)
    input  wire [$clog2(WORDS)+2-1:0] waddr,
    input  wire [31:0]                wd,
    input  wire                       we
);

  localparam integer AW = $clog2(WORDS);

  wire [AW-1:0] ia_word = ia[AW+1:2];
  wire [AW-1:0] ra_word = raddr[AW+1:2];
  wire [AW-1:0] wa_word = waddr[AW+1:2];

  // Two identical copies to get two independent synchronous read ports
  // Tells synthesis tool to implement this array as block RAM (BRAM) instead of flip-flops (LUT RAM)
  (* ram_style = "block" *) reg [31:0] mem_i [0:WORDS-1]; // for instruction read
  (* ram_style = "block" *) reg [31:0] mem_d [0:WORDS-1]; // for data read

  integer k;
  initial begin
    // Optional: init to 0 for simulation friendliness
    for (k = 0; k < WORDS; k = k + 1) begin
      mem_i[k] = 32'h0;
      mem_d[k] = 32'h0;
    end
  end

  always @(posedge clk) begin
    // synchronous reads (1-cycle latency)
    id  <= mem_i[ia_word];
    mrd <= mem_d[ra_word];

    // write updates BOTH copies
    if (we) begin
      mem_i[wa_word] <= wd;
      mem_d[wa_word] <= wd;
    end
  end

endmodule

Note that if you use this construct, then the addresses used in LD, ST and LDR instruction would differ from when you use the separated instruction-data construct.

Timing Semantics

All memory accesses are synchronous:

  • Read data is returned in the next clock cycle.
  • Write data is committed at the end of the current cycle if the write enable is asserted.

As a result:

  • Instruction fetch and data load addresses are presented in cycle N.
  • The corresponding values become visible in cycle N+1.

Addressing Convention (Byte Address In, Word-Aligned)

The memory unit accepts byte addresses at its interface, but stores data internally as 32-bit words. Therefore, the memory is word-aligned:

  • The lowest two bits of every address (addr[1:0]) are ignored.
  • The internal word index is effectively addr >> 2.

This is implemented by slicing the address as follows:

addr_word = addr[$clog2(WORDS)+2-1 : 2];

Consequences:

  • Addresses that differ only in the lowest two bits map to the same word.
  • Unaligned byte or halfword accesses are not supported and are implicitly forced to the nearest aligned word.

A strictly byte-addressable memory could be implemented by storing 8-bit entries and assembling 32-bit words in logic, but this would significantly complicate the design. For the Beta CPU, a word-aligned memory provides a cleaner and more instructive model.

simple_ram.v

/******************************************************************************

   The MIT License (MIT)

   Copyright (c) 2026 Alchitry

   Permission is hereby granted, free of charge, to any person obtaining a copy
   of this software and associated documentation files (the "Software"), to deal
   in the Software without restriction, including without limitation the rights
   to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
   copies of the Software, and to permit persons to whom the Software is
   furnished to do so, subject to the following conditions:

   The above copyright notice and this permission notice shall be included in
   all copies or substantial portions of the Software.

   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
   THE SOFTWARE.

   *****************************************************************************

   This module is a simple single port RAM. This RAM is implemented in such a
   way that the tools will recognize it as a RAM and implement large
   instances in block RAM instead of flip-flops.

   The parameter WIDTH is used to specify the word size. That is the size of
   each entry in the RAM.

   The parameter ENTRIES is used to specify how many entries are in the RAM.

   read_data outputs the value of the entry pointed to by address in the previous
   clock cycle. That means to read address 10, you would set address to be 10
   and wait one cycle for its value to show up. The RAM is always reading whatever
   address is. If you don't need to read, just ignore this value.

   To write, set write_enable to 1, write_data to the value to write,
   and address to the address you want to write.

   If you read and write the same address, the first clock cycle the address will
   be written, the second clock cycle the old value will be output on read_data,
   and on the third clock cycle the newly updated value will be output on
   read_data.
*/

module simple_ram #(
    parameter WIDTH = 1,                  // size of each entry
    parameter ENTRIES = 1                 // number of entries
  )(
    input clk,                            // clock
    input [$clog2(ENTRIES)-1:0] address,  // address to read or write
    output reg [WIDTH-1:0] read_data,     // data read
    input [WIDTH-1:0] write_data,         // data to write
    input write_enable                    // write enable (1 = write)
  );

  reg [WIDTH-1:0] ram [ENTRIES-1:0];      // memory array

  always @(posedge clk) begin
    read_data <= ram[address];            // read the entry

    if (write_enable)                     // if we need to write
      ram[address] <= write_data;         // update that value
  end

endmodule

simple_dual_port_ram.v

/******************************************************************************

   The MIT License (MIT)

   Copyright (c) 2026 Alchitry

   Permission is hereby granted, free of charge, to any person obtaining a copy
   of this software and associated documentation files (the "Software"), to deal
   in the Software without restriction, including without limitation the rights
   to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
   copies of the Software, and to permit persons to whom the Software is
   furnished to do so, subject to the following conditions:

   The above copyright notice and this permission notice shall be included in
   all copies or substantial portions of the Software.

   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
   THE SOFTWARE.

   *****************************************************************************

   This module is a simple dual port RAM. This RAM is implemented in such a
   way that Xilinx's tools will recognize it as a RAM and implement large
   instances in block RAM instead of flip-flops.

   The parameter WIDTH is used to specify the word size. That is the size of
   each entry in the RAM.

   The parameter ENTRIES is used to specify how many entries are in the RAM.

   read_data outputs the value of the entry pointed to by raddr in the previous
   clock cycle. That means to read address 10, you would set address to be 10
   and wait one cycle for its value to show up. The RAM is always reading whatever
   address is. If you don't need to read, just ignore this value.

   To write, set write_enable to 1, write_data to the value to write, and waddr to
   the address you want to write.

   You should avoid reading and writing to the same address simultaneously. The
   value read in this case is undefined.
*/
module simple_dual_port_ram #(
    parameter WIDTH = 8,                // size of each entry
    parameter ENTRIES = 8               // number of entries
  )(
    // write interface
    input wclk,                         // write clock
    input [$clog2(ENTRIES)-1:0] waddr,  // write address
    input [WIDTH-1:0] write_data,       // write data
    input write_enable,                 // write enable (1 = write)
    
    // read interface
    input rclk,                         // read clock
    input [$clog2(ENTRIES)-1:0] raddr,  // read address
    output reg [WIDTH-1:0] read_data    // read data
  );
  
  reg [WIDTH-1:0] mem [ENTRIES-1:0];    // memory array
  
  // write clock domain
  always @(posedge wclk) begin
    if (write_enable)                   // if write enable
      mem[waddr] <= write_data;         // write memory
  end
  
  // read clock domain
  always @(posedge rclk) begin
    read_data <= mem[raddr];            // read memory
  end
  
endmodule

Part A: PC Unit

PC Unit Schematic

Here is the suggested PC Unit schematic that you can implement. Take note of the input and output nodes. This will come in very useful when creating the module for your PC Unit.

Here’s a suggested interface:

module pc_unit (
    input clk,
    input rst,
    input slowclk,
    input [15:0] id,
    input [2:0] pcsel,
    input [31:0] reg_data_1,
    output [31:0] pc_4,
    output [31:0] pc_4_sxtc,
    output [31:0] pcsel_out,
    output [31:0] ia
);

Task 1: PCSEL Multiplexers

The 32-bit 5-to-1 PC MUX selects the value to be loaded into the PC register at the next rising edge of the clock depending on the PCSEL control signal.

However, later on we might want to only advance the pc when some slowclk signal is 1 for manual debugging. You should take into account this aspect when building the PCSEL MUX.

XAddr and ILLOP

XAddr and ILLOP in the Beta diagram in our lecture notes represents constant addresses used when the Beta services an interrupt (triggered by IRQ) or executes an instruction with an illegal or unimplemented opcode. For this assignment assume that XAddr = 0x80000008 and ILLOP = 0x80000004 and we will make sure the first three locations of main memory contain BR instructions that branch to code which handle reset, illegal instruction traps and interrupts respectively. In other words, the first three locations of main memory contain:

Mem[0x80000000] = BR(reset_handler)
Mem[0x80000004] = BR(illop_handler)
Mem[0x80000008] = BR(interrupt_handler)

Lower Two Bits of PC

You also have to force the lower two bits of inputs going into the PC+4, PC+4+4*SXTC, and JT port of the MUX to be b00 because the memory is byte addressable but the Beta obtains one word of data/instructions at each clock cycle. You can do this with appropriate wiring using simple concatenation:

Example:

pc_d = {pcsel_out[31:2], 2'b00};

Task 2: RESET Multiplexer

Remember that we need to add a way to set the PC to zero on RESET. We use a two-input 32-bit MUX that selects 0x80000000 when the RESET signal is asserted, and the output of the PCSEL MUX when RESET is not asserted.

We shall use the RESET signal to force the PC to zero during the Beta CPU “startup” later on.

Task 3: 32-bit PC Reg

The PC is a separate 32-bit register that can be built using the register component mentioned in the previous lab.

Task 4: Increment-by-4

Conceptually, the increment-by-4 circuit is just a 32-bit adder with one input wired to the constant 4. It is possible to build a much smaller circuit if you design an adder optimized knowing that one of its inputs is 0x00000004 constant. In Verilog, you can directly concatenate the MSB of pc_q and add the remaining bits by 4 using the + operator.

Task 4: Shift-and-add

The branch-offset adder adds PC+4 to the 16-bit offset encoded in the instruction data id[15:0]. The offset is sign-extended to 32-bits and multiplied by 4 in preparation for the addition. Both the sign extension and shift operations can be done with appropriate wiring—no gates required.

  // compute sign extended C then multiply by 4, add this to PC + 4 later on
  wire [31:0] sxtc_x4 = ({ {16{id[15]} }, id[15:0]}) << 2;

Task 5: Supervisor Bit

The highest-order bit of the PC (PC31/ia31) is dedicated as the supervisor bit (see section 6.3 of the Beta Documentation).

  • The LDR instruction ignores this bit, treating it as if it were zero.
  • The JMP instruction is allowed to clear the Supervisor bit or leave it unchanged, but cannot set it,
  • No other instructions may have any effect on PC31

Setting the Supervisor Bit

Only RESET, exceptions (ILLOP) and interrupts (XAddr) cause the Supervisor bit of the Beta PC to become set.

This has the following three implications for your PC unit design:

  1. 0x80000000, 0x80000004 and 0x80000008 are loaded into the PC during reset, ILLOP and IRQ respectively. This is the only way that the supervisor bit gets set. Note that after reset the Beta starts execution in supervisor mode. This is equivalent to when a regular computer is starting up.

  2. Bit 31 of the PC+4 and branch-offset inputs to the PCSEL MUX should be connected to the highest bit of the PC Reg output, ia31; i.e., the value of the supervisor bit doesn’t change when executing most instructions.

  3. You need to add additional logic to bit 31 of the JT input to the PCSEL MUX to ensure that JMP instruction can only clear or leave the supervisor bit unchanged. Here’s a table showing the new value of the supervisor bit after a JMP as function of JT31 and the current value of the supervisor bit (PC31):

old PC31 (ia31) JT31 (ra31) new PC31
0 0
1 0 0
1 1 1

Testbench

Assuming you used the interface above, you can use this tb:

`timescale 1ns / 1ps

module tb_pc_unit;

  // -----------------------
  // DUT inputs
  // -----------------------
  reg         clk;
  reg         rst;
  reg         slowclk;
  reg  [15:0] id;
  reg  [ 2:0] pcsel;
  reg  [31:0] reg_data_1;

  // -----------------------
  // DUT outputs
  // -----------------------
  wire [31:0] pc_4;
  wire [31:0] pc_4_sxtc;
  wire [31:0] pcsel_out;
  wire [31:0] ia;

  // -----------------------
  // Instantiate DUT
  // -----------------------
  pc_unit dut (
      .clk(clk),
      .rst(rst),
      .slowclk(slowclk),
      .id(id),
      .pcsel(pcsel),
      .reg_data_1(reg_data_1),
      .pc_4(pc_4),
      .pc_4_sxtc(pc_4_sxtc),
      .pcsel_out(pcsel_out),
      .ia(ia)
  );

  // -----------------------
  // Clock gen: 100 MHz (10ns period)
  // -----------------------
  initial clk = 1'b0;
  always #5 clk = ~clk;

  // -----------------------
  // Helpers
  // -----------------------
  function [31:0] sxtc_x4;
    input [15:0] imm;
    begin
      sxtc_x4 = { {16{imm[15]} }, imm} << 2;
    end
  endfunction

  function [31:0] protect_msb;
    input [31:0] old_pc;
    input [31:0] candidate;
    begin
      protect_msb = {old_pc[31], candidate[30:0]};
    end
  endfunction

  task expect32;
    input [1023:0] tag;
    input [31:0] got;
    input [31:0] exp;
    begin
      if (got !== exp) begin
        $display("FAIL: %s got=%h exp=%h @ t=%0t", tag, got, exp, $time);
        $fatal(1);
      end else begin
        $display("PASS: %s = %h @ t=%0t", tag, got, $time);
      end
    end
  endtask

  task expect_align00;
    input [1023:0] tag;
    input [31:0] val;
    begin
      if (val[1:0] !== 2'b00) begin
        $display("FAIL: %s alignment violated val=%h @ t=%0t", tag, val, $time);
        $fatal(1);
      end else begin
        $display("PASS: %s aligned val=%h @ t=%0t", tag, val, $time);
      end
    end
  endtask

  // Pulse slowclk high across a rising edge so the PC register loads the mux output.
  task pc_load_once;
    begin
      @(negedge clk);
      slowclk = 1'b1;
      @(posedge clk);
      #1;
      slowclk = 1'b0;
    end
  endtask

  task wait_cycles(input integer n);
    integer k;
    begin
      for (k = 0; k < n; k = k + 1) @(posedge clk);
      #1;
    end
  endtask

  // Convenience: force a JMP load to a given reg_data_1 value
  task do_jmp_to(input [31:0] target);
    begin
      pcsel = 3'b010;
      reg_data_1 = target;
      #1;
      pc_load_once();
    end
  endtask

  // Convenience: do a branch load with a given immediate
  task do_branch(input [15:0] imm);
    begin
      pcsel = 3'b001;
      id = imm;
      #1;
      pc_load_once();
    end
  endtask

  // -----------------------
  // Wave dump
  // -----------------------
  initial begin
    $dumpfile("tb_pc_unit.vcd");
    $dumpvars(0, tb_pc_unit);
  end

  // -----------------------
  // Main test
  // -----------------------
  initial begin
    // defaults
    rst        = 1'b0;
    slowclk    = 1'b0;
    id         = 16'h0000;
    pcsel      = 3'b000;
    reg_data_1 = 32'h0000_0000;

    // -----------------------
    // Reset: PC reg reset value is 0x8000_0000 per register instantiation
    // -----------------------
    @(negedge clk);
    rst = 1'b1;
    wait_cycles(2);
    rst = 1'b0;
    wait_cycles(1);

    expect32("After reset, ia", ia, 32'h8000_0000);

    // -----------------------
    // PC+4 increment a bit
    // -----------------------
    pcsel = 3'b000;
    pc_load_once();
    expect32("PC+4 #1 ia", ia, 32'h8000_0004);

    pc_load_once();
    expect32("PC+4 #2 ia", ia, 32'h8000_0008);

    // -----------------------
    // IRQ vector: pcsel=100 then load => 0x8000_0008
    // -----------------------
    pcsel = 3'b100;
    pc_load_once();
    expect32("IRQ load ia", ia, 32'h8000_0008);

    // =========================================================================
    // BRANCH: bigger address, then branch back to lower address
    // =========================================================================

    // Put PC at 0x8000_0010
    pcsel = 3'b000;
    pc_load_once();  // 0x8000_000C
    pc_load_once();  // 0x8000_0010
    expect32("Setup PC=0x80000010", ia, 32'h8000_0010);

    // Branch forward by +100 (id=0x0064) => +400 bytes
    // target = protect(old_pc, (old_pc+4) + (sxtc<<2))
    begin : branch_forward_big
      reg [31:0] old_pc, exp_pc4, exp_raw, exp_prot, exp_aligned;
      old_pc = ia;
      exp_pc4 = old_pc + 32'd4;
      exp_raw = exp_pc4 + sxtc_x4(16'h0064);
      exp_prot = protect_msb(old_pc, exp_raw);
      exp_aligned = {exp_prot[31:2], 2'b00};

      pcsel = 3'b001;
      id = 16'h0064;
      #1;
      expect32("branch(+100): pc_4_sxtc combinational", pc_4_sxtc, exp_prot);
      pc_load_once();
      expect32("branch(+100): ia after load", ia, exp_aligned);
      expect_align00("branch(+100): ia alignment", ia);
    end

    // Branch back by -60 (id=0xFFC4) => -240 bytes
    begin : branch_back_lower
      reg [31:0] old_pc, exp_pc4, exp_raw, exp_prot, exp_aligned;
      old_pc = ia;
      exp_pc4 = old_pc + 32'd4;
      exp_raw = exp_pc4 + sxtc_x4(16'hFFC4);  // -60 * 4
      exp_prot = protect_msb(old_pc, exp_raw);
      exp_aligned = {exp_prot[31:2], 2'b00};

      pcsel = 3'b001;
      id = 16'hFFC4;
      #1;
      expect32("branch(-60): pc_4_sxtc combinational", pc_4_sxtc, exp_prot);
      pc_load_once();
      expect32("branch(-60): ia after load", ia, exp_aligned);
      expect_align00("branch(-60): ia alignment", ia);
    end

    // =========================================================================
    // BRANCH that would flip MSB if NOT protected (crossing 0x7FFF_FFFF -> 0x8000_0000)
    // We do:
    //   1) JMP to 0x7FFF_FFF0 while old PC MSB is 1 so JMP clears MSB to 0
    //   2) Branch forward with small positive offset that would raw-cross to 0x8000_0xxx
    //      but protection must keep MSB=0, so it becomes 0x0000_0xxx
    // =========================================================================

    // Step 1: make PC MSB become 0 by JMP to 0x7FFF_FFF0.
    // old_pc[31]=1, reg_data_1[31]=0 => AND => 0, so MSB cleared.
    do_jmp_to(32'h7FFF_FFF0);
    expect32("JMP to 0x7FFFFFF0 should clear MSB (PC becomes 0x7FFFFFF0)", ia, 32'h7FFF_FFF0);
    expect_align00("JMP 0x7FFFFFF0 alignment", ia);

    // Step 2: choose id so raw target crosses into 0x8000_xxxx.
    // old_pc=0x7FFF_FFF0
    // pc+4 = 0x7FFF_FFF4
    // want pc+4 + offset = 0x8000_0004 (raw) => offset = +0x0010 (16 bytes) => imm=4
    begin : branch_cross_msb_protection
      reg [31:0] old_pc, exp_pc4, exp_raw, exp_prot, exp_aligned;
      old_pc = ia;  // 0x7FFF_FFF0
      exp_pc4 = old_pc + 32'd4;  // 0x7FFF_FFF4
      exp_raw = exp_pc4 + sxtc_x4(16'h0004);  // +16 => 0x8000_0004 (raw)
      // protection must keep MSB=0 (old_pc[31]=0), so expected is 0x0000_0004
      exp_prot = protect_msb(old_pc, exp_raw);
      exp_aligned = {exp_prot[31:2], 2'b00};

      pcsel = 3'b001;
      id = 16'h0004;
      #1;
      expect32("branch(cross): raw would be 0x80000004, protected pc_4_sxtc", pc_4_sxtc, exp_prot);
      pc_load_once();
      expect32("branch(cross): ia after load should keep MSB=0", ia, exp_aligned);
      expect_align00("branch(cross): ia alignment", ia);

      // Extra explicit check of the expected literal in this scenario
      expect32("branch(cross): expected literal ia", ia, 32'h0000_0004);
    end

    // =========================================================================
    // JMP MSB behavior when PC31 is 0:
    // Once PC31=0, JMP to 0x8000_001C should become 0x0000_001C
    // because (old_pc[31] & reg_data_1[31]) = 0 & 1 = 0.
    // =========================================================================

    // Ensure PC31=0 already (it is from previous branch).
    if (ia[31] !== 1'b0) begin
      $display("FAIL: expected PC31=0 before JMP MSB test, ia=%h @ t=%0t", ia, $time);
      $fatal(1);
    end

    begin : jmp_msb_clear_when_pc31_zero
      reg [31:0] old_pc, exp_out, exp_aligned;
      old_pc = ia;  // MSB 0
      pcsel = 3'b010;
      reg_data_1 = 32'h8000_001C;  // MSB 1
      #1;

      exp_out = {(old_pc[31] & reg_data_1[31]), reg_data_1[30:0]};  // should be 0x0000_001C
      exp_aligned = {exp_out[31:2], 2'b00};

      expect32("jmp(PC31=0, target=0x8000001C): pcsel_out combinational", pcsel_out, exp_out);
      pc_load_once();
      expect32("jmp(PC31=0): ia after load should be 0x0000001C", ia, exp_aligned);
      expect32("jmp(PC31=0): expected literal ia", ia, 32'h0000_001C);
      expect_align00("jmp(PC31=0): ia alignment", ia);
    end

    // =========================================================================
    // JMP that would set MSB to 1 only if allowed.
    // With PC31=0, even if reg_data_1[31]=1, MSB must stay 0.
    // With PC31=1, reg_data_1[31]=1, MSB stays 1.
    // =========================================================================

    // Case A: PC31=0, reg_data_1[31]=1 -> stays 0
    do_jmp_to(32'hFFFF_FFFC);  // reg_data_1[31]=1 but AND with old_pc[31]=0 => MSB=0
    expect32("JMP with PC31=0 to 0xFFFFFFFC should still clear MSB", ia, 32'h7FFF_FFFC);
    // Explanation for the literal above:
    // pcsel_out = {0 & 1, reg_data_1[30:0]} = {0, 0x7FFF_FFFC} = 0x7FFF_FFFC
    expect_align00("JMP to 0x7FFFFFFC alignment", ia);

    // Case B: Force PC31=1 again via IRQ vector, then JMP with reg_data_1[31]=1 keeps MSB=1
    pcsel = 3'b100;
    pc_load_once();
    expect32("IRQ again sets PC MSB=1", ia, 32'h8000_0008);

    begin : jmp_keep_msb_when_pc31_one
      reg [31:0] old_pc, exp_out;
      old_pc = ia;  // MSB 1
      pcsel = 3'b010;
      reg_data_1 = 32'h9000_0011;  // MSB 1, unaligned low bits
      #1;

      exp_out = {(old_pc[31] & reg_data_1[31]), reg_data_1[30:0]};  // MSB stays 1
      pc_load_once();
      expect32("jmp(PC31=1,target msb=1): ia aligned", ia, {exp_out[31:2], 2'b00});
      expect_align00("jmp(PC31=1): ia alignment", ia);
    end

    $display("ALL TESTS PASSED");
    $finish;
  end

endmodule

If all works well, you should get the following waveform and message:

The testbench is design to test the following critical scenarios:

  1. Test RESET, IRQ, and ILLOP cases
  2. Test JMP, BEQ/BNE cases (both + and - memory addresses)
  3. PC31 protection (Cleared via JMP, attempt to set via JMP, and BNE/BEQ)