Microarchitecture

The basic idea behind Vondel's design is to follow some guidelines from Andrew S. Tanenbaum's book Structured Computer Organization, while adding the ability to perform operations even when the main clock is at a steady level rather than on an edge, in order to reduce the number of clock cycles.

The implementation of this design is provided by the uarch module.

Datapath structure

The datapath is structured much like the usual three-bus design plus an IFU (Instruction Fetch Unit). The techniques involved are rather simple for now, as you can see in the diagram below:

Data path diagram
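The three-bus organisation (in the style of Tanenbaum's Mic-2) can be illustrated with a small Python model: two registers drive buses A and B, the ALU combines them, and the result on bus C can be latched by several destination registers in the same cycle. This is a sketch under assumptions: the `BusControl` record, its fields and the ALU operation names are hypothetical, every value is treated as 32 bits, and the IFU is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MASK32 = 0xFFFFFFFF

# Hypothetical ALU operation set; Vondel's real encoding lives in the uarch module.
ALU_OPS = {
    "PASS_A": lambda a, b: a,
    "PASS_B": lambda a, b: b,
    "ADD": lambda a, b: (a + b) & MASK32,
    "SUB": lambda a, b: (a - b) & MASK32,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}


@dataclass
class BusControl:
    """Hypothetical per-cycle bus selection (not Vondel's actual format)."""
    a_source: str                                        # register driving bus A
    b_source: str                                        # register driving bus B
    alu_op: str                                          # key into ALU_OPS
    c_targets: List[str] = field(default_factory=list)   # registers latching bus C


def datapath_cycle(regs: Dict[str, int], ctrl: BusControl) -> int:
    """Run one three-bus cycle: read buses A and B, compute, write bus C back."""
    a_bus = regs[ctrl.a_source]
    b_bus = regs[ctrl.b_source]
    c_bus = ALU_OPS[ctrl.alu_op](a_bus, b_bus)
    for name in ctrl.c_targets:   # bus C may be latched by several registers at once
        regs[name] = c_bus
    return c_bus
```

For example, `datapath_cycle(regs, BusControl("R0", "R1", "ADD", ["R2", "R3"]))` with R0 = 5 and R1 = 7 writes 12 into both R2 and R3 in a single cycle; having a separate A bus is what lets two arbitrary registers feed the ALU at once.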

This implementation provides 24 registers in total: 5 memory registers, 3 system registers (to manage function calls and variables) and 16 general-purpose registers (R0 to R15). The registers are as follows:

  • Memory
    • MAR (Memory Address Register): 20 bits
    • MDR (Memory Data Register): 32 bits
    • PC (Program Counter): 20 bits
    • MBR (Memory Buffer Reader): 8 bits
    • MBR2 (Memory Buffer Reader 2): 16 bits
  • System
    • SP (Stack Pointer): 20 bits
    • LV (Local Variables): 20 bits
    • CPP (Constant Pool Pointer): 20 bits
  • General Purpose
    • R0: 32 bits
    • ...
    • R15: 32 bits
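The register list can be captured as a table of widths. The Python sketch below confirms the total of 24 registers and models a register file in which a write keeps only as many low-order bits as the register holds. The `RegisterFile` class and its truncation policy are illustrative assumptions, not code from the uarch module.

```python
from typing import Dict

# Register widths in bits, taken from the list above.
REGISTER_WIDTHS: Dict[str, int] = {
    # Memory registers
    "MAR": 20, "MDR": 32, "PC": 20, "MBR": 8, "MBR2": 16,
    # System registers
    "SP": 20, "LV": 20, "CPP": 20,
    # General-purpose registers R0..R15
    **{f"R{i}": 32 for i in range(16)},
}


class RegisterFile:
    """Toy register file that truncates every write to the register's width."""

    def __init__(self) -> None:
        self._values = {name: 0 for name in REGISTER_WIDTHS}

    def write(self, name: str, value: int) -> None:
        mask = (1 << REGISTER_WIDTHS[name]) - 1
        self._values[name] = value & mask   # keep only the low-order bits

    def read(self, name: str) -> int:
        return self._values[name]
```

With this model, writing 0x123456 to the 20-bit PC leaves 0x23456, and writing 0x1FF to the 8-bit MBR leaves 0xFF.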

On each cycle, the datapath is driven by a microinstruction supplied by the control store. For more information, see the next chapter.
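The idea of a control store driving the datapath once per cycle can be sketched as a fetch-and-apply loop. This is a hypothetical model, not Vondel's microinstruction format: each `MicroInstruction` carries a Python callable standing in for the datapath control bits and a `next_address` field loosely modelled on the NEXT_ADDRESS field of Tanenbaum's Mic-1; using `None` to stop is an assumption made only for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Registers = Dict[str, int]


@dataclass
class MicroInstruction:
    """Hypothetical microinstruction: a datapath action plus a successor address."""
    action: Callable[[Registers], None]   # what the datapath does this cycle
    next_address: Optional[int]           # None stops the sketch (hypothetical)


def run(control_store: List[MicroInstruction], regs: Registers,
        start: int = 0, max_cycles: int = 1_000) -> int:
    """Fetch one microinstruction per cycle and apply it; return cycles used."""
    address: Optional[int] = start
    cycles = 0
    while address is not None and cycles < max_cycles:
        micro = control_store[address]
        micro.action(regs)             # drive the datapath for this cycle
        address = micro.next_address   # sequence to the next microinstruction
        cycles += 1
    return cycles


def increment_r0(regs: Registers) -> None:
    regs["R0"] = (regs["R0"] + 1) & 0xFFFFFFFF


def copy_r0_to_r1(regs: Registers) -> None:
    regs["R1"] = regs["R0"]


# A three-entry control store: increment R0 twice, then copy it into R1.
store = [
    MicroInstruction(increment_r0, 1),
    MicroInstruction(increment_r0, 2),
    MicroInstruction(copy_r0_to_r1, None),
]
```

Running `run(store, {"R0": 0, "R1": 0})` takes three cycles and leaves R0 and R1 both equal to 2.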

Data parallelism

To achieve data-parallel processing, we use two datapaths with one ALU each, triggered on opposite clock edges: ALU1 is falling-edge triggered and ALU2 is rising-edge triggered. Furthermore, the clock signal received by ALU2 is derived from the clock signal of ALU1.
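The difference between falling-edge and rising-edge triggering can be simulated with two edge-triggered registers fed the same sampled clock. In this Python sketch ALU2 is given α1 directly for simplicity, whereas the design derives ALU2's clock from α1; the `EdgeTriggeredRegister` class and the sampled waveform are illustrative assumptions.

```python
from typing import List


class EdgeTriggeredRegister:
    """Latches its input only on the chosen clock edge."""

    def __init__(self, edge: str) -> None:
        if edge not in ("rising", "falling"):
            raise ValueError("edge must be 'rising' or 'falling'")
        self.edge = edge
        self.prev_clk = 0
        self.q = 0

    def tick(self, clk: int, d: int) -> int:
        rising = self.prev_clk == 0 and clk == 1
        falling = self.prev_clk == 1 and clk == 0
        if (self.edge == "rising" and rising) or (self.edge == "falling" and falling):
            self.q = d
        self.prev_clk = clk
        return self.q


def latch_times(reg: EdgeTriggeredRegister, clock: List[int]) -> List[int]:
    """Sample indices at which `reg` captured a new value."""
    times = []
    for t, clk in enumerate(clock):
        before = reg.q
        reg.tick(clk, d=t + 1)   # unique non-zero input per sample makes every latch visible
        if reg.q != before:
            times.append(t)
    return times


clock = [0, 1, 1, 0, 0, 1, 1, 0]
alu1_out = EdgeTriggeredRegister("falling")   # ALU1: falling-edge triggered
alu2_out = EdgeTriggeredRegister("rising")    # ALU2: rising-edge triggered
print(latch_times(alu1_out, clock))  # [3, 7]
print(latch_times(alu2_out, clock))  # [1, 5]
```

The two registers never latch on the same sample: each one captures a value once per clock period, half a period apart from the other.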

Suppose ALU1 has a clock α1 and α1' is a delayed version of that signal. ALU2 must then have a clock α2 = (α1 ∧ α1') delayed by δ1, as shown below:

Clock relation diagram
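The relation α2 = (α1 ∧ α1') delayed by δ1 can be checked numerically by sampling the clocks at discrete time steps. In this Python sketch, `prime_delay` (the delay between α1 and α1', which the text does not quantify) and `delta1` are expressed as whole numbers of samples; both values, and the discrete-time model itself, are assumptions for illustration.

```python
from typing import List


def delay(signal: List[int], steps: int, initial: int = 0) -> List[int]:
    """Shift a sampled 0/1 signal later by `steps` samples, keeping its length."""
    return ([initial] * steps + list(signal))[:len(signal)]


def alpha2(alpha1: List[int], prime_delay: int, delta1: int) -> List[int]:
    """α2 = (α1 ∧ α1') delayed by δ1, where α1' is α1 delayed by `prime_delay`."""
    alpha1_prime = delay(alpha1, prime_delay)
    gated = [a & b for a, b in zip(alpha1, alpha1_prime)]   # α1 ∧ α1'
    return delay(gated, delta1)                               # add δ1


alpha1 = [1, 1, 1, 1, 0, 0, 0, 0] * 2
print(alpha2(alpha1, prime_delay=1, delta1=1))
# [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
```

With one-sample delays, α2 rises while α1 is still high (sample 2), and its falling edge trails α1's falling edge by δ1 (sample 5 versus sample 4). With both delays set to zero, α2 reduces to α1 itself, since α1 ∧ α1 = α1.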

In other words, ALU1 starts its operation cycle on the falling edge of α1 and ends it on the rising edge, while ALU2 starts its operation cycle on the rising edge of α2 and ends it on the falling edge. This makes use of the main clock (α1) even while it is at high level.
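The operation windows described above can be extracted from sampled waveforms: ALU1 runs from each falling edge of α1 to the next rising edge, and ALU2 from each rising edge of α2 to the next falling edge. This Python sketch hard-codes the α2 that the formula gives when α1' and δ1 are each one sample late; those delays, and the helper names, are assumptions for illustration.

```python
from typing import List, Tuple


def edges(signal: List[int], rising: bool) -> List[int]:
    """Indices where the signal changes 0→1 (rising) or 1→0 (falling)."""
    want = (0, 1) if rising else (1, 0)
    return [i for i in range(1, len(signal)) if (signal[i - 1], signal[i]) == want]


def operation_windows(signal: List[int], start_on_rising: bool) -> List[Tuple[int, int]]:
    """(start, end) of each complete operation cycle: start edge to next opposite edge."""
    starts = edges(signal, rising=start_on_rising)
    ends = edges(signal, rising=not start_on_rising)
    windows = []
    for s in starts:
        later = [e for e in ends if e > s]
        if later:                      # drop a cycle the trace cuts off
            windows.append((s, later[0]))
    return windows


alpha1 = [1, 1, 1, 1, 0, 0, 0, 0] * 3
# α2 = (α1 ∧ α1') delayed by δ1, with α1' and δ1 both one sample late (assumed values)
alpha2 = [0, 0, 1, 1, 1, 0, 0, 0] * 3

alu1 = operation_windows(alpha1, start_on_rising=False)  # falling edge → rising edge
alu2 = operation_windows(alpha2, start_on_rising=True)   # rising edge → falling edge
print(alu1)  # [(4, 8), (12, 16)]
print(alu2)  # [(2, 5), (10, 13), (18, 21)]
```

ALU1's windows lie in the low phase of α1, while ALU2's start during the high phase, so each period of α1 carries one operation per ALU instead of one in total.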

The advantage of this method is that the control store and all the registers can be shared without a large cost in microinstruction size, external hardware components, or additional logic steps that could cost extra clock cycles. The way the components are shared is shown below:

Shared components diagram

These two datapaths will together become one thread in a future design of this microarchitecture, which is planned to have two task-parallel threads.