Posts

Reservation Station Controls

This post describes the control hardware used to dispatch instructions from the reservation stations. For background on this topic, you can refer here. As discussed previously, the control hardware is divided into three parts to make the implementation easier. We will now go through each of them in detail to understand how they work.

The Allocating Unit

After the instructions are decoded, they are sent to the reservation station. This transfer from the Decode stage to the reservation station is handled by the Allocating Unit. It performs the following actions:

1. Read whether any valid instructions are coming from decode.
2. Keep track of empty entries in the reservation station(s).
3. In the case of a Distributed Reservation Station, decide which station a particular instruction should be transferred to.

Implementing (1) and (3) is simple and just involves reading the validity bit and instruction ...
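As a rough software illustration of the allocating unit's first two actions (checking validity and tracking empty entries), here is a minimal Python sketch. The names `ReservationStation`, `free_slots`, and `allocate` are hypothetical, the single-station setup is an assumption, and the real design is hardware, not Python.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ReservationStation:
    """Hypothetical fixed-size station; None marks an empty entry."""
    size: int
    entries: List[Optional[str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.entries = [None] * self.size

    def free_slots(self) -> List[int]:
        # Action (2): keep track of which entries are empty.
        return [i for i, e in enumerate(self.entries) if e is None]


def allocate(station: ReservationStation,
             decoded: List[Tuple[bool, str]]) -> int:
    """Move valid decoded instructions into empty entries.

    `decoded` is a list of (valid_bit, instruction) pairs from the
    decode stage. Returns how many instructions were accepted; the
    remaining ones are assumed to stall until an entry frees up.
    """
    free = station.free_slots()
    accepted = 0
    for valid, instr in decoded:
        if not valid:          # Action (1): ignore invalid slots.
            continue
        if not free:           # No empty entry left: stop, preserving order.
            break
        station.entries[free.pop(0)] = instr
        accepted += 1
    return accepted
```

With a two-entry station, offering one valid and one invalid instruction fills a single entry, and a later pair of valid instructions fills only the one entry that remains.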

Designing a mini Superscalar Processor

Sub-parts of a Superscalar Processor

The processor is designed by dividing the task into multiple smaller parts. This helps in concentrating on one problem at a time and also helps us keep track of overall progress. Here I will be designing a 2-way-fetch, out-of-order superscalar microprocessor: IITB-RISC. The ISA for the same can be found here. I have divided the overall design into the following sub-designs:

1. Fetch
2. Decode
3. Dispatch
4. Execute
5. Complete
6. Retire

The implemented design can be found here.

Instruction Fetch

To fetch 2 instructions per cycle, we need an instruction memory with two read ports. As we always read from 2 consecutive addresses, we need an address register for storing the current address (PC). To store the fetched instructions, we also need 2 additional registers (Register1 and Register2). For improving the efficiency of the processor, we will be using a Branch Predictor which will help from stalling t...
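The fetch behaviour described in this excerpt (two reads from consecutive addresses, a PC register, and two holding registers) can be sketched in Python as below. This is only an illustration under stated assumptions: the class name `FetchStage` is hypothetical, memory is modelled as a word-addressed list, and the branch predictor is left out entirely.

```python
from typing import List, Optional, Tuple


class FetchStage:
    """Hypothetical model of a 2-wide fetch stage without branch prediction."""

    def __init__(self, imem: List[str]) -> None:
        self.imem = imem                        # instruction memory, two read ports
        self.pc = 0                             # address register (PC)
        self.register1: Optional[str] = None    # holds the first fetched instruction
        self.register2: Optional[str] = None    # holds the second fetched instruction

    def _read(self, addr: int) -> Optional[str]:
        # Reads past the end of memory return None (assumption for this sketch).
        return self.imem[addr] if addr < len(self.imem) else None

    def step(self) -> Tuple[Optional[str], Optional[str]]:
        """One cycle: read PC and PC+1 in parallel, then advance PC by 2."""
        self.register1 = self._read(self.pc)
        self.register2 = self._read(self.pc + 1)
        self.pc += 2
        return self.register1, self.register2
```

Each call to `step` models one clock cycle; a real design would also redirect the PC on a predicted or resolved branch, which this sketch does not attempt.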

Execution Unit: Mini Superscalar Processor Design

In my processor, I have used 5 execution units: 2 ALUs, 1 Branch Handling Unit, and 2 Load/Store Units. The overview of the system is shown in the figure below and the details are explained ahead. After the detailed view of the execution units, the problem of R7 being equivalent to the PC is addressed.

Arithmetic Logic Unit (ALU)

The instructions executed by this unit are: ADD, ADC, ADZ, ADI, NDU, NDC, NDZ. The result, along with the carry and zero flags, is broadcast when execution finishes. Before the instruction completes, these flags are updated and checked against the old values to decide whether or not to write back to the register file. This is a single-cycle stage and can be visualized as:

The Branch Handling Unit

The instructions executed by this unit are: BEQ, JAL, JLR. The value of the PC is either used or modified by these instructions. This unit consists of an adder and an equality check unit. Depending on the in...
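To make the ALU's flag handling more concrete, here is a small Python sketch of a 16-bit ALU that returns the result, the new carry and zero flags, and a write-back decision. Several things here are assumptions rather than details from the post: that ADC/NDC write back only when the old carry flag is set, that ADZ/NDZ write back only when the old zero flag is set, that NAND leaves the carry flag unchanged, and that flags stay at their old values when the write-back is suppressed. The function name `alu` is hypothetical.

```python
from typing import Tuple

MASK16 = 0xFFFF  # 16-bit datapath (matches the 16-bit data field of the design)


def alu(op: str, a: int, b: int,
        carry_in: bool = False,
        zero_in: bool = False) -> Tuple[int, bool, bool, bool]:
    """Return (result, carry, zero, write_back) for one ALU operation."""
    if op in ("ADD", "ADC", "ADZ", "ADI"):
        total = (a & MASK16) + (b & MASK16)
        result = total & MASK16
        carry = total > MASK16
    elif op in ("NDU", "NDC", "NDZ"):
        result = ~(a & b) & MASK16
        carry = carry_in            # assumption: NAND does not touch carry
    else:
        raise ValueError(f"unsupported ALU op: {op}")
    zero = result == 0

    # Conditional variants check the OLD flag values (assumed semantics).
    if op in ("ADC", "NDC"):
        write_back = carry_in
    elif op in ("ADZ", "NDZ"):
        write_back = zero_in
    else:
        write_back = True

    if not write_back:
        # Assumption: a suppressed instruction leaves the flags untouched.
        carry, zero = carry_in, zero_in
    return result, carry, zero, write_back
```

For example, `alu("ADD", 0xFFFF, 1)` wraps to zero and sets both flags, while `alu("ADC", 1, 2, carry_in=False)` computes a result but reports that it should not be written back.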

The Re-order Buffer

Structure

A typical entry in the reorder buffer can be visualized as:

Busy | Instruction Type | Validity | Register Affected | Data
1 bit | 2 bits | 1 bit | 3 bits | 16 bits

The values shown above are in accordance with the designed mini-superscalar processor. Whenever an instruction is decoded, an entry is allotted to it in the reorder buffer, and the index of that entry is returned to the instruction. The busy bit is set to '1' whenever an entry is allotted to an instruction and is cleared to '0' whenever the instruction is declared as completed.

Completion: An instruction is declared completed only when all of its preceding instructions have completed. To manage this, a register (R_old) is defined which holds the index of the oldest instruction. If the validity of this is set to '1' by the execute stage, this instruction is...
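The entry layout and the R_old mechanism can be modelled in Python roughly as follows. This is a sketch under assumptions: the circular `tail` pointer used for allocation, the `write_result` helper, and the exact retirement step in `complete_oldest` are hypothetical, since the excerpt is cut off before it describes what happens once the oldest entry is valid.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RobEntry:
    busy: bool = False      # 1 bit
    instr_type: int = 0     # 2 bits
    valid: bool = False     # 1 bit, set by the execute stage
    reg: int = 0            # 3 bits, register affected
    data: int = 0           # 16 bits


class ReorderBuffer:
    def __init__(self, size: int) -> None:
        self.entries: List[RobEntry] = [RobEntry() for _ in range(size)]
        self.r_old = 0      # index of the oldest instruction
        self.tail = 0       # next entry to allot (assumed circular pointer)

    def allocate(self, instr_type: int, reg: int) -> Optional[int]:
        """Allot an entry at decode; return its index, or None if full."""
        if self.entries[self.tail].busy:
            return None
        self.entries[self.tail] = RobEntry(
            busy=True, instr_type=instr_type & 0b11, reg=reg & 0b111)
        index = self.tail
        self.tail = (self.tail + 1) % len(self.entries)
        return index

    def write_result(self, index: int, data: int) -> None:
        """Execute stage stores the result and sets the validity bit."""
        entry = self.entries[index]
        entry.data = data & 0xFFFF
        entry.valid = True

    def complete_oldest(self) -> Optional[Tuple[int, int]]:
        """Complete the oldest entry only if it is busy and valid."""
        entry = self.entries[self.r_old]
        if not (entry.busy and entry.valid):
            return None
        entry.busy = False
        self.r_old = (self.r_old + 1) % len(self.entries)
        return entry.reg, entry.data
```

Note how a younger instruction that finishes first still waits: `complete_oldest` returns nothing until the entry at `r_old` has its validity bit set, which is what keeps completion in program order.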