Computer Architecture/C.A (ETH Zürich, Spring 2020)

Lecture 12: Microarchitecture II

Tony Lim 2021. 6. 22. 15:53

In an R-type instruction, the destination register (rd) is given by bits 15:11 of the instruction.

The RegWrite control signal is ON (1).

An R-type instruction reads two registers, rs and rt, so the second mux (in front of the ALU's second input) selects Read data 2.

The ALUOp signal, together with the funct field, tells the ALU control which operation the ALU should perform on the two register values.

We don't read or write memory, so both memory control signals are 0.

The ALU result is written to the register selected by the Write register input.

An R-type instruction doesn't change control flow, so PC+4 is chosen as the next PC.
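As a rough sketch of the R-type case above, the field boundaries and control settings can be written out in Python. The function name `decode_r_type` is hypothetical, and the signal names other than RegWrite (RegDst, ALUSrc, MemRead, MemWrite, MemtoReg, Branch) are assumed from the usual textbook MIPS datapath rather than taken from these notes.

```python
def decode_r_type(instr: int) -> dict:
    """Split a 32-bit MIPS R-type instruction word into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs":     (instr >> 21) & 0x1F,  # bits 25:21, first source register
        "rt":     (instr >> 16) & 0x1F,  # bits 20:16, second source register
        "rd":     (instr >> 11) & 0x1F,  # bits 15:11, destination register
        "shamt":  (instr >> 6) & 0x1F,   # bits 10:6
        "funct":  instr & 0x3F,          # bits 5:0, selects the ALU operation
    }


# Control settings for the R-type case described above.
# Names other than RegWrite are assumed textbook names.
R_TYPE_CONTROL = {
    "RegDst": 1,    # Write register = rd (bits 15:11)
    "ALUSrc": 0,    # second ALU input = Read data 2
    "MemRead": 0,   # no memory read
    "MemWrite": 0,  # no memory write
    "MemtoReg": 0,  # write back the ALU result
    "RegWrite": 1,  # the register file is written
    "Branch": 0,    # next PC = PC + 4
}

fields = decode_r_type(0x012A4020)  # encoding of: add $t0, $t1, $t2
print(fields)
```

Here `$t1` and `$t2` (registers 9 and 10) are the two values read, and `$t0` (register 8) is the Write register.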

 

For an I-type instruction, the destination register (rt) comes from bits 20:16 of the instruction.

The immediate is sign-extended and fed to the ALU, which performs the operation determined by the opcode.

 

Load Word (LW) is similar to the I-type operation: the ALU adds the source register and the sign-extended immediate to calculate an address, and the value at that address is loaded into the destination register.

 

Store Word (SW) is similar to LW, but instead of reading memory we write to it.
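The address calculation shared by LW and SW can be sketched as follows. This is a minimal model, assuming a Python dict stands in for data memory and a list for the register file; the helper names `sign_extend16`, `effective_address`, `load_word` and `store_word` are made up for illustration.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base: int, imm16: int) -> int:
    """ALU add: base register value + sign-extended immediate (32-bit wrap)."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


def load_word(regs: list, memory: dict, rs: int, rt: int, imm16: int) -> None:
    """LW: read memory at the computed address into register rt."""
    regs[rt] = memory.get(effective_address(regs[rs], imm16), 0)


def store_word(regs: list, memory: dict, rs: int, rt: int, imm16: int) -> None:
    """SW: write register rt to memory at the computed address."""
    memory[effective_address(regs[rs], imm16)] = regs[rt]


regs = [0] * 32
memory = {}
regs[9] = 0x1000           # base address in $t1
regs[10] = 1234            # value to store in $t2
store_word(regs, memory, rs=9, rt=10, imm16=0xFFFC)  # sw $t2, -4($t1)
load_word(regs, memory, rs=9, rt=8, imm16=0xFFFC)    # lw $t0, -4($t1)
print(hex(effective_address(regs[9], 0xFFFC)), regs[8])
```

The only difference between the two helpers is the direction of the memory access, which mirrors the note that SW is LW with a write instead of a read.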

 

For a branch, the ALU calculates the condition by comparing the first register with the second register. The X entries are don't-cares because we don't write to the registers.

The orange AND gate checks that the instruction is a branch and that the condition holds. In this case the branch is not taken, so we just use PC + 4.

 

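In this case the output of the orange AND gate is 1, so the branch is taken and the mux selects the branch target (the "ALU result" in the figure) instead of PC + 4.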

 

For a jump, the next PC is formed from bits 25:0 of the instruction (shifted left by 2 and combined with the upper 4 bits of PC + 4), and the mux selects it.

Notice that the modules at the bottom are not used at all. The next lecture covers how we can make use of them.
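The three next-PC cases above (PC + 4, taken branch, jump) can be sketched as one function. The name `next_pc` and its `branch`, `jump` and `zero` parameters are assumptions for illustration; `branch and zero` plays the role of the orange AND gate for a BEQ-style branch.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc(pc: int, instr: int, branch: bool, jump: bool, zero: bool) -> int:
    """Select the next PC the way the PC muxes do in the single-cycle datapath."""
    pc_plus4 = (pc + 4) & 0xFFFFFFFF
    # Branch target: PC + 4 plus the sign-extended offset in words (times 4).
    branch_target = (pc_plus4 + (sign_extend16(instr) << 2)) & 0xFFFFFFFF
    # Jump target: upper 4 bits of PC + 4, then instruction bits 25:0 shifted left by 2.
    jump_target = (pc_plus4 & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    if jump:
        return jump_target
    if branch and zero:  # the AND gate: branch instruction AND condition true
        return branch_target
    return pc_plus4


beq = 0x11090003  # beq $t0, $t1, +3 instructions
j = 0x08000040    # j 0x100
print(hex(next_pc(0x100, beq, branch=True, jump=False, zero=False)))  # not taken
print(hex(next_pc(0x100, beq, branch=True, jump=False, zero=True)))   # taken
print(hex(next_pc(0x200, j, branch=False, jump=True, zero=False)))    # jump
```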

 

Single Cycle Processor

The clock cycle time of the microarchitecture is determined by how long it takes to complete the slowest instruction.

The assumptions are not realistic; they are just for study.

The slide shows the different types of instructions, which stages each one goes through, and how much time each step takes.
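To make the "slowest instruction sets the clock" rule concrete, here is a small calculation. The stage latencies and the stage lists per instruction are illustrative assumptions, not the numbers from the lecture slide.

```python
# Assumed stage latencies in picoseconds (illustrative values only).
STAGE_LATENCY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Assumed stages used by each instruction type.
STAGES_USED = {
    "R-type": ["IF", "ID", "EX", "WB"],
    "lw":     ["IF", "ID", "EX", "MEM", "WB"],
    "sw":     ["IF", "ID", "EX", "MEM"],
    "beq":    ["IF", "ID", "EX"],
    "j":      ["IF", "ID"],
}

# Time each instruction would need if it could run on its own.
latency_ps = {
    name: sum(STAGE_LATENCY_PS[stage] for stage in stages)
    for name, stages in STAGES_USED.items()
}

# In a single-cycle design every instruction gets the same, worst-case cycle.
cycle_time_ps = max(latency_ps.values())
slowest = max(latency_ps, key=latency_ps.get)

print(latency_ps)
print(f"cycle time = {cycle_time_ps} ps, set by {slowest}")
```

With these assumed numbers, a jump needs far less time than a load, but it still occupies the full cycle.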

 

Inefficient

  • All instructions run as slow as the slowest instruction.
  • It must provide, in parallel, the worst-case combinational resources required by any instruction -> needs many different modules -> hardware size increases.
  • Not easy to optimize or improve performance.

 

Microarchitecture Design Principles

Critical path design

  • Find and decrease the maximum combinational logic delay.
  • Break a path into multiple cycles if it takes too long.

Bread and butter (common case) design

  • Spend time and resources where it matters most.

Balanced design

  • Balance instruction / data flow through hardware components.
  • Design to eliminate bottlenecks: balance the hardware for the work.

But the single-cycle architecture violates all of them.

 

Multi-Cycle Microarchitectures

Determine clock cycle time independently of instruction processing time

Each instruction takes as many clock cycles as it needs to take

The number of cycles depends on the instruction: some take many, some take few.
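A minimal sketch of that idea: the clock period is set by the slowest single state rather than the slowest instruction, and each instruction pays only for the cycles it uses. The state latencies and cycle counts below are assumed values for illustration, not figures from the lecture.

```python
# Assumed latency of each state in picoseconds (illustrative values only).
STATE_LATENCY_PS = {
    "fetch": 200, "decode": 100, "execute": 200, "memory": 200, "writeback": 100,
}

# Assumed number of cycles each instruction type needs.
CYCLES = {"R-type": 4, "lw": 5, "sw": 4, "beq": 3}

# The clock only has to cover the slowest state, not a whole instruction.
clock_period_ps = max(STATE_LATENCY_PS.values())

# Each instruction takes (its own cycle count) x (clock period).
time_ps = {name: cycles * clock_period_ps for name, cycles in CYCLES.items()}

print(f"clock period = {clock_period_ps} ps")
print(time_ps)
```

How much this helps depends on how evenly the work is split across states, since every state is rounded up to a full clock period.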

 

Benefits of Multi-Cycle Design

Critical path design

  • Can keep reducing the critical path independently of the worst-case processing time of any instruction.

Bread and butter (common case) design

  • Can optimize the number of states it takes to execute "important" instructions that make up much of the execution time.

Balanced design

  • No need to provide more capability or resources than really needed
  • An instruction that needs resource X multiple times doesn't require multiple X's to be implemented.
  • Leads to more efficient hardware -> can reuse hardware components that an instruction needs multiple times.

 

Downside of Multi-Cycle Design

Need to store intermediate results, which adds overhead for registers.

 

 

Unlike the single-cycle design, we can use the same ALU in various states (cycles). We have registers to store certain data at each stage so it can be used in a later cycle.

Every instruction can be split into small pieces.

 

IorD (the select of the first mux) should be 0 so that the PC goes to the Instr/Data Memory as the address.

IRWrite = 1 to store the instruction in the instruction register (blue circle on the right).

We can also update the PC to PC+4.

 

This state does not update memory or the registers, so there are many X (don't care) entries.

We don't really know what the instruction is at this point; the control unit is decoding it to figure that out.

The next state is conditional: it depends on the opcode.

 

If the instruction is either LW or SW, we come to this memory address calculation state.

Read the first register (the source register), add it to the sign-extended immediate in the ALU, and store the result in the blue-circled register for later use.

 

There is a branch in the FSM here: if the opcode is LW, we read memory, write the value into the destination register, and go back to the fetch state.

 

Every instruction is executed under the control unit, which is implemented as an FSM.

If another instruction needs to be added, we just add more branches to the FSM and use the given modules with the control unit.
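The LW/SW part of the control FSM described above can be sketched as a next-state function. This is only a sketch under assumptions: the state names and the helper names `next_state` and `trace` are made up for illustration, and only the two memory instructions are covered. The opcode values 0x23 (LW) and 0x2B (SW) are the standard MIPS encodings.

```python
from enum import Enum, auto


class State(Enum):
    FETCH = auto()          # read instruction at PC, PC <- PC + 4
    DECODE = auto()         # figure out which instruction this is
    MEM_ADDR = auto()       # ALU: base register + sign-extended immediate
    MEM_READ = auto()       # LW only: read data memory
    MEM_WRITEBACK = auto()  # LW only: write loaded value to the register file
    MEM_WRITE = auto()      # SW only: write data memory


LW, SW = 0x23, 0x2B  # MIPS opcodes


def next_state(state: State, opcode: int) -> State:
    """Next-state logic of the control FSM for LW and SW."""
    if state is State.FETCH:
        return State.DECODE
    if state is State.DECODE:
        if opcode in (LW, SW):
            return State.MEM_ADDR
        # Adding an instruction means adding a branch here.
        raise NotImplementedError(f"opcode {opcode:#x} not modeled")
    if state is State.MEM_ADDR:
        return State.MEM_READ if opcode == LW else State.MEM_WRITE
    if state is State.MEM_READ:
        return State.MEM_WRITEBACK
    return State.FETCH  # MEM_WRITEBACK and MEM_WRITE go back to fetch


def trace(opcode: int) -> list:
    """List the states one instruction passes through, starting at FETCH."""
    states = [State.FETCH]
    while True:
        nxt = next_state(states[-1], opcode)
        if nxt is State.FETCH:
            return states
        states.append(nxt)


print([s.name for s in trace(LW)])
print([s.name for s in trace(SW)])
```

In this sketch LW passes through five states and SW through four, which matches the idea that each instruction takes only as many cycles as it needs.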