
Computer Architecture 29

ML HW & Systems. Lecture 5: Microarchitecture

From the big picture (architecture) to actual unit design (microarchitecture); circuits will not be covered. An accumulator adds numbers and needs to hold a number (state), while an adder doesn't need to hold state. With one accumulator, the dot product of length-n vectors takes O(n) to compute, e.g. 8 cycles if n is 8. Using multiple multipliers is faster, but the results then have to be merged, which needs an adder tree. Now it takes 2 cycles to calculate 8 len..
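A minimal Python sketch of the two dot-product schemes in this excerpt, assuming integer inputs. The function names `dot_sequential` and `dot_adder_tree` are made up for illustration. The sketch only counts accumulate steps and adder-tree levels; how those levels map to clock cycles depends on the actual design and is not modeled here.

```python
def dot_sequential(a, b):
    """One multiplier plus an accumulator: one multiply-accumulate per step."""
    acc = 0    # the accumulator holds state between steps
    steps = 0
    for x, y in zip(a, b):
        acc += x * y
        steps += 1
    return acc, steps


def dot_adder_tree(a, b):
    """One multiplier per element, then pairwise adders merge the products."""
    level = [x * y for x, y in zip(a, b)]  # all products formed in parallel
    if not level:
        return 0, 0
    depth = 0
    while len(level) > 1:
        if len(level) % 2:       # pad odd levels so every adder has two inputs
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        depth += 1               # one more level of adders
    return level[0], depth


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
seq_result, seq_steps = dot_sequential(a, b)
tree_result, tree_depth = dot_adder_tree(a, b)
print(seq_result, seq_steps)    # same result, one step per element
print(tree_result, tree_depth)  # same result, log2(n) adder levels
```

Both functions return the same sum; the difference is that the adders in the tree are stateless and the number of levels grows with log2(n) instead of n.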

Lecture 21b: Memory Hierarchy and Caches

SRAM is used for the L1 and L2 caches; it is more expensive than DRAM but has higher bandwidth. Cache access: index into the tag and data stores with the index bits of the address, check the valid bit in the tag store, and compare the tag bits of the address with the tag stored in the tag store. Several blocks (for example 00010, 01010, 10010, 11010, 4 blocks) can be mapped to the exact same place, and with the tag we can find which cluster of block is th..
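The lookup steps above can be sketched in Python. This is only an illustration under assumptions: a direct-mapped cache with 8 lines, 5-bit block addresses whose low 3 bits are the index and whose upper 2 bits are the tag (chosen so that 00010, 01010, 10010 and 11010 all land on the same line). The class `DirectMappedCache` and its methods are hypothetical names.

```python
class DirectMappedCache:
    def __init__(self, index_bits=3):
        # index_bits=3 is an assumption: 2**3 = 8 cache lines
        self.index_bits = index_bits
        lines = 1 << index_bits
        self.valid = [False] * lines  # valid bits (tag store)
        self.tags = [0] * lines       # stored tags (tag store)
        self.data = [None] * lines    # cached blocks (data store)

    def split(self, block_addr):
        """Split a block address into (tag, index)."""
        index = block_addr & ((1 << self.index_bits) - 1)
        tag = block_addr >> self.index_bits
        return tag, index

    def lookup(self, block_addr):
        """Index both stores, check the valid bit, compare the tags."""
        tag, index = self.split(block_addr)
        hit = self.valid[index] and self.tags[index] == tag
        return hit, (self.data[index] if hit else None)

    def fill(self, block_addr, value):
        """Install a block, replacing whatever occupied that line."""
        tag, index = self.split(block_addr)
        self.valid[index] = True
        self.tags[index] = tag
        self.data[index] = value


cache = DirectMappedCache()
cache.fill(0b00010, "block A")
hit_a, data_a = cache.lookup(0b00010)  # same tag, same index: hit
hit_b, _ = cache.lookup(0b01010)       # same index, different tag: miss
cache.fill(0b01010, "block B")         # replaces block A on that line
hit_a_again, _ = cache.lookup(0b00010)
```

Because the four example addresses share index 010, only the tag tells them apart, and filling one of them evicts whichever other one was there.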

Lecture 20: Graphics Processing Units

Programming model = how the programmer expresses the code, e.g. sequential (von Neumann), data parallel (SIMD), dataflow, multi-threaded (MIMD, SPMD). Execution model = how the hardware executes the code underneath, e.g. out-of-order execution, vector processor, array processor, dataflow processor, multithreaded processor. GPU = SPMD (single program, multiple data) model implemented by a SIMD pr..

Lecture 15b: Out of Order, DataFlow & LD/ST Handling

Reverse engineer and create the data flow: by looking at the first picture we can create the right side of the second picture, which is the data flow graph, and by looking at the data flow graph we can create the instructions on the left. Out-of-order execution with precise exceptions: use a reorder buffer to reorder instructions before committing them to architectural state. An instruction updates the register alias table (RAT, frontend regis..

Lecture 15a: Out-of-Order Execution

For example, we are trying to execute the red instruction "Add R3 R4". We need R3, so we check the reorder buffer to see whether its value is valid. If it is not valid, we stall the instruction; if it is valid, we can take the bypass and read the value from the reorder buffer. Notice that below the red instruction there are no dependencies. But since R3 in the red instruction has to wait for the IMUL instruction (8 cycles), the other 3 blue-line instructions s..
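The valid-bit check described here can be sketched in Python as follows. This is a simplified illustration, not the lecture's exact design: `RobEntry`, `read_operand`, and the register values are hypothetical, and the reorder buffer is modeled as a plain list ordered oldest to newest.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    dest: str                    # destination register of the in-flight instruction
    value: Optional[int] = None  # result, once it has been produced
    valid: bool = False          # True when value is ready


def read_operand(reg, rob, regfile):
    """Decide where a source operand comes from."""
    # Scan newest to oldest so the most recent writer of reg wins.
    for entry in reversed(rob):
        if entry.dest == reg:
            if entry.valid:
                return ("bypass", entry.value)  # take value from the reorder buffer
            return ("stall", None)              # producer has not finished yet
    return ("regfile", regfile[reg])            # no in-flight writer


regfile = {"R3": 5, "R4": 7}          # made-up architectural values
rob = [RobEntry(dest="R3")]           # e.g. a pending IMUL that writes R3
before = read_operand("R3", rob, regfile)
other = read_operand("R4", rob, regfile)

rob[0].value, rob[0].valid = 42, True  # the producer completes (42 is made up)
after = read_operand("R3", rob, regfile)
```

Before the producer completes, the read of R3 stalls; afterwards the same read is served from the reorder buffer instead of the register file.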
