The Pentium's pipelines share 4 stages in common: Prefetch/Fetch, Decode-1, Decode-2, Write-back.
When an instruction reaches the execute phase it enters a more specialized pipeline, so the depth of the pipeline can differ from one execution unit to another.
The Pentium's basic integer pipeline has 5 stages:
1. Prefetch/Fetch == instructions are fetched from the instruction cache and aligned in the prefetch buffers for decoding.
2. Decode-1 == instructions are decoded using a set of hardware-based rules. Branch prediction also happens here.
3. Decode-2 == instructions that require the microcode ROM are decoded here. Address computation also happens here.
4. Execute == the integer hardware ALU executes the instruction.
5. Write-back == the result is written to the register file.
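The five stages above can be sketched as a small timeline, to show how instructions overlap once the pipeline fills. This is only an illustration under idealized assumptions (one instruction enters per cycle, no stalls or branches); the function names and the sample program are made up for the example and are not from the book.

```python
STAGES = ["Fetch", "Decode-1", "Decode-2", "Execute", "Write-back"]


def pipeline_timeline(instructions):
    """Return {cycle: {stage: instruction}} for an ideal, stall-free pipeline."""
    timeline = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i enters Fetch in cycle i + 1
            timeline.setdefault(cycle, {})[stage] = instr
    return timeline


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed to push n instructions through an ideal pipeline."""
    if n_instructions == 0:
        return 0
    # Fill the pipeline once, then finish one instruction per cycle.
    return n_stages + n_instructions - 1


if __name__ == "__main__":
    program = ["add", "sub", "mov", "cmp"]  # hypothetical instruction names
    for cycle, slots in sorted(pipeline_timeline(program).items()):
        row = "  ".join(f"{slots.get(stage, '-'):>10}" for stage in STAGES)
        print(f"cycle {cycle}: {row}")
    print("total cycles:", total_cycles(len(program)))
```

With 4 instructions and 5 stages the sketch needs 8 cycles: 5 to fill the pipeline, then one more per remaining instruction.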
Branch Prediction
speculative execution == used to keep the delays associated with evaluating branches from introducing bubbles into the pipeline.
If the prediction is wrong, the pipeline has to be flushed, and refilling it takes a long time.
static branch prediction == simple; relies on the assumption that a backward-pointing branch is part of a loop, so it is predicted taken.
dynamic branch prediction == based on the program's past behavior. Uses 2 tables:
branch history table (BHT) == holds an entry for each conditional branch that the branch unit (BU) has encountered over its last few cycles. Each entry also includes some bits that indicate the likelihood that the branch will be taken, based on its past history.
branch target buffer (BTB) == stores the branch target.
If a branch has no entry in the BHT, static branch prediction is used instead.
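To make the two tables concrete, here is a minimal sketch of a predictor with a BHT, a BTB, and the static fallback described above. It is a toy model, not the Pentium's actual design: the 2-bit saturating counter, the dictionary-based tables, and all names and addresses are assumptions chosen for illustration.

```python
class BranchPredictor:
    """Toy predictor: a BHT of 2-bit saturating counters plus a BTB."""

    def __init__(self):
        self.bht = {}  # branch address -> counter 0..3 (>= 2 means "taken")
        self.btb = {}  # branch address -> last known target address

    def predict(self, pc, target):
        """Return (predicted_taken, source) for the branch at address pc."""
        if pc in self.bht:
            return self.bht[pc] >= 2, "dynamic"
        # No history yet: static rule, a backward branch is assumed to be a loop.
        return target < pc, "static"

    def update(self, pc, target, taken):
        """Record the real outcome once the branch has been evaluated."""
        if pc not in self.bht:
            self.bht[pc] = 2 if taken else 1  # weakly taken / weakly not taken
        elif taken:
            self.bht[pc] = min(3, self.bht[pc] + 1)
        else:
            self.bht[pc] = max(0, self.bht[pc] - 1)
        if taken:
            self.btb[pc] = target


def count_mispredictions(predictor, pc, target, outcomes):
    """Run one branch through a list of real outcomes and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        guess, _ = predictor.predict(pc, target)
        misses += guess != taken
        predictor.update(pc, target, taken)
    return misses


if __name__ == "__main__":
    # Hypothetical loop branch at 0x40 jumping back to 0x10:
    # taken four times, then falls through when the loop exits.
    predictor = BranchPredictor()
    outcomes = [True, True, True, True, False]
    print(count_mispredictions(predictor, 0x40, 0x10, outcomes))
```

In this run only the final loop exit is mispredicted: the first pass is covered by the static rule, and every later pass uses the history in the BHT.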
The Floating-Point ALU
To operate on 2 floating-point numbers, one of them must be at the top of the stack; the other can be in any of the other registers.
fadd ST, ST(5) == ST = ST + ST(5)
The compiler alone couldn't overcome the two-operand limit and the stack-based limit.
So a microarchitectural hack comes in: "fxch", which is almost free. It swaps any element of the stack with the stack top.
The Pentium's performance was held back because about 30% of its transistors went to legacy x86 support.
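The stack rule and the "fxch" trick can be sketched with a tiny model of the register stack. This is a simplified illustration only: the class and method names are invented, the stack is a plain Python list, and none of the real x87 state (tags, status word, stack depth checks) is modeled.

```python
class X87Stack:
    """Toy model of the x87 register stack; st[0] is the stack top (ST)."""

    def __init__(self, values):
        self.st = list(values)

    def fadd(self, i):
        # fadd ST, ST(i): ST = ST + ST(i). The top is always one operand.
        self.st[0] = self.st[0] + self.st[i]

    def fxch(self, i):
        # fxch ST(i): swap the stack top with ST(i).
        self.st[0], self.st[i] = self.st[i], self.st[0]


if __name__ == "__main__":
    stack = X87Stack([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    stack.fadd(5)   # ST = 1.0 + 6.0
    stack.fxch(2)   # bring ST(2) to the top so the next add can use it
    stack.fadd(1)   # ST = 3.0 + 2.0
    print(stack.st)
```

Without the swap, the second add would have been forced to use the result of the first one as an operand; the swap lets any register act as the stack top.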
The Intel P6 Microarchitecture: The Pentium Pro
The original Pentium has 2 major drawbacks, both of which come from static execution (static scheduling):
1. It adapts poorly to the dynamic and ever-changing code stream.
2. It makes poor use of wider superscalar hardware.
The makeup of the code stream changes from application to application, but the rules for scheduling execution on the Pentium's back end are fixed.
Unlike the original Pentium, the Pentium Pro has a reorder buffer. Once the buffer is adequately full, the dynamic scheduling logic has enough instructions to choose from.
Issue phase == instructions wait in a buffer for a moment before going to the execute phase, possibly out of program order. The buffer also eliminates bubbles coming from the front end, but only if the front end is faster than the back end.
Completion phase == instructions that have finished executing wait in a second buffer to have their results written back to the register file in program order.
Reservation station == where newly decoded instructions go. Each one waits there until all of its execution requirements are met. Because of the buffer, dispatch is flexible: 3 instructions per cycle, and as many as 5 is possible.
Reorder buffer == its job is to ensure that finished instructions get put back in program order.
The instruction window == the ability to see some way into the future of the code stream.
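The interplay of the two buffers, out-of-order execution followed by in-order write-back, can be sketched as a small simulation. It is a rough model under loud assumptions: unlimited execution units, each register written by only one instruction, and made-up instruction names, registers, and latencies. It does not reflect the P6's real buffer sizes or ports.

```python
from collections import deque


def run_out_of_order(program, latency):
    """Simulate out-of-order execution with in-order commit.

    program: list of (name, dest_register, source_registers) in program order.
    latency: name -> execution cycles (assumed values, each >= 1).
    Returns (finish_order, commit_order, cycles_used).
    """
    waiting = list(program)                          # reservation station
    rob = deque(name for name, _, _ in program)      # reorder buffer
    not_ready = {dest for _, dest, _ in program}     # results not produced yet
    in_flight = {}                                   # name -> finishing cycle
    finished, finish_order, commit_order = set(), [], []
    cycle = 0
    while rob:
        cycle += 1
        # Dispatch every waiting instruction whose sources are all available.
        for instr in list(waiting):
            name, _, sources = instr
            if not (set(sources) & not_ready):
                waiting.remove(instr)
                in_flight[name] = cycle + latency[name] - 1
        # Complete instructions whose last execution cycle is this one.
        for name, dest, _ in program:
            if in_flight.get(name) == cycle:
                del in_flight[name]
                not_ready.discard(dest)
                finished.add(name)
                finish_order.append(name)
        # Commit from the head of the reorder buffer, strictly in program order.
        while rob and rob[0] in finished:
            commit_order.append(rob.popleft())
    return finish_order, commit_order, cycle


if __name__ == "__main__":
    # Hypothetical program: "add" depends on the slow "load"; the rest are independent.
    program = [
        ("load", "r1", []),
        ("add", "r2", ["r1"]),
        ("mul", "r3", []),
        ("sub", "r4", []),
    ]
    latency = {"load": 3, "add": 1, "mul": 2, "sub": 1}
    finish_order, commit_order, cycles = run_out_of_order(program, latency)
    print("finished:", finish_order)
    print("committed:", commit_order)
    print("cycles:", cycles)
```

In this run the independent instructions finish while the load is still in flight, but nothing is committed until the load at the head of the reorder buffer is done.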
The P6's long pipeline has 2 effects:
1. Since each stage is shorter and simpler and can be completed more quickly, Intel can crank up the processor's clock speed.
2. It allows the processor to hide hiccups in the fetch and decode stages. On the downside, if something goes wrong (for example, a mispredicted branch), the whole pipeline has to be flushed.
The use of register-to-memory and memory-to-memory instruction formats lets the programmer focus more on coding, because the processor takes care of scheduling memory traffic.
In a RISC ISA, the compiler, not the processor, takes care of memory accesses and other types of code, so the hardware can be faster and more efficient.
The P6's complex decoder works in conjunction with the microcode ROM to handle the really complex legacy instructions, which are translated into sequences of micro-ops that are read directly from the ROM.