Lect5 Pipelining1

Lecture 5: Pipelining + Hazards
EE/CS520 Comp. Archi.
9/12/2012

Pipelining Basics

Fundamental Execution Cycle
- Instruction Fetch: obtain instruction from instruction memory
- Instruction Decode: determine required actions and instruction size
- Operand Fetch: locate and obtain operand data
- Execute: compute result value or status/condition
- Result Store: deposit results in storage for later use
- Next Instruction: determine successor instruction

How to Improve Performance
- Basic idea is to reduce the execution time
  - Increase the clock frequency
  - Work in parallel on multiple data (parallelism)
  - Serialize the operations like an assembly line (pipelining)

Example: Car Assembly Line
[Figure: six tasks T1 to T6 performed one after another along a time axis]
- One worker doing all the work
  - Latency: 6 time units
  - Throughput: 1 car every 6 time units

How to increase the production?
- Dedicate one worker to each elementary task

[Figure: tasks T1 to T6 of successive cars overlapped in time, one worker per task]
- Work divided among 6 workers
  - Latency: 6 time units
  - Throughput: 1 car every 1 time unit

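A small sketch of the assembly-line arithmetic, assuming every task takes exactly one time unit and the line never stalls. The function names are illustrative, not from the lecture. With one worker each car takes all six units; with six workers the first car still takes six units (latency), after which one car leaves every unit (throughput).

```python
def serial_finish_time(n_cars: int, n_tasks: int = 6) -> int:
    """One worker performs all n_tasks on a car before starting the next car."""
    return n_cars * n_tasks


def pipelined_finish_time(n_cars: int, n_tasks: int = 6) -> int:
    """One worker per task: the first car needs n_tasks units, then one car finishes per unit."""
    if n_cars == 0:
        return 0
    return n_tasks + (n_cars - 1)


for n in (1, 10, 100):
    s, p = serial_finish_time(n), pipelined_finish_time(n)
    print(f"{n:>3} cars: serial = {s:>3}, pipelined = {p:>3}, speedup = {s / p:.2f}")
```

The speedup only approaches the number of tasks (6) as the number of cars grows: for 100 cars it is 600 / 105, about 5.71.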
Processor Pipelining
- A technique whereby multiple instructions are overlapped in execution
- Takes advantage of the parallelism that exists among the different actions needed to execute an instruction
- Processor cycle: time required for an instruction to move one stage ahead in the pipeline
- Since all stages are synchronized, the slowest stage determines the processor cycle
  - Normally it is 1 clock cycle (sometimes 2)

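To see why the slowest stage sets the processor cycle, here is a sketch with hypothetical stage delays (the numbers are made up for illustration). Because every stage advances on the same clock edge, the cycle must be long enough for the slowest stage, and the faster stages simply sit idle for the rest of the cycle.

```python
def pipeline_cycle_time(stage_delays_ns: list[float]) -> float:
    """All stages are clocked together, so the cycle must fit the slowest stage."""
    return max(stage_delays_ns)


# Hypothetical, unbalanced delays (ns) for IF, ID, Ex, WB.
delays = [1.0, 0.5, 1.0, 0.5]
cycle = pipeline_cycle_time(delays)
print(f"processor cycle = {cycle} ns")
print(f"one instruction, unpipelined: {sum(delays)} ns")
print(f"one instruction, pipelined:   {len(delays) * cycle} ns")
```

With these numbers a single instruction takes 4.0 ns in the pipeline instead of 3.0 ns, which is why stage delays are normally balanced as closely as possible.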
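Processor Pipelining
- Von Neumann execution cycle:
  - IF: Instruction Fetch
  - ID: Instruction Decode
  - Ex: Instruction Execute
  - WB: Write Back the result
- Non-pipelined controller: IF, ID, Ex, WB take 4 cc per instruction
- Pipelined controller: each stage takes 1 cc, and instructions i, i+1, i+2, i+3 each start one cycle after the previous one
[Figure: non-pipelined vs. pipelined timing of IF, ID, Ex, WB]
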
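The pipelined controller's timing can be reproduced as a text chart. This is a sketch assuming an ideal pipeline with no stalls, in which instruction i+k enters IF in cycle k+1; `timing_chart` is a hypothetical helper name.

```python
STAGES = ("IF", "ID", "Ex", "WB")


def timing_chart(n_instructions: int, stages: tuple[str, ...] = STAGES) -> list[str]:
    """One text row per instruction; each instruction starts one cycle after the previous one."""
    total_cycles = len(stages) + n_instructions - 1
    rows = []
    for k in range(n_instructions):
        cells = ["  "] * total_cycles       # one two-character cell per clock cycle
        for s, name in enumerate(stages):
            cells[k + s] = name             # instruction k is in stage s during cycle k + s + 1
        label = "i" if k == 0 else f"i+{k}"
        rows.append(f"{label:<4}" + " ".join(cells))
    return rows


for row in timing_chart(4):
    print(row)
```

Four instructions finish in 4 + 3 = 7 cycles, compared with 16 cycles on the non-pipelined controller.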
Processor Pipelining
- Further dividing tasks into subtasks:
  1) We decrease the cycle time
  2) We increase the number of stages in the pipeline
[Figure: 8-stage pipeline in which each of IF, ID, Ex, WB is split into two 0.5 cc substages; instructions i through i+5 overlapped]

In the ideal case (balanced stages, no pipeline overhead):

$$\text{Time per instruction}_{\text{pipelined}} = \frac{\text{Time per instruction}_{\text{unpipelined}}}{\text{Number of pipeline stages}}$$

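The formula can be checked against the numbers on the slides. A sketch assuming perfectly balanced stages and no pipeline-register overhead: the 4 cc instruction split into 4 stages gives 1 cc per instruction, and split into 8 stages of 0.5 cc gives 0.5 cc per instruction.

```python
def time_per_instruction_pipelined(t_unpipelined: float, n_stages: int) -> float:
    """Ideal throughput: one instruction completes every t_unpipelined / n_stages."""
    if n_stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    return t_unpipelined / n_stages


for n in (4, 8):
    t = time_per_instruction_pipelined(4.0, n)
    print(f"{n} stages: {t} cc per instruction")
```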
MIPS 5-Stage Pipeline

MIPS 5-Stage Pipeline
- Inst Fetch (IF):
  - Send the PC to inst. memory and fetch the current inst.
- Inst Decode/Reg. Fetch (ID):
  - Decode the inst.
  - Read the required src registers
  - Do an equality test on the regs (for a possible branch)
  - Compute the possible branch address by adding the sign-extended offset to the PC
  - The sign-extended immediate is also calculated

MIPS 5-Stage Pipeline
- Execute (EX): 3 different types
  - Memory reference: ALU adds the base register and offset to form the effective address
  - Reg-reg inst: ALU performs the required arith/logic operation on the src regs
  - Reg-imm inst: ALU performs the same on the src reg and the sign-extended imm value

MIPS 5-Stage Pipeline
- Mem. Access (MEM):
  - If load, read data memory from the address calculated in EX
  - If store, write the src reg to memory
- Write Back (WB):
  - Reg-reg or load type: write the result to the reg file

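The per-stage actions above can be summarized in a small functional model. This is a sketch, not a cycle-accurate simulator: it runs one instruction at a time through IF, ID, EX, MEM and WB. The `Inst` encoding, the opcode names `dadd`, `daddi`, `ld`, `sd` and the `mem` dictionary are assumptions made for illustration; immediates are taken as already sign-extended, and branches (with the ID equality test) are omitted.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    op: str       # hypothetical opcodes: "dadd" (reg-reg), "daddi" (reg-imm), "ld", "sd"
    rd: int = 0   # destination register
    rs: int = 0   # first source register (base register for ld/sd)
    rt: int = 0   # second source register (value to store for sd)
    imm: int = 0  # immediate value or memory offset (already sign-extended)


def run(program: list[Inst], regs: list[int], mem: dict[int, int]) -> None:
    pc = 0
    while pc // 4 < len(program):
        # IF: use the PC to fetch the current instruction, then advance the PC.
        inst = program[pc // 4]
        pc += 4
        # ID: read the source registers and the immediate.
        a, b, imm = regs[inst.rs], regs[inst.rt], inst.imm
        # EX: the ALU operation depends on the instruction type.
        if inst.op in ("ld", "sd"):
            alu_out = a + imm        # effective address = base + offset
        elif inst.op == "dadd":
            alu_out = a + b          # reg-reg arithmetic
        elif inst.op == "daddi":
            alu_out = a + imm        # reg-imm arithmetic
        else:
            raise ValueError(f"unsupported op: {inst.op}")
        # MEM: only loads and stores access data memory.
        if inst.op == "sd":
            mem[alu_out] = b
            continue                 # a store has nothing to write back
        result = mem.get(alu_out, 0) if inst.op == "ld" else alu_out
        # WB: reg-reg, reg-imm and load results go to the register file (R0 stays zero in MIPS).
        if inst.rd != 0:
            regs[inst.rd] = result


regs = [0] * 32
regs[2], regs[3] = 5, 7
mem: dict[int, int] = {}
program = [
    Inst("dadd", rd=1, rs=2, rt=3),      # R1 = R2 + R3
    Inst("sd", rs=0, rt=1, imm=100),     # mem[R0 + 100] = R1
    Inst("ld", rd=4, rs=0, imm=100),     # R4 = mem[R0 + 100]
    Inst("daddi", rd=5, rs=4, imm=-2),   # R5 = R4 - 2
]
run(program, regs, mem)
print(regs[1], mem[100], regs[4], regs[5])
```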
MIPS 5-Stage Pipeline
[Figure: pipeline timing diagram. Horizontal axis: time (clock cycles), Cycle 1 to Cycle 7; vertical axis: instruction order. Each instruction passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after the previous instruction.]

MIPS 5-Stage Pipeline
[Figure: the same timing diagram (time in clock cycles, Cycle 1 to Cycle 7; instruction order) with pipeline registers drawn between the stages]
Insertion of pipelining registers to avoid interstage interference

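A rough sketch of why pipeline registers prevent interstage interference: on each clock edge every register captures the output of the stage before it, so each stage works only on the values latched for its own instruction. The register names follow the usual MIPS convention (IF/ID, ID/EX, EX/MEM, MEM/WB); the simulation only tracks which instruction each register holds and is an illustrative assumption, not a datapath model.

```python
from typing import Optional

PIPE_REGS = ("IF/ID", "ID/EX", "EX/MEM", "MEM/WB")


def clock_edge(latches: dict[str, Optional[str]], fetched: Optional[str]) -> None:
    """Every pipeline register captures the output of the preceding stage at the same edge.
    Copy from the back of the pipeline to the front so no value is overwritten before use."""
    for i in range(len(PIPE_REGS) - 1, 0, -1):
        latches[PIPE_REGS[i]] = latches[PIPE_REGS[i - 1]]
    latches["IF/ID"] = fetched


def simulate(program: list[str], n_cycles: int) -> list[dict[str, Optional[str]]]:
    latches: dict[str, Optional[str]] = {name: None for name in PIPE_REGS}
    history = []
    for cycle in range(n_cycles):
        fetched = program[cycle] if cycle < len(program) else None
        clock_edge(latches, fetched)
        history.append(dict(latches))   # snapshot at the end of this cycle
    return history


for cycle, snapshot in enumerate(simulate(["i1", "i2", "i3", "i4", "i5"], 6), start=1):
    print(cycle, snapshot)
```

After cycle 4, MEM/WB holds i1 while IF/ID already holds i4: four instructions are in flight, each isolated in its own register.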
Basic Performance Issues
- Pipelining increases the throughput, but the latency remains unchanged
  - In fact, latency slightly increases due to control overhead
  - This puts a limit on the practical depth of a pipeline: we can't afford a huge latency on a single instruction if it passes through (say) 100 stages of a pipeline
- Subdividing pipeline stages decreases the per-stage execution time
  - But to what extent?
  - Dictated by the pipeline register delay + clock skew

Basic Performance Issues
- Pipeline register delay:
  - Setup time needed by the pipeline register for its input to become stable before it can be written
- Clock skew:
  - The same clock signal arriving at different parts of the design with different phases is known as skew

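The limit on subdividing stages can be illustrated with a simple model. This sketch assumes the total logic delay splits evenly across the stages and that every stage additionally pays one pipeline-register delay plus the clock skew; the 10 ns, 0.1 ns and 0.1 ns figures are hypothetical.

```python
def cycle_time_ns(logic_ns: float, n_stages: int, reg_delay_ns: float, skew_ns: float) -> float:
    """Per-stage logic plus the fixed per-stage overhead (register delay + clock skew)."""
    return logic_ns / n_stages + reg_delay_ns + skew_ns


def latency_ns(logic_ns: float, n_stages: int, reg_delay_ns: float, skew_ns: float) -> float:
    """One instruction passes through every stage and pays the overhead each time."""
    return n_stages * cycle_time_ns(logic_ns, n_stages, reg_delay_ns, skew_ns)


LOGIC, REG, SKEW = 10.0, 0.1, 0.1   # hypothetical delays in ns
for n in (1, 5, 10, 50, 100):
    cycle = cycle_time_ns(LOGIC, n, REG, SKEW)
    print(f"{n:>3} stages: cycle = {cycle:.2f} ns, "
          f"latency = {latency_ns(LOGIC, n, REG, SKEW):.1f} ns, "
          f"throughput speedup = {LOGIC / cycle:.1f}")
```

Under this model the throughput speedup can never exceed LOGIC / (REG + SKEW) = 50, while the latency keeps growing (30 ns at 100 stages versus 10.2 ns with a single stage).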
Pipelining: Example
Unpipelined processor:

  Op Type    Freq.  Exe. Time
  ALU ops    40%    4 CC
  Branches   20%    4 CC
  Mem ops    40%    5 CC

Clock cycle = 1 ns; pipeline overhead = 0.2 ns
Speedup when pipelined?
Avg. inst execution time = clock cycle x avg. CPI
  = 1 ns x ((0.4 + 0.2) x 4 + 0.4 x 5)
  = 4.4 ns
Avg. execution time when pipelined = 1 ns + 0.2 ns = 1.2 ns
Speedup = 4.4 / 1.2 ≈ 3.7 times

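The same calculation as a short script, using the instruction mix and timings from the table. It assumes, as the slide does, that the pipelined machine reaches a CPI of 1 and that the 0.2 ns overhead is added to every clock cycle.

```python
# Operation class -> (frequency, clock cycles), taken from the table above.
MIX = {"ALU ops": (0.40, 4), "Branches": (0.20, 4), "Mem ops": (0.40, 5)}
CLOCK_NS = 1.0
OVERHEAD_NS = 0.2

avg_cpi = sum(freq * cycles for freq, cycles in MIX.values())
unpipelined_ns = CLOCK_NS * avg_cpi
pipelined_ns = CLOCK_NS + OVERHEAD_NS   # CPI of 1, cycle stretched by the overhead
speedup = unpipelined_ns / pipelined_ns
print(f"unpipelined: {unpipelined_ns:.1f} ns, pipelined: {pipelined_ns:.1f} ns, speedup: {speedup:.2f}")
```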
Pipeline Hazards

Major Hurdles to Pipelining
- A simple pipeline would work just fine if all the instructions were independent of each other
  - Does not happen in real life!!!
- Hazards prevent the next instruction from executing during its designated clock cycle
  - Structural hazards: attempt to use the same hardware to do two different things at once
  - Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
  - Control hazards: arise from the pipelining of branches that change the PC

Structural Hazards
- Some combinations of instructions can't be executed due to resource conflicts
- Examples:
  - A functional unit is not fully pipelined, such as a multiplier or divider
  - A resource has not been duplicated enough, e.g., the reg file has only one read port but the pipeline needs two reads in one cycle
  - A single shared memory for insts and data

Structural Hazards
[Figure: Load followed by Inst 1 to Inst 4 in instruction order, each passing through Ifetch, Reg, ALU, DMem, Reg; with a single shared memory, the Load's DMem access falls in the same cycle as Inst 3's Ifetch]

Solution to Structural Hazards
[Figure: time (clock cycles) vs. instruction order. Load, Inst 1 and Inst 2 proceed normally; a stall inserts a row of bubbles, so Inst 3 is fetched one cycle later]

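The stall can be reproduced with a small scheduler. This sketch assumes the 5-stage MIPS pipeline (IF, ID, EX, MEM, WB) with one single-port memory shared by instruction fetch and data access, where only loads and stores use the memory in MEM; `schedule_fetches` is a hypothetical helper.

```python
def schedule_fetches(is_mem_op: list[bool]) -> list[int]:
    """Return the IF cycle of each instruction when one single-port memory is shared.

    An instruction fetched in cycle c reaches MEM in cycle c + 3. If a fetch would
    collide with an earlier load/store's MEM access, the fetch stalls one cycle (a bubble).
    """
    busy: set[int] = set()   # cycles in which the memory port is taken by a MEM access
    fetch_cycles = []
    cycle = 1
    for mem_op in is_mem_op:
        while cycle in busy:
            cycle += 1       # stall: the memory is busy with a data access
        fetch_cycles.append(cycle)
        if mem_op:
            busy.add(cycle + 3)
        cycle += 1
    return fetch_cycles


# Load, Inst 1, Inst 2, Inst 3, Inst 4 (only the Load accesses data memory).
print(schedule_fetches([True, False, False, False, False]))
```

The result places Inst 3's fetch in cycle 5 instead of 4, matching the single bubble on the slide.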
Why Allow Structural Hazards?
- Reduction of overall cost
  - A 1-port memory is much cheaper than a 2-port memory, both in silicon area and in power consumption
- Key point: if a structural hazard is rare, it may not be worth the cost to avoid it

Data Hazards
- Occur when the pipeline changes the order of read/write accesses to operands as compared to unpipelined execution
- Example:
    DADD R1,R2,R3
    DSUB R4,R1,R5
    AND  R6,R1,R7
    OR   R8,R1,R9
    XOR  R10,R1,R11

Data Hazards
[Figure: DADD R1,R2,R3 followed by DSUB R4,R1,R5, AND R6,R1,R7, OR R8,R1,R9 and XOR R10,R1,R11 in instruction order; each later instruction reads R1 in its Reg (decode) stage, while DADD writes R1 only in its final Reg (write-back) stage]

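Which of the instructions above would read a stale R1 can be worked out from the stage timing. This sketch assumes no forwarding, reads in stage 2 (ID) and writes in stage 5 (WB), and a register file that is written in the first half of a cycle and read in the second half, so a read in the same cycle as the write sees the new value. That assumption is consistent with the two NOPs used on the compiler-solution slide; `stale_reads` is a hypothetical helper.

```python
# Each instruction: (assembly text, destination register, source registers).
PROGRAM = [
    ("DADD R1,R2,R3", "R1", ("R2", "R3")),
    ("DSUB R4,R1,R5", "R4", ("R1", "R5")),
    ("AND R6,R1,R7", "R6", ("R1", "R7")),
    ("OR R8,R1,R9", "R8", ("R1", "R9")),
    ("XOR R10,R1,R11", "R10", ("R1", "R11")),
]


def stale_reads(program, read_stage=2, write_stage=5):
    """Return (writer, reader) pairs where the reader reads a register before the writer's WB.

    Instruction k (0-based) is in stage s during cycle k + s, so the reader at index j
    reads in cycle j + read_stage and the writer at index i writes in cycle i + write_stage.
    """
    hazards = []
    for i, (w_text, dest, _) in enumerate(program):
        for j in range(i + 1, len(program)):
            r_text, r_dest, srcs = program[j]
            if dest in srcs and j + read_stage < i + write_stage:
                hazards.append((w_text, r_text))
            if r_dest == dest:
                break   # a later write to the same register supersedes this one
    return hazards


for writer, reader in stale_reads(PROGRAM):
    print(f"{reader} would read a stale value written by {writer}")
```

Under these assumptions only DSUB and AND are affected; OR reads in the same cycle that DADD writes, and XOR reads afterwards.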
Three Generic Data Hazards
Read After Write (RAW)
- Inst J tries to read an operand before Inst I writes it
    I: add r1,r2,r3
    J: sub r4,r1,r3
- Caused by a "dependence" (in compiler nomenclature)
- This hazard results from an actual need for communication

Write After Read (WAR)
- Inst J writes an operand before Inst I reads it
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
- Caused by an "anti-dependence"
  - This results from reuse of the name "r1"
- Can't happen in the MIPS 5-stage pipeline because:
  - All instructions take 5 stages
  - Reads are always in stage 2
  - Writes are always in stage 5

Write After Write (WAW)
- Inst J writes an operand before Inst I writes it
    I: mul r1,r4,r3
    J: add r1,r2,r3
    K: sub r6,r1,r7
- Caused by an "output dependence"
  - Due to reuse of the name "r1"
- Can't happen in the MIPS 5-stage pipeline:
  - All insts take 5 stages, and writes are always in stage 5

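The three cases differ only in which registers the two instructions share. A minimal sketch that names the dependences from an earlier instruction I to a later instruction J, using the I/J pairs from the RAW, WAR and WAW examples; `dependences` is a hypothetical helper. A dependence only turns into a hazard if the pipeline can reorder the accesses, which in the MIPS 5-stage pipeline is the case for RAW only.

```python
def dependences(first: tuple[str, tuple[str, ...]],
                second: tuple[str, tuple[str, ...]]) -> list[str]:
    """Name the dependences from an earlier instruction I (first) to a later one J (second).
    Each instruction is given as (destination register, source registers)."""
    i_dest, i_srcs = first
    j_dest, j_srcs = second
    kinds = []
    if i_dest in j_srcs:
        kinds.append("RAW")   # true dependence: J reads what I writes
    if j_dest in i_srcs:
        kinds.append("WAR")   # anti-dependence: J overwrites what I reads
    if i_dest == j_dest:
        kinds.append("WAW")   # output dependence: both write the same register
    return kinds


print(dependences(("r1", ("r2", "r3")), ("r4", ("r1", "r3"))))  # I: add r1,r2,r3  J: sub r4,r1,r3
print(dependences(("r4", ("r1", "r3")), ("r1", ("r2", "r3"))))  # I: sub r4,r1,r3  J: add r1,r2,r3
print(dependences(("r1", ("r4", "r3")), ("r1", ("r2", "r3"))))  # I: mul r1,r4,r3  J: add r1,r2,r3
```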
Solution: RAW Data Hazards
- One solution: the compiler must check the dependences
  - If needed, it adds NOP instructions (stalls):
    i1: add R2,R1,R3   # R2 := R1 + R3
    i2: NOP
    i3: NOP
    i4: sub R7,R2,3    # R7 := R2 - 3

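A compiler pass of this kind can be sketched as a single scan that pads the instruction stream. The sketch assumes no forwarding and the same register-file timing that makes two NOPs sufficient on the slide (producer and consumer at least 3 slots apart); `insert_nops` and the tuple encoding are hypothetical.

```python
def insert_nops(program, min_distance=3):
    """Pad with NOPs so each instruction issues at least `min_distance` slots after
    any instruction that produces one of its source registers.

    Each instruction is (text, destination register, source registers).
    """
    out = []
    for text, dest, srcs in program:
        needed = 0
        # Look back over the last (min_distance - 1) emitted slots, NOPs included.
        for back in range(1, min(min_distance, len(out) + 1)):
            _, prev_dest, _ = out[-back]
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([("NOP", None, ())] * needed)
        out.append((text, dest, srcs))
    return [text for text, _, _ in out]


program = [
    ("add R2,R1,R3", "R2", ("R1", "R3")),   # R2 := R1 + R3
    ("sub R7,R2,3", "R7", ("R2",)),         # R7 := R2 - 3
]
print(insert_nops(program))
```

An independent instruction already sitting between producer and consumer counts toward the distance, so the pass only adds the NOPs that are still missing.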
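Pipeline Performance with Stalls

$$\text{Speedup} = \frac{\text{Average inst. time}_{\text{unpipelined}}}{\text{Average inst. time}_{\text{pipelined}}} = \frac{\text{CPI}_{\text{unpipelined}} \times \text{Clock cycle time}_{\text{unpipelined}}}{\text{CPI}_{\text{pipelined}} \times \text{Clock cycle time}_{\text{pipelined}}}$$

Assume the clock cycle time remains unchanged:

$$\text{Speedup} = \frac{\text{CPI}_{\text{unpipelined}}}{\text{CPI}_{\text{pipelined}}}$$

$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Stall CPI} = 1 + \text{Stall CPI}$$

$$\text{Speedup} = \frac{\text{CPI}_{\text{unpipelined}}}{1 + \text{Stall CPI}}$$
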
Pipeline Performance with Stalls
- Assume every inst takes the same number of cycles = the number of pipeline stages (depth of the pipeline):

$$\text{CPI}_{\text{unpipelined}} = \text{Pipeline depth}$$

- Hence,

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Stall CPI}}$$

- Ideally, if there are no stalls, then Speedup = Pipeline depth

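Plugging numbers into the last formula shows how quickly stalls eat into the ideal speedup. A sketch for a 5-stage pipeline under the slide's assumptions (unchanged clock cycle, CPI unpipelined equal to the pipeline depth); the stall CPI values are hypothetical.

```python
def pipeline_speedup(depth: int, stall_cpi: float) -> float:
    """Speedup = pipeline depth / (1 + stall CPI)."""
    return depth / (1.0 + stall_cpi)


for stall_cpi in (0.0, 0.25, 0.5, 1.0):
    print(f"5-stage pipeline, stall CPI {stall_cpi}: speedup = {pipeline_speedup(5, stall_cpi):.2f}")
```

With no stalls the speedup equals the depth (5); one stall cycle per instruction on average halves it.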
Solution: Data Forwarding
    DADD R1,R2,R3
    DSUB R4,R1,R5
    AND  R6,R1,R7
    OR   R8,R1,R9
    XOR  R10,R1,R11
- Key insight: the result is not really needed by DSUB until after it is produced by DADD
- Basic idea:
  - Can the result be moved from the pipeline register where DADD stores it to where DSUB needs it? If yes, forward the result
  - The results from the EX/MEM and MEM/WB pipeline registers are fed back to the ALU inputs
  - If we detect that a previous ALU operation's dst register is the same as the current operation's src register, the forwarding control logic selects the forwarded value

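The forwarding control described above amounts to a priority multiplexer in front of each ALU input. A minimal sketch, assuming the EX/MEM and MEM/WB entries hold ALU results (loads are the exception discussed under load-use delays) and following the MIPS convention that R0 is hard-wired to zero and therefore never forwarded; `forward_operand` and the register values 12 and 7 are hypothetical.

```python
from typing import Optional


def forward_operand(src_reg: int, reg_file_value: int,
                    ex_mem_dest: Optional[int], ex_mem_value: int,
                    mem_wb_dest: Optional[int], mem_wb_value: int) -> int:
    """Select the value for one ALU input.

    EX/MEM holds the result of the immediately preceding instruction and MEM/WB the one
    before it, so EX/MEM is checked first: it is the newer value of the register.
    """
    if src_reg != 0 and ex_mem_dest == src_reg:
        return ex_mem_value
    if src_reg != 0 and mem_wb_dest == src_reg:
        return mem_wb_value
    return reg_file_value


# DSUB R4,R1,R5 in EX while DADD R1,R2,R3 (result 12) sits in EX/MEM; the register file still holds R1 = 0.
print(forward_operand(1, 0, ex_mem_dest=1, ex_mem_value=12, mem_wb_dest=None, mem_wb_value=0))
# AND R6,R1,R7 in EX: DSUB's R4 is now in EX/MEM, and DADD's R1 (12) has moved on to MEM/WB.
print(forward_operand(1, 0, ex_mem_dest=4, ex_mem_value=7, mem_wb_dest=1, mem_wb_value=12))
```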
Data Forwarding
[Figure: DADD R1,R2,R3 followed by DSUB R4,R1,R5, AND R6,R1,R7, OR R8,R1,R9 and XOR R10,R1,R11 in instruction order, each passing through Ifetch, Reg, ALU, DMem, Reg; the DADD result is forwarded to the ALU inputs of the later instructions]

Data Forwarding
[Figure: datapath with forwarding. Register File, ID/EX, EX/MEM and MEM/WR pipeline registers, ALU with multiplexers on its inputs, Immediate, Data Memory and Next PC]

Data Forwarding
    DADD R1,R2,R3
    LD   R4,0(R1)
    SD   R4,12(R1)
[Figure: DADD R1,R2,R3, LD R4,0(R1) and SD R4,12(R1) in instruction order, each passing through Ifetch, Reg, ALU, DMem, Reg]
- SD immediately following LD needs data forwarding from the MEM/WB pipeline register to the DMem input as well

Data Forwarding
[Figure: the forwarding datapath as before, with an additional multiplexer for forwarding to the Data Memory input]

Data Forwarding
- Can we completely avoid data hazards using forwarding?
- NO: there will be Load-Use Delays (LUD) in code

Data Hazards Requiring Stalls
    LD   R1,0(R2)
    DSUB R4,R1,R5
    AND  R6,R1,R7
    OR   R8,R1,R9
[Figure: LD followed by DSUB, AND and OR in instruction order, each passing through Ifetch, Reg, ALU, DMem, Reg; all three later instructions use the R1 loaded by LD]

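With full forwarding, the remaining stalls come from loads whose result is needed by the very next instruction. A sketch assuming the classic 5-stage pipeline: the loaded value exists only after MEM, so a consumer in the next slot that needs it as an ALU input stalls one cycle, while the store-data operand of SD is left out because it can be forwarded from MEM/WB to the DMem input. `load_use_stalls` and its tuple encoding are hypothetical.

```python
from typing import Optional

Instr = tuple[str, Optional[str], tuple[str, ...]]   # (opcode, destination, ALU-input registers)


def load_use_stalls(program: list[Instr]) -> int:
    """Count one stall cycle per load immediately followed by a user of the loaded register."""
    stalls = 0
    for (op, dest, _), (_, _, alu_srcs) in zip(program, program[1:]):
        if op == "LD" and dest in alu_srcs:
            stalls += 1
    return stalls


# LD R1,0(R2); DSUB R4,R1,R5; AND R6,R1,R7; OR R8,R1,R9
print(load_use_stalls([("LD", "R1", ("R2",)), ("DSUB", "R4", ("R1", "R5")),
                       ("AND", "R6", ("R1", "R7")), ("OR", "R8", ("R1", "R9"))]))
# DADD R1,R2,R3; LD R4,0(R1); SD R4,12(R1): SD's only ALU input is the base register R1
print(load_use_stalls([("DADD", "R1", ("R2", "R3")), ("LD", "R4", ("R1",)),
                       ("SD", None, ("R1",))]))
```

The first sequence needs one stall (DSUB right after LD); the second needs none, because R4 reaches SD through the MEM/WB-to-DMem forwarding path.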
Control (Branch) Hazards
- Can cause greater performance degradation than data hazards
- One stall cycle for every branch causes a 10% to 30% performance loss
  - Depends on the branch inst frequency

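One way to see where a 10% to 30% loss comes from: with an ideal CPI of 1 and a one-cycle stall per branch, the extra execution time equals the branch frequency. A sketch under those assumptions; the 10%, 20% and 30% branch frequencies are hypothetical values chosen to span the slide's range.

```python
def cpi_with_branch_stalls(branch_freq: float, stall_cycles: int = 1, ideal_cpi: float = 1.0) -> float:
    """Every branch adds stall_cycles to an otherwise ideal CPI."""
    return ideal_cpi + branch_freq * stall_cycles


for freq in (0.10, 0.20, 0.30):
    cpi = cpi_with_branch_stalls(freq)
    print(f"branch frequency {freq:.0%}: CPI = {cpi:.2f}, extra execution time = {cpi - 1.0:.0%}")
```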
Recall Branches
- Taken branch: the branch changes the PC to its target address
- Untaken branch: the branch does not change the PC (usual PC + 4)
- Whether a branch is taken or not is not known until the end of ID
  - Results in a branch stall
