General Principles of Pipelining: Andrew Warfield CS313
Principles of Pipelining
Andrew Warfield
CS313
Distractions
News flash: If you haven't started Assignment 1 yet, you may be in trouble.
Learning Goals
Now that we understand how the sequential CPU works, let's talk about how to make it go faster.
This lecture will talk about the basic ideas behind pipelining, performance ramifications, and the challenges that result.
Slides for this unit are slightly modified versions of Bryant and O'Hallaron's Chapter 4 mini-course at http://www.cs.cmu.edu/afs/cs/academic/class/15349-s02/www/lectures.html
Motivation
What's wrong with the sequential y86?
Pipelined
Parallel
Processor Efficiency
How can we measure it?
  Latency:
  Throughput?
Computational Example
[Figure: 300 ps of combinational logic followed by a 20 ps register, driven by a clock. Delay = 320 ps, Throughput = 3.12 GOPS]
System
  Computation requires total of 300 picoseconds
  Additional 20 picoseconds to save result in register
  Must have clock cycle of at least 320 ps
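The arithmetic behind these figures can be sketched in a few lines of Python. The function name and the unit conversion are illustrative choices, not part of the slides; the only inputs are the 300 ps and 20 ps delays stated above.

```python
# Delays from the slide, in picoseconds.
LOGIC_DELAY_PS = 300      # combinational logic
REGISTER_DELAY_PS = 20    # time to save the result in the register

def unpipelined_metrics(logic_ps, reg_ps):
    """Return (delay in ps, throughput in GOPS) for a single-stage system."""
    cycle_ps = logic_ps + reg_ps          # clock period must cover logic plus register
    throughput_gops = 1000.0 / cycle_ps   # one op per cycle; 1 op/ps equals 1000 GOPS
    return cycle_ps, throughput_gops

delay_ps, gops = unpipelined_metrics(LOGIC_DELAY_PS, REGISTER_DELAY_PS)
print(f"Delay = {delay_ps} ps, Throughput = {gops:.3f} GOPS")
# prints: Delay = 320 ps, Throughput = 3.125 GOPS
```

The slide's 3.12 GOPS is the same value cut off at two decimal places.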
[Figure: 3-way pipelined version: 100 ps combinational logic stages (labels B and C survive), each followed by a 20 ps register, driven by a clock. Latency = ? Throughput = ?]
System
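One way to work out the two question marks is sketched below, assuming the 300 ps of logic is split into three equal 100 ps stages with a 20 ps register after each. The function name is hypothetical, and the sketch assumes a perfectly even split.

```python
def evenly_pipelined(total_logic_ps, stages, reg_ps):
    """Return (cycle ps, latency ps, throughput GOPS) for an evenly split pipeline."""
    stage_logic_ps = total_logic_ps / stages   # assumption: logic divides evenly
    cycle_ps = stage_logic_ps + reg_ps         # each stage pays the register delay
    latency_ps = stages * cycle_ps             # one operation passes through every stage
    throughput_gops = 1000.0 / cycle_ps        # one operation completes per cycle
    return cycle_ps, latency_ps, throughput_gops

cycle, latency, gops = evenly_pipelined(300, 3, 20)
print(f"Cycle = {cycle:.0f} ps, Latency = {latency:.0f} ps, Throughput = {gops:.2f} GOPS")
# prints: Cycle = 120 ps, Latency = 360 ps, Throughput = 8.33 GOPS
```

Under these assumptions latency gets worse (360 ps versus 320 ps) while throughput improves, because a new operation can start every 120 ps.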
Pipeline Diagrams
Unpipelined
[Figure: OP1, OP2, OP3 laid out along the time axis]
3-Way Pipelined
[Figure: OP1, OP2, OP3 laid out along the time axis, with stage label A]
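A text version of a pipelined diagram can be generated with a short sketch like the one below. It assumes one new operation enters the first stage every cycle and that nothing stalls; the function name and the "." filler for idle slots are illustrative.

```python
def pipeline_diagram(num_ops, stages):
    """Return rows of a text pipeline diagram; each column is one clock cycle."""
    total_cycles = num_ops + len(stages) - 1
    rows = []
    for op in range(num_ops):
        cells = []
        for cycle in range(total_cycles):
            stage_index = cycle - op   # operation `op` enters the first stage at cycle == op
            if 0 <= stage_index < len(stages):
                cells.append(stages[stage_index])
            else:
                cells.append(".")      # operation not in the pipeline during this cycle
        rows.append(f"OP{op + 1} " + " ".join(cells))
    return rows

for row in pipeline_diagram(3, ["A", "B", "C"]):
    print(row)
# OP1 A B C . .
# OP2 . A B C .
# OP3 . . A B C
```

Reading down a column shows which operations are in flight during that cycle: up to three at once in the 3-way case.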
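Operating a Pipeline
[Figure: clock waveform and OP1, OP2, OP3 against a time axis marked 0, 120, 240, 360, 480, 640, with snapshots at times 239, 241, 300, and 359]
[Figure: three-stage pipeline: Comb. logic A, B, and C at 100 ps each, each followed by a 20 ps register, driven by a clock]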
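The snapshot times in the figure can be explored with a simplified model: assume a 120 ps clock, that operation i enters stage A at time i × 120 ps, and that an operation moves to the next stage exactly on each clock edge. The names below are hypothetical, and the model ignores how signals propagate inside a stage.

```python
CYCLE_PS = 120             # 100 ps logic + 20 ps register
STAGES = ["A", "B", "C"]

def stage_at(time_ps, op_index):
    """Stage occupied by operation op_index (0-based) at time_ps, or None."""
    cycle = time_ps // CYCLE_PS        # which clock cycle the time falls in
    stage_index = cycle - op_index     # operation i starts in cycle i
    if 0 <= stage_index < len(STAGES):
        return STAGES[stage_index]
    return None

for t in (239, 241, 300, 359):
    occupancy = {f"OP{i + 1}": stage_at(t, i) for i in range(3)}
    print(t, occupancy)
```

In this model, 239 ps falls just before the clock edge at 240 ps (OP1 in B, OP2 in A), and 241 ps falls just after it (OP1 in C, OP2 in B, OP3 in A).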
PIPE- Hardware
Pipeline registers hold intermediate values from instruction execution
Forward (Upward) Paths
  Values passed from one stage to next
  Cannot jump past stages
    e.g., valC passes through decode
[Figure: three-stage pipeline with combinational logic delays of 50 ps, 150 ps (B), and 100 ps (C), each stage followed by a 20 ps register, driven by a clock. Delay = 510 ps, Throughput = 5.88 GOPS. Pipeline diagram of OP1, OP2, OP3 over time.]
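The 510 ps and 5.88 GOPS figures follow from the slowest stage setting the clock. A minimal sketch, with a hypothetical function name and using only the stage delays shown in the figure:

```python
def nonuniform_metrics(stage_logic_ps, reg_ps):
    """Return (cycle ps, latency ps, throughput GOPS, idle ps per stage)."""
    slowest = max(stage_logic_ps)
    cycle_ps = slowest + reg_ps                       # slowest stage sets the clock period
    latency_ps = len(stage_logic_ps) * cycle_ps       # every stage takes a full cycle
    throughput_gops = 1000.0 / cycle_ps
    idle_ps = [slowest - d for d in stage_logic_ps]   # time each stage waits per cycle
    return cycle_ps, latency_ps, throughput_gops, idle_ps

cycle, latency, gops, idle = nonuniform_metrics([50, 150, 100], 20)
print(f"Cycle = {cycle} ps, Delay = {latency} ps, Throughput = {gops:.2f} GOPS, idle = {idle}")
# prints: Cycle = 170 ps, Delay = 510 ps, Throughput = 5.88 GOPS, idle = [100, 0, 50]
```

The idle list shows the cost of the imbalance: the 50 ps stage sits unused for 100 ps of every cycle.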
[Figure: deeper pipeline of Comb. logic stages, each followed by a register, driven by a clock]
6.25%, 16.67%, 28.57%
Pipeline Depths
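The three percentages appear to be the share of each clock cycle spent loading the register as the same 300 ps of logic is split into more stages. The sketch below reproduces them under that assumption. The 1, 3, and 6 stage counts are inferred from the numbers, not stated in the extracted text, and the function name is hypothetical.

```python
def register_overhead(total_logic_ps, stages, reg_ps):
    """Fraction of the clock cycle spent loading the pipeline register."""
    cycle_ps = total_logic_ps / stages + reg_ps   # assumption: logic splits evenly
    return reg_ps / cycle_ps

for stages in (1, 3, 6):
    print(f"{stages}-stage: {register_overhead(300, stages, 20):.2%}")
# 1-stage: 6.25%
# 3-stage: 16.67%
# 6-stage: 28.57%
```

The register delay is fixed while the logic per stage shrinks, so each added stage buys less throughput than the last.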
Instruction-Level Parallelism
Sequential Consistency
Programming languages like C, C++ and Java are based on the sequential consistency model:
  The effect of executing the program must be the same as if instructions were executed one by one in the order they are written.
If people were smarter and there was only one CPU implementation, we could go faster.
Dependencies
Types of dependencies:
  Data dependencies:
    Causal: B depends on A if B reads a value written by A.
    Output: B depends on A if B writes to a location written by A.
    Alias (anti): B depends on A if B writes to a location read by A.
  Control dependencies:
    Whether a branch is taken or not taken (jmp, jxx, call, ret).
    When an instruction writes to instruction memory (self-modifying code).
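The three kinds of data dependency listed above can be checked mechanically from each instruction's read and write sets. The sketch below is illustrative only: the Instr type, the function name, and the two example instructions are hypothetical, and condition codes are ignored for simplicity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    reads: frozenset    # locations the instruction reads
    writes: frozenset   # locations the instruction writes

def data_dependencies(a, b):
    """Kinds of data dependency of later instruction b on earlier instruction a."""
    kinds = []
    if a.writes & b.reads:
        kinds.append("causal")   # b reads a value written by a
    if a.writes & b.writes:
        kinds.append("output")   # b writes a location written by a
    if a.reads & b.writes:
        kinds.append("anti")     # b writes a location read by a
    return kinds

# Hypothetical pair: the add writes %eax, which the move then reads;
# the move writes %ecx, which the add read.
a = Instr("addl %ecx, %eax", frozenset({"%ecx", "%eax"}), frozenset({"%eax"}))
b = Instr("rrmovl %eax, %ecx", frozenset({"%eax"}), frozenset({"%ecx"}))
print(data_dependencies(a, b))   # ['causal', 'anti']
```

Any non-empty result means the two instructions cannot be freely reordered without changing the program's effect.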
Data Dependencies
[Figure: combinational logic followed by a register, driven by a clock, with a diagram of OP1, OP2, OP3 over time]
System
  Each operation depends on result from preceding one
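Data Hazards
[Figure: three-stage pipeline (Comb. logic A, B, C, each followed by a register, driven by a clock) with a diagram of OP1, OP2, OP3, OP4 over time; instruction fragments on the slide: addl %eax, %ebx and %edx]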
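A rough sketch of how such hazards could be flagged is given below. It assumes a simplified model in which a result only becomes visible to operations issued at least `depth` slots after its producer, so any closer reader sees a stale value. The function name, the tuple layout, and the register names r1 to r4 are hypothetical.

```python
def find_data_hazards(instrs, depth):
    """instrs: list of (name, reads, writes) with set-valued reads and writes.

    Returns (producer_index, consumer_index) pairs where the consumer reads a
    location the producer writes but is issued fewer than `depth` slots later.
    """
    hazards = []
    for i, (_, _, writes) in enumerate(instrs):
        # Only the next depth - 1 instructions are too close to see the result.
        for j in range(i + 1, min(i + depth, len(instrs))):
            if writes & instrs[j][1]:
                hazards.append((i, j))
    return hazards

program = [
    ("OP1", set(), {"r1"}),
    ("OP2", {"r1"}, {"r2"}),   # reads r1 one slot after OP1 writes it
    ("OP3", set(), {"r3"}),
    ("OP4", {"r1"}, {"r4"}),   # reads r1 three slots later, which is far enough
]
print(find_data_hazards(program, depth=3))   # [(0, 1)]
```

With depth 3, OP2 is flagged because it follows OP1 too closely, while OP4 is far enough behind to see OP1's result.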
Control Hazards
  Conditional branches.
  Self-modifying code.