Unit 4
The average time required to reach a storage location in memory and obtain its
contents is called the access time. In electromechanical devices with moving parts
such as disks and tapes, the access time consists of a seek time required to position
the read-write head to a location and a transfer time required to transfer data to or
from the device. Because the seek time is usually much longer than the transfer time,
auxiliary storage is organized in records or blocks. A record is a specified number of
characters or words. Reading or writing is always done on entire records. The transfer
rate is the number of characters or words that the device can transfer per second,
after it has been positioned at the beginning of the record.
Magnetic drums and disks are quite similar in operation. Both consist of high-speed
rotating surfaces coated with a magnetic recording medium. The rotating surface of
the drum is a cylinder and that of the disk, a round flat plate. The recording surface
rotates at uniform speed and is not started or stopped during access operations. Bits
are recorded as magnetic spots on the surface as it passes a stationary mechanism
called a write head. Stored bits are detected by a change in magnetic field produced
by a recorded spot on the surface as it passes through a read head. The amount of
surface available for recording in a disk is greater than in a drum of equal physical
size. Therefore, more information can be stored on a disk than on a drum of equal size.
Some units use a single read/write head for each disk surface. In this type of unit,
the track address bits are used by a mechanical assembly to move the head into the
specified track position before reading or writing. In other disk systems, separate
read/write heads are provided for each track in each surface. The address bits can
then select a particular track electronically through a decoder circuit. This type of
unit is more expensive and is found only in very large computer systems.
Physical characteristics - A fixed-head disk contains one read/write head per track. All of these heads are mounted on a rigid arm that extends across all tracks. A movable-head disk contains only one read/write head per surface. Here, too, the head is mounted on an arm; the arm can be extended or retracted so that the head can be positioned above any track. A disk drive may permanently contain a non-removable disk; for example, the hard disk in a personal computer can never be removed. A removable disk, by contrast, can be taken out and replaced with another disk. For most disks, both sides of the platter carry the magnetizable coating; such disks are called double-sided. Single-sided disks are used in some less expensive disk systems.
A flexible platter is mostly used in floppy disks, which are the least expensive and smallest type of disk. Sealed drive assemblies that are almost free of contaminants contain Winchester heads. IBM used the term Winchester as a code name for the 3340 disk model prior to its announcement. Workstations and personal computers commonly contain a built-in disk known as a Winchester disk; this disk is also referred to as a hard disk.
On a movable-head system there is a seek time, defined as the time taken to position the head at the desired track. There is also a rotational latency (or rotational delay), defined as the time taken for the start of the sector to reach the head. The time it takes to get into position to read or write is known as the access time, which is the sum of the rotational delay and the seek time, if any. Once the head is in position, the read or write operation is performed as the sector moves under the head; this is the data transfer portion of the operation, and the time taken to transfer the data is known as the transfer time.
Prepared By: Dr. Dheresh Soni VIT Bhopal University Page 3
Magnetic Read and Write Memory - Magnetic disks are still the most important component of external memory. Many systems, such as supercomputers, personal computers, and mainframe computers, contain both removable and fixed hard disks. Data is recorded on, and later retrieved from, the disk via a conducting coil called the head. Many systems contain two heads: a read head and a write head. During read and write operations, the platter rotates while the head is stationary.
The write mechanism exploits the fact that electricity flowing through a coil generates a magnetic field. The write head receives electric pulses, and the resulting magnetic pattern is recorded on the surface below it, with different patterns for positive and negative currents. The read mechanism exploits the fact that a magnetic field moving relative to a coil generates an electric current in the coil. When the surface of the disk passes under the head, it generates a current of the same polarity as the one already recorded.
In this case, the structure of the head is the same for reading and writing, so the same head can be used for both. Such single heads are used in floppy disk systems and in older rigid disk systems. In newer systems, the read head consists of a partially shielded magnetoresistive (MR) sensor. The MR material has an electrical resistance that depends on the direction of magnetization of the medium moving under it.
Data Organization and Formatting - The head is a small device that reads from or writes to the portion of the platter rotating beneath it. Each track is the same width as the head, and there are thousands of tracks per surface. Adjacent tracks are separated by gaps, which prevent, or at least minimize, errors caused by misalignment of the head or by interference between magnetic fields.
Modern hard disks use a technique called multiple zone recording to increase density. The surface is divided into a number of concentric zones (typically 16). Within a zone, the number of bits per track is constant. Zones closer to the centre contain fewer bits, and therefore fewer sectors, per track than zones farther from the centre.
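The zoned layout above can be sketched numerically. In this Python sketch, all zone counts, track counts, and the sector size are invented for illustration; real layouts are manufacturer-specific.

```python
# Multiple zone recording sketch. Zone sizes and track counts below are
# invented for illustration; real layouts are manufacturer-specific.

def surface_capacity(sectors_per_track_by_zone, tracks_per_zone,
                     bytes_per_sector=512):
    """Total bytes on one surface: constant sectors/track within each zone."""
    total_sectors = sum(s * tracks_per_zone for s in sectors_per_track_by_zone)
    return total_sectors * bytes_per_sector

# 16 zones, from 600 sectors/track near the centre up to 900 at the edge:
zones = [600 + 20 * i for i in range(16)]
print(surface_capacity(zones, tracks_per_zone=1000) / 1e9)   # capacity in GB
```

Note how the outer zones contribute more sectors per track, which is exactly why zoned recording stores more than a constant-sectors-per-track layout of the same geometry.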
A tape drive is a sequential-access device. If the current position of the head is beyond the desired record, we have to rewind the tape a certain distance and start reading forward. The tape is in motion only during read and write operations. A disk drive, by contrast, is a direct-access device: it can reach the desired sector without sequentially reading all the sectors on the disk. It has only to wait until the desired sector rotates under the head.
Access time - The access time of a record on a disk includes three components
such as seek time, latency time, and data transfer time.
• Seek time − The time required to arrange the read/write head at the
desired track is called seek time.
• Rotational delay or latency time − The time required to position the
read/write head on a specific sector when the head has already been
placed on the desired track is called rotational delay. The rotational delay
is based on the speed of rotation of the disk. On average the latency will
be half of one revolution time.
• Data transfer time − Data transfer time is the actual time needed to send
the data.
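The three components above can be combined in a short Python sketch. The drive parameters used here (seek time, rotational speed, transfer rate, request size) are hypothetical, chosen only for illustration.

```python
# Access-time sketch using the three components above. The drive numbers
# (seek time, RPM, transfer rate) are hypothetical, chosen for illustration.

def avg_access_time_ms(seek_ms: float, rpm: float) -> float:
    """Seek time plus average rotational delay (half of one revolution)."""
    revolution_ms = 60_000.0 / rpm
    return seek_ms + revolution_ms / 2.0

def transfer_time_ms(n_bytes: int, rate_mb_per_s: float) -> float:
    """Actual time needed to move n_bytes at the given sustained rate."""
    return n_bytes / (rate_mb_per_s * 1_000_000) * 1000.0

# Hypothetical 7200-RPM drive, 4 ms average seek, 100 MB/s transfer rate:
total = avg_access_time_ms(4.0, 7200) + transfer_time_ms(4096, 100.0)
print(round(total, 3))   # seek + half-revolution latency + transfer, in ms
```

Notice that for a small 4 KB request the seek and rotational components dominate the total, which is why auxiliary storage is organized in records or blocks in the first place.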
Optical Disk
Optical memory was released in 1982, developed by Sony and Philips. These memories perform their operations with the help of light beams, and they need an optical drive for those operations. We can use optical memory to store backups, audio, and video, and also for carrying data between machines. Its read/write speed, however, is slower than that of a flash drive or a hard drive.
Optical Disks Working - Optical disks rely on a red or blue laser to record and read
data. Most of today's optical disks are flat, circular and 12 centimeters in diameter.
Data is stored on the disk in the form of microscopic data pits and lands. The pits
are etched into a reflective layer of recording material. The lands are the flat,
unindented areas surrounding the pits.
The type of material selected for the recording material depends on how the disk is
used. Prerecorded disks such as those created for audio and video recordings can use
cheaper material like aluminum foil. Write-once disks and rewritable disks require a
more expensive layer of material to accommodate other types of digital data storage.
Data is written to an optical disk in a radial pattern starting near the center. An
optical disk drive uses a laser beam to read the data from the disk as it is spinning.
It distinguishes between the pits and lands based on how the light reflects off the
recording material. The drive uses the differences in reflectivity to determine the 0 and
1 bits that represent the data.
The optical disk storage system includes a rotating disk coated with a thin layer of metal that provides a reflective surface, and a laser beam, which is used as a read/write head for recording information onto the disk. Unlike a magnetic disk, the optical layer consists of a single long track in the shape of a spiral. The spiral shape of the track makes the optical disk well suited to reading large blocks of sequential information, such as music.
Types of Optical Disks - There are two types of optical disks which are as follows –
• Compact Disk (CD) − The term CD, first used for audio, stands for Compact Disk, and similar terminology is used for digital computers. The disks used for data storage are known as Compact Disk Read-Only Memory (CD-ROM). A compact disk is a round disk of clear polycarbonate plastic, coated with a very thin reflective layer of aluminum. During the manufacturing process of this 12 cm (about 4.7 inch) disk, pits are created on its surface. The portions between these pits are called lands. A typical CD can store up to 700 MB of data; such high storage capacity is possible only because of a very high data density.
The compact disk (CD) and compact disk read-only memory (CD-ROM) contain one spiral track, beginning near the centre of the disk and spiralling out towards the outer edge. A CD-ROM stores data in blocks, or sectors. The number of sectors per track varies: the inner tracks of the disk contain fewer sectors, and the outer tracks contain more. The length of a sector is the same at the inner edge and the outer edge of the disk.
When the disk is rotating, a low-power laser beam scans the sectors at a constant rate, so the rotational speed of the disk must vary. When accessing sectors near the centre of the disk, the disk rotates comparatively fast; for sectors near the outer edge, it rotates more slowly.
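This constant-linear-velocity behaviour can be checked with a line of arithmetic. The scan speed and radii used below are typical approximate CD values, not figures from the text.

```python
import math

# Constant linear velocity sketch: to scan sectors at the same rate
# everywhere, the rotational speed must vary inversely with track radius.
# (Speed and radii below are typical CD approximations, not from the text.)

def rpm_for_clv(linear_speed_m_s: float, radius_m: float) -> float:
    """Rotational speed (rev/min) needed for a given linear scan speed."""
    return linear_speed_m_s / (2 * math.pi * radius_m) * 60

print(rpm_for_clv(1.2, 0.025))   # innermost track: spins faster
print(rpm_for_clv(1.2, 0.058))   # outermost track: spins slower
```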
Types of Compact Disks - There are three types of CDs, which are as follows –
• WORM disks − WORM means write once, read many. The audio CDs that we purchase from the market are WORM disks: they are recorded by the company and can be played many times.
• Erasable disks − These use an alloy that shows interesting behaviour when heated and cooled. If the alloy is heated above its melting point and then cooled, it turns into an amorphous state, which absorbs light. If instead the alloy is heated to about 200°C and held at that temperature for a certain period, a process known as annealing occurs, which turns the alloy into a crystalline state.
Crystalline and non-crystalline areas are formed by controlling the temperature of the laser. The crystalline areas reflect the laser, while the non-crystalline areas absorb it; these differences are registered as digital data. The annealing process can be used again later to erase the stored data.
• DVD Disks − DVD (digital versatile disk) technology was first launched in 1996. A CD and a DVD look the same; the main difference between them is storage size, which is much larger for a DVD. Several changes were made to the DVD's design to achieve this larger storage.
A DVD uses a laser beam of shorter wavelength than a CD's to imprint the data. With a shorter wavelength, the light can be focused on a smaller spot, so the pits of a DVD are much smaller than the pits of a CD, and the tracks on a DVD are placed much closer together than the tracks on a CD. With all of these design changes, a DVD has a storage size of 4.7 GB. The storage capacity can be increased further by using two-sided and two-layered disks.
Two-Layered Disk - The base of the two-layered disk is the same as a CD's: it is also composed of circular plastic. In this disk, however, the first recording layer is made of a translucent material, so that the laser can read through it to a second, fully reflective layer beneath.
Blu-Ray DVD - A Blu-ray disk is a type of high capacity optical disk medium, which
is used to store a huge amount of data and to record and playback high definition
video. Blu-ray was designed to supersede the DVD. While a CD is able to store 700 MB
of data and a DVD is able to store 4.7 GB of data, a single Blu-ray disk is able to store
up to 25 GB of data. The dual-layer Blu-ray disks can hold 50 GB of data. That amount
of storage is equivalent to 4 hours of HDTV. There is also a double-sided dual-layer
DVD, which is commonly used and able to store 17 GB of data. Blu-ray disk uses the
blue lasers, which help them to hold more information as compared to other optical
media. The laser is actually blue-violet, but "blue-violet-ray" does not roll off the tongue as easily, so the developers shortened the name to "Blu-ray". A standard-definition DVD can provide a resolution of 720x480 pixels, whereas high-definition Blu-ray provides a resolution of 1920x1080 pixels.
Reuse − Magnetic disks are highly reusable and used for random read/write operations, whereas most optical disks are read-only once written.
RAID Architectures
RAID, or "Redundant Array of Independent Disks", is a technique that uses a combination of multiple disks instead of a single disk for increased performance, data redundancy, or both. The term was coined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987. RAID is a data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for data redundancy, performance improvement, or both. It is a way of storing the same data in different places on multiple hard disks or solid-state drives to protect data in the case of a drive failure. A RAID system consists of two or more drives working in parallel. These can be hard disks, but there is a trend to use SSDs (solid-state drives).
RAID combines several independent and relatively small disks into a single storage unit of large size. The disks included in the array are called array members. The disks can be combined into the array in different ways, known as RAID levels. Each RAID level has its own characteristics of fault tolerance, performance, and capacity.
RAID systems can be used with several interfaces, including SATA, SCSI, IDE, and FC (Fibre Channel). Some systems use SATA disks internally but present a FireWire or SCSI interface to the host system. Sometimes disks in a storage system are configured as JBOD, which stands for Just a Bunch Of Disks: those disks do not use a specific RAID level and act as stand-alone disks. This is often done for drives that contain swap files or spooling data.
How RAID Works - RAID works by placing data on multiple disks and allowing input/output operations to overlap in a balanced way, improving performance. Because using multiple disks lowers the mean time between failures (MTBF) of the array as a whole, it is storing the data redundantly that increases fault tolerance. RAID arrays appear to the operating system as a single logical drive. RAID employs the techniques of disk mirroring and disk striping.
o Disk Mirroring copies identical data onto more than one drive.
o Disk Striping partitions the data and spreads it over multiple disk drives.
o Disk mirroring and disk striping can also be combined in a RAID array.
In a single-user system where large records are stored, the stripes are typically set up to be small (e.g. 512 bytes) so that a single record spans all the disks and can be accessed quickly by reading all the disks at the same time. In a multi-user system, better performance requires a stripe wide enough to hold the typical or maximum-size record, allowing overlapped disk I/O across drives.
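The mirroring and striping techniques above can be sketched in a few lines of Python. This is a toy model, not a real RAID driver: the "disks" are just Python lists of block labels.

```python
# Toy models of the two techniques named above (not a real RAID driver;
# "disks" are just Python lists of block labels).

def mirror(blocks, n_disks=2):
    """RAID-1 style: every disk holds a full copy of every block."""
    return [list(blocks) for _ in range(n_disks)]

def stripe(blocks, n_disks):
    """RAID-0 style: block i goes to disk i mod n_disks."""
    disks = [[] for _ in range(n_disks)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks

data = ["b0", "b1", "b2", "b3"]
print(mirror(data, 2))    # two identical copies of the data
print(stripe(data, 2))    # [['b0', 'b2'], ['b1', 'b3']]
```

The sketch makes the trade-off visible: mirroring doubles the space used but survives a lost disk, while striping uses the full capacity but loses data if any disk fails.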
RAID-1 (Mirroring) - It duplicates data across two disks in the array, providing full redundancy. Both disks store exactly the same data, at the same time and at all times. Data is not lost as long as one disk survives. The total capacity of the array equals the capacity of the smallest disk in the array. At any given instant, the contents of both disks in the array are identical. RAID 1 can also be used in more complicated configurations, but its primary purpose is redundancy.
If you completely lose a drive, you can still stay up and running off the other drive, and the broken drive can then be replaced with little to no downtime. RAID 1 also gives you the additional benefit of increased read performance, as data can be read off any of the drives in the array. The downsides are slightly higher write latency, since the data needs to be written to both drives in the array, and that you have only a single drive's capacity available while needing two drives.
• More than one copy of each block is stored on a separate disk, so every block has two (or more) copies lying on different disks.
• The above figure shows a RAID-1 system with mirroring level 2.
• RAID 0 is unable to tolerate any disk failure, but RAID 1 provides this reliability.
RAID levels 2 and 3 - RAID-2 uses bit-level striping with a Hamming-code parity. RAID-3 uses byte-level striping with a dedicated parity disk. These two are less commonly used. RAID-6 is a more recent advancement that uses distributed double parity: block-level striping with 2 parity blocks instead of just 1, distributed across all the disks. There are also hybrid RAIDs, which nest more than one RAID level one after the other to fulfil specific requirements.
RAID-5 (Block-Level Striping with Distributed Parity) - RAID 5 requires the use of at least three drives. It combines these disks to protect data against the loss of any one disk, and the array's storage capacity is reduced by one disk. It stripes data and parity blocks across all the drives in the array.
• This is a slight modification of the RAID-4 system; the only difference is that the parity rotates among the drives.
• In the figure, we can notice how the parity block "rotates".
• This was introduced to improve random write performance.
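The distributed-parity idea can be illustrated with XOR arithmetic in Python. This is a toy sketch with made-up byte values; the rotation of the parity block across drives is not modelled.

```python
# XOR-parity sketch behind RAID-5 (toy byte values; the rotation of the
# parity block across drives is not modelled here).

def xor_parity(blocks):
    """XOR the blocks of a stripe together, byte by byte."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, block))
    return parity

stripe = [b"\x0f\x10", b"\xf0\x01", b"\xaa\x55"]   # three data blocks
parity = xor_parity(stripe)

# Lose block 1; rebuild it from the surviving blocks plus the parity:
rebuilt = xor_parity([stripe[0], stripe[2], parity])
assert rebuilt == stripe[1]
print("rebuilt block matches the lost one")
```

Because XOR is its own inverse, XOR-ing the survivors with the parity recovers any single missing block, which is exactly why the array tolerates the loss of one disk.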
Nested RAID levels - Some RAID levels are referred to as nested RAID because they
are based on a combination of RAID levels, such as:
3. RAID 03 (0+3, also known as RAID 53 or RAID 5+3) - This level uses striping similar to RAID 0 for RAID 3's virtual disk blocks. It offers higher performance than RAID 3, but at a higher cost.
4. RAID 50 (5+0) - This configuration combines RAID 5 distributed parity with RAID
0 striping to improve RAID 5 performance without reducing data protection.
Non-standard RAID levels - Non-standard RAID levels vary from the standard ones, and they are usually developed by companies or organizations for mainly proprietary use, such as:
1. RAID 7 - A non-standard RAID level based on RAID 3 and RAID 4 that adds caching. It includes a real-time embedded OS as a controller, caching via a high-speed bus, and other characteristics of a stand-alone computer.
2. Adaptive RAID - This level enables the RAID controller to decide how to store
the parity on disks. It will choose between RAID 3 and RAID 5, depending on
which RAID set type will perform better with the kind of data being written to the
disks.
3. Linux MD RAID 10 - The Linux kernel provides this level. It supports the
creation of nested and non-standard RAID arrays. Linux software RAID can
also support standard RAID 0, RAID 1, RAID 4, RAID 5, and RAID 6
configurations.
Pipelining
Pipelining is a technique of decomposing a sequential process into suboperations,
with each subprocess being executed in a special dedicated segment that operates
concurrently with all other segments. A pipeline can be visualized as a collection of
processing segments through which binary information flows. Each segment
performs partial processing dictated by the way the task is partitioned. The result obtained from the computation in each segment is transferred to the next segment in the pipeline; the final result is obtained after the data have passed through all segments.
The simplest way of viewing the pipeline structure is to imagine that each segment
consists of an input register followed by a combinational circuit. The register holds
the data and the combinational circuit performs the suboperation in the particular
segment. The output of the combinational circuit in a given segment is applied to the
input register of the next segment. A clock is applied to all registers after enough time
has elapsed to perform all segment activity. In this way the information flows through
the pipeline one step at a time. The pipeline organization will be demonstrated by means of a simple example. Suppose that we want to perform the combined multiply and add operations with a stream of numbers: Ai * Bi + Ci for i = 1, 2, 3, ..., 7.
The five registers are loaded with new data every clock pulse. The effect of each clock
is shown in above table. The first clock pulse transfers A1 and B1 into R1 and R2. The
second clock pulse transfers the product of R1 and R2 into R3 and C1 into R4. The
same clock pulse transfers A2 and B2 into R1 and R2. The third clock pulse operates
on all three segments simultaneously. It places A3 and B3 into R1 and R2, transfers
the product of R1 and R2 into R3, transfers C2 into R4, and places the sum of R3 and R4 into R5.
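The register transfers described above can be simulated clock by clock. The following Python sketch is a toy model of the three-segment pipeline, with `None` marking an empty register; the input values are made up.

```python
# Clock-by-clock simulation of the three-segment Ai*Bi + Ci pipeline
# described above. R1..R5 model the pipeline registers; None marks empty.

def run_pipeline(A, B, C):
    R1 = R2 = R3 = R4 = R5 = None
    results = []
    for t in range(len(A) + 2):               # two extra clocks drain the pipe
        # All registers are clocked simultaneously, so compute every new
        # value from the old register contents before overwriting anything.
        new_R5 = R3 + R4 if R3 is not None else None      # segment 3: adder
        new_R3 = R1 * R2 if R1 is not None else None      # segment 2: multiplier
        new_R4 = C[t - 1] if 1 <= t <= len(C) else None
        new_R1 = A[t] if t < len(A) else None             # segment 1: input
        new_R2 = B[t] if t < len(B) else None
        R1, R2, R3, R4, R5 = new_R1, new_R2, new_R3, new_R4, new_R5
        if R5 is not None:
            results.append(R5)
    return results

A, B, C = [1, 2, 3], [4, 5, 6], [10, 20, 30]
print(run_pipeline(A, B, C))   # [14, 30, 48] = [1*4+10, 2*5+20, 3*6+30]
```

The key detail mirrored from the text is that all registers are clocked at once: every new value is computed from the old register contents before anything is overwritten.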
Any operation that can be decomposed into a sequence of suboperations of about the
same complexity can be implemented by a pipeline processor. The technique is
efficient for those applications that need to repeat the same task many times with
different sets of data. The behaviour of a pipeline can be illustrated with a space-
time diagram. This is a diagram that shows the segment utilization as a function of
time.
The diagram shows six tasks T1 through T6 executed in four segments. Initially, task
T1 is handled by segment 1. After the first clock, segment 2 is busy with T1, while
segment 1 is busy with task T2. Continuing in this manner, the first task T1 is
completed after the fourth clock cycle. From then on, the pipe completes a task every
clock cycle. No matter how many segments there are in the system, once the pipeline
is full, it takes only one clock period to obtain an output.
The first task T1 requires a time equal to ktp to complete its operation since there
are k segments in the pipe. The remaining n - 1 tasks emerge from the pipe at the
rate of one task per clock cycle and they will be completed after a time equal to (n -
1)tp. Therefore, to complete n tasks using a k-segment pipeline requires k + (n - 1) clock cycles.
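The timing claim above can be turned into numbers directly. The 4-segment, 6-task values below match the space-time diagram example; the large-n case shows the speedup approaching k.

```python
# A k-segment pipeline needs k + (n - 1) clock cycles for n tasks,
# versus n*k cycles without pipelining.

def pipeline_cycles(k: int, n: int) -> int:
    return k + (n - 1)

def speedup(k: int, n: int) -> float:
    """Non-pipelined cycles divided by pipelined cycles."""
    return (n * k) / pipeline_cycles(k, n)

print(pipeline_cycles(4, 6))   # the 6-task, 4-segment example: 9 cycles
print(speedup(4, 1000))        # approaches k = 4 as n grows large
```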
Arithmetic Pipeline - Pipeline arithmetic units are usually found in very high-speed
computers. They are used to implement floating-point operations, multiplication of
fixed-point numbers, and similar computations encountered in scientific problems. A
pipeline multiplier is essentially an array multiplier, with special adders designed to
minimize the carry propagation time through the partial products. Floating-point
operations are easily decomposed into suboperations, as demonstrated below by a pipeline unit for floating-point addition and subtraction. The inputs to the floating-point adder pipeline are two normalized floating-point binary numbers:
X = A × 2^a and Y = B × 2^b
A and B are two fractions that represent the mantissas and a and b are the
exponents. The floating-point addition and subtraction can be performed in four segments. The registers R are placed between the segments to store intermediate results. The suboperations performed in the four segments are:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
The exponents are compared by subtracting them to determine their difference. The
larger exponent is chosen as the exponent of the result. The exponent difference
determines how many times the mantissa associated with the smaller exponent must
be shifted to the right. This produces an alignment of the two mantissas. It should be
noted that the shift must be designed as a combinational circuit to reduce the shift
time. The two mantissas are added or subtracted in segment 3. The result is normalized
in segment 4. When an overflow occurs, the mantissa of the sum or difference is shifted
right and the exponent incremented by one. If an underflow occurs, the number of
leading zeros in the mantissa determines the number of left shifts in the mantissa and
the number that must be subtracted from the exponent.
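The four suboperations can be sketched on decimal (mantissa, exponent) pairs. This Python sketch illustrates the algorithm, not the binary hardware, and the example values are hypothetical.

```python
# The four suboperations, sketched on decimal (mantissa, exponent) pairs.
# This illustrates the algorithm, not the binary hardware; example values
# are hypothetical.

def fp_add(a_mant, a_exp, b_mant, b_exp):
    # Segment 1: compare exponents; the larger becomes the result exponent.
    if a_exp < b_exp:
        a_mant, a_exp, b_mant, b_exp = b_mant, b_exp, a_mant, a_exp
    # Segment 2: align mantissas by shifting the smaller-exponent one right.
    b_mant = b_mant / (10 ** (a_exp - b_exp))
    # Segment 3: add the mantissas.
    mant, exp = a_mant + b_mant, a_exp
    # Segment 4: normalize, keeping the mantissa in [0.1, 1) for this sketch.
    while abs(mant) >= 1.0:
        mant /= 10.0
        exp += 1
    while mant != 0 and abs(mant) < 0.1:
        mant *= 10.0
        exp -= 1
    return mant, exp

# 0.9504 * 10^3 + 0.8200 * 10^2 = 1.0324 * 10^3, normalized to 0.10324 * 10^4
print(fp_add(0.9504, 3, 0.8200, 2))
```

The overflow case in segment 4 (a mantissa of 1.0324 shifted right with the exponent incremented) is exactly the situation the paragraph above describes.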
Instruction Pipeline - Pipeline processing can occur not only in the data stream
but in the instruction stream as well. An instruction pipeline reads consecutive
instructions from memory while previous instructions are being executed in other
segments. This causes the instruction fetch and execute phases to overlap and
perform simultaneous operations. One possible complication associated with such a scheme is that an instruction may cause a branch out of sequence. In that case the
pipeline must be emptied and all the instructions that have been read from memory
after the branch instruction must be discarded.
There are certain difficulties that will prevent the instruction pipeline from operating
at its maximum rate. Different segments may take different times to operate on the
incoming information. Some segments are skipped for certain operations. For example,
a register mode instruction does not need an effective address calculation. Two or more
segments may require memory access at the same time, causing one segment to wait
until another is finished with the memory. Memory access conflicts are sometimes
resolved by using two memory buses for accessing instructions and data in separate
modules. In this way, an instruction word and a data word can be read simultaneously
from two different modules. The design of an instruction pipeline will be most efficient
if the instruction cycle is divided into segments of equal duration. The time that each
step takes to fulfill its function depends on the instruction and the way it is executed.
Data Hazards
Data hazards occur when an instruction depends on the result of a previous instruction and that result has not yet been computed. Whenever two different instructions use the same storage location, that location must appear as if it were accessed in sequential order. Alternatively, we can say that data hazards occur when the execution of an instruction depends on the results of a prior instruction that is still being processed in the pipeline. Consider the following scenario.
In the example above, the result of the ADD instruction is written into register X3 at t5. If bubbles are not used to postpone the following SUB instruction, the subsequent operations will read X3 as it was before the ADD completed.
Data hazards are divided into four types according to the order in which READ and WRITE operations are performed on the register: Read after Write (RAW), Write after Read (WAR), Write after Write (WAW), and Read after Read (RAR). These are explained as follows.
4. Read after Read (RAR): This occurs when two instructions both read from the same register. Since reading a register does not change its value, RAR hazards do not cause a problem for the processor. For example, two successive instructions that both read register X1 can execute in either order.
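The four categories above can be expressed as a small classifier over a pair of instructions. The (destination, sources) encoding and the register names below are invented for illustration.

```python
# The four categories above, expressed as a small classifier over a pair of
# instructions. The (dest, sources) encoding and the register names are
# invented for illustration.

def classify(first, second):
    d1, s1 = first
    d2, s2 = second
    kinds = []
    if d1 in s2:
        kinds.append("RAW")            # second reads what first wrote
    if d2 in s1:
        kinds.append("WAR")            # second overwrites what first read
    if d1 == d2:
        kinds.append("WAW")            # both write the same register
    if set(s1) & set(s2):
        kinds.append("RAR")            # both read a common register (harmless)
    return kinds

# ADD X3 <- X1,X2 followed by SUB X5 <- X3,X4: SUB reads ADD's result.
print(classify(("X3", ["X1", "X2"]), ("X5", ["X3", "X4"])))   # ['RAW']
```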
Handling Data Hazards: There are various methods we can use to handle hazards: forwarding, code reordering, and stall insertion. These are explained as follows.
3. Stall Insertion: This inserts one or more stalls (no-op instructions) into the pipeline, which delays the execution of the current instruction until the required operand is written to the register file. This method, however, decreases pipeline efficiency and throughput.
The following are some of the possible solutions to the problem discussed above:
Solution 2: When generating executable code, the compiler can recognise data dependencies and reorganise the instructions accordingly, which simplifies the hardware's job. If such reordering is not possible, the compiler can instead detect the dependency and insert one or more no-operation (NOP) instructions; a NOP is a software-generated dummy instruction equivalent to a bubble. The compiler examines data dependencies during the code optimization stage of the compilation process.
Solution 3: At the IF stage of the SUB instruction, add three bubbles. This will make
it easier for SUB – ID to work at t6. As a result, all subsequent instructions in the
pipe are similarly delayed.
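The cost of inserting bubbles can be sketched with a simple cycle counter. The 5-stage pipeline and the 3-cycle RAW stall below are assumed textbook values (no forwarding, write-back before read), not figures from this text.

```python
# Cycle-count sketch of stall insertion. The 5-stage pipeline and the
# 3-cycle RAW stall (no forwarding) are assumed textbook values, not
# taken from this text.

def cycles_with_stalls(program, stages=5, raw_stall=3):
    """program: list of (dest_reg, source_regs) tuples, in program order."""
    cycles = stages + (len(program) - 1)      # ideal pipelined cycle count
    for prev, cur in zip(program, program[1:]):
        if prev[0] in cur[1]:                 # RAW dependence on previous result
            cycles += raw_stall               # bubbles (no-op instructions)
    return cycles

# ADD X3 <- X1,X2 ; SUB X6 <- X3,X5  (SUB reads ADD's result)
prog = [("X3", ["X1", "X2"]), ("X6", ["X3", "X5"])]
print(cycles_with_stalls(prog))   # 6 ideal cycles + 3 bubbles = 9
```

Running the same counter on an independent instruction pair gives the ideal 6 cycles, which makes the throughput cost of each bubble explicit.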
Branch Hazards
As soon as we branch to a new instruction, all the instructions that are in the pipeline
behind the branch become invalid!
lw R5, (400)R14
beq R3, R2,100
add R7, R8, R9
.. +100 sub R7, R8, R9
Either the add or the sub instruction after the beq will be executed, depending on the contents of registers R2 and R3. We can include extra hardware to calculate the branch target address earlier in the pipeline, and the instructions
lw R5, (400)R14
beq R3, R2, 100
can be re-ordered so that the lw fills the slot after the branch:
beq R3, R2, 100
lw R5, (400)R14
Instructional hazards - Pipelined execution of instructions reduces the time taken and improves performance as long as the instruction stream keeps flowing. Whenever this stream is interrupted, the pipeline stalls; a branch instruction may also cause the pipeline to stall. The effect of branch instructions, and the techniques that can be used for mitigating their impact, are discussed below for unconditional and conditional branches. Scoreboards are designed to control the flow of data between registers and multiple arithmetic units in the presence of conflicts caused by hardware resource limitations (structural hazards) and by dependencies between instructions (data hazards). Data hazards can be classified as flow dependencies (Read-After-Write), output dependencies (Write-After-Write), and anti-dependencies (Write-After-Read).
Structural Hazards - Structural hazards occur when two or more instructions try
to access the same hardware resource (e.g. two instructions try to write their results
to registers in the same clock period). This would occur, for example, if an instruction were issued to an arithmetic unit that takes three clock periods to execute its operation, in the clock period immediately following the issue of a previous instruction to a different arithmetic unit that takes four clock periods to execute.
Unconditional Branches: A sequence of instructions being executed in a two-stage pipeline is shown in the figure. Instructions I1 to I3 are stored at successive memory addresses, and I2 is a branch instruction.
Figure: Branch timing (a), (b)
Either a cache miss or a branch instruction stalls the pipeline for one or more clock
cycles. To reduce the effect of these interruptions, many processors employ
sophisticated fetch units that can fetch instructions before they are needed and put
them in a queue. Typically, the instruction queue can store several instructions.
To be effective, the fetch unit must have sufficient decoding and processing capability
to recognize and execute branch instructions. It attempts to keep the instruction queue
filled at all times to reduce the impact of occasional delays when fetching instructions.
If there is a delay in fetching instructions because of a branch or a cache miss, the
dispatch unit continues to issue instructions from the instruction queue. The fetch
unit continues to fetch instructions and add them to the queue.
A technique called delayed branching can minimize the penalty incurred as a result of
conditional branch instructions. The idea is simple. The instructions in the delay slots
are always fetched. Therefore, we would like to arrange for them to be fully executed
whether or not the branch is taken. The objective is to be able to place useful
instructions in these slots. If no useful instructions can be placed in the delay slots,
these slots must be filled with NOP instructions.
LOOP Shift_Left R1
Decrement R2
Branch=0 LOOP
NEXT Add R1,R3
Register R2 is used as a counter to determine the number of times the contents of register R1 are shifted left. For a processor with one delay slot, the instructions can be reordered so that the shift instruction occupies the delay slot. The shift instruction is then fetched while the branch instruction is being executed; after evaluating the branch condition, the processor fetches the instruction at LOOP or at NEXT, depending on whether the condition is true or false. The effectiveness of the delayed branch approach depends on how often it is possible to reorder instructions in this way:
LOOP Decrement R2
Branch=0 LOOP
Shift_Left R1
NEXT Add R1,R3
II Branching Prediction (Static): Another technique for reducing the branch penalty
associated with conditional branches is to attempt to predict whether or not a
particular branch will be taken. The simplest form of branch prediction is to assume
that the branch will not take place and to continue to fetch instructions in sequential
address order. Until the branch condition is evaluated, instruction execution along the
predicted path must be done on a speculative basis.
Speculative execution means that instructions are executed before the processor is
certain that they are in the correct execution sequence. Hence, care must be taken that
no processor registers or memory locations are updated until it is confirmed that these
instructions should indeed be executed. If the branch decision indicates otherwise, the instructions and all their associated data in the execution units must be purged, and the correct instructions fetched and executed.
A decision on which way to predict the result of the branch may be made in hardware
by observing whether the target address of the branch is lower than or higher than the
address of the branch instruction. A more flexible approach is to have the compiler decide whether a given branch instruction should be predicted taken or not taken. The
branch instructions of some processors, such as SPARC, include a branch prediction
bit, which is set to 0 or 1 by the compiler to indicate the desired behavior. The
instruction fetch unit checks this bit to predict whether the branch will be taken or not
taken. The branch prediction decision is always the same every time a given instruction
is executed. Any approach that has this characteristic is called static branch
prediction.
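A rough cost model for the static predict-not-taken scheme can be written in a few lines. All percentages and penalty values below are hypothetical, chosen only for illustration.

```python
# Rough cost model for predict-not-taken: a taken branch squashes the
# speculatively fetched instructions and pays a penalty; a not-taken branch
# costs nothing extra. All numbers below are hypothetical.

def effective_cpi(base_cpi, branch_frac, taken_frac, penalty_cycles):
    """Average cycles per instruction including the branch penalty."""
    return base_cpi + branch_frac * taken_frac * penalty_cycles

# 20% branches, 60% of them taken, 2-cycle misprediction penalty:
print(effective_cpi(1.0, 0.20, 0.60, 2))   # ~1.24 cycles per instruction
```

The model shows why a static prediction bit set by the compiler helps: lowering the fraction of mispredicted branches directly lowers the effective cycles per instruction.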