
UNIT 4

Syllabus: External storage systems and Pipelining: Organization and structure of disk drives: Electronic, magnetic, and optical technologies, RAID Architectures. Pipelining – Data Hazards – Instruction Hazards – Performance, Case Study on RAID architectures used in Industry

Organization and structure of disk drives: Electronic, Magnetic and Optical Technologies
Auxiliary Memory - The most common auxiliary memory devices used in
computer systems are magnetic disks and tapes. Other components used, but not
as frequently, are magnetic drums, magnetic bubble memory, and optical disks.
To fully understand the physical mechanism of auxiliary memory devices, one must
have some knowledge of magnetics, electronics, and electromechanical systems.
Although the physical properties of these storage devices can be quite complex, their
logical properties can be characterized and compared by a few parameters. The
important characteristics of any device are its access mode, access time, transfer
rate, capacity, and cost.

The average time required to reach a storage location in memory and obtain its
contents is called the access time. In electromechanical devices with moving parts
such as disks and tapes, the access time consists of a seek time required to position
the read-write head to a location and a transfer time required to transfer data to or
from the device. Because the seek time is usually much longer than the transfer time,
auxiliary storage is organized in records or blocks. A record is a specified number of
characters or words. Reading or writing is always done on entire records. The transfer
rate is the number of characters or words that the device can transfer per second,
after it has been positioned at the beginning of the record.

Magnetic drums and disks are quite similar in operation. Both consist of high-speed
rotating surfaces coated with a magnetic recording medium. The rotating surface of
the drum is a cylinder and that of the disk, a round flat plate. The recording surface
rotates at uniform speed and is not started or stopped during access operations. Bits
are recorded as magnetic spots on the surface as it passes a stationary mechanism
called a write head. Stored bits are detected by a change in magnetic field produced
by a recorded spot on the surface as it passes through a read head. The amount of
surface available for recording in a disk is greater than in a drum of equal physical
size. Therefore, more information can be stored on a disk than on a drum of

Prepared By: Dr. Dheresh Soni VIT Bhopal University Page 1


comparable size. For this reason, disks have replaced drums in more recent
computers.

Magnetic Disks - A magnetic disk is a circular plate constructed of metal or plastic


coated with magnetized material. Often both sides of the disk are used and several
disks may be stacked on one spindle with read/write heads available on each surface.
All disks rotate together at high speed and are not stopped or started for access
purposes. Bits are stored in the magnetized surface in spots along concentric circles
called tracks. The tracks are commonly divided into sections called sectors. In most
systems, the minimum quantity of information which can be transferred is a sector.
The subdivision of one disk surface into tracks and sectors is shown in Fig 6.

Figure 6 Magnetic Disk (showing tracks, sectors, the arm assembly, and the direction of rotation)

Some units use a single read/write head for each disk surface. In this type of unit,
the track address bits are used by a mechanical assembly to move the head into the
specified track position before reading or writing. In other disk systems, separate
read/write heads are provided for each track in each surface. The address bits can
then select a particular track electronically through a decoder circuit. This type of
unit is more expensive and is found only in very large computer systems.

Physical characteristics - A fixed-head disk contains one read-write head per track.
All of these heads are mounted on a rigid arm that extends across all tracks. A
movable-head disk contains only one read-write head per surface. Here too the head
is mounted on an arm; because the head must be able to position itself above any
track, the arm can be retracted or extended.

A non-removable disk is permanently mounted in the disk drive. For example, the
hard disk in a personal computer can never be removed, so it is a non-removable
disk. A removable disk, by contrast, can be removed and replaced with another disk.
For most disks, both sides of the platter carry the magnetizable coating; these are
referred to as double-sided disks. Single-sided disks are used in some less expensive
disk systems.



Multiple-platter disks employ a movable head, with one read-write head per platter
surface. All the heads are mechanically fixed, so they lie at the same distance from
the centre of the disk and move together. The set of all tracks in the same relative
position on every platter is known as a cylinder.

This type of mechanism is used in the floppy disk, which is the least expensive kind
of disk, is small, and has a flexible platter. Sealed drive assemblies, which are
almost free of contaminants, contain Winchester heads. IBM used the term Winchester
as a code name for its 3340 disk model prior to its announcement. Workstations and
personal computers commonly contain a built-in disk known as a Winchester disk,
which is also referred to as a hard disk.

On a movable-head system, there is a seek time, defined as the time taken to
position the head at the desired track. There is also a rotational latency or
rotational delay, defined as the time taken for the start of the sector to reach the
head. The time it takes to get into position to read or write, known as the access
time, is the sum of the rotational delay and the seek time, if any. Once the head is
in position, the read or write operation is performed as the sector moves under the
head. This is the data transfer portion of the operation, and the time taken to
transfer the data is known as the transfer time.

Magnetic Read and Write Memory - Magnetic disks are still the most important
component of external memory. Many systems, such as supercomputers, personal
computers, and mainframe computers, contain both removable and fixed hard disks.
A conducting coil called the head is used to record data onto the disk and later
retrieve it. Many systems contain two heads, a read head and a write head. During a
read or write operation, the platter rotates while the head is stationary.

The write mechanism exploits the fact that electricity flowing through a coil
generates a magnetic field. The write head receives electric pulses, and the
resulting magnetic pattern is recorded on the surface below, with different patterns
for positive and negative currents. The read mechanism exploits the fact that a
magnetic field moving relative to a coil generates an electric current in the coil.
When the surface of the disk passes under the head, it produces a current with the
same polarity as the one already recorded.

In this case, the structure of the head is the same for reading and writing, so the
same head can be used for both. Such single heads are used in floppy disk systems
and in older rigid disk systems. In contemporary drives, the read head consists of a
partially shielded magnetoresistive (MR) sensor. The MR material has an electrical
resistance that depends on the direction of magnetization of the medium moving
under it.

Data Organization and Formatting - The head is a small device that reads from or
writes to the portion of the platter rotating beneath it. The width of each track is
the same as that of the head, and there are thousands of tracks per surface. Gaps
separate adjacent tracks; this prevents, or at least minimizes, errors caused by
misalignment of the head or interference between magnetic fields. Sectors are used
to transfer data to and from the disk.

Most contemporary systems use fixed-length sectors of 512 bytes, which is nearly a
universal sector size. Intersector gaps separate adjacent sectors so that
unreasonable precision requirements are not imposed on the system. Rotating the disk
at a fixed speed, known as constant angular velocity (CAV), allows the information
to be scanned at the same rate on every track.

The disk surface is thus divided in two ways: into a series of concentric tracks
and into a number of pie-shaped sectors.

The advantage of CAV is that data can be directly addressed by track and sector. Its
disadvantage is that the amount of data stored on the short inner tracks and the
long outer tracks is the same.

Modern hard disks use a technique called multiple zone recording to increase
density. The surface is divided into a number of concentric zones, typically 16.
Within a zone, the number of bits per track is constant. Zones closer to the centre
contain fewer bits (and hence fewer sectors) per track than zones farther from the
centre.
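The capacity effect of zoning can be seen with a small back-of-the-envelope sketch. All numbers below are invented for illustration, not taken from the text:

```python
# Sketch with invented numbers: compare raw capacity under pure CAV
# (every track limited to the innermost track's sector count) against
# multiple zone recording, where outer zones carry more sectors per track.
sector_bytes = 512
tracks_per_zone = 100
sectors_per_track_by_zone = [400, 500, 600, 700]  # inner zone first

n_zones = len(sectors_per_track_by_zone)
cav_capacity = sector_bytes * tracks_per_zone * n_zones * min(sectors_per_track_by_zone)
zoned_capacity = sector_bytes * tracks_per_zone * sum(sectors_per_track_by_zone)

print(zoned_capacity / cav_capacity)  # zoning stores 1.375x more here
```

With these made-up zone sizes, the same surface holds 37.5% more data under zoning than it would if every track were limited to the innermost track's sector count.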



Magnetic Tape - A magnetic tape transport consists of the electrical, mechanical,
and electronic components to provide the parts and control mechanism for a
magnetic-tape unit. The tape itself is a strip of plastic coated with a magnetic recording
medium. Bits are recorded as magnetic spots on the tape along several tracks.
Usually, seven or nine bits are recorded simultaneously to form a character together
with a parity bit. Read/write heads are mounted one in each track so that data can
be recorded and read as a sequence of characters.

Magnetic tape units can be stopped, started to move forward or in reverse, or
rewound. However, they cannot be started or stopped fast enough between individual
characters. For this reason, information is recorded in blocks referred to as
records. The reading and writing techniques used in tape systems are the same as in
disk systems. Here the medium is a flexible polyester tape coated with a
magnetizable material. Data on the tape can be structured as a number of parallel
tracks running lengthwise; recording data in this form is called parallel
recording. Most modern systems instead use serial recording, in which the data are
laid out as a sequence of bits along each track, as is done on a magnetic disk. In
serial recording, data are read and written in contiguous blocks, called physical
records, on the tape.

The blocks on the tape are separated by gaps known as inter-record gaps. As with the
disk, the tape is formatted to assist in locating physical records. When data are
recorded using the serial tape technique, the first set of bits is recorded along
the whole length of the tape. When the end of the tape is reached, the heads are
repositioned to record a new track, and the tape is again recorded along its whole
length, this time in the opposite direction. This process continues until the tape
is full.

A tape drive is a sequential-access device. If the current position of the head is
beyond the desired record, the tape must be rewound a certain distance and read
forward. The tape is in motion only during read and write operations. A disk drive,
by contrast, is a direct-access device: it can reach the desired sector without
sequentially reading all the sectors on the disk. It has only to wait until the
intervening sectors within one track have passed, after which it can access any
track in succession. Magnetic tape is a type of secondary memory; it is the slowest
and lowest-cost member of the memory hierarchy. There is also linear tape
technology, a type of cartridge system developed in the late 1990s.

Access time - The access time of a record on a disk includes three components
such as seek time, latency time, and data transfer time.

• Seek time − The time required to position the read/write head over the
desired track is called seek time.
• Rotational delay or latency time − The time required to position the
read/write head on a specific sector when the head has already been
placed on the desired track is called rotational delay. The rotational delay
is based on the speed of rotation of the disk. On average the latency will
be half of one revolution time.
• Data transfer time − Data transfer time is the actual time needed to send
the data.
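The three components can be combined in a quick calculation. The drive parameters below are invented for illustration (not from the text); the average rotational latency is taken as half a revolution, as stated above:

```python
# Illustrative access-time calculation with invented drive parameters.
rpm = 7200                      # spindle speed
avg_seek_ms = 9.0               # average seek time, in milliseconds
transfer_rate_mb_s = 150.0      # sustained transfer rate, MB/s
request_kb = 4                  # size of one request, KB

revolution_ms = 60_000 / rpm    # time for one full revolution
latency_ms = revolution_ms / 2  # average latency = half a revolution

# Data transfer time for the request itself.
transfer_ms = request_kb / 1024 / transfer_rate_mb_s * 1000

access_ms = avg_seek_ms + latency_ms + transfer_ms
print(round(access_ms, 2))      # seek + latency dominate for small requests
```

For a small 4 KB request, almost all of the roughly 13 ms access time is seek time and rotational latency; the data transfer itself takes only a fraction of a millisecond.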

Advantages of Magnetic Disk

• Access time − With a magnetic disk, it is possible to access a record
directly. Access time is therefore low.
• Flexibility − Magnetic disk has the flexibility of being used as a
sequential as well as a direct access storage device.
• Transmission Speed − The rate of data transfer is fast in a magnetic disk.
• Reusable − Specific data can be erased and other data saved in the
same place.
• Storage Capacity − It can store a very large amount of data.

Disadvantages of Magnetic Disk

• Cost − The cost per character of storage is much higher than that of
magnetic tape.
• Non-Portability − It is far less portable than magnetic tape.
• Limited record size − The length of a record that can be saved is
limited by the size of the disk track or disk sector.
• Non-human readable − Data stored on it is not in human-readable form,
so it cannot be read or encoded manually.

Optical Disk
Optical memory was released in 1982, developed by Sony and Philips. These memories
operate with the help of light beams and need an optical drive for their
operations. Optical memory can be used to store backups, audio, and video, and to
carry data between systems. Its read/write speed is slower than that of a flash
drive or a hard drive.



An optical disk is an electronic data storage medium that can be written to and
read from using a low-powered laser beam. Most of today's optical disks are available
in three formats: compact disks (CDs), digital versatile disks (DVDs) -- also referred to
as digital video disks -- and Blu-ray disks, which provide the highest capacities and
data transfer rates of the three.

How Optical Disks Work - Optical disks rely on a red or blue laser to record and read
data. Most of today's optical disks are flat, circular and 12 centimeters in diameter.
Data is stored on the disk in the form of microscopic data pits and lands. The pits
are etched into a reflective layer of recording material. The lands are the flat,
unindented areas surrounding the pits.

The type of material selected for the recording material depends on how the disk is
used. Prerecorded disks such as those created for audio and video recordings can use
cheaper material like aluminum foil. Write-once disks and rewritable disks require a
more expensive layer of material to accommodate other types of digital data storage.
Data is written to an optical disk in a radial pattern starting near the center. An
optical disk drive uses a laser beam to read the data from the disk as it is spinning.
It distinguishes between the pits and lands based on how the light reflects off the
recording material. The drive uses the differences in reflectivity to determine the 0 and
1 bits that represent the data.

The optical disk storage system comprises a rotating disk coated with a thin layer
of metal that provides a reflective surface, and a laser beam that acts as a
read/write head for recording information onto the disk. Unlike a magnetic disk,
the optical layer consists of a single long track in the shape of a spiral. The
spiral track makes the optical disk well suited to reading large blocks of
sequential information, such as music.

Types of Optical Disks - The main types of optical disks are as follows −

• Compact Disk (CD) − The term CD, used for audio, stands for Compact Disk;
similar technology is used in digital computers, where the disks used for data
storage are known as Compact Disk Read-Only Memory (CD-ROM). A compact disk is
a round disk of clear polycarbonate plastic coated with a very thin reflective
layer of aluminum. During the manufacturing process of this 4.8-inch disk, pits
are created on its surface. The portions between these pits are called lands. A
typical CD can store up to 700 MB of data; such high storage capacity is
possible only because of a very high data density.



If there is some error in audio or video data, the appliance can ignore it, and the
error is not noticeable in the reproduced video or audio. Computer data, however,
cannot tolerate errors, and any error would be reflected in the data read back.
Since it is impossible to prevent physical imperfections when indenting pits on a
compact disk, extra bits must be added to detect and correct errors.

The compact disk (CD) and compact disk read-only memory (CD-ROM) contain one spiral
track, beginning at the centre of the disk and spiralling out towards the outer
edge. A CD-ROM stores data in blocks, or sectors. The number of sectors varies from
track to track: the inner tracks of the disk contain fewer sectors, and the outer
tracks contain more. The physical length of a sector is the same at the inner edge
and the outer edge of the disk.

When the disk rotates, a low-power laser beam scans the sectors at a constant rate,
so the rotational speed of the disk must vary: when accessing sectors near the
centre of the disk, the disk rotates comparatively fast, and for sectors near the
outer edge it rotates more slowly.

Types of Compact Disks - There are three types of CDs, which are as follows −

• WORM disks − WORM means write once, read many. The audio CDs purchased in the
market are WORM disks: they are recorded once by the company and can be played
many times.

• CD-Recordable − CD-R, also known as CD-Recordable, is a write-once, read-many
format: it allows a single recording onto a disk. It is used in applications
that require one or a small number of copies of a set of data. A CD-R is
composed of a polycarbonate plastic substrate, a thin reflective metal coating,
and a protective outer coating. Between the metal layer and the polycarbonate
there is a layer of organic polymer dye, which serves as the recording medium;
the dye allows the reflectivity to be changed. When exposed to a specific
frequency of light, the dye is permanently transformed. A high-intensity laser
activates the dye, creating marks in it that mimic the reflective properties of
the lands (higher areas) and pits (lower areas) of a traditional CD.

• CD-Rewritable − CD-RW, also known as CD-Rewritable, is a compact disk format
that allows repeated recording on a disk. CD-RW and CD-R are composed of the
same materials: a polycarbonate plastic substrate, a thin reflective metal
coating, and a protective outer coating. In the CD-RW, however, the dye is
replaced by an alloy.

The alloy shows interesting behavior when heated and cooled. If it is heated above
its melting point and then cooled, it turns into an amorphous state, which absorbs
light. If it is heated to about 200 °C and held at that temperature for a certain
period, a process known as annealing occurs, turning the alloy into a crystalline
state.

By controlling the temperature of the laser, crystalline and non-crystalline areas
are formed. The crystalline areas reflect the laser, while the non-crystalline
areas absorb it, and these differences are registered as digital data. The
annealing process can also be used to erase the stored data.

• DVD Disks − DVD (digital versatile disk) technology was first launched in
1996. A CD and a DVD have the same appearance; the main difference between them
is storage size, with a DVD storing much more than a CD. Several changes were
made in the design of the DVD to achieve this larger capacity.

A DVD uses a laser beam of shorter wavelength than the CD's to imprint the data.
With a shorter wavelength, the light can be focused onto a smaller spot, so the pits
of a DVD are much smaller than the pits of a CD, and the tracks on a DVD are placed
much closer together than the tracks on a CD. With all of these design changes, a
DVD has a storage capacity of 4.7 GB, which can be increased further by using
two-sided and two-layered disks.

Two-Layered Disk - The first base of a two-layered disk is the same as that of a
CD: a circular plastic plate. In this disk, however, a translucent material rather
than aluminum covers the lands and pits of the first base; this material serves as
a reflector. The translucent layer is prepared so that it can store data as pits
indented into it. The second layer of lands and pits is covered with a fully
reflective material. To retrieve the binary pattern, the laser beam is focused on
the first layer, and the translucent material reflects enough light to be captured
by the detector; the small amount of light reflected by the second layer is noise
and is cancelled by the detector. Similarly, when the laser is focused on the
second layer to read it, the small amount of light reflected by the first layer is
cancelled by the detector.

Two-Sided Disk - In a two-sided disk, tracks are implemented on both sides of the
DVD. The structure amounts to two single-sided disks put together to form a
sandwich, with the topmost disk turned upside down.

Blu-Ray DVD - A Blu-ray disk is a high-capacity optical disk medium used to store
huge amounts of data and to record and play back high-definition video. Blu-ray was
designed to supersede the DVD. While a CD can store 700 MB of data and a DVD
4.7 GB, a single Blu-ray disk can store up to 25 GB, and dual-layer Blu-ray disks
can hold 50 GB, equivalent to about 4 hours of HDTV. (A double-sided, dual-layer
DVD, which is also in common use, can store 17 GB.) Blu-ray disks use blue lasers,
which allow them to hold more information than other optical media. The laser is
actually 'blue-violet', but the developers shortened 'blue-violet-ray' to the more
easily pronounced 'Blu-ray'. A standard-definition DVD provides a resolution of
720x480 pixels, whereas high-definition Blu-ray provides 1920x1080.

| Sr. No. | Key                   | Magnetic Disk                                           | Optical Disk                                  |
|---------|-----------------------|---------------------------------------------------------|-----------------------------------------------|
| 1       | Media Type            | Multiple fixed disks                                    | Single removable disk                         |
| 2       | Signal-to-Noise Ratio | Intermediate S/N ratio                                  | Excellent S/N ratio                           |
| 3       | Sampling Rate         | Low                                                     | High                                          |
| 4       | Usage                 | Used where random access is needed                      | Used where regular data streaming is needed   |
| 5       | Track Structure       | Circular                                                | Spiral                                        |
| 6       | Data Access           | Random access                                           | Sequential access                             |
| 7       | Examples              | Hard disk, floppy disk, magnetic tape                   | CD, DVD, Blu-ray                              |
| 8       | Reuse                 | Highly reusable; used for random read/write operations  | Most optical disks are read-only once written |
| 9       | Cost                  | Costly per MB                                           | Cheaper per MB                                |

RAID Architectures
RAID, or "Redundant Arrays of Independent Disks", is a technique which makes use of
a combination of multiple disks, instead of a single disk, for increased
performance, data redundancy, or both. The term was coined by David Patterson,
Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987.
RAID is a data storage virtualization technology that combines multiple physical
disk drive components into one or more logical units for data redundancy,
performance improvement, or both. It is a way of storing the same data in different
places on multiple hard disks or solid-state drives to protect data in the case of
a drive failure. A RAID system consists of two or more drives working in parallel.
These can be hard disks, but there is a trend to use SSD (solid-state drive)
technology.

RAID combines several independent and relatively small disks into a single storage
unit of large size. The disks included in the array are called array members. The
disks can be combined into the array in different ways, known as RAID levels. Each
RAID level has its own characteristics of:

o Fault-tolerance is the ability to survive one or several disk failures.


o Performance shows the change in the read and write speed of the entire array
compared to a single disk.
o The array's capacity is the amount of user data that can be written to the
array. The capacity depends on the RAID level and does not always match the
sum of the sizes of the RAID member disks.
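To make the capacity point concrete, here is a sketch (hypothetical function and invented disk counts, not from the text) of usable capacity under three placement schemes — plain striping, two-way mirroring, and a single dedicated parity disk — for N disks of B blocks each:

```python
# Usable capacity depends on the RAID level, not just on the sum of the
# member disks' sizes. Hypothetical sketch for n disks of b blocks each.
def usable_blocks(level: int, n: int, b: int) -> int:
    if level == 0:          # striping: no redundancy, all space usable
        return n * b
    if level == 1:          # two-way mirroring: half the space holds copies
        return n * b // 2
    if level == 4:          # one disk's worth of space holds parity
        return (n - 1) * b
    raise ValueError("level not covered in this sketch")

print([usable_blocks(lv, 4, 100) for lv in (0, 1, 4)])  # [400, 200, 300]
```

With four 100-block disks, the raw total is always 400 blocks, but the usable amount varies from 400 (RAID 0) down to 200 (RAID 1), matching the level-by-level capacity formulas evaluated later in this unit.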

RAID systems can be used with several interfaces, including SATA, SCSI, IDE, or FC
(fibre channel). Some systems use SATA disks internally but have a FireWire or SCSI
interface for the host system. Sometimes the disks in a storage system are defined
as JBOD, which stands for Just a Bunch Of Disks; such disks do not use a specific
RAID level and act as stand-alone disks. This is often done for drives that contain
swap files or spooling data.



Why data redundancy? - Data redundancy, although taking up extra space, adds
to disk reliability. This means, in case of disk failure, if the same data is also backed
up onto another disk, we can retrieve the data and go on with the operation. On the
other hand, if the data is spread across just multiple disks without the RAID
technique, the loss of a single disk can affect the entire data.

How RAID Works - RAID works by placing data on multiple disks and allowing
input/output operations to overlap in a balanced way, improving performance.
Although using multiple disks lowers the mean time between failures (MTBF) of the
array as a whole, storing data redundantly increases fault tolerance. RAID arrays
appear to the operating system as a single logical drive. RAID employs the
techniques of disk mirroring and disk striping.

o Disk Mirroring copies identical data onto more than one drive.
o Disk Striping partitions data and spreads it across multiple disk drives.
o Disk mirroring and disk striping can also be combined in a RAID array.

In a single-user system where large records are stored, the stripes are typically
set up to be small (e.g. 512 bytes) so that a single record spans all the disks and
can be accessed quickly by reading all the disks at the same time. In a multi-user
system, better performance requires a stripe wide enough to hold the typical or
maximum-size record, allowing overlapped disk I/O across drives.

Key evaluation points for a RAID System

• Reliability: How many disk faults can the system tolerate?


• Availability: What fraction of the total session time is a system in uptime
mode, i.e. how available is the system for actual use?
• Performance: How good is the response time? How high is the throughput
(rate of processing work)? Note that performance involves many parameters,
not just these two.
• Capacity: Given a set of N disks each with B blocks, how much useful
capacity is available to the user?

Different RAID levels

RAID-0 (Striping) - RAID 0 takes any number of disks and merges them into one large
volume. It increases speed, since reads and writes use multiple disks at a time,
and an individual file can use the speed and capacity of all the drives of the
array. The downside to RAID 0 is that it is NOT redundant: all data on all disks is
lost if any one disk fails, which makes this RAID type much less reliable than a
single disk. There is rarely a situation where you should use RAID 0 in a server
environment; it can be used for caches or other purposes where speed is essential
and reliability or data loss does not matter.

• Blocks are "striped" across disks.
• In the figure, blocks "0,1,2,3" form a stripe.
• Instead of placing just one block onto a disk at a time, we can place two (or
more) blocks onto a disk before moving on to the next one.

Evaluation:

• Reliability: 0 - There is no duplication of data. Hence, a block once lost cannot


be recovered.
• Capacity: N*B - The entire space is being used to store data. Since there is no
duplication, N disks each having B blocks are fully utilized.
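Under this placement, the disk and offset of any logical block follow from simple modular arithmetic. A minimal sketch (hypothetical helper, assuming one block per disk per stripe):

```python
# Minimal sketch of block-level striping: logical block i lands on disk
# i % n at offset i // n, so consecutive blocks spread across all disks.
def raid0_locate(logical_block: int, n_disks: int) -> tuple[int, int]:
    return logical_block % n_disks, logical_block // n_disks

# With 4 disks, blocks 0..3 form the first stripe and 4..7 the second:
print([raid0_locate(b, 4) for b in range(8)])
```

Because consecutive blocks land on different disks, a large read touches all four drives at once, which is exactly where RAID 0's speed comes from.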

RAID-1 (Mirroring) - RAID 1 duplicates data across two disks in the array,
providing full redundancy. Both disks store exactly the same data, at the same
time, and at all times. Data is not lost as long as one disk survives. The total
capacity of the array equals the capacity of the smallest disk in the array, and at
any given instant the contents of both disks are identical. More complicated RAID 1
configurations, with additional mirrors, are also possible. The point of RAID 1 is
primarily redundancy.

If you completely lose a drive, you can still stay up and running off the other
drive. If either drive fails, you can replace the broken drive with little to no
downtime. RAID 1 also gives you the additional benefit of increased read
performance, as data can be read off any of the drives in the array. The downsides
are slightly higher write latency, since the data needs to be written to both
drives, and the fact that you get only a single drive's capacity while needing two
drives.

• More than one copy of each block is stored on a separate disk; thus, every
block has two (or more) copies, lying on different disks.
• The figure shows a RAID-1 system with mirroring level 2.
• RAID 0 was unable to tolerate any disk failure; RAID 1 provides this
reliability.

Evaluation: Assume a RAID system with


mirroring level 2.

• Reliability: 1 to N/2 - 1 disk failure


can be handled for certain, because
blocks of that disk would have

duplicates on some other disk. If we are
lucky enough and disks 0 and 2 fail, then
again this can be handled as the blocks
of these disks have duplicates on disks 1
and 3. So, in the best case, N/2 disk
failures can be handled.
• Capacity: N*B/2 - Only half the space is
being used to store data. The other half is
just a mirror to the already stored data.
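The best-case figure of N/2 tolerated failures can be checked with a small sketch. The pairing below (disk 0 mirrored on disk 1, disk 2 on disk 3) is an assumption made for this example, matching the disks 0/2 scenario in the text:

```python
# Which disk-failure sets a 4-disk RAID-1 array (mirroring level 2) can
# survive. Each pair of disks holds identical data; the pairing scheme is
# an assumption for this illustration.

pairs = [(0, 1), (2, 3)]

def survives(failed):
    # Data survives if every mirror pair still has at least one live disk.
    return all(not set(pair) <= set(failed) for pair in pairs)

assert survives({0})          # any single failure is tolerated
assert survives({0, 2})       # best case: N/2 failures, one per pair
assert not survives({0, 1})   # losing both copies of one pair is fatal
```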

RAID levels 2 and 3 - RAID-2 consists of bit-level striping using a Hamming-code
parity. RAID-3 consists of byte-level striping with dedicated parity. These two are
less commonly used. RAID-6 is a more recent advancement that uses distributed
double parity: block-level striping with 2 parity blocks instead of just 1, distributed
across all the disks. There are also hybrid RAIDs, which make use of more than one
RAID level nested one after the other, to fulfil specific requirements.

RAID-4 (Block-Level Striping with Dedicated Parity) -

• Instead of duplicating data, this adopts a parity-based approach.
• One column (disk) is dedicated to parity.
• Parity is calculated using a simple XOR function. If the data bits are
0,0,0,1 the parity bit is XOR(0,0,0,1) = 1. If the data bits are 0,1,1,0 the
parity bit is XOR(0,1,1,0) = 0. A simple rule: an even number of ones
results in parity 0, and an odd number of ones results in parity 1.
• Assume that in the above figure, C3 is lost due to some disk failure. Then, we
can recompute the data bit stored in C3 by looking at the values of all the other
columns and the parity bit. This allows us to recover lost data.

Evaluation:

• Reliability: 1 - RAID-4 allows recovery of at most 1 disk failure (because of the


way parity works). If more than one disk fails, there is no way to recover the
data.
• Capacity: (N-1)*B - One disk in the system is reserved for storing the parity.
Hence, (N-1) disks are made available for data storage, each disk having B
blocks.
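The XOR recovery described above can be demonstrated directly. This is an illustrative sketch, not any controller's actual implementation; the function names are our own:

```python
# XOR parity lets a RAID-4 array rebuild one lost data block:
# the XOR of the surviving blocks with the parity block equals the lost block.

from functools import reduce

def parity(blocks):
    """XOR all data blocks together to form the parity block."""
    return reduce(lambda a, b: a ^ b, blocks)

def recover(surviving_blocks, parity_block):
    """XOR the survivors with the parity to reconstruct the lost block."""
    return reduce(lambda a, b: a ^ b, surviving_blocks, parity_block)

data = [0b0011, 0b0101, 0b0110, 0b0001]   # data blocks C0..C3
p = parity(data)                          # stored on the dedicated parity disk

# Suppose C3 is lost in a disk failure; rebuild it from the rest:
lost = recover(data[:3], p)
assert lost == data[3]
```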

RAID-5 (Block-Level Striping with Distributed Parity) - RAID 5 requires the use
of at least three drives. It combines these disks to protect data against loss of any
one disk; the array's storage capacity is reduced by one disk. It stripes data across

multiple drives to increase performance. But it also adds the aspect of redundancy
by distributing parity information across the disks.

• This is a slight modification of the RAID-4 system where the only difference is
that the parity rotates among the drives.
• In the figure, we can notice how the parity bit “rotates”.
• This was introduced to make the
random write performance better.

Evaluation:

• Reliability: 1 - RAID-5 allows recovery of at most 1 disk failure (because of the


way parity works). If more than one disk fails, there is no way to recover the
data. This is identical to RAID-4.
• Capacity: (N-1)*B - Overall, space equivalent to one disk is utilized in storing
the parity. Hence, (N-1) disks are made available for data storage, each disk
having B blocks.
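The rotation of the parity block can be sketched as a simple placement rule. The left-rotating layout below is one common convention; real controllers may use a different rotation, so treat this as an assumption for illustration:

```python
# Sketch of parity rotation in RAID-5: for each successive stripe, the
# parity block moves to a different disk, so no single disk becomes a
# write bottleneck (unlike RAID-4's dedicated parity disk).

def raid5_parity_disk(stripe, n_disks):
    # Rotate the parity one disk to the left per stripe.
    return (n_disks - 1 - stripe) % n_disks

for stripe in range(5):
    print("stripe", stripe, "-> parity on disk", raid5_parity_disk(stripe, 4))
```

On a 4-disk array the parity visits disks 3, 2, 1, 0 and then wraps back to 3.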

RAID 6 (Striped Disks with Double Parity) - RAID 6 is similar to RAID 5, but the
parity data are written to two drives. The use of additional parity enables the array
to continue to function even if two disks fail simultaneously. However, this extra
protection comes at a cost: RAID 6 has slower write performance than RAID 5. The
chances that two drives break down at the same moment are minimal. However, if a
drive in a RAID 5 system dies and is replaced by a new drive, it takes a long time to
rebuild the replacement. If another drive dies during that time, you still lose all of
your data. With RAID 6, the array will survive even that second failure.

Nested RAID levels - Some RAID levels are referred to as nested RAID because they
are based on a combination of RAID levels, such as:

1. RAID 10 (1+0) - This level combines RAID 1 and RAID 0 in a single system, which
offers higher performance than RAID 1, but at a much higher cost. This is a nested
or hybrid RAID configuration. It provides security by mirroring all data on secondary
drives while using striping across each set of drives to speed up data transfers.

2. RAID 01 (0+1) - RAID 0+1 is similar to RAID 1+0, except the data organization
method is slightly different. Rather than creating a mirror and then striping the
mirror, RAID 0+1 creates a stripe set and then mirrors the stripe set.

3. RAID 03 (0+3, also known as RAID 53 or RAID 5+3) - This level uses striping
similar to RAID 0 for RAID 3's virtual disk blocks. This offers higher performance than
RAID 3 but at a higher cost.

4. RAID 50 (5+0) - This configuration combines RAID 5 distributed parity with RAID
0 striping to improve RAID 5 performance without reducing data protection.

Non-standard RAID levels - Non-standard RAID levels vary from standard RAID levels,
and they are usually developed by companies or organizations for mainly
proprietary use, such as:

1. RAID 7 - A non-standard RAID level based on RAID 3 and RAID 4 that adds
caching. It includes a real-time embedded OS as a controller, caching via a high-
speed bus, and other stand-alone computer characteristics.

2. Adaptive RAID - This level enables the RAID controller to decide how to store
the parity on disks. It will choose between RAID 3 and RAID 5, depending on
which RAID set type will perform better with the kind of data being written to the
disks.

3. Linux MD RAID 10 - The Linux kernel provides this level. It supports the
creation of nested and non-standard RAID arrays. Linux software RAID can
also support standard RAID 0, RAID 1, RAID 4, RAID 5, and RAID 6
configurations.

Pipelining
Pipelining is a technique of decomposing a sequential process into suboperations,
with each subprocess being executed in a special dedicated segment that operates
concurrently with all other segments. A pipeline can be visualized as a collection of
processing segments through which binary information flows. Each segment
performs partial processing dictated by the way the task is partitioned. The result

obtained from the computation in each segment is transferred to the next segment
in the pipeline. The final result is obtained after the data have passed through all
segments. The name “pipeline” implies a flow of information analogous to an
industrial assembly line. It is characteristic of pipelines that several computations
can be in progress in distinct segments at the same time. The overlapping of
computation is made possible by associating a register with each segment in the
pipeline. The registers provide isolation between each segment so that each can operate
on distinct data simultaneously.

Simplest way of viewing the pipeline structure is to imagine that each segment
consists of an input register followed by a combinational circuit. The register holds
the data and the combinational circuit performs the suboperation in the particular
segment. The output of the combinational circuit in a given segment is applied to the
input register of the next segment. A clock is applied to all registers after enough time
has elapsed to perform all segment activity. In this way the information flows through
the pipeline one step at a time. The pipeline organization will be demonstrated by
means of a simple example. Suppose that we want to perform the combined multiply
and add operations with a stream of numbers: Ai * Bi + Ci for i = 1, 2, 3, ..., 7

Each suboperation is to be implemented in a segment within a pipeline. Each


segment has one or two registers and a combinational circuit as shown in Fig. below.
R1 through R5 are registers that receive new
data with every clock pulse. The multiplier
and adder are combinational circuits. The
suboperations performed in each segment
of the pipeline are as follows:

R1 ← Ai, R2 ← Bi          Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci     Multiply and input Ci
R5 ← R3 + R4              Add Ci to product

The five registers are loaded with new data every clock pulse. The effect of each clock
pulse is shown in the table above. The first clock pulse transfers A1 and B1 into R1 and R2. The
second clock pulse transfers the product of R1 and R2 into R3 and C1 into R4. The
same clock pulse transfers A2 and B2 into R1 and R2. The third clock pulse operates
on all three segments simultaneously. It places A3 and B3 into R1 and R2, transfers
the product of R1 and R2 into R3, transfers C2 into R4, and places the sum of R3 and

R4 into R5. It takes three clock pulses to fill up the pipe and retrieve the first output
from R5. From there on, each clock produces a new output and moves the data one
step down the pipeline. This happens as long as new input data flow into the system.
When no more input data are available, the clock must continue until the last output
emerges out of the pipeline.
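The clock-by-clock behaviour described above can be simulated directly. The sketch below models the simultaneous register loads by computing every new register value from the old contents before committing them all at once (register names follow the text; the data values are arbitrary):

```python
# Simulation of the three-segment multiply-add pipeline: R1,R2 receive the
# inputs, R3,R4 hold the product and Ci, R5 holds the final sum Ai*Bi+Ci.

A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [1, 1, 1, 1, 1, 1, 1]

r1 = r2 = r3 = r4 = r5 = None
outputs = []
n = len(A)
for clock in range(n + 2):          # n inputs plus 2 cycles to drain the pipe
    # All registers load on the same clock edge: compute the new values
    # from the *old* register contents, then commit them together.
    new_r5 = r3 + r4 if r3 is not None else None             # segment 3: add
    new_r3 = r1 * r2 if r1 is not None else None             # segment 2: multiply
    new_r4 = C[clock - 1] if 1 <= clock <= n else None       # Ci arrives with the product
    new_r1, new_r2 = (A[clock], B[clock]) if clock < n else (None, None)
    r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
    if r5 is not None:
        outputs.append(r5)          # first result appears after 3 clocks

assert outputs == [a * b + c for a, b, c in zip(A, B, C)]
```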

Any operation that can be decomposed into a sequence of suboperations of about the
same complexity can be implemented by a pipeline processor. The technique is
efficient for those applications that need to repeat the same task many times with
different sets of data. The behaviour of a pipeline can be illustrated with a space-
time diagram. This is a diagram that shows the segment utilization as a function of
time.

The general structure of a four-segment pipeline is illustrated in below figure. The


operands pass through all four segments in a fixed sequence. Each segment consists
of a combinational circuit Si that performs a suboperation over the data stream
flowing through the pipe. The segments are separated by registers Ri that hold the
intermediate results between the stages. Information flows between adjacent stages
under the control of a common clock applied to all the registers simultaneously. We
define a task as the total operation performed going through all the segments in the
pipeline.

The diagram shows six tasks T1 through T6 executed in four segments. Initially, task
T1 is handled by segment 1. After the first clock, segment 2 is busy with T1, while
segment 1 is busy with task T2. Continuing in this manner, the first task T1 is
completed after the fourth clock cycle. From then on, the pipe completes a task every
clock cycle. No matter how many segments there are in the system, once the pipeline
is full, it takes only one clock period to obtain an output.

The first task T1 requires a time equal to k·tp to complete its operation, where tp is
the clock period, since there are k segments in the pipe. The remaining n - 1 tasks
emerge from the pipe at the rate of one task per clock cycle, and they will be completed
after a time equal to (n - 1)·tp. Therefore, to complete n tasks using a k-segment
pipeline requires k + (n - 1) clock cycles.
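The formula can be checked numerically; with the four-segment, six-task scenario from the figure it also shows how the speedup over a non-pipelined unit approaches k as n grows (the function name here is our own):

```python
# Pipeline timing: n tasks on a k-segment pipeline need k + (n - 1) cycles,
# versus n * k cycles if each task had to finish before the next started.

def pipeline_cycles(k, n):
    return k + (n - 1)

k, n = 4, 6                          # four segments, six tasks
assert pipeline_cycles(k, n) == 9    # T1 takes 4 cycles, then one per cycle
speedup = (n * k) / pipeline_cycles(k, n)
print(f"speedup = {speedup:.2f}")    # tends toward k for large n
```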

There are two areas of computer design where the pipeline organization is applicable.
An arithmetic pipeline divides an arithmetic operation into suboperations for
execution in the pipeline segments. An instruction pipeline operates on a stream of
instructions by overlapping the fetch, decode, and execute phases of the instruction
cycle.

Arithmetic Pipeline - Pipeline arithmetic units are usually found in very high-speed
computers. They are used to implement floating-point operations, multiplication of
fixed-point numbers, and similar computations encountered in scientific problems. A
pipeline multiplier is essentially an array multiplier, with special adders designed to
minimize the carry propagation time through the partial products. Floating-point
operations are easily decomposed into suboperations. Consider, as an example, a
pipeline unit for floating-point addition and subtraction. The inputs to the
floating-point adder pipeline are two normalized floating-point binary numbers:

X = A × 2^a and Y = B × 2^b

A and B are two fractions that represent the mantissas and a and b are the
exponents. The floating-point addition and subtraction can be performed in four
segments. The registers R are placed between the segments to store intermediate
results. The suboperations that are performed in the four segments are:

1. Compare the exponents.


2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.

The exponents are compared by subtracting them to determine their difference. The
larger exponent is chosen as the exponent of the result. The exponent difference
determines how many times the mantissa associated with the smaller exponent must
be shifted to the right. This produces an alignment of the two mantissas. It should be
noted that the shift must be designed as a combinational circuit to reduce the shift
time. The two mantissas are added or subtracted in segment 3. The result is normalized
in segment 4. When an overflow occurs, the mantissa of the sum or difference is shifted
right and the exponent incremented by one. If an underflow occurs, the number of
leading zeros in the mantissa determines the number of left shifts in the mantissa and
the number that must be subtracted from the exponent.
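As a hedged illustration of the four suboperations, here is a toy decimal version. Real hardware operates on binary mantissas with combinational shifters; the base-10 arithmetic below is only to keep the numbers readable, and the function is our own sketch:

```python
# Toy walk-through of the four floating-point addition segments:
# compare exponents, align mantissas, add, normalize.

def fp_add(a_mant, a_exp, b_mant, b_exp):
    # Segment 1: compare exponents; the larger becomes the result exponent.
    diff = a_exp - b_exp
    exp = max(a_exp, b_exp)
    # Segment 2: shift the mantissa of the smaller exponent right to align.
    if diff > 0:
        b_mant /= 10 ** diff
    elif diff < 0:
        a_mant /= 10 ** (-diff)
    # Segment 3: add the aligned mantissas.
    mant = a_mant + b_mant
    # Segment 4: normalize - shift right and bump the exponent on overflow,
    # shift left and reduce the exponent on underflow.
    while abs(mant) >= 1.0:
        mant /= 10
        exp += 1
    while 0 < abs(mant) < 0.1:
        mant *= 10
        exp -= 1
    return mant, exp

# 0.9 x 10^3 + 0.8 x 10^2 aligns to 0.9 + 0.08, giving 0.98 x 10^3
print(fp_add(0.9, 3, 0.8, 2))
```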

Instruction Pipeline - Pipeline processing can occur not only in the data stream
but in the instruction stream as well. An instruction pipeline reads consecutive
instructions from memory while previous instructions are being executed in other
segments. This causes the instruction fetch and execute phases to overlap and
perform simultaneous operations. One possible complication with such a scheme is
that an instruction may cause a branch out of sequence. In that case the
pipeline must be emptied and all the instructions that have been read from memory
after the branch instruction must be discarded.

Consider a computer with an instruction fetch unit and an instruction execution unit
designed to provide a two-segment pipeline. The instruction fetch segment can be
implemented by means of a first-in, first-out (FIFO) buffer. This is a type of unit that
forms a queue rather than a stack. Whenever the execution unit is not using memory,
the control increments the program counter and uses its address value to read
consecutive instructions from memory. The instructions are inserted into the FIFO
buffer so that they can be executed on a first-in, first-out basis. Thus an instruction
stream can be placed in a queue, waiting for decoding and processing by the execution
segment. The instruction stream queuing mechanism provides an efficient way for
reducing the average access time to memory for reading instructions. Whenever there
is space in the FIFO buffer, the control unit initiates the next instruction fetch phase.
The buffer acts as a queue from which control then extracts the instructions for the
execution unit. Computers with complex instructions require other phases in addition
to the fetch and execute to process an instruction completely. In the most general case,
the computer needs to process each instruction with the following sequence of steps.

1. Fetch the instruction from memory.


2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.

There are certain difficulties that will prevent the instruction pipeline from operating
at its maximum rate. Different segments may take different times to operate on the
incoming information. Some segments are skipped for certain operations. For example,
a register mode instruction does not need an effective address calculation. Two or more
segments may require memory access at the same time, causing one segment to wait
until another is finished with the memory. Memory access conflicts are sometimes
resolved by using two memory buses for accessing instructions and data in separate
modules. In this way, an instruction word and a data word can be read simultaneously
from two different modules. The design of an instruction pipeline will be most efficient
if the instruction cycle is divided into segments of equal duration. The time that each
step takes to fulfill its function depends on the instruction and the way it is executed.

Data Hazards
Data hazards occur when an instruction depends on the result of a previous
instruction and that result has not yet been computed. Whenever two different
instructions use the same storage location, that location must appear as if it is
accessed in sequential order. Alternatively, we can say that data hazards occur when
the execution of an instruction depends on the results of a prior instruction that is
still being processed in the pipeline. Consider the following scenario.

The result of the ADD instruction is written into register X3 at t5 in the example
above. If bubbles are not used to postpone the following SUB instruction, all three
subsequent operations will use the value of X3 from before the ADD completes.

Classification of Data Hazards

Data hazards are divided into four types according to the order in which READ and
WRITE operations are performed on the register: Read after Write (RAW), Write after
Read (WAR), Write after Write (WAW), and Read after Read (RAR). These are
explained below.

1. Read after Write (RAW) : It is also known as True dependency or Flow


dependency. It occurs when the value produced by an instruction is required
by a subsequent instruction. Stalls are required to handle these hazards. For
example,

ADD R1, --, --;


SUB --, R1, --;

2. Write after Read (WAR) : It is also known as anti-dependency. These
hazards occur when an instruction writes to a register that a previous
instruction has yet to read. For example,

ADD --, R1, --;


SUB R1, --, --;

3. Write after Write (WAW) : It is also known as output dependency. These
hazards occur when an instruction writes to a register that a previous
instruction also writes. For example,

ADD R1, --, --;

SUB R1, --, --;

4. Read after Read (RAR) : It occurs when two instructions both read from the
same register. Since reading a register does not change its value, these
Read after Read (RAR) hazards do not cause a problem for the
processor. For example,

ADD --, R1, --;


SUB --, R1, --;
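The four cases above can be expressed as set intersections over each instruction's read and write registers. The sketch below uses an invented encoding (explicit read/write lists) purely for illustration:

```python
# Classify the dependency between two instructions, in program order,
# by comparing their read and write register sets.

def classify(first_writes, first_reads, second_writes, second_reads):
    hazards = []
    if set(first_writes) & set(second_reads):
        hazards.append("RAW")    # true/flow dependency
    if set(first_reads) & set(second_writes):
        hazards.append("WAR")    # anti-dependency
    if set(first_writes) & set(second_writes):
        hazards.append("WAW")    # output dependency
    if not hazards and set(first_reads) & set(second_reads):
        hazards.append("RAR")    # both only read - harmless
    return hazards

# ADD R1, R2, R3  followed by  SUB R4, R1, R5  -> RAW on R1
print(classify(["R1"], ["R2", "R3"], ["R4"], ["R1", "R5"]))
```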

Handling Data Hazards : These are the various methods we use to handle hazards:
forwarding, code reordering, and stall insertion. These are explained below.

1. Forwarding: It adds special circuitry to the pipeline. This method works


because it takes less time for the required values to travel through a wire
than it does for a pipeline segment to compute its result.

2. Code reordering: We need a special type of software to reorder code. We


call this type of software a hardware-dependent compiler.

3. Stall Insertion: It inserts one or more stalls (no-op instructions) into the
pipeline, which delays the execution of the current instruction until the
required operand is written to the register file. This method decreases
pipeline efficiency and throughput.

The following are some possible solutions to the problem discussed above:

Solution 1: Forwarding of Data – Data forwarding is the process of sending a result


straight to that functional unit which needs it: a result is transferred from one unit’s
output to another’s input. The goal is to have the solution ready for the next instruction
as soon as possible. In this scenario, the ADD result can be found at the ALU output
in the ADD instruction's IE stage, that is, at the end of t3.

In case the control unit can control and forward this to the SUB-IE stage at t4 just
before writing to the output register X3, the pipeline will proceed without halting. This
necessitates additional processing to detect and respond to this data hazard. It’s worth
noting that, though Operand Fetch normally occurs in the ID stage, it’s only utilised
in the IE stage. As a result, the IE stage receives forwarding as an input. OR and AND
instructions can also be used to forward data in a similar way.

Solution 2: When generating executable code, the compiler can recognise data
dependencies and reorganise the instructions appropriately. This will make the
device easier to use. If the reordering described above is not possible, the compiler
can detect and insert a no operation (or NOP) instruction(s). NOP refers to a software-
generated dummy instruction equivalent bubble. During the code optimization stage
of the compilation process, the compiler examines data dependencies.

Solution 3: At the IF stage of the SUB instruction, add three bubbles. This will make
it easier for SUB – ID to work at t6. As a result, all subsequent instructions in the
pipe are similarly delayed.

Example 2: Consider the following sequence of instructions:

sub R2, R1, R3

and R12, R2, R5

sub reads registers 1 and 3 in cycle 2 and passes them to the ALU. It is not until
cycle 5 that it writes the answer to the register file. and reads registers 2 and 5 in
cycle 3 and passes them to the ALU. Register 2 has not yet been updated! When and
reads the register file it gets the wrong value for register 2. This is called a data
hazard. Data hazards occur if an instruction reads a register that a previous
instruction overwrites in a future cycle. We must eliminate data hazards or
pipelining produces incorrect results.

Branch Hazards

As soon as we branch to a new instruction, all the instructions that are in the pipeline
behind the branch become invalid!

lw R5, (400)R14
beq R3, R2,100
add R7, R8, R9
.. +100 sub R7, R8, R9

Either the add or sub instruction after the beq will be executed depending on the
contents of registers 2 and 3. We can include extra hardware to calculate the branch
offset in the decode cycle. Data forwarding then makes it possible to do the branch
just one cycle later - insert a nop.

A clever compiler can eliminate the effect of the delay by inserting an instruction
after the branch! This can be the previous instruction (if it is not involved in the
branch).

lw R5, (400)R14
beq R3, R2, 100

can be re-ordered:

beq R3, R2, 100
lw R5, (400)R14

Instructional hazards

Pipelined execution of instructions reduces time and improves performance.
Whenever the instruction stream is interrupted, the pipeline stalls. A branch
instruction may also cause the pipeline to stall. The effect of branch instructions and
the techniques that can be used for mitigating their impact are discussed below for
unconditional branches and conditional branches. Scoreboards are designed to
control the flow of data between registers and multiple arithmetic units in the
presence of conflicts caused by hardware resource limitations (structural hazards) and
by dependencies between instructions (data hazards). Data hazards can be classified
as flow dependencies (Read-After-Write), output dependencies (Write-After-Write)
and anti-dependencies (Write-After-Read).

Read-After-Write (RAW) Hazards - A Read-After-Write hazard occurs when an


instruction requires the result of a previously issued, but as yet uncompleted
instruction. RAW Example

R6 = R1 * R2
R7 = R5 + R6

Write-After-Write (WAW) Hazards - A Write-After-Write hazard occurs when an


instruction tries to write its result to the same register as a previously issued, but
as yet uncompleted instruction. WAW Example
R6 = R1 * R2
R6 = R4 + R5

Write-After-Read (WAR) Hazards - A Write-After-Read hazard occurs when an


instruction tries to write to a register which has not yet been read by a previously
issued, but as yet uncompleted instruction. This hazard cannot occur in most systems,
but could occur in the CDC 6600 because of the way instructions were issued to the
arithmetic units. WAR Example
X3 = X1 / X2
X5 = X4 * X3
X4 = X0 + X6

Structural Hazards - Structural hazards occur when two or more instructions try
to access the same hardware resource (e.g. two instructions try to write their results
to registers in the same clock period). This would occur, for example, if an instruction
were issued to an arithmetic unit that takes three clock periods to execute in the clock
period immediately following the issue of a previous instruction to a different
arithmetic unit that takes four clock periods: both instructions would then try to
write their results in the same clock period.

Unconditional Branches: A sequence of instructions being executed in a two-stage
pipeline is shown in the figure. Instructions I1 to I3 are stored at successive memory
addresses, and I2 is a branch instruction.

Figure - An idle cycle caused by a branch instruction.

Let the branch target be instruction Ik. In clock cycle 3, the fetch operation for
instruction I3 is in progress at the same time that the branch instruction is being
decoded and the target address computed. In clock cycle 4, the processor must discard
I3, which has been incorrectly fetched, and fetch instruction Ik. In the meantime, the
hardware unit responsible for the Execute (E) step must be told to do nothing during
that clock period. Thus, the pipeline is stalled for one clock cycle. The time lost as
a result of a branch instruction is often referred to as the branch penalty (Time loss).
In above Figure, the branch penalty is one clock cycle. For a longer pipeline, the
branch penalty may be higher.

For example, Figure (a) shows the effect of a branch instruction on a four-stage
pipeline. The branch address is computed in step E2. Instructions I3 and I4 must be
discarded, and the target instruction, Ik, is fetched in clock cycle 5. Thus, the branch
penalty is two clock cycles. Reducing the branch penalty requires the branch address
to be computed earlier in the pipeline. Typically, the instruction fetch unit has
dedicated hardware to identify a branch instruction and compute the branch target
address as quickly as possible after an instruction is fetched. With this additional
hardware, both of these tasks can be performed in step D2, leading to the sequence of
events shown in Figure (b). In this case, the branch penalty is only one clock cycle.
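A rough way to state the relationship shown in the two figures: the penalty equals the number of pipeline stages completed before the branch target is known. The model below is a simplification we introduce for illustration, not a general rule for all pipelines:

```python
# Simplified branch-penalty model: if the branch target becomes known at
# the end of pipeline stage s (1-indexed), the s - 1 instructions fetched
# behind the branch must be discarded, costing s - 1 idle cycles.

def branch_penalty(target_known_stage):
    return target_known_stage - 1

assert branch_penalty(2) == 1   # target computed in the Decode stage (D2)
assert branch_penalty(3) == 2   # target computed in the Execute stage (E2)
```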

Figure - Branch timing: (a) branch address computed in the Execute stage;
(b) branch address computed in the Decode stage.

Either a cache miss or a branch instruction stalls the pipeline for one or more clock
cycles. To reduce the effect of these interruptions, many processors employ
sophisticated fetch units that can fetch instructions before they are needed and put
them in a queue. Typically, the
instruction queue can store several
instructions.

Figure - Use of an instruction queue in the hardware organization.

A separate unit, which we call the dispatch unit, takes instructions from the front of
the queue and sends them to the execution unit. This leads to the organization shown
in the figure. The dispatch unit also performs the decoding function. To be effective,
the fetch unit must have sufficient decoding and processing capability to recognize and
execute branch instructions. It attempts to keep the instruction queue filled at all times
to reduce the impact of occasional delays when fetching instructions. If there is a delay

in fetching instructions because of a branch or a cache miss, the dispatch unit
continues to issue instructions from the instruction queue. The fetch unit continues
to fetch instructions and add them to the queue.

CONDITIONAL BRANCHES AND BRANCH PREDICTION: A conditional branch


instruction introduces the added hazard caused by the dependency of the branch
condition on the result of a preceding instruction. The decision to branch cannot be
made until the execution of that instruction has been completed. The branch
instruction will introduce branch penalty which would reduce the gain in performance
expected from pipelining. Branch instructions can be handled in several ways to reduce
their negative impact on the rate of execution of instructions. They are:

(i) delayed branching


(ii) branch prediction and
(iii) Instruction pre-fetch.

Delayed branching: The processor fetches the next instructions before it determines
whether the current instruction is a branch instruction. When execution of the current
instruction is completed and a branch is to be made, the processor must discard the
instructions already fetched and fetch the instruction at the branch target.
The location following a branch instruction is called a branch delay slot. There may be
remaining instructions and fetch the new branched instruction at the branch target.
The location following a branch instruction is called a branch delay slot. There may be
more than one branch delay slot, depending on the time it takes to execute a branch
instruction.

A technique called delayed branching can minimize the penalty incurred as a result of
conditional branch instructions. The idea is simple. The instructions in the delay slots
are always fetched. Therefore, we would like to arrange for them to be fully executed
whether or not the branch is taken. The objective is to be able to place useful
instructions in these slots. If no useful instructions can be placed in the delay slots,
these slots must be filled with NOP instructions.

Consider the instruction sequence

LOOP  Shift_Left R1
      Decrement R2
      Branch=0 LOOP
NEXT  Add R1,R3

Register R2 serves as a counter to determine the number of times the contents of
register R1 are shifted left. For a processor with one delay slot, the instructions can
be reordered as shown below.
The shift instruction is fetched while the branch instruction is being executed. After

evaluating the branch condition, the processor fetches the instruction at LOOP or at
NEXT, depending on whether the branch condition is true or false, respectively. In
either case, it completes execution of the shift instruction. The sequence of events
during the last two passes in the loop is illustrated below. Pipelined operation is not
interrupted at any time, and there are no idle cycles. Logically, the program is executed
as if the branch instruction were placed after the shift instruction. That is, branching
takes place one instruction later than where the branch instruction appears in the
instruction sequence in the memory, hence the name "delayed branch".

The effectiveness of the delayed branch approach depends on how often it is possible
to reorder instructions like,

LOOP  Decrement R2
      Branch=0 LOOP
      Shift_Left R1
NEXT  Add R1,R3

Figure - Execution timing showing the delay slot being filled during the last two
passes through the loop.

Experimental data collected from many programs indicate that sophisticated compilation techniques can use one branch delay slot in as many as 85 percent of the cases. For a processor with two branch delay slots, the compiler attempts to find two instructions preceding the branch instruction that it can move into the delay slots without introducing a logical error. The chances of finding two such instructions are considerably less than the chances of finding one. Thus, if increasing the number of pipeline stages involves an increase in the number of branch delay slots, the potential gain in performance may not be fully realized.
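The 85 percent figure translates directly into an average branch penalty. A back-of-the-envelope calculation (the fill rate is from the text above; treating an unfilled slot as exactly one wasted NOP cycle is an assumption):

```python
# Average cost per branch with one delay slot: a usefully filled slot is
# free, while an unfilled slot holds a NOP and wastes one cycle.
fill_rate = 0.85              # fraction of delay slots the compiler can fill
nop_cost = 1                  # cycles lost when the slot holds a NOP

avg_branch_penalty = (1 - fill_rate) * nop_cost
print(round(avg_branch_penalty, 2))   # 0.15 cycles per branch on average
```

With two delay slots the compiler's lower success rate means the per-slot waste grows, which is why deeper pipelines may not realize their full potential gain.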

II Branch Prediction (Static): Another technique for reducing the branch penalty
associated with conditional branches is to attempt to predict whether or not a
particular branch will be taken. The simplest form of branch prediction is to assume
that the branch will not take place and to continue to fetch instructions in sequential
address order. Until the branch condition is evaluated, instruction execution along the
predicted path must be done on a speculative basis.

Speculative execution means that instructions are executed before the processor is
certain that they are in the correct execution sequence. Hence, care must be taken that
no processor registers or memory locations are updated until it is confirmed that these
instructions should indeed be executed. If the branch decision indicates otherwise, the



instructions and all their associated data in the execution units must be purged, and
the correct instructions fetched and executed.
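The "no early update" rule above can be pictured as a small buffer that holds speculative results until the branch outcome is known. A simplified Python sketch (the class and method names are invented for illustration; real processors use reorder buffers and register renaming for this):

```python
# Speculative results are parked in a buffer and applied to architectural
# state only once the branch outcome confirms the predicted path.

class SpeculativeBuffer:
    def __init__(self):
        self.pending = []                  # (register, value) results not yet committed

    def execute(self, reg, value):
        self.pending.append((reg, value))  # compute, but do not touch real state

    def commit(self, regs):                # prediction confirmed correct
        for reg, value in self.pending:
            regs[reg] = value
        self.pending.clear()

    def purge(self):                       # prediction was wrong: discard everything
        self.pending.clear()

regs = {"R1": 0}
buf = SpeculativeBuffer()
buf.execute("R1", 42)                      # speculative write down the predicted path
buf.purge()                                # mispredicted: the write never lands
print(regs["R1"])                          # 0
```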

An incorrectly predicted branch is illustrated in the figure below for a four-stage pipeline.


The figure shows a Compare instruction followed by a Branch>0 instruction. Branch
prediction takes place in cycle 3, while instruction I3 is being fetched. The fetch unit
predicts that the branch will not be taken, and it continues to fetch instruction I4 as
I3 enters the Decode stage. The results of the compare operation are available at the
end of cycle 3. Assuming that they are forwarded immediately to the instruction fetch
unit, the branch condition is evaluated in cycle 4. At this point, the instruction fetch
unit realizes that the prediction was incorrect, and the two instructions in the execution pipe are purged. A new instruction, Ik, is fetched from the branch target address in clock cycle 5.

Figure - Timing when branch decision has been incorrectly predicted as not taken.
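The cost of such flushes shows up directly in cycles per instruction. A rough model (the two-cycle flush penalty follows the four-stage example, where two wrongly fetched instructions are purged; the branch frequency and misprediction rate are assumed numbers, not from the text):

```python
# Rough CPI model: each mispredicted branch purges the speculatively
# fetched instructions, here two of them, as in the four-stage example.
branch_fraction = 0.20    # assumed share of instructions that are branches
mispredict_rate = 0.10    # assumed misprediction rate
flush_penalty = 2         # cycles lost purging the two wrongly fetched instructions

cpi = 1 + branch_fraction * mispredict_rate * flush_penalty
print(round(cpi, 2))      # 1.04
```

Lowering the misprediction rate, which is the goal of the prediction schemes discussed here, attacks the only factor in this product that software and hardware design can substantially influence.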

However, better performance can be achieved if we arrange for some branch instructions to be predicted as taken and others as not taken, depending on the expected program behavior. For example, a branch
instruction at the end of a loop causes a branch to the start of the loop for every pass
through the loop except the last one. Hence, it is advantageous to assume that this
branch will be taken and to have the instruction fetch unit start to fetch instructions
at the branch target address. On the other hand, for a branch instruction at the
beginning of a program loop, it is advantageous to assume that the branch will not be
taken.

A decision on which way to predict the result of the branch may be made in hardware
by observing whether the target address of the branch is lower than or higher than the
address of the branch instruction. A more flexible approach is to have the compiler
decide whether a given branch instruction should be predicted taken or not taken. The
branch instructions of some processors, such as SPARC, include a branch prediction
bit, which is set to 0 or 1 by the compiler to indicate the desired behavior. The
instruction fetch unit checks this bit to predict whether the branch will be taken or not
taken. The branch prediction decision is always the same every time a given instruction
is executed. Any approach that has this characteristic is called static branch
prediction.
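The hardware rule described above, predict backward branches taken (they usually close loops) and forward branches not taken, can be sketched in a few lines (the function name and sample addresses are illustrative):

```python
# Static "backward taken, forward not taken" heuristic: a branch whose
# target address is lower than its own address is predicted taken.

def predict_taken(branch_addr, target_addr):
    return target_addr < branch_addr

print(predict_taken(0x1010, 0x1000))   # True: backward, loop-closing branch
print(predict_taken(0x1000, 0x1040))   # False: forward branch, predict not taken
```

Like the compiler-set prediction bit, this decision is fixed for a given instruction, so it is a form of static branch prediction.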
