
MODULE I

 Introduction to distributed operating


systems
 Communication in distributed systems
WHAT IS A DISTRIBUTED SYSTEM?
 Computing systems composed of large numbers of
CPUs connected by a high-speed network.
 A collection of independent computers that
appears to the users of the system as a single
computer.
 This definition has two aspects. The first deals
with hardware: the machines are autonomous. The
second deals with software: the users think of
the system as a single computer.
 E.g. Core banking
GOALS
 Advantages of Distributed Systems over
Centralized Systems
• These systems potentially have a much
better price/performance ratio.
• Normal performance at low cost or extremely
high performance at greater cost.
• Some applications are inherently distributed
and hence require a distributed OS.
• Higher reliability.
• Incremental growth is also potentially a big
plus.
GOALS..
 Advantages of Distributed Systems over
Independent PCs.
GOALS..
 Disadvantages of Distributed Systems
HARDWARE CONCEPTS
 Flynn's taxonomy – based on number of instruction

streams, data streams


SISD- single instruction stream, single data stream
uniprocessor, personal computers, large
mainframes.
SIMD-single instruction stream, multiple data stream
supercomputers
MISD-multiple instruction stream, single data stream
(no common machines fit this model)
MIMD-multiple instruction stream, multiple data
stream.
distributed systems
HARDWARE CONCEPTS..
HARDWARE CONCEPTS..
 Bus-Based Multiprocessors.
HARDWARE CONCEPTS..
 A typical bus has 32 or 64 address lines, 32 or 64 data lines, and 32 or
more control lines, all of which operate in parallel.
 The memory is coherent: a write is immediately visible to all CPUs.
 With only 4 or 5 CPUs, the bus becomes overloaded and performance degrades.
 A cache is added between each CPU and the bus.
 The memory is now incoherent, and the system is difficult to
program.

Solutions
1) Write-through cache - a word written to the
cache is written through to memory as well, so all
writes, hits and misses, cause bus traffic.
2) Snoopy cache (snooping cache) - each cache
continuously monitors (eavesdrops on) the bus and
updates or invalidates its own copy when it sees a
write to a word it holds.
HARDWARE CONCEPTS..
 Switched Multiprocessors

Limitation

n CPUs and n memories, require n^2 crosspoint switches.


For large n, this number can be prohibitive
HARDWARE CONCEPTS..
The omega network
For example, with n = 4 CPUs and n = 4 memories:
switching stages - log2 n = 2,
switches per stage - n/2 = 2,
total switches - (n log2 n)/2 = 4
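The counts above can be sketched with a small helper (`omega_network_size` is a hypothetical name, not from the slides; it assumes n is a power of two and each switch is a 2x2 crossbar):

```python
import math

def omega_network_size(n):
    """Return (stages, switches_per_stage, total_switches) for an
    omega network connecting n CPUs to n memories (n a power of two),
    built from 2x2 switches."""
    stages = int(math.log2(n))        # log2(n) switching stages
    per_stage = n // 2                # n/2 switches per stage
    return stages, per_stage, stages * per_stage

# The slide's example: n = 4 gives 2 stages of 2 switches, 4 in total.
print(omega_network_size(4))      # (2, 2, 4)
# Compare with a crossbar, which needs n^2 = 16 crosspoints for n = 4.
```

The (n log2 n)/2 total grows far more slowly than the crossbar's n^2, which is the point of the omega network.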
HARDWARE CONCEPTS..
 Limitation
 Leads to delay and is expensive

 Solution
 NUMA (NonUniform Memory Access) machine.
Each CPU can access its own local memory
quickly, but accessing anybody else's memory is
slower.

 Limitation of NUMA - placement of the


programs and data becomes critical in order to
make most access go to the local memory.
HARDWARE CONCEPTS..
 Bus-Based Multicomputers.

 Easier to picture as a collection of workstations on a LAN than as a
collection of CPU cards inserted into a fast bus.
 Since the interconnect is used only for CPU-to-CPU communication, the
volume of traffic is several orders of magnitude lower than when it
also carries memory traffic.
HARDWARE CONCEPTS..
 Switched Multicomputers
 Various interconnection networks have been
proposed and built, but all have the property
that each CPU has direct and exclusive access
to its own, private memory.

GRID
best suited to problems that have an inherent two-dimensional
nature, such as graph theory or vision
HARDWARE CONCEPTS..

HYPERCUBE

 A hypercube is an n–dimensional cube


 For an n–dimensional hypercube, each CPU has n connections
to other CPUs.
 the complexity of the wiring increases logarithmically with the
size.
 Since only nearest neighbours are connected, many messages
have to make several hops to reach their destination.
 the longest possible path also grows logarithmically with the
size, in contrast to the grid, where it grows as the square root
of the number of CPUs.
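The contrast between the two diameters can be checked numerically (hypothetical helper functions; assumes a power-of-two hypercube and a perfect square grid):

```python
import math

def hypercube_diameter(num_cpus):
    # In an n-dimensional hypercube of 2^n CPUs, the longest path is
    # n hops: one hop per dimension.
    return int(math.log2(num_cpus))

def grid_diameter(num_cpus):
    # In a k x k grid, the longest path is corner to corner:
    # 2(k - 1) hops, which grows as the square root of the CPU count.
    k = int(math.isqrt(num_cpus))
    return 2 * (k - 1)

for cpus in (16, 256, 1024):
    print(cpus, hypercube_diameter(cpus), grid_diameter(cpus))
```

At 1024 CPUs the hypercube's worst case is 10 hops versus 62 for the grid, illustrating the logarithmic vs. square-root growth.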
SOFTWARE CONCEPTS..
 Loosely-coupled software allows machines
and users of a distributed system to be
fundamentally independent of one another,
but still to interact to a limited degree
where that is necessary.

 Tightly-coupled
SOFTWARE CONCEPTS..
 Network Operating Systems

 loosely-coupled software on loosely-coupled hardware


it is sometimes possible for a user to log into another workstation
remotely by using a command such as rlogin machine.
 Networks of workstations often also have a remote copy command to

copy files from one machine to another.


rcp machine1:file1 machine2:file2
SOFTWARE CONCEPTS..
 File servers generally maintain hierarchical
file systems, each with a root directory
containing subdirectories and files.

 Workstations can import or mount these file


systems, augmenting their local file systems
with those located on the servers.
SOFTWARE CONCEPTS..

The same remote file can appear under different names on different
workstations, e.g. as /work/news on one machine and as
/games/work/news on another, depending on where each client mounts
the server's file system.
SOFTWARE CONCEPTS..
 True Distributed Systems
 tightly-coupled software on the same
loosely-coupled (i.e., multicomputer)
hardware.

 distributed system is one that runs on a


collection of networked machines but acts
like a virtual uniprocessor.

 users should not have to be aware of the


existence of multiple CPUs in the system.
SOFTWARE CONCEPTS..
 single, global interprocess communication
mechanism so that any process can talk to any
other process.

 The file system must look the same everywhere


and should be visible at every location, subject
to protection and security constraints.

 identical kernels run on all the CPUs in the


system, each kernel can have considerable
control over its own local resources.
SOFTWARE CONCEPTS..
 Multiprocessor Timesharing Systems.
 tightly-coupled software on tightly-coupled
hardware.
 existence of a single run queue: a list of all the
processes in the system that are logically
unblocked and ready to run. The run queue is a
data structure kept in the shared memory
SOFTWARE CONCEPTS..
 The operating system normally contains a
traditional file system, including a single, unified
block cache.
DESIGN ISSUES
 Transparency
 Flexibility
 Reliability
 Performance
 Scalability
DESIGN ISSUES..
 Transparency

 how to achieve the single-system image.

 Transparency can be achieved at two


different levels. - User level, programmer
level
DESIGN ISSUES..

 The name of a resource must not secretly encode the location
of the resource (location transparency).
 A file or directory migrated from one server to another is forced to
acquire a new name, because the system of remote mounts is not
migration transparent.
DESIGN ISSUES..
 the servers can decide by themselves to replicate any file on any or all
servers, without the users having to know about it. Such a scheme is
replication transparent because it allows the system to make copies of
heavily used files without the users even being aware that this is happening.

 users try to access the same resource at the same time? For example, what
happens if two users try to update the same file at the same time? If the
system is concurrency transparent, the users will not notice the existence of
other users. One mechanism for achieving this form of transparency would be
for the system to lock a resource automatically once someone had started to
use it, unlocking it only when the access was finished. In this manner, all
resources would only be accessed sequentially, never concurrently.

 Programmers who actually want to use multiple CPUs for a single problem


will have to program this explicitly.
DESIGN ISSUES..
 Flexibility
 Monolithic kernel- each machine should run a
traditional kernel that provides most services itself
 Most system calls are made by trapping to the kernel,
having the work performed there, and having the
kernel return the desired result to the user process
DESIGN ISSUES..
 Microkernel- kernel should provide as little as possible, with the bulk
of the operating system services available from user-level servers.

 It basically provides just four minimal services:


1. An inter process communication mechanism.
2. Some memory management.
3. A small amount of low-level process management and
scheduling.
4. Low-level input/output.

Trapping to the kernel and doing everything there may well be
faster than sending messages to remote servers: the monolithic
kernel (e.g. Sprite) is faster than the microkernel (e.g. Amoeba).


DESIGN ISSUES..
 Reliability
 Availability refers to the fraction of time that the system
is usable. A highly reliable system must be highly available.
the more copies that are kept, the better the availability,
but the greater the chance that they will be inconsistent.

 Another aspect of overall reliability is security. Files and


other resources must be protected from unauthorized
usage. In a distributed system, when a message comes in to
a server asking for something, the server has no simple way
of determining who it is from. No name or identification
field in the message can be trusted, since the sender may
be lying. At the very least, considerable care is required
here.
DESIGN ISSUES..
 another issue relating to reliability is fault
tolerance. Suppose that a server crashes and
then quickly reboots. what happens? Does
the server crash bring users down with it? If
the server has tables containing important
information about ongoing activities,
recovery will be difficult at best.
DESIGN ISSUES..
 Performance
 performance metrics

1.Response time
2.throughput (number of jobs per hour)
3.system utilization, and
4.amount of network capacity consumed

 The performance problem is compounded by the fact


that communication, which is essential in a distributed
system (and absent in a single-processor system) is
typically quite slow.
 Thus to optimize performance, one often has to
minimize the number of messages .
DESIGN ISSUES..
 Scalability
Most current distributed systems are designed to work with a few
hundred CPUs. It is possible that future systems will be orders of
magnitude larger, and solutions that work well for 200 machines will
fail miserably for 2 million machines

 Centralized components, tables, and algorithms do
not scale: a single component is not fault
tolerant and can saturate the capacity of the
communication lines around it.
 A centralized algorithm that collects and
transports all the input and output information
to one site would again be a bad idea.
DESIGN ISSUES..
 Solution
 Only decentralized algorithms should be used. These
algorithms generally have the following characteristics,
which distinguish them from centralized algorithms:

 1. No machine has complete information about the


system state.
 2. Machines make decisions based only on local
information.
 3. Failure of one machine does not ruin the algorithm.
 4. There is no implicit assumption that a global clock
exists.
COMMUNICATION IN DISTRIBUTED
SYSTEMS

 most interprocess communication implicitly


assumes the existence of shared memory.

 In a distributed system there is no shared


memory, so the entire nature of interprocess
communication is based on message passing.

 The set of rules that communicating


processes must adhere to, known as
protocols.
LAYERED PROTOCOLS
 Open Systems Interconnection Reference Model
(OSI).

 An open system is one that is prepared to


communicate with any other open system by
using standard rules that govern the format,
contents, and meaning of the messages sent and
received, called protocols.

 The collection of protocols used in a particular
system is called a protocol suite or protocol
stack; the OSI model has 7 layers.
LAYERED PROTOCOLS..
LAYERED PROTOCOLS..

A typical message as it appears on the network.


LAYERED PROTOCOLS..
 The Physical Layer

key issues
 Transmits 0s and 1s.
 How many volts to use for 0 and 1,
 how many bits per second can be sent,
 whether transmission can take place in both
directions simultaneously
 the size and shape of the network connector (plug),
 the number of pins and meaning of each
 The physical layer protocol deals with standardizing
the electrical, mechanical, and signalling interfaces
LAYERED PROTOCOLS..
 The Data Link Layer
 groups the bits into units, called frames
 sees that each frame is correctly received
 Puts a special bit pattern on the start and end
of each frame, to mark them, as well as
computing a checksum by adding up all the
bytes in the frame in a certain way.
 When the frame arrives, the receiver recomputes
the checksum from the data and compares the
result to the checksum following the frame.
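A toy illustration of this framing scheme, with a start/end marker byte and an additive checksum (the flag byte and the mod-256 sum are simplifications for clarity; real data link layers use bit stuffing and CRCs):

```python
FLAG = 0x7E  # illustrative start/end-of-frame marker byte

def checksum(data: bytes) -> int:
    # Simplified: add up all the payload bytes modulo 256.
    return sum(data) % 256

def make_frame(payload: bytes) -> bytes:
    # Sender: mark the start and end, append the checksum.
    return bytes([FLAG]) + payload + bytes([checksum(payload), FLAG])

def check_frame(frame: bytes) -> bytes:
    # Receiver: recompute the checksum and compare it to the one
    # following the data in the frame.
    payload, received = frame[1:-2], frame[-2]
    if checksum(payload) != received:
        raise ValueError("checksum mismatch: frame corrupted")
    return payload

frame = make_frame(b"hello")
assert check_frame(frame) == b"hello"
```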
LAYERED PROTOCOLS..
LAYERED PROTOCOLS..
 The Network Layer
 Routing- For a message to get from the
sender to the receiver, have to make a
number of hops, at each one choosing an
outgoing line to use. choosing the best path
is called routing.

 the shortest route is not always the best
route; routing is also affected by delay, which
depends on the amount of traffic and the
number of messages queued up.
LAYERED PROTOCOLS..
Protocols used

 Connection-oriented: X.25, used by the telephone
companies and the European PTTs; a connection
setup is required before data is sent.
 Connection-less: IP (Internet Protocol); no
setup is required.
LAYERED PROTOCOLS..
 The Transport Layer
 the session layer should be able to deliver a
message to the transport layer with the
expectation that it will be delivered without
loss.
 the transport layer breaks it into pieces small
enough for each to fit in a single packet, assigns
each one a sequence number, and then sends
them all.
 Protocols - X.25 (packets arrive sequentially) or
IP (packets may arrive out of order)
 Transport layer puts all messages in order
LAYERED PROTOCOLS..
 International Standards Organization (ISO)
protocols
TP0 to TP4 - differences relate to error handling
and the ability to send several transport
connections over a single X.25 connection

 Department of Defense (DoD) network model
TCP(connection Oriented)
UDP(connection less)
LAYERED PROTOCOLS..
 The Session Layer
 enhanced version of the transport layer.
 It provides dialog control and it provides
synchronization facilities.
 The latter are useful to allow users to insert
checkpoints into long transfers, so that in the
event of a crash it is only necessary to go back to
the last checkpoint, rather than all the way back
to the beginning
LAYERED PROTOCOLS..
 The Presentation Layer
 is concerned with the meaning of the bits.

 it is possible to define records and then have the


sender notify the receiver that a message
contains a particular record in a certain format.

 machines with different internal representations


can communicate easily
LAYERED PROTOCOLS..
 The Application Layer

 a collection of miscellaneous protocols for


common activities such as electronic mail, file
transfer, and connecting remote terminals to
computers over a network.

 X.400, electronic mail protocol, X.500 directory


server
ASYNCHRONOUS TRANSFER MODE
NETWORKS
 voice traffic is smooth, needing a low but
constant bandwidth, whereas data traffic is
bursty, needing no bandwidth at some moments and
a great deal at others.

 a hybrid form (circuit switching+packet


switching) using fixed-size blocks over virtual
circuits was chosen as a compromise that gave
reasonably good performance for both types of
traffic. - ATM (Asynchronous Transfer Mode)
 a sender first establishes a connection (i.e., a virtual
circuit) to the receiver or receivers.

 During connection establishment, a route is


determined from the sender to the receiver(s)

 routing information is stored in the switches along the


way.

 Using this connection, packets can be sent, but they


are chopped up by the hardware into small, fixed-sized
units called cells. The cells for a given virtual circuit
all follow the path stored in the switches.

 When the connection is no longer needed, it is released


and the routing information purged from the switches.
ASYNCHRONOUS TRANSFER MODE
NETWORKS ..
 Advantages
 single network can now be used to transport an
arbitrary mix of voice, data, broadcast television,
videotapes, radio, and other information
efficiently, replacing what were previously separate
networks (telephone, X.25, cable TV, etc.).
 cost saving and simple.
 Cell switching helps in multicasting (one cell going
to many destinations)(broadcast tv)
 Fixed-size cells allow rapid switching.
 Eliminate delay of small packets.
ASYNCHRONOUS TRANSFER MODE
NETWORKS ..

The ATM reference model layers:
 ATM adaptation layer - handles transport
connections; breaks packets into cells and
reassembles them at the other end.
 ATM layer - deals with cells and cell
transport, including routing.
 Physical layer - same as in the OSI model.
ASYNCHRONOUS TRANSFER MODE
NETWORKS ..
 The ATM Physical Layer.

 stream of cells are sent onto a wire or fiber.


 The transmission stream is continuous. When
there are no data to be sent, empty cells are
transmitted.

 uses SONET (Synchronous Optical NETwork),


putting its cells into the payload portion of
SONET frames.
ASYNCHRONOUS TRANSFER MODE
NETWORKS ..

SONET is based on the Synchronous Transport Signal 1 (STS-1) frame,
a block of 9 rows by 90 one-byte columns.

To remain compatible with existing low-level carrier and channel
transmission rates, the frame rate is 8,000 frames per second (one
frame every 125 μs):
810 bytes x 8 bits per byte x 8,000 frames per second = 51,840,000 bits per
second = 51.84 Mbps
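The bit-rate arithmetic can be checked directly:

```python
rows, cols = 9, 90              # STS-1 frame: 9 rows x 90 one-byte columns
frame_bytes = rows * cols       # 810 bytes per frame
frames_per_second = 8000        # one frame every 125 microseconds
bits = frame_bytes * 8 * frames_per_second
print(bits / 1e6, "Mbps")       # 51.84 Mbps
```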
ASYNCHRONOUS TRANSFER MODE
NETWORKS ..
 The ATM Layer
ASYNCHRONOUS TRANSFER MODE
NETWORKS ..
 GFC - flow control
 VPI and VCI - identify which path and virtual circuit a cell
belongs to. Routing tables along the way use this
information for routing
 VPI - groups together a collection of virtual circuits for
the same destination and make it possible for a carrier to
reroute all of them without having to examine the VCI
field.
 Payload type - distinguishes data cells from control cells,
and further identifies several kinds of control cells.
 CLP - can be used to mark some cells as less important
than others, so if congestion occurs, the least important
ones will be the ones dropped.
 HEC - a 1-byte checksum over the header (but not the data).
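A sketch of packing and unpacking these header fields, assuming the standard UNI bit widths (GFC 4 bits, VPI 8, VCI 16, PT 3, CLP 1, plus the 1-byte header checksum); the checksum byte is carried as given rather than computed here, and the function names are illustrative:

```python
def pack_atm_header(gfc, vpi, vci, pt, clp, hec):
    """Pack the UNI cell header fields into 5 bytes, big-endian:
    4 bits GFC, 8 bits VPI, 16 bits VCI, 3 bits PT, 1 bit CLP,
    then the 1-byte header checksum."""
    word = (gfc << 28) | (vpi << 20) | (vci << 4) | (pt << 1) | clp
    return word.to_bytes(4, "big") + bytes([hec])

def unpack_atm_header(header):
    """Recover the six fields from a 5-byte header."""
    word = int.from_bytes(header[:4], "big")
    return (word >> 28,            # GFC
            (word >> 20) & 0xFF,   # VPI
            (word >> 4) & 0xFFFF,  # VCI
            (word >> 1) & 0x7,     # payload type
            word & 0x1,            # CLP
            header[4])             # header checksum byte

fields = (0, 5, 1234, 0, 1, 0x55)
assert unpack_atm_header(pack_atm_header(*fields)) == fields
```

A switch would index its routing table on the unpacked VPI (and possibly VCI) values, which is why they sit in fixed positions in every cell.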
ASYNCHRONOUS TRANSFER MODE
NETWORKS ..
 The ATM Adaptation Layer
 Does disassembly/reassembly of packets into
cells and vice versa
ASYNCHRONOUS TRANSFER MODE
NETWORKS ..

AAL5 packet (SEAL)


ASYNCHRONOUS TRANSFER MODE
NETWORKS...
 SEAL (Simple and Efficient Adaptation Layer)
 It uses only one bit in the ATM header, one of the bits in the
Payload type field. This bit is normally 0, but is set to 1 in the
last cell of a packet.
 The last cell contains a trailer in the final 8 bytes.
 In most cases there will be some padding (with zeros) between
the end of the packet and the start of the trailer.
 With SEAL, the destination just assembles incoming cells for
each virtual circuit until it finds one with the end-of-packet bit
set.
 Then it extracts and processes the trailer.
 The trailer has four fields. The first two are each 1 byte long
and are not used. Then comes a 2-byte field giving the packet
length, and a 4-byte checksum over the packet, padding, and
trailer.
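The reassembly rule can be sketched as follows; the cell builder, the simple additive checksum (standing in for AAL5's CRC-32), and all function names are illustrative:

```python
def seal_cells(packet):
    """Split a packet into (end_of_packet_bit, 48-byte cell payload)
    pairs, SEAL-style: zero padding, then an 8-byte trailer made of
    2 unused bytes, a 2-byte length, and a 4-byte checksum (here a
    simplified sum, not the real AAL5 checksum)."""
    pad = (-(len(packet) + 8)) % 48
    body = packet + bytes(pad) + bytes(2) + len(packet).to_bytes(2, "big")
    body += (sum(body) % (1 << 32)).to_bytes(4, "big")
    cells = [body[i:i + 48] for i in range(0, len(body), 48)]
    return [(i == len(cells) - 1, c) for i, c in enumerate(cells)]

def seal_reassemble(cells):
    """Collect cells until the end-of-packet bit is set, then check
    the trailer, and strip the padding using the length field."""
    data = b""
    for end_of_packet, payload in cells:
        data += payload
        if end_of_packet:
            break
    length = int.from_bytes(data[-6:-4], "big")
    received = int.from_bytes(data[-4:], "big")
    if sum(data[:-4]) % (1 << 32) != received:
        raise ValueError("bad checksum")
    return data[:length]

packet = b"A request message for the server"
assert seal_reassemble(seal_cells(packet)) == packet
```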
ASYNCHRONOUS TRANSFER MODE
NETWORKS...
 ATM Switching
ASYNCHRONOUS TRANSFER MODE
NETWORKS...
 When a cell arrives, its VPI and VCI fields are
examined.
 the cell is routed to the correct output port.
 when two cells arrive at the same time on
different input lines and need to go to the
same output port.
 throwing one of them away is allowed by the
standard
 An alternative scheme is to pick one of them
at random and forward it, holding the other
cell until later.
ASYNCHRONOUS TRANSFER MODE
NETWORKS...
 If two ports each have streams of cells for the same
destination, substantial input queues will build up,
blocking other cells behind them that want to go to
output ports that are free. This problem is known as
head-of-line blocking.

 Solutions
 Cells are copied into a queue associated with the
output buffer and lets it wait there.
 Can have pool of buffers that can be used for both
input and output buffering.
 buffer on the input side, but allow the second or third
cell in line to be switched, even if the first one cannot
be.
ASYNCHRONOUS TRANSFER MODE
NETWORKS...
 Some Implications of ATM for Distributed
Systems

 For high-speed wide-area distributed systems, new


protocols and system architectures will be needed to deal
with the latency in many applications, especially
interactive ones.

 Flow control and congestion control.

 switches are permitted to drop cells if they get congested.


Dropping even one cell probably means waiting for a
timeout and having the whole packet be retransmitted. For
services that need a uniform rate, such as playing music,
this could be a problem.
THE CLIENT-SERVER MODEL
 Motivation
 wide-area distributed system can probably
use the OSI or TCP/IP protocols without any
loss in (the already meager) performance.

 for a LAN-based distributed system, the


protocol overhead is often substantial. So
much CPU time is wasted running protocols
that the effective throughput over the LAN is
often only a fraction of what the LAN can do.
THE CLIENT-SERVER MODEL..
 Clients and Servers.
 Servers offer services; the users of those services are the clients.
 Communication is usually based on a simple, connectionless
request/reply protocol.
THE CLIENT-SERVER MODEL..
 Advantages
 simplicity - No connection established before
use or torn down.
 efficiency - The protocol stack is shorter and
thus more efficient.

In the request/reply protocol stack, the lower layers get the
packets from client to server and back, while the request/reply
layer defines the set of legal requests and the set of legal
replies to these requests.
THE CLIENT-SERVER MODEL..
 Example
THE CLIENT-SERVER MODEL..
THE CLIENT-SERVER MODEL..
THE CLIENT-SERVER MODEL..
ISSUES
 Addressing
 we have the following methods for addressing processes:
 1. Hardwire machine.number into client code - not transparent
 2. Let processes pick random addresses; locate them by
broadcasting. - generates extra load on the system
 3. Put ASCII server names in clients; look them up at run
time. - requires a centralized component, the name server

(a) Machine.process addressing. (b) Process addressing with broadcasting. (c) Address lookup via a name server.
THE CLIENT-SERVER MODEL..
ISSUES
 Blocking versus Nonblocking Primitives
 A blocking send primitive: the client is blocked
until the message has been sent; with a blocking
receive, until a message has been received.
 A nonblocking send primitive: the client is
blocked only until the message has been copied
into a message buffer, after which it runs in
parallel with the transmission.
THE CLIENT-SERVER MODEL..
ISSUES
 Disadvantage of nonblocking send - the sender cannot safely modify the
message buffer until the message has actually been sent; reusing it
too early can overwrite data still being transmitted.
 Since the sending process has no idea when the transmission is done,
it is not safe to reuse the buffer.

 Solutions
 1) kernel copy the message to an internal kernel buffer and then
allow the process to continue.
 Overhead(performance, extra copy) of copying message to kernel
buffer.

 2) interrupt the sender when the message has been sent to inform
it that the buffer is once again available.
 user-level interrupts make programming tricky.
THE CLIENT-SERVER MODEL..
ISSUES
 Buffered versus Unbuffered Primitives
 Unbuffered primitives
 receive(addr, &m) - tells the kernel of the machine on which it is running that the
calling process is listening to address addr and is prepared to receive one message
sent to that address and stored in a single message buffer, pointed to by m.
 problem
 works fine as long as the server calls receive (tells the server's kernel which address
the server is using to put the incoming message ) before the client calls send.

Unbuffered message passing.


THE CLIENT-SERVER MODEL..
ISSUES
 Solution
 discard the message, let the client time out, and hope
the server has called receive before the client
retransmits.

 have the receiving kernel keep incoming messages around


for a little while, just in case an appropriate receive is
done shortly. Whenever an "unwanted" message arrives, a
timer is started. If the timer expires before a suitable
receive happens, the message is discarded.

 this method reduces the chance that a message will have


to be thrown away, it introduces the problem of storing
and managing prematurely arriving messages.
THE CLIENT-SERVER MODEL..
ISSUES
 Buffered Primitives
 way of dealing with the buffer management is to define a
new data structure called a mailbox.
 A process that is interested in receiving messages tells the
kernel to create a mailbox for it, and specifies an address to
look for in network packets. Henceforth, all incoming
messages with that address are put in the mailbox.
 receive now just removes one message from the mailbox

Buffered message passing.
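A toy sketch of the mailbox idea (all names are hypothetical; a real kernel would also bound the mailbox and block or reject sends when it fills):

```python
import queue

class Kernel:
    """Toy model of kernel-managed mailboxes for buffered receives."""
    def __init__(self):
        self.mailboxes = {}

    def create_mailbox(self, addr):
        # The process tells the kernel to buffer messages sent to addr.
        self.mailboxes[addr] = queue.Queue()

    def send(self, addr, message):
        # Incoming messages with this address go into the mailbox,
        # even when no receive is currently pending.
        self.mailboxes[addr].put(message)

    def receive(self, addr):
        # receive just removes one message from the mailbox.
        return self.mailboxes[addr].get_nowait()

kernel = Kernel()
kernel.create_mailbox("fileserver")
kernel.send("fileserver", b"read request")   # arrives before any receive
assert kernel.receive("fileserver") == b"read request"
```

The contrast with the unbuffered scheme is that a message arriving before the matching receive is queued rather than discarded.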


THE CLIENT-SERVER MODEL..
ISSUES
 Reliable versus Unreliable Primitives.
 Reliable model should not lose messages
3 Approaches

1) redefine the semantics of send to be unreliable. (try its best)

2) the kernel on the receiving machine sends an acknowledgement
back to the kernel on the sending machine;
only then does the sending kernel unblock the user (client).

Individually acknowledged messages.


THE CLIENT-SERVER MODEL..
ISSUES
 Reliable versus Unreliable Primitives.
3) the client is blocked after sending a message. The server's kernel
sends the reply itself as the acknowledgement. Thus the sender
remains blocked until the reply comes in.

 If it takes too long, the sending kernel can resend the request to
guard against the possibility of a lost message.
 an acknowledgement from the client's kernel to the server's kernel is
sometimes used.

Reply being used as the acknowledgement of the request.


THE CLIENT-SERVER MODEL..

 Implementing the Client-Server Model


THE CLIENT-SERVER MODEL..
ISSUES
 Implementing the Client-Server Model

Some examples of packet exchanges for client-server communication.


REMOTE PROCEDURE CALL
 Motivation
 the basic paradigm around which all
communication is built in client server is
input/output.

 No message passing or I/O at all is visible to


the programmer in RPC.
REMOTE PROCEDURE CALL..
 conventional (single machine) procedure call
count = read(fd, buf, nbytes);
REMOTE PROCEDURE CALL..
REMOTE PROCEDURE CALL..
a remote procedure call occurs in the following steps:
1. The client procedure calls the client stub in the normal
way.
2. The client stub builds a message and traps to the kernel.
3. The kernel sends the message to the remote kernel.
4. The remote kernel gives the message to the server stub.
5. The server stub unpacks the parameters and calls the
server.
6. The server does the work and returns the result to the stub.
7. The server stub packs it in a message and traps to the
kernel.
8. The remote kernel sends the message to the client's kernel.
9. The client's kernel gives the message to the client stub.
10. The stub unpacks the result and returns to the client.
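The ten steps can be mimicked in a single process, with an ordinary function call standing in for the two kernels and the network (all names and the pickle-based marshaling are hypothetical, chosen only to make the stub roles visible):

```python
import pickle

def server_add(a, b):                        # step 6: the server does the work
    return a + b

def server_stub(message):
    name, args = pickle.loads(message)       # step 5: unpack the parameters
    result = {"add": server_add}[name](*args)
    return pickle.dumps(result)              # step 7: pack result in a message

def client_stub_add(a, b):
    message = pickle.dumps(("add", (a, b)))  # step 2: build a message
    reply = server_stub(message)             # steps 3-4, 8-9: kernels move it
    return pickle.loads(reply)               # step 10: unpack the result

# Step 1: the client calls the stub like a normal local procedure.
assert client_stub_add(2, 3) == 5
```

The point of the exercise is that the caller never sees a message: replacing `server_stub(message)` with a real network round trip changes nothing in the client code.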
REMOTE PROCEDURE CALL..
 Parameter Passing
 The function of the client stub is to take its parameters, pack them into a
message, and send it to the server stub.
 Packing parameters into a message is called parameter marshaling.
REMOTE PROCEDURE CALL..
 Parameter Passing
 in a large distributed system, it is common that multiple machine types
are present. Each machine often has its own representation for
numbers, characters, and other data items.
 how should information be represented in the messages?

 Solution
 to devise a network standard or canonical form for integers, characters,
booleans, floating-point numbers, and so on, and require all senders to
convert their internal representation to this form while marshaling.
 Sometimes unnecessary conversions required.

 Solution
 the client uses its own native format and indicates the format in the
first byte of the message.
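A sketch of the second solution: the first byte of the message tags the sender's native byte order, and the receiver converts only when the formats differ (the tag values and function names are illustrative):

```python
import struct

LITTLE, BIG = 0, 1   # hypothetical format tags for the first byte

def marshal_int(value, native_big_endian):
    """Sender marshals in its own native format and tags the message."""
    tag = BIG if native_big_endian else LITTLE
    fmt = ">i" if native_big_endian else "<i"
    return bytes([tag]) + struct.pack(fmt, value)

def unmarshal_int(message):
    """Receiver reads the tag and interprets the bytes accordingly."""
    fmt = ">i" if message[0] == BIG else "<i"
    return struct.unpack(fmt, message[1:])[0]

# A little-endian sender and a big-endian sender yield the same value.
assert unmarshal_int(marshal_int(1000, False)) == 1000
assert unmarshal_int(marshal_int(1000, True)) == 1000
```

Compared to a canonical network form, this avoids any conversion when both machines happen to share a representation.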
REMOTE PROCEDURE CALL..
 Parameter Passing
 How are pointers passed?.

 call-by-reference has been replaced by


copy/restore.
 If the stubs know whether the buffer is an input
parameter or an output parameter to the
server, one of the copies can be eliminated,
since they need not be copied back to client.
REMOTE PROCEDURE CALL..
 Dynamic binding
 how the client locates the server?

 One method is just to hardwire the network address


of the server into the client.

 extremely inflexible if the server moves or if the


server is replicated or if the interface changes.

 Use dynamic binding to match up clients and


servers.
REMOTE PROCEDURE CALL..
 Dynamic binding
 1) server's formal specification.
REMOTE PROCEDURE CALL..
 Dynamic binding
 formal specification is given as input to the stub generator,
 It produces both the client stub and the server stub.
 Both are then put into the appropriate libraries.
 When a user (client) program calls any of the procedures
defined by this specification, the corresponding client stub
procedure is linked into its binary.
 when the server is compiled, the server stubs are linked
with it too.
 When the server begins executing, a call to initialize,
placed outside the main loop, exports the server interface:
the server sends a message to a program called a binder
to make its existence known (registering with the binder).
REMOTE PROCEDURE CALL..
 Dynamic binding
 When the client calls one of the remote procedures for the first
time, say, read, the client stub sees that it is not yet bound to a
server, so it sends a message to the binder asking to import
version 3.1 of the file server interface.
 The binder checks to see if one or more servers have already
exported an interface with this name and version number. If no
currently running server is willing to support this interface, the
read call fails.
 On the other hand, if a suitable server exists, the binder gives
its handle and unique identifier to the client stub. The client
stub uses the handle as the address to send the request message
to. The message contains the parameters and the unique
identifier, which the server's kernel uses to direct the incoming
message to the correct server in the event that several servers
are running on that machine.
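A minimal sketch of the binder's export/import bookkeeping (the class, the handle format, and the identifiers are hypothetical):

```python
class Binder:
    """Toy binder: servers export (interface, version) -> (handle, id);
    clients import a matching interface at first call."""
    def __init__(self):
        self.table = {}

    def export(self, interface, version, handle, unique_id):
        # A starting server registers its interface here.
        self.table[(interface, version)] = (handle, unique_id)

    def lookup(self, interface, version):
        # If no running server exports this interface, the import
        # (and hence the client's first RPC) fails.
        if (interface, version) not in self.table:
            raise LookupError(f"no server exports {interface} v{version}")
        return self.table[(interface, version)]

binder = Binder()
binder.export("file-server", "3.1", handle="machine4:port9", unique_id=17)
assert binder.lookup("file-server", "3.1") == ("machine4:port9", 17)
```

The handle is what the client stub uses as the destination address; the unique identifier lets the server's kernel demultiplex among several servers on one machine.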
REMOTE PROCEDURE CALL..
REMOTE PROCEDURE CALL..
 RPC Semantics in the Presence of Failures
 failures that can occur in RPC systems

1.The client is unable to locate the server.


2.The request message from the client to the
server is lost.
3.The reply message from the server to the
client is lost.
4.The server crashes after receiving a request.
5.The client crashes after sending a request
REMOTE PROCEDURE CALL..
 RPC Semantics in the Presence of Failures
 Client Cannot Locate the Server

 Have each procedure return a
value, with the code -1 conventionally used to
indicate failure.
 A global variable, errno, is also assigned a value
indicating the error type. In such a system, adding
a new error type "Cannot locate server" is simple.

 This solution is not general enough: -1 may itself
be a legal return value.


REMOTE PROCEDURE CALL..
 RPC Semantics in the Presence of Failures
 One possible candidate is to have the error raise an
exception.
 In some languages (e.g., Ada), special procedures are
invoked upon specific errors, such as division by zero.
 In C, a signal handler for a new signal, SIGNOSERVER, would
allow it to be handled in the same way as other signals.

 drawbacks
 not every language has exceptions or signals (e.g., Pascal).
 having to write an exception or signal handler destroys
the transparency.
REMOTE PROCEDURE CALL..
 Lost Request Messages.
 kernel start a timer when sending the request. If
the timer expires before a reply or
acknowledgement comes back, the kernel sends the
message again.

 If message was truly lost, the server will not be


able to tell the difference between the
retransmission and the original.
REMOTE PROCEDURE CALL..
 Lost Reply messages
 Rely on timers
 If no reply is received within a reasonable period,
just send the request once more.
 client's kernel is not really sure why there was no
answer.
 some operations can safely be repeated as often as
necessary with no damage being done (idempotent)
eg request for some memory
 Non idempotent – money transfer
REMOTE PROCEDURE CALL..
 Lost Reply messages

 Solution

 try to structure all requests in an idem-potent way.


 have the client's kernel assign each request a
sequence number.
 to have a bit in the message header that is used to
distinguish initial requests from retransmissions.
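A sketch of the sequence-number approach: the server remembers the highest sequence number handled per client and refuses to re-execute a retransmission (class and field names are hypothetical):

```python
class Server:
    """Detect retransmissions via per-client sequence numbers."""
    def __init__(self):
        self.last_seen = {}    # client id -> highest sequence number handled
        self.executions = 0

    def handle(self, client, seq, request):
        if self.last_seen.get(client, -1) >= seq:
            # A retransmission of work already done: do not repeat it.
            return "duplicate ignored"
        self.last_seen[client] = seq
        self.executions += 1
        return f"executed {request}"

server = Server()
server.handle("c1", 0, "transfer $100")
server.handle("c1", 0, "transfer $100")   # client retransmitted on timeout
assert server.executions == 1             # the non-idempotent op ran once
```

This is what makes a non-idempotent operation like a money transfer safe to retransmit after a lost reply.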
REMOTE PROCEDURE CALL..
 Server Crashes

Actions to be taken
(b) The system has to report failure back to the client (e.g., raise an
exception).

(c) The client can just retransmit the request.

The client's kernel cannot differentiate between these cases; all it
knows is that its timer has expired.
REMOTE PROCEDURE CALL..
 Server Crashes
 Solution
 1) Wait until the server reboots (or rebind to a new server) and
try the operation again.
 This technique is called at-least-once semantics and guarantees
that the RPC has been carried out at least one time.

 2) Give up immediately and report back failure. This is called
at-most-once semantics and guarantees that the RPC has been
carried out at most one time.

 3) is to guarantee nothing. When a server crashes, the client gets


no help and no promises. The RPC may have been carried out
anywhere from 0 to a large number of times
 it is easy to implement.
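The first two semantics can be sketched as client-side wrappers (a hedged Python illustration; a ConnectionError here stands in for a crashed or unreachable server):

```python
def at_least_once(call, retries=5):
    """Keep retrying until the call succeeds: it may execute >= 1 time."""
    for _ in range(retries):
        try:
            return call()
        except ConnectionError:
            continue                  # server crashed or reply lost: retry
    raise ConnectionError("gave up")

def at_most_once(call):
    """Report failure immediately: the call executes 0 or 1 times."""
    try:
        return call()
    except ConnectionError:
        return None                   # caller must handle the failure

attempts = 0
def flaky():
    """Simulated server that crashes on the first two attempts."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("server down")
    return "result"

def always_down():
    raise ConnectionError("server unreachable")

value = at_least_once(flaky)          # succeeds on the third attempt
```

The trade-off on the slide is visible directly: `at_least_once` executed the operation several times, while `at_most_once` would simply have reported failure.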
REMOTE PROCEDURE CALL..
 Client Crashes
 a computation is active and no parent is waiting for
the result. Such an unwanted computation is called
an orphan.

 Result
 they waste CPU cycles.
 They can also lock files or otherwise tie up valuable
resources.
 if the client reboots and does the RPC again, but the
reply from the orphan comes back immediately
afterward, confusion can result.
REMOTE PROCEDURE CALL..
 Client Crashes
 Solutions
 1) extermination
 before a client stub sends an RPC message, it makes a log entry
telling what it is about to do. The log is kept on disk (a medium
that survives crashes).
 After a reboot, the log is checked and the orphan is explicitly
killed off.

 Disadvantage
 the horrendous expense of writing a disk record for every RPC.
 the network may be partitioned, due to a failed gateway,
making it impossible to kill them, even if they can be located.
REMOTE PROCEDURE CALL..
 Client Crashes
 Solutions
 2) reincarnation
 divide time up into sequentially numbered epochs.
When a client reboots, it broadcasts a message to
all machines declaring the start of a new epoch.
 When such a broadcast comes in, all remote
computations are killed.
 if the network is partitioned, some orphans may
survive. However, when they report back, their
replies will contain an obsolete epoch number,
making them easy to detect.
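A minimal sketch of reincarnation (illustrative Python, invented names): each machine kills its remote computations when a new epoch is announced, and any orphan that survives a partition betrays itself with a stale epoch number.

```python
class Machine:
    def __init__(self):
        self.epoch = 0
        self.remote_computations = []     # work started on behalf of clients

    def start_remote(self, owner):
        self.remote_computations.append(owner)

    def on_epoch_broadcast(self, new_epoch):
        self.epoch = new_epoch
        self.remote_computations.clear()  # all remote computations are killed

    def accept_reply(self, reply_epoch):
        # an orphan in a partitioned segment replies with an obsolete epoch
        return reply_epoch == self.epoch

m = Machine()
m.start_remote("client-1")
m.on_epoch_broadcast(1)   # client-1 rebooted and declared a new epoch
```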
REMOTE PROCEDURE CALL..
 Client Crashes
 Solutions
 3) gentle reincarnation

 When an epoch broadcast comes in, each machine


checks to see if it has any remote computations,
and if so, tries to locate their owner.

 Only if the owner cannot be found is the


computation killed.
REMOTE PROCEDURE CALL..
 Client Crashes
 Solutions
 4) expiration
 each RPC is given a standard amount of time, T, to
do the job. If it cannot finish, it must explicitly ask
for another quantum, which is a nuisance.

 if after a crash the server waits a time T before


rebooting, all orphans are sure to be gone. The
problem to be solved here is choosing a reasonable
value of T in the face of RPCs with wildly differing
requirements.
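Expiration behaves like a lease (a Python sketch under that assumption; the names are invented): each RPC gets a quantum T and must renew before it runs out, so after a crash no orphan can outlive T.

```python
class Lease:
    def __init__(self, now, T):
        self.T = T
        self.expires = now + T

    def renew(self, now):
        # the running RPC must explicitly ask for another quantum
        self.expires = now + self.T

    def alive(self, now):
        return now < self.expires

lease = Lease(now=0, T=10)   # the RPC's initial quantum
```

If a rebooting client waits at least T before issuing new RPCs, every un-renewed orphan has already expired.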
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 I) RPC Protocols.
 1) Use a connection-oriented protocol or a connectionless protocol?
 Advantages of connection.
 communication becomes much easier.
 When a kernel sends a message, it does not have to worry
about it getting lost, nor does it have to deal with
acknowledgements. All that is handled at a lower level, by
the software that supports the connection.

 Disadvantage
 performance loss.
 All that extra software gets in the way. Besides, the main
advantage (no lost packets) is hardly needed on a LAN.
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 RPC Protocols.

 2) Use a standard general-purpose protocol or one
specifically designed for RPC?

 Since there are no standards in this area, using a
custom RPC protocol often means designing your own.
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 RPC Protocols.

 Some distributed systems use IP (or UDP, which is


built on IP) as the basic protocol.
 1.The protocol is already designed, saving
considerable work.
 2. Many implementations are available, again saving
work.
 3.These packets can be sent and received by nearly
all UNIX systems.
 4. IP and UDP packets are supported by many
existing networks.
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 RPC Protocols.

 Disadvantages of IP – performance.
 IP was not designed as an end-user protocol.
 Packet switching is not required on a LAN.
 Too much overhead in the packet header.

 3) packet and message length.


 It is important that the protocol and network allow
large transmissions.
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 II) Acknowledgements
 Should individual packets be acknowledged or not?
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 II) Acknowledgements
Stop-and-wait protocol             Blast protocol

1. Error control
easy to implement                  requires more administration

2. Flow control
overrun errors are impossible      receiver overrun is a possibility
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 III) Critical Path
 The sequence of instructions that is executed on
every RPC.
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 III) Critical Path
 As per Schroeder and Burrows (1990)

 Considering the following assumptions


 the Firefly is a multiprocessor with five VAX CPUs,
and the transport protocol is UDP.
 the kernel and user share the same address space,
eliminating the need for context switches and for
copying between kernel and user spaces .
 the entire RPC system has been carefully coded in
assembly language and hand optimized.
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 III) Critical Path
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 IV Copying
 The number of times a message must be copied varies
from one to about eight, depending on the hardware,
software, and type of call.
 Best case
 the network chip can DMA the message directly out of the
client stub's address space onto the network (copy 1),
depositing it in the server kernel's memory in real time

 Then the kernel inspects the packet and maps the page
containing it into the server's address space. If this type
of mapping is not possible, the kernel copies the packet
to the server stub (copy 2).
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 IV Copying
 Worst case
 the kernel copies the message from the client stub into a kernel buffer for
subsequent transmission, either because it is not convenient to transmit
directly from user space or the network is currently busy (copy 1).

 the kernel copies the message, in software, to a hardware buffer on the


network interface board (copy 2).

 At this point, the hardware is started, causing the packet to be moved over the
network to the interface board on the destination machine (copy 3).

 When the packet-arrived interrupt occurs on the server's machine, the kernel
copies it to a kernel buffer, probably because it cannot tell where to put it
until it has examined it, which is not possible until it has extracted it from the
hardware buffer (copy 4).

 the message has to be copied to the server stub (copy 5).


REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 V Timer Management
 most protocols set a timer whenever a message is
sent and an answer (reply or acknowledgement) is
expected.

 requires building a data structure (sorted on time)


specifying when the timer is to expire and what is
to be done when that happens.

 When an acknowledgement or reply arrives before


the timer expires, the timeout entry must be
located and removed from the list.
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 V Timer Management
 very few timers actually expire, so most of the work of
entering and removing a timer from the list is wasted
effort.

 timers need not be especially accurate. The timeout


value chosen is usually a wild guess in the first place ("a
few seconds sounds about right"). Besides, using a poor
value does not affect the correctness of the protocol,
only the performance. Too low a value will cause timers
to expire too often, resulting in unnecessary
retransmissions. Too high a value will cause a needlessly
long delay in the event that a packet is actually lost.
REMOTE PROCEDURE CALL..
IMPLEMENTATION ISSUES
 V Timer Management

The kernel scans the entire process table, checking each timer value
against the current time. Any nonzero value that is less than or equal to
the current time corresponds to an expired timer, which is then processed
and reset.
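That sweep can be sketched in a few lines (illustrative Python; the process table is modeled as a simple dict):

```python
def scan_timers(process_table, now):
    """One sweep: fire every nonzero timer whose value is <= now,
    then reset it, instead of maintaining a sorted timer list."""
    expired = []
    for pid, timer in process_table.items():
        if timer != 0 and timer <= now:
            expired.append(pid)
            process_table[pid] = 0    # processed and reset
    return expired

table = {1: 0, 2: 5, 3: 12}   # pid -> timeout (0 means no timer set)
fired = scan_timers(table, now=10)
```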
REMOTE PROCEDURE CALL..

 Problem Areas
 allowing local procedures unconstrained access to remote global
variables, and vice versa, cannot be implemented, yet prohibiting
this access violates the transparency principle (that programs
should not have to act differently due to RPC).

 In a strongly-typed language, like Pascal, the compiler, and thus


the stub procedure, knows everything there is to know about all
the parameters. This knowledge allows the stub to marshal the
parameters without difficulty. In C, however, it is perfectly legal
to write a procedure that computes the inner product of two
vectors (arrays), without specifying how large either one is. Each
could be terminated by a special value known only to the calling
and called procedure. Under these circumstances, it is essentially
impossible for the client stub to marshal the parameters: it has no
way of determining how large they are.
GROUP COMMUNICATION.
 RPC cannot handle communication from one
sender to many receivers, other than by
performing separate RPCs with each one.

 A group is a collection of processes that act


together in some system or user-specified way
GROUP COMMUNICATION.
 The purpose of introducing groups is to allow
processes to deal with collections of processes
as a single abstraction.

 How group communication is implemented


depends to a large extent on the hardware.
 multicasting.
special network address to which multiple
machines can listen. When a packet is sent to
one of these addresses, it is automatically
delivered to all machines listening to the
address.
GROUP COMMUNICATION.
 Broadcasting
packets containing a certain address (e.g., 0)
are delivered to all machines.

 group communication can still be implemented


by having the sender transmit separate
packets to each of the members of the group.
For a group with n members, n packets are
required, instead of one packet when either
multicasting or broadcasting is used.
GROUP COMMUNICATION.
 Design Issues

Group communication has many of the same


design possibilities as regular message passing.
Closed Groups versus Open Groups
GROUP COMMUNICATION.
DESIGN ISSUES

depending on who can send to whom.


Closed Groups versus Open Groups
GROUP COMMUNICATION.
DESIGN ISSUES

 Closed groups are typically used for parallel


processing.
 They have their own goal and do not interact
with the outside world.

 it is important that processes that are not
members (clients) can send to the group (an open
group) to support replicated servers.
Peer Groups versus Hierarchical Groups
GROUP COMMUNICATION.
DESIGN ISSUES

distinction based on the internal structure of the


group.

Peer groups
 all the processes are equal.
 All decisions are made collectively.
Peer Groups versus Hierarchical Groups
GROUP COMMUNICATION.
DESIGN ISSUES

Advantage
 symmetric and has no single point of failure

Disadvantage
 decision making is more complicated. To
decide anything, a vote has to be taken,
incurring some delay and overhead.
Peer Groups versus Hierarchical Groups
GROUP COMMUNICATION.
DESIGN ISSUES

Hierarchical Groups
 one process is the coordinator and all the
others are workers.
 The coordinator then decides which worker is
best suited to carry it out.
Peer Groups versus Hierarchical Groups
GROUP COMMUNICATION.
DESIGN ISSUES

Advantage
 as long as it is running, it can make decisions
without bothering everyone else.

Disadvantage
 Loss of the coordinator brings the entire group

to a grinding halt
Group Membership GROUP COMMUNICATION.
DESIGN ISSUES

 I)The group server maintains a complete data


base of all the groups and their exact
membership.

 advantage
 is straightforward, efficient, and easy to
implement.

 Disadvantage
 a single point of failure at the group server.
Group Membership GROUP COMMUNICATION.
DESIGN ISSUES

 II) manage group membership in a distributed way. In an open


group, an outsider can send a message to all group members
announcing its presence.

 In a closed group, something similar is needed (in effect, even


closed groups have to be open with respect to joining). To
leave a group, a member just sends a goodbye message to
everyone.

 Issues
 1) if a member crashes, it leaves the group. The other members
have to discover this experimentally by noticing that the
crashed member no longer responds to anything.

 2)leaving and joining have to be synchronous with messages


being sent.
Group Addressing GROUP COMMUNICATION.

I)
DESIGN ISSUES
Group Addressing GROUP COMMUNICATION.
DESIGN ISSUES

II) Sender provides an explicit list of all


destinations (e.g., IP addresses).

 it is not transparent. Furthermore, whenever


group membership changes, the user processes
must update their membership lists.
Group Addressing GROUP COMMUNICATION.
DESIGN ISSUES

III) predicate addressing


 each message is sent to all members of the group

 Each message contains a predicate (Boolean


expression- machine number, its local variables,
or other factors) to be evaluated.

 If the predicate evaluates to TRUE, the message


is accepted. If it evaluates to FALSE, the message
is discarded.
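Predicate addressing can be sketched like this (a Python illustration; in a real system the predicate would be a small expression carried in the message header, not a closure):

```python
def accept(message, local_state):
    """Evaluate the predicate carried by the message against the
    receiver's local state; TRUE means accept, FALSE means discard."""
    return message["predicate"](local_state)

# Hypothetical message: only machines with enough free memory accept it.
msg = {"body": "take over load",
       "predicate": lambda s: s["free_mem_mb"] > 64}

accept(msg, {"free_mem_mb": 128})   # this machine accepts the message
accept(msg, {"free_mem_mb": 16})    # this one silently discards it
```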
Send and Receive Primitives GROUP COMMUNICATION.
DESIGN ISSUES

 To send a message, one of the parameters of send


indicates the destination.
 If it is a process address, a single message is sent to
that one process.
 If it is a group address (or a pointer to a list of
destinations), a message is sent to all members of
the group.
 A second parameter to send points to the message
 The call can be buffered or unbuffered, blocking or
nonblocking, reliable or not reliable, for both the
point-to-point and group cases
Send and Receive PrimitivesGROUP COMMUNICATION.
DESIGN ISSUES

 receive indicates a willingness to accept a


message, and possibly blocks until one is available.
 If the two forms of communication are merged,
receive completes when either a point-to-point
message or a group message arrives.

 some systems introduce new library procedures,


say, group_send and group_receive, so a process
can indicate whether it wants a point-to-point or a
group message.
Atomicity or atomic broadcast
GROUP COMMUNICATION.
DESIGN ISSUES

 all-or-nothing delivery.

 system should guarantee that every message is


delivered to all the members of the group, or if
that is not possible, that it is not delivered to
any, and that failure is reported back to the
sender so it can take appropriate action to
recover.

 receiver overrun – handled by acknowledgement


Atomicity or atomic broadcast
 Possible algorithm
GROUP COMMUNICATION.
DESIGN ISSUES

 The sender starts out by sending a message to all members of the group.

 Timers are set and retransmissions sent where necessary.

 When a process receives a message, if it has not yet seen this particular
message, it, too, sends the message to all members of the group (again
with timers and retransmissions if necessary).

 If it has already seen the message, this step is not necessary and the
message is discarded.

 No matter how many machines crash or how many packets are lost,
eventually all the surviving processes will get the message
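The flooding algorithm above can be simulated in a few lines (a Python sketch; `members` is the full group, and crashed processes simply never forward):

```python
def atomic_broadcast(sender, alive, members):
    """Every live process that sees the message for the first time
    resends it to the whole group, so all surviving processes get it."""
    seen = set()
    queue = [sender]
    while queue:
        p = queue.pop()
        if p not in alive or p in seen:
            continue                # crashed, or has already forwarded
        seen.add(p)
        queue.extend(members)       # p resends the message to every member
    return seen

group = {"A", "B", "C", "D"}
delivered = atomic_broadcast("A", alive={"A", "B", "C"}, members=group)
# D crashed; all surviving members still receive the message.
```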
Message Ordering GROUP COMMUNICATION.
DESIGN ISSUES

If processes 0 and 4 are both trying to update the same record in a data
base, 1 and 3 end up with different final values due to the order of
messages sent.
Message Ordering GROUP COMMUNICATION.
DESIGN ISSUES

 The best guarantee is to have all messages delivered


instantaneously and in the order in which they were
sent.

 Absolute time ordering is not always easy to


implement.

 consistent time ordering, in which if two messages,


say A and B, are sent close together in time, the
system picks one of them as being "first" and delivers
it to all group members, followed by the other.
Message ordering in Overlapping Groups
GROUP COMMUNICATION.
DESIGN ISSUES

Even with global time ordering within each group,


there is not necessarily any coordination among
multiple groups.
Scalability GROUP COMMUNICATION.
DESIGN ISSUES

 sophisticated algorithm involving keeping track of previous packets is


required to avoid exponential growth in the number of packets
multicast.

 some methods of group communication take advantage of the fact that

only one packet can be on a LAN at any instant. With gateways and
multiple networks, it is possible for two packets to be "on the wire"
simultaneously.

 some algorithms may not scale well due to their computational


complexity, their use of centralized components, or other factors.
Group Communication in ISIS GROUP COMMUNICATION.

 developed at Cornell (Birman, 1993; Birman and Joseph,


1987a, 1987b; and Birman and Van Renesse, 1994).

 is a toolkit for building distributed applications.

 widely described in the literature and has been used for


numerous real applications.

 The key idea in ISIS is synchrony and the key


communication primitives are different forms of atomic
broadcast.
Group Communication in ISIS GROUP COMMUNICATION.

 A synchronous system is one in which events happen strictly


sequentially.
 A loosely synchronous system is one in which events
take a finite amount of time but all events appear in the same
order to all parties.
Group Communication in ISIS GROUP COMMUNICATION.

 two events are said to be causally related if the nature


or behavior of the second one might have been
influenced in any way by the first one
 e.g., if A -> B -> C, then B and C are causally related.

 Two events that are unrelated are said to be concurrent,
e.g., A->B and C->D are concurrent.

 virtual synchrony really means is that if two messages are


causally related, all processes must receive them in the
same (correct) order. If, however, they are concurrent,
no guarantees are made.
Communication Primitives in ISIS GROUP COMMUNICATION.

 ABCAST provides loosely synchronous communication and is used for


transmitting data to the members of a group.

 CBCAST provides virtually synchronous communication and is also used


for sending data.

 GBCAST is somewhat like ABCAST, except that it is used for managing


group membership rather than for sending ordinary data.
Communication Primitives in ISIS GROUP COMMUNICATION.

 ABCAST used a form of two-phase commit protocol.


 The sender, A, assigned a timestamp (actually just a sequence number)
to the message and sent it to all the group members (by explicitly
naming them all).
 Each one picked its own timestamp, larger than any other time-stamp
number it had sent or received, and sent it back to A.
 When all of these arrived, A chose the largest one and sent a Commit
message to all the members again containing it.
 Committed messages were delivered to the application programs in
order of the timestamps.
 This protocol guarantees that all messages will be delivered to all
processes in the same order.

 this protocol is complex and expensive.


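The handshake can be sketched as follows (a Python illustration; `member_clocks` plays the role of each member's highest timestamp so far):

```python
def abcast(sender_ts, member_clocks):
    """Sender proposes a timestamp; each member answers with one strictly
    larger than anything it has seen; the sender commits the maximum.
    Delivering committed messages in timestamp order yields the same
    total order at every member."""
    proposals = []
    for i, clock in enumerate(member_clocks):
        proposal = max(clock, sender_ts) + 1
        member_clocks[i] = proposal     # member remembers its new maximum
        proposals.append(proposal)
    return max(proposals)               # the committed timestamp

clocks = [3, 7, 5]                      # per-member highest timestamps
commit_ts = abcast(sender_ts=4, member_clocks=clocks)
```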
Communication Primitives in ISIS
GROUP COMMUNICATION.

The ISIS designers invented the CBCAST primitive,


which guarantees ordered delivery only for messages
that are causally related.
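Causal delivery is commonly implemented with vector timestamps; the following is a simplified delivery test (illustrative Python, not the exact ISIS mechanism):

```python
def can_deliver(msg_vt, sender, local_vt):
    """Deliver a message only if it is the next one expected from its
    sender and everything it causally depends on has been seen."""
    if msg_vt[sender] != local_vt[sender] + 1:
        return False
    return all(msg_vt[k] <= local_vt[k]
               for k in range(len(local_vt)) if k != sender)

# Process 2's view: it has seen nothing yet.
local = [0, 0, 0]
ok = can_deliver([1, 0, 0], sender=0, local_vt=local)    # deliverable now
held = can_deliver([1, 1, 0], sender=1, local_vt=local)  # must wait for 0's msg
```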
