Processes and Processors in Distributed Systems Distributed File Systems
Distributed File Systems
The file service is the specification of what the file system offers to its clients. It describes the primitives
available, what parameters they take, and what actions they perform.
In other words, the file service specifies the file system's interface to the clients.
A file server is a process that runs on some machine and helps implement the file service.
Clients should not know the number, location, or function of the file servers. When a particular service is
called, results should be produced without the client even knowing that the system is distributed.
A distributed file system typically has two components: the file service and the directory service.
A file can have attributes (owner, size, creation date, and access permissions), which are pieces of information about
the file but which are not part of the file itself. The file service usually provides primitives to read and write some of the attributes.
In some distributed systems, the only file operations are CREATE and READ. Once a file has been created, it cannot be changed (it is immutable), which
makes it much easier to support file caching and replication.
The File Service Interface.
With capabilities, each user has a kind of ticket, called a capability, for each object to which it has access.
The capability specifies which kinds of accesses are permitted.
All access control list schemes associate with each file a list of users who may access the file and how.
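The difference between the two protection schemes can be sketched in a few lines of Python. This is only an illustration: the names ACL, Capability, acl_allows, and capability_allows are hypothetical, and a real capability would also have to be protected against forgery, which is omitted here.

```python
from dataclasses import dataclass

# Hypothetical access control list: file name -> {user: allowed operations}.
ACL = {"report.txt": {"alice": {"read", "write"}, "bob": {"read"}}}


def acl_allows(file_name: str, user: str, op: str) -> bool:
    """ACL check: the list attached to the file is consulted."""
    return op in ACL.get(file_name, {}).get(user, set())


@dataclass(frozen=True)
class Capability:
    """A ticket held by the user, naming one object and the permitted accesses."""
    file_name: str
    rights: frozenset


def capability_allows(cap: Capability, file_name: str, op: str) -> bool:
    """Capability check: the ticket itself states what is permitted."""
    return cap.file_name == file_name and op in cap.rights


# Bob holds a ticket that grants read access only.
bob_ticket = Capability("report.txt", frozenset({"read"}))
```

With an access control list the rights are stored with the file; with capabilities they travel with the user.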
In the upload/download model, the READ operation transfers an entire file from one of the file servers to the requesting client.
The WRITE operation transfers an entire file the other way, from client to server. On the client, the files can be stored in memory or on a local disk, as
needed.
File services can be split into two types, depending on whether they support an upload/download model or a remote access model.
Advantages of the upload/download model
conceptual simplicity.
whole file transfer is highly efficient.
Disadvantages of the upload/download model
enough storage must be available on the client to store all the files required.
if only a fraction of a file is needed, moving the entire file is wasteful.
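The trade-off between the two models can be made concrete with a small Python sketch. The FileServer class and its method names (download, upload, read_range) are assumptions made for illustration, not part of any real file service.

```python
class FileServer:
    """Toy in-memory file server offering both styles of interface."""

    def __init__(self):
        self.files = {}  # file name -> bytes

    # Upload/download model: whole files move between server and client.
    def download(self, name: str) -> bytes:
        return self.files[name]

    def upload(self, name: str, data: bytes) -> None:
        self.files[name] = bytes(data)

    # Remote access model: the file stays on the server and the client
    # asks for just the part it needs.
    def read_range(self, name: str, offset: int, count: int) -> bytes:
        return self.files[name][offset:offset + count]


server = FileServer()
server.upload("log.txt", b"0123456789" * 100)

whole = server.download("log.txt")          # 1000 bytes cross the network
part = server.read_range("log.txt", 0, 4)   # only 4 bytes cross the network

# Upload/download style update: edit the local copy, then send it all back.
local_copy = bytearray(whole)
local_copy[0:1] = b"X"
server.upload("log.txt", local_copy)
```

Reading four bytes through download moves the entire file, which is the waste the second disadvantage refers to.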
The directory service defines some alphabet and syntax for composing file (and directory) names.
All distributed systems allow directories to contain subdirectories, to make it possible for users to group related files together,
leading to a tree of directories, often called a hierarchical file system.
The Directory Server Interface
A system in which different clients may have different views of the file system is flexible and straightforward to
implement, but it has the disadvantage of not making the entire system behave like a single old-fashioned
timesharing system.
In a timesharing system, the file system looks the same to any process (all clients have the same view of the file
system).
Naming Transparency.
The principal problem with this form of naming is that it is not fully transparent.
1) location transparency - the path name gives no hint as to where the file (or other object) is located.
A path like /server1/dir1/dir2/x tells everyone that x is located on server 1, but it does not tell where that server is located.
The server is free to move anywhere it wants to in the network without the path name having to be changed.
2) location independence - files can be moved without their names changing.
Location independence is not easy to achieve, but it is a desirable property to have in a distributed system.
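There are three common approaches to file and directory naming in a distributed system:
1) machine + path naming, such as /machine/path or machine:path.
2) mounting remote file systems onto the local file hierarchy.
3) a single name space that looks the same on all machines.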
Files (and other objects) have symbolic names such as prog.c, for use by people, and also internal, binary names for
use by the system itself. Directories provide a mapping between these two naming levels.
A more general naming scheme is to have the binary name indicate both a server and a specific file on that server.
An alternative is a symbolic link: a directory entry that maps onto a (server, file name) string, which can be looked up
on the server named to find the binary name.
Two-Level Naming.
Another way is to use capabilities as the binary names. Looking up an ASCII name yields a capability.
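Two-level naming amounts to a lookup table from symbolic names to binary names. The sketch below assumes a binary name of the form (server, file id); the Directory class, the server name, and the id 4711 are hypothetical.

```python
from typing import Dict, Tuple

# Assumed shape of a binary name: (server, file id on that server).
BinaryName = Tuple[str, int]


class Directory:
    """Maps symbolic names, used by people, to binary names, used by the system."""

    def __init__(self) -> None:
        self.entries: Dict[str, BinaryName] = {}

    def bind(self, symbolic: str, binary: BinaryName) -> None:
        self.entries[symbolic] = binary

    def lookup(self, symbolic: str) -> BinaryName:
        return self.entries[symbolic]


root = Directory()
root.bind("prog.c", ("server1", 4711))
server_name, file_id = root.lookup("prog.c")
```

If capabilities are used as binary names, the value stored in the directory would be a capability instead of a (server, file id) pair.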
Semantics of File Sharing.
There are four ways of dealing with shared files in a distributed system: UNIX semantics, session semantics, immutable files, and transactions.
File Usage.
Server caching has no effect on the file system semantics seen by the clients.
Client caching offers better performance at the price of increased complexity and possibly fuzzier semantics.
Replication
Multiple copies of selected files are maintained, with each copy on a separate file server.
Reasons
With primary copy replication, when a server reboots after a crash, a check can be made to see if any updates were in progress at the
time of the crash. If so, they can still be carried out. Sooner or later, all the secondaries will be updated.
Update Protocols
Disadvantage of primary copy replication
If the primary is down, no updates can be performed.
Solution
Gifford proposed a voting algorithm.
The basic idea is to require clients to request and acquire the permission of multiple servers before either reading or writing a replicated
file.
Voting algorithm.
Suppose that a file is replicated on N servers.
To update a file, a client must first contact at least half the servers plus 1 (a majority) and get them to agree to do the update.
Once they have agreed, the file is changed and a new version number is associated with the new file. The version number is used to
identify the version of the file and is the same for all the newly updated files.
To read a replicated file, a client must also contact at least half the servers plus 1 and ask them to send the version
numbers associated with the file.
If all the version numbers agree, this must be the most recent version because an attempt to update only the
remaining servers would fail because there are not enough of them.
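Gifford's scheme is more general than simple majority voting. To read a file of which N replicas exist, a client needs to assemble a read quorum of at least Nr servers.
To modify a file, a write quorum of at least Nw servers is required. The values of Nr and Nw are subject to the constraint that Nr + Nw > N.
Only after the appropriate number of servers has agreed to participate can a file be read or written.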
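The quorum rule can be sketched as follows. This is a simplified illustration, not Gifford's full algorithm: the Replica class and the functions quorums_valid, read, and write are hypothetical, a quorum is simply the first reachable servers, and locking and concurrent writers are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    version: int = 0
    data: bytes = b""
    up: bool = True


def quorums_valid(n: int, nr: int, nw: int) -> bool:
    """The constraint Nr + Nw > N forces every read quorum to overlap every write quorum."""
    return nr + nw > n


def read(replicas: List[Replica], nr: int) -> Optional[Tuple[int, bytes]]:
    """Contact Nr reachable replicas and return the copy with the highest version."""
    live = [r for r in replicas if r.up]
    if len(live) < nr:
        return None  # no read quorum available
    newest = max(live[:nr], key=lambda r: r.version)
    return newest.version, newest.data


def write(replicas: List[Replica], nr: int, nw: int, data: bytes) -> bool:
    """Learn the current version from a read quorum, then update Nw replicas."""
    live = [r for r in replicas if r.up]
    current = read(replicas, nr)
    if current is None or len(live) < nw:
        return False  # no quorum, so the update is refused
    for r in live[:nw]:
        r.version = current[0] + 1
        r.data = data
    return True


replicas = [Replica() for _ in range(5)]   # N = 5
ok = write(replicas, 2, 4, b"v1")          # Nr = 2, Nw = 4
```

With N = 5, Nr = 2, and Nw = 4, any two servers include at least one that took part in the last write, so a reader that picks the highest version number always sees the latest data.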
NFS allows every machine to be both a client and a server at the same time.
Each NFS server exports one or more of its directories for access by remote clients. The list of exported directories is kept in a file (/etc/exports), so these directories can be exported automatically whenever the server is booted.
Thus the basic architectural characteristic of NFS is that servers export directories and clients mount them remotely.
The shared files are just there in the directory hierarchy of multiple machines and can be read and written the usual way.
An Example: Sun's Network File System
NFS Protocols.
1) The first NFS protocol handles mounting.
A client can send a path name to a server and request permission to mount that directory somewhere in its directory hierarchy.
If the path name is legal and the directory specified has been exported, the server returns a file handle to the client.
The file handle contains fields uniquely identifying the file system type, the disk, the i-node number of the directory, and security information.
Subsequent calls to read and write files in the mounted directory use the file handle.
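The mount exchange can be sketched in Python as below. This is not the real NFS wire protocol: the FileHandle fields follow the description above, while the MountServer class and the values "ufs", "disk0", and "none" are placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FileHandle:
    fs_type: str    # file system type
    disk: str       # which disk holds the directory
    inode: int      # i-node number of the directory
    security: str   # security information


class MountServer:
    def __init__(self, exported: Dict[str, int]) -> None:
        # Hypothetical export table: exported path -> i-node number.
        self.exported = exported

    def mount(self, path: str) -> Optional[FileHandle]:
        """Return a file handle if the directory has been exported, else None."""
        inode = self.exported.get(path)
        if inode is None:
            return None
        return FileHandle("ufs", "disk0", inode, "none")


server = MountServer({"/usr/share": 2})
handle = server.mount("/usr/share")   # granted: a handle comes back
refused = server.mount("/etc")        # not exported: the request is refused
```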
Automounting has advantages over static mounting at boot time. With static mounting, every remote directory is mounted at boot, and if the user does not even need that server at the moment, all that work is wasted.
By allowing the client to try a set of servers in parallel, a degree of fault tolerance can be achieved (because only one of them needs to be up), and
performance can be improved.
2) The second NFS protocol is for directory and file access.
Clients can send messages to servers to manipulate directories and to read and write files. In addition, they can also
access file attributes, such as file mode, size, and time of last modification.
Most UNIX system calls are supported by NFS, with the exception of OPEN and CLOSE.
To read a file, a client sends the server a message containing the file name, with a request to look it up and return a file handle,
which is a structure that identifies the file.
Unlike an OPEN call, this LOOKUP operation does not copy any information into internal system tables.
The READ call contains the file handle of the file to read, the offset in the file to begin reading, and the number of bytes desired.
Advantage of this scheme
The server does not have to remember anything about open connections in between calls to it.
Thus if a server crashes and then recovers, no information about open files is lost, because there is none.
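A stateless server of this kind can be sketched in Python. The DISK layout, the StatelessServer class, and the use of a bare i-node number as the file handle are simplifying assumptions. The point is that every READ carries the handle, offset, and byte count, so the server needs no table of open files.

```python
# Hypothetical persistent storage that survives a server crash.
DISK = {
    "names": {"notes.txt": 7},                    # file name -> i-node number
    "inodes": {7: b"hello distributed world"},    # i-node number -> contents
}


class StatelessServer:
    def __init__(self, disk: dict) -> None:
        self.disk = disk  # persistent data only; no per-client open-file table

    def lookup(self, name: str) -> int:
        """Return a file handle; nothing is recorded about the caller."""
        return self.disk["names"][name]

    def read(self, handle: int, offset: int, count: int) -> bytes:
        """Each call is self-contained: handle, offset, and byte count."""
        return self.disk["inodes"][handle][offset:offset + count]


server = StatelessServer(DISK)
handle = server.lookup("notes.txt")
first = server.read(handle, 0, 5)

server = StatelessServer(DISK)      # simulated crash and reboot
rest = server.read(handle, 6, 11)   # the old handle still works
```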
NFS Implementation
1) System call layer - handles calls like OPEN, READ, and
CLOSE. After parsing the call and checking the parameters, it
invokes the second layer.
2) Virtual file system (VFS) layer - maintains a table with one entry
for each open file, analogous to the table of i-nodes for open files
in UNIX.
The VFS layer has an entry, called a v-node (virtual i-node), for
every open file. V-nodes are used to tell whether the file is local
or remote.
When a remote directory is mounted, the kernel constructs a v-node
for the remote directory and asks the NFS client code to create an
r-node (remote i-node) in its internal tables to hold the file handle.
The v-node points to the r-node. Each v-node in the VFS layer will
ultimately contain either a pointer to an r-node in the NFS client
code, or a pointer to an i-node in the local operating system.
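The relationship between v-nodes, i-nodes, and r-nodes can be sketched as a small data structure. The class names INode, RNode, and VNode, the open_files table, and the dispatch_read function are illustrative assumptions rather than actual kernel structures.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class INode:
    number: int          # i-node in the local operating system


@dataclass
class RNode:
    file_handle: bytes   # remote i-node holding the handle from the server


@dataclass
class VNode:
    target: Union[INode, RNode]   # points to exactly one of the two

    def is_remote(self) -> bool:
        return isinstance(self.target, RNode)


# Hypothetical VFS table: file descriptor -> v-node.
open_files: Dict[int, VNode] = {
    3: VNode(INode(42)),
    4: VNode(RNode(b"\x01\x02")),
}


def dispatch_read(fd: int) -> str:
    """Choose the layer that should serve a read on this descriptor."""
    vnode = open_files[fd]
    return "nfs-client" if vnode.is_remote() else "local-fs"
```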
When a remote file is opened, at some point during the parsing of the path
name, the kernel hits the directory on which the remote file system is
mounted. It sees that this directory is remote and in the directory's v-node
finds the pointer to the r-node. It then asks the NFS client code to open the
file. The NFS client code looks up the remaining portion of the path name on
the remote server associated with the mounted directory and gets back a file
handle for it. It makes an r-node for the remote file in its tables and reports
back to the VFS layer, which puts in its tables a v-node for the file that
points to the r-node. Again here we see that every open file or directory has
a v-node that points to either an r-node or an i-node.
The caller is given a file descriptor for the remote file. This file descriptor is
mapped onto the v-node by tables in the VFS layer.
When a file handle is sent to the server for file access, the server checks
the handle, and if it is valid, uses it. Validation can include verifying an
authentication key contained in the RPC headers, if security is enabled.
When the file descriptor is used in a subsequent system call, for example,
read, the VFS layer locates the corresponding v-node, and from that
determines whether it is local or remote and also which i-node or r-node
describes it.
Transfers between client and server are done in large chunks, normally
8192 bytes, even if fewer bytes are requested. After the client's VFS layer
has gotten the 8K chunk it needs, it automatically issues a request for the
next chunk, so it will have it should it be needed shortly. This feature, known
as read ahead, improves performance considerably. For writes an
analogous policy is followed.
If a WRITE system call supplies fewer than 8192 bytes of data, the data are
just accumulated locally. Only when the entire 8K chunk is full is it sent to
the server. However, when a file is closed, all of its data are sent to the
server immediately.
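Read ahead and write buffering can be sketched as follows. The Server and Client classes are hypothetical stand-ins for the NFS server and the client's VFS layer. For brevity, a read is assumed to stay within one chunk and writes are assumed to append to the file.

```python
CHUNK = 8192  # transfer unit between client and server


class Server:
    def __init__(self, data: bytes = b"") -> None:
        self.data = bytearray(data)
        self.requests = 0  # number of messages received

    def read_chunk(self, index: int) -> bytes:
        self.requests += 1
        return bytes(self.data[index * CHUNK:(index + 1) * CHUNK])

    def append(self, data: bytes) -> None:
        self.requests += 1
        self.data += data


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache = {}             # chunk index -> chunk contents
        self.pending = bytearray()  # written data not yet sent

    def read(self, offset: int, count: int) -> bytes:
        """Fetch the needed chunk and, as read ahead, the chunk after it."""
        index = offset // CHUNK
        for i in (index, index + 1):
            if i not in self.cache:
                self.cache[i] = self.server.read_chunk(i)
        start = offset % CHUNK
        return self.cache[index][start:start + count]

    def write(self, data: bytes) -> None:
        """Accumulate data locally and send it only in full chunks."""
        self.pending += data
        while len(self.pending) >= CHUNK:
            self.server.append(bytes(self.pending[:CHUNK]))
            del self.pending[:CHUNK]

    def close(self) -> None:
        """On close, whatever is still buffered goes to the server at once."""
        if self.pending:
            self.server.append(bytes(self.pending))
            self.pending.clear()
```

A read of 10 bytes at offset 0 costs two requests (the chunk itself plus the read ahead), but a later read in the next chunk is then served from the cache apart from its own read ahead.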
Lessons Learned
Satyanarayanan (1990b) has stated the following general principles
that he believes distributed file system designers should follow