Distributed Database System For Exams
Database systems that run over multiprocessor systems are called parallel database systems.
In a distributed database system, physical distribution does not necessarily imply that the computer systems are geographically far apart; they could actually be in the same room. It simply implies that the communication between them is done over a network instead of through shared memory or shared disk (as would be the case with multiprocessor systems), with the network as the only shared resource.
In contrast, if the database resides at only one site of a network, the system is not distributed: the database is centrally managed by one computer system, and all the requests are routed to that site. The only additional consideration has to do with transmission delays.
It is obvious that the existence of a computer network or a collection of files is not sufficient to form a distributed database system. What we are interested in is an environment where data are distributed among a number of sites.
Data Delivery Alternatives
In distributed databases, data are delivered from the sites where they are stored to where the query is posed.
Data delivery alternatives can be classified along three orthogonal dimensions: delivery modes, frequency, and communication methods.
Delivery modes: The alternative delivery modes are pull-only, push-only, and hybrid.
In the pull-only mode of data delivery, the transfer of data from servers to clients is initiated by a client pull.
In the push-only mode of data delivery, the transfer of data from servers to clients is initiated by a server push.
The hybrid mode of data delivery combines the client-pull and server-push mechanisms.
Frequency: There are three frequency measurements that can be used to classify the regularity of data delivery: periodic, conditional, and ad-hoc (irregular).
In periodic delivery, data are sent from the server to clients at regular intervals. Periodic delivery is carried out on a
regular and pre-specified repeating schedule.
In conditional delivery, data are sent from servers whenever certain conditions installed by clients in their profiles are satisfied.
Ad-hoc delivery is irregular and is performed mostly in a pure pull-based system. Data are pulled from servers to
clients in an ad-hoc fashion whenever clients request it.
Communication methods: These methods determine the various ways in which servers and clients communicate to deliver information to clients. The alternatives are unicast and one-to-many.
In unicast, the communication from a server to a client is one-to-one: the server sends data to one client using a
particular delivery mode with some frequency.
In one-to-many, as the name implies, the server sends data to a number of clients.
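The delivery modes above can be illustrated with a toy in-memory server. This is a minimal sketch under assumptions: the `Server` class and its `pull`, `subscribe`, and `update` methods are hypothetical names, and a real system would deliver over the network rather than append to Python lists. `pull` models ad-hoc, pull-only delivery; `update` models conditional push, delivering to every client whose profile condition holds, so it is one-to-many when several clients match and unicast when only one does.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Server:
    """Hypothetical data server illustrating pull and conditional push delivery."""
    data: dict = field(default_factory=dict)
    # client name -> (condition from the client's profile, client's inbox)
    profiles: dict = field(default_factory=dict)

    def pull(self, key):
        """Pull-only, ad-hoc delivery: the client initiates each transfer."""
        return self.data.get(key)

    def subscribe(self, client: str, condition: Callable[[str, object], bool], inbox: list):
        """A client installs a condition in its profile for conditional delivery."""
        self.profiles[client] = (condition, inbox)

    def update(self, key, value):
        """Conditional push: the server initiates delivery to every client whose
        profile condition is satisfied by the new value."""
        self.data[key] = value
        for condition, inbox in self.profiles.values():
            if condition(key, value):
                inbox.append((key, value))


server = Server(data={"item42": 120})
alice, bob = [], []
server.subscribe("alice", lambda k, v: k == "item42" and v > 125, alice)
server.subscribe("bob", lambda k, v: True, bob)

price = server.pull("item42")   # client pull
server.update("item42", 130)    # both conditions hold: one-to-many push
server.update("item42", 110)    # only bob's condition holds: unicast push
```

Periodic delivery would be the same push loop triggered by a timer instead of by an update, and a hybrid mode would offer both `pull` and pushes.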
Classification of DCS
1. Interconnection structure
2. Interdependence of components
3. Synchronization
Promises of DDBS
Data that are commonly accessed by one user can be placed on that user's local machine as well as on the machine of another user with the same access requirements. Furthermore, if one of the machines fails, a copy of the data is still available on another machine on the network.
Replication transparency ensures that replication of databases is hidden from the users. Whenever a user updates a data item, the update is reflected in all the copies of the table; however, this operation should not be visible to the user.
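A minimal sketch of replication transparency, assuming a hypothetical `Site`/`ReplicatedTable` pair: the user calls only `write` and `read` on one logical table, while the class keeps every reachable copy up to date and answers reads from any live copy. The write-all-available policy used here is a simplification chosen for brevity; a site that was down misses the update and would need recovery before serving reads again.

```python
class Site:
    """A hypothetical site holding one copy of a table; `up` simulates site failure."""

    def __init__(self, name):
        self.name = name
        self.table = {}
        self.up = True


class ReplicatedTable:
    """One logical table for the user; several physical copies behind it."""

    def __init__(self, sites):
        self.sites = sites

    def write(self, key, value):
        # Write-all-available: apply the update to every reachable copy.
        for site in self.sites:
            if site.up:
                site.table[key] = value

    def read(self, key):
        # Read-one: any live copy can answer, so one failed site does not block reads.
        for site in self.sites:
            if site.up:
                return site.table.get(key)
        raise RuntimeError("no copy of the data is reachable")


site_a, site_b = Site("A"), Site("B")
emp = ReplicatedTable([site_a, site_b])
emp.write("e1", "Alex")   # the user issues one update
site_a.up = False         # site A fails
value = emp.read("e1")    # still answered, by site B
```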
Fragmentation Transparency
Fragmentation transparency makes the user unaware that data is fragmented. It hides the fact that the table the user is querying is actually a fragment or a union of several fragments. It also conceals the fact that the fragments are located at diverse sites.
Fragmentation can reduce the negative effects of replication: each replica is not the full relation but only a subset of it.
Horizontal fragmentation divides a relation into subsets of its tuples (rows).
Vertical fragmentation divides a relation into subsets of its attributes (columns).
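The two kinds of fragmentation, and how a DBMS providing fragmentation transparency could rebuild the original relation, can be sketched in Python. The `EMP` relation, its attributes, and the city-based predicate are hypothetical. Horizontal fragments are recombined by union; vertical fragments are recombined by a join on the key `eno`, which each vertical fragment keeps for that purpose.

```python
# Hypothetical EMP relation: (eno, name, city, salary)
EMP = [
    (1, "Alex", "Paris", 5000),
    (2, "Maria", "Boston", 6000),
    (3, "John", "Paris", 4500),
]

# Horizontal fragmentation: each fragment is a subset of the tuples, chosen by a predicate.
emp_paris = [t for t in EMP if t[2] == "Paris"]
emp_other = [t for t in EMP if t[2] != "Paris"]

# Vertical fragmentation: each fragment is a subset of the attributes; both keep the key eno.
emp_names = [(eno, name, city) for eno, name, city, _ in EMP]
emp_pay = [(eno, salary) for eno, _, _, salary in EMP]

# Reconstruction: union for horizontal fragments, join on eno for vertical fragments.
from_horizontal = sorted(emp_paris + emp_other)
salary_by_eno = dict(emp_pay)
from_vertical = sorted((eno, name, city, salary_by_eno[eno]) for eno, name, city in emp_names)
```

A user query against EMP would be rewritten by the DBMS into queries on the fragments, which is exactly what fragmentation transparency hides.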
Who Should Provide Transparency?
It is possible to identify three distinct layers at which the transparency services can be provided.
1. The first layer is the compiler (or interpreter) of the user language, which translates the requested services into the required operations.
2. The second layer at which transparency can be provided is the operating system level.
3. The third layer is the DBMS itself. It is the responsibility of the DBMS to make all the necessary translations from the operating system to the higher-level user interface.
When a site or communication link fails, some of the data may be unreachable, but with proper care, users may be permitted to access other parts of the distributed database. The proper care comes in the form of support for distributed transactions and application protocols.
Distributed transactions allow users to rely on the distributed DBMS to ensure that their requests will be executed correctly no matter what happens in the system.
Correctly means that user applications do not need to be concerned with coordinating their accesses to individual
local databases nor do they need to worry about the possibility of site or communication link failures during the
execution of their transactions.
Improved Performance
Since each site handles only a portion of the database, contention for CPU and I/O services is not as severe as for
centralized databases.
Localization reduces the remote access delays that are usually involved in wide area networks. This reduced communication overhead can be obtained only by proper fragmentation and distribution of the database.
The inherent parallelism of distributed systems can be exploited in two ways:
Inter-query parallelism results from the ability to execute multiple queries at the same time.
Intra-query parallelism is achieved by breaking up a single query into a number of subqueries, each of which is executed at a different site, accessing a different part of the distributed database.
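Both kinds of parallelism can be sketched with threads standing in for sites. This is a hypothetical illustration: `FRAGMENTS` plays the role of a horizontally fragmented relation and the function names are invented. In a real distributed DBMS the subqueries would run on separate machines and the coordinator would receive their results over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of EMP(name, salary), one per site.
FRAGMENTS = {
    "site1": [("Alex", 5000), ("John", 4500)],
    "site2": [("Maria", 6000)],
    "site3": [("Wei", 7000), ("Ana", 3000)],
}


def local_subquery(rows, min_salary):
    """The part of the query that one site evaluates over its own fragment."""
    return [name for name, salary in rows if salary >= min_salary]


def distributed_query(min_salary):
    """Intra-query parallelism: run one subquery per site concurrently, then merge."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(local_subquery, rows, min_salary)
                   for rows in FRAGMENTS.values()]
        partials = [f.result() for f in futures]
    return sorted(name for part in partials for name in part)


# Inter-query parallelism: two independent queries executed at the same time.
with ThreadPoolExecutor() as pool:
    well_paid, everyone = pool.map(distributed_query, [5000, 3000])
```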
Design Issues
Distributed Database design
The two fundamental design issues are fragmentation, the separation of the database into partitions called fragments, and distribution, the optimum distribution of fragments.
Query Processing
Query processing deals with designing algorithms that analyze queries and convert them into a series of data manipulation operations.
Concurrency Control
Concurrency control involves the synchronization of accesses to the distributed database, such that the integrity of
the database is maintained.
Deadlock Management
Deadlock is a state of a database system in which two or more transactions are each waiting for a data item that is locked by another transaction in the same group, so none of them can proceed.
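A common way to detect deadlocks is to look for a cycle in a wait-for graph, where an edge Ti → Tj means Ti waits for a lock held by Tj. The sketch below uses depth-first search; `find_deadlock` and `merge` are hypothetical names. In a distributed database a cycle may span sites, so it is invisible in each local graph and appears only once the local graphs are merged into a global one, which the example illustrates.

```python
def find_deadlock(waits_for):
    """Return the transactions on one cycle of the wait-for graph, or None.

    waits_for maps a transaction to the set of transactions it is waiting on.
    """
    on_path, done, path = set(), set(), []

    def visit(t):
        on_path.add(t)
        path.append(t)
        for u in waits_for.get(t, ()):
            if u in on_path:                  # back edge: the path from u to t closes a cycle
                return path[path.index(u):]
            if u not in done:
                cycle = visit(u)
                if cycle:
                    return cycle
        on_path.discard(t)
        path.pop()
        done.add(t)
        return None

    for t in waits_for:
        if t not in done:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


def merge(*local_graphs):
    """Union of the local wait-for graphs reported by each site."""
    merged = {}
    for graph in local_graphs:
        for t, holders in graph.items():
            merged.setdefault(t, set()).update(holders)
    return merged


site_a = {"T1": {"T2"}}   # at site A, T1 waits for a lock held by T2
site_b = {"T2": {"T1"}}   # at site B, T2 waits for a lock held by T1

local_a = find_deadlock(site_a)                      # no cycle visible at site A alone
global_cycle = find_deadlock(merge(site_a, site_b))  # the global graph reveals the deadlock
```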
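Architectural Alternatives
Distributed DBMS architectures can be classified along three dimensions: autonomy (A), distribution (D), and heterogeneity (H). For example, the point (A2, D2, H1) represents a (peer-to-peer) distributed, heterogeneous multidatabase system.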
Super key
Candidate key
Primary key
Composite key
Secondary / Alternative key
Non-key attribute
A key that consists of two or more attributes that together uniquely identify an entity occurrence is called a composite key.
The candidate keys that are not selected as the primary key are known as secondary keys or alternative keys.
Non-key attributes are attributes other than candidate key attributes in
a table.
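Under the usual definitions, a super key is a set of attributes whose values uniquely identify every row, and a candidate key is a minimal super key (no proper subset is also a super key). The brute-force sketch below checks these properties on a hypothetical `employees` instance; the helper names are invented. Keep in mind that a single instance can only disprove a key: whether a set of attributes really is a key is a property of the schema and its constraints.

```python
from itertools import combinations

# Hypothetical instance of Employee(emp_id, email, name, dept).
ATTRIBUTES = ("emp_id", "email", "name", "dept")
employees = [
    {"emp_id": 1, "email": "alex@example.com", "name": "Alex", "dept": "Sales"},
    {"emp_id": 2, "email": "maria@example.com", "name": "Maria", "dept": "Sales"},
    {"emp_id": 3, "email": "alex2@example.com", "name": "Alex", "dept": "HR"},
    {"emp_id": 4, "email": "alex3@example.com", "name": "Alex", "dept": "Sales"},
]


def is_super_key(rows, attrs):
    """True if no two rows share the same values on all of attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal super keys, found by trying attribute sets from smallest to largest."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # contains a smaller key: a super key, but not minimal
            if is_super_key(rows, combo):
                keys.append(combo)
    return keys


keys = candidate_keys(employees, ATTRIBUTES)
primary_key, *alternative_keys = keys  # choosing the first candidate is arbitrary here
non_key = [a for a in ATTRIBUTES if not any(a in key for key in keys)]
```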
Relation Instance
A relation is an instance of a relation schema.
Normalization
The aim of normalization is to eliminate various anomalies of a relation in order to obtain better relations.
Deletion anomaly: A deletion anomaly exists when certain attributes are lost because of the deletion of other attributes.
Update anomaly: An update anomaly exists when one or more instances of duplicated data are updated, but not all of them.
Second Normal Form (2NF):
If the primary key is made up of several attributes, every non-key attribute must depend on the entire set and not on part of it.
For example, suppose the primary key of the OrderDetails table consists of orderID and productID.
If unitPrice depends only on productID, it should not be kept in the OrderDetails table (but in the Products table). On the other hand, if unitPrice depends on the product as well as the particular order, then it should be kept in the OrderDetails table.
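The OrderDetails example can be checked mechanically. The sketch below, with hypothetical sample rows and helper names, tests whether a functional dependency holds in an instance and then decomposes the table so that unitPrice is stored with productID alone. As with keys, a sample instance can only refute a dependency, not prove it.

```python
# Hypothetical OrderDetails rows; the primary key is (orderID, productID).
order_details = [
    {"orderID": 1, "productID": "P1", "quantity": 2, "unitPrice": 9.5},
    {"orderID": 1, "productID": "P2", "quantity": 1, "unitPrice": 4.0},
    {"orderID": 2, "productID": "P1", "quantity": 5, "unitPrice": 9.5},
]


def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in this instance: equal lhs values imply equal rhs values."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def project(rows, attrs):
    """Keep only attrs, dropping duplicate rows (order of first appearance kept)."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# unitPrice depends on productID alone: a partial dependency on the composite key.
partial = fd_holds(order_details, ["productID"], ["unitPrice"])

# 2NF decomposition: move unitPrice into a Products table keyed by productID.
order_lines = project(order_details, ["orderID", "productID", "quantity"])
products = project(order_details, ["productID", "unitPrice"])
```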
Third Normal Form (3NF):
A table is in 3NF if it is in 2NF and has no transitive dependency (no non-key attribute depends on another non-key attribute).
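At the schema level, a transitive dependency can be found from the functional dependencies themselves using attribute closure. This sketch assumes a hypothetical Employee(emp_id, dept_id, dept_name) relation with key emp_id, where emp_id → dept_id and dept_id → dept_name. It reports every dependency X → A where X is not a super key and A is a non-key attribute, which is what the 3NF condition rules out, and then checks the usual decomposition.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(relation, fds, key_attrs):
    """FDs X -> A where X is not a super key and A is a non-key attribute outside X."""
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(relation):
            continue  # X is a super key, which 3NF allows
        for a in rhs:
            if a not in key_attrs and a not in lhs:
                violations.append((tuple(lhs), a))
    return violations


# Hypothetical Employee(emp_id, dept_id, dept_name) with key emp_id.
EMPLOYEE = ("emp_id", "dept_id", "dept_name")
FDS = [(("emp_id",), ("dept_id",)), (("dept_id",), ("dept_name",))]
violations = third_nf_violations(EMPLOYEE, FDS, key_attrs={"emp_id"})

# Decomposition into Employee(emp_id, dept_id) and Department(dept_id, dept_name).
emp_ok = third_nf_violations(("emp_id", "dept_id"), [FDS[0]], {"emp_id"})
dept_ok = third_nf_violations(("dept_id", "dept_name"), [FDS[1]], {"dept_id"})
```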
Integrity Rules
You should also apply the integrity rules to check the integrity of your design:
Entity Integrity Rule: The primary key cannot contain NULL; otherwise, it cannot uniquely identify the row. For a composite key made up of several columns, none of the columns can contain NULL.
Referential Integrity Rule: Each foreign key value must be matched to a primary key value in the
table referenced (or parent table).
You can insert a row with a foreign key in the child table only if the value exists in the parent
table.
If the value of the key changes in the parent table (e.g., the row is updated or deleted), all rows with this foreign key in the child table(s) must be handled accordingly. You could either (a) disallow the changes; (b) cascade the change (or delete the records) in the child tables accordingly; or (c) set the key value in the child tables to NULL.
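Both integrity rules can be tried out with Python's built-in `sqlite3` module. The `department` and `employee` tables are hypothetical. Two SQLite details matter here: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, and a non-INTEGER primary key needs an explicit `NOT NULL` to reject NULLs. `ON DELETE CASCADE` implements option (b); `ON DELETE SET NULL` corresponds to option (c), and the default `NO ACTION` to option (a).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""
    CREATE TABLE department (
        dept_code TEXT NOT NULL PRIMARY KEY,  -- explicit NOT NULL for entity integrity
        name      TEXT
    )""")
conn.execute("""
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        dept_code TEXT REFERENCES department(dept_code) ON DELETE CASCADE
    )""")

conn.execute("INSERT INTO department VALUES ('D1', 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'D1')")


def rejected(sql):
    """Return True if SQLite rejects the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


null_pk_rejected = rejected("INSERT INTO department VALUES (NULL, 'HR')")  # entity integrity
orphan_rejected = rejected("INSERT INTO employee VALUES (2, 'D9')")       # no parent 'D9'

conn.execute("DELETE FROM department WHERE dept_code = 'D1'")              # option (b): cascade
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```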
Normalization Issues
Reduced database performance: when a query or transaction request is sent to the database, factors such as CPU usage, memory usage, and input/output (I/O) are involved. A normalized database can require considerably more CPU, memory, and I/O to process transactions and queries than a denormalized database does.
Processing time may increase due to joins, because data split across several normalized tables must be recombined when it is queried.
Select
Selection, written σp(r), selects the tuples of relation r that satisfy the given predicate p. p is a propositional logic formula which may use connectors like and, or, and not. These terms may use relational operators like =, ≠, ≥, <, >, ≤.
Project
It projects (keeps) only the listed columns of a relation, written πA1, ..., An(r).
Duplicate tuples are eliminated from the result relation, since a relation is a set.
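Select and project can be sketched over relations modelled as Python sets of tuples, which gives duplicate elimination for free. The `BOOKS` relation, its schema, and the helper names are hypothetical.

```python
# Hypothetical Books relation: a schema plus a set of tuples.
BOOKS_SCHEMA = ("title", "author", "year")
BOOKS = {
    ("Databases", "Cohen", 2001),
    ("Networks", "Smith", 1999),
    ("Queries", "Cohen", 2010),
}


def select(relation, schema, predicate):
    """σp(r): keep the tuples for which predicate p is true."""
    return {t for t in relation if predicate(dict(zip(schema, t)))}


def project(relation, schema, columns):
    """πcolumns(r): keep only the listed columns; the set removes duplicates."""
    positions = [schema.index(c) for c in columns]
    return {tuple(t[i] for i in positions) for t in relation}


# σ author = 'Cohen' and year > 2000 (Books)
recent_cohen = select(BOOKS, BOOKS_SCHEMA,
                      lambda r: r["author"] == "Cohen" and r["year"] > 2000)
# π author (Books): 'Cohen' appears twice in Books but once in the result
authors = project(BOOKS, BOOKS_SCHEMA, ["author"])
```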
Union
The union of two relations R and S (denoted as R ∪ S) is the set of all tuples that are in R, or in S, or in both. R and S must be union compatible (same number of attributes with compatible domains), and duplicate tuples are eliminated.
Set Difference
The set difference of two relations R and S (R − S) is the set of all tuples that are in R but not in S. In this case, not only should R and S be union compatible, but the operation is also asymmetric (i.e., R − S ≠ S − R).
Example: πauthor(Books) − πauthor(Articles) gives the authors who have written books but not articles.
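A sketch of union and set difference on two hypothetical one-column relations standing for πauthor(Books) and πauthor(Articles). Python sets model the relations; union compatibility is assumed rather than checked.

```python
# Hypothetical results of π author (Books) and π author (Articles).
book_authors = {("Cohen",), ("Smith",), ("Lee",)}
article_authors = {("Cohen",), ("Patel",)}


def union(r, s):
    """R ∪ S: tuples in R, in S, or in both; the set keeps each tuple once."""
    return r | s


def difference(r, s):
    """R − S: tuples in R but not in S."""
    return r - s


all_authors = union(book_authors, article_authors)
books_only = difference(book_authors, article_authors)     # wrote books but not articles
articles_only = difference(article_authors, book_authors)  # differs: R − S ≠ S − R
```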
Cartesian Product
Combines information from two different relations into one: each result tuple is a concatenation of one tuple of R with one tuple of S, for all tuples of R and S. The Cartesian product of R and S is denoted as R × S.
Example: σauthor = 'Cohen'(Books × Articles) selects the combined book/article tuples whose author is 'Cohen'.
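A sketch of the Cartesian product with hypothetical (title, author) relations. Since Books × Articles has two author columns, the selection below applies the condition to both of them, which is one possible reading of the example above.

```python
from itertools import product

# Hypothetical (title, author) relations.
BOOKS = {("Databases", "Cohen"), ("Networks", "Smith")}
ARTICLES = {("Joins", "Cohen"), ("Locks", "Patel"), ("Logs", "Smith")}


def cartesian_product(r, s):
    """R × S: each tuple of R concatenated with each tuple of S."""
    return {t + u for t, u in product(r, s)}


pairs = cartesian_product(BOOKS, ARTICLES)   # |R| * |S| = 2 * 3 tuples
# Positions 1 and 3 hold the book author and the article author respectively.
by_cohen = {t for t in pairs if t[1] == "Cohen" and t[3] == "Cohen"}
```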
Rename Operation
The rename operation allows us to rename the output relation.
ρx(E)
where the result of expression E is saved under the name x.
Intersection
The intersection of two relations R and S (R ∩ S) consists of the set of all tuples that are in both R and S. In terms of the basic operators, it can be specified as follows:
R ∩ S = R − (R − S)
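A quick check of the identity on two small, hypothetical relations: computing R − (R − S) with set difference alone gives the same result as a direct intersection.

```python
# Hypothetical union-compatible relations with tuples (id, code).
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (3, "c"), (4, "d")}

direct = R & S          # built-in set intersection
derived = R - (R - S)   # R ∩ S = R − (R − S): R − S leaves only the tuples not in S
```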
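Theta (θ) Join
A theta join is any Cartesian product that is filtered by a condition which compares values from both tables. The condition has the general form (where relator is a comparison operator such as =, <, or ≥):
<Table_1.Column> relator <Table_2.Column>
For example:
Sellers.seller_name = Sales.seller_name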
Example:
Student
SID   Name   Std
101   Alex   10
102   Maria  11

Subject
Class  Subject
10     Math
10     English
11     Music
11     Sports

STUDENT ⋈ Student.Std = Subject.Class SUBJECT

Student_detail
SID   Name   Std   Class  Subject
101   Alex   10    10     Math
101   Alex   10    10     English
102   Maria  11    11     Music
102   Maria  11    11     Sports
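The Student_detail example can be reproduced with a nested-loop sketch that forms the Cartesian product and keeps only the pairs satisfying the condition. The helper name `theta_join` is hypothetical; the data are the Student and Subject tables from the example. A second call with `<` shows that a theta condition need not be an equality.

```python
from itertools import product

STUDENT_SCHEMA = ("SID", "Name", "Std")
STUDENT = [(101, "Alex", 10), (102, "Maria", 11)]
SUBJECT_SCHEMA = ("Class", "Subject")
SUBJECT = [(10, "Math"), (10, "English"), (11, "Music"), (11, "Sports")]


def theta_join(r, r_schema, s, s_schema, condition):
    """Cartesian product of r and s, filtered by a condition on both tuples."""
    rows = []
    for t, u in product(r, s):
        if condition(dict(zip(r_schema, t)), dict(zip(s_schema, u))):
            rows.append(t + u)
    return r_schema + s_schema, rows


# STUDENT ⋈ Student.Std = Subject.Class SUBJECT
detail_schema, student_detail = theta_join(
    STUDENT, STUDENT_SCHEMA, SUBJECT, SUBJECT_SCHEMA,
    lambda st, sub: st["Std"] == sub["Class"])

# A non-equality theta condition: subjects of a higher class than the student's.
_, higher = theta_join(
    STUDENT, STUDENT_SCHEMA, SUBJECT, SUBJECT_SCHEMA,
    lambda st, sub: st["Std"] < sub["Class"])
```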
Equijoin
When a theta join uses only the equality comparison operator, it is called an equijoin.
The example above is an equijoin.
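Because an equijoin uses only equality, it need not test every pair of tuples. The sketch below uses a hash join, a standard technique not described in these notes: it indexes Subject on Class, then probes that index with each student's Std. The function name and the convention of passing join columns by position are assumptions of this sketch.

```python
from collections import defaultdict

STUDENT = [(101, "Alex", 10), (102, "Maria", 11)]
SUBJECT = [(10, "Math"), (10, "English"), (11, "Music"), (11, "Sports")]


def hash_equijoin(r, r_col, s, s_col):
    """Equijoin on r[r_col] = s[s_col], matching tuples by hashing instead of
    comparing every pair."""
    buckets = defaultdict(list)
    for u in s:                    # build phase: index s on its join column
        buckets[u[s_col]].append(u)
    # probe phase: look up each tuple of r in the index
    return [t + u for t in r for u in buckets.get(t[r_col], [])]


# Student.Std (position 2) = Subject.Class (position 0)
student_detail = hash_equijoin(STUDENT, 2, SUBJECT, 0)
```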