DBMS Unit-II
RELATIONAL ALGEBRA AND CALCULUS
Stand firm in your refusal to remain conscious during algebra. In real life, I assure
you, there is no such thing as algebra.
This chapter presents two formal query languages associated with the relational model.
Query languages are specialized languages for asking questions, or queries, that involve
the data in a database. After covering some preliminaries in Section 4.1, we
discuss relational algebra in Section 4.2. Queries in relational algebra are composed
using a collection of operators, and each query describes a step-by-step procedure for
computing the desired answer; that is, queries are specified in an operational manner.
In Section 4.3 we discuss relational calculus, in which a query describes the desired
answer without specifying how the answer is to be computed; this nonprocedural style
of querying is called declarative. We will usually refer to relational algebra and relational
calculus as algebra and calculus, respectively. We compare the expressive power
of algebra and calculus in Section 4.4. These formal query languages have greatly
influenced commercial query languages such as SQL, which we will discuss in later
chapters.
4.1 PRELIMINARIES
We begin by clarifying some important points about relational queries. The inputs and
outputs of a query are relations. A query is evaluated using instances of each input
relation and it produces an instance of the output relation. In Section 3.4, we used
field names to refer to fields because this notation makes queries more readable. An
alternative is to always list the fields of a given relation in the same order and to refer
to fields by position rather than by field name.
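Because queries often compute intermediate results that are themselves relation instances,
and specifying names for every field of every intermediate relation would be tedious, we
use the positional notation to formally define relational algebra and calculus. Since field
names make queries more readable, we also introduce simple conventions that allow
intermediate relations to ‘inherit’ field names, for convenience.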
Our examples use the schemas Sailors(sid: integer, sname: string, rating: integer,
age: real), Boats(bid: integer, bname: string, color: string), and Reserves(sid: integer,
bid: integer, day: date); the domain of each field is listed after the field name. sid is
the key for Sailors, bid is the key for Boats, and all three fields together form the key
for Reserves. Fields in an instance of one of these relations will be referred to by name,
or positionally, using the order in which they are listed above.
In several examples illustrating the relational algebra operators, we will use the instances
S1 and S2 (of Sailors) and R1 (of Reserves) shown in Figures 4.1, 4.2, and 4.3,
respectively.
4.2 RELATIONAL ALGEBRA
Relational algebra is one of the two formal query languages associated with the relational
model. Queries in algebra are composed using a collection of operators. A
fundamental property is that every operator in the algebra accepts (one or two) relation
instances as arguments and returns a relation instance as the result. This property
makes it easy to compose operators to form a complex query: a relational algebra
expression is recursively defined to be a relation, a unary algebra operator applied
to a single expression, or a binary algebra operator applied to two expressions.
Each relational algebra query describes a step-by-step procedure for computing the desired
answer, based on the order in which operators are applied in the query. The procedural
nature of the algebra allows us to think of an algebra expression as a recipe, or a
plan, for evaluating a query, and relational systems in fact use algebra expressions to
represent query evaluation plans.
Relational algebra includes operators to select rows from a relation (σ) and to project
columns (π). These operations allow us to manipulate data in a single relation. Consider
the instance of the Sailors relation shown in Figure 4.2, denoted as S2. We can
retrieve rows corresponding to expert sailors by using the σ operator. The expression
$\sigma_{rating>8}(S2)$
evaluates to the relation shown in Figure 4.4. The subscript rating>8 specifies the
selection criterion to be applied while retrieving tuples.
Figure 4.4 $\sigma_{rating>8}(S2)$
sid  sname  rating  age
28   yuppy  9       35.0
58   Rusty  10      35.0

Figure 4.5 $\pi_{sname,rating}(S2)$
sname   rating
yuppy   9
Lubber  8
guppy   5
Rusty   10
The selection operator σ specifies the tuples to retain through a selection condition.
In general, the selection condition is a boolean combination (i.e., an expression using
the logical connectives ∧ and ∨) of terms that have the form attribute op constant or
attribute1 op attribute2, where op is one of the comparison operators <, <=, =, ≠, >=,
or >. The reference to an attribute can be by position (of the form .i or i) or by name
(of the form .name or name). The schema of the result of a selection is the schema of
the input relation instance.
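The definition of σ can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: a relation is modeled as a pair (field names, frozenset of row tuples), and the selection condition is passed as a Python predicate over a name-to-value dict, so a boolean combination of terms is simply `and`/`or` inside the lambda. The function name `select` and the rows of `s2` are hypothetical; the rows only mimic the S2 instance.

```python
from typing import Callable

# Illustrative representation (an assumption, not from the text):
# a relation is (tuple of field names, frozenset of row tuples).
Relation = tuple[tuple[str, ...], frozenset]


def select(rel: Relation, cond: Callable[[dict], bool]) -> Relation:
    """sigma_cond(rel): keep exactly the rows for which cond is true."""
    fields, rows = rel
    kept = frozenset(row for row in rows if cond(dict(zip(fields, row))))
    return fields, kept  # the result has the same schema as the input


# Hypothetical Sailors instance in the spirit of S2: (sid, sname, rating, age).
s2: Relation = (
    ("sid", "sname", "rating", "age"),
    frozenset({
        (28, "yuppy", 9, 35.0),
        (31, "Lubber", 8, 55.5),
        (44, "guppy", 5, 35.0),
        (58, "Rusty", 10, 35.0),
    }),
)

# sigma_{rating>8}(S2)
experts = select(s2, lambda t: t["rating"] > 8)
# A boolean combination of terms: sigma_{rating>=8 AND age>50}(S2)
older_good = select(s2, lambda t: t["rating"] >= 8 and t["age"] > 50)

print(sorted(experts[1]))     # [(28, 'yuppy', 9, 35.0), (58, 'Rusty', 10, 35.0)]
print(sorted(older_good[1]))  # [(31, 'Lubber', 8, 55.5)]
```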
The projection operator π allows us to extract columns from a relation; for example,
we can find out all sailor names and ratings by using π. The expression
$\pi_{sname,rating}(S2)$
evaluates to the relation shown in Figure 4.5. The subscript sname,rating specifies the
fields to be retained; the other fields are ‘projected out.’ The schema of the result of
a projection is determined by the fields that are projected in the obvious way.
Suppose that we wanted to find out only the ages of sailors. The expression
$\pi_{age}(S2)$
evaluates to the relation shown in Figure 4.6. The important point to note is that
although three sailors are aged 35, a single tuple with age=35.0 appears in the result
of the projection. This follows from the definition of a relation as a set of tuples. In
practice, real systems often omit the expensive step of eliminating duplicate tuples,
leading to relations that are multisets. However, our discussion of relational algebra
and calculus assumes that duplicate elimination is always done so that relations are
always sets of tuples.
Since the result of a relational algebra expression is always a relation, we can substitute
an expression wherever a relation is expected. For example, we can compute the names
and ratings of highly rated sailors by combining two of the preceding queries. The
expression
$\pi_{sname,rating}(\sigma_{rating>8}(S2))$
produces the result shown in Figure 4.7. It is obtained by applying the selection to S2
(to get the relation shown in Figure 4.4) and then applying the projection.
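The sketch below adds projection to the same assumed representation and composes it with selection, mirroring the nesting in the expression above. Because the rows are held in a frozenset, duplicate elimination happens automatically, while the plain list `bag_ages` shows what a multiset projection would keep. The helper names and the `s2` rows are hypothetical.

```python
# Illustrative representation (an assumption): (field names, frozenset of rows).


def select(rel, cond):
    """sigma_cond(rel): keep the rows whose field values satisfy cond."""
    fields, rows = rel
    return fields, frozenset(r for r in rows if cond(dict(zip(fields, r))))


def project(rel, *names):
    """pi_names(rel): keep only the named fields, in the order given.
    Building a frozenset eliminates duplicate result tuples."""
    fields, rows = rel
    positions = [fields.index(n) for n in names]
    return tuple(names), frozenset(tuple(r[i] for i in positions) for r in rows)


# Hypothetical Sailors instance in the spirit of S2.
s2 = (("sid", "sname", "rating", "age"),
      frozenset({(28, "yuppy", 9, 35.0), (31, "Lubber", 8, 55.5),
                 (44, "guppy", 5, 35.0), (58, "Rusty", 10, 35.0)}))

ages = project(s2, "age")              # pi_age(S2): a set
bag_ages = [r[3] for r in s2[1]]       # a multiset projection keeps duplicates
top = project(select(s2, lambda t: t["rating"] > 8), "sname", "rating")

print(sorted(ages[1]))   # [(35.0,), (55.5,)]
print(sorted(bag_ages))  # [35.0, 35.0, 35.0, 55.5]
print(sorted(top[1]))    # [('Rusty', 10), ('yuppy', 9)]
```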
The following standard operations on sets are also available in relational algebra: union
(∪), intersection (∩), set-difference (−), and cross-product (×).
Union: R ∪ S returns a relation instance containing all tuples that occur in either
relation instance R or relation instance S (or both). R and S must be union-
compatible, and the schema of the result is defined to be identical to the schema
of R.
Two relation instances are said to be union-compatible if the following conditions
hold:
– they have the same number of fields, and
– corresponding fields, taken in order from left to right, have the same domains.
Note that field names are not used in defining union-compatibility. For convenience,
we will assume that the fields of R ∪ S inherit names from R, if the fields
of R have names. (This assumption is implicit in defining the schema of R ∪ S to
be identical to the schema of R, as stated earlier.)
In the preceding definitions, note that each operator can be applied to relation instances
that are computed using a relational algebra (sub)expression.
We now illustrate these definitions through several examples. The union of S1 and S2
is shown in Figure 4.8. Fields are listed in order; field names are also inherited from
S1. S2 has the same field names, of course, since it is also an instance of Sailors. In
general, fields of S2 may have different names; recall that we require only domains to
match. Note that the result is a set of tuples. Tuples that appear in both S1 and S2
appear only once in S1 ∪ S2. Also, S1 ∪ R1 is not a valid operation because the two
relations are not union-compatible. The intersection of S1 and S2 is shown in Figure
4.9, and the set-difference S1 − S2 is shown in Figure 4.10.
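Union-compatibility and the set operations can be sketched as follows, assuming (for illustration only) that a schema is a tuple of (field name, Python type) pairs standing in for domains. The check compares domains position by position and deliberately ignores names, and every result keeps R's schema. The instances `s1`, `s2`, and `r1` are hypothetical rows modeled on S1, S2, and R1.

```python
# Illustrative representation (an assumption): a relation is
# (schema, frozenset of rows), where schema lists (field name, domain) pairs.


def union_compatible(schema_r, schema_s):
    """Same number of fields and the same domains position by position.
    Field names are ignored, as in the definition in the text."""
    return len(schema_r) == len(schema_s) and all(
        dom_r == dom_s for (_, dom_r), (_, dom_s) in zip(schema_r, schema_s))


def set_op(r, s, op):
    """Apply a set operator to two union-compatible relations.
    The result inherits the schema (and so the field names) of R."""
    (schema_r, rows_r), (schema_s, rows_s) = r, s
    if not union_compatible(schema_r, schema_s):
        raise ValueError("relations are not union-compatible")
    return schema_r, op(rows_r, rows_s)


sailors_schema = (("sid", int), ("sname", str), ("rating", int), ("age", float))
reserves_schema = (("sid", int), ("bid", int), ("day", str))

# Hypothetical instances in the spirit of S1, S2, and R1.
s1 = (sailors_schema, frozenset({(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5),
                                 (58, "Rusty", 10, 35.0)}))
s2 = (sailors_schema, frozenset({(28, "yuppy", 9, 35.0), (31, "Lubber", 8, 55.5),
                                 (44, "guppy", 5, 35.0), (58, "Rusty", 10, 35.0)}))
r1 = (reserves_schema, frozenset({(22, 101, "10/10/96"), (58, 103, "11/12/96")}))

union = set_op(s1, s2, frozenset.union)
inter = set_op(s1, s2, frozenset.intersection)
diff = set_op(s1, s2, frozenset.difference)

print(len(union[1]))                   # 5: tuples in both appear only once
print(sorted(r[0] for r in inter[1]))  # [31, 58]
print(sorted(r[0] for r in diff[1]))   # [22]
# Intersection is redundant: R intersect S equals R - (R - S).
print(inter[1] == s1[1] - (s1[1] - s2[1]))  # True
try:
    set_op(s1, r1, frozenset.union)
except ValueError as err:
    print(err)                         # relations are not union-compatible
```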
Figure 4.8 S1 ∪ S2
Figure 4.11 S1 × R1
4.2.3 Renaming
We have been careful to adopt field name conventions that ensure that the result of
a relational algebra expression inherits field names from its argument (input) relation
instances in a natural way whenever possible. However, name conflicts can arise in
some cases; for example, in S1 × R1. It is therefore convenient to be able to give
names explicitly to the fields of a relation instance that is defined by a relational
algebra expression. In fact, it is often convenient to give the instance itself a name so
that we can break a large algebra expression into smaller pieces by giving names to
the results of subexpressions.
For example, the expression ρ(C(1 → sid1, 5 → sid2), S1 × R1) returns a relation
that contains the tuples shown in Figure 4.11 and has the following schema: C(sid1:
integer, sname: string, rating: integer, age: real, sid2: integer, bid: integer,
day: date).
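A sketch of ρ with a positional renaming list, assuming the (field names, frozenset of rows) representation used for illustration here. Only the field renaming is modeled, not the new relation name C, and the check that no names still conflict is an addition of this sketch rather than part of the definition. The rows are hypothetical.

```python
# Illustrative representation (an assumption): (field names, frozenset of rows).


def cross(r, s):
    """R x S: every field of R followed by every field of S; one row per pair."""
    (fields_r, rows_r), (fields_s, rows_s) = r, s
    return fields_r + fields_s, frozenset(a + b for a in rows_r for b in rows_s)


def rename(rel, renaming):
    """rho with a positional renaming list such as {1: "sid1", 5: "sid2"}.
    Positions are 1-based, as in the text; the rows are unchanged."""
    fields, rows = rel
    new_fields = list(fields)
    for position, new_name in renaming.items():
        new_fields[position - 1] = new_name
    if len(set(new_fields)) != len(new_fields):  # extra check in this sketch
        raise ValueError(f"field names still conflict: {new_fields}")
    return tuple(new_fields), rows


# Hypothetical instances in the spirit of S1 and R1.
s1 = (("sid", "sname", "rating", "age"),
      frozenset({(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}))
r1 = (("sid", "bid", "day"),
      frozenset({(22, 101, "10/10/96"), (58, 103, "11/12/96")}))

product = cross(s1, r1)
print(product[0])  # ('sid', 'sname', 'rating', 'age', 'sid', 'bid', 'day')
c = rename(product, {1: "sid1", 5: "sid2"})  # rho(C(1 -> sid1, 5 -> sid2), S1 x R1)
print(c[0])        # ('sid1', 'sname', 'rating', 'age', 'sid2', 'bid', 'day')
print(len(c[1]))   # 6 = 3 sailors x 2 reservations
```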
It is customary to include some additional operators in the algebra, but they can all be
defined in terms of the operators that we have defined thus far. (In fact, the renaming
operator is only needed for syntactic convenience, and even the ∩ operator is redundant;
R ∩ S can be defined as R − (R − S).) We will consider these additional operators,
and their definition in terms of the basic operators, in the next two subsections.
4.2.4 Joins
The join operation is one of the most useful operations in relational algebra and is
the most commonly used way to combine information from two or more relations.
Although a join can be defined as a cross-product followed by selections and projections,
joins arise much more frequently in practice than plain cross-products. Further, the
result of a cross-product is typically much larger than the result of a join, and it
is very important to recognize joins and implement them without materializing the
underlying cross-product (by applying the selections and projections ‘on-the-fly’). For
these reasons, joins have received a lot of attention, and there are several variants of
the join operation.
Condition Joins
The most general version of the join operation accepts a join condition c and a pair of
relation instances as arguments, and returns a relation instance. The join condition is
identical to a selection condition in form. The operation is defined as follows:
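$R \bowtie_c S = \sigma_c(R \times S)$
Thus a condition join is defined to be a cross-product followed by a selection. The
condition c can refer to attributes of both R and S; the reference to an
attribute of a relation, say R, can be by position (of the form R.i) or by name (of the
form R.name).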
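The definition can be transcribed literally under the same illustrative representation. This version materializes the full cross-product before filtering, which is exactly what the text says real systems avoid, so it shows the meaning of a condition join rather than an efficient way to compute one. The condition S1.sid < R1.sid is an example chosen for this sketch, and the rows are hypothetical.

```python
# Illustrative representation (an assumption): (field names, frozenset of rows).


def cross(r, s):
    """R x S: fields of R followed by fields of S, one row per pair of rows."""
    (fields_r, rows_r), (fields_s, rows_s) = r, s
    return fields_r + fields_s, frozenset(a + b for a in rows_r for b in rows_s)


def condition_join(r, s, cond):
    """R join_c S = sigma_c(R x S). cond sees the concatenated row, so it can
    refer to fields of both R and S by position (0-based in Python)."""
    fields, rows = cross(r, s)
    return fields, frozenset(row for row in rows if cond(row))


# Hypothetical instances in the spirit of S1 and R1.
s1 = (("sid", "sname", "rating", "age"),
      frozenset({(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}))
r1 = (("sid", "bid", "day"),
      frozenset({(22, 101, "10/10/96"), (58, 103, "11/12/96")}))

# S1 join_{S1.sid < R1.sid} R1: field 1 of the product is S1.sid, field 5 is R1.sid.
joined = condition_join(s1, r1, lambda row: row[0] < row[4])
for row in sorted(joined[1]):
    print(row)
# (22, 'Dustin', 7, 45.0, 58, 103, '11/12/96')
# (31, 'Lubber', 8, 55.5, 58, 103, '11/12/96')
```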
Equijoin
A common special case of the join operation R ⋈ S is when the join condition consists
solely of equalities (connected by ∧) of the form R.name1 = S.name2, that is,
equalities between two fields in R and S. In this case, obviously, there is some redundancy
in retaining both attributes in the result. For join conditions that contain only
such equalities, the join operation is refined by doing an additional projection in which
S.name2 is dropped. The join operation with this refinement is called equijoin.
The schema of the result of an equijoin contains the fields of R (with the same names
and domains as in R) followed by the fields of S that do not appear in the join
conditions. If this set of fields in the result relation includes two fields that inherit the
same name from R and S, they are unnamed in the result relation.
We illustrate $S1 \bowtie_{R.sid=S.sid} R1$ in Figure 4.13. Notice that only one field called sid
appears in the result.
Natural Join
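A further special case of the join operation R ⋈ S is an equijoin in which equalities
are specified on all fields having the same name in R and S. In this case, we can simply
omit the join condition; the default is that the join condition is a collection of
equalities on all common fields. The equijoin expression $S1 \bowtie_{R.sid=S.sid} R1$
is actually a natural join and can simply be denoted as S1 ⋈ R1, since the only
common field is sid. If two relations R and S have no attributes in common, R ⋈ S
is simply the cross-product.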
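Under the same illustrative representation, a natural join can be sketched as an equijoin on all common field names that keeps only one copy of each common field. When there are no common fields, the list of equalities is empty and every pair of rows qualifies, which reproduces the cross-product. Function names and rows are hypothetical.

```python
# Illustrative representation (an assumption): (field names, frozenset of rows).


def natural_join(r, s):
    """R join S with equalities on every field name common to R and S.
    The result has R's fields followed by S's remaining fields, so each
    common field appears only once (the equijoin refinement)."""
    (fields_r, rows_r), (fields_s, rows_s) = r, s
    common = [f for f in fields_r if f in fields_s]
    extra = [i for i, f in enumerate(fields_s) if f not in common]
    pairs = [(fields_r.index(f), fields_s.index(f)) for f in common]
    rows = frozenset(
        a + tuple(b[i] for i in extra)
        for a in rows_r for b in rows_s
        if all(a[i] == b[j] for i, j in pairs))  # no common fields: always true
    return fields_r + tuple(fields_s[i] for i in extra), rows


# Hypothetical instances in the spirit of S1 and R1, plus a relation with no common fields.
s1 = (("sid", "sname", "rating", "age"),
      frozenset({(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}))
r1 = (("sid", "bid", "day"),
      frozenset({(22, 101, "10/10/96"), (58, 103, "11/12/96")}))
colors = (("color",), frozenset({("red",), ("green",)}))

j = natural_join(s1, r1)
print(j[0])  # ('sid', 'sname', 'rating', 'age', 'bid', 'day')
for row in sorted(j[1]):
    print(row)
# (22, 'Dustin', 7, 45.0, 101, '10/10/96')
# (58, 'Rusty', 10, 35.0, 103, '11/12/96')
no_common = natural_join(s1, colors)
print(len(no_common[1]))  # 6, the same as the cross-product
```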
4.2.5 Division
The division operator is useful for expressing certain kinds of queries, for example:
“Find the names of sailors who have reserved all boats.” Understanding how to use
the basic operators of the algebra to define division is a useful exercise. However,
the division operator does not have the same importance as the other operators—it
is not needed as often, and database systems do not try to exploit the semantics of
division by implementing it as a distinct operator (as, for example, is done with the
join operator).
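Consider two relation instances A and B in which A has (exactly) two fields x and y and
B has just one field y, with the same domain as in A. One way to understand division is
as follows. For each x value in (the first column of) A, consider the set of y values that
appear in (the second field of) tuples of A with that x value. If this set contains (all y
values in) B, the x value is in the result of A/B.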
An analogy with integer division may also help to understand division. For integers A
and B, A/B is the largest integer Q such that Q ∗ B ≤ A. For relation instances A
and B, A/B is the largest relation instance Q such that Q × B ⊆ A.
Expressing A/B in terms of the basic algebra operators is an interesting exercise, and
the reader should try to do this before reading further. The basic idea is to compute
all x values in A that are not disqualified. An x value is disqualified if by attaching a
y value from B, we obtain a tuple (x,y) that is not in A. We can compute the disqualified
x values using the algebra expression
$\pi_x((\pi_x(A) \times B) - A)$
Thus we can define A/B as the x values in A that are not disqualified:
$A/B = \pi_x(A) - \pi_x((\pi_x(A) \times B) - A)$
To understand the division operation in full generality, we have to consider the case
when both x and y are replaced by a set of attributes. The generalization is straightforward
and is left as an exercise for the reader. We will discuss two additional examples
illustrating division (Queries Q9 and Q10) later in this section.
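The two readings of division can be checked against each other with a small Python sketch, assuming A is a set of (x, y) pairs and B a set of y values. `divide_by_definition` tests every y in B directly; `divide_by_basic_ops` uses only projection, cross-product, and set-difference, following the expression for A/B. The sample A and divisors are hypothetical.

```python
# Illustrative representation (an assumption): A is a set of (x, y) pairs and
# B is a set of y values, i.e., the two-field / one-field case in the text.


def divide_by_definition(a, b):
    """x is in A/B iff (x, y) is in A for every y in B."""
    xs = {x for x, _ in a}
    return {x for x in xs if all((x, y) in a for y in b)}


def divide_by_basic_ops(a, b):
    """A/B = pi_x(A) - pi_x((pi_x(A) x B) - A), using only projection,
    cross-product, and set-difference."""
    pi_x_a = {x for x, _ in a}
    all_pairs = {(x, y) for x in pi_x_a for y in b}  # pi_x(A) x B
    disqualified = {x for x, _ in all_pairs - a}     # pi_x(... - A)
    return pi_x_a - disqualified


# Hypothetical A(sno, pno) with three candidate divisors B.
a = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"), ("s3", "p2"), ("s4", "p2"), ("s4", "p4")}
for b in ({"p2"}, {"p2", "p4"}, {"p1", "p2", "p4"}):
    assert divide_by_definition(a, b) == divide_by_basic_ops(a, b)
    print(sorted(b), "->", sorted(divide_by_basic_ops(a, b)))
# ['p2'] -> ['s1', 's2', 's3', 's4']
# ['p2', 'p4'] -> ['s1', 's4']
# ['p1', 'p2', 'p4'] -> ['s1']
```

The result also satisfies the integer-division analogy: pairing every x in A/B with every y in B yields only tuples already in A.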
We now present several examples to illustrate how to write queries in relational algebra.
We use the Sailors, Reserves, and Boats schema for all our examples in this section.
We will use parentheses as needed to make our algebra expressions unambiguous. Note
that all the example queries in this chapter are given a unique query number. The
query numbers are kept unique across both this chapter and the SQL query chapter
(Chapter 5). This numbering makes it easy to identify a query when it is revisited in
the context of relational calculus and SQL and to compare different ways of writing
the same query. (All references to a query can be found in the subject index.)
In the rest of this chapter (and in Chapter 5), we illustrate queries using the instances
S3 of Sailors, R2 of Reserves, and B1 of Boats, shown in Figures 4.15, 4.16, and 4.17,
respectively.
(Q1) Find the names of sailors who have reserved boat 103.
$\pi_{sname}((\sigma_{bid=103}\,Reserves) \bowtie Sailors)$
We first compute the set of tuples in Reserves with bid = 103 and then take the
natural join of this set with Sailors. This expression can be evaluated on instances
of Reserves and Sailors. Evaluated on the instances R2 and S3, it yields a relation
that contains just one field, called sname, and three tuples (Dustin), (Horatio), and
(Lubber). (Observe that there are two sailors called Horatio, and only one of them has
reserved a red boat.)
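We can break this query into smaller pieces using the renaming operator ρ:
$\rho(Temp1, \sigma_{bid=103}\,Reserves)$
$\rho(Temp2, Temp1 \bowtie Sailors)$
$\pi_{sname}(Temp2)$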
Notice that because we are only using ρ to give names to intermediate relations, the
renaming list is optional and is omitted. Temp1 denotes an intermediate relation that
identifies reservations of boat 103. Temp2 is another intermediate relation, and it
denotes sailors who have made a reservation in the set Temp1. The instances of these
relations when evaluating this query on the instances R2 and S3 are illustrated in
Figures 4.18 and 4.19. Finally, we extract the sname column from Temp2.
The version of the query using ρ is essentially the same as the original query; the use
of ρ is just syntactic sugar. However, there are indeed several distinct ways to write a
query in relational algebra. Here is another way to write this query:
$\pi_{sname}(\sigma_{bid=103}(Reserves \bowtie Sailors))$
In this version we first compute the natural join of Reserves and Sailors and then apply
the selection and the projection.
This example offers a glimpse of the role played by algebra in a relational DBMS.
Queries are expressed by users in a language such as SQL. The DBMS translates an
SQL query into (an extended form of) relational algebra, and then looks for other
algebra expressions that will produce the same answers but are cheaper to evaluate. If
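the user’s query is first translated into the expression
$\pi_{sname}(\sigma_{bid=103}(Reserves \bowtie Sailors))$
a good query optimizer will find the equivalent expression
$\pi_{sname}((\sigma_{bid=103}\,Reserves) \bowtie Sailors)$
Further, the optimizer will recognize that the second expression is likely to be less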
expensive to compute because the sizes of intermediate relations are smaller, thanks
to the early use of selection.
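To make the optimizer's reasoning concrete, the sketch below evaluates both forms of Q1 with Python sets and compares the size of the join each one builds. The rows are a reconstruction assumed to match S3 and R2 (those figures are not reproduced here); they agree with the answers this section quotes, and the day field of Reserves is omitted because Q1 does not use it. The helper `join_on_sid` is hypothetical.

```python
# Reconstructed instances, assumed to match S3 and R2; Reserves.day is omitted.
sailors = {(22, "Dustin", 7, 45.0), (29, "Brutus", 1, 33.0), (31, "Lubber", 8, 55.5),
           (32, "Andy", 8, 25.5), (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0),
           (71, "Zorba", 10, 16.0), (74, "Horatio", 9, 35.0), (85, "Art", 3, 25.5),
           (95, "Bob", 3, 63.5)}                                     # (sid, sname, rating, age)
reserves = {(22, 101), (22, 102), (22, 103), (22, 104), (31, 102),
            (31, 103), (31, 104), (64, 101), (64, 102), (74, 103)}  # (sid, bid)


def join_on_sid(res, sai):
    """Natural join of Reserves and Sailors on sid: (sid, bid, sname, rating, age)."""
    return {(r[0], r[1]) + s[1:] for r in res for s in sai if r[0] == s[0]}


# User's form: pi_sname(sigma_bid=103(Reserves join Sailors))
big = join_on_sid(reserves, sailors)
answer1 = {(t[2],) for t in big if t[1] == 103}

# Optimized form: pi_sname((sigma_bid=103 Reserves) join Sailors)
small = join_on_sid({r for r in reserves if r[1] == 103}, sailors)
answer2 = {(t[2],) for t in small}

print(answer1 == answer2)  # True
print(sorted(answer1))     # [('Dustin',), ('Horatio',), ('Lubber',)]
print(len(big), len(small))  # 10 3
```

On these rows both forms return the same answer, but the first joins all ten reservations while the second joins only the three for boat 103.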
(Q2) Find the names of sailors who have reserved a red boat.
$\pi_{sname}((\sigma_{color='red'}\,Boats) \bowtie Reserves \bowtie Sailors)$
This query involves a series of two joins. First we choose (tuples describing) red boats.
Then we join this set with Reserves (natural join, with equality specified on the bid
column) to identify reservations of red boats. Next we join the resulting intermediate
relation with Sailors (natural join, with equality specified on the sid column) to retrieve
the names of sailors who have made reservations of red boats. Finally, we project the
sailors’ names. The answer, when evaluated on the instances B1, R2 and S3, contains
the names Dustin, Horatio, and Lubber. An equivalent expression is:
$\pi_{sname}(\pi_{sid}((\pi_{bid}\,\sigma_{color='red'}\,Boats) \bowtie Reserves) \bowtie Sailors)$
The reader is invited to rewrite both of these queries by using ρ to make the intermediate
relations explicit and to compare the schemas of the intermediate relations. The
second expression generates intermediate relations with fewer fields (and is therefore
likely to result in intermediate relation instances with fewer tuples, as well). A relational
query optimizer would try to arrive at the second expression if it is given the
first.
This query is very similar to the query we used to compute sailors who reserved red
boats. On instances B1, R2, and S3, the query will return the colors green and red.
(Q4) Find the names of sailors who have reserved at least one boat.
$\pi_{sname}(Sailors \bowtie Reserves)$
The join of Sailors and Reserves creates an intermediate relation in which tuples consist
of a Sailors tuple ‘attached to’ a Reserves tuple. A Sailors tuple appears in (some
tuple of) this intermediate relation only if at least one Reserves tuple has the same
sid value, that is, the sailor has made some reservation. The answer, when evaluated
on the instances B1, R2 and S3, contains the three tuples (Dustin), (Horatio), and
(Lubber). Even though there are two sailors called Horatio who have reserved a boat,
the answer contains only one copy of the tuple (Horatio), because the answer is a
relation, i.e., a set of tuples, without any duplicates.
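A sketch of Q4 on rows assumed to match S3 and R2 (only sid, sname, and bid are kept) shows the duplicate elimination described above: two different sids named Horatio contribute to the join, yet the projected set holds a single ('Horatio',) tuple.

```python
# Reconstructed instances, assumed to match S3 and R2 (only the fields used here).
sailors = {(22, "Dustin"), (29, "Brutus"), (31, "Lubber"), (32, "Andy"),
           (58, "Rusty"), (64, "Horatio"), (71, "Zorba"), (74, "Horatio"),
           (85, "Art"), (95, "Bob")}                                # (sid, sname)
reserves = {(22, 101), (22, 102), (22, 103), (22, 104), (31, 102),
            (31, 103), (31, 104), (64, 101), (64, 102), (74, 103)}  # (sid, bid)

# Sailors join Reserves on sid: (sid, sname, bid)
joined = {(sid, name, bid) for sid, name in sailors
          for r_sid, bid in reserves if sid == r_sid}
q4 = {(name,) for _, name, _ in joined}  # pi_sname: a set, so no duplicates

print(sorted(q4))  # [('Dustin',), ('Horatio',), ('Lubber',)]
print(sorted({sid for sid, name, _ in joined if name == "Horatio"}))  # [64, 74]
```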
At this point it is worth remarking on how frequently the natural join operation is
used in our examples. This frequency is more than just a coincidence based on the
set of queries that we have chosen to discuss; the natural join is a very natural and
widely used operation. In particular, natural join is frequently used when joining two
tables on a foreign key field. In Query Q4, for example, the join equates the sid fields
of Sailors and Reserves, and the sid field of Reserves is a foreign key that refers to the
sid field of Sailors.
(Q5) Find the names of sailors who have reserved a red or a green boat.
$\rho(Tempboats, (\sigma_{color='red'}\,Boats) \cup (\sigma_{color='green'}\,Boats))$
$\pi_{sname}(Tempboats \bowtie Reserves \bowtie Sailors)$
We identify the set of all boats that are either red or green (Tempboats, which contains
boats with the bids 102, 103, and 104 on instances B1, R2, and S3). Then we join with
Reserves to identify sids of sailors who have reserved one of these boats; this gives us
sids 22, 31, 64, and 74 over our example instances. Finally, we join (an intermediate
relation containing this set of sids) with Sailors to find the names of Sailors with these
sids. This gives us the names Dustin, Horatio, and Lubber on the instances B1, R2,
and S3. Another equivalent definition is the following:
$\rho(Tempboats, (\sigma_{color='red' \lor color='green'}\,Boats))$
$\pi_{sname}(Tempboats \bowtie Reserves \bowtie Sailors)$
(Q6) Find the names of sailors who have reserved a red and a green boat. It is tempting
to try to do this by simply replacing ∪ by ∩ in the definition of Tempboats:
$\rho(Tempboats2, (\sigma_{color='red'}\,Boats) \cap (\sigma_{color='green'}\,Boats))$
$\pi_{sname}(Tempboats2 \bowtie Reserves \bowtie Sailors)$
However, this solution is incorrect: it instead tries to compute sailors who have reserved
a boat that is both red and green. (Since bid is a key for Boats, a boat can
be only one color; this query will always return an empty answer set.) The correct
approach is to find sailors who have reserved a red boat, then sailors who have reserved
a green boat, and then take the intersection of these two sets:
$\rho(Tempred, \pi_{sid}((\sigma_{color='red'}\,Boats) \bowtie Reserves))$
$\rho(Tempgreen, \pi_{sid}((\sigma_{color='green'}\,Boats) \bowtie Reserves))$
$\pi_{sname}((Tempred \cap Tempgreen) \bowtie Sailors)$
The two temporary relations compute the sids of sailors, and their intersection identifies
sailors who have reserved both red and green boats. On instances B1, R2, and S3, the
sids of sailors who have reserved a red boat are 22, 31, and 64. The sids of sailors who
have reserved a green boat are 22, 31, and 74. Thus, sailors 22 and 31 have reserved
both a red boat and a green boat; their names are Dustin and Lubber.
This formulation of Query Q6 can easily be adapted to find sailors who have reserved
red or green boats (Query Q5); just replace ∩ by ∪:
$\rho(Tempred, \pi_{sid}((\sigma_{color='red'}\,Boats) \bowtie Reserves))$
$\rho(Tempgreen, \pi_{sid}((\sigma_{color='green'}\,Boats) \bowtie Reserves))$
$\pi_{sname}((Tempred \cup Tempgreen) \bowtie Sailors)$
In the above formulations of Queries Q5 and Q6, the fact that sid (the field over which
we compute union or intersection) is a key for Sailors is very important. Consider the
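following attempt to answer Query Q6:
$\rho(Tempred, \pi_{sname}((\sigma_{color='red'}\,Boats) \bowtie Reserves \bowtie Sailors))$
$\rho(Tempgreen, \pi_{sname}((\sigma_{color='green'}\,Boats) \bowtie Reserves \bowtie Sailors))$
$Tempred \cap Tempgreen$
This attempt is incorrect for a rather subtle reason. Two distinct sailors with the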
same name, such as Horatio in our example instances, may have reserved red and
green boats, respectively. In this case, the name Horatio will (incorrectly) be included
in the answer even though no one individual called Horatio has reserved a red boat
and a green boat. The cause of this error is that sname is being used to identify sailors
(while doing the intersection) in this version of the query, but sname is not a key.
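The sketch below evaluates Q5, the correct Q6, and both incorrect Q6 attempts on rows assumed to match S3, R2, and B1 (only the fields needed are kept). Sets of sids play the roles of Tempred and Tempgreen; the helper names are hypothetical.

```python
# Reconstructed instances, assumed to match S3, R2, and B1 (only needed fields).
sailors = {(22, "Dustin"), (29, "Brutus"), (31, "Lubber"), (32, "Andy"),
           (58, "Rusty"), (64, "Horatio"), (71, "Zorba"), (74, "Horatio"),
           (85, "Art"), (95, "Bob")}                                # (sid, sname)
reserves = {(22, 101), (22, 102), (22, 103), (22, 104), (31, 102),
            (31, 103), (31, 104), (64, 101), (64, 102), (74, 103)}  # (sid, bid)
boats = {(101, "blue"), (102, "red"), (103, "green"), (104, "red")}  # (bid, color)


def bids_of(color):
    """pi_bid(sigma_color=c Boats)"""
    return {bid for bid, c in boats if c == color}


def sids_reserving(color):
    """pi_sid((sigma_color=c Boats) join Reserves)"""
    return {sid for sid, bid in reserves if bid in bids_of(color)}


def names_of(sids):
    """pi_sname(sids join Sailors)"""
    return {name for sid, name in sailors if sid in sids}


# Incorrect Q6: boats that are both red and green (bid is a key, so none).
tempboats2 = bids_of("red") & bids_of("green")

q5 = names_of(sids_reserving("red") | sids_reserving("green"))  # union of sids
q6 = names_of(sids_reserving("red") & sids_reserving("green"))  # intersection of sids
# Incorrect Q6 attempt: intersect on sname, which is not a key.
q6_wrong = names_of(sids_reserving("red")) & names_of(sids_reserving("green"))

print(tempboats2)        # set()
print(sorted(q5))        # ['Dustin', 'Horatio', 'Lubber']
print(sorted(q6))        # ['Dustin', 'Lubber']
print(sorted(q6_wrong))  # ['Dustin', 'Horatio', 'Lubber']
```

The last line shows the pitfall: Horatio appears because sailor 64 reserved a red boat and a different sailor, 74, reserved a green one.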
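(Q7) Find the names of sailors who have reserved at least two boats.
$\rho(Reservations, \pi_{sid,sname,bid}(Sailors \bowtie Reserves))$
$\rho(Reservationpairs(1 \to sid1, 2 \to sname1, 3 \to bid1, 4 \to sid2, 5 \to sname2, 6 \to bid2), Reservations \times Reservations)$
$\pi_{sname1}(\sigma_{(sid1=sid2) \land (bid1 \neq bid2)}\,Reservationpairs)$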
First we compute tuples of the form (sid,sname,bid), where sailor sid has made a
reservation for boat bid; this set of tuples is the temporary relation Reservations.
Next we find all pairs of Reservations tuples where the same sailor has made both
reservations and the boats involved are distinct. Here is the central idea: In order
to show that a sailor has reserved two boats, we must find two Reservations tuples
involving the same sailor but distinct boats. Over instances B1, R2, and S3, the
sailors with sids 22, 31, and 64 have each reserved at least two boats. Finally, we
project the names of such sailors to obtain the answer, containing the names Dustin,
Horatio, and Lubber.
Notice that we included sid in Reservations because it is the key field identifying sailors,
and we need it to check that two Reservations tuples involve the same sailor. As noted
in the previous example, we can’t use sname for this purpose.
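A sketch of Q7 with Python sets, on rows assumed to match S3 and R2: the nested comprehension plays the role of Reservations × Reservations with the renamed fields, and the filter is the selection on sid1 = sid2 and bid1 ≠ bid2.

```python
# Reconstructed instances, assumed to match S3 and R2 (only needed fields).
sailors = {(22, "Dustin"), (29, "Brutus"), (31, "Lubber"), (32, "Andy"),
           (58, "Rusty"), (64, "Horatio"), (71, "Zorba"), (74, "Horatio"),
           (85, "Art"), (95, "Bob")}                                # (sid, sname)
reserves = {(22, 101), (22, 102), (22, 103), (22, 104), (31, 102),
            (31, 103), (31, 104), (64, 101), (64, 102), (74, 103)}  # (sid, bid)

# rho(Reservations, pi_sid,sname,bid(Sailors join Reserves))
reservations = {(sid, name, bid) for sid, name in sailors
                for r_sid, bid in reserves if sid == r_sid}

# Reservations x Reservations as (sid1, sname1, bid1, sid2, sname2, bid2),
# then sigma_(sid1 = sid2 and bid1 != bid2): same sailor, two distinct boats.
pairs = {(sid1, sname1) for (sid1, sname1, bid1) in reservations
         for (sid2, _sname2, bid2) in reservations
         if sid1 == sid2 and bid1 != bid2}

q7 = {sname1 for _, sname1 in pairs}      # pi_sname1
print(sorted({sid for sid, _ in pairs}))  # [22, 31, 64]
print(sorted(q7))                         # ['Dustin', 'Horatio', 'Lubber']
```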
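(Q8) Find the sids of sailors with age over 20 who have not reserved a red boat.
$\pi_{sid}(\sigma_{age>20}\,Sailors) - \pi_{sid}((\sigma_{color='red'}\,Boats) \bowtie Reserves \bowtie Sailors)$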
This query illustrates the use of the set-difference operator. Again, we use the fact
that sid is the key for Sailors. We first identify sailors aged over 20 (over instances B1,
R2, and S3, sids 22, 29, 31, 32, 58, 64, 74, 85, and 95) and then discard those who
have reserved a red boat (sids 22, 31, and 64), to obtain the answer (sids 29, 32, 58, 74,
85, and 95). If we want to compute the names of such sailors, we must first compute
their sids (as shown above), and then join with Sailors and project the sname values.
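Q8 can be sketched as a Python set-difference on rows assumed to match S3, R2, and B1 (sid, sname, and age from Sailors; bid and color from Boats). Joining the red-boat reservations with Sailors as well, as the algebra expression does, would not change the set here, since every sid in these Reserves rows also appears in Sailors.

```python
# Reconstructed instances, assumed to match S3, R2, and B1 (only needed fields).
sailors = {(22, "Dustin", 45.0), (29, "Brutus", 33.0), (31, "Lubber", 55.5),
           (32, "Andy", 25.5), (58, "Rusty", 35.0), (64, "Horatio", 35.0),
           (71, "Zorba", 16.0), (74, "Horatio", 35.0), (85, "Art", 25.5),
           (95, "Bob", 63.5)}                                       # (sid, sname, age)
reserves = {(22, 101), (22, 102), (22, 103), (22, 104), (31, 102),
            (31, 103), (31, 104), (64, 101), (64, 102), (74, 103)}  # (sid, bid)
boats = {(101, "blue"), (102, "red"), (103, "green"), (104, "red")}  # (bid, color)

older = {sid for sid, _, age in sailors if age > 20}   # pi_sid(sigma_age>20 Sailors)
red_bids = {bid for bid, color in boats if color == "red"}
red_reservers = {sid for sid, bid in reserves if bid in red_bids}
q8 = older - red_reservers                             # set-difference

# To get names: join the answer sids with Sailors and project sname.
q8_names = sorted(name for sid, name, _ in sailors if sid in q8)

print(sorted(q8))  # [29, 32, 58, 74, 85, 95]
print(q8_names)    # ['Andy', 'Art', 'Bob', 'Brutus', 'Horatio', 'Rusty']
```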
(Q9) Find the names of sailors who have reserved all boats. The use of the word all
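(or every) is a good indication that the division operation might be applicable:
$\rho(Tempsids, (\pi_{sid,bid}\,Reserves) / (\pi_{bid}\,Boats))$
$\pi_{sname}(Tempsids \bowtie Sailors)$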
The intermediate relation Tempsids is defined using division, and computes the set of
sids of sailors who have reserved every boat (over instances B1, R2, and S3, this is just
sid 22). Notice how we define the two relations that the division operator (/) is applied
to—the first relation has the schema (sid,bid) and the second has the schema (bid).
Division then returns all sids such that there is a tuple (sid,bid) in the first relation for
each bid in the second. Joining Tempsids with Sailors is necessary to associate names
with the selected sids; for sailor 22, the name is Dustin.
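(Q10) Find the names of sailors who have reserved all boats called Interlake.
$\rho(Tempsids, (\pi_{sid,bid}\,Reserves) / (\pi_{bid}(\sigma_{bname='Interlake'}\,Boats)))$
$\pi_{sname}(Tempsids \bowtie Sailors)$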
The only difference with respect to the previous query is that now we apply a selection
to Boats, to ensure that we compute only bids of boats named Interlake in defining the
second argument to the division operator. Over instances B1, R2, and S3, Tempsids
evaluates to sids 22 and 64, and the answer contains their names, Dustin and Horatio.
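Q9 and Q10 differ only in the divisor, which the sketch below makes explicit. It uses the basic-operator expression for A/B on rows assumed to match S3, R2, and B1; Reserves is kept only as its (sid, bid) projection, as in the algebra. The helper `divide` is hypothetical.

```python
# Reconstructed instances, assumed to match S3, R2, and B1 (only needed fields).
sailors = {(22, "Dustin"), (29, "Brutus"), (31, "Lubber"), (32, "Andy"),
           (58, "Rusty"), (64, "Horatio"), (71, "Zorba"), (74, "Horatio"),
           (85, "Art"), (95, "Bob")}                                # (sid, sname)
reserves = {(22, 101), (22, 102), (22, 103), (22, 104), (31, 102),
            (31, 103), (31, 104), (64, 101), (64, 102), (74, 103)}  # pi_sid,bid Reserves
boats = {(101, "Interlake"), (102, "Interlake"), (103, "Clipper"),
         (104, "Marine")}                                           # (bid, bname)


def divide(a, b):
    """A/B for A(x, y) and B(y): pi_x(A) - pi_x((pi_x(A) x B) - A)."""
    xs = {x for x, _ in a}
    disqualified = {x for x, _ in {(x, y) for x in xs for y in b} - a}
    return xs - disqualified


name_of = dict(sailors)  # sid -> sname (sid is a key)
all_bids = {bid for bid, _ in boats}                                 # pi_bid Boats
interlake_bids = {bid for bid, bname in boats if bname == "Interlake"}

q9 = divide(reserves, all_bids)         # Tempsids for Q9
q10 = divide(reserves, interlake_bids)  # Tempsids for Q10
print(sorted(q9), [name_of[s] for s in sorted(q9)])    # [22] ['Dustin']
print(sorted(q10), [name_of[s] for s in sorted(q10)])  # [22, 64] ['Dustin', 'Horatio']
```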
4.3 RELATIONAL CALCULUS
The variant of the calculus that we present in detail is called the tuple relational
calculus (TRC). Variables in TRC take on tuples as values. In another variant, called
the domain relational calculus (DRC), the variables range over field values. TRC has
had more of an influence on SQL, while DRC has strongly influenced QBE. We discuss
DRC in Section 4.3.2. (The material on DRC is referred to in the chapter on QBE; with
the exception of this chapter, the material on DRC and TRC can be omitted without
loss of continuity.)
(Q11) Find all sailors with a rating above 7.
$\{S \mid S \in Sailors \land S.rating > 7\}$
When this query is evaluated on an instance of the Sailors relation, the tuple variable
S is instantiated successively with each tuple, and the test S.rating>7 is applied. The
answer contains those instances of S that pass this test. On instance S3 of Sailors, the
answer contains Sailors tuples with sid 31, 32, 58, 71, and 74.
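In Python, a tuple variable can be imitated by a loop variable that ranges over the tuples of a relation. The sketch uses `collections.namedtuple` so that the test reads S.rating as in the calculus; the rows are assumed to match S3.

```python
from collections import namedtuple

# The loop variable s plays the role of the tuple variable S.
Sailor = namedtuple("Sailor", "sid sname rating age")

# Reconstructed instance, assumed to match S3.
sailors = {Sailor(22, "Dustin", 7, 45.0), Sailor(29, "Brutus", 1, 33.0),
           Sailor(31, "Lubber", 8, 55.5), Sailor(32, "Andy", 8, 25.5),
           Sailor(58, "Rusty", 10, 35.0), Sailor(64, "Horatio", 7, 35.0),
           Sailor(71, "Zorba", 10, 16.0), Sailor(74, "Horatio", 9, 35.0),
           Sailor(85, "Art", 3, 25.5), Sailor(95, "Bob", 3, 63.5)}

# {S | S in Sailors and S.rating > 7}
q11 = {s for s in sailors if s.rating > 7}
print(sorted(s.sid for s in q11))  # [31, 32, 58, 71, 74]
```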
We now define these concepts formally, beginning with the notion of a formula. Let
Rel be a relation name, R and S be tuple variables, a an attribute of R, and b an
attribute of S. Let op denote an operator in the set {<, >, =, ≤, ≥, ≠}. An atomic
formula is one of the following:
R ∈ Rel
R.a op S.b
R.a op constant, or constant op R.a
A formula is recursively defined to be one of the following, where p and q are themselves
formulas, and p(R) denotes a formula in which the variable R appears:
any atomic formula
¬p, p ∧ q, p ∨ q, or p ⇒ q
∃R(p(R)), where R is a tuple variable
∀R(p(R)), where R is a tuple variable
In the last two clauses above, the quantifiers ∃ and ∀ are said to bind the variable
R. A variable is said to be free in a formula or subformula (a formula contained in a
larger formula) if the (sub)formula does not contain an occurrence of a quantifier that
binds it.
We will not define types of variables formally, but the type of a variable should be clear
in most cases, and the important point to note is that comparisons of values having
different types should always fail. (In discussions of relational calculus, the simplifying
assumption is often made that there is a single domain of constants and that this is
the domain associated with each field of each relation.)
A TRC query is defined to be an expression of the form {T | p(T)}, where T is the only
free variable in the formula p.
What does a TRC query mean? More precisely, what is the set of answer tuples for a
given TRC query? The answer to a TRC query {T | p(T)}, as we noted earlier, is the
set of all tuples t for which the formula p(T ) evaluates to true with variable T assigned
the tuple value t. To complete this definition, we must state which assignments of tuple
values to the free variables in a formula make the formula evaluate to true.
A query is evaluated on a given instance of the database. Let each free variable in a
formula F be bound to a tuple value. For the given assignment of tuples to variables,
with respect to the given database instance, F evaluates to (or simply ‘is’) true if one
of the following holds:
F is a comparison R.a op S.b, R.a op constant, or constant op R.a, and the tuples
assigned to R and S have field values R.a and S.b that make the comparison true.
F is of the form ¬p, and p is not true; or of the form p ∧ q, and both p and q are
true; or of the form p ∨ q, and one of them is true; or of the form p ⇒ q, and q is
true whenever p is true (where ‘whenever’ should be read more precisely as ‘for all
assignments of tuples to the free variables’).
F is of the form ∃R(p(R)), and there is some assignment of tuples to the free
variables in p(R), including the variable R, that makes the formula p(R) true. (Note
that some of the free variables in p(R), e.g., the variable R itself, may be bound in F.)
F is of the form ∀R(p(R)), and there is some assignment of tuples to the free
variables in p(R) that makes the formula p(R) true no matter what tuple is
assigned to R.
We now illustrate the calculus through several examples, using the instances B1 of
Boats, R2 of Reserves, and S3 of Sailors shown in Figures 4.17, 4.16, and 4.15. We will
use parentheses as needed to make our formulas unambiguous. Often, a formula p(R)
includes a condition R ∈ Rel, and the meaning of the phrases some tuple R and for all
tuples R is intuitive. We will use the notation ∃R ∈ Rel(p(R)) for ∃R(R ∈ Rel ∧ p(R)).
Similarly, we use the notation ∀R ∈ Rel(p(R)) for ∀R(R ∈ Rel ⇒ p(R)).
(Q12) Find the names and ages of sailors with a rating above 7.
$\{P \mid \exists S \in Sailors\ (S.rating > 7 \land P.sname = S.sname \land P.age = S.age)\}$
(Q13) Find the sailor name, boat id, and reservation date for each reservation.
$\{P \mid \exists R \in Reserves\ \exists S \in Sailors\ (R.sid = S.sid \land P.bid = R.bid \land P.day = R.day \land P.sname = S.sname)\}$
For each Reserves tuple, we look for a tuple in Sailors with the same sid. Given a
pair of such tuples, we construct an answer tuple P with fields sname, bid, and day by
copying the corresponding fields from these two tuples. This query illustrates how we
can combine values from different relations in each answer tuple. The answer to this
query on instances B1, R2, and S3 is shown in Figure 4.20.
(Q1) Find the names of sailors who have reserved boat 103.
$\{P \mid \exists S \in Sailors\ \exists R \in Reserves\ (R.sid = S.sid \land R.bid = 103 \land P.sname = S.sname)\}$
This query can be read as follows: “Retrieve all sailor tuples for which there exists a
tuple in Reserves, having the same value in the sid field, and with bid = 103.” That
is, for each sailor tuple, we look for a tuple in Reserves that shows that this sailor has
reserved boat 103. The answer tuple P contains just one field, sname.
(Q2) Find the names of sailors who have reserved a red boat.
$\{P \mid \exists S \in Sailors\ \exists R \in Reserves\ (R.sid = S.sid \land P.sname = S.sname \land \exists B \in Boats\ (B.bid = R.bid \land B.color = 'red'))\}$
This query can be read as follows: “Retrieve all sailor tuples S for which there exist
tuples R in Reserves and B in Boats such that S.sid = R.sid, R.bid = B.bid, and
B.color = 'red'.” Another way to write this query, which corresponds more closely to
this reading, is as follows:
$\{P \mid \exists S \in Sailors\ \exists R \in Reserves\ \exists B \in Boats\ (R.sid = S.sid \land B.bid = R.bid \land B.color = 'red' \land P.sname = S.sname)\}$
(Q7) Find the names of sailors who have reserved at least two boats.
$\{P \mid \exists S \in Sailors\ \exists R1 \in Reserves\ \exists R2 \in Reserves\ (S.sid = R1.sid \land R1.sid = R2.sid \land R1.bid \neq R2.bid \land P.sname = S.sname)\}$
Contrast this query with the algebra version and see how much simpler the calculus
version is. In part, this difference is due to the cumbersome renaming of fields in the
algebra version, but the calculus version really is simpler.
(Q9) Find the names of sailors who have reserved all boats.
$\{P \mid \exists S \in Sailors\ \forall B \in Boats\ (\exists R \in Reserves\ (S.sid = R.sid \land R.bid = B.bid \land P.sname = S.sname))\}$
This query was expressed using the division operator in relational algebra. Notice
how easily it is expressed in the calculus. The calculus query directly reflects how we
might express the query in English: “Find sailors S such that for all boats B there is
a Reserves tuple showing that sailor S has reserved boat B.”
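(Q14) Find sailors who have reserved all red boats.
$\{S \mid S \in Sailors \land \forall B \in Boats\ (B.color = 'red' \Rightarrow (\exists R \in Reserves\ (S.sid = R.sid \land R.bid = B.bid)))\}$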
This query can be read as follows: For each candidate (sailor), if a boat is red, the
sailor must have reserved it. That is, for a candidate sailor, a boat being red must
imply the sailor having reserved it. Observe that since we can return an entire sailor
tuple as the answer instead of just the sailor’s name, we have avoided introducing a
new free variable (e.g., the variable P in the previous example) to hold the answer
values. On instances B1, R2, and S3, the answer contains the Sailors tuples with sids
22 and 31.
We can write this query without using implication, by observing that an expression of
the form p ⇒ q is logically equivalent to ¬p ∨ q:
{S | S ∈ Sailors ∧ ∀B ∈ Boats
(B.color ≠ 'red' ∨ (∃R ∈ Reserves(S.sid = R.sid ∧ R.bid = B.bid)))}
This query should be read as follows: “Find sailors S such that for all boats B, either
the boat is not red or a Reserves tuple shows that sailor S has reserved boat B.”
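Over a finite relation instance, Python's any() and all() behave like ∃ and ∀, so the TRC queries above can be checked by brute force. A minimal sketch, assuming small hypothetical instances (not the instances B1, R2, and S3 mentioned above) with fields Sailors(sid, sname, rating, age), Boats(bid, bname, color), and Reserves(sid, bid, day):

```python
from collections import namedtuple

# Hypothetical toy instances, chosen so that each query has a different answer.
Sailor = namedtuple("Sailor", "sid sname rating age")
Boat = namedtuple("Boat", "bid bname color")
Reserve = namedtuple("Reserve", "sid bid day")

sailors = [Sailor(1, "Ann", 7, 30.0), Sailor(2, "Bob", 8, 40.0),
           Sailor(3, "Cy", 9, 25.0), Sailor(4, "Dee", 6, 50.0)]
boats = [Boat(101, "Alpha", "red"), Boat(102, "Beta", "blue"),
         Boat(103, "Gamma", "red"), Boat(104, "Delta", "green")]
reserves = [Reserve(1, b, "d1") for b in (101, 102, 103, 104)]  # Ann: every boat
reserves += [Reserve(2, 101, "d2"), Reserve(2, 103, "d3")]       # Bob: both red boats
reserves += [Reserve(3, 102, "d4"), Reserve(3, 102, "d5")]       # Cy: one boat, twice
reserves += [Reserve(4, 102, "d6"), Reserve(4, 104, "d7")]       # Dee: two non-red boats

# (Q2) ∃R ∈ Reserves ∃B ∈ Boats (R.sid = S.sid ∧ B.bid = R.bid ∧ B.color = 'red')
q2 = {s.sname for s in sailors
      if any(r.sid == s.sid and b.bid == r.bid and b.color == "red"
             for r in reserves for b in boats)}

# (Q7) ∃R1 ∈ Reserves ∃R2 ∈ Reserves (S.sid = R1.sid ∧ R1.sid = R2.sid ∧ R1.bid ≠ R2.bid)
q7 = {s.sname for s in sailors
      if any(s.sid == r1.sid == r2.sid and r1.bid != r2.bid
             for r1 in reserves for r2 in reserves)}

# (Q9) ∀B ∈ Boats ∃R ∈ Reserves (S.sid = R.sid ∧ R.bid = B.bid)
q9 = {s.sname for s in sailors
      if all(any(r.sid == s.sid and r.bid == b.bid for r in reserves)
             for b in boats)}

# All red boats: ∀B ∈ Boats (B.color ≠ 'red' ∨ ∃R ∈ Reserves(...)), i.e. p ⇒ q as ¬p ∨ q.
# The answer holds whole Sailor tuples, as in {S | S ∈ Sailors ∧ ...}.
all_red = {s for s in sailors
           if all(b.color != "red"
                  or any(r.sid == s.sid and r.bid == b.bid for r in reserves)
                  for b in boats)}
```

Here q2 is {"Ann", "Bob"}, q7 adds Dee (Cy's two reservations are for the same boat, so R1.bid ≠ R2.bid fails), q9 is {"Ann"}, and all_red holds the tuples of Ann and Bob.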
A domain variable is a variable that ranges over the values in the domain of some
attribute (e.g., the variable can be assigned an integer if it appears in an attribute
whose domain is the set of integers). A DRC query has the form {(x1, x2, ..., xn) |
p((x1, x2, ..., xn))}, where each xi is either a domain variable or a constant and
p((x1, x2, ..., xn)) denotes a DRC formula whose only free variables are the variables
among the xi, 1 ≤ i ≤ n. The result of this query is the set of all tuples
(x1, x2, ..., xn) for which the formula evaluates to true.
A DRC formula is defined in a manner that is very similar to the definition of a TRC
formula. The main difference is that the variables are now domain variables. Let op
denote an operator in the set {<, >, =, ≤, ≥, ≠} and let X and Y be domain variables.
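An atomic formula in DRC is one of the following:
(x1, x2, ..., xn) ∈ Rel, where Rel is a relation with n attributes and each xi, 1 ≤ i ≤ n, is either a domain variable or a constant
X op Y
X op constant, or constant op X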
A formula is recursively defined to be one of the following, where p and q are themselves
formulas, and p(X) denotes a formula in which the variable X appears:
any atomic formula
¬p, p ∧ q, p ∨ q, or p ⇒ q
∃X(p(X)), where X is a domain variable
∀X(p(X)), where X is a domain variable
The reader is invited to compare this definition with the definition of TRC formulas
and see how closely these two definitions correspond. We will not define the semantics
of DRC formulas formally; this is left as an exercise for the reader.
We now illustrate DRC through several examples. The reader is invited to compare
these with the TRC versions.
Find all sailors with a rating above 7.
{(I, N, T, A) | (I, N, T, A) ∈ Sailors ∧ T > 7}
This differs from the TRC version in giving each attribute a (variable) name. The
condition (I, N, T, A) ∈ Sailors ensures that the domain variables I, N, T, and A are
restricted to be fields of the same tuple. In comparison with the TRC query, we can
say T > 7 instead of S.rating > 7, but we must specify the tuple (I, N, T, A) in the
result, rather than just S.
(Q1) Find the names of sailors who have reserved boat 103.
{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors ∧ ∃Ir, Br, D((Ir, Br, D) ∈ Reserves ∧ Ir = I ∧ Br = 103))}
Notice that only the sname field is retained in the answer and that only N is a free
variable. We use the notation ∃Ir, Br, D(...) as a shorthand for ∃Ir(∃Br(∃D(...))).
Very often, all the quantified variables appear in a single relation, as in this example.
An even more compact notation in this case is ∃(Ir, Br, D) ∈ Reserves. With this
notation, which we will use henceforth, the above query would be as follows:
{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors ∧ ∃(Ir, Br, D) ∈ Reserves(Ir = I ∧ Br = 103))}
The comparison with the corresponding TRC formula should now be straightforward.
This query can also be written as follows; notice the repetition of variable I and the
use of the constant 103:
{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors ∧ ∃D((I, 103, D) ∈ Reserves))}
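(Q2) Find the names of sailors who have reserved a red boat.
{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors ∧ ∃(I, Br, D) ∈ Reserves ∧ ∃(Br, BN, 'red') ∈ Boats)}
(Q7) Find the names of sailors who have reserved at least two boats.
{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors ∧ ∃Br1, Br2, D1, D2((I, Br1, D1) ∈ Reserves ∧ (I, Br2, D2) ∈ Reserves ∧ Br1 ≠ Br2))}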
Notice how the repeated use of variable I ensures that the same sailor has reserved
both the boats in question.
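(Q9) Find the names of sailors who have reserved all boats.
{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors ∧ ∀B, BN, C(¬((B, BN, C) ∈ Boats) ∨ (∃(Ir, Br, D) ∈ Reserves(I = Ir ∧ Br = B))))}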
This query can be read as follows: “Find all values of N such that there is some tuple
(I, N, T, A) in Sailors satisfying the following condition: for every (B, BN, C), either
this is not a tuple in Boats or there is some tuple (Ir, Br, D) in Reserves that proves
that Sailor I has reserved boat B.” The ∀ quantifier allows the domain variables B,
BN , and C to range over all values in their respective attribute domains, and the
pattern ‘¬((B, BN, C) ∈ Boats)∨’ is necessary to restrict attention to those values
that appear in tuples of Boats. This pattern is common in DRC formulas, and the
notation ∀(B, BN, C) ∈ Boats can be used as a shorthand instead. This is similar to
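the notation introduced earlier for ∃. With this notation the query would be written
as follows:
{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors ∧ ∀(B, BN, C) ∈ Boats(∃(Ir, Br, D) ∈ Reserves(I = Ir ∧ Br = B)))}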
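Find sailors who have reserved all red boats.
{(I, N, T, A) | (I, N, T, A) ∈ Sailors ∧ ∀(B, BN, C) ∈ Boats(C = 'red' ⇒ ∃(Ir, Br, D) ∈ Reserves(I = Ir ∧ Br = B))}
Here, we find all sailors such that for every red boat there is a tuple in Reserves that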
shows the sailor has reserved it.
We have presented two formal query languages for the relational model. Are they
equivalent in power? Can every query that can be expressed in relational algebra also
be expressed in relational calculus? The answer is yes, it can. Can every query that
can be expressed in relational calculus also be expressed in relational algebra? Before
we answer this question, we consider a major problem with the calculus as we have
presented it.
Consider the query {S | ¬(S ∈ Sailors)}. This query is syntactically correct. However,
it asks for all tuples S such that S is not in (the given instance of) Sailors. The set of
such S tuples is obviously infinite, in the context of infinite domains such as the set of
all integers. This simple example illustrates an unsafe query. It is desirable to restrict
relational calculus to disallow unsafe queries.
We now sketch how calculus queries are restricted to be safe. Consider a set I of
relation instances, with one instance per relation that appears in the query Q. Let
Dom(Q, I) be the set of all constants that appear in these relation instances I or in
the formulation of the query Q itself. Since we only allow finite instances I, Dom(Q, I)
is also finite.
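For example, given a TRC formula of the form ∃R(p(R)), we want to find all values for
variable R that make this formula true by checking only tuples that contain constants in
Dom(Q, I). Similarly, given a TRC formula of the form ∀R(p(R)), we want to find any values
for variable R that make this formula false by checking only tuples that contain constants
in Dom(Q, I). We therefore define a safe TRC query Q to be one that satisfies the
following conditions:
1. For any given I, the set of answers for Q contains only values that are in Dom(Q, I).
2. For each subexpression of the form ∃R(p(R)) in Q, if a tuple r (assigned to variable R)
makes the formula true, then r contains only constants in Dom(Q, I).
3. For each subexpression of the form ∀R(p(R)) in Q, if a tuple r (assigned to variable R)
contains a constant that is not in Dom(Q, I), then r must make the formula true.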
Note that this definition is not constructive, that is, it does not tell us how to check if
a query is safe.
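Dom(Q, I) is easy to compute for a concrete instance, which makes condition 1 concrete. A minimal sketch, assuming a hypothetical two-field Sailors instance (the real relation has more fields), that computes Dom(Q, I) for the unsafe query {S | ¬(S ∈ Sailors)} and shows why it fails condition 1:

```python
from itertools import product

# Hypothetical two-field Sailors instance: (sid, sname).
sailors = {(1, "Ann"), (2, "Bob")}
query_constants = set()  # {S | ¬(S ∈ Sailors)} mentions no constants

def dom(instance, constants):
    """Dom(Q, I): every constant that appears in the instance or in the query."""
    return {v for row in instance for v in row} | constants

def satisfies(t):
    """The formula ¬(S ∈ Sailors) evaluated for a candidate tuple t."""
    return t not in sailors

d = dom(sailors, query_constants)

# Tuples built only from Dom(Q, I) form a finite candidate set ...
inside = {t for t in product(d, repeat=2) if satisfies(t)}

# ... but a tuple using a value outside Dom(Q, I) also satisfies the formula, so the
# answer is not contained in Dom(Q, I): condition 1 fails and the query is unsafe.
outsider = (99, "Ann")
unsafe = satisfies(outsider) and 99 not in d
```

Dom(Q, I) has four constants here, so there are 16 two-field candidates and 14 of them satisfy the formula; any fresh value such as 99 yields yet another answer.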
Returning to the question of expressiveness, we can show that every query that can be
expressed as a safe relational calculus query can also be expressed as a relational
algebra query. The expressive power of relational algebra is often used as a metric of
how powerful a relational database query language is. If a query language can express
all the queries that we can express in relational algebra, it is said to be relationally
complete. A practical query language is expected to be relationally complete; in ad-
dition, commercial query languages typically support features that allow us to express
some queries that cannot be expressed in relational algebra.
The inputs and outputs of a query are relations. A query takes instances of each
input relation and produces an instance of the output relation. (Section 4.1)
A relational algebra query describes a procedure for computing the output rela-
tion from the input relations by applying relational algebra operators. Internally,
database systems use some variant of relational algebra to represent query evalu-
ation plans. (Section 4.2)
Two basic relational algebra operators are selection (σ), to select subsets of a
relation, and projection (π), to select output fields. (Section 4.2.1)
Relational algebra includes standard operations on sets such as union (∪), inter-
section (∩), set-difference (−), and cross-product (×). (Section 4.2.2)
Relations and fields can be renamed in relational algebra using the renaming
operator (ρ). (Section 4.2.3)
Another relational algebra operation that arises commonly in practice is the join
(⋈), with important special cases of equijoin and natural join. (Section 4.2.4)
The division operation (/) is a convenient way to express that we only want tuples
where all possible value combinations—as described in another relation—exist.
(Section 4.2.5)
EXERCISES
Exercise 4.1 Explain the statement that relational algebra operators can be composed. Why
is the ability to compose operators important?
Exercise 4.2 Given two relations R1 and R2, where R1 contains N1 tuples, R2 contains
N2 tuples, and N2 > N1 > 0, give the minimum and maximum possible sizes (in tuples) for
the result relation produced by each of the following relational algebra expressions. In each
case, state any assumptions about the schemas for R1 and R2 that are needed to make the
expression meaningful:
(1) R1 ∪ R2, (2) R1 ∩ R2, (3) R1 − R2, (4) R1 × R2, (5) σa=5(R1), (6) πa(R1), and
(7) R1/R2
The key fields are underlined, and the domain of each field is listed after the field name.
Thus sid is the key for Suppliers, pid is the key for Parts, and sid and pid together form the
key for Catalog. The Catalog relation lists the prices charged for parts by Suppliers. Write
the following queries in relational algebra, tuple relational calculus, and domain relational
calculus:
Exercise 4.4 Consider the Supplier-Parts-Catalog schema from the previous question. State
what the following queries compute:
Exercise 4.5 Consider the following relations containing airline flight information:
Note that the Employees relation describes pilots and other kinds of employees as well; every
pilot is certified for some aircraft (otherwise, he or she would not qualify as a pilot), and only
pilots are certified to fly.
Write the following queries in relational algebra, tuple relational calculus, and domain rela-
tional calculus. Note that some of these queries may not be expressible in relational algebra
(and, therefore, also not expressible in tuple and domain relational calculus)! For such queries,
informally explain why they cannot be expressed. (See the exercises at the end of Chapter 5
for additional queries over the airline schema.)
Exercise 4.7 What is an unsafe query? Give an example and explain why it is important
to disallow such queries.
BIBLIOGRAPHIC NOTES
Relational algebra was proposed by Codd in [156], and he showed the equivalence of relational
algebra and TRC in [158]. Earlier, Kuhns [392] considered the use of logic to pose queries.
LaCroix and Pirotte discussed DRC in [397]. Klug generalized the algebra and calculus to
include aggregate operations in [378]. Extensions of the algebra and calculus to deal with
aggregate functions are also discussed in [503]. Merrett proposed an extended relational
algebra with quantifiers such as the number of, which go beyond just universal and existential
quantification [460]. Such generalized quantifiers are discussed at length in [42].
SQL: QUERIES, PROGRAMMING, TRIGGERS
Introduction to SQL
Structured Query Language (SQL) is a language used for storing, querying, and managing data
in an RDBMS. SQL was one of the first commercial languages to use E. F. Codd's relational model.
What is SQL?
SQL keywords are not case-sensitive: select is the same as SELECT.
Some database systems require a semicolon at the end of each SQL statement. Semicolon
is the standard way to separate each SQL statement in database systems that allow more
than one SQL statement to be executed in the same call to the server.
CREATE:
The CREATE DATABASE statement is used to create a new SQL database.
Syntax
CREATE DATABASE databasename;
OR
The CREATE TABLE statement is used to create a new table in a SQL database.
Syntax:
CREATE TABLE table_name
(
column_name1 datatype1,
column_name2 datatype2,
column_name3 datatype3,
column_name4 datatype4
);
Example:
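A minimal sketch of both CREATE TABLE forms, run against an in-memory SQLite database through Python's built-in sqlite3 module. The table student and its columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# CREATE TABLE lists the columns inside parentheses.
conn.execute("""
    CREATE TABLE student (
        s_id   INTEGER PRIMARY KEY,
        name   VARCHAR(30),
        course VARCHAR(20)
    )""")
conn.execute("INSERT INTO student VALUES (101, 'Asha', 'DBMS')")

# CREATE TABLE ... AS SELECT copies the selected columns together with their rows.
conn.execute("CREATE TABLE student_copy AS SELECT * FROM student")

# PRAGMA table_info returns one row per column; index 1 is the column name.
copy_columns = [info[1] for info in conn.execute("PRAGMA table_info(student_copy)")]
copy_rows = conn.execute("SELECT * FROM student_copy").fetchall()
print(copy_columns)  # ['s_id', 'name', 'course']
print(copy_rows)     # [(101, 'Asha', 'DBMS')]
```

In SQLite the copied table does not inherit constraints such as the primary key; only the column names and data are carried over.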
OR
CREATE TABLE new_table
AS (SELECT * FROM old_table); -- copies all columns (and their rows) from an existing table
ALTER:
The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.
The ALTER TABLE statement is also used to add and drop various constraints on an existing
table.
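Syntax:
ALTER TABLE table_name ADD column_name datatype;
OR
ALTER TABLE table_name DROP COLUMN column_name;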
TRUNCATE:
TRUNCATE TABLE command is used to delete complete data from an existing table.
Syntax:
TRUNCATE TABLE table_name;
DROP:
The SQL DROP TABLE statement is used to remove a table definition and all the data, indexes,
triggers, constraints and permission specifications for that table.
Syntax:
DROP TABLE table_name;
RENAME:
RENAME command is used to rename a table.
Syntax (MySQL form):
RENAME TABLE old_table_name TO new_table_name;
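INSERT:
The INSERT INTO statement is used to add new records to a table.
Syntax:
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);
OR
INSERT INTO table_name
VALUES (value1, value2, ...);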
UPDATE:
The UPDATE statement is used to modify the existing records in a table.
Be careful when updating records. If you omit the WHERE clause, ALL records will be
updated!
Syntax:
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Example:
UPDATE Customers
SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
WHERE CustomerID = 1;
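A minimal sketch of the UPDATE example in SQLite (Python's sqlite3 module) on a hypothetical two-row Customers table, showing how many rows change with and without a WHERE clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, ContactName TEXT, City TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                 [(1, "Maria", "Berlin"), (2, "Ana", "Madrid")])  # hypothetical rows

# With WHERE: only the matching row changes; rowcount reports how many rows.
changed = conn.execute(
    "UPDATE Customers SET ContactName = ?, City = ? WHERE CustomerID = ?",
    ("Alfred Schmidt", "Frankfurt", 1)).rowcount

# Without WHERE: every row changes.
changed_all = conn.execute("UPDATE Customers SET City = 'Hamburg'").rowcount

rows = conn.execute("SELECT * FROM Customers ORDER BY CustomerID").fetchall()
print(changed, changed_all)  # 1 2
```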
DELETE:
The DELETE command is used to delete rows from a table. With a WHERE condition, it deletes
only the matching row(s).
Be careful when deleting records in a table! Notice the WHERE clause in the DELETE
statement. The WHERE clause specifies which record(s) should be deleted. If you omit
the WHERE clause, all records in the table will be deleted!
Syntax:
DELETE FROM table_name; -- deletes all the records
OR
DELETE FROM table_name WHERE condition; -- deletes only the matching records
Example:
DELETE FROM student; -- deletes all the records from the student table
OR
DELETE FROM student WHERE s_id = 103; -- deletes the record whose s_id is 103
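A minimal sketch of both DELETE forms in SQLite on a hypothetical student table. It also checks that DELETE without WHERE empties the table but, unlike DROP TABLE, leaves the table itself in place:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(101, "Asha"), (102, "Ravi"), (103, "Meena")])  # hypothetical rows

def count():
    return conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

conn.execute("DELETE FROM student WHERE s_id = 103")  # removes only s_id 103
after_where = count()
conn.execute("DELETE FROM student")                   # no WHERE: removes every row
after_all = count()

# The (now empty) table is still listed in SQLite's schema table.
table_still_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'student'"
).fetchone()[0] == 1
```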
Transaction Control Language (TCL) commands are used to manage transactions in database. These
are used to manage the changes made by DML statements, and they also allow statements to be
grouped together into logical transactions.
COMMIT:
The COMMIT command permanently saves the changes made by the current transaction to the database.
Syntax:
Commit;
ROLLBACK:
This command restores the database to the last committed state. It is also used with the
SAVEPOINT command to jump back to a savepoint in a transaction.
Syntax:
Rollback;
OR
Rollback to savepoint-name;
SAVEPOINT:
The SAVEPOINT command marks a point within the current transaction so that you can later roll
back to that point without undoing the whole transaction.
Syntax:
Savepoint savepoint-name;
Example of Savepoint and Rollback
Following is the class table,
ID NAME
1 abhi
2 adam
4 alex
SQL Queries:
INSERT into class values(5,'Rahul');
commit;
UPDATE class set name='abhijit' where id=5;
savepoint A;
INSERT into class values(6,'Chris');
savepoint B;
INSERT into class values(7,'Bravo');
savepoint C;
SELECT * from class;
ID NAME
1 abhi
2 adam
4 alex
5 abhijit
6 Chris
7 Bravo
Now rollback to savepoint B
Rollback to B;
SELECT * from class;
The result table will look like
ID NAME
1 abhi
2 adam
4 alex
5 abhijit
6 Chris
Now rollback to savepoint A
Rollback to A;
SELECT * from class;
The result table will look like
ID NAME
1 abhi
2 adam
4 alex
5 abhijit
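SQLite supports SAVEPOINT and ROLLBACK TO, so the example above can be replayed through Python's built-in sqlite3 module. A minimal sketch, assuming SQLite rather than the system used in the notes; the explicit BEGIN before the UPDATE stands in for the implicit transaction start that systems such as Oracle perform after a COMMIT:

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN / SAVEPOINT / ROLLBACK TO / COMMIT statements run exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE class (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO class VALUES (?, ?)", [(1, "abhi"), (2, "adam"), (4, "alex")])

def rows():
    return conn.execute("SELECT id, name FROM class ORDER BY id").fetchall()

conn.execute("BEGIN")
conn.execute("INSERT INTO class VALUES (5, 'Rahul')")
conn.execute("COMMIT")                      # row 5 is now permanent

conn.execute("BEGIN")                       # start the next transaction explicitly
conn.execute("UPDATE class SET name = 'abhijit' WHERE id = 5")
conn.execute("SAVEPOINT A")
conn.execute("INSERT INTO class VALUES (6, 'Chris')")
conn.execute("SAVEPOINT B")
conn.execute("INSERT INTO class VALUES (7, 'Bravo')")
conn.execute("SAVEPOINT C")
after_c = rows()

conn.execute("ROLLBACK TO B")               # undoes only the insert of row 7
after_b = rows()
conn.execute("ROLLBACK TO A")               # undoes row 6; the UPDATE of row 5 survives
after_a = rows()
conn.execute("COMMIT")
```

Because savepoint A was taken after the UPDATE, rolling back to A keeps 'abhijit', matching the last result table above.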
1. UNION
2. UNION ALL
3. INTERSECT
4. MINUS (EXCEPT)
UNION Operation
The UNION operator is used to combine the result-set of two or more SELECT statements.
Each SELECT statement within UNION must have the same number of columns
The columns must also have similar data types
The columns in each SELECT statement must also be in the same order
UNION Syntax
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
Example of UNION
The First table,
ID Name
1 abhi
2 adam
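The Second table,
ID Name
2 adam
3 Chester
Union SQL query will be,
SELECT * FROM First
UNION
SELECT * FROM Second;
The resultset table will look like,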
ID NAME
1 abhi
2 adam
3 Chester
UNION ALL
The UNION operator selects only distinct values by default. To allow duplicate values, use UNION
ALL:
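Syntax:
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;
The column names in the result-set are usually equal to the column names in the first SELECT
statement in the UNION.
The UNION operation (without ALL) automatically eliminates duplicates, unlike a plain SELECT
clause; UNION ALL keeps them.
Example of UNION ALL
SELECT * FROM First
UNION ALL
SELECT * FROM Second;
The resultset table will look like,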
ID NAME
1 abhi
2 adam
2 adam
3 Chester
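A minimal sketch that replays the UNION and UNION ALL examples with Python's sqlite3 module, using the First and Second tables shown above. The ORDER BY is added only to make the output order predictable, and the table names are double-quoted so they are always read as identifiers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "First"  (ID INTEGER, Name TEXT);
    CREATE TABLE "Second" (ID INTEGER, Name TEXT);
    INSERT INTO "First"  VALUES (1, 'abhi'), (2, 'adam');
    INSERT INTO "Second" VALUES (2, 'adam'), (3, 'Chester');
""")

union = conn.execute(
    'SELECT * FROM "First" UNION SELECT * FROM "Second" ORDER BY ID').fetchall()
union_all = conn.execute(
    'SELECT * FROM "First" UNION ALL SELECT * FROM "Second" ORDER BY ID').fetchall()
print(union)      # [(1, 'abhi'), (2, 'adam'), (3, 'Chester')]
print(union_all)  # [(1, 'abhi'), (2, 'adam'), (2, 'adam'), (3, 'Chester')]
```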
INTERSECT
It is used to combine two SELECT statements. The INTERSECT operation returns the rows common
to both SELECT statements.
In the INTERSECT operation, the number of columns and their data types must be the same.
It removes duplicates. Many systems return the rows sorted, but without an ORDER BY clause
the order of the result is not guaranteed.
If we want to retain all duplicates, we must write INTERSECT ALL in place of INTERSECT.
Intersect Syntax
SELECT column_name(s) FROM table1
INTERSECT
SELECT column_name(s) FROM table2;
Example of Intersect
Using the First and Second tables above,
the Intersect query will be,
SELECT * FROM First
INTERSECT
SELECT * FROM Second;
The resultset table will look like
ID NAME
2 adam
EXCEPT/MINUS
It combines the result of two SELECT statements. The MINUS (EXCEPT) operator is used to
display the rows which are present in the first query but absent in the second query.
It removes duplicates. Many systems return the rows sorted, but without an ORDER BY clause
the order of the result is not guaranteed.
If we want to retain duplicates, we must write EXCEPT ALL in place of EXCEPT.
Except Syntax:
SELECT column_name(s) FROM table1
EXCEPT/MINUS
SELECT column_name(s) FROM table2;
Example of Minus
Using the First and Second tables above,
the Minus query will be,
SELECT * FROM First
MINUS
SELECT * FROM Second;
The resultset table will look like,
ID NAME
1 abhi
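The same two tables can be used to check INTERSECT and EXCEPT. A minimal sketch in SQLite, which spells the difference operator EXCEPT and does not accept MINUS (the Oracle spelling):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "First"  (ID INTEGER, Name TEXT);
    CREATE TABLE "Second" (ID INTEGER, Name TEXT);
    INSERT INTO "First"  VALUES (1, 'abhi'), (2, 'adam');
    INSERT INTO "Second" VALUES (2, 'adam'), (3, 'Chester');
""")

# Rows present in both tables.
intersect = conn.execute(
    'SELECT * FROM "First" INTERSECT SELECT * FROM "Second"').fetchall()
# Rows present in First but not in Second.
except_rows = conn.execute(
    'SELECT * FROM "First" EXCEPT SELECT * FROM "Second"').fetchall()
print(intersect)    # [(2, 'adam')]
print(except_rows)  # [(1, 'abhi')]
```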
AGGREGATE OPERATORS
These functions return a single value after performing calculations on a group of values. Following
are some of the frequently used aggregate functions.
AVG() Function
AVG returns the average of the values in a numeric column; NULL values are ignored.
Its general syntax is,
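SELECT AVG(column_name) FROM table_name;
Using AVG() function
SQL query to find the average salary will be,
SELECT AVG(salary) FROM emp;
Result of the above query will be,
avg(salary)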
8200
COUNT() Function
COUNT(*) returns the number of rows in the table, and COUNT(column_name) returns the number of
non-NULL values in that column; either form can be used with or without a WHERE condition.
Its general syntax is,
SELECT COUNT(column_name) FROM table-name
count(name)
Example of COUNT(distinct)
Consider the above Emp table
SQL query is,
SELECT COUNT(DISTINCT salary) FROM emp;
Result of the above query will be,
count(distinct salary)
MAX() Function
MAX function returns maximum value from selected column of the table.
Syntax of MAX function is,
SELECT MAX(column_name) from table-name;
Using MAX() function
Consider the above Emp table
SQL query to find the Maximum salary will be,
SELECT MAX(salary) FROM emp;
Result of the above query will be,
MAX(salary)
10000
MIN() Function
MIN function returns minimum value from a selected column of the table.
Syntax for MIN function is,
SELECT MIN(column_name) from table-name;
Using MIN() function
Consider the above Emp table,
SQL query to find minimum salary is,
SELECT MIN(salary) FROM emp;
Result will be,
MIN(salary)
6000
SUM() Function
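SUM returns the total of the numeric values in a selected column.
Syntax for SUM is,
SELECT SUM(column_name) from table-name;
Using SUM() function
SQL query to find the total salary will be,
SELECT SUM(salary) FROM emp;
Result of the above query will be,
SUM(salary)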
41000
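A minimal sketch of all five aggregates in one query, on a hypothetical emp table (not the Emp table referred to above). One salary is NULL to show that COUNT(*) counts it while COUNT(salary), AVG, MAX, MIN, and SUM skip it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [
    ("Ann", 5000), ("Bob", 7000), ("Cy", 7000), ("Dee", 9000),
    ("Eve", None)])  # hypothetical rows; Eve's salary is unknown (NULL)

stats = conn.execute("""
    SELECT COUNT(*), COUNT(salary), COUNT(DISTINCT salary),
           AVG(salary), MAX(salary), MIN(salary), SUM(salary)
    FROM emp""").fetchone()
print(stats)  # (5, 4, 3, 7000.0, 9000, 5000, 28000)
```

AVG divides by the four non-NULL salaries (28000 / 4), not by the five rows.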
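The comparison operators used in a WHERE clause are:
= Equal
<> Not Equal
!= Not Equal
> Greater Than
>= Greater Than or Equal
< Less Than
<= Less Than or Equal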
SELECT *
FROM suppliers
WHERE supplier_name = 'Microsoft';
There will be 1 record selected. These are the results that you should see:
supplier_id supplier_name city state
SELECT *
FROM suppliers
WHERE supplier_name <> 'Microsoft';
SELECT *
FROM suppliers
WHERE supplier_name != 'Microsoft';
There will be 8 records selected. These are the results you should see with either one of the SQL
statements:
supplier_id supplier_name city state
SELECT *
FROM customers
WHERE customer_id > 6000;
There will be 3 records selected. These are the results that you should see:
customer_id last_name first_name favorite_website
SELECT *
FROM customers
WHERE customer_id >= 6000;
There will be 4 records selected. These are the results that you should see:
customer_id last_name first_name favorite_website
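Example - Less Than Operator
In SQL, you can use the < operator to test for an expression less than.
In this example, we have a table called products with the following data:
product_id product_name category_id
1 Pear 50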
2 Banana 50
3 Orange 50
4 Apple 50
5 Bread 75
6 Sliced Ham 25
7 Kleenex NULL
Enter the following SQL statement:
SELECT *
FROM products
WHERE product_id < 5;
There will be 4 records selected. These are the results that you should see:
product_id product_name category_id
1 Pear 50
2 Banana 50
3 Orange 50
4 Apple 50
Example - Less Than or Equal Operator
In SQL, you can use the <= operator to test for an expression less than or equal to.
Let's use the same products table as the previous example.
Enter the following SQL statement:
SELECT *
FROM products
WHERE product_id <= 5;
There will be 5 records selected. These are the results that you should see:
product_id product_name category_id
1 Pear 50
2 Banana 50
3 Orange 50
4 Apple 50
5 Bread 75
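A minimal sketch that runs the comparison examples against the products table shown above, using Python's sqlite3 module. It also shows a detail the examples do not cover: the Kleenex row, whose category_id is NULL, satisfies neither category_id = 50 nor category_id <> 50:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER, product_name TEXT, category_id INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    (1, "Pear", 50), (2, "Banana", 50), (3, "Orange", 50), (4, "Apple", 50),
    (5, "Bread", 75), (6, "Sliced Ham", 25), (7, "Kleenex", None)])

def ids(where):
    # The WHERE clauses below are fixed literals, so plain concatenation is safe here.
    sql = "SELECT product_id FROM products WHERE " + where + " ORDER BY product_id"
    return [row[0] for row in conn.execute(sql)]

less = ids("product_id < 5")            # [1, 2, 3, 4]
less_eq = ids("product_id <= 5")        # [1, 2, 3, 4, 5]
not_50 = ids("category_id <> 50")       # [5, 6]: the NULL row is not returned
not_50_alt = ids("category_id != 50")   # same result as <>
```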
The ORDER BY keyword is used to sort the result-set in ascending or descending order.
The ORDER BY keyword sorts the records in ascending order by default. To sort the records in
descending order, use the DESC keyword.
ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
The GROUP BY statement is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG)
to group the result-set by one or more columns.
GROUP BY Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
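The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate
functions: WHERE filters individual rows before grouping, whereas HAVING filters groups after
the aggregates have been computed.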
HAVING Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
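A minimal sketch of WHERE, GROUP BY, HAVING, and ORDER BY working together, on a hypothetical orders(customer, amount) table in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("Ann", 100), ("Bob", 50), ("Ann", 200), ("Cy", 300), ("Bob", 70)])  # hypothetical

result = conn.execute("""
    SELECT customer, COUNT(*) AS n, SUM(amount) AS total
    FROM orders
    WHERE amount > 50              -- row filter, applied before grouping
    GROUP BY customer
    HAVING SUM(amount) >= 100      -- group filter, applied after aggregation
    ORDER BY total DESC, customer  -- customer sorts ASC by default
""").fetchall()
print(result)  # [('Ann', 2, 300), ('Cy', 1, 300)]
```

Bob's 50 is removed by WHERE, so his group totals only 70 and is then removed by HAVING; Ann and Cy tie on total and are ordered by name.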
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
There are two wildcards used in conjunction with the LIKE operator:
% - The percent sign represents zero, one, or multiple characters
_ - The underscore represents a single character
The percent sign and the underscore can also be used in combinations!
LIKE Syntax
SELECT column1, column2, ...
FROM table_name
WHERE columnN LIKE pattern;
You can also combine any number of conditions using AND or OR operators.
Here are some examples showing different LIKE operators with '%' and '_' wildcards:
WHERE CustomerName LIKE 'a%'    Finds any values that start with "a"
WHERE CustomerName LIKE '%a'    Finds any values that end with "a"
WHERE CustomerName LIKE '%or%'  Finds any values that have "or" in any position
WHERE CustomerName LIKE '_r%'   Finds any values that have "r" in the second position
WHERE CustomerName LIKE 'a_%_%' Finds any values that start with "a" and are at least 3 characters in length
WHERE ContactName LIKE 'a%o'    Finds any values that start with "a" and end with "o"
Examples:
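The following SQL statement selects all customers with a CustomerName starting with "a":
SELECT * FROM Customers
WHERE CustomerName LIKE 'a%';
The following SQL statement selects all customers with a CustomerName ending with "a":
SELECT * FROM Customers
WHERE CustomerName LIKE '%a';
The following SQL statement selects all customers with a CustomerName that has "r" in the second
position:
SELECT * FROM Customers
WHERE CustomerName LIKE '_r%';
The following SQL statement selects all customers with a CustomerName that starts with "a" and
is at least 3 characters in length:
SELECT * FROM Customers
WHERE CustomerName LIKE 'a_%_%';
The following SQL statement selects all customers with a ContactName that starts with "a" and
ends with "o":
SELECT * FROM Customers
WHERE ContactName LIKE 'a%o';
The following SQL statement selects all customers with a CustomerName that does NOT start with "a":
SELECT * FROM Customers
WHERE CustomerName NOT LIKE 'a%';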
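A minimal sketch that checks the wildcard patterns above against a hypothetical list of customer names in SQLite. The names are all lowercase because SQLite's LIKE is case-insensitive for ASCII letters by default, whereas other systems may compare case-sensitively depending on their collation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT)")
names = ["al", "ana", "antonio", "around", "berglund", "frank", "alfredo"]  # hypothetical
conn.executemany("INSERT INTO Customers VALUES (?)", [(n,) for n in names])

def like(pattern, negate=False):
    op = "NOT LIKE" if negate else "LIKE"  # op comes from a fixed set, never user input
    sql = f"SELECT CustomerName FROM Customers WHERE CustomerName {op} ? ORDER BY CustomerName"
    return [row[0] for row in conn.execute(sql, (pattern,))]

starts_a = like("a%")              # every name beginning with "a"
ends_a = like("%a")                # ["ana"]
second_r = like("_r%")             # ["around", "frank"]
a_min3 = like("a_%_%")             # like "a%" but excludes the 2-letter "al"
a_to_o = like("a%o")               # ["alfredo", "antonio"]
not_a = like("a%", negate=True)    # ["berglund", "frank"]
```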
The BETWEEN operator selects values within a given range. The values can be numbers, text, or
dates.
The BETWEEN operator is inclusive: begin and end values are included.
BETWEEN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;
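The IN operator allows you to specify multiple values in a WHERE clause; it is a shorthand for
multiple OR conditions.
IN Syntax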
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);
or:
SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT STATEMENT);
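A minimal sketch of BETWEEN and both IN forms on hypothetical products and order_items tables in SQLite. BETWEEN 20 AND 30 keeps both end points, and IN (subquery) is a membership test, so product 4 appears once even though it is ordered twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER, price INTEGER);
    INSERT INTO products VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    CREATE TABLE order_items (product_id INTEGER);
    INSERT INTO order_items VALUES (2), (4), (4);
""")

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

between = ids("SELECT id FROM products WHERE price BETWEEN 20 AND 30 ORDER BY id")
in_list = ids("SELECT id FROM products WHERE price IN (10, 40) ORDER BY id")
in_subquery = ids("SELECT id FROM products "
                  "WHERE id IN (SELECT product_id FROM order_items) ORDER BY id")
print(between, in_list, in_subquery)  # [2, 3] [1, 4] [2, 4]
```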
Nested Subqueries
A subquery is nested when it appears in the WHERE or HAVING clause of another
subquery.
Get the result of all the students who are enrolled in the same course as the student with
ROLLNO 12.
Select *
From result
where rollno in (select rollno
from student
where courseid = (select courseid
from student
where rollno = 12));
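A minimal sketch that runs the nested query above in SQLite on hypothetical student(rollno, name, courseid) and result(rollno, paper, marks) tables; the column lists are assumptions, since the notes do not give these schemas:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (rollno INTEGER, name TEXT, courseid TEXT);
    INSERT INTO student VALUES (12, 'Asha', 'C1'), (13, 'Ravi', 'C1'), (14, 'Meena', 'C2');
    CREATE TABLE result (rollno INTEGER, paper TEXT, marks INTEGER);
    INSERT INTO result VALUES (12, 'P1', 70), (13, 'P1', 65), (14, 'P2', 80);
""")

# Innermost: course of roll number 12 -> middle: roll numbers in that course -> outer: results.
rows = conn.execute("""
    SELECT * FROM result
    WHERE rollno IN (SELECT rollno FROM student
                     WHERE courseid = (SELECT courseid FROM student
                                       WHERE rollno = 12))
    ORDER BY rollno""").fetchall()
print(rows)  # [(12, 'P1', 70), (13, 'P1', 65)]
```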
The innermost subquery will be executed first and then based on its result the next subquery will
be executed and based on that result the outer query will be executed. The depth to which
subqueries can be nested is implementation-dependent.
A subquery can be nested inside other subqueries. A subquery is a SELECT statement that is nested
within another SQL statement and returns intermediate results; SQL evaluates the innermost
(non-correlated) subquery first, then the next level.
SELECT job_id,AVG(salary)
FROM employees
GROUP BY job_id
HAVING AVG(salary)<
(SELECT MAX(AVG(min_salary))
FROM jobs
WHERE job_id IN
(SELECT job_id FROM job_history
WHERE department_id
BETWEEN 50 AND 100)
GROUP BY job_id);
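OR (nesting one aggregate directly inside another, as in MAX(AVG(min_salary)), is accepted by
Oracle but rejected by most other systems; the form below computes the averages in a derived
table instead)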
SELECT job_id,AVG(salary)
FROM employees
GROUP BY job_id
HAVING AVG(salary)<
(SELECT MAX(myavg) from (select job_id,AVG(min_salary) as myavg
FROM jobs
WHERE job_id IN
(SELECT job_id FROM job_history
WHERE department_id
BETWEEN 50 AND 100)
GROUP BY job_id) ss);
Output
JOB_ID AVG(SALARY)
---------- -----------
IT_PROG 5760
AC_ACCOUNT 8300
ST_MAN 7280
AD_ASST 4400
SH_CLERK 3215
FI_ACCOUNT 7920
PU_CLERK 2780
SA_REP 8350
MK_REP 6000
ST_CLERK 2785
HR_REP 6500
Explanation:
This example contains three queries: a nested subquery, a subquery, and the outer query. These
parts of the query are run in that order.
Let's break the example down into three parts and observe the results returned.
SQL Code:
SELECT job_id FROM job_history
WHERE department_id
BETWEEN 50 AND 100;
This nested subquery retrieves the job_id(s) from the job_history table whose department_id is
between 50 and 100 (inclusive).
Output:
JOB_ID
----------
ST_CLERK
ST_CLERK
IT_PROG
SA_REP
SA_MAN
AD_ASST
AC_ACCOUNT
SELECT MAX(AVG(min_salary))
FROM jobs WHERE job_id
IN(.....output from the nested subquery......)
GROUP BY job_id
SQL Code:
SELECT MAX(AVG(min_salary))
FROM jobs
WHERE job_id
IN (
'ST_CLERK','ST_CLERK','IT_PROG',
'SA_REP','SA_MAN','AD_ASST',
'AC_ACCOUNT')
GROUP BY job_id;
The subquery computes the average min_salary for each job_id returned by the previous (nested)
subquery, i.e. 'ST_CLERK', 'IT_PROG', 'SA_REP', 'SA_MAN', 'AD_ASST', and 'AC_ACCOUNT', and
returns the maximum of these averages.
Output:
MAX(AVG(MIN_SALARY))
--------------------
10000
Finally, the outer query receives the output of the subquery, which in turn was computed from
the output of the nested subquery stated above.
SELECT job_id,AVG(salary)
FROM employees
GROUP BY job_id
HAVING AVG(salary)<
(.....output from the subquery(
output from the nested subquery)......)
SQL Code:
SELECT job_id,AVG(salary)
FROM employees
GROUP BY job_id
HAVING AVG(salary)<10000;
The outer query returns each job_id and the average salary of its employees, keeping only those
job_ids whose average salary is less than the maximum of the average min_salary values returned
by the previous query.
Output:
JOB_ID AVG(SALARY)
---------- -----------
IT_PROG 5760
AC_ACCOUNT 8300
ST_MAN 7280
AD_ASST 4400
SH_CLERK 3215
FI_ACCOUNT 7920
PU_CLERK 2780
SA_REP 8350
MK_REP 6000
ST_CLERK 2785
HR_REP 6500
Correlated Subquery
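A correlated subquery is a subquery that refers to a column of a table in the outer query. Its
result depends on the current row of the outer query, so it is (conceptually) re-evaluated once
for each row the outer query processes, whereas a normal subquery is evaluated once before the
outer query. The correlated subquery execution is as follows:
1. The outer query fetches a candidate row.
2. The subquery is executed using the values from that candidate row.
3. The result of the subquery determines whether the candidate row is included in the result.
4. Steps 1 to 3 are repeated for every row of the outer query.
Correlated subqueries differ from normal subqueries in that the nested SELECT statement
refers back to the table in the outer SELECT statement.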
To find out the names of all the students who appeared in more than three papers of their opted
course, the SQL will be
Select name
from student a
Where 3 < (select count(*)
from result b
where b.rollno = a.rollno);
In other words, a correlated subquery is one whose value depends upon some variable that receives
its value in some outer query. A non-correlated subquery, as said before, is evaluated in a
bottom-up manner, i.e. the innermost query is evaluated first. But a correlated subquery is
resolved in a top-down fashion: the outermost query is analyzed, and based on its current row
the next query is initiated. Such a subquery has to be evaluated repeatedly, once for each value
of the variable in question, instead of once and for all.
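A minimal sketch of the correlated query above in SQLite, on hypothetical student(rollno, name) and result(rollno, paper) tables. The second half spells out the per-row evaluation with an explicit Python loop that re-runs the inner COUNT(*) once for each student:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (rollno INTEGER, name TEXT);
    INSERT INTO student VALUES (12, 'Asha'), (13, 'Ravi');
    CREATE TABLE result (rollno INTEGER, paper TEXT);
    INSERT INTO result VALUES (12, 'P1'), (12, 'P2'), (12, 'P3'), (12, 'P4'),
                              (13, 'P1'), (13, 'P2');
""")

# The inner query refers to a.rollno from the outer query, so it is correlated.
sql_names = [name for (name,) in conn.execute("""
    SELECT name FROM student a
    WHERE 3 < (SELECT COUNT(*) FROM result b WHERE b.rollno = a.rollno)""")]

# The same logic as an explicit loop: the inner COUNT(*) runs once per outer row.
loop_names = []
for rollno, name in conn.execute("SELECT rollno, name FROM student").fetchall():
    (papers,) = conn.execute(
        "SELECT COUNT(*) FROM result WHERE rollno = ?", (rollno,)).fetchone()
    if papers > 3:
        loop_names.append(name)
```

Both approaches return only Asha, who appears in four papers; a real optimizer may avoid the literal row-by-row re-execution, but the result must be the same.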
SQL correlated subqueries are used to select data from a table using values referenced in the
outer query. The subquery is known as correlated because it is related to the outer query. In
this type of query, a table alias (also called a correlation name) must be used to specify which
table reference is to be used.
An alias is an alternative name for a table, given by writing it directly after the table name
in the FROM clause. Aliases are essential when the same table is referenced more than once, for
example in both the outer query and the subquery, and they are convenient when obtaining
information from two separate tables.
The following query retrieves ord_num, ord_amount, cust_code and agent_code from the orders table
('a' and 'b' are the aliases of the orders and agents tables) for the orders whose agent_code
equals the agent_code of the agent named Alex in the agents table. Strictly speaking, this
subquery is not correlated: it never refers to the outer alias a, so it can be evaluated once
and its result reused for every row of orders. The following SQL statement can be used:
SQL Code:
SELECT a.ord_num,a.ord_amount,a.cust_code,a.agent_code
FROM orders a
WHERE a.agent_code=(
SELECT b.agent_code
FROM agents b WHERE b.agent_name='Alex');
Because the subquery returns Alex's agent_code (A003), the query above is equivalent to:
SQL Code:
SELECT a.ord_num,a.ord_amount,a.cust_code,a.agent_code
FROM orders a
WHERE a.agent_code='A003';
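A trigger follows the event-condition-action model and has three components:
1. The event that triggers the rule: usually a database update operation, such as an INSERT,
UPDATE, or DELETE applied to a table.
2. The condition that determines whether the rule action should be executed: Once the triggering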
event has occurred, an optional condition may be evaluated. If no condition is specified, the action
will be executed once the event occurs. If a condition is specified, it is first evaluated, and only if it
evaluates to true will the rule action be executed.
3. The action to be taken: The action is usually a sequence of SQL statements, but it could also be
a database transaction or an external program that will be automatically executed.
Trigger
A SQL trigger is a set of SQL statements stored in the database catalog. A SQL trigger is executed
or fired whenever an event associated with a table occurs e.g., insert, update or delete.
A SQL trigger is a special type of stored procedure. It is special because it is not called directly like
a stored procedure. The main difference between a trigger and a stored procedure is that a trigger
is called automatically when a data modification event is made against a table whereas a stored
procedure must be called explicitly.
Disadvantages of using SQL triggers:
SQL triggers can provide only extended validation; they cannot replace all validations.
Some simple validations have to be done in the application layer. For example,
you can validate user input on the client side by using JavaScript or on the server side using
server-side scripting languages such as JSP, PHP, ASP.NET, Perl, etc.
SQL triggers are invoked and executed invisibly from the client applications; therefore, it is
difficult to figure out what happens in the database layer.
SQL triggers may increase the overhead of the database server.
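Triggers are stored programs, which are automatically executed or fired when some events occur.
Triggers are, in fact, written to be executed in response to any of the following events −
A database manipulation (DML) statement (DELETE, INSERT, or UPDATE)
A database definition (DDL) statement (CREATE, ALTER, or DROP)
A database operation (SERVERERROR, LOGON, LOGOFF, STARTUP, or SHUTDOWN)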
Triggers can be defined on the table, view, schema, or database with which the event is associated.
Row level trigger: - A row-level trigger is executed once for each row of the table that is
inserted, updated, or deleted. A row-level trigger must be declared explicitly (with FOR EACH
ROW) when the trigger is created; an optional WHEN (condition) restricts the rows for which it
fires.
Statement level trigger: - This trigger is executed only once per DML statement, whether that
statement inserts, deletes, or updates one row, many rows, or the whole table. If we do not
specify the type of trigger when creating it, it is a statement-level trigger by default.
Benefits of Triggers
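The syntax for creating a trigger (in Oracle PL/SQL) is:
CREATE [OR REPLACE] TRIGGER trigger_name
{BEFORE | AFTER | INSTEAD OF}
{INSERT [OR] | UPDATE [OR] | DELETE}
[OF col_name]
ON table_name
[REFERENCING OLD AS o NEW AS n]
[FOR EACH ROW]
[WHEN (condition)]
DECLARE
Declaration-statements
BEGIN
Executable-statements
EXCEPTION
Exception-handling-statements
END;
Where,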
CREATE [OR REPLACE] TRIGGER trigger_name − Creates or replaces an existing trigger with
the trigger_name.
{BEFORE | AFTER | INSTEAD OF} − This specifies when the trigger will be executed. The
INSTEAD OF clause is used for creating trigger on a view.
{INSERT [OR] | UPDATE [OR] | DELETE} − This specifies the DML operation.
[OF col_name] − This specifies the column name that will be updated.
[ON table_name] − This specifies the name of the table associated with the trigger.
[REFERENCING OLD AS o NEW AS n] − This allows you to refer new and old values for
various DML statements, such as INSERT, UPDATE, and DELETE.
[FOR EACH ROW] − This specifies a row-level trigger, i.e., the trigger will be executed for
each row being affected. Otherwise the trigger will execute just once when the SQL
statement is executed, which is called a statement-level (table-level) trigger.
WHEN (condition) − This provides a condition for rows for which the trigger would fire. This
clause is valid only for row-level triggers.
Example
Given a Student Report database in which students' marks are recorded, create a trigger so that
the total and average of the subject marks are filled in automatically whenever a record is
inserted.
Because the trigger must run before the record is inserted, the BEFORE keyword can be used.
Suppose the database Schema –
mysql> desc Student;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| tid | int(4) | NO | PRI | NULL | auto_increment |
| name | varchar(30) | YES | | NULL | |
| subj1 | int(2) | YES | | NULL | |
| subj2 | int(2) | YES | | NULL | |
| subj3 | int(2) | YES | | NULL | |
| total | int(3) | YES | | NULL | |
| per | int(3) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
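The CREATE TRIGGER statement for this example is not shown above. In MySQL, the body of such a BEFORE INSERT, FOR EACH ROW trigger would assign values to NEW.total and NEW.per before the row is stored. The Java sketch below only models that behaviour in memory: StudentTriggerDemo, StudentRow, createBeforeInsertTrigger and computeTotalAndAverage are hypothetical names, and per is assumed to hold the integer average of the three subject marks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory model of the Student table; not MySQL's trigger machinery.
class StudentTriggerDemo {
    // One Student row; total and per start as null, like the table's NULL defaults.
    static class StudentRow {
        final String name;
        final int subj1, subj2, subj3;
        Integer total = null;
        Integer per = null;

        StudentRow(String name, int subj1, int subj2, int subj3) {
            this.name = name;
            this.subj1 = subj1;
            this.subj2 = subj2;
            this.subj3 = subj3;
        }
    }

    private final List<StudentRow> rows = new ArrayList<>();
    private final List<Consumer<StudentRow>> beforeInsertTriggers = new ArrayList<>();

    // Registers a row-level BEFORE INSERT "trigger".
    void createBeforeInsertTrigger(Consumer<StudentRow> body) {
        beforeInsertTriggers.add(body);
    }

    // Simulated INSERT: every BEFORE INSERT trigger runs on the new row
    // (and may change its values) before the row is stored.
    void insert(StudentRow newRow) {
        for (Consumer<StudentRow> trigger : beforeInsertTriggers) {
            trigger.accept(newRow);
        }
        rows.add(newRow);
    }

    List<StudentRow> rows() {
        return rows;
    }

    // Assumed trigger body: total of the three marks and their integer average.
    static void computeTotalAndAverage(StudentRow row) {
        row.total = row.subj1 + row.subj2 + row.subj3;
        row.per = row.total / 3;
    }

    public static void main(String[] args) {
        StudentTriggerDemo student = new StudentTriggerDemo();
        student.createBeforeInsertTrigger(StudentTriggerDemo::computeTotalAndAverage);
        student.insert(new StudentRow("abhi", 80, 70, 91));
        StudentRow stored = student.rows().get(0);
        System.out.println(stored.name + " total=" + stored.total + " per=" + stored.per);
    }
}
```

Because the hook runs before the row is added to the list, the stored row already carries total 241 and per 80; without a registered trigger, both would stay null.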
SQL JOIN
An SQL Join is used to fetch data from two or more tables, joined so that it appears as a single set of data. It combines columns from two or more tables by using values common to both tables.
The JOIN keyword is used in SQL queries for joining two or more tables. Joining n tables requires at least (n-1) join conditions. A table can also be joined to itself, which is known as a Self Join.
Types of JOIN
Cross JOIN
This type of JOIN returns the Cartesian product of the rows of the tables in the join: it returns a table whose records combine each row of the first table with each row of the second table.
Cross JOIN Syntax is,
SELECT column-name-list
FROM
table-name1 CROSS JOIN table-name2;
Example of Cross JOIN
Consider the class table,
ID NAME
1 abhi
2 adam
4 alex
class_info table,
ID Address
1 DELHI
2 MUMBAI
3 CHENNAI
The query SELECT * FROM class CROSS JOIN class_info; returns,
ID NAME ID Address
1 abhi 1 DELHI
2 adam 1 DELHI
4 alex 1 DELHI
1 abhi 2 MUMBAI
2 adam 2 MUMBAI
4 alex 2 MUMBAI
1 abhi 3 CHENNAI
2 adam 3 CHENNAI
4 alex 3 CHENNAI
As you can see, this join returns the cross product of all the records in both tables: 3 × 3 = 9 rows.
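A cross join is simply two nested loops over the tables. The sketch below hard-codes the class and class_info rows shown above (CrossJoinDemo and crossJoin are illustrative names) and builds all 3 × 3 = 9 combinations; the outer loop runs over class_info only so that the rows come out in the order printed above, since SQL itself does not promise any row order.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory cross join of the class and class_info tables from the example.
class CrossJoinDemo {
    static final String[][] CLASS = { {"1", "abhi"}, {"2", "adam"}, {"4", "alex"} };
    static final String[][] CLASS_INFO = { {"1", "DELHI"}, {"2", "MUMBAI"}, {"3", "CHENNAI"} };

    // Pairs every row of left with every row of right: |left| x |right| result rows.
    static List<String[]> crossJoin(String[][] left, String[][] right) {
        List<String[]> result = new ArrayList<>();
        for (String[] r : right) {          // outer loop chosen to match the printed order
            for (String[] l : left) {
                result.add(new String[] { l[0], l[1], r[0], r[1] });
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("ID NAME ID Address");
        for (String[] row : crossJoin(CLASS, CLASS_INFO)) {
            System.out.println(String.join(" ", row));
        }
    }
}
```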
INNER JOIN
This is a simple JOIN in which the result is based on matched data, according to the equality condition specified in the SQL query.
Inner Join Syntax is,
SELECT column-name-list FROM
table-name1 INNER JOIN table-name2
ON table-name1.column-name = table-name2.column-name;
Example of INNER JOIN
ID NAME ID Address
1 abhi 1 DELHI
2 adam 2 MUMBAI
3 alex 3 CHENNAI
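The tables for this example are not listed here; the sketch below assumes the class and class_info rows given in the Outer Join example further down, which produce exactly the three rows shown. It computes the inner join as a nested-loop join: each pair of rows is tested against class.ID = class_info.ID and only matching pairs are kept. InnerJoinDemo and innerJoin are hypothetical names; a real DBMS may choose other join algorithms (such as hash or merge joins) but returns the same rows.

```java
import java.util.ArrayList;
import java.util.List;

// Nested-loop inner join on the ID column (column 0 of each table).
class InnerJoinDemo {
    static final String[][] CLASS = {
        {"1", "abhi"}, {"2", "adam"}, {"3", "alex"}, {"4", "anu"}, {"5", "ashish"} };
    static final String[][] CLASS_INFO = {
        {"1", "DELHI"}, {"2", "MUMBAI"}, {"3", "CHENNAI"}, {"7", "NOIDA"}, {"8", "PANIPAT"} };

    // Keeps only the pairs of rows whose IDs are equal.
    static List<String[]> innerJoin(String[][] left, String[][] right) {
        List<String[]> result = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                if (l[0].equals(r[0])) {
                    result.add(new String[] { l[0], l[1], r[0], r[1] });
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("ID NAME ID Address");
        for (String[] row : innerJoin(CLASS, CLASS_INFO)) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

Rows such as 4 anu or 7 NOIDA have no partner with the same ID, so they are dropped; the outer joins described later are the variants that keep them.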
Natural JOIN
Natural Join is a type of Inner Join that is based on the columns having the same name and the same datatype in both tables being joined.
The syntax for Natural Join is,
SELECT * FROM
table-name1 NATURAL JOIN table-name2;
ID NAME Address
1 abhi DELHI
2 adam MUMBAI
3 alex CHENNAI
In the above example, both tables being joined have an ID column (same name and same datatype), so the result of the Natural Join contains the records whose ID value matches in both tables, with the common ID column shown only once.
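A natural join differs from the inner join above in two ways: the join columns are found automatically by name, and each shared column is kept only once. The sketch below represents rows as column-name to value maps (NaturalJoinDemo, table and naturalJoin are hypothetical names) and assumes the class and class_info rows from the Outer Join example. As in SQL, a NULL in a shared column never matches, and two tables with no shared column would produce a cross join.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NaturalJoinDemo {
    // Builds a table from column names and rows of values, keeping column order.
    static List<Map<String, String>> table(String[] columns, String[][] values) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String[] v : values) {
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < columns.length; i++) {
                row.put(columns[i], v[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    // Two rows match when they agree on every column name the tables share;
    // each shared column appears only once in the joined row.
    static List<Map<String, String>> naturalJoin(List<Map<String, String>> left,
                                                 List<Map<String, String>> right) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                boolean match = true;
                for (String col : l.keySet()) {
                    if (!r.containsKey(col)) continue;           // not a shared column
                    String a = l.get(col), b = r.get(col);
                    if (a == null || b == null || !a.equals(b)) { // NULL never matches
                        match = false;
                        break;
                    }
                }
                if (match) {
                    Map<String, String> joined = new LinkedHashMap<>(l);
                    joined.putAll(r); // shared values are equal, so one copy remains
                    result.add(joined);
                }
            }
        }
        return result;
    }

    static List<Map<String, String>> classTable() {
        return table(new String[] {"ID", "NAME"}, new String[][] {
            {"1", "abhi"}, {"2", "adam"}, {"3", "alex"}, {"4", "anu"}, {"5", "ashish"} });
    }

    static List<Map<String, String>> classInfoTable() {
        return table(new String[] {"ID", "Address"}, new String[][] {
            {"1", "DELHI"}, {"2", "MUMBAI"}, {"3", "CHENNAI"}, {"7", "NOIDA"}, {"8", "PANIPAT"} });
    }

    public static void main(String[] args) {
        for (Map<String, String> row : naturalJoin(classTable(), classInfoTable())) {
            System.out.println(row);
        }
    }
}
```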
OUTER JOIN
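Outer Join is based on both matched and unmatched data. Outer Joins subdivide further into:
1. Left Outer Join
2. Right Outer Join
3. Full Outer Join
Consider the class table,
ID NAME
1 abhi
2 adam
3 alex
4 anu
5 ashish
class_info table,
ID Address
1 DELHI
2 MUMBAI
3 CHENNAI
7 NOIDA
8 PANIPAT
Left Outer Join: SELECT * FROM class LEFT OUTER JOIN class_info ON class.ID = class_info.ID; returns every row of class, with NULL where class_info has no matching ID,
ID NAME ID Address
1 abhi 1 DELHI
2 adam 2 MUMBAI
3 alex 3 CHENNAI
4 anu NULL NULL
5 ashish NULL NULL
Right Outer Join: SELECT * FROM class RIGHT OUTER JOIN class_info ON class.ID = class_info.ID; returns every row of class_info, with NULL where class has no matching ID,
ID NAME ID Address
1 abhi 1 DELHI
2 adam 2 MUMBAI
3 alex 3 CHENNAI
NULL NULL 7 NOIDA
NULL NULL 8 PANIPAT
Full Outer Join: SELECT * FROM class FULL OUTER JOIN class_info ON class.ID = class_info.ID; returns all rows of both tables,
ID NAME ID Address
1 abhi 1 DELHI
2 adam 2 MUMBAI
3 alex 3 CHENNAI
4 anu NULL NULL
5 ashish NULL NULL
NULL NULL 7 NOIDA
NULL NULL 8 PANIPAT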
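The three kinds of outer join can be sketched by extending the inner join: keep the matched pairs, then add each row that found no partner, padding the missing columns with NULL (Java null here). OuterJoinDemo and its method names are hypothetical; the data is the class and class_info tables above, and the join condition is class.ID = class_info.ID.

```java
import java.util.ArrayList;
import java.util.List;

class OuterJoinDemo {
    static final String[][] CLASS = {
        {"1", "abhi"}, {"2", "adam"}, {"3", "alex"}, {"4", "anu"}, {"5", "ashish"} };
    static final String[][] CLASS_INFO = {
        {"1", "DELHI"}, {"2", "MUMBAI"}, {"3", "CHENNAI"}, {"7", "NOIDA"}, {"8", "PANIPAT"} };

    // True if some row of other has the same ID (column 0) as row.
    static boolean hasMatch(String[] row, String[][] other) {
        for (String[] o : other) {
            if (o[0].equals(row[0])) return true;
        }
        return false;
    }

    // Matched pairs only, i.e. the inner join on ID.
    static List<String[]> matched(String[][] left, String[][] right) {
        List<String[]> result = new ArrayList<>();
        for (String[] l : left)
            for (String[] r : right)
                if (l[0].equals(r[0])) result.add(new String[] { l[0], l[1], r[0], r[1] });
        return result;
    }

    // LEFT OUTER JOIN: matched rows plus unmatched left rows padded with NULLs.
    static List<String[]> leftOuterJoin(String[][] left, String[][] right) {
        List<String[]> result = matched(left, right);
        for (String[] l : left)
            if (!hasMatch(l, right)) result.add(new String[] { l[0], l[1], null, null });
        return result;
    }

    // RIGHT OUTER JOIN: matched rows plus unmatched right rows padded with NULLs.
    static List<String[]> rightOuterJoin(String[][] left, String[][] right) {
        List<String[]> result = matched(left, right);
        for (String[] r : right)
            if (!hasMatch(r, left)) result.add(new String[] { null, null, r[0], r[1] });
        return result;
    }

    // FULL OUTER JOIN: the left outer join plus the unmatched right rows.
    static List<String[]> fullOuterJoin(String[][] left, String[][] right) {
        List<String[]> result = leftOuterJoin(left, right);
        for (String[] r : right)
            if (!hasMatch(r, left)) result.add(new String[] { null, null, r[0], r[1] });
        return result;
    }

    // Formats a row, printing Java null as NULL.
    static String format(String[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(row[i] == null ? "NULL" : row[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("LEFT OUTER JOIN");
        for (String[] row : leftOuterJoin(CLASS, CLASS_INFO)) System.out.println(format(row));
        System.out.println("RIGHT OUTER JOIN");
        for (String[] row : rightOuterJoin(CLASS, CLASS_INFO)) System.out.println(format(row));
        System.out.println("FULL OUTER JOIN");
        for (String[] row : fullOuterJoin(CLASS, CLASS_INFO)) System.out.println(format(row));
    }
}
```

For these tables, the left outer join has 5 rows (anu and ashish padded with NULL), the right outer join has 5 rows (NOIDA and PANIPAT padded with NULL) and the full outer join has 7.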
VIEW:
A view in SQL is a logical subset of data from one or more tables. A view is used to restrict data access.
A view can join information from several tables together, and views are useful for hiding unwanted information.
A database view is a subset of the database, sorted and displayed in a particular way.
A database view can display records drawn from one or more tables together, as a single result.
Syntax:
CREATE OR REPLACE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition;
Update a View
Update command for view is same as for tables.
Syntax to Update a View is,
UPDATE view-name
SET column-name = value
WHERE condition;
Because a view stores no data of its own, updating a (simple) view updates the data in the base table automatically.
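The idea that a view stores a query rather than data, so that reading it re-runs the query and updating it changes the base table, can be modelled in Java as below. It is an in-memory sketch, not how a DBMS stores views: ViewDemo, Student, SimpleView, select and updateMarks are hypothetical names, and the view condition marks >= 50 is an assumed example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory model of a base table and a simple view over it.
class ViewDemo {
    static class Student {
        final int id;
        final String name;
        int marks;

        Student(int id, String name, int marks) {
            this.id = id;
            this.name = name;
            this.marks = marks;
        }
    }

    // The view keeps only its defining condition and a reference to the base table,
    // never a copy of the rows.
    static class SimpleView {
        private final List<Student> baseTable;
        private final Predicate<Student> where;

        SimpleView(List<Student> baseTable, Predicate<Student> where) {
            this.baseTable = baseTable;
            this.where = where;
        }

        // SELECT * FROM view: the condition is re-evaluated against the base table each time.
        List<Student> select() {
            List<Student> visible = new ArrayList<>();
            for (Student s : baseTable) {
                if (where.test(s)) visible.add(s);
            }
            return visible;
        }

        // UPDATE view SET marks = newMarks WHERE id = id: only rows visible through the
        // view are changed, and the change is made to the base-table row itself.
        int updateMarks(int id, int newMarks) {
            int updated = 0;
            for (Student s : select()) {
                if (s.id == id) {
                    s.marks = newMarks;
                    updated++;
                }
            }
            return updated;
        }
    }

    public static void main(String[] args) {
        List<Student> student = new ArrayList<>();
        student.add(new Student(1, "abhi", 80));
        student.add(new Student(2, "adam", 45));
        student.add(new Student(3, "alex", 90));

        // Roughly: CREATE VIEW passed AS SELECT * FROM student WHERE marks >= 50;
        SimpleView passed = new SimpleView(student, st -> st.marks >= 50);
        passed.updateMarks(3, 95);
        System.out.println("alex in the base table now has " + student.get(2).marks);
    }
}
```

Updating alex through the view changes the row in the student list itself, whereas updateMarks(2, ...) would change nothing because adam is not visible through the view.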
Types of View
There are two types of view,
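Simple View: created from a single table, without functions, GROUP BY or DISTINCT; DML operations can usually be performed through it.
Complex View: created from one or more tables and may contain joins, functions or GROUP BY; DML operations through it are restricted.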
Dropping Views
You need a way to drop the view if it is no longer needed. The syntax is
DROP VIEW view_name;
Advantages:
1. Provide an additional level of table security by restricting access to a predetermined set of rows or columns of a table.
2. Hide data complexity: for example, a single view might be defined with a join, which is a collection of related columns or rows in multiple tables. The view hides the fact that this information actually originates from several tables.
3. Simplify statements for the user: views allow users to select information from multiple tables without knowing how to perform a join.
4. Present data from a different perspective: columns of views can be renamed without affecting the tables on which the views are based.
Disadvantages:
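1. Rows available through a view are not returned in any particular order unless the query uses ORDER BY.
2. DML operations cannot be used on complex views (for example, views containing joins, GROUP BY or aggregate functions).
3. When a base table is dropped, the view becomes invalid, because it depends on that table.
4. It can affect performance: querying a view that hides a complex query can take more time than querying the table directly.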
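Embedded SQL
Host variable: These are variables of the host language used to pass values to a query as well as to capture the values returned by the query. They must be declared inside a declare section, between BEGIN DECLARE SECTION and END DECLARE SECTION. Each of these markers is written between EXEC SQL and ‘;’, i.e., EXEC SQL BEGIN DECLARE SECTION; ... EXEC SQL END DECLARE SECTION;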
Indicator Variable: These variables are also host variables, but they are always of 2-byte short type. They are used to capture NULL values that a query returns, or to INSERT/UPDATE NULL values into tables. When used in a SELECT query, the indicator is set to -1 if NULL is returned for its column. When used with INSERT or UPDATE and set to -1, it stores NULL in the column even though the host variable holds a value. If we have to capture NULL values for each host variable in the code, we have to declare an indicator variable for each host variable.
Execution Section
This is the execution section, and it contains all the SQL queries and statements prefixed by ‘EXEC
SQL’.
In this embedded SQL, every query depends only on the values of the host variables; the query itself is static. That means a SELECT query written to fetch student details by student ID always fetches them by the ID entered. If the user enters a student name instead, the program cannot modify the query to fetch details based on the name, and likewise it cannot switch to a query based on both the name and the address of a student. Because the queries cannot be modified based on user input, this kind of SQL is known as static SQL.
Error Handling
The error handling method depends on the host language. Here we are using C with the labelling method: when an error occurs, the program stops the current sequence of execution and jumps to the error handling section of the code, named by a label in an EXEC SQL WHENEVER statement. To report errors, the program uses a separate structure whose variables describe the error that occurred (for example, the error code sqlcode). This structure is known as the SQL Communication Area, or SQLCA.
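#include <stdio.h>
#include <stdlib.h>

EXEC SQL INCLUDE SQLCA;   /* declares the sqlca error structure */

int main(void){
    /* Host variables are declared inside the DECLARE SECTION */
    EXEC SQL BEGIN DECLARE SECTION;
    char STD_NAME[31];    /* input: student name */
    int SID;              /* output: student id */
    short ind_sid;        /* indicator variable for SID (-1 means NULL) */
    EXEC SQL END DECLARE SECTION;
    /* (connecting to the database with EXEC SQL CONNECT is omitted here) */

    //Error handling
    EXEC SQL WHENEVER NOT FOUND GOTO error_msg1;
    EXEC SQL WHENEVER SQLERROR GOTO error_msg2;
    printf("Enter the Student name:");
    scanf("%30s", STD_NAME);
    // Executes the query
    EXEC SQL SELECT STD_ID INTO :SID INDICATOR :ind_sid FROM STUDENT
        WHERE STD_NAME = :STD_NAME;
    if (ind_sid == -1)
        printf("STUDENT ID is NULL\n");
    else
        printf("STUDENT ID:%d\n", SID); // prints the result from DB
    exit(0);
    // Error handling labels
error_msg1:
    printf("Student %s is not found\n", STD_NAME);
    printf("ERROR:%ld\n", sqlca.sqlcode);
    printf("ERROR State:%.5s\n", sqlca.sqlstate); /* 5 chars, not NUL-terminated (e.g. PostgreSQL ECPG) */
    exit(0);
error_msg2:
    printf("Error has occurred!\n");
    printf("ERROR:%ld\n", sqlca.sqlcode);
    printf("ERROR State:%.5s\n", sqlca.sqlstate);
    exit(1);
}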
Dynamic SQL
Dynamic SQL is a programming technique that enables you to build SQL statements
dynamically at runtime. You can create more general purpose, flexible applications by using
dynamic SQL because the full text of a SQL statement may be unknown at compilation.
Dynamic SQL programs can handle changes in data definitions, without the need to
recompile.
If we need to build queries at run time, we can use dynamic SQL; if a query changes according to user input, it is better to use dynamic SQL.
For example, the query needed when the user enters only a student name differs from the query needed when the user enters both a student name and an address. With static embedded SQL, this requirement cannot be implemented in the code. In such cases, dynamic SQL lets the program build the query from the values the user enters, without the user needing to know which query is being executed.
Dynamic SQL can also be used when we do not know in advance which SQL statement (INSERT, DELETE, UPDATE or SELECT) needs to be run, when the number or the datatypes of the host variables are unknown, or when database objects such as tables, views or indexes must be referenced by names known only at run time.
In dynamic SQL, queries are created, compiled and executed only at run time. This makes dynamic SQL somewhat more complex and slower than static SQL.
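The idea of building the query text from whichever inputs the user supplied can be sketched in Java, the host language used for JDBC later in this unit. This sketch only assembles the statement text and the list of values; the ADDRESS column and all class and method names are hypothetical. In a JDBC program, the text would then be passed to Connection.prepareStatement and each value bound with PreparedStatement.setString, roughly the JDBC counterpart of PREPARE and of supplying host-variable values.

```java
import java.util.ArrayList;
import java.util.List;

// Builds the text of a query at run time from whichever inputs the user supplied.
class DynamicQueryDemo {
    final String sql;           // statement text to be prepared
    final List<String> params;  // values for the ? placeholders, in order

    private DynamicQueryDemo(String sql, List<String> params) {
        this.sql = sql;
        this.params = params;
    }

    // name and address may be null or empty when the user did not enter them.
    // ADDRESS is a hypothetical column used for illustration.
    static DynamicQueryDemo build(String name, String address) {
        List<String> conditions = new ArrayList<>();
        List<String> params = new ArrayList<>();
        if (name != null && !name.isEmpty()) {
            conditions.add("STD_NAME = ?");
            params.add(name);
        }
        if (address != null && !address.isEmpty()) {
            conditions.add("ADDRESS = ?");
            params.add(address);
        }
        String sql = "SELECT STD_ID FROM STUDENT";
        if (!conditions.isEmpty()) {
            sql += " WHERE " + String.join(" AND ", conditions);
        }
        return new DynamicQueryDemo(sql, params);
    }

    public static void main(String[] args) {
        DynamicQueryDemo nameOnly = build("alex", null);
        DynamicQueryDemo nameAndAddress = build("alex", "DELHI");
        System.out.println(nameOnly.sql + "  " + nameOnly.params);
        System.out.println(nameAndAddress.sql + "  " + nameAndAddress.params);
    }
}
```

Binding the values through ? placeholders, instead of pasting the user's text into the SQL string, also protects the program against SQL injection.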
PREPARE
Because dynamic SQL builds a query at run time, the first step is to capture all the inputs from the user and store them in a string variable. Depending on the inputs received, the string variable is appended with the inputs and SQL keywords. This SQL-like string is then converted into a SQL statement by using the PREPARE statement.
Example: here sql_stmt is a character variable that holds the user's inputs along with SQL commands. It cannot yet be treated as a SQL query, because it is still just a string value; it is converted into a proper SQL statement in the last line by the PREPARE statement. Here sql_query is not a host variable but the name that identifies the prepared SQL statement.
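/* STD_NAME, CLASS_ID and sql_stmt (char[256]) are host variables declared
   in the DECLARE SECTION; STD_NAME and CLASS_ID hold the user's input */
strcpy(sql_stmt, "SELECT STD_ID FROM STUDENT");
if (strcmp(STD_NAME, "") != 0){
    strcat(sql_stmt, " WHERE STD_NAME = :STD_NAME");
}
else if (CLASS_ID > 0){
    strcat(sql_stmt, " WHERE CLASS_ID = :CLASS_ID");
}
/* Convert the string into a SQL statement named sql_query */
EXEC SQL PREPARE sql_query FROM :sql_stmt;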
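EXECUTE
This statement executes a SQL statement that has already been prepared (compiled) in the DB by PREPARE. If the prepared statement contains placeholders, their values are supplied with a USING clause, e.g. EXEC SQL EXECUTE sql_query USING :STD_NAME;
EXEC SQL EXECUTE sql_query;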
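EXECUTE IMMEDIATE
This statement prepares and executes a SQL statement in the DB in a single step, i.e., it performs the work of PREPARE and EXECUTE together. It can be used only for statements that have no placeholders and return no rows (such as INSERT, UPDATE, DELETE or DDL), not for SELECT queries.
EXEC SQL EXECUTE IMMEDIATE :sql_stmt;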
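Difference between Embedded & Dynamic SQL
Embedded (static) SQL: the text of each query is fixed when the program is written, and only the values of the host variables change; it is simpler and faster, but the query cannot change with user input.
Dynamic SQL: the text of the query is built, compiled and executed at run time, so it can change with user input, at the cost of extra complexity and execution time.
JDBC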
The JDBC API provides a standard interface for interacting with any relational database management system (RDBMS). The JDBC API consists of the following main components:
1. JDBC Driver
2. Connection
3. Statement
4. ResultSet
JDBC Driver
A JDBC driver is a set of Java classes that implement the JDBC interfaces for interacting with a specific database. Almost all database vendors, such as MySQL, Oracle and Microsoft SQL Server, provide JDBC drivers. For example, MySQL provides a JDBC driver called MySQL Connector/J that allows you to work with a MySQL database through the standard JDBC API.
MySQL Connector/J is written in pure Java. It translates JDBC calls into MySQL-specific calls and sends them directly to the MySQL database. To use a JDBC driver, you need to include the driver's JAR file with your application.
Connection
The first and most important component of JDBC is the Connection object. In a Java application,
you first load a JDBC driver and then establish a connection to the database. Through the
Connection object, you can interact with the database e.g., creating a Statement to execute SQL
queries against tables. You can open more than one connection to a database at a time.
Statement
To execute a SQL query e.g., SELECT, INSERT, UPDATE, DELETE, etc., you use a Statement object.
You create the Statement object through the Connection object. JDBC also provides specialised statement types for different purposes, such as PreparedStatement and CallableStatement.
ResultSet
After querying data from the database, you get a ResultSet object. The ResultSet object provides methods that allow you to traverse the result of the query.
First, you need to import three classes, SQLException, DriverManager and Connection, from the java.sql package.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
Second, you call the getConnection() method of the DriverManager class to get the Connection
object. There are three parameters you need to pass to the getConnection() method:
1. url: the database URL in the form jdbc:subprotocol:subname. For MySQL, you use jdbc:mysql://localhost:3306/mysqljdbc, i.e., you are connecting to MySQL with server name localhost, port 3306, and database mysqljdbc.
2. user: the database user that will be used to connect to MySQL.
3. password: the password of the database user.
JDBC Program:
When connecting to MySQL, many things can go wrong, e.g., the database server is not available, or the user name or password is wrong. In such cases, JDBC throws a SQLException. Therefore, when you create a Connection object, you should always put it inside a try-catch block. You should also always close the database connection once you have finished interacting with the database, by calling the close() method of the Connection object.
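A minimal sketch of such a program follows, assuming MySQL Connector/J is on the classpath and a STUDENT table with STD_ID and STD_NAME columns exists (both hypothetical, as are the user name and password in main). It uses try-with-resources, which calls close() on the ResultSet, Statement and Connection automatically, and a catch block for SQLException. Since JDBC 4.0, drivers on the classpath are registered automatically, so no explicit Class.forName call is shown.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class JdbcDemo {
    // Builds a MySQL URL of the form jdbc:mysql://host:port/database.
    static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // Connects, runs one query and prints each row; returns false if a SQLException occurred.
    // try-with-resources closes the ResultSet, Statement and Connection (in reverse order)
    // even when an exception is thrown.
    static boolean printStudents(String url, String user, String password) {
        String sql = "SELECT STD_ID, STD_NAME FROM STUDENT";   // hypothetical table and columns
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getInt("STD_ID") + " " + rs.getString("STD_NAME"));
            }
            return true;
        } catch (SQLException e) {
            // e.g. server not running, wrong user name or password, no suitable driver
            System.err.println("Database error: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder credentials for illustration only.
        String url = mysqlUrl("localhost", 3306, "mysqljdbc");
        printStudents(url, "root", "secret");
    }
}
```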
ODBC
ODBC stands for Open Database Connectivity. Like JDBC, ODBC is an API that acts as an interface between an application on the client side and the database on the server side. Microsoft introduced ODBC in 1992.
ODBC helps an application to access data from the database. An application written in any language can use ODBC to access different types of databases, and hence ODBC is said to be language and platform independent. Like JDBC, ODBC also provides ODBC drivers that convert the requests of an application written in any language into a form the database can understand.
ODBC is widely used and can be called from many different programming languages, but its code is complex and hard to understand.
The ODBC architecture has four components:
Application: performs processing and calls the ODBC functions to submit SQL statements;
Driver manager: loads drivers for each application;
Driver: handles ODBC function calls, and then submits each SQL request to a data source; and
Data source: the data being accessed, together with its database management system (DBMS) and operating system (OS).
ODBC | JDBC
ODBC stands for Open Database Connectivity. | JDBC stands for Java Database Connectivity.