
4 RELATIONAL ALGEBRA AND CALCULUS

Stand firm in your refusal to remain conscious during algebra. In real life, I assure
you, there is no such thing as algebra.

—Fran Lebowitz, Social Studies

This chapter presents two formal query languages associated with the relational model.
Query languages are specialized languages for asking questions, or queries, that
involve the data in a database. After covering some preliminaries in Section 4.1, we
discuss relational algebra in Section 4.2. Queries in relational algebra are composed
using a collection of operators, and each query describes a step-by-step procedure for
computing the desired answer; that is, queries are specified in an operational manner.
In Section 4.3 we discuss relational calculus, in which a query describes the desired
answer without specifying how the answer is to be computed; this nonprocedural style
of querying is called declarative. We will usually refer to relational algebra and
relational calculus as algebra and calculus, respectively. We compare the expressive power
of algebra and calculus in Section 4.4. These formal query languages have greatly
influenced commercial query languages such as SQL, which we will discuss in later
chapters.

4.1 PRELIMINARIES

We begin by clarifying some important points about relational queries. The inputs and
outputs of a query are relations. A query is evaluated using instances of each input
relation and it produces an instance of the output relation. In Section 3.4, we used
field names to refer to fields because this notation makes queries more readable. An
alternative is to always list the fields of a given relation in the same order and to refer
to fields by position rather than by field name.

In defining relational algebra and calculus, the alternative of referring to fields by
position is more convenient than referring to fields by name: Queries often involve the
computation of intermediate results, which are themselves relation instances, and if
we use field names to refer to fields, the definition of query language constructs must
specify the names of fields for all intermediate relation instances. This can be tedious
and is really a secondary issue because we can refer to fields by position anyway. On
the other hand, field names make queries more readable.

91
92 Chapter 4

Due to these considerations, we use the positional notation to formally define relational
algebra and calculus. We also introduce simple conventions that allow intermediate
relations to ‘inherit’ field names, for convenience.

We present a number of sample queries using the following schema:

Sailors(sid: integer, sname: string, rating: integer, age: real)
Boats(bid: integer, bname: string, color: string)
Reserves(sid: integer, bid: integer, day: date)

The domain of each field is listed after the field name. The keys are as follows: sid is
the key for Sailors, bid is the key for Boats, and all three fields together form the key
for Reserves. Fields in an instance of one of these relations will be referred to by name,
or positionally, using the order in which they are listed above.

In several examples illustrating the relational algebra operators, we will use the
instances S1 and S2 (of Sailors) and R1 (of Reserves) shown in Figures 4.1, 4.2, and 4.3,
respectively.

sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0

Figure 4.1 Instance S1 of Sailors

sid  sname   rating  age
28   yuppy   9       35.0
31   Lubber  8       55.5
44   guppy   5       35.0
58   Rusty   10      35.0

Figure 4.2 Instance S2 of Sailors

sid  bid  day
22   101  10/10/96
58   103  11/12/96

Figure 4.3 Instance R1 of Reserves

4.2 RELATIONAL ALGEBRA

Relational algebra is one of the two formal query languages associated with the
relational model. Queries in algebra are composed using a collection of operators. A
fundamental property is that every operator in the algebra accepts (one or two)
relation instances as arguments and returns a relation instance as the result. This
property makes it easy to compose operators to form a complex query: a relational algebra
expression is recursively defined to be a relation, a unary algebra operator applied
to a single expression, or a binary algebra operator applied to two expressions. We
describe the basic operators of the algebra (selection, projection, union, cross-product,
and difference), as well as some additional operators that can be defined in terms of
the basic operators but arise frequently enough to warrant special attention, in the
following sections.

Each relational algebra query describes a step-by-step procedure for computing the desired
answer, based on the order in which operators are applied in the query. The procedural
nature of the algebra allows us to think of an algebra expression as a recipe, or a
plan, for evaluating a query, and relational systems in fact use algebra expressions to
represent query evaluation plans.

4.2.1 Selection and Projection

Relational algebra includes operators to select rows from a relation (σ) and to project
columns (π). These operations allow us to manipulate data in a single relation.
Consider the instance of the Sailors relation shown in Figure 4.2, denoted as S2. We can
retrieve rows corresponding to expert sailors by using the σ operator. The expression

$$\sigma_{rating>8}(S2)$$

evaluates to the relation shown in Figure 4.4. The subscript rating>8 specifies the
selection criterion to be applied while retrieving tuples.

sid  sname  rating  age
28   yuppy  9       35.0
58   Rusty  10      35.0

Figure 4.4 $\sigma_{rating>8}(S2)$

sname   rating
yuppy   9
Lubber  8
guppy   5
Rusty   10

Figure 4.5 $\pi_{sname,rating}(S2)$

The selection operator σ specifies the tuples to retain through a selection condition.
In general, the selection condition is a boolean combination (i.e., an expression using
the logical connectives ∧ and ∨) of terms that have the form attribute op constant or
attribute1 op attribute2, where op is one of the comparison operators <, <=, =, ≠, >=,
or >. The reference to an attribute can be by position (of the form .i or i) or by name
(of the form .name or name). The schema of the result of a selection is the schema of
the input relation instance.

The projection operator π allows us to extract columns from a relation; for example,
we can find out all sailor names and ratings by using π. The expression

$$\pi_{sname,rating}(S2)$$

evaluates to the relation shown in Figure 4.5. The subscript sname,rating specifies the
fields to be retained; the other fields are ‘projected out.’ The schema of the result of
a projection consists of just the projected fields, with the names and domains they have
in the input relation.

Suppose that we wanted to find out only the ages of sailors. The expression

$$\pi_{age}(S2)$$

evaluates to the relation shown in Figure 4.6. The important point to note is that
although three sailors are aged 35, a single tuple with age=35.0 appears in the result
of the projection. This follows from the definition of a relation as a set of tuples. In
practice, real systems often omit the expensive step of eliminating duplicate tuples,
leading to relations that are multisets. However, our discussion of relational algebra
and calculus assumes that duplicate elimination is always done so that relations are
always sets of tuples.

Since the result of a relational algebra expression is always a relation, we can substitute
an expression wherever a relation is expected. For example, we can compute the names
and ratings of highly rated sailors by combining two of the preceding queries. The
expression
$$\pi_{sname,rating}(\sigma_{rating>8}(S2))$$
produces the result shown in Figure 4.7. It is obtained by applying the selection to S2
(to get the relation shown in Figure 4.4) and then applying the projection.

age
35.0
55.5

Figure 4.6 $\pi_{age}(S2)$

sname  rating
yuppy  9
Rusty  10

Figure 4.7 $\pi_{sname,rating}(\sigma_{rating>8}(S2))$
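To make the set semantics concrete, here is a minimal Python sketch of σ and π. It assumes a relation is modeled as a pair (tuple of field names, set of tuples); the helpers select and project are illustrative names of our own, not part of any library or DBMS.

```python
# A relation is modeled as (field_names, set_of_tuples). Because the rows
# live in a Python set, duplicate tuples can never appear.
S2 = (
    ("sid", "sname", "rating", "age"),
    {(28, "yuppy", 9, 35.0), (31, "Lubber", 8, 55.5),
     (44, "guppy", 5, 35.0), (58, "Rusty", 10, 35.0)},
)


def select(rel, pred):
    """σ: keep tuples for which pred(field name -> value mapping) is true.

    The result has the same schema as the input relation.
    """
    fields, rows = rel
    return fields, {r for r in rows if pred(dict(zip(fields, r)))}


def project(rel, *names):
    """π: keep only the named fields; collecting into a set drops duplicates."""
    fields, rows = rel
    positions = [fields.index(n) for n in names]
    return tuple(names), {tuple(r[i] for i in positions) for r in rows}


experts = select(S2, lambda t: t["rating"] > 8)   # Figure 4.4
names_ratings = project(S2, "sname", "rating")    # Figure 4.5
ages = project(S2, "age")                         # Figure 4.6
combined = project(experts, "sname", "rating")    # Figure 4.7: π applied to σ's result

print(sorted(ages[1]))  # [(35.0,), (55.5,)]
```

Three sailors in S2 are aged 35.0, yet the projection holds a single (35.0,) tuple; a system working with multisets would keep all three copies instead.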

4.2.2 Set Operations

The following standard operations on sets are also available in relational algebra: union
(∪), intersection (∩), set-difference (−), and cross-product (×).

Union: R ∪ S returns a relation instance containing all tuples that occur in either
relation instance R or relation instance S (or both). R and S must be
union-compatible, and the schema of the result is defined to be identical to the schema
of R.
Two relation instances are said to be union-compatible if the following conditions hold:
– they have the same number of fields, and
– corresponding fields, taken in order from left to right, have the same domains.

Note that field names are not used in defining union-compatibility. For conve-
nience, we will assume that the fields of R ∪ S inherit names from R, if the fields
of R have names. (This assumption is implicit in defining the schema of R ∪ S to
be identical to the schema of R, as stated earlier.)

Intersection: R ∩ S returns a relation instance containing all tuples that occur in
both R and S. The relations R and S must be union-compatible, and the schema
of the result is defined to be identical to the schema of R.

Set-difference: R − S returns a relation instance containing all tuples that occur
in R but not in S. The relations R and S must be union-compatible, and the
schema of the result is defined to be identical to the schema of R.

Cross-product: R × S returns a relation instance whose schema contains all the
fields of R (in the same order as they appear in R) followed by all the fields of S
(in the same order as they appear in S). The result of R × S contains one tuple
(r, s) (the concatenation of tuples r and s) for each pair of tuples r ∈ R, s ∈ S.
The cross-product operation is sometimes called Cartesian product.
We will use the convention that the fields of R × S inherit names from the
corresponding fields of R and S. It is possible for both R and S to contain one or
more fields having the same name; this situation creates a naming conflict. The
corresponding fields in R × S are unnamed and are referred to solely by position.

In the preceding definitions, note that each operator can be applied to relation instances
that are computed using a relational algebra (sub)expression.

We now illustrate these definitions through several examples. The union of S1 and S2
is shown in Figure 4.8. Fields are listed in order; field names are also inherited from
S1. S2 has the same field names, of course, since it is also an instance of Sailors. In
general, fields of S2 may have different names; recall that we require only domains to
match. Note that the result is a set of tuples. Tuples that appear in both S1 and S2
appear only once in S1 ∪ S2. Also, S1 ∪ R1 is not a valid operation because the two
relations are not union-compatible. The intersection of S1 and S2 is shown in Figure
4.9, and the set-difference S1 − S2 is shown in Figure 4.10.

sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0
28   yuppy   9       35.0
44   guppy   5       35.0

Figure 4.8 S1 ∪ S2
sid  sname   rating  age
31   Lubber  8       55.5
58   Rusty   10      35.0

Figure 4.9 S1 ∩ S2

sid  sname   rating  age
22   Dustin  7       45.0

Figure 4.10 S1 − S2
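The set operations and the union-compatibility check can be sketched in Python as follows. This assumes each relation is modeled as (field names, domains, set of tuples), with Python types standing in for domains; all function names are hypothetical helpers, and this cross-product simply keeps clashing field names rather than applying the naming convention.

```python
SAILOR_FIELDS = ("sid", "sname", "rating", "age")
SAILOR_DOMAINS = (int, str, int, float)  # Python types stand in for domains

S1 = (SAILOR_FIELDS, SAILOR_DOMAINS,
      {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)})
S2 = (SAILOR_FIELDS, SAILOR_DOMAINS,
      {(28, "yuppy", 9, 35.0), (31, "Lubber", 8, 55.5),
       (44, "guppy", 5, 35.0), (58, "Rusty", 10, 35.0)})
R1 = (("sid", "bid", "day"), (int, int, str),
      {(22, 101, "10/10/96"), (58, 103, "11/12/96")})


def union_compatible(r, s):
    """Same number of fields and same domain at each position; names are ignored."""
    return r[1] == s[1]


def _require_compatible(r, s):
    if not union_compatible(r, s):
        raise ValueError("relations are not union-compatible")


def union(r, s):
    _require_compatible(r, s)
    return r[0], r[1], r[2] | s[2]  # the result takes r's schema


def intersection(r, s):
    _require_compatible(r, s)
    return r[0], r[1], r[2] & s[2]


def difference(r, s):
    _require_compatible(r, s)
    return r[0], r[1], r[2] - s[2]


def cross(r, s):
    """Every tuple of r concatenated with every tuple of s."""
    return r[0] + s[0], r[1] + s[1], {a + b for a in r[2] for b in s[2]}


print(len(union(S1, S2)[2]))  # 5 (Lubber and Rusty appear only once)
try:
    union(S1, R1)
except ValueError as err:
    print(err)  # relations are not union-compatible
```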

The result of the cross-product S1 × R1 is shown in Figure 4.11. Because R1 and
S1 both have a field named sid, by our convention on field names, the corresponding
two fields in S1 × R1 are unnamed, and referred to solely by the position in which
they appear in Figure 4.11. The fields in S1 × R1 have the same domains as the
corresponding fields in R1 and S1. In Figure 4.11 sid is listed in parentheses to
emphasize that it is not an inherited field name; only the corresponding domain is
inherited.

(sid)  sname   rating  age   (sid)  bid  day
22     Dustin  7       45.0  22     101  10/10/96
22     Dustin  7       45.0  58     103  11/12/96
31     Lubber  8       55.5  22     101  10/10/96
31     Lubber  8       55.5  58     103  11/12/96
58     Rusty   10      35.0  22     101  10/10/96
58     Rusty   10      35.0  58     103  11/12/96

Figure 4.11 S1 × R1

4.2.3 Renaming

We have been careful to adopt field name conventions that ensure that the result of
a relational algebra expression inherits field names from its argument (input) relation
instances in a natural way whenever possible. However, name conflicts can arise in
some cases; for example, in S1 × R1. It is therefore convenient to be able to give
names explicitly to the fields of a relation instance that is defined by a relational
algebra expression. In fact, it is often convenient to give the instance itself a name so
that we can break a large algebra expression into smaller pieces by giving names to
the results of subexpressions.

We introduce a renaming operator ρ for this purpose. The expression ρ(R(F), E)
takes an arbitrary relational algebra expression E and returns an instance of a (new)
relation called R. R contains the same tuples as the result of E, and has the same
schema as E, but some fields are renamed. The field names in relation R are the
same as in E, except for fields renamed in the renaming list F, which is a list of
terms having the form oldname → newname or position → newname. For ρ to be
well-defined, references to fields (in the form of oldnames or positions in the renaming
list) must be unambiguous, and no two fields in the result may have the same name.
Sometimes we only want to rename fields or to (re)name the relation; we will therefore
treat both R and F as optional in the use of ρ. (Of course, it is meaningless to omit
both.)

For example, the expression ρ(C(1 → sid1, 5 → sid2), S1 × R1) returns a relation
that contains the tuples shown in Figure 4.11 and has the following schema: C(sid1:
integer, sname: string, rating: integer, age: real, sid2: integer, bid: integer,
day: date).
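A sketch of ρ in the same spirit, assuming relations are modeled as (field names, set of tuples) with None marking an unnamed field. The hypothetical cross helper applies the naming convention for cross-products (a name present in both inputs becomes unnamed), and the hypothetical rename helper takes a dict whose keys are 1-based positions or old names; the new relation's name (C) is simply the Python variable it is bound to.

```python
S1 = (("sid", "sname", "rating", "age"),
      {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)})
R1 = (("sid", "bid", "day"),
      {(22, 101, "10/10/96"), (58, 103, "11/12/96")})


def cross(r, s):
    """R × S; a field name occurring in both inputs becomes unnamed (None)."""
    clash = set(r[0]) & set(s[0])
    fields = tuple(None if f in clash else f for f in r[0] + s[0])
    return fields, {a + b for a in r[1] for b in s[1]}


def rename(rel, renaming):
    """ρ with a renaming list given as {old name or 1-based position: new name}."""
    fields, rows = rel
    new_fields = list(fields)
    for old, new in renaming.items():
        if isinstance(old, int):
            if not 1 <= old <= len(fields):
                raise ValueError(f"no field at position {old}")
            pos = old - 1
        else:
            matches = [i for i, f in enumerate(fields) if f == old]
            if len(matches) != 1:  # the reference must be unambiguous
                raise ValueError(f"cannot resolve field {old!r}")
            pos = matches[0]
        new_fields[pos] = new
    named = [f for f in new_fields if f is not None]
    if len(named) != len(set(named)):  # no two result fields may share a name
        raise ValueError("duplicate field names in result")
    return tuple(new_fields), rows


C = rename(cross(S1, R1), {1: "sid1", 5: "sid2"})
print(C[0])  # ('sid1', 'sname', 'rating', 'age', 'sid2', 'bid', 'day')
```

Referring to sid by name after the cross-product fails here, because both sid fields are unnamed; positions are the only way to reach them.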

It is customary to include some additional operators in the algebra, but they can all be
defined in terms of the operators that we have defined thus far. (In fact, the renaming
operator is only needed for syntactic convenience, and even the ∩ operator is redundant;
R ∩ S can be defined as R − (R − S).) We will consider these additional operators,
and their definition in terms of the basic operators, in the next two subsections.

4.2.4 Joins

The join operation is one of the most useful operations in relational algebra and is
the most commonly used way to combine information from two or more relations.
Although a join can be defined as a cross-product followed by selections and projections,
joins arise much more frequently in practice than plain cross-products. Further, the
result of a cross-product is typically much larger than the result of a join, and it
is very important to recognize joins and implement them without materializing the
underlying cross-product (by applying the selections and projections ‘on-the-fly’). For
these reasons, joins have received a lot of attention, and there are several variants of
the join operation.[1]

Condition Joins

The most general version of the join operation accepts a join condition c and a pair of
relation instances as arguments, and returns a relation instance. The join condition is
identical to a selection condition in form. The operation is defined as follows:

$$R \bowtie_c S = \sigma_c(R \times S)$$

Thus ⋈ is defined to be a cross-product followed by a selection. Note that the condition
c can (and typically does) refer to attributes of both R and S. The reference to an
attribute of a relation, say R, can be by position (of the form R.i) or by name (of the
form R.name).

[1] There are several variants of joins that are not discussed in this chapter. An important
class of joins called outer joins is discussed in Chapter 5.

As an example, the result of $S1 \bowtie_{S1.sid<R1.sid} R1$ is shown in Figure 4.12. Because sid
appears in both S1 and R1, the corresponding fields in the result of the cross-product
S1 × R1 (and therefore in the result of $S1 \bowtie_{S1.sid<R1.sid} R1$) are unnamed. Domains
are inherited from the corresponding fields of S1 and R1.

(sid)  sname   rating  age   (sid)  bid  day
22     Dustin  7       45.0  58     103  11/12/96
31     Lubber  8       55.5  58     103  11/12/96

Figure 4.12 $S1 \bowtie_{S1.sid<R1.sid} R1$
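The definition of a condition join as a selection over a cross-product translates directly into code. The sketch below (hypothetical helpers; relations modeled as (field names, set of tuples)) computes the join of Figure 4.12 twice: once by materializing the cross-product and filtering it, and once by testing the condition as each pair is formed, the simplest nested-loop form of applying the selection on the fly. The condition uses positions because both sid fields are unnamed in the result.

```python
S1 = (("sid", "sname", "rating", "age"),
      {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)})
R1 = (("sid", "bid", "day"),
      {(22, 101, "10/10/96"), (58, 103, "11/12/96")})


def cross(r, s):
    """R × S: concatenate every pair of tuples (name conflicts not handled)."""
    return r[0] + s[0], {a + b for a in r[1] for b in s[1]}


def join_via_cross(r, s, cond):
    """R ⋈_c S as σ_c(R × S): materialize the cross-product, then filter it."""
    fields, rows = cross(r, s)
    return fields, {t for t in rows if cond(t)}


def join_on_the_fly(r, s, cond):
    """Same result, but each pair is tested as it is formed (nested loops),
    so the full cross-product is never stored."""
    return r[0] + s[0], {a + b for a in r[1] for b in s[1] if cond(a + b)}


def s1_sid_lt_r1_sid(t):
    # Position 0 holds S1's sid and position 4 holds R1's sid (0-based).
    return t[0] < t[4]


result = join_on_the_fly(S1, R1, s1_sid_lt_r1_sid)
for row in sorted(result[1]):
    print(row)
# (22, 'Dustin', 7, 45.0, 58, 103, '11/12/96')
# (31, 'Lubber', 8, 55.5, 58, 103, '11/12/96')
```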

Equijoin

A common special case of the join operation R ⋈ S is when the join condition consists
solely of equalities (connected by ∧) of the form R.name1 = S.name2, that is,
equalities between two fields in R and S. In this case, obviously, there is some
redundancy in retaining both attributes in the result. For join conditions that contain only
such equalities, the join operation is refined by doing an additional projection in which
S.name2 is dropped. The join operation with this refinement is called equijoin.

The schema of the result of an equijoin contains the fields of R (with the same names
and domains as in R) followed by the fields of S that do not appear in the join
conditions. If this set of fields in the result relation includes two fields that inherit the
same name from R and S, they are unnamed in the result relation.

We illustrate $S1 \bowtie_{S1.sid=R1.sid} R1$ in Figure 4.13. Notice that only one field called sid
appears in the result.

sid  sname   rating  age   bid  day
22   Dustin  7       45.0  101  10/10/96
58   Rusty   10      35.0  103  11/12/96

Figure 4.13 $S1 \bowtie_{S1.sid=R1.sid} R1$



Natural Join

A further special case of the join operation R ⋈ S is an equijoin in which equalities
are specified on all fields having the same name in R and S. In this case, we can
simply omit the join condition; the default is that the join condition is a collection of
equalities on all common fields. We call this special case a natural join, and it has the
nice property that the result is guaranteed not to have two fields with the same name.

The equijoin expression $S1 \bowtie_{S1.sid=R1.sid} R1$ is actually a natural join and can simply
be denoted as $S1 \bowtie R1$, since the only common field is sid. In general, if two relations R
and S have no attributes in common, $R \bowtie S$ is simply the cross-product $R \times S$.
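A natural join can be sketched by equating all commonly named fields and keeping one copy of each, which reproduces Figure 4.13. As before, relations are modeled as (field names, set of tuples), and natural_join is a hypothetical helper; when no field names are shared, the all() test is vacuously true, so the result degenerates to the cross-product.

```python
S1 = (("sid", "sname", "rating", "age"),
      {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)})
R1 = (("sid", "bid", "day"),
      {(22, 101, "10/10/96"), (58, 103, "11/12/96")})


def natural_join(r, s):
    """R ⋈ S: equate every field name shared by R and S, keep one copy of each."""
    r_fields, r_rows = r
    s_fields, s_rows = s
    common = [f for f in r_fields if f in s_fields]
    pairs = [(r_fields.index(f), s_fields.index(f)) for f in common]
    # Fields of S that are not shared are appended after all fields of R.
    s_keep = [k for k, f in enumerate(s_fields) if f not in common]
    fields = r_fields + tuple(s_fields[k] for k in s_keep)
    rows = {
        a + tuple(b[k] for k in s_keep)
        for a in r_rows
        for b in s_rows
        if all(a[i] == b[j] for i, j in pairs)
    }
    return fields, rows


result = natural_join(S1, R1)
print(result[0])  # ('sid', 'sname', 'rating', 'age', 'bid', 'day')
```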

4.2.5 Division

The division operator is useful for expressing certain kinds of queries, for example:
“Find the names of sailors who have reserved all boats.” Understanding how to use
the basic operators of the algebra to define division is a useful exercise. However,
the division operator does not have the same importance as the other operators—it
is not needed as often, and database systems do not try to exploit the semantics of
division by implementing it as a distinct operator (as, for example, is done with the
join operator).

We discuss division through an example. Consider two relation instances A and B in
which A has (exactly) two fields x and y and B has just one field y, with the same
domain as in A. We define the division operation A/B as the set of all x values (in
the form of unary tuples) such that for every y value in (a tuple of) B, there is a tuple
(x,y) in A.

Another way to understand division is as follows. For each x value in (the first column
of) A, consider the set of y values that appear in (the second field of) tuples of A with
that x value. If this set contains (all y values in) B, the x value is in the result of A/B.

An analogy with integer division may also help to understand division. For integers A
and B, A/B is the largest integer Q such that Q ∗ B ≤ A. For relation instances A
and B, A/B is the largest relation instance Q such that Q × B ⊆ A.

Division is illustrated in Figure 4.14. It helps to think of A as a relation listing the
parts supplied by suppliers, and of the B relations as listing parts. A/Bi computes
suppliers who supply all parts listed in relation instance Bi.

A               B1      B2      B3
sno  pno        pno     pno     pno
s1   p1         p2      p2      p1
s1   p2                 p4      p2
s1   p3                         p4
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

A/B1    A/B2    A/B3
sno     sno     sno
s1      s1      s1
s2      s4
s3
s4

Figure 4.14 Examples Illustrating Division

Expressing A/B in terms of the basic algebra operators is an interesting exercise, and
the reader should try to do this before reading further. The basic idea is to compute
all x values in A that are not disqualified. An x value is disqualified if by attaching a
y value from B, we obtain a tuple (x,y) that is not in A. We can compute disqualified
x values using the algebra expression

$$\pi_x((\pi_x(A) \times B) - A)$$

Thus we can define A/B as

$$\pi_x(A) - \pi_x((\pi_x(A) \times B) - A)$$
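The two characterizations of A/B can be checked against each other on the instances of Figure 4.14. This sketch assumes A is a set of (x, y) pairs and each Bi a set of 1-tuples; divide_by_definition follows the "for every y value in B" wording, while divide_by_basic_ops follows the expression above step by step. Both names are our own. The loop also checks the integer-division analogy, Q × B ⊆ A.

```python
A = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"), ("s3", "p2"), ("s4", "p2"), ("s4", "p4")}
B1 = {("p2",)}
B2 = {("p2",), ("p4",)}
B3 = {("p1",), ("p2",), ("p4",)}


def divide_by_definition(a, b):
    """x is in A/B iff (x, y) is in A for every y in B."""
    xs = {x for x, _ in a}
    return {(x,) for x in xs if all((x, y) in a for (y,) in b)}


def divide_by_basic_ops(a, b):
    """πx(A) − πx((πx(A) × B) − A), built only from π, ×, and −."""
    pi_x_a = {(x,) for x, _ in a}                          # πx(A)
    all_pairs = {(x, y) for (x,) in pi_x_a for (y,) in b}  # πx(A) × B
    disqualified = {(x,) for x, _ in all_pairs - a}        # πx((πx(A) × B) − A)
    return pi_x_a - disqualified


for name, b in (("B1", B1), ("B2", B2), ("B3", B3)):
    q = divide_by_basic_ops(A, b)
    # Integer-division analogy: Q × B must be contained in A.
    assert {(x, y) for (x,) in q for (y,) in b} <= A
    print(name, sorted(x for (x,) in q))
# B1 ['s1', 's2', 's3', 's4']
# B2 ['s1', 's4']
# B3 ['s1']
```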

To understand the division operation in full generality, we have to consider the case
when both x and y are replaced by a set of attributes. The generalization is straightforward
and is left as an exercise for the reader. We will discuss two additional examples
illustrating division (Queries Q9 and Q10) later in this section.

4.2.6 More Examples of Relational Algebra Queries

We now present several examples to illustrate how to write queries in relational algebra.
We use the Sailors, Reserves, and Boats schema for all our examples in this section.
We will use parentheses as needed to make our algebra expressions unambiguous. Note
that all the example queries in this chapter are given a unique query number. The
query numbers are kept unique across both this chapter and the SQL query chapter
(Chapter 5). This numbering makes it easy to identify a query when it is revisited in
the context of relational calculus and SQL and to compare different ways of writing
the same query. (All references to a query can be found in the subject index.)

In the rest of this chapter (and in Chapter 5), we illustrate queries using the instances
S3 of Sailors, R2 of Reserves, and B1 of Boats, shown in Figures 4.15, 4.16, and 4.17,
respectively.

sid  sname    rating  age
22   Dustin   7       45.0
29   Brutus   1       33.0
31   Lubber   8       55.5
32   Andy     8       25.5
58   Rusty    10      35.0
64   Horatio  7       35.0
71   Zorba    10      16.0
74   Horatio  9       35.0
85   Art      3       25.5
95   Bob      3       63.5

Figure 4.15 An Instance S3 of Sailors

sid  bid  day
22   101  10/10/98
22   102  10/10/98
22   103  10/8/98
22   104  10/7/98
31   102  11/10/98
31   103  11/6/98
31   104  11/12/98
64   101  9/5/98
64   102  9/8/98
74   103  9/8/98

Figure 4.16 An Instance R2 of Reserves

bid  bname      color
101  Interlake  blue
102  Interlake  red
103  Clipper    green
104  Marine     red

Figure 4.17 An Instance B1 of Boats

(Q1) Find the names of sailors who have reserved boat 103.

This query can be written as follows:

$$\pi_{sname}((\sigma_{bid=103}\,Reserves) \bowtie Sailors)$$

We first compute the set of tuples in Reserves with bid = 103 and then take the
natural join of this set with Sailors. This expression can be evaluated on instances
of Reserves and Sailors. Evaluated on the instances R2 and S3, it yields a relation
that contains just one field, called sname, and three tuples (Dustin), (Horatio), and
(Lubber). (Observe that there are two sailors called Horatio, and only one of them has
reserved boat 103.)

We can break this query into smaller pieces using the renaming operator ρ:

$$\rho(Temp1, \sigma_{bid=103}\,Reserves)$$
$$\rho(Temp2, Temp1 \bowtie Sailors)$$
$$\pi_{sname}(Temp2)$$

Notice that because we are only using ρ to give names to intermediate relations, the
renaming list is optional and is omitted. Temp1 denotes an intermediate relation that
identifies reservations of boat 103. Temp2 is another intermediate relation, and it
denotes sailors who have made a reservation in the set Temp1. The instances of these
relations when evaluating this query on the instances R2 and S3 are illustrated in
Figures 4.18 and 4.19. Finally, we extract the sname column from Temp2.

sid  bid  day
22   103  10/8/98
31   103  11/6/98
74   103  9/8/98

Figure 4.18 Instance of Temp1

sid  sname    rating  age   bid  day
22   Dustin   7       45.0  103  10/8/98
31   Lubber   8       55.5  103  11/6/98
74   Horatio  9       35.0  103  9/8/98

Figure 4.19 Instance of Temp2

The version of the query using ρ is essentially the same as the original query; the use
of ρ is just syntactic sugar. However, there are indeed several distinct ways to write a
query in relational algebra. Here is another way to write this query:

$$\pi_{sname}(\sigma_{bid=103}(Reserves \bowtie Sailors))$$

In this version we first compute the natural join of Reserves and Sailors and then apply
the selection and the projection.

This example offers a glimpse of the role played by algebra in a relational DBMS.
Queries are expressed by users in a language such as SQL. The DBMS translates an
SQL query into (an extended form of) relational algebra, and then looks for other
algebra expressions that will produce the same answers but are cheaper to evaluate. If
the user’s query is first translated into the expression

$$\pi_{sname}(\sigma_{bid=103}(Reserves \bowtie Sailors))$$

a good query optimizer will find the equivalent expression

$$\pi_{sname}((\sigma_{bid=103}\,Reserves) \bowtie Sailors)$$

Further, the optimizer will recognize that the second expression is likely to be less
expensive to compute because the sizes of intermediate relations are smaller, thanks
to the early use of selection.
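The effect of applying the selection early can be observed on S3 and R2. This Python sketch (relations as (field names, set of tuples); select, project, and natural_join are hypothetical helpers, not a DBMS API) evaluates both forms of Q1 and prints the size of the first intermediate relation each form builds. It only illustrates the intuition about intermediate sizes; it does not model how an optimizer estimates costs.

```python
SAILORS = (("sid", "sname", "rating", "age"), {
    (22, "Dustin", 7, 45.0), (29, "Brutus", 1, 33.0), (31, "Lubber", 8, 55.5),
    (32, "Andy", 8, 25.5), (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0),
    (71, "Zorba", 10, 16.0), (74, "Horatio", 9, 35.0), (85, "Art", 3, 25.5),
    (95, "Bob", 3, 63.5)})
RESERVES = (("sid", "bid", "day"), {
    (22, 101, "10/10/98"), (22, 102, "10/10/98"), (22, 103, "10/8/98"),
    (22, 104, "10/7/98"), (31, 102, "11/10/98"), (31, 103, "11/6/98"),
    (31, 104, "11/12/98"), (64, 101, "9/5/98"), (64, 102, "9/8/98"),
    (74, 103, "9/8/98")})


def select(rel, pred):
    fields, rows = rel
    return fields, {r for r in rows if pred(dict(zip(fields, r)))}


def project(rel, *names):
    fields, rows = rel
    pos = [fields.index(n) for n in names]
    return tuple(names), {tuple(r[i] for i in pos) for r in rows}


def natural_join(r, s):
    common = [f for f in r[0] if f in s[0]]
    keep = [k for k, f in enumerate(s[0]) if f not in common]
    pairs = [(r[0].index(f), s[0].index(f)) for f in common]
    rows = {a + tuple(b[k] for k in keep)
            for a in r[1] for b in s[1]
            if all(a[i] == b[j] for i, j in pairs)}
    return r[0] + tuple(s[0][k] for k in keep), rows


def is_boat_103(t):
    return t["bid"] == 103


# Form 1: join Reserves with Sailors first, then select bid = 103.
joined = natural_join(RESERVES, SAILORS)
plan1 = project(select(joined, is_boat_103), "sname")

# Form 2: select bid = 103 first, then join the smaller result with Sailors.
res103 = select(RESERVES, is_boat_103)
plan2 = project(natural_join(res103, SAILORS), "sname")

print(len(joined[1]), len(res103[1]))  # 10 3
print(sorted(plan1[1]))                # [('Dustin',), ('Horatio',), ('Lubber',)]
```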

(Q2) Find the names of sailors who have reserved a red boat.

$$\pi_{sname}((\sigma_{color='red'}\,Boats) \bowtie Reserves \bowtie Sailors)$$

This query involves a series of two joins. First we choose (tuples describing) red boats.
Then we join this set with Reserves (natural join, with equality specified on the bid
column) to identify reservations of red boats. Next we join the resulting intermediate
relation with Sailors (natural join, with equality specified on the sid column) to retrieve
the names of sailors who have made reservations of red boats. Finally, we project the
sailors’ names. The answer, when evaluated on the instances B1, R2 and S3, contains
the names Dustin, Horatio, and Lubber.

An equivalent expression is:

$$\pi_{sname}(\pi_{sid}((\pi_{bid}\,\sigma_{color='red'}\,Boats) \bowtie Reserves) \bowtie Sailors)$$

The reader is invited to rewrite both of these queries by using ρ to make the
intermediate relations explicit and to compare the schemas of the intermediate relations. The
second expression generates intermediate relations with fewer fields (and is therefore
likely to result in intermediate relation instances with fewer tuples, as well). A
relational query optimizer would try to arrive at the second expression if it is given the
first.

(Q3) Find the colors of boats reserved by Lubber.

$$\pi_{color}((\sigma_{sname='Lubber'}\,Sailors) \bowtie Reserves \bowtie Boats)$$

This query is very similar to the query we used to compute sailors who reserved red
boats. On instances B1, R2, and S3, the query will return the colors green and red.

(Q4) Find the names of sailors who have reserved at least one boat.

$$\pi_{sname}(Sailors \bowtie Reserves)$$

The join of Sailors and Reserves creates an intermediate relation in which tuples consist
of a Sailors tuple ‘attached to’ a Reserves tuple. A Sailors tuple appears in (some
tuple of) this intermediate relation only if at least one Reserves tuple has the same
sid value, that is, the sailor has made some reservation. The answer, when evaluated
on the instances B1, R2 and S3, contains the three tuples (Dustin), (Horatio), and
(Lubber). Even though there are two sailors called Horatio who have reserved a boat,
the answer contains only one copy of the tuple (Horatio), because the answer is a
relation, i.e., a set of tuples, without any duplicates.

At this point it is worth remarking on how frequently the natural join operation is
used in our examples. This frequency is more than just a coincidence based on the
set of queries that we have chosen to discuss; the natural join is a very natural and
widely used operation. In particular, natural join is frequently used when joining two
tables on a foreign key field. In Query Q4, for example, the join equates the sid fields
of Sailors and Reserves, and the sid field of Reserves is a foreign key that refers to the
sid field of Sailors.

(Q5) Find the names of sailors who have reserved a red or a green boat.
$$\rho(Tempboats, (\sigma_{color='red'}\,Boats) \cup (\sigma_{color='green'}\,Boats))$$
$$\pi_{sname}(Tempboats \bowtie Reserves \bowtie Sailors)$$
We identify the set of all boats that are either red or green (Tempboats, which contains
boats with the bids 102, 103, and 104 on instances B1, R2, and S3). Then we join with
Reserves to identify sids of sailors who have reserved one of these boats; this gives us
sids 22, 31, 64, and 74 over our example instances. Finally, we join (an intermediate
relation containing this set of sids) with Sailors to find the names of Sailors with these
sids. This gives us the names Dustin, Horatio, and Lubber on the instances B1, R2,
and S3. Another equivalent definition is the following:
$$\rho(Tempboats, (\sigma_{color='red' \lor color='green'}\,Boats))$$
$$\pi_{sname}(Tempboats \bowtie Reserves \bowtie Sailors)$$

Let us now consider a very similar query:

(Q6) Find the names of sailors who have reserved a red and a green boat.

It is tempting to try to do this by simply replacing ∪ by ∩ in the definition of Tempboats:
$$\rho(Tempboats2, (\sigma_{color='red'}\,Boats) \cap (\sigma_{color='green'}\,Boats))$$
$$\pi_{sname}(Tempboats2 \bowtie Reserves \bowtie Sailors)$$
However, this solution is incorrect: it instead tries to compute sailors who have
reserved a boat that is both red and green. (Since bid is a key for Boats, a boat can
be only one color; this query will always return an empty answer set.) The correct
approach is to find sailors who have reserved a red boat, then sailors who have reserved
a green boat, and then take the intersection of these two sets:
$$\rho(Tempred, \pi_{sid}((\sigma_{color='red'}\,Boats) \bowtie Reserves))$$
$$\rho(Tempgreen, \pi_{sid}((\sigma_{color='green'}\,Boats) \bowtie Reserves))$$
$$\pi_{sname}((Tempred \cap Tempgreen) \bowtie Sailors)$$
The two temporary relations compute the sids of sailors, and their intersection identifies
sailors who have reserved both red and green boats. On instances B1, R2, and S3, the
sids of sailors who have reserved a red boat are 22, 31, and 64. The sids of sailors who
have reserved a green boat are 22, 31, and 74. Thus, sailors 22 and 31 have reserved
both a red boat and a green boat; their names are Dustin and Lubber.

This formulation of Query Q6 can easily be adapted to find sailors who have reserved
red or green boats (Query Q5); just replace ∩ by ∪:
$$\rho(Tempred, \pi_{sid}((\sigma_{color='red'}\,Boats) \bowtie Reserves))$$
$$\rho(Tempgreen, \pi_{sid}((\sigma_{color='green'}\,Boats) \bowtie Reserves))$$
$$\pi_{sname}((Tempred \cup Tempgreen) \bowtie Sailors)$$

In the above formulations of Queries Q5 and Q6, the fact that sid (the field over which
we compute union or intersection) is a key for Sailors is very important. Consider the
following attempt to answer Query Q6:

$$\rho(Tempred, \pi_{sname}((\sigma_{color='red'}\,Boats) \bowtie Reserves \bowtie Sailors))$$
$$\rho(Tempgreen, \pi_{sname}((\sigma_{color='green'}\,Boats) \bowtie Reserves \bowtie Sailors))$$
$$Tempred \cap Tempgreen$$

This attempt is incorrect for a rather subtle reason. Two distinct sailors with the
same name, such as Horatio in our example instances, may have reserved red and
green boats, respectively. In this case, the name Horatio will (incorrectly) be included
in the answer even though no one individual called Horatio has reserved a red boat
and a green boat. The cause of this error is that sname is being used to identify sailors
(while doing the intersection) in this version of the query, but sname is not a key.
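The difference between identifying sailors by sid and by sname can be reproduced on S3, R2, and B1. This Python sketch assumes plain sets of tuples in the column order of Figures 4.15 to 4.17 and writes each join-and-project step as a set comprehension rather than through general operator helpers; sids_for and names_of are hypothetical names. It also shows why intersecting the two boat selections (the tempting first attempt at Q6) yields nothing.

```python
# Instances S3, R2, and B1 as plain sets of tuples.
SAILORS = {
    (22, "Dustin", 7, 45.0), (29, "Brutus", 1, 33.0), (31, "Lubber", 8, 55.5),
    (32, "Andy", 8, 25.5), (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0),
    (71, "Zorba", 10, 16.0), (74, "Horatio", 9, 35.0), (85, "Art", 3, 25.5),
    (95, "Bob", 3, 63.5),
}
RESERVES = {
    (22, 101, "10/10/98"), (22, 102, "10/10/98"), (22, 103, "10/8/98"),
    (22, 104, "10/7/98"), (31, 102, "11/10/98"), (31, 103, "11/6/98"),
    (31, 104, "11/12/98"), (64, 101, "9/5/98"), (64, 102, "9/8/98"),
    (74, 103, "9/8/98"),
}
BOATS = {(101, "Interlake", "blue"), (102, "Interlake", "red"),
         (103, "Clipper", "green"), (104, "Marine", "red")}


def sids_for(color):
    """πsid((σcolor=c Boats) ⋈ Reserves)."""
    bids = {bid for bid, _bname, c in BOATS if c == color}
    return {sid for sid, bid, _day in RESERVES if bid in bids}


def names_of(sids):
    """πsname(Temp ⋈ Sailors), where Temp holds the given sids."""
    return {sname for sid, sname, _rating, _age in SAILORS if sid in sids}


temp_red, temp_green = sids_for("red"), sids_for("green")
q5 = names_of(temp_red | temp_green)  # Q5: union of sids
q6 = names_of(temp_red & temp_green)  # Q6: intersection of sids

# Incorrect Q6: intersecting names conflates the two sailors called Horatio.
q6_by_name = names_of(temp_red) & names_of(temp_green)

# Tempting first attempt: no single boat is both red and green.
red_and_green_boats = ({b for b in BOATS if b[2] == "red"}
                       & {b for b in BOATS if b[2] == "green"})

print(sorted(q6), sorted(q6_by_name))
# ['Dustin', 'Lubber'] ['Dustin', 'Horatio', 'Lubber']
```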

(Q7) Find the names of sailors who have reserved at least two boats.

$$\rho(Reservations, \pi_{sid,sname,bid}(Sailors \bowtie Reserves))$$
$$\rho(Reservationpairs(1 \to sid1, 2 \to sname1, 3 \to bid1, 4 \to sid2, 5 \to sname2, 6 \to bid2), Reservations \times Reservations)$$
$$\pi_{sname1}\,\sigma_{(sid1=sid2) \land (bid1 \neq bid2)}\,Reservationpairs$$

First we compute tuples of the form (sid,sname,bid), where sailor sid has made a
reservation for boat bid; this set of tuples is the temporary relation Reservations.
Next we find all pairs of Reservations tuples where the same sailor has made both
reservations and the boats involved are distinct. Here is the central idea: In order
to show that a sailor has reserved two boats, we must find two Reservations tuples
involving the same sailor but distinct boats. Over instances B1, R2, and S3, the
sailors with sids 22, 31, and 64 have each reserved at least two boats. Finally, we
project the names of such sailors to obtain the answer, containing the names Dustin,
Horatio, and Lubber.

Notice that we included sid in Reservations because it is the key field identifying sailors,
and we need it to check that two Reservations tuples involve the same sailor. As noted
in the previous example, we can’t use sname for this purpose.
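The self-join strategy for Q7 can be sketched in Python, again assuming relations are sets of tuples and using a hypothetical instance. The cross-product Reservations × Reservations becomes `itertools.product`, and the selection compares sids, not names. The second comprehension shows what goes wrong if sname is used instead.

```python
from itertools import product

# Hypothetical instances: Sailors(sid, sname) and Reserves(sid, bid).
sailors = {(22, "Dustin"), (64, "Horatio"), (74, "Horatio")}
reserves = {(22, 101), (22, 102), (64, 103), (74, 104)}

# Reservations = pi_{sid,sname,bid}(Sailors ⋈ Reserves)
reservations = {(sid, name, bid)
                for (sid, name) in sailors
                for (rsid, bid) in reserves if rsid == sid}

# Pairs with the same sid but different bids, projected onto the name.
two_boats = {n1 for ((s1, n1, b1), (s2, _n2, b2)) in product(reservations, repeat=2)
             if s1 == s2 and b1 != b2}

# Incorrect variant: identifying sailors by sname merges the two Horatios.
wrong = {n1 for ((_s1, n1, b1), (_s2, n2, b2)) in product(reservations, repeat=2)
         if n1 == n2 and b1 != b2}

print(two_boats)  # {'Dustin'}
```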

(Q8) Find the sids of sailors with age over 20 who have not reserved a red boat.

$\pi_{sid}(\sigma_{age>20}\,Sailors) - \pi_{sid}((\sigma_{color='red'}\,Boats) \bowtie Reserves \bowtie Sailors)$

This query illustrates the use of the set-difference operator. Again, we use the fact
that sid is the key for Sailors. We first identify sailors aged over 20 (over instances B1,
R2, and S3, sids 22, 29, 31, 32, 58, 64, 74, 85, and 95) and then discard those who
have reserved a red boat (sids 22, 31, and 64), to obtain the answer (sids 29, 32, 58, 74,
85, and 95). If we want to compute the names of such sailors, we must first compute
their sids (as shown above), and then join with Sailors and project the sname values.
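As a minimal sketch under the same set-of-tuples assumption (hypothetical instance and sailor names), Q8 is a set difference of two sid sets, followed by a join back with Sailors to recover names:

```python
# Hypothetical instances: Sailors(sid, sname, age), Boats(bid, color), Reserves(sid, bid).
sailors = {(1, "Ann", 45.0), (2, "Ben", 33.0), (3, "Cal", 25.5), (4, "Dee", 19.0)}
boats = {(101, "red"), (102, "green")}
reserves = {(1, 101), (2, 102), (4, 101)}

# pi_sid(sigma_{age>20} Sailors)
over_20 = {sid for (sid, _name, age) in sailors if age > 20}

# pi_sid((sigma_{color='red'} Boats) ⋈ Reserves ⋈ Sailors); the join with
# Sailors keeps only sids present in Sailors, hence the intersection.
red_bids = {bid for (bid, color) in boats if color == "red"}
sailor_sids = {sid for (sid, _name, _age) in sailors}
red_reservers = {sid for (sid, bid) in reserves if bid in red_bids} & sailor_sids

# Set difference yields the answer sids...
answer_sids = over_20 - red_reservers
# ...and joining back with Sailors and projecting sname yields the names.
answer_names = {name for (sid, name, _age) in sailors if sid in answer_sids}
```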

(Q9) Find the names of sailors who have reserved all boats. The use of the word all
(or every) is a good indication that the division operation might be applicable:

$\rho(Tempsids, (\pi_{sid,bid}\,Reserves) / (\pi_{bid}\,Boats))$
$\pi_{sname}(Tempsids \bowtie Sailors)$

The intermediate relation Tempsids is defined using division, and computes the set of
sids of sailors who have reserved every boat (over instances B1, R2, and S3, this is just
sid 22). Notice how we define the two relations that the division operator (/) is applied
to—the first relation has the schema (sid,bid) and the second has the schema (bid).
Division then returns all sids such that there is a tuple (sid,bid) in the first relation for
each bid in the second. Joining Tempsids with Sailors is necessary to associate names
with the selected sids; for sailor 22, the name is Dustin.
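Division can be sketched two ways in Python, assuming the binary relation is a set of (x, y) pairs and the unary divisor is a set of bare y values (an illustrative choice). `divide` follows the "for each bid in the second relation" reading directly. `divide_basic` uses the standard rewriting of division with projection, cross-product, and set difference, $\pi_x(A) - \pi_x((\pi_x(A) \times B) - A)$. The data is hypothetical.

```python
from itertools import product

def divide(r, s):
    """r / s: the x values that r pairs with every y in s."""
    xs = {x for (x, _y) in r}
    return {x for x in xs if all((x, y) in r for y in s)}

def divide_basic(r, s):
    """Same result via pi_x(r) - pi_x((pi_x(r) x s) - r)."""
    xs = {x for (x, _y) in r}
    missing = set(product(xs, s)) - r         # (x, y) pairs that r lacks
    disqualified = {x for (x, _y) in missing}
    return xs - disqualified

# Hypothetical pi_{sid,bid} Reserves and pi_{bid} Boats.
sid_bid = {(1, 101), (1, 102), (1, 103), (2, 101), (2, 102), (3, 103)}
all_bids = {101, 102, 103}

print(divide(sid_bid, all_bids))        # {1}
print(divide(sid_bid, {101, 102}))      # {1, 2}
```

Shrinking the divisor (as Q10 does with its selection on bname) can only enlarge the quotient, as the second call shows.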

(Q10) Find the names of sailors who have reserved all boats called Interlake.

$\rho(Tempsids, (\pi_{sid,bid}\,Reserves) / (\pi_{bid}(\sigma_{bname='Interlake'}\,Boats)))$
$\pi_{sname}(Tempsids \bowtie Sailors)$

The only difference with respect to the previous query is that now we apply a selection
to Boats, to ensure that we compute only bids of boats named Interlake in defining the
second argument to the division operator. Over instances B1, R2, and S3, Tempsids
evaluates to sids 22 and 64, and the answer contains their names, Dustin and Horatio.

4.3 RELATIONAL CALCULUS

Relational calculus is an alternative to relational algebra. In contrast to the algebra,
which is procedural, the calculus is nonprocedural, or declarative, in that it allows
us to describe the set of answers without being explicit about how they should be
computed. Relational calculus has had a big influence on the design of commercial
query languages such as SQL and, especially, Query-by-Example (QBE).

The variant of the calculus that we present in detail is called the tuple relational
calculus (TRC). Variables in TRC take on tuples as values. In another variant, called
the domain relational calculus (DRC), the variables range over field values. TRC has
had more of an influence on SQL, while DRC has strongly influenced QBE. We discuss
DRC in Section 4.3.2.²

² The material on DRC is referred to in the chapter on QBE; with the exception of this chapter,
the material on DRC and TRC can be omitted without loss of continuity.

4.3.1 Tuple Relational Calculus

A tuple variable is a variable that takes on tuples of a particular relation schema as
values. That is, every value assigned to a given tuple variable has the same number
and type of fields. A tuple relational calculus query has the form {T | p(T)}, where
T is a tuple variable and p(T) denotes a formula that describes T; we will shortly
define formulas and queries rigorously. The result of this query is the set of all tuples
t for which the formula p(T) evaluates to true with T = t. The language for writing
formulas p(T) is thus at the heart of TRC and is essentially a simple subset of first-order
logic. As a simple example, consider the following query.

(Q11) Find all sailors with a rating above 7.

{S | S ∈ Sailors ∧ S.rating > 7}

When this query is evaluated on an instance of the Sailors relation, the tuple variable
S is instantiated successively with each tuple, and the test S.rating>7 is applied. The
answer contains those instances of S that pass this test. On instance S3 of Sailors, the
answer contains Sailors tuples with sid 31, 32, 58, 71, and 74.
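The evaluation just described (instantiate S with each tuple, keep those passing the test) corresponds closely to a Python set comprehension. This is a minimal sketch on a hypothetical Sailors instance; the `Sailor` named tuple is an illustrative stand-in for the relation schema.

```python
from typing import NamedTuple

class Sailor(NamedTuple):
    sid: int
    sname: str
    rating: int
    age: float

# Hypothetical instance of Sailors.
sailors = {Sailor(1, "Ann", 9, 45.0), Sailor(2, "Ben", 7, 33.0), Sailor(3, "Cal", 8, 25.5)}

# {S | S ∈ Sailors ∧ S.rating > 7}: S ranges over Sailors; keep tuples passing the test.
answer = {s for s in sailors if s.rating > 7}
```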

Syntax of TRC Queries

We now define these concepts formally, beginning with the notion of a formula. Let
Rel be a relation name, R and S be tuple variables, a an attribute of R, and b an
attribute of S. Let op denote an operator in the set {<, >, =, ≤, ≥, ≠}. An atomic
formula is one of the following:

R ∈ Rel

R.a op S.b

R.a op constant, or constant op R.a

A formula is recursively defined to be one of the following, where p and q are them-
selves formulas, and p(R) denotes a formula in which the variable R appears:

any atomic formula

¬p, p ∧ q, p ∨ q, or p ⇒ q

∃R(p(R)), where R is a tuple variable

∀R(p(R)), where R is a tuple variable

In the last two clauses above, the quantifiers ∃ and ∀ are said to bind the variable
R. A variable is said to be free in a formula or subformula (a formula contained in a
larger formula) if the (sub)formula does not contain an occurrence of a quantifier that
binds it.³

We observe that every variable in a TRC formula appears in a subformula that is
atomic, and every relation schema specifies a domain for each field; this observation
ensures that each variable in a TRC formula has a well-defined domain from which
values for the variable are drawn. That is, each variable has a well-defined type, in the
programming language sense. Informally, an atomic formula R ∈ Rel gives R the type
of tuples in Rel, and comparisons such as R.a op S.b and R.a op constant induce type
restrictions on the field R.a. If a variable R does not appear in an atomic formula of
the form R ∈ Rel (i.e., it appears only in atomic formulas that are comparisons), we
will follow the convention that the type of R is a tuple whose fields include all (and
only) fields of R that appear in the formula.

We will not define types of variables formally, but the type of a variable should be clear
in most cases, and the important point to note is that comparisons of values having
different types should always fail. (In discussions of relational calculus, the simplifying
assumption is often made that there is a single domain of constants and that this is
the domain associated with each field of each relation.)

A TRC query is defined to be an expression of the form {T | p(T)}, where T is the only
free variable in the formula p.

Semantics of TRC Queries

What does a TRC query mean? More precisely, what is the set of answer tuples for a
given TRC query? The answer to a TRC query {T | p(T)}, as we noted earlier, is the
set of all tuples t for which the formula p(T ) evaluates to true with variable T assigned
the tuple value t. To complete this definition, we must state which assignments of tuple
values to the free variables in a formula make the formula evaluate to true.

A query is evaluated on a given instance of the database. Let each free variable in a
formula F be bound to a tuple value. For the given assignment of tuples to variables,
with respect to the given database instance, F evaluates to (or simply ‘is’) true if one
of the following holds:

F is an atomic formula R ∈ Rel, and R is assigned a tuple in the instance of
relation Rel.

³ We will make the assumption that each variable in a formula is either free or bound by exactly one
occurrence of a quantifier, to avoid worrying about details such as nested occurrences of quantifiers
that bind some, but not all, occurrences of variables.

F is a comparison R.a op S.b, R.a op constant, or constant op R.a, and the tuples
assigned to R and S have field values R.a and S.b that make the comparison true.

F is of the form ¬p, and p is not true; or of the form p ∧ q, and both p and q are
true; or of the form p ∨ q, and at least one of them is true; or of the form p ⇒ q, and q is
true whenever⁴ p is true.

F is of the form ∃R(p(R)), and there is some assignment of tuples to the free
variables in p(R), including the variable R,⁵ that makes the formula p(R) true.

F is of the form ∀R(p(R)), and there is some assignment of tuples to the free
variables in p(R) that makes the formula p(R) true no matter what tuple is
assigned to R.

Examples of TRC Queries

We now illustrate the calculus through several examples, using the instances B1 of
Boats, R2 of Reserves, and S3 of Sailors shown in Figures 4.15, 4.16, and 4.17. We will
use parentheses as needed to make our formulas unambiguous. Often, a formula p(R)
includes a condition R ∈ Rel, and the meaning of the phrases some tuple R and for all
tuples R is intuitive. We will use the notation ∃R ∈ Rel(p(R)) for ∃R(R ∈ Rel ∧ p(R)).
Similarly, we use the notation ∀R ∈ Rel(p(R)) for ∀R(R ∈ Rel ⇒ p(R)).

(Q12) Find the names and ages of sailors with a rating above 7.

{P | ∃S ∈ Sailors(S.rating > 7 ∧ P.name = S.sname ∧ P.age = S.age)}

This query illustrates a useful convention: P is considered to be a tuple variable with
exactly two fields, which are called name and age, because these are the only fields of
P that are mentioned and P does not range over any of the relations in the query;
that is, there is no subformula of the form P ∈ Relname. The result of this query is
a relation with two fields, name and age. The atomic formulas P.name = S.sname
and P.age = S.age give values to the fields of an answer tuple P . On instances B1,
R2, and S3, the answer is the set of tuples (Lubber, 55.5), (Andy, 25.5), (Rusty, 35.0),
(Zorba, 16.0), and (Horatio, 35.0).

(Q13) Find the sailor name, boat id, and reservation date for each reservation.

{P | ∃R ∈ Reserves ∃S ∈ Sailors
(R.sid = S.sid ∧ P.bid = R.bid ∧ P.day = R.day ∧ P.sname = S.sname)}

For each Reserves tuple, we look for a tuple in Sailors with the same sid. Given a
pair of such tuples, we construct an answer tuple P with fields sname, bid, and day by
copying the corresponding fields from these two tuples. This query illustrates how we
can combine values from different relations in each answer tuple. The answer to this
query on instances B1, R2, and S3 is shown in Figure 4.20.

⁴ Whenever should be read more precisely as ‘for all assignments of tuples to the free variables.’
⁵ Note that some of the free variables in p(R) (e.g., the variable R itself) may be bound in F.

sname     bid   day
Dustin    101   10/10/98
Dustin    102   10/10/98
Dustin    103   10/8/98
Dustin    104   10/7/98
Lubber    102   11/10/98
Lubber    103   11/6/98
Lubber    104   11/12/98
Horatio   101   9/5/98
Horatio   102   9/8/98
Horatio   103   9/8/98

Figure 4.20 Answer to Query Q13
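The two existential quantifiers in Q13 behave like two nested loops: every (R, S) pair that satisfies the join condition contributes one answer tuple P. A minimal Python sketch, assuming sets of tuples and a small hypothetical instance (the dates are made up):

```python
# Hypothetical instances: Sailors(sid, sname) and Reserves(sid, bid, day).
sailors = {(1, "Ann"), (2, "Ben")}
reserves = {(1, 101, "1/5/99"), (1, 102, "1/9/99"), (2, 101, "2/3/99")}

# ∃R ∈ Reserves ∃S ∈ Sailors (R.sid = S.sid ∧ P.bid = R.bid ∧ P.day = R.day ∧ P.sname = S.sname)
answer = {(sname, bid, day)              # fields of P copied from R and S
          for (rsid, bid, day) in reserves
          for (sid, sname) in sailors
          if rsid == sid}
```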

(Q1) Find the names of sailors who have reserved boat 103.
{P | ∃S ∈ Sailors ∃R ∈ Reserves(R.sid = S.sid ∧ R.bid = 103 ∧ P.sname = S.sname)}
This query can be read as follows: “Retrieve all sailor tuples for which there exists a
tuple in Reserves, having the same value in the sid field, and with bid = 103.” That
is, for each sailor tuple, we look for a tuple in Reserves that shows that this sailor has
reserved boat 103. The answer tuple P contains just one field, sname.

(Q2) Find the names of sailors who have reserved a red boat.
{P | ∃S ∈ Sailors ∃R ∈ Reserves(R.sid = S.sid ∧ P.sname = S.sname
∧ ∃B ∈ Boats(B.bid = R.bid ∧ B.color = 'red'))}
This query can be read as follows: “Retrieve all sailor tuples S for which there exist
tuples R in Reserves and B in Boats such that S.sid = R.sid, R.bid = B.bid, and
B.color = 'red'.” Another way to write this query, which corresponds more closely to
this reading, is as follows:
{P | ∃S ∈ Sailors ∃R ∈ Reserves ∃B ∈ Boats
(R.sid = S.sid ∧ B.bid = R.bid ∧ B.color = 'red' ∧ P.sname = S.sname)}

(Q7) Find the names of sailors who have reserved at least two boats.
{P | ∃S ∈ Sailors ∃R1 ∈ Reserves ∃R2 ∈ Reserves
(S.sid = R1.sid ∧ R1.sid = R2.sid ∧ R1.bid ≠ R2.bid ∧ P.sname = S.sname)}

Contrast this query with the algebra version and see how much simpler the calculus
version is. In part, this difference is due to the cumbersome renaming of fields in the
algebra version, but the calculus version really is simpler.

(Q9) Find the names of sailors who have reserved all boats.

{P | ∃S ∈ Sailors ∀B ∈ Boats
(∃R ∈ Reserves(S.sid = R.sid ∧ R.bid = B.bid ∧ P.sname = S.sname))}

This query was expressed using the division operator in relational algebra. Notice
how easily it is expressed in the calculus. The calculus query directly reflects how we
might express the query in English: “Find sailors S such that for all boats B there is
a Reserves tuple showing that sailor S has reserved boat B.”

(Q14) Find sailors who have reserved all red boats.

{S | S ∈ Sailors ∧ ∀B ∈ Boats
(B.color = 'red' ⇒ (∃R ∈ Reserves(S.sid = R.sid ∧ R.bid = B.bid)))}

This query can be read as follows: For each candidate (sailor), if a boat is red, the
sailor must have reserved it. That is, for a candidate sailor, a boat being red must
imply the sailor having reserved it. Observe that since we can return an entire sailor
tuple as the answer instead of just the sailor’s name, we have avoided introducing a
new free variable (e.g., the variable P in the previous example) to hold the answer
values. On instances B1, R2, and S3, the answer contains the Sailors tuples with sids
22 and 31.

We can write this query without using implication, by observing that an expression of
the form p ⇒ q is logically equivalent to ¬p ∨ q:

{S | S ∈ Sailors ∧ ∀B ∈ Boats
(B.color ≠ 'red' ∨ (∃R ∈ Reserves(S.sid = R.sid ∧ R.bid = B.bid)))}

This query should be read as follows: “Find sailors S such that for all boats B, either
the boat is not red or a Reserves tuple shows that sailor S has reserved boat B.”
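Universal quantification over a finite relation maps onto Python's `all()`, and the inner ∃ onto `any()`. This minimal sketch (sets of tuples, hypothetical instance) evaluates Q14 in the ¬p ∨ q form and, equivalently, by restricting the range of B to red boats.

```python
# Hypothetical instances: Sailors(sid, sname), Boats(bid, color), Reserves(sid, bid).
sailors = {(1, "Ann"), (2, "Ben"), (3, "Cal")}
boats = {(101, "red"), (102, "green"), (103, "red")}
reserves = {(1, 101), (1, 103), (2, 101), (2, 102), (3, 102)}

def reserved(sid, bid):
    """∃R ∈ Reserves (S.sid = R.sid ∧ R.bid = B.bid)"""
    return any(rsid == sid and rbid == bid for (rsid, rbid) in reserves)

# ∀B ∈ Boats (B.color ≠ 'red' ∨ reserved), i.e., the ¬p ∨ q form of p ⇒ q.
not_p_or_q = {s for s in sailors
              if all(color != "red" or reserved(s[0], bid) for (bid, color) in boats)}

# Equivalent: quantify only over red boats.
restricted = {s for s in sailors
              if all(reserved(s[0], bid) for (bid, color) in boats if color == "red")}
```

If the instance had no red boats, both versions would return every sailor, since `all()` of an empty sequence is `True`; this matches the vacuous truth of p ⇒ q when p never holds.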

4.3.2 Domain Relational Calculus

A domain variable is a variable that ranges over the values in the domain of some
attribute (e.g., the variable can be assigned an integer if it appears in an attribute
whose domain is the set of integers). A DRC query has the form
$\{(x_1, x_2, \ldots, x_n) \mid p((x_1, x_2, \ldots, x_n))\}$, where each $x_i$ is either a domain variable or a constant and
$p((x_1, x_2, \ldots, x_n))$ denotes a DRC formula whose only free variables are the
variables among the $x_i$, $1 \le i \le n$. The result of this query is the set of all tuples
$(x_1, x_2, \ldots, x_n)$ for which the formula evaluates to true.

A DRC formula is defined in a manner that is very similar to the definition of a TRC
formula. The main difference is that the variables are now domain variables. Let op
denote an operator in the set {<, >, =, ≤, ≥, ≠} and let X and Y be domain variables.
An atomic formula in DRC is one of the following:

$(x_1, x_2, \ldots, x_n) \in Rel$, where Rel is a relation with n attributes; each $x_i$, $1 \le i \le n$,
is either a variable or a constant.

X op Y

X op constant, or constant op X

A formula is recursively defined to be one of the following, where p and q are them-
selves formulas, and p(X) denotes a formula in which the variable X appears:

any atomic formula

¬p, p ∧ q, p ∨ q, or p ⇒ q

∃X(p(X)), where X is a domain variable

∀X(p(X)), where X is a domain variable

The reader is invited to compare this definition with the definition of TRC formulas
and see how closely these two definitions correspond. We will not define the semantics
of DRC formulas formally; this is left as an exercise for the reader.

Examples of DRC Queries

We now illustrate DRC through several examples. The reader is invited to compare
these with the TRC versions.

(Q11) Find all sailors with a rating above 7.

{(I, N, T, A) | (I, N, T, A) ∈ Sailors ∧ T > 7}

This differs from the TRC version in giving each attribute a (variable) name. The
condition (I, N, T, A) ∈ Sailors ensures that the domain variables I, N , T , and A are
restricted to be fields of the same tuple. In comparison with the TRC query, we can
say T > 7 instead of S.rating > 7, but we must specify the tuple (I, N, T, A) in the
result, rather than just S.

(Q1) Find the names of sailors who have reserved boat 103.

{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors
∧ ∃Ir, Br, D((Ir, Br, D) ∈ Reserves ∧ Ir = I ∧ Br = 103))}

Notice that only the sname field is retained in the answer and that only N is a free
variable. We use the notation ∃Ir, Br, D(…) as a shorthand for ∃Ir(∃Br(∃D(…))).
Very often, all the quantified variables appear in a single relation, as in this example.
An even more compact notation in this case is ∃(Ir, Br, D) ∈ Reserves. With this
notation, which we will use henceforth, the above query would be as follows:

{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors
∧ ∃(Ir, Br, D) ∈ Reserves(Ir = I ∧ Br = 103))}

The comparison with the corresponding TRC formula should now be straightforward.
This query can also be written as follows; notice the repetition of variable I and the
use of the constant 103:

{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors
∧ ∃D((I, 103, D) ∈ Reserves))}
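In DRC each field position binds its own domain variable, which Python tuple unpacking mimics closely. This minimal sketch (sets of tuples, hypothetical instance) evaluates the last formulation of Q1. Reusing I and the constant 103 in the Reserves pattern becomes an equality test on the unpacked fields, while D is left unconstrained.

```python
# Hypothetical instances: Sailors(sid, sname, rating, age), Reserves(sid, bid, day).
sailors = {(1, "Ann", 9, 45.0), (2, "Ben", 7, 33.0), (3, "Cal", 8, 25.5)}
reserves = {(1, 103, "d1"), (2, 101, "d2"), (3, 103, "d3")}

# {(N) | ∃I, T, A((I, N, T, A) ∈ Sailors ∧ ∃D((I, 103, D) ∈ Reserves))}
answer = {(n,)                                   # one-field answer tuples (N)
          for (i, n, _t, _a) in sailors          # binds I, N, T, A
          if any(ri == i and rb == 103 for (ri, rb, _d) in reserves)}
```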

(Q2) Find the names of sailors who have reserved a red boat.

{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors
∧ ∃(I, Br, D) ∈ Reserves ∧ ∃(Br, BN, 'red') ∈ Boats)}

(Q7) Find the names of sailors who have reserved at least two boats.

{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors ∧
∃Br1, Br2, D1, D2((I, Br1, D1) ∈ Reserves ∧ (I, Br2, D2) ∈ Reserves ∧ Br1 ≠ Br2))}

Notice how the repeated use of variable I ensures that the same sailor has reserved
both the boats in question.

(Q9) Find the names of sailors who have reserved all boats.

{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors ∧
∀B, BN, C(¬((B, BN, C) ∈ Boats) ∨
(∃(Ir, Br, D) ∈ Reserves(I = Ir ∧ Br = B))))}

This query can be read as follows: “Find all values of N such that there is some tuple
(I, N, T, A) in Sailors satisfying the following condition: for every (B, BN, C), either
this is not a tuple in Boats or there is some tuple (Ir, Br, D) in Reserves that proves
that Sailor I has reserved boat B.” The ∀ quantifier allows the domain variables B,
BN , and C to range over all values in their respective attribute domains, and the
pattern ‘¬((B, BN, C) ∈ Boats)∨’ is necessary to restrict attention to those values
that appear in tuples of Boats. This pattern is common in DRC formulas, and the
notation ∀(B, BN, C) ∈ Boats can be used as a shorthand instead. This is similar to
the notation introduced earlier for ∃. With this notation the query would be written
as follows:

{(N) | ∃I, T, A((I, N, T, A) ∈ Sailors ∧ ∀(B, BN, C) ∈ Boats
(∃(Ir, Br, D) ∈ Reserves(I = Ir ∧ Br = B)))}

(Q14) Find sailors who have reserved all red boats.

{(I, N, T, A) | (I, N, T, A) ∈ Sailors ∧ ∀(B, BN, C) ∈ Boats
(C = 'red' ⇒ ∃(Ir, Br, D) ∈ Reserves(I = Ir ∧ Br = B))}

Here, we find all sailors such that for every red boat there is a tuple in Reserves that
shows the sailor has reserved it.

4.4 EXPRESSIVE POWER OF ALGEBRA AND CALCULUS *

We have presented two formal query languages for the relational model. Are they
equivalent in power? Can every query that can be expressed in relational algebra also
be expressed in relational calculus? The answer is yes, it can. Can every query that
can be expressed in relational calculus also be expressed in relational algebra? Before
we answer this question, we consider a major problem with the calculus as we have
presented it.

Consider the query {S | ¬(S ∈ Sailors)}. This query is syntactically correct. However,
it asks for all tuples S such that S is not in (the given instance of) Sailors. The set of
such S tuples is obviously infinite, in the context of infinite domains such as the set of
all integers. This simple example illustrates an unsafe query. It is desirable to restrict
relational calculus to disallow unsafe queries.

We now sketch how calculus queries are restricted to be safe. Consider a set I of
relation instances, with one instance per relation that appears in the query Q. Let
Dom(Q, I) be the set of all constants that appear in these relation instances I or in
the formulation of the query Q itself. Since we only allow finite instances I, Dom(Q, I)
is also finite.
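Computing Dom(Q, I) for a concrete instance is straightforward, as this minimal sketch shows. It assumes instances are given as a dictionary of sets of tuples and that the query's constants have been extracted by hand; `active_domain` is a hypothetical helper name. The final check tests condition 1 of the safety definition below for this single instance only. Safety quantifies over all instances I, so a check like this can expose an unsafe query but cannot prove a query safe.

```python
def active_domain(instances, query_constants):
    """Dom(Q, I): every constant in the relation instances I or in the query Q."""
    dom = set(query_constants)
    for relation in instances.values():
        for tup in relation:
            dom.update(tup)
    return dom

# Hypothetical instance and query Q11 ({S | S ∈ Sailors ∧ S.rating > 7}).
instances = {"Sailors": {(1, "Ann", 9, 45.0), (2, "Ben", 7, 33.0)}}
query_constants = {7}                     # the constant appearing in S.rating > 7
dom = active_domain(instances, query_constants)

answer = {s for s in instances["Sailors"] if s[2] > 7}
answer_in_dom = all(v in dom for t in answer for v in t)
```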

For a calculus formula Q to be considered safe, at a minimum we want to ensure that
for any given I, the set of answers for Q contains only values that are in Dom(Q, I).
While this restriction is obviously required, it is not enough. Not only do we want the
set of answers to be composed of constants in Dom(Q, I), we wish to compute the set
of answers by only examining tuples that contain constants in Dom(Q, I)! This wish
leads to a subtle point associated with the use of quantifiers ∀ and ∃: Given a TRC
formula of the form ∃R(p(R)), we want to find all values for variable R that make this
formula true by checking only tuples that contain constants in Dom(Q, I). Similarly,
given a TRC formula of the form ∀R(p(R)), we want to find any values for variable
R that make this formula false by checking only tuples that contain constants in
Dom(Q, I).

We therefore define a safe TRC formula Q to be a formula such that:

1. For any given I, the set of answers for Q contains only values that are in Dom(Q, I).

2. For each subexpression of the form ∃R(p(R)) in Q, if a tuple r (assigned to variable
R) makes the formula true, then r contains only constants in Dom(Q, I).

3. For each subexpression of the form ∀R(p(R)) in Q, if a tuple r (assigned to variable
R) contains a constant that is not in Dom(Q, I), then r must make the formula
true.

Note that this definition is not constructive; that is, it does not tell us how to check
whether a query is safe.

The query Q = {S | ¬(S ∈ Sailors)} is unsafe by this definition. Dom(Q, I) is the
set of all values that appear in (an instance I of) Sailors. Consider the instance S1
shown in Figure 4.1. The answer to this query obviously includes values that do not
appear in Dom(Q, S1).

Returning to the question of expressiveness, we can show that every query that can be
expressed using a safe relational calculus query can also be expressed as a relational
algebra query. The expressive power of relational algebra is often used as a metric of
how powerful a relational database query language is. If a query language can express
all the queries that we can express in relational algebra, it is said to be relationally
complete. A practical query language is expected to be relationally complete; in ad-
dition, commercial query languages typically support features that allow us to express
some queries that cannot be expressed in relational algebra.

4.5 POINTS TO REVIEW

The inputs and outputs of a query are relations. A query takes instances of each
input relation and produces an instance of the output relation. (Section 4.1)

A relational algebra query describes a procedure for computing the output rela-
tion from the input relations by applying relational algebra operators. Internally,
database systems use some variant of relational algebra to represent query evalu-
ation plans. (Section 4.2)

Two basic relational algebra operators are selection (σ), to select subsets of a
relation, and projection (π), to select output fields. (Section 4.2.1)

Relational algebra includes standard operations on sets such as union (∪), inter-
section (∩), set-difference (−), and cross-product (×). (Section 4.2.2)

Relations and fields can be renamed in relational algebra using the renaming
operator (ρ). (Section 4.2.3)

Another relational algebra operation that arises commonly in practice is the join
(⋈), with important special cases of equijoin and natural join. (Section 4.2.4)

The division operation (/) is a convenient way to express that we only want tuples
where all possible value combinations—as described in another relation—exist.
(Section 4.2.5)

Instead of describing a query by how to compute the output relation, a relational
calculus query describes the tuples in the output relation. The language for specifying
the output tuples is essentially a restricted subset of first-order predicate
logic. In tuple relational calculus, variables take on tuple values and in domain relational
calculus, variables take on field values, but the two versions of the calculus
are very similar. (Section 4.3)

All relational algebra queries can be expressed in relational calculus. If we restrict
ourselves to safe queries in the calculus, the converse also holds. An important criterion
for commercial query languages is that they should be relationally complete
in the sense that they can express all relational algebra queries. (Section 4.4)

EXERCISES

Exercise 4.1 Explain the statement that relational algebra operators can be composed. Why
is the ability to compose operators important?

Exercise 4.2 Given two relations R1 and R2, where R1 contains N1 tuples, R2 contains
N2 tuples, and N2 > N1 > 0, give the minimum and maximum possible sizes (in tuples) for
the result relation produced by each of the following relational algebra expressions. In each
case, state any assumptions about the schemas for R1 and R2 that are needed to make the
expression meaningful:

(1) R1 ∪ R2, (2) R1 ∩ R2, (3) R1 − R2, (4) R1 × R2, (5) $\sigma_{a=5}(R1)$, (6) $\pi_a(R1)$, and
(7) R1/R2

Exercise 4.3 Consider the following schema:

Suppliers(sid: integer, sname: string, address: string)
Parts(pid: integer, pname: string, color: string)
Catalog(sid: integer, pid: integer, cost: real)

The key fields are underlined, and the domain of each field is listed after the field name.
Thus sid is the key for Suppliers, pid is the key for Parts, and sid and pid together form the
key for Catalog. The Catalog relation lists the prices charged for parts by Suppliers. Write
the following queries in relational algebra, tuple relational calculus, and domain relational
calculus:

1. Find the names of suppliers who supply some red part.
2. Find the sids of suppliers who supply some red or green part.
3. Find the sids of suppliers who supply some red part or are at 221 Packer Ave.
4. Find the sids of suppliers who supply some red part and some green part.
5. Find the sids of suppliers who supply every part.
6. Find the sids of suppliers who supply every red part.
7. Find the sids of suppliers who supply every red or green part.
8. Find the sids of suppliers who supply every red part or supply every green part.
9. Find pairs of sids such that the supplier with the first sid charges more for some part
than the supplier with the second sid.
10. Find the pids of parts that are supplied by at least two different suppliers.
11. Find the pids of the most expensive parts supplied by suppliers named Yosemite Sham.
12. Find the pids of parts supplied by every supplier at less than $200. (If any supplier either
does not supply the part or charges more than $200 for it, the part is not selected.)

Exercise 4.4 Consider the Supplier-Parts-Catalog schema from the previous question. State
what the following queries compute:

1. $\pi_{sname}(\pi_{sid}(\sigma_{color='red'}\,Parts) \bowtie (\sigma_{cost<100}\,Catalog) \bowtie Suppliers)$
2. $\pi_{sname}(\pi_{sid}((\sigma_{color='red'}\,Parts) \bowtie (\sigma_{cost<100}\,Catalog) \bowtie Suppliers))$
3. $(\pi_{sname}((\sigma_{color='red'}\,Parts) \bowtie (\sigma_{cost<100}\,Catalog) \bowtie Suppliers)) \cap (\pi_{sname}((\sigma_{color='green'}\,Parts) \bowtie (\sigma_{cost<100}\,Catalog) \bowtie Suppliers))$
4. $(\pi_{sid}((\sigma_{color='red'}\,Parts) \bowtie (\sigma_{cost<100}\,Catalog) \bowtie Suppliers)) \cap (\pi_{sid}((\sigma_{color='green'}\,Parts) \bowtie (\sigma_{cost<100}\,Catalog) \bowtie Suppliers))$
5. $\pi_{sname}((\pi_{sid,sname}((\sigma_{color='red'}\,Parts) \bowtie (\sigma_{cost<100}\,Catalog) \bowtie Suppliers)) \cap (\pi_{sid,sname}((\sigma_{color='green'}\,Parts) \bowtie (\sigma_{cost<100}\,Catalog) \bowtie Suppliers)))$

Exercise 4.5 Consider the following relations containing airline flight information:

Flights(flno: integer, from: string, to: string, distance: integer, departs: time, arrives: time)
Aircraft(aid: integer, aname: string, cruisingrange: integer)
Certified(eid: integer, aid: integer)
Employees(eid: integer, ename: string, salary: integer)

Note that the Employees relation describes pilots and other kinds of employees as well; every
pilot is certified for some aircraft (otherwise, he or she would not qualify as a pilot), and only
pilots are certified to fly.

Write the following queries in relational algebra, tuple relational calculus, and domain rela-
tional calculus. Note that some of these queries may not be expressible in relational algebra
(and, therefore, also not expressible in tuple and domain relational calculus)! For such queries,
informally explain why they cannot be expressed. (See the exercises at the end of Chapter 5
for additional queries over the airline schema.)

1. Find the eids of pilots certified for some Boeing aircraft.
2. Find the names of pilots certified for some Boeing aircraft.
3. Find the aids of all aircraft that can be used on non-stop flights from Bonn to Madras.
4. Identify the flights that can be piloted by every pilot whose salary is more than $100,000.
(Hint: The pilot must be certified for at least one plane with a sufficiently large cruising
range.)
5. Find the names of pilots who can operate some plane with a range greater than 3,000
miles but are not certified on any Boeing aircraft.
6. Find the eids of employees who make the highest salary.
7. Find the eids of employees who make the second highest salary.
8. Find the eids of pilots who are certified for the largest number of aircraft.
9. Find the eids of employees who are certified for exactly three aircraft.
10. Find the total amount paid to employees as salaries.
11. Is there a sequence of flights from Madison to Timbuktu? Each flight in the sequence is
required to depart from the city that is the destination of the previous flight; the first
flight must leave Madison, the last flight must reach Timbuktu, and there is no restriction
on the number of intermediate flights. Your query must determine whether a sequence
of flights from Madison to Timbuktu exists for any input Flights relation instance.

Exercise 4.6 What is relational completeness? If a query language is relationally complete,
can you write any desired query in that language?

Exercise 4.7 What is an unsafe query? Give an example and explain why it is important
to disallow such queries.

BIBLIOGRAPHIC NOTES

Relational algebra was proposed by Codd in [156], and he showed the equivalence of relational
algebra and TRC in [158]. Earlier, Kuhns [392] considered the use of logic to pose queries.
LaCroix and Pirotte discussed DRC in [397]. Klug generalized the algebra and calculus to
include aggregate operations in [378]. Extensions of the algebra and calculus to deal with
aggregate functions are also discussed in [503]. Merrett proposed an extended relational
algebra with quantifiers such as the number of, which go beyond just universal and existential
quantification [460]. Such generalized quantifiers are discussed at length in [42].
SQL QUERIES, PROGRAMMING AND TRIGGERS

Introduction to SQL
Structured Query Language (SQL) is a language used for storing and managing data
in a relational database management system (RDBMS). SQL was one of the first commercial languages to use E. F. Codd's relational model.
What is SQL?

• SQL stands for Structured Query Language.
• SQL lets you access and manipulate databases.
• SQL is an ANSI (American National Standards Institute) standard.
• SQL keywords are NOT case sensitive: select is the same as SELECT.
• Some database systems require a semicolon at the end of each SQL statement. The semicolon
is the standard way to separate SQL statements in database systems that allow more
than one SQL statement to be executed in the same call to the server.

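DDL : DATA DEFINITION LANGUAGE

In many systems (for example, Oracle and MySQL), DDL commands are auto-committed: each one
implicitly commits the current transaction, so its changes are saved permanently in the database.
Some systems, such as PostgreSQL, instead allow DDL to run inside a transaction that can be rolled back.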

Command    Description
create     to create a new table or database
alter      to modify the structure of an existing table
truncate   to delete all rows from a table
drop       to drop a table
rename     to rename a table

• CREATE:
The CREATE DATABASE statement is used to create a new SQL database.
Syntax
CREATE DATABASE databasename;

OR

The CREATE TABLE statement is used to create a new table in an SQL database.
CREATE TABLE table_name
(
column_name1 datatype1,
column_name2 datatype2,
column_name3 datatype3,
column_name4 datatype4
);
Example:

create table Student(id int, name varchar, age int);


Create Table Using Another Table
A copy of an existing table can be created using a combination of the CREATE TABLE statement
and the SELECT statement.
- The new table gets the same column names and data types.
- All columns or specific columns can be selected.
- If you create a new table from an existing table, the new table is filled with the existing
  values from the old table (only the rows that satisfy the WHERE clause, if one is given).
Syntax
CREATE TABLE new_table_name AS
SELECT column1, column2, ...   -- selected columns from the existing table
FROM existing_table_name
WHERE ....;

OR
CREATE TABLE new_table
AS (SELECT * FROM old_table);   -- all columns from the existing table
OR

CREATE TABLE new_table
AS (SELECT * FROM old_table WHERE 1=2);   -- all columns but no data (1=2 is never true)
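The CREATE TABLE ... AS SELECT forms above can be tried out with Python's built-in sqlite3 module, which embeds SQLite. This is a minimal sketch under that assumption. The table contents and the names adults and empty_copy are hypothetical. It copies selected rows into one table, then uses WHERE 1=2 to copy only the column structure into another.

```python
import sqlite3

# In-memory SQLite database; SQLite accepts CREATE TABLE ... AS SELECT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_table (id INTEGER, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO old_table VALUES (?, ?, ?)",
                 [(101, "Asha", 20), (102, "Ben", 22), (103, "Chris", 19)])

# Copy selected columns, and only the rows that satisfy the WHERE clause.
conn.execute("CREATE TABLE adults AS SELECT id, name FROM old_table WHERE age >= 20")

# Copy all columns but no rows: the condition 1=2 is never true.
conn.execute("CREATE TABLE empty_copy AS SELECT * FROM old_table WHERE 1=2")

adult_rows = conn.execute("SELECT id, name FROM adults ORDER BY id").fetchall()
empty_count = conn.execute("SELECT COUNT(*) FROM empty_copy").fetchone()[0]
# PRAGMA table_info returns one row per column; field 1 is the column name.
empty_columns = [col[1] for col in conn.execute("PRAGMA table_info(empty_copy)")]
print(adult_rows, empty_count, empty_columns)
```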
ALTER:

- The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.
- The ALTER TABLE statement is also used to add and drop various constraints on an existing
  table.

Alter Table - Add Column

Syntax:
ALTER TABLE table_name
ADD column_name datatype;

Alter Table - Drop Column

Syntax:
ALTER TABLE table_name
DROP COLUMN column_name;
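Alter Table - Alter/Modify Column

Syntax (the keyword differs between systems):

ALTER TABLE table_name
ALTER COLUMN column_name datatype;    -- SQL Server, MS Access

OR

ALTER TABLE table_name
MODIFY COLUMN column_name datatype;   -- MySQL (Oracle: MODIFY column_name datatype)

PostgreSQL uses ALTER TABLE table_name ALTER COLUMN column_name TYPE datatype;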

TRUNCATE:

The TRUNCATE TABLE command deletes all the rows from an existing table, while the table itself
(its columns, constraints and indexes) remains in place.

Syntax:

TRUNCATE TABLE table_name;

DROP:
The SQL DROP TABLE statement is used to remove a table definition and all the data, indexes,
triggers, constraints and permission specifications for that table.

Syntax:

DROP TABLE table_name;

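RENAME:
The RENAME command is used to rename a table. Its exact form varies between systems.

Syntax:

RENAME TABLE old_table_name TO new_table_name;        -- MySQL
ALTER TABLE old_table_name RENAME TO new_table_name;  -- PostgreSQL, SQLite, Oracle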
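Here is a short sketch of the DDL commands above, again run in SQLite through Python's sqlite3 module. SQLite accepts ALTER TABLE ... ADD COLUMN and ALTER TABLE ... RENAME TO. It has no TRUNCATE TABLE command; DELETE FROM table_name plays that role there. The email column and the new name Pupil are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (id INTEGER, name VARCHAR(50), age INTEGER)")

def columns(table):
    # Field 1 of each PRAGMA table_info row is the column name.
    return [col[1] for col in conn.execute(f"PRAGMA table_info({table})")]

# ALTER TABLE ... ADD COLUMN appends a new column to the definition.
conn.execute("ALTER TABLE Student ADD COLUMN email VARCHAR(100)")
after_add = columns("Student")

# SQLite spells the rename as ALTER TABLE ... RENAME TO.
conn.execute("ALTER TABLE Student RENAME TO Pupil")
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# DROP TABLE removes the definition together with its data.
conn.execute("DROP TABLE Pupil")
remaining = conn.execute("SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0]
print(after_add, tables, remaining)
```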

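DML: DATA MANIPULATION LANGUAGE

DML commands are not auto-committed by the database: their changes are not permanent until the
transaction is committed, so they can be rolled back. (Many client tools and drivers, however,
run in an autocommit mode by default.)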

Command    Description
insert     insert a new row
update     update existing rows
delete     delete rows


INSERT:
The SQL INSERT INTO statement is used to add new rows of data to a table in the database.

Syntax:

INSERT INTO table_name (column1, column2, column3, ..., columnN)
VALUES (value1, value2, value3, ..., valueN);
OR

INSERT INTO table_name VALUES (value1, value2, value3, ..., valueN);

In this second form the values must be listed in the same order as the table's columns.

OR

INSERT INTO Student VALUES (103, 'Chris', DEFAULT);

If a default value was defined for a column when the table was created, the DEFAULT keyword
inserts that value into the column (NULL if no default was defined).
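The three INSERT forms can be checked with a minimal SQLite sketch using Python's sqlite3 module. The rows are made up. The DEFAULT 18 on age is an assumption added so that there is a default to fall back on. SQLite does not accept the DEFAULT keyword inside a VALUES list, so the third insert leaves the column out of the column list instead, which has the same effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed default for age, so the third insert has something to fall back on.
conn.execute("CREATE TABLE Student (id INTEGER, name VARCHAR(50), age INTEGER DEFAULT 18)")

# 1. Column list given explicitly.
conn.execute("INSERT INTO Student (id, name, age) VALUES (101, 'Asha', 20)")
# 2. No column list: the values follow the table's column order.
conn.execute("INSERT INTO Student VALUES (102, 'Ben', 22)")
# 3. The omitted column receives its default value (18).
conn.execute("INSERT INTO Student (id, name) VALUES (103, 'Chris')")

rows = conn.execute("SELECT id, name, age FROM Student ORDER BY id").fetchall()
print(rows)
```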

UPDATE:
The UPDATE statement is used to modify the existing records in a table.
Be careful when updating records. If you omit the WHERE clause, ALL records will be
updated!

Syntax:
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

Example:
UPDATE Customers
SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
WHERE CustomerID = 1;
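A small SQLite sketch shows the effect of the WHERE clause on UPDATE. It is driven from Python's sqlite3 module, and the two customer rows are hypothetical. The cursor's rowcount attribute reports how many rows each UPDATE changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, ContactName TEXT, City TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                 [(1, "Maria Anders", "Berlin"), (2, "Ana Trujillo", "Mexico City")])

# With a WHERE clause, only the matching row changes.
cur = conn.execute("UPDATE Customers SET ContactName = 'Alfred Schmidt', City = 'Frankfurt' "
                   "WHERE CustomerID = 1")
changed_with_where = cur.rowcount

# Without a WHERE clause, every row is updated.
cur = conn.execute("UPDATE Customers SET City = 'Frankfurt'")
changed_without_where = cur.rowcount

rows = conn.execute("SELECT * FROM Customers ORDER BY CustomerID").fetchall()
print(changed_with_where, changed_without_where, rows)
```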

DELETE:
The DELETE command is used to delete data from a table. It can also be used with a condition
to delete particular rows.
Be careful when deleting records in a table! The WHERE clause in the DELETE statement specifies
which record(s) should be deleted. If you omit the WHERE clause, all records in the table will
be deleted!
Syntax:
DELETE FROM table_name;
OR
DELETE FROM table_name WHERE condition;
Example:
DELETE FROM student;                    -- deletes all the records from the student table
OR

DELETE FROM student WHERE s_id = 103;   -- deletes the record whose s_id is 103
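Here is a minimal SQLite sketch for DELETE, using hypothetical student rows. The first statement removes one row chosen by the WHERE clause. The second has no WHERE clause, so it empties the table while leaving its definition in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(101, "Asha"), (102, "Ben"), (103, "Chris")])

# Delete the single row selected by the WHERE clause.
conn.execute("DELETE FROM student WHERE s_id = 103")
after_one = conn.execute("SELECT s_id FROM student ORDER BY s_id").fetchall()

# DELETE without WHERE removes every row, but the table still exists.
conn.execute("DELETE FROM student")
after_all = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(after_one, after_all)
```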

TCL: TRANSACTION CONTROL LANGUAGE

Transaction Control Language (TCL) commands are used to manage transactions in a database. They
manage the changes made by DML statements and allow statements to be grouped together into
logical transactions.
COMMIT:
The COMMIT command is used to save a transaction permanently into the database.
Syntax:
COMMIT;

ROLLBACK:
This command restores the database to the last committed state. It is also used with the
SAVEPOINT command to jump back to a savepoint within a transaction.
Syntax:
ROLLBACK;
OR
ROLLBACK TO savepoint_name;
SAVEPOINT:
The SAVEPOINT command is used to temporarily save a transaction so that you can roll back to
that point whenever necessary.
Syntax:
SAVEPOINT savepoint_name;
Example of SAVEPOINT and ROLLBACK
Following is the class table:

ID  NAME
1   abhi
2   adam
4   alex

SQL queries:
INSERT INTO class VALUES (5, 'Rahul');
COMMIT;
UPDATE class SET name = 'abhijit' WHERE id = 5;
SAVEPOINT A;
INSERT INTO class VALUES (6, 'Chris');
SAVEPOINT B;
INSERT INTO class VALUES (7, 'Bravo');
SAVEPOINT C;
SELECT * FROM class;

The resulting table will look like this:

ID  NAME
1   abhi
2   adam
4   alex
5   abhijit
6   Chris
7   Bravo

Now roll back to savepoint B:

ROLLBACK TO B;
SELECT * FROM class;

The resulting table will look like this (the insert of row 7, made after savepoint B, is undone):

ID  NAME
1   abhi
2   adam
4   alex
5   abhijit
6   Chris

Now roll back to savepoint A:
ROLLBACK TO A;
SELECT * FROM class;
The resulting table will look like this (the insert of row 6 is undone as well; the update of
row 5 was made before savepoint A, so it remains):

ID  NAME
1   abhi
2   adam
4   alex
5   abhijit
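The savepoint example above can be replayed in SQLite through Python's sqlite3 module. This sketch assumes SQLite's transaction rules, which differ slightly from Oracle's. After a COMMIT, SQLite needs an explicit BEGIN, whereas Oracle starts a new transaction implicitly. The snapshot() helper is hypothetical and simply reads the table in id order.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN / SAVEPOINT / ROLLBACK TO / COMMIT statements below are sent exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE class (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO class VALUES (?, ?)", [(1, "abhi"), (2, "adam"), (4, "alex")])

def snapshot():
    return conn.execute("SELECT id, name FROM class ORDER BY id").fetchall()

conn.execute("BEGIN")
conn.execute("INSERT INTO class VALUES (5, 'Rahul')")
conn.execute("COMMIT")

conn.execute("BEGIN")                     # SQLite needs an explicit start here
conn.execute("UPDATE class SET name = 'abhijit' WHERE id = 5")
conn.execute("SAVEPOINT A")
conn.execute("INSERT INTO class VALUES (6, 'Chris')")
conn.execute("SAVEPOINT B")
conn.execute("INSERT INTO class VALUES (7, 'Bravo')")
conn.execute("SAVEPOINT C")
after_inserts = snapshot()

conn.execute("ROLLBACK TO B")             # undoes the insert of row 7
after_b = snapshot()
conn.execute("ROLLBACK TO A")             # also undoes the insert of row 6
after_a = snapshot()
conn.execute("COMMIT")                    # keeps the update of row 5
final = snapshot()
print(final)
```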

SET OPERATIONS IN SQL

SQL supports a few set operations, which combine the results of two SELECT statements into a
single result:

1. UNION

2. UNION ALL

3. INTERSECT

4. MINUS (EXCEPT)

UNION Operation

The UNION operator is used to combine the result-set of two or more SELECT statements.

- Each SELECT statement within UNION must have the same number of columns.
- The columns must also have similar data types.
- The columns in each SELECT statement must also be in the same order.
UNION Syntax
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

Example of UNION
The First table:

ID  Name
1   abhi
2   adam

The Second table:

ID  Name
2   adam
3   Chester

The UNION query will be:

SELECT * FROM First
UNION
SELECT * FROM Second;
The result set will look like this:

ID  NAME
1   abhi
2   adam
3   Chester
UNION ALL
The UNION operator selects only distinct values by default. To allow duplicate values, use
UNION ALL:

Syntax:

SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;

- The column names in the result set are usually equal to the column names in the first SELECT
  statement in the UNION.
- UNION (without ALL) automatically eliminates duplicates, unlike a plain SELECT clause; UNION ALL
  keeps them.

Example of UNION ALL

Using the First and Second tables above, the UNION ALL query will be:
SELECT * FROM First
UNION ALL
SELECT * FROM Second;
The result set will look like this:

ID  NAME
1   abhi
2   adam
2   adam
3   Chester
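An in-memory SQLite database, accessed through Python's sqlite3 module, gives a quick way to compare UNION and UNION ALL on the First and Second tables. The table names are double-quoted so that they are always read as identifiers. ORDER BY 1 sorts on the first result column, because the row order of a set operation is otherwise unspecified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "First"  (ID INTEGER, Name TEXT);
    CREATE TABLE "Second" (ID INTEGER, Name TEXT);
    INSERT INTO "First"  VALUES (1, 'abhi'), (2, 'adam');
    INSERT INTO "Second" VALUES (2, 'adam'), (3, 'Chester');
""")

# UNION removes the duplicate (2, 'adam'); UNION ALL keeps both copies.
union_rows = conn.execute(
    'SELECT * FROM "First" UNION SELECT * FROM "Second" ORDER BY 1').fetchall()
union_all_rows = conn.execute(
    'SELECT * FROM "First" UNION ALL SELECT * FROM "Second" ORDER BY 1').fetchall()
print(union_rows)
print(union_all_rows)
```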
INTERSECT
- It is used to combine two SELECT statements. The INTERSECT operation returns the rows common
  to both SELECT statements.
- Both SELECT statements must have the same number of columns, with compatible data types.
- It removes duplicates. Some systems also return the rows sorted as a side effect, but the order
  is guaranteed only if you add ORDER BY.
- To retain duplicates, write INTERSECT ALL instead of INTERSECT (where the DBMS supports it).

Intersect Syntax
SELECT column_name(s) FROM table1
INTERSECT
SELECT column_name(s) FROM table2;

Example of INTERSECT
Using the First and Second tables above, the INTERSECT query will be:
SELECT * FROM First
INTERSECT
SELECT * FROM Second;
The result set will look like this:

ID  NAME
2   adam

EXCEPT/MINUS
- It combines the results of two SELECT statements. The EXCEPT operator (traditionally called
  MINUS in Oracle) returns the rows that are present in the first query but absent from the
  second query.
- It removes duplicates; as with INTERSECT, the order of the result is guaranteed only with
  ORDER BY.
- To retain duplicates, write EXCEPT ALL instead of EXCEPT (where the DBMS supports it).
Except Syntax:
SELECT column_name(s) FROM table1
EXCEPT          -- MINUS in Oracle
SELECT column_name(s) FROM table2;

Example of MINUS
Using the First and Second tables above, the MINUS query (Oracle syntax) will be:
SELECT * FROM First
MINUS
SELECT * FROM Second;
The result set will look like this:

ID  NAME
1   abhi
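INTERSECT and EXCEPT can be tried the same way. SQLite, used here through Python's sqlite3 module, supports both operators. It has neither Oracle's MINUS keyword nor INTERSECT ALL / EXCEPT ALL, so the sketch uses plain EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "First"  (ID INTEGER, Name TEXT);
    CREATE TABLE "Second" (ID INTEGER, Name TEXT);
    INSERT INTO "First"  VALUES (1, 'abhi'), (2, 'adam');
    INSERT INTO "Second" VALUES (2, 'adam'), (3, 'Chester');
""")

# Rows present in both queries.
intersect_rows = conn.execute(
    'SELECT * FROM "First" INTERSECT SELECT * FROM "Second"').fetchall()
# Rows in the first query but not in the second (Oracle would write MINUS here).
except_rows = conn.execute(
    'SELECT * FROM "First" EXCEPT SELECT * FROM "Second"').fetchall()
print(intersect_rows, except_rows)
```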

AGGREGATE OPERATORS
These functions return a single value after performing a calculation on a group of values. The
following are some of the frequently used aggregate functions.

AVG() Function
AVG() returns the average of the values in a numeric column.
Its general syntax is:
SELECT AVG(column_name) FROM table_name;

Using AVG() function

Consider the following Emp table:

eid  name   age  salary
401  Anu    22   9000
402  Shane  29   8000
403  Rohan  34   6000
404  Scott  44   10000
405  Tiger  35   8000

The SQL query to find the average salary will be:

SELECT AVG(salary) FROM Emp;
The result of the above query will be:

avg(salary)
8200

COUNT() Function
COUNT(column_name) returns the number of non-NULL values in a column, and COUNT(*) returns the
number of rows. Either form can be combined with a WHERE condition.
Its general syntax is:
SELECT COUNT(column_name) FROM table_name;

Using COUNT() function

Consider the above Emp table.
The SQL query to count the employees satisfying a specified condition is:
SELECT COUNT(name) FROM Emp WHERE salary = 8000;
The result of the above query will be:

count(name)
2

Example of COUNT(DISTINCT)
Consider the above Emp table.
The SQL query is:
SELECT COUNT(DISTINCT salary) FROM Emp;
The result of the above query will be:

count(distinct salary)
4

MAX() Function
The MAX function returns the maximum value of the selected column.
Syntax of the MAX function:
SELECT MAX(column_name) FROM table_name;
Using MAX() function
Consider the above Emp table.
The SQL query to find the maximum salary will be:
SELECT MAX(salary) FROM Emp;
The result of the above query will be:

MAX(salary)
10000

MIN() Function
The MIN function returns the minimum value of the selected column.
Syntax of the MIN function:
SELECT MIN(column_name) FROM table_name;
Using MIN() function
Consider the above Emp table.
The SQL query to find the minimum salary is:
SELECT MIN(salary) FROM Emp;
The result will be:

MIN(salary)
6000
SUM() Function
The SUM function returns the total of the numeric values in a selected column.
Syntax of the SUM function:
SELECT SUM(column_name) FROM table_name;

Using SUM() function

Consider the above Emp table.
The SQL query to find the sum of the salaries will be:
SELECT SUM(salary) FROM Emp;
The result of the above query is:

SUM(salary)
41000
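All of the aggregate results above can be reproduced by loading the Emp table into SQLite with Python's sqlite3 module. Note that SQLite's AVG() returns a floating-point value (8200.0). The scalar() helper is a hypothetical convenience that returns the single value each query produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, name TEXT, age INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO Emp VALUES (?, ?, ?, ?)", [
    (401, "Anu", 22, 9000), (402, "Shane", 29, 8000), (403, "Rohan", 34, 6000),
    (404, "Scott", 44, 10000), (405, "Tiger", 35, 8000),
])

def scalar(sql):
    # Each aggregate query here returns exactly one row with one column.
    return conn.execute(sql).fetchone()[0]

avg_salary = scalar("SELECT AVG(salary) FROM Emp")
count_8000 = scalar("SELECT COUNT(name) FROM Emp WHERE salary = 8000")
distinct_salaries = scalar("SELECT COUNT(DISTINCT salary) FROM Emp")
max_salary = scalar("SELECT MAX(salary) FROM Emp")
min_salary = scalar("SELECT MIN(salary) FROM Emp")
sum_salary = scalar("SELECT SUM(salary) FROM Emp")
print(avg_salary, count_8000, distinct_salaries, max_salary, min_salary, sum_salary)
```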

COMPARISON OPERATORS


Comparison operators are used in the WHERE clause to determine which records to select. Here
is a list of the comparison operators that you can use in SQL:

Comparison Operator   Description
=                     Equal
<>                    Not equal
!=                    Not equal
>                     Greater than
>=                    Greater than or equal
<                     Less than
<=                    Less than or equal
IN ( )                Matches a value in a list
NOT                   Negates a condition
BETWEEN               Within a range (inclusive)
IS NULL               Value is NULL
IS NOT NULL           Value is not NULL
LIKE                  Pattern matching with % and _
EXISTS                Condition is met if the subquery returns at least one row


Example - Equality Operator
In SQL, you can use the = operator to test for equality in a query.
In this example, we have a table called suppliers with the following data:
supplier_id  supplier_name      city              state
100          Microsoft          Redmond           Washington
200          Google             Mountain View     California
300          Oracle             Redwood City      California
400          Kimberly-Clark     Irving            Texas
500          Tyson Foods        Springdale        Arkansas
600          SC Johnson         Racine            Wisconsin
700          Dole Food Company  Westlake Village  California
800          Flowers Foods      Thomasville       Georgia
900          Electronic Arts    Redwood City      California
Enter the following SQL statement:

SELECT *
FROM suppliers
WHERE supplier_name = 'Microsoft';

There will be 1 record selected. These are the results that you should see:

supplier_id  supplier_name  city     state
100          Microsoft      Redmond  Washington


In this example, the SELECT statement above would return all rows from the suppliers table where
the supplier_name is equal to Microsoft.
Example - Inequality Operator
In SQL, there are two ways to test for inequality in a query. You can use either
the <> or != operator. Both will return the same results.
Let's use the same suppliers table as the previous example.
Enter the following SQL statement to test for inequality using the <> operator:

SELECT *
FROM suppliers
WHERE supplier_name <> 'Microsoft';

OR enter this next SQL statement to use the != operator:

SELECT *
FROM suppliers
WHERE supplier_name != 'Microsoft';

There will be 8 records selected. These are the results you should see with either one of the SQL
statements:

supplier_id  supplier_name      city              state
200          Google             Mountain View     California
300          Oracle             Redwood City      California
400          Kimberly-Clark     Irving            Texas
500          Tyson Foods        Springdale        Arkansas
600          SC Johnson         Racine            Wisconsin
700          Dole Food Company  Westlake Village  California
800          Flowers Foods      Thomasville       Georgia
900          Electronic Arts    Redwood City      California


In the example, both SELECT statements would return all rows from the suppliers table where
the supplier_name is not equal to Microsoft.
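A short SQLite sketch can check the equality and inequality queries against the suppliers table, using Python's sqlite3 module. SQLite accepts both <> and != as the not-equal operator. The ids() helper is hypothetical and concatenates fixed condition strings only. Values coming from users should be passed as query parameters instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (supplier_id INTEGER, supplier_name TEXT, city TEXT, state TEXT)")
conn.executemany("INSERT INTO suppliers VALUES (?, ?, ?, ?)", [
    (100, "Microsoft", "Redmond", "Washington"),
    (200, "Google", "Mountain View", "California"),
    (300, "Oracle", "Redwood City", "California"),
    (400, "Kimberly-Clark", "Irving", "Texas"),
    (500, "Tyson Foods", "Springdale", "Arkansas"),
    (600, "SC Johnson", "Racine", "Wisconsin"),
    (700, "Dole Food Company", "Westlake Village", "California"),
    (800, "Flowers Foods", "Thomasville", "Georgia"),
    (900, "Electronic Arts", "Redwood City", "California"),
])

def ids(condition):
    # Fixed demo conditions only; never build SQL from user input this way.
    sql = "SELECT supplier_id FROM suppliers WHERE " + condition + " ORDER BY supplier_id"
    return [row[0] for row in conn.execute(sql)]

equal = ids("supplier_name = 'Microsoft'")
not_equal_1 = ids("supplier_name <> 'Microsoft'")
not_equal_2 = ids("supplier_name != 'Microsoft'")
print(equal, len(not_equal_1), len(not_equal_2))
```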
Example - Greater Than Operator
You can use the > operator in SQL to test for an expression greater than.
In this example, we have a table called customers with the following data:

customer_id  last_name  first_name  favorite_website
4000         Jackson    Joe         techonthenet.com
5000         Smith      Jane        digminecraft.com
6000         Ferguson   Samantha    bigactivities.com
7000         Reynolds   Allen       checkyourmath.com
8000         Anderson   Paige       NULL
9000         Johnson    Derek       techonthenet.com

Enter the following SQL statement:

SELECT *
FROM customers
WHERE customer_id > 6000;

There will be 3 records selected. These are the results that you should see:

customer_id  last_name  first_name  favorite_website
7000         Reynolds   Allen       checkyourmath.com
8000         Anderson   Paige       NULL
9000         Johnson    Derek       techonthenet.com

Example - Greater Than or Equal Operator


In SQL, you can use the >= operator to test for an expression greater than or equal to.
Let's use the same customers table as the previous example.
Enter the following SQL statement:

SELECT *
FROM customers
WHERE customer_id >= 6000;

There will be 4 records selected. These are the results that you should see:

customer_id  last_name  first_name  favorite_website
6000         Ferguson   Samantha    bigactivities.com
7000         Reynolds   Allen       checkyourmath.com
8000         Anderson   Paige       NULL
9000         Johnson    Derek       techonthenet.com

Example - Less Than Operator


You can use the < operator in SQL to test for an expression less than.
In this example, we have a table called products with the following data:
product_id  product_name  category_id
1           Pear          50
2           Banana        50
3           Orange        50
4           Apple         50
5           Bread         75
6           Sliced Ham    25
7           Kleenex       NULL
Enter the following SQL statement:

SELECT *
FROM products
WHERE product_id < 5;

There will be 4 records selected. These are the results that you should see:

product_id  product_name  category_id
1           Pear          50
2           Banana        50
3           Orange        50
4           Apple         50
Example - Less Than or Equal Operator
In SQL, you can use the <= operator to test for an expression less than or equal to.
Let's use the same products table as the previous example.
Enter the following SQL statement:

SELECT *
FROM products
WHERE product_id <= 5;

There will be 5 records selected. These are the results that you should see:

product_id  product_name  category_id
1           Pear          50
2           Banana        50
3           Orange        50
4           Apple         50
5           Bread         75
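The four range comparisons can be verified on the customers and products tables with SQLite via Python's sqlite3 module. The last two queries use the same customers table to show how NULL behaves. A NULL never satisfies a comparison such as <>, so Anderson's NULL favorite_website is found only with IS NULL. The count() helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, last_name TEXT, "
             "first_name TEXT, favorite_website TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    (4000, "Jackson", "Joe", "techonthenet.com"),
    (5000, "Smith", "Jane", "digminecraft.com"),
    (6000, "Ferguson", "Samantha", "bigactivities.com"),
    (7000, "Reynolds", "Allen", "checkyourmath.com"),
    (8000, "Anderson", "Paige", None),          # None is stored as SQL NULL
    (9000, "Johnson", "Derek", "techonthenet.com"),
])
conn.execute("CREATE TABLE products (product_id INTEGER, product_name TEXT, category_id INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    (1, "Pear", 50), (2, "Banana", 50), (3, "Orange", 50), (4, "Apple", 50),
    (5, "Bread", 75), (6, "Sliced Ham", 25), (7, "Kleenex", None),
])

def count(sql):
    return conn.execute(sql).fetchone()[0]

gt = count("SELECT COUNT(*) FROM customers WHERE customer_id > 6000")
ge = count("SELECT COUNT(*) FROM customers WHERE customer_id >= 6000")
lt = count("SELECT COUNT(*) FROM products WHERE product_id < 5")
le = count("SELECT COUNT(*) FROM products WHERE product_id <= 5")
# The NULL row is matched only by IS NULL; <> skips it.
null_sites = count("SELECT COUNT(*) FROM customers WHERE favorite_website IS NULL")
not_tech = count("SELECT COUNT(*) FROM customers WHERE favorite_website <> 'techonthenet.com'")
print(gt, ge, lt, le, null_sites, not_tech)
```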

THE SQL ORDER BY KEYWORD

The ORDER BY keyword is used to sort the result set in ascending or descending order.

The ORDER BY keyword sorts the records in ascending order by default. To sort the records in
descending order, use the DESC keyword. ASC or DESC applies to each listed column separately.

ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...;

Example:
SELECT * FROM Customers
ORDER BY Country;

SELECT * FROM Customers
ORDER BY Country DESC;
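The Customers table used in these examples is not listed, so this sketch sorts the Emp table from the aggregate-function examples instead, using SQLite via Python's sqlite3 module. It shows ASC and DESC applied per column. Salaries descend, and the tie between Shane and Tiger at 8000 is broken by name in ascending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, name TEXT, age INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO Emp VALUES (?, ?, ?, ?)", [
    (401, "Anu", 22, 9000), (402, "Shane", 29, 8000), (403, "Rohan", 34, 6000),
    (404, "Scott", 44, 10000), (405, "Tiger", 35, 8000),
])

# Highest salary first; rows with equal salary are ordered by name, A to Z.
by_salary = [row[0] for row in conn.execute(
    "SELECT name FROM Emp ORDER BY salary DESC, name ASC")]
# ASC is the default direction.
by_age = [row[0] for row in conn.execute("SELECT name FROM Emp ORDER BY age")]
print(by_salary, by_age)
```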

THE SQL GROUP BY STATEMENT

The GROUP BY statement groups rows that have the same values in the specified columns. It is
often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to compute one result per group.
In standard SQL, every selected column that is not inside an aggregate function must appear in the
GROUP BY clause.
GROUP BY Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);

Example:
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;

THE SQL HAVING CLAUSE

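The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate
functions. WHERE filters individual rows before they are grouped, whereas HAVING filters the
groups after the aggregate functions have been computed.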

HAVING Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);

Example:
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5;

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5
ORDER BY COUNT(CustomerID) DESC;
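The GROUP BY and HAVING queries can be sketched on a small hypothetical Customers table in SQLite, run through Python's sqlite3 module. With only six rows, the HAVING threshold is lowered from 5 to 1 so that the filter has a visible effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [
    (1, "Germany"), (2, "Mexico"), (3, "Mexico"),
    (4, "UK"), (5, "Germany"), (6, "Germany"),
])

# One output row per country, largest group first.
per_country = conn.execute(
    "SELECT COUNT(CustomerID), Country FROM Customers "
    "GROUP BY Country ORDER BY COUNT(CustomerID) DESC").fetchall()

# HAVING discards whole groups after the counts are computed.
big_countries = conn.execute(
    "SELECT COUNT(CustomerID), Country FROM Customers "
    "GROUP BY Country HAVING COUNT(CustomerID) > 1 "
    "ORDER BY COUNT(CustomerID) DESC").fetchall()
print(per_country, big_countries)
```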

THE SQL LIKE OPERATOR

The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

There are two wildcards used in conjunction with the LIKE operator:
- % : the percent sign represents zero, one, or multiple characters
- _ : the underscore represents a single character

The percent sign and the underscore can also be used in combinations!

LIKE Syntax
SELECT column1, column2, ...
FROM table_name
WHERE columnN LIKE pattern;

You can also combine any number of conditions using AND or OR operators.

Here are some examples showing different LIKE operators with '%' and '_' wildcards:

LIKE Operator                     Description
WHERE CustomerName LIKE 'a%'      Finds any values that start with "a"
WHERE CustomerName LIKE '%a'      Finds any values that end with "a"
WHERE CustomerName LIKE '%or%'    Finds any values that have "or" in any position
WHERE CustomerName LIKE '_r%'     Finds any values that have "r" in the second position
WHERE CustomerName LIKE 'a_%_%'   Finds any values that start with "a" and are at least 3 characters long
WHERE ContactName LIKE 'a%o'      Finds any values that start with "a" and end with "o"

Examples:

The following SQL statement selects all customers with a CustomerName starting with "a":

SELECT * FROM Customers
WHERE CustomerName LIKE 'a%';

The following SQL statement selects all customers with a CustomerName ending with "a":

SELECT * FROM Customers
WHERE CustomerName LIKE '%a';

The following SQL statement selects all customers with a CustomerName that contains "or" in any
position:

SELECT * FROM Customers
WHERE CustomerName LIKE '%or%';

The following SQL statement selects all customers with a CustomerName that has "r" in the second
position:

SELECT * FROM Customers
WHERE CustomerName LIKE '_r%';

The following SQL statement selects all customers with a CustomerName that starts with "a" and
is at least 3 characters long:

SELECT * FROM Customers
WHERE CustomerName LIKE 'a_%_%';

The following SQL statement selects all customers with a ContactName that starts with "a" and
ends with "o":

SELECT * FROM Customers
WHERE ContactName LIKE 'a%o';

The following SQL statement selects all customers with a CustomerName that does not start with "a":

SELECT * FROM Customers
WHERE CustomerName NOT LIKE 'a%';
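The wildcard patterns can be checked against a handful of hypothetical lowercase names in SQLite, using Python's sqlite3 module. The names are lowercase because SQLite's LIKE ignores case for ASCII letters by default, while other systems follow the column's collation. The like() helper is hypothetical and passes the pattern as a query parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?)",
                 [("anna",), ("al",), ("maria",), ("tron",), ("ariadna",), ("bob",)])

def like(pattern, negate=False):
    op = "NOT LIKE" if negate else "LIKE"
    sql = f"SELECT CustomerName FROM Customers WHERE CustomerName {op} ? ORDER BY CustomerName"
    return [row[0] for row in conn.execute(sql, (pattern,))]

starts_a = like("a%")             # first character is 'a'
ends_a = like("%a")               # last character is 'a'
r_second = like("_r%")            # exactly one character, then 'r'
a_min3 = like("a_%_%")            # 'a' plus at least two more characters
not_a = like("a%", negate=True)
print(starts_a, ends_a, r_second, a_min3, not_a)
```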

THE SQL BETWEEN OPERATOR

The BETWEEN operator selects values within a given range. The values can be numbers, text, or
dates.

The BETWEEN operator is inclusive: begin and end values are included.

BETWEEN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;

Example:
SELECT * FROM Products
WHERE Price BETWEEN 10 AND 20;

SELECT * FROM Products
WHERE Price NOT BETWEEN 10 AND 20;

THE SQL IN OPERATOR

The IN operator allows you to specify multiple values in a WHERE clause.

The IN operator is a shorthand for multiple OR conditions.

IN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);

or:

SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT STATEMENT);

Example:
SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK');

SELECT * FROM Customers
WHERE Country NOT IN ('Germany', 'France', 'UK');

SELECT * FROM Customers
WHERE Country IN (SELECT Country FROM Suppliers);
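No Products or Suppliers data is given, so this sketch of BETWEEN and IN uses the Emp table from the aggregate-function examples, loaded into SQLite through Python's sqlite3 module. The subquery form of IN reads from the same table only to keep the example self-contained. The names() helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, name TEXT, age INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO Emp VALUES (?, ?, ?, ?)", [
    (401, "Anu", 22, 9000), (402, "Shane", 29, 8000), (403, "Rohan", 34, 6000),
    (404, "Scott", 44, 10000), (405, "Tiger", 35, 8000),
])

def names(condition):
    # Only fixed demo conditions are concatenated here.
    sql = "SELECT name FROM Emp WHERE " + condition + " ORDER BY eid"
    return [row[0] for row in conn.execute(sql)]

between = names("salary BETWEEN 8000 AND 9000")          # both end values are included
not_between = names("salary NOT BETWEEN 8000 AND 9000")
in_list = names("salary IN (6000, 10000)")               # shorthand for salary = 6000 OR salary = 10000
in_subquery = names("eid IN (SELECT eid FROM Emp WHERE age > 30)")
print(between, not_between, in_list, in_subquery)
```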

NESTED & CORRELATED SUBQUERIES


There are two main types of subqueries: nested (non-correlated) and correlated. A nested subquery
does not depend on the outer query. It is executed first, and its result is used in the WHERE (or
HAVING) clause of the main query. A correlated subquery refers to columns of the outer query.
Conceptually, the main query is processed first, and the subquery is executed for every row the
main query considers.

Nested Subqueries
A subquery is nested when it appears in the WHERE or HAVING clause of another query, which may
itself be a subquery.

Get the result of all the students who are enrolled in the same course as the student with
ROLLNO 12.
Select *
From result
where rollno in (select rollno
from student
where courseid = (select courseid
from student
where rollno = 12));
The innermost subquery is executed first. Its result is used to execute the next subquery, and
that result in turn is used by the outer query. A subquery is thus a SELECT statement nested
within another SELECT statement that returns an intermediate result, and subqueries can be nested
inside other subqueries. The depth to which they can be nested is implementation-dependent.
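The student/result query above can be run end to end on a few hypothetical rows in SQLite, using Python's sqlite3 module. In this data, roll number 12 is in course C1, so the outer query returns the results of the three C1 students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (rollno INTEGER, name TEXT, courseid TEXT);
    CREATE TABLE result  (rollno INTEGER, paper TEXT, marks INTEGER);
    INSERT INTO student VALUES (10, 'Asha', 'C1'), (11, 'Ben', 'C2'),
                               (12, 'Chen', 'C1'), (13, 'Dev', 'C1'), (14, 'Eli', 'C2');
    INSERT INTO result VALUES (10, 'P1', 70), (11, 'P1', 60), (12, 'P1', 80),
                              (13, 'P1', 55), (14, 'P1', 90);
""")

# Innermost query: the course of roll number 12. Middle query: every student in
# that course. Outer query: those students' results.
rows = conn.execute("""
    SELECT * FROM result
    WHERE rollno IN (SELECT rollno FROM student
                     WHERE courseid = (SELECT courseid FROM student
                                       WHERE rollno = 12))
    ORDER BY rollno
""").fetchall()
print(rows)
```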

Example 1: Nested subqueries

Suppose we want to retrieve each unique job_id from the employees table together with its average
salary, keeping only the job_ids whose average salary is smaller than a threshold. The threshold is
the maximum, over the job_ids that appear in the job_history table for departments 50 through 100,
of the average min_salary of each job_id in the jobs table. The following SQL statement can be used:

Sample table: employees (only the first few rows are shown)

employee_id  first_name  last_name  email     phone_number  hire_date  job_id   salary  commission_pct  manager_id  department_id
100          Steven      King       SKING     515.123.4567  6/17/1987  AD_PRES  24000                               90
101          Neena       Kochhar    NKOCHHAR  515.123.4568  6/18/1987  AD_VP    17000                   100         90
102          Lex         De Haan    LDEHAAN   515.123.4569  6/19/1987  AD_VP    17000                   100         90
103          Alexander   Hunold     AHUNOLD   590.423.4567  6/20/1987  IT_PROG  9000                    102         60
104          Bruce       Ernst      BERNST    590.423.4568  6/21/1987  IT_PROG  6000                    103         60
105          David       Austin     DAUSTIN   590.423.4569  6/22/1987  IT_PROG  4800                    103         60

Sample table: jobs

JOB_ID JOB_TITLE MIN_SALARY MAX_SALARY


AD_PRES President 20000 40000
AD_VP Administration Vice President 15000 30000
AD_ASST Administration Assistant 3000 6000
FI_MGR Finance Manager 8200 16000
FI_ACCOUNT Accountant 4200 9000
AC_MGR Accounting Manager 8200 16000
AC_ACCOUNT Public Accountant 4200 9000
SA_MAN Sales Manager 10000 20000
SA_REP Sales Representative 6000 12000
PU_MAN Purchasing Manager 8000 15000
PU_CLERK Purchasing Clerk 2500 5500
ST_MAN Stock Manager 5500 8500
ST_CLERK Stock Clerk 2000 5000
SH_CLERK Shipping Clerk 2500 5500
IT_PROG Programmer 4000 10000
MK_MAN Marketing Manager 9000 15000
MK_REP Marketing Representative 4000 9000
HR_REP Human Resources Representative 4000 9000
PR_REP Public Relations Representative 4500 10500
SQL Code:

SELECT job_id,AVG(salary)
FROM employees
GROUP BY job_id
HAVING AVG(salary)<
(SELECT MAX(AVG(min_salary))
FROM jobs
WHERE job_id IN
(SELECT job_id FROM job_history
WHERE department_id
BETWEEN 50 AND 100)
GROUP BY job_id);
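OR, in systems that do not allow one aggregate function to be nested inside another (Oracle
accepts MAX(AVG(...)), but MySQL and PostgreSQL, for example, do not), use a derived table:

SELECT job_id,AVG(salary)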
FROM employees
GROUP BY job_id
HAVING AVG(salary)<
(SELECT MAX(myavg) from (select job_id,AVG(min_salary) as myavg
FROM jobs
WHERE job_id IN
(SELECT job_id FROM job_history
WHERE department_id
BETWEEN 50 AND 100)
GROUP BY job_id) ss);

Output

JOB_ID AVG(SALARY)
---------- -----------
IT_PROG 5760
AC_ACCOUNT 8300
ST_MAN 7280
AD_ASST 4400
SH_CLERK 3215
FI_ACCOUNT 7920
PU_CLERK 2780
SA_REP 8350
MK_REP 6000
ST_CLERK 2785
HR_REP 6500

Explanation:
This example contains three queries: a nested subquery, a subquery, and the outer query. These
parts of the query are run in that order.

Let's break the example down into three parts and observe the results returned.

First, the nested subquery:

SQL Code:

SELECT job_id FROM job_history
WHERE department_id
BETWEEN 50 AND 100;

This nested subquery retrieves the job_id values from the job_history table for the departments
with department_id between 50 and 100.

Output:

JOB_ID
----------
ST_CLERK
ST_CLERK
IT_PROG
SA_REP
SA_MAN
AD_ASST
AC_ACCOUNT



Next, the subquery that receives the output of the nested subquery above:

SELECT MAX(AVG(min_salary))
FROM jobs WHERE job_id
IN(.....output from the nested subquery......)
GROUP BY job_id

The subquery internally works as follows:

SQL Code:

SELECT MAX(AVG(min_salary))
FROM jobs
WHERE job_id
IN (
'ST_CLERK','ST_CLERK','IT_PROG',
'SA_REP','SA_MAN','AD_ASST',
'AC_ACCOUNT')
GROUP BY job_id;

For each job_id returned by the nested subquery ('ST_CLERK', 'IT_PROG', 'SA_REP', 'SA_MAN',
'AD_ASST' and 'AC_ACCOUNT'; the repeated 'ST_CLERK' makes no difference to IN), the subquery
computes the average min_salary. It then returns the maximum of those averages.
Output:

MAX(AVG(MIN_SALARY))
--------------------
10000

Finally, the outer query receives the output of the subquery, which in turn depends on the output
of the nested subquery:

SELECT job_id,AVG(salary)
FROM employees
GROUP BY job_id
HAVING AVG(salary)<
(.....output from the subquery(
output from the nested subquery)......)

The outer query internally works as follows:

SQL Code:
SELECT job_id,AVG(salary)
FROM employees
GROUP BY job_id
HAVING AVG(salary)<10000;

The outer query returns each job_id with the average salary of its employees. It keeps only the
job_ids whose average salary is less than the maximum average min_salary (10000) returned by the
subquery.

Output:

JOB_ID AVG(SALARY)
---------- -----------
IT_PROG 5760
AC_ACCOUNT 8300
ST_MAN 7280
AD_ASST 4400
SH_CLERK 3215
FI_ACCOUNT 7920
PU_CLERK 2780
SA_REP 8350
MK_REP 6000
ST_CLERK 2785
HR_REP 6500
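The derived-table form of Example 1 avoids nested aggregates, so it can be run in SQLite. The sketch below uses Python's sqlite3 module and tiny hypothetical versions of the jobs, job_history and employees tables, keeping only the columns the query needs. With this data the threshold is 10000, because AD_PRES appears only in department 20 and is therefore excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (job_id TEXT, min_salary INTEGER);
    CREATE TABLE job_history (job_id TEXT, department_id INTEGER);
    CREATE TABLE employees (job_id TEXT, salary INTEGER);
    INSERT INTO jobs VALUES ('IT_PROG', 4000), ('SA_MAN', 10000),
                            ('ST_CLERK', 2000), ('AD_PRES', 20000);
    INSERT INTO job_history VALUES ('IT_PROG', 60), ('SA_MAN', 80),
                                   ('ST_CLERK', 50), ('AD_PRES', 20);
    INSERT INTO employees VALUES ('IT_PROG', 9000), ('IT_PROG', 6000),
                                 ('SA_MAN', 14000), ('ST_CLERK', 2600), ('AD_PRES', 24000);
""")

# The inner GROUP BY yields one average min_salary per job_id;
# MAX() is then taken over those averages in the derived table ss.
rows = conn.execute("""
    SELECT job_id, AVG(salary)
    FROM employees
    GROUP BY job_id
    HAVING AVG(salary) < (SELECT MAX(myavg)
                          FROM (SELECT job_id, AVG(min_salary) AS myavg
                                FROM jobs
                                WHERE job_id IN (SELECT job_id FROM job_history
                                                 WHERE department_id BETWEEN 50 AND 100)
                                GROUP BY job_id) ss)
    ORDER BY job_id
""").fetchall()
print(rows)
```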

Correlated Subquery
A correlated subquery depends on the row currently being processed by the outer query. It is
therefore evaluated once for each candidate row of the outer query, instead of once before it,
which is the opposite of the approach taken by normal subqueries. Conceptually, a correlated
subquery is executed as follows:

- The outer query receives a row.
- For each candidate row of the outer query, the subquery (the correlated subquery) is executed
  once.
- The result of the correlated subquery is used to determine whether the candidate row should
  be part of the result set.
- The process is repeated for all rows.

Correlated subqueries differ from normal subqueries in that the nested SELECT statement
refers back to a table in the outer SELECT statement.

To find the names of all the students who appeared in more than three papers of their chosen
course, the SQL will be:

SELECT name
FROM student a
WHERE 3 < (SELECT COUNT(*)
           FROM result b
           WHERE b.rollno = a.rollno);
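Here is a minimal SQLite sketch of the correlated query, using Python's sqlite3 module. The data is hypothetical: Asha sat four papers, Ben three and Chen five. For each student row a, the inner COUNT(*) is evaluated with a.rollno fixed, so only students with more than three result rows are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER, name TEXT)")
conn.execute("CREATE TABLE result (rollno INTEGER, paper TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ben"), (3, "Chen")])

# Number of papers each roll number sat.
papers = {1: 4, 2: 3, 3: 5}
conn.executemany("INSERT INTO result VALUES (?, ?)",
                 [(r, f"P{i}") for r, n in papers.items() for i in range(1, n + 1)])

# The inner query mentions a.rollno, so it is re-evaluated for every outer row.
names = [row[0] for row in conn.execute("""
    SELECT a.name
    FROM student a
    WHERE 3 < (SELECT COUNT(*) FROM result b WHERE b.rollno = a.rollno)
    ORDER BY a.name
""")]
print(names)
```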

In other words, a correlated subquery is one whose value depends on a variable that receives
its value in an outer query. A non-correlated subquery, as said before, is evaluated bottom-up:
the innermost query is evaluated first. A correlated subquery, by contrast, is resolved top-down:
the outer query supplies a row, and the subquery is then evaluated for that row. Such a subquery
therefore has to be evaluated repeatedly, once for each value of the variable in question,
instead of once and for all. This describes the logical behaviour; query optimizers often rewrite
correlated subqueries into joins.

SQL correlated subqueries are used to select data from a table that is referenced in the outer
query. The subquery is called correlated because it is related to the outer query. In this type
of query, a table alias (also called a correlation name) specifies which table reference a column
belongs to. An alias is required when the inner and outer queries use the same table.

An alias is an alternative, usually shorter, name for a table, written directly after the table
name in the FROM clause. Aliases are useful whenever a query obtains information from two separate
tables, or from the same table twice.

Example: SQL Correlated Subqueries

The following query retrieves ord_num, ord_amount, cust_code and agent_code from the orders
table ('a' and 'b' are the aliases of the orders and agents tables). It keeps only the orders
whose agent_code matches the agent_code of the agent named Alex in the agents table. Note that
in this particular query the inner SELECT does not refer to the outer table a, so strictly
speaking it is a non-correlated subquery: it is evaluated once and returns a single value. The
following SQL statement can be used:

Sample table: orders


ORD_NUM ORD_AMOUNT ADVANCE_AMOUNT ORD_DATE CUST_CODE AGENT_CODE ORD_DESCRIPTION
---------- ---------- -------------- --------- --------------- --------------- -------
200114 3500 2000 15-AUG-08 C00002 A008
200122 2500 400 16-SEP-08 C00003 A004
200118 500 100 20-JUL-08 C00023 A006
200119 4000 700 16-SEP-08 C00007 A010
200121 1500 600 23-SEP-08 C00008 A004
200130 2500 400 30-JUL-08 C00025 A011
200134 4200 1800 25-SEP-08 C00004 A005
200108 4000 600 15-FEB-08 C00008 A004
200103 1500 700 15-MAY-08 C00021 A005
200105 2500 500 18-JUL-08 C00025 A011
200109 3500 800 30-JUL-08 C00011 A010
200101 3000 1000 15-JUL-08 C00001 A008
200111 1000 300 10-JUL-08 C00020 A008
200104 1500 500 13-MAR-08 C00006 A004
200106 2500 700 20-APR-08 C00005 A002
200125 2000 600 10-OCT-08 C00018 A005
200117 800 200 20-OCT-08 C00014 A001
200123 500 100 16-SEP-08 C00022 A002
200120 500 100 20-JUL-08 C00009 A002
200116 500 100 13-JUL-08 C00010 A009
200124 500 100 20-JUN-08 C00017 A007
200126 500 100 24-JUN-08 C00022 A002
200129 2500 500 20-JUL-08 C00024 A006
200127 2500 400 20-JUL-08 C00015 A003
200128 3500 1500 20-JUL-08 C00009 A002
200135 2000 800 16-SEP-08 C00007 A010
200131 900 150 26-AUG-08 C00012 A012
200133 1200 400 29-JUN-08 C00009 A002
200100 1000 600 08-JAN-08 C00015 A003
200110 3000 500 15-APR-08 C00019 A010
200107 4500 900 30-AUG-08 C00007 A010
200112 2000 400 30-MAY-08 C00016 A007
200113 4000 600 10-JUN-08 C00022 A002
200102 2000 300 25-MAY-08 C00012 A012

Sample table: agents


+------------+------------+--------------+------------+--------------+
| AGENT_CODE | AGENT_NAME | WORKING_AREA | COMMISSION | PHONE_NO     |
+------------+------------+--------------+------------+--------------+
| A007       | Ramasundar | Bangalore    |       0.15 | 077-25814763 |
| A003       | Alex       | London       |       0.13 | 075-12458969 |
| A008       | Alford     | New York     |       0.12 | 044-25874365 |
| A011       | Ravi Kumar | Bangalore    |       0.15 | 077-45625874 |
| A010       | Santakumar | Chennai      |       0.14 | 007-22388644 |
| A012       | Lucida     | San Jose     |       0.12 | 044-52981425 |
| A005       | Anderson   | Brisban      |       0.13 | 045-21447739 |
| A001       | Subbarao   | Bangalore    |       0.14 | 077-12346674 |
| A002       | Mukesh     | Mumbai       |       0.11 | 029-12358964 |
| A006       | McDen      | London       |       0.15 | 078-22255588 |
| A004       | Ivan       | Torento      |       0.15 | 008-22544166 |
| A009       | Benjamin   | Hampshair    |       0.11 | 008-22536178 |
+------------+------------+--------------+------------+--------------+

SQL Code:

SELECT a.ord_num,a.ord_amount,a.cust_code,a.agent_code
FROM orders a
WHERE a.agent_code=(
SELECT b.agent_code
FROM agents b WHERE b.agent_name='Alex');

Output:

ORD_NUM     ORD_AMOUNT  CUST_CODE   AGENT_CODE
----------  ----------  ----------  ----------
200127      2500        C00015      A003
200100      1000        C00015      A003

The inner query of the above statement returns the agent_code A003.

The simplified form of above code is:

SQL Code:

SELECT a.ord_num,a.ord_amount,a.cust_code,a.agent_code
FROM orders a
WHERE a.agent_code='A003';
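The orders/agents query can be checked on a few rows copied from the sample tables, loaded into SQLite through Python's sqlite3 module. The subquery in the text does not actually refer to the outer table. The sketch therefore also runs a genuinely correlated version, written here with EXISTS as an illustrative alternative, and confirms that both return the same two orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (ord_num INTEGER, ord_amount INTEGER, cust_code TEXT, agent_code TEXT);
    CREATE TABLE agents (agent_code TEXT, agent_name TEXT);
    INSERT INTO orders VALUES (200114, 3500, 'C00002', 'A008'), (200122, 2500, 'C00003', 'A004'),
                              (200127, 2500, 'C00015', 'A003'), (200100, 1000, 'C00015', 'A003');
    INSERT INTO agents VALUES ('A003', 'Alex'), ('A008', 'Alford'), ('A004', 'Ivan');
""")

# Form used in the text: the inner query runs once and yields 'A003'.
plain = conn.execute("""
    SELECT a.ord_num, a.ord_amount, a.cust_code, a.agent_code
    FROM orders a
    WHERE a.agent_code = (SELECT b.agent_code FROM agents b WHERE b.agent_name = 'Alex')
    ORDER BY a.ord_num
""").fetchall()

# Correlated form: the inner query refers to a.agent_code, so it depends on each outer row.
correlated = conn.execute("""
    SELECT a.ord_num, a.ord_amount, a.cust_code, a.agent_code
    FROM orders a
    WHERE EXISTS (SELECT 1 FROM agents b
                  WHERE b.agent_code = a.agent_code AND b.agent_name = 'Alex')
    ORDER BY a.ord_num
""").fetchall()
print(plain == correlated, plain)
```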


ACTIVE DATABASES AND TRIGGERS


The model that has been used to specify active database rules is referred to as the Event-
Condition-Action (ECA) model. A rule in the ECA model has three components:
1. The event(s) that trigger the rule: These events are usually database update operations that are
explicitly applied to the database. However, in the general model, they could also be temporal
events or other kinds of external events.

2. The condition that determines whether the rule action should be executed: Once the triggering
event has occurred, an optional condition may be evaluated. If no condition is specified, the action
will be executed once the event occurs. If a condition is specified, it is first evaluated, and only if it
evaluates to true will the rule action be executed.

3. The action to be taken: The action is usually a sequence of SQL statements, but it could also be
a database transaction or an external program that will be automatically executed.

Trigger

A SQL trigger is a set of SQL statements stored in the database catalog. A SQL trigger is executed,
or fired, whenever an event associated with a table occurs, e.g., an insert, update, or delete.

A SQL trigger is a special type of stored procedure. It is special because it is not called directly
the way a stored procedure is: a trigger is called automatically when a data modification event is
made against a table, whereas a stored procedure must be called explicitly.

Advantages of using SQL triggers

• SQL triggers provide an alternative way to check the integrity of data.
• SQL triggers can catch errors in business logic in the database layer.
• SQL triggers provide an alternative to scheduled tasks: instead of waiting for a scheduled job to
run, the work is done automatically before or after a change is made to the data in the tables.
• SQL triggers are very useful for auditing changes to the data in tables.

Disadvantages of using SQL triggers

• SQL triggers can only provide extended validation; they cannot replace all validations. Some simple
validations have to be done in the application layer. For example, you can validate user input on the
client side with JavaScript, or on the server side with server-side scripting languages such as JSP,
PHP, ASP.NET, Perl, etc.
• SQL triggers are invoked and executed invisibly to client applications, so it is difficult to figure
out what happens in the database layer.
• SQL triggers may increase the overhead of the database server.

Triggers are stored programs that are automatically executed, or fired, when certain events occur.
In Oracle, triggers can be written to execute in response to any of the following events:

• A database manipulation (DML) statement (DELETE, INSERT, or UPDATE)
• A database definition (DDL) statement (CREATE, ALTER, or DROP)
• A database operation (SERVERERROR, LOGON, LOGOFF, STARTUP, or SHUTDOWN)

Triggers can be defined on the table, view, schema, or database with which the event is associated.

There are two types of triggers:

• Row-level trigger: a row-level trigger is executed once for each row that is inserted, updated, or
deleted. A row-level trigger must be requested explicitly, with FOR EACH ROW, when the trigger is
created; an optional WHEN (condition) clause restricts the rows for which it fires.
• Statement-level trigger: this trigger is executed only once per DML statement, whether that
statement inserts, deletes, or updates one row, many rows, or the whole table. In Oracle, if the
type of trigger is not specified when it is created, it is a statement-level trigger by default.
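The difference in how often each kind of trigger fires can be seen in a small simulation. In this hypothetical Java sketch, TriggerSim.update stands in for one UPDATE statement on a table of integers, and the two counters stand in for a row-level and a statement-level trigger on that table. The statement-level counter is incremented even when no row is affected, which is how Oracle statement-level triggers behave.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class TriggerSim {
    int rowTriggerFirings = 0;       // a row-level trigger fires once per affected row
    int statementTriggerFirings = 0; // a statement-level trigger fires once per statement

    // Simulates "UPDATE t SET v = v + delta WHERE <condition>" on a list of values
    // and returns the updated table.
    List<Integer> update(List<Integer> table, IntPredicate where, int delta) {
        List<Integer> result = new ArrayList<>();
        for (int v : table) {
            if (where.test(v)) {
                result.add(v + delta);
                rowTriggerFirings++;   // FOR EACH ROW: once for every affected row
            } else {
                result.add(v);
            }
        }
        statementTriggerFirings++;     // once per statement, even if no row matched
        return result;
    }
}
```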

Benefits of Triggers

Triggers can be written for the following purposes:

• Generating some derived column values automatically
• Enforcing referential integrity
• Event logging and storing information on table access
• Auditing
• Synchronous replication of tables
• Imposing security authorizations
• Preventing invalid transactions

Creating Triggers
The syntax for creating a trigger is:

CREATE [OR REPLACE] TRIGGER trigger_name
{BEFORE | AFTER | INSTEAD OF}
{INSERT [OR] | UPDATE [OR] | DELETE}
[OF col_name]
ON table_name
[REFERENCING OLD AS o NEW AS n]
[FOR EACH ROW]
[WHEN (condition)]
[DECLARE
Declaration-statements]
BEGIN
Executable-statements
[EXCEPTION
Exception-handling-statements]
END;

Where,

• CREATE [OR REPLACE] TRIGGER trigger_name: creates a new trigger, or replaces an existing trigger,
with the name trigger_name.
• {BEFORE | AFTER | INSTEAD OF}: specifies when the trigger is executed. The INSTEAD OF clause is
used for creating a trigger on a view.
• {INSERT [OR] | UPDATE [OR] | DELETE}: specifies the DML operation.
• [OF col_name]: specifies the column whose update fires the trigger (used with UPDATE).
• ON table_name: specifies the name of the table associated with the trigger.
• [REFERENCING OLD AS o NEW AS n]: allows you to refer to the old and new values of the row for
DML statements such as INSERT, UPDATE, and DELETE.
• [FOR EACH ROW]: specifies a row-level trigger, i.e., the trigger is executed for each row being
affected. Otherwise the trigger executes just once when the SQL statement is executed; this is
called a statement-level (or table-level) trigger.
• WHEN (condition): provides a condition on the rows for which the trigger fires. This clause is
valid only for row-level triggers.

Example

Consider a Student Report database in which students' marks are recorded. In this schema, create a
trigger so that the total and the per (percentage) value of the specified marks are computed and
inserted automatically whenever a record is inserted.
Because the values must be computed before the record is stored, a BEFORE trigger is used.
Suppose the database schema is:
mysql> desc Student;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| tid | int(4) | NO | PRI | NULL | auto_increment |
| name | varchar(30) | YES | | NULL | |
| subj1 | int(2) | YES | | NULL | |
| subj2 | int(2) | YES | | NULL | |
| subj3 | int(2) | YES | | NULL | |
| total | int(3) | YES | | NULL | |
| per | int(3) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)

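The SQL trigger for this problem statement is shown below. Inside a trigger, NEW refers to the row
being inserted, so its columns are written as NEW.subj1, NEW.total, and so on:

CREATE TRIGGER stud_marks
BEFORE INSERT
ON Student
FOR EACH ROW
SET NEW.total = NEW.subj1 + NEW.subj2 + NEW.subj3,
    NEW.per = NEW.total * 60 / 100;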
The above statement creates a trigger on the Student table: whenever subject marks are entered, the
trigger computes the two values before the row is stored and inserts them along with the entered
values, e.g.:
mysql> insert into Student values(0, "ABCDE", 20, 20, 20, 0, 0);
Query OK, 1 row affected (0.09 sec)
mysql> select * from Student;
+-----+-------+-------+-------+-------+-------+------+
| tid | name | subj1 | subj2 | subj3 | total | per |
+-----+-------+-------+-------+-------+-------+------+
| 100 | ABCDE | 20 | 20 | 20 | 60 | 36 |
+-----+-------+-------+-------+-------+-------+------+
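The effect of the BEFORE INSERT trigger on each new row can be modeled in Java. This is a hypothetical sketch, not MySQL code: StudentTable.StudentRow mirrors the Student columns, and beforeInsert plays the role of the trigger body, modifying the new row (NEW in MySQL) before insert stores it. The rounding step is an assumption about how MySQL stores the decimal result of "/" in an INT column.

```java
import java.util.ArrayList;
import java.util.List;

class StudentTable {
    // Hypothetical Java mirror of one row of the Student table.
    static class StudentRow {
        int tid;
        String name;
        int subj1, subj2, subj3;
        int total, per;

        StudentRow(int tid, String name, int subj1, int subj2, int subj3) {
            this.tid = tid;
            this.name = name;
            this.subj1 = subj1;
            this.subj2 = subj2;
            this.subj3 = subj3;
        }
    }

    final List<StudentRow> rows = new ArrayList<>();

    // Plays the role of the BEFORE INSERT ... FOR EACH ROW trigger body:
    // it can change the new row before the row is stored.
    private static void beforeInsert(StudentRow newRow) {
        newRow.total = newRow.subj1 + newRow.subj2 + newRow.subj3;
        // MySQL's "/" yields a decimal that is rounded when stored in an INT column;
        // Math.round mimics that for these non-negative values.
        newRow.per = (int) Math.round(newRow.total * 60 / 100.0);
    }

    void insert(StudentRow row) {
        beforeInsert(row); // the trigger runs first...
        rows.add(row);     // ...then the modified row is stored
    }
}
```

Inserting a row with marks 20, 20, 20 yields total 60 and per 36, matching the SELECT output above.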

SQL JOIN

An SQL join fetches data from two or more tables and combines it into a single result set. It
combines columns from the tables by using values common to both tables.
The JOIN keyword is used in SQL queries to join two or more tables. Joining n tables requires at
least n-1 join conditions. A table can also be joined to itself; this is known as a self join.

Types of JOIN

Following are the types of JOIN that we can use in SQL:

• Cross JOIN (Cartesian product)
• Inner JOIN (Equi JOIN)
• Natural JOIN
• Outer JOIN: Left, Right, and Full

Cross JOIN or Cartesian Product

This type of JOIN returns the Cartesian product of the rows of the joined tables: each row of the
first table is combined with each row of the second table.
Cross JOIN Syntax is,
SELECT column-name-list
FROM
table-name1 CROSS JOIN table-name2;

Example of Cross JOIN

Following is the class table,

ID  NAME
1   abhi
2   adam
4   alex

class_info table,

ID  Address
1   DELHI
2   MUMBAI
3   CHENNAI

Cross JOIN query will be,

SELECT * FROM
class CROSS JOIN class_info;

The resultset table will look like,

ID  NAME  ID  Address
1   abhi  1   DELHI
2   adam  1   DELHI
4   alex  1   DELHI
1   abhi  2   MUMBAI
2   adam  2   MUMBAI
4   alex  2   MUMBAI
1   abhi  3   CHENNAI
2   adam  3   CHENNAI
4   alex  3   CHENNAI
As you can see, this join returns the cross product of all the records present in both the tables.
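A cross join is just two nested loops. The Java sketch below uses hypothetical record types for the class and class_info tables (Java 16 or later) and reproduces the nine-row result above.

```java
import java.util.ArrayList;
import java.util.List;

class CrossJoinDemo {
    // Hypothetical in-memory versions of the class and class_info tables.
    record ClassRow(int id, String name) {}
    record ClassInfoRow(int id, String address) {}
    record CrossRow(int classId, String name, int infoId, String address) {}

    // Pairs every row of class with every row of class_info,
    // so the result has left.size() * right.size() rows.
    static List<CrossRow> crossJoin(List<ClassRow> left, List<ClassInfoRow> right) {
        List<CrossRow> out = new ArrayList<>();
        // Looping over class_info on the outside reproduces the row order shown above;
        // SQL itself does not guarantee any order without ORDER BY.
        for (ClassInfoRow r : right) {
            for (ClassRow l : left) {
                out.add(new CrossRow(l.id(), l.name(), r.id(), r.address()));
            }
        }
        return out;
    }
}
```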

INNER Join or EQUI Join

This is a simple JOIN whose result contains only the rows that satisfy the equality condition
specified in the SQL query.
Inner Join Syntax is,
SELECT column-name-list FROM
table-name1 INNER JOIN table-name2
ON table-name1.column-name = table-name2.column-name;
Example of INNER JOIN

Consider the class and class_info tables above, but with alex having ID 3 instead of 4.

Inner JOIN query will be,

SELECT * FROM class INNER JOIN class_info ON class.id = class_info.id;

The resultset table will look like,

ID  NAME  ID  Address
1   abhi  1   DELHI
2   adam  2   MUMBAI
3   alex  3   CHENNAI
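Conceptually, an inner join is a cross join filtered by the join condition. The nested-loop sketch below makes that explicit; the record types are hypothetical (Java 16 or later), and real DBMSs may choose other algorithms such as hash or merge joins. It uses the ID values implied by the result above (alex has ID 3).

```java
import java.util.ArrayList;
import java.util.List;

class InnerJoinDemo {
    record ClassRow(int id, String name) {}
    record ClassInfoRow(int id, String address) {}
    record JoinedRow(int classId, String name, int infoId, String address) {}

    // Nested-loop equi join: keep only the pairs for which class.id = class_info.id.
    static List<JoinedRow> innerJoin(List<ClassRow> left, List<ClassInfoRow> right) {
        List<JoinedRow> out = new ArrayList<>();
        for (ClassRow l : left) {
            for (ClassInfoRow r : right) {
                if (l.id() == r.id()) { // the ON condition
                    out.add(new JoinedRow(l.id(), l.name(), r.id(), r.address()));
                }
            }
        }
        return out;
    }
}
```

Rows without a partner (for example, an alex row with ID 4) simply drop out of the result.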

Natural JOIN
Natural Join is a type of inner join based on the columns that have the same name and the same
datatype in both tables being joined.
The syntax for Natural Join is,
SELECT * FROM
table-name1 NATURAL JOIN table-name2;

Example of Natural JOIN

Consider the same class and class_info tables as in the inner join example,

Natural join query will be,

SELECT * from class NATURAL JOIN class_info;
The resultset table will look like,

ID  NAME  Address
1   abhi  DELHI
2   adam  MUMBAI
3   alex  CHENNAI
In the above example, both tables being joined have an ID column (same name and same datatype), so
the result of the natural join contains the records whose ID values match in both tables, and the
common ID column appears only once.
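What sets a natural join apart is that the join columns are found by name. In this hypothetical Java sketch, rows are ordered maps from column name to value; the shared column names become the join condition, and each shared column is kept only once in the output. If the tables share no column, every pair matches and the natural join degenerates into a cross join, as in SQL.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NaturalJoinDemo {
    // Helper: builds a row as an ordered column-name -> value map, e.g. row("ID", 1, "NAME", "abhi").
    static Map<String, Object> row(Object... namesAndValues) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            m.put((String) namesAndValues[i], namesAndValues[i + 1]);
        }
        return m;
    }

    static List<Map<String, Object>> naturalJoin(List<Map<String, Object>> left,
                                                 List<Map<String, Object>> right) {
        List<Map<String, Object>> out = new ArrayList<>();
        if (left.isEmpty() || right.isEmpty()) {
            return out;
        }
        // The join columns are simply the column names both tables share.
        Set<String> common = new LinkedHashSet<>(left.get(0).keySet());
        common.retainAll(right.get(0).keySet());

        for (Map<String, Object> l : left) {
            for (Map<String, Object> r : right) {
                // Every common column must be equal; null (SQL NULL) never matches.
                boolean match = common.stream()
                        .allMatch(c -> l.get(c) != null && l.get(c).equals(r.get(c)));
                if (match) {
                    Map<String, Object> joined = new LinkedHashMap<>(l); // left columns
                    r.forEach(joined::putIfAbsent);                      // right columns; shared ones only once
                    out.add(joined);
                }
            }
        }
        return out;
    }
}
```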

OUTER JOIN

An outer join is based on both matched and unmatched data: it returns the matching rows together with
the rows that have no match. Outer joins are further subdivided into,

1. Left Outer Join
2. Right Outer Join
3. Full Outer Join

LEFT Outer Join


The left outer join returns a resultset table with the matched rows of the two tables, followed by the
remaining rows of the left table with null in the right table's columns.
Syntax for Left Outer Join is,
SELECT column-name-list FROM
table-name1 LEFT OUTER JOIN table-name2
ON table-name1.column-name = table-name2.column-name;

To specify a condition, we use the ON keyword with Outer Join.

Example of Left Outer Join

Here is the class table,

ID  NAME
1   abhi
2   adam
3   alex
4   anu
5   ashish

class_info table,

ID  Address
1   DELHI
2   MUMBAI
3   CHENNAI
7   NOIDA
8   PANIPAT

Left Outer Join query will be,

SELECT * FROM class LEFT OUTER JOIN class_info ON (class.id = class_info.id);

The resultset table will look like,

ID  NAME    ID    Address
1   abhi    1     DELHI
2   adam    2     MUMBAI
3   alex    3     CHENNAI
4   anu     null  null
5   ashish  null  null
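A left outer join is an inner join plus the left rows that found no partner, padded with nulls. A hypothetical Java sketch (Java 16 or later) using the class and class_info rows above; Java null plays the role of SQL NULL, so the right-side ID is a boxed Integer.

```java
import java.util.ArrayList;
import java.util.List;

class LeftJoinDemo {
    record ClassRow(int id, String name) {}
    record ClassInfoRow(int id, String address) {}
    // Right-side columns are nullable so that a missing match can be represented.
    record JoinedRow(int classId, String name, Integer infoId, String address) {}

    static List<JoinedRow> leftOuterJoin(List<ClassRow> left, List<ClassInfoRow> right) {
        List<JoinedRow> out = new ArrayList<>();
        for (ClassRow l : left) {
            boolean matched = false;
            for (ClassInfoRow r : right) {
                if (l.id() == r.id()) {
                    out.add(new JoinedRow(l.id(), l.name(), r.id(), r.address()));
                    matched = true;
                }
            }
            if (!matched) { // keep the unmatched left row, padded with nulls
                out.add(new JoinedRow(l.id(), l.name(), null, null));
            }
        }
        return out;
    }
}
```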

RIGHT Outer Join


The right outer join returns a resultset table with the matched rows of the two tables being joined,
followed by the remaining rows of the right table with null in the left table's columns.

Syntax for Right Outer Join is,

SELECT column-name-list FROM
table-name1 RIGHT OUTER JOIN table-name2
ON table-name1.column-name = table-name2.column-name;
Example of Right Outer Join

Consider the class and class_info tables above,

Right Outer Join query will be,

SELECT * FROM class RIGHT OUTER JOIN class_info ON (class.id = class_info.id);

The resultant table will look like,

ID    NAME  ID  Address
1     abhi  1   DELHI
2     adam  2   MUMBAI
3     alex  3   CHENNAI
null  null  7   NOIDA
null  null  8   PANIPAT

Full Outer Join


The full outer join returns a resultset table with the matched rows of the two tables, followed by the
remaining rows of the left table and then those of the right table, with null in the columns of the
table that has no matching row.
Syntax of Full Outer Join is,
SELECT column-name-list FROM
table-name1 FULL OUTER JOIN table-name2
ON table-name1.column-name = table-name2.column-name;

Example of Full Outer Join

Consider the class and class_info tables above,

Full Outer Join query will be,

SELECT * FROM class FULL OUTER JOIN class_info ON (class.id = class_info.id);
The resultset table will look like,

ID    NAME    ID    Address
1     abhi    1     DELHI
2     adam    2     MUMBAI
3     alex    3     CHENNAI
4     anu     null  null
5     ashish  null  null
null  null    7     NOIDA
null  null    8     PANIPAT
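A full outer join can be built from the same pieces: a left outer join, followed by the right rows that never found a partner. In this hypothetical Java sketch (Java 16 or later) the matched right rows are tracked by index. The same idea is the usual workaround in MySQL, which does not support FULL OUTER JOIN: a LEFT JOIN combined with a RIGHT JOIN through UNION.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FullJoinDemo {
    record ClassRow(int id, String name) {}
    record ClassInfoRow(int id, String address) {}
    // Both sides are nullable: either side may be missing in a full outer join.
    record JoinedRow(Integer classId, String name, Integer infoId, String address) {}

    static List<JoinedRow> fullOuterJoin(List<ClassRow> left, List<ClassInfoRow> right) {
        List<JoinedRow> out = new ArrayList<>();
        Set<Integer> matchedRight = new HashSet<>(); // indexes of right rows that found a partner
        for (ClassRow l : left) {
            boolean matched = false;
            for (int i = 0; i < right.size(); i++) {
                ClassInfoRow r = right.get(i);
                if (l.id() == r.id()) {
                    out.add(new JoinedRow(l.id(), l.name(), r.id(), r.address()));
                    matchedRight.add(i);
                    matched = true;
                }
            }
            if (!matched) { // left-only row
                out.add(new JoinedRow(l.id(), l.name(), null, null));
            }
        }
        for (int i = 0; i < right.size(); i++) {
            if (!matchedRight.contains(i)) { // right-only row
                ClassInfoRow r = right.get(i);
                out.add(new JoinedRow(null, null, r.id(), r.address()));
            }
        }
        return out;
    }
}
```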

VIEW:
• A view in SQL is a logical subset of data from one or more tables. A view can be used to restrict
data access.
• A view can join information from several tables together; views are also useful for hiding
unwanted information.
• A database view presents a subset of the database arranged and displayed in a particular way.
• A database view displays one or more database records on the same page.
Syntax:
CREATE OR REPLACE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition;

SQL Query to Create View

CREATE or REPLACE view sale_view as select * from Sale where customer = 'Alex';
Only the query is stored, under the name sale_view; the rows themselves are not copied. Each time the
view is used, its query is run against the current contents of the Sale table.

Example of Displaying a View

The syntax for displaying a view is the same as for fetching data from a table with a SELECT statement:
SELECT * from sale_view;
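Because a view stores a query rather than a copy of the rows, it always reflects the current contents of its base table. A minimal Java sketch of that idea: ViewDemo and SaleRow are hypothetical names, the Sale columns (id, customer, amount) are assumed because the table's real columns are not given here, and the view is modeled as a Supplier that re-runs its filter on every call (Java 16 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class ViewDemo {
    record SaleRow(int id, String customer, int amount) {} // hypothetical columns

    final List<SaleRow> sale = new ArrayList<>();          // the base table

    // The "view": a stored query that is re-run on every access,
    // like SELECT * FROM Sale WHERE customer = 'Alex'.
    final Supplier<List<SaleRow>> saleView = () -> sale.stream()
            .filter(r -> r.customer().equals("Alex"))
            .collect(Collectors.toList());
}
```

A row added to the base table is visible through saleView on the next call, with no separate refresh step.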

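Update a View
The UPDATE command for a view is the same as for a table.
Syntax to Update a View is,
UPDATE view_name
SET column_name = value
WHERE condition;
Updating a view automatically updates the data in the base table. Only updatable views can be
changed this way: typically, views on a single table without aggregate functions, GROUP BY, or
DISTINCT.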

Types of View
There are two types of view,

• Simple View
• Complex View

Simple View                      | Complex View
Created from one table           | Created from one or more tables
Does not contain functions       | Contains functions
Does not contain groups of data  | Contains groups of data

Dropping Views
You need a way to drop the view if it is no longer needed. The syntax is
DROP VIEW view_name;

Inserting Rows into a View

Rows of data can be inserted into a view. The same rules that apply to the UPDATE command also
apply to the INSERT command.

Deleting Rows from a View

Rows of data can be deleted from a view. The same rules that apply to the UPDATE and INSERT
commands apply to the DELETE command.

Advantages:
1. Provide an additional level of table security by restricting access to a predetermined set of rows
or columns of a table.
2. Hide data complexity: for example, a single view might be defined with a join, which is a
collection of related columns or rows in multiple tables. The view hides the fact that this
information actually originates from several tables.
3. Simplify statements for the user: views allow users to select information from multiple tables
without actually knowing how to perform a join.
4. Present data from a different perspective: columns of views can be renamed without affecting the
tables on which the views are based.
Disadvantages:
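1. Rows returned through a view are not guaranteed to be in any particular order unless the query on
the view uses ORDER BY.
2. DML operations are restricted on views: complex views (for example, those with aggregate
functions, GROUP BY, or DISTINCT) cannot be updated.
3. A view depends on its base tables: when a base table is dropped, the view becomes invalid.
4. Views can affect performance: querying a complex view may take more time than querying the tables
directly, because the view's query is executed every time the view is used.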

Embedded SQL (Static)


• Embedded SQL is a method of combining the computing power of a programming language and the
database manipulation capabilities of SQL.
• Embedded SQL statements are SQL statements written inline with the program source code of the host
language.
• The embedded SQL statements are parsed by an embedded SQL preprocessor and replaced by
host-language calls to a code library.
• The output from the preprocessor is then compiled by the host compiler. This allows programmers to
embed SQL statements in programs written in any number of languages such as C/C++, COBOL, and
Fortran.
• When SQL is embedded in a C program, compilation takes place in two steps. First, the precompiler
extracts all the SQL code from the program and checks it for syntax, correctness, execution path,
etc.
• Once precompilation is done, the SQL statements have been replaced by library calls in the C code.
The C compiler then compiles this code into an executable. Thus compilation takes place in two
steps: one for SQL and one for the application language.
• This kind of compilation requires the structure of every query to be known at compile time, so that
executable code can be generated; only the values of host variables are supplied at run time. Hence
the SQL code is static, and this kind of embedded SQL is also known as static SQL.
• All embedded SQL statements begin with EXEC SQL and end with a semicolon (;). They can be placed
anywhere in the C code, provided they appear in the correct order: declaration, execution, and end.
Connection to DB
First, a connection to the database being accessed must be established. This is done with the
CONNECT statement (its exact form varies between precompilers):
EXEC SQL CONNECT db_name;
Declaration Section
Once the connection is established, queries can be written and executed. The results of a query are
returned to the host language and captured in host-language variables, so we need to declare the
variables that pass values to the query and receive values from it. Two types of variables are used
in the host language.

• Host variable: these are variables of the host language used to pass values to the query as well
as to capture the values returned by the query. They are declared between BEGIN DECLARE SECTION and
END DECLARE SECTION; each of these statements is again enclosed within EXEC SQL and ';'.

• Indicator variable: these variables are also host variables, but they are always of the 2-byte
short type. They are used to capture the NULL values that a query returns, or to INSERT/UPDATE NULL
values into the tables. When used in a SELECT query, an indicator captures any NULL value returned for
its column (it is set to a negative value). When used with INSERT or UPDATE, a negative indicator sets
the column value to NULL, even though the host variable has a value. To capture NULL values for each
host variable in the code, an indicator variable must be declared for each of the host variables.

Execution Section
This is the execution section; it contains all the SQL queries and statements, each prefixed by EXEC
SQL.

EXEC SQL SELECT STD_NAME INTO :STD_NAME FROM STUDENT WHERE STD_ID = :STD_ID;

In this embedded SQL, the queries depend on the values of host variables, but the queries themselves
are static. In the SELECT example above, the query always fetches the details of the student whose ID
is entered. If the user enters a student name instead of a student ID, the query cannot be modified to
fetch the details by name; likewise, it cannot be changed to search by both name and address. Because
the queries cannot be modified based on user input, this kind of SQL is known as static SQL.

Error Handling
The error-handling method depends on the host language. Here we use C and the labeling method: when
an error occurs, the current sequence of execution stops and the program jumps to the error-handling
section of the code. To handle errors, C programs use a separate error-handling structure that holds
different variables to capture different kinds of errors. This structure is known as the SQL
Communication Area, or SQLCA.

EXEC SQL WHENEVER condition action;

The condition in the WHENEVER clause can be:

• SQLWARNING: an SQL warning occurred; the action is performed whenever an SQL warning is raised.
• SQLERROR: an SQL error occurred; SQLCODE has a negative value.
• NOT FOUND: SQLCODE has a positive value (100 in the SQL standard), indicating that no rows were
fetched.
On receiving an error or warning, the action can be any one of the following:
• CONTINUE: continue with the normal execution of the code.
• DO: call a function, so the program moves on to execute this error-handling function.
• GOTO <label>: jump to the location <label> to execute the error handling.
• STOP: immediately stop the execution of the program; all incomplete transactions are rolled back.
EXEC SQL WHENEVER SQLWARNING DO display_warning();

EXEC SQL WHENEVER SQLERROR STOP;

EXEC SQL WHENEVER NOT FOUND GOTO lbl_no_records;

Embedded SQL Program

#include <stdio.h>
#include <stdlib.h>

EXEC SQL INCLUDE SQLCA;    /* declares the SQL Communication Area, sqlca */

int main(void) {
    EXEC SQL BEGIN DECLARE SECTION;
    int   SID;             /* host variable to store the value returned by the query */
    char  STD_NAME[31];    /* host variable to pass the value to the query */
    short ind_sid;         /* indicator variable for SID */
    EXEC SQL END DECLARE SECTION;

    /* Error handling */
    EXEC SQL WHENEVER NOT FOUND GOTO error_msg1;
    EXEC SQL WHENEVER SQLERROR GOTO error_msg2;

    EXEC SQL CONNECT db_name;    /* exact syntax depends on the precompiler */

    printf("Enter the Student name: ");
    if (scanf("%30s", STD_NAME) != 1)
        exit(1);

    /* Executes the query */
    EXEC SQL SELECT STD_ID INTO :SID INDICATOR :ind_sid
             FROM STUDENT
             WHERE STD_NAME = :STD_NAME;

    if (ind_sid < 0)
        printf("STUDENT ID is NULL\n");     /* negative indicator: the column was NULL */
    else
        printf("STUDENT ID: %d\n", SID);    /* prints the result from DB */
    exit(0);

    /* Error handling labels */
error_msg1:
    printf("Student %s is not found\n", STD_NAME);
    printf("ERROR: %ld\n", sqlca.sqlcode);
    printf("ERROR State: %.5s\n", sqlca.sqlstate);
    exit(1);

error_msg2:
    printf("Error has occurred!\n");
    printf("ERROR: %ld\n", sqlca.sqlcode);
    printf("ERROR State: %.5s\n", sqlca.sqlstate);
    exit(1);
}

Dynamic SQL

• Dynamic SQL is a programming technique that enables you to build SQL statements dynamically at run
time. You can create more general-purpose, flexible applications by using dynamic SQL, because the
full text of an SQL statement may be unknown at compilation.
• Dynamic SQL programs can handle changes in data definitions without the need to recompile.
• If queries must be built at run time, dynamic SQL is used: if the query changes according to user
input, it is better to use dynamic SQL.
• For example, the query needed when the user enters only a student name differs from the one needed
when the user enters both a student name and an address. Static embedded SQL cannot implement this
requirement. Dynamic SQL lets the program build the query from the values the user enters, without
the user having to know which query is being executed.
• Dynamic SQL can also be used when we do not know which SQL statement (INSERT, DELETE, UPDATE, or
SELECT) needs to be used, when the number or the data types of the host variables are unknown, or
when direct references to DB objects such as tables, views, and indexes are required.
• In dynamic SQL, queries are created, compiled, and executed only at run time. This makes dynamic SQL
somewhat more complex and time-consuming.

PREPARE
Because dynamic SQL builds a query at run time, the first step is to capture all the inputs from the
user in a string variable. Depending on the inputs received, the string is extended with those inputs
and SQL keywords. This SQL-like string is then converted into an SQL statement by the PREPARE
statement.

For example, here sql_stmt is a character array that holds the user's inputs along with SQL keywords.
It cannot yet be executed, because it is still only a string; the PREPARE statement on the last line
converts it into a proper SQL statement named sql_query. Note that sql_query is a statement
identifier, not a string variable.
strcpy(sql_stmt, "SELECT STD_ID FROM STUDENT");
if (strcmp(STD_NAME, "") != 0) {
    strcat(sql_stmt, " WHERE STD_NAME = :STD_NAME");
} else if (CLASS_ID > 0) {
    strcat(sql_stmt, " WHERE CLASS_ID = :CLASS_ID");
}
EXEC SQL PREPARE sql_query FROM :sql_stmt;
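The same query-building idea carries over to Java, where the finished text would be passed to Connection.prepareStatement (the JDBC counterpart of PREPARE) and the values bound with PreparedStatement.setObject. In this hypothetical sketch, DynamicQuery collects the SQL text and its parameter values together; it uses JDBC's ? placeholders instead of the named placeholders of the C fragment, and joins the conditions with AND when both inputs are given. Binding values as parameters, rather than pasting user input into the string, also protects against SQL injection.

```java
import java.util.ArrayList;
import java.util.List;

class DynamicQuery {
    final String sql;          // text to pass to Connection.prepareStatement
    final List<Object> params; // values to bind, in placeholder order

    private DynamicQuery(String sql, List<Object> params) {
        this.sql = sql;
        this.params = params;
    }

    // Builds the WHERE clause from whichever inputs the user supplied.
    static DynamicQuery forStudent(String stdName, int classId) {
        StringBuilder sb = new StringBuilder("SELECT STD_ID FROM STUDENT");
        List<String> conditions = new ArrayList<>();
        List<Object> params = new ArrayList<>();
        if (stdName != null && !stdName.isEmpty()) {
            conditions.add("STD_NAME = ?");
            params.add(stdName);
        }
        if (classId > 0) {
            conditions.add("CLASS_ID = ?");
            params.add(classId);
        }
        if (!conditions.isEmpty()) {
            sb.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return new DynamicQuery(sb.toString(), params);
    }
}
```

With a JDBC connection, each value would then be bound with ps.setObject(i + 1, q.params.get(i)) before executing the prepared statement.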

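EXECUTE
This statement executes an SQL statement that has been prepared with PREPARE. For a query that returns
a single row, an INTO clause receives the result and a USING clause supplies the placeholder values.
EXEC SQL EXECUTE sql_query;

EXECUTE IMMEDIATE
This statement prepares and executes an SQL statement in one step: it performs the tasks of PREPARE
and EXECUTE in a single line. It can be used only for statements that return no rows and contain no
placeholders.
EXEC SQL EXECUTE IMMEDIATE :sql_stmt;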

Dynamic SQL program

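#include <stdio.h>
#include <stdlib.h>
#include <string.h>

EXEC SQL INCLUDE SQLCA;    /* declares the SQL Communication Area, sqlca */

int main(void) {
    EXEC SQL BEGIN DECLARE SECTION;
    int  STD_ID;
    char STD_NAME[31];
    int  CLASS_ID;
    char sql_stmt[256];    /* the query text, built at run time */
    EXEC SQL END DECLARE SECTION;

    EXEC SQL WHENEVER NOT FOUND GOTO error_msg1;
    EXEC SQL WHENEVER SQLERROR GOTO error_msg2;

    EXEC SQL CONNECT db_name;    /* exact syntax depends on the precompiler */

    printf("Enter the Student name (leave empty to skip): ");
    if (fgets(STD_NAME, sizeof STD_NAME, stdin) == NULL)
        exit(1);
    STD_NAME[strcspn(STD_NAME, "\n")] = '\0';    /* drop the trailing newline */
    printf("Enter the Class ID (0 to skip): ");
    if (scanf("%d", &CLASS_ID) != 1)
        exit(1);

    int has_name  = strcmp(STD_NAME, "") != 0;
    int has_class = CLASS_ID > 0;

    /* Build the query text from the inputs that were supplied */
    strcpy(sql_stmt, "SELECT STD_ID FROM STUDENT");
    if (has_name)
        strcat(sql_stmt, " WHERE STD_NAME = :STD_NAME");
    else if (has_class)
        strcat(sql_stmt, " WHERE CLASS_ID = :CLASS_ID");
    if (has_name && has_class)
        strcat(sql_stmt, " AND CLASS_ID = :CLASS_ID");

    /* Convert the string into an executable statement named sql_query */
    EXEC SQL PREPARE sql_query FROM :sql_stmt;

    /* Execute it: INTO receives the result and USING supplies the placeholder
       values, in order. Support for EXECUTE ... INTO and the placeholder syntax
       vary between precompilers; some require a cursor for queries. */
    if (has_name && has_class) {
        EXEC SQL EXECUTE sql_query INTO :STD_ID USING :STD_NAME, :CLASS_ID;
    } else if (has_name) {
        EXEC SQL EXECUTE sql_query INTO :STD_ID USING :STD_NAME;
    } else if (has_class) {
        EXEC SQL EXECUTE sql_query INTO :STD_ID USING :CLASS_ID;
    } else {
        EXEC SQL EXECUTE sql_query INTO :STD_ID;
    }

    printf("STUDENT ID: %d\n", STD_ID);
    exit(0);

error_msg1:
    printf("No matching student was found\n");
    printf("ERROR: %ld\n", sqlca.sqlcode);
    printf("ERROR State: %.5s\n", sqlca.sqlstate);
    exit(1);

error_msg2:
    printf("Error has occurred!\n");
    printf("ERROR: %ld\n", sqlca.sqlcode);
    printf("ERROR State: %.5s\n", sqlca.sqlstate);
    exit(1);
}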
Difference between Embedded & Dynamic SQL

JDBC (Java Database Connectivity)

The JDBC API provides a standard interface for interacting with relational database management systems
(RDBMSs). The JDBC API consists of the following main components:

1. JDBC Driver
2. Connection
3. Statement
4. ResultSet
JDBC Driver

• A JDBC driver is a set of Java classes that implement the JDBC interfaces for interacting with a
specific database. Almost all database vendors, such as MySQL, Oracle, and Microsoft SQL Server,
provide JDBC drivers. For example, MySQL provides a JDBC driver called MySQL Connector/J that allows
you to work with a MySQL database through the standard JDBC API.
• MySQL Connector/J is written in pure Java. It translates JDBC calls into MySQL-specific calls and
sends them directly to the database. To use a JDBC driver, you need to include the driver's JAR file
with your application.

Connection

The first and most important component of JDBC is the Connection object. In a Java application,
you first load a JDBC driver and then establish a connection to the database. Through the
Connection object, you can interact with the database e.g., creating a Statement to execute SQL
queries against tables. You can open more than one connection to a database at a time.

Statement

To execute an SQL query, e.g., SELECT, INSERT, UPDATE, DELETE, etc., you use a Statement object. You
create the Statement object through the Connection object. JDBC provides several types of statements
for different purposes, such as PreparedStatement and CallableStatement.

ResultSet

After querying data from the database, you get a ResultSet object. The ResultSet object provides a
set of methods that allow you to traverse the result of the query.

The typical flow of using JDBC is as follows:

1. First, load the JDBC driver and create a connection to the database.
2. Then, create a Statement and execute the query to get a ResultSet.
3. Next, traverse and process the ResultSet.
4. Close the ResultSet, Statement, and Connection.
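The four steps can be written compactly with try-with-resources, which closes the ResultSet, Statement, and Connection automatically, in reverse order, even if an exception is thrown. This is a sketch under stated assumptions: JdbcFlow and agentNames are hypothetical names, the agents table and its agent_name column are assumed to exist, and a JDBC driver such as MySQL Connector/J must be on the classpath (JDBC 4.0 and later drivers register themselves, so no explicit loading step is shown).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class JdbcFlow {
    // Hypothetical query: assumes a table "agents" with a column "agent_name".
    static List<String> agentNames(String url, String user, String password) throws SQLException {
        List<String> names = new ArrayList<>();
        // 1. Create a connection (DriverManager finds a driver that accepts the URL).
        try (Connection conn = DriverManager.getConnection(url, user, password);
             // 2. Create a Statement and execute the query to get a ResultSet.
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT agent_name FROM agents")) {
            // 3. Traverse and process the ResultSet.
            while (rs.next()) {
                names.add(rs.getString("agent_name"));
            }
        } // 4. rs, stmt and conn are closed here automatically, in reverse order.
        return names;
    }
}
```

For example, JdbcFlow.agentNames("jdbc:mysql://localhost:3306/mysqljdbc", "root", "secret") would return the agent names from that database, or throw SQLException if the connection fails.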

Connecting to MySQL database

First, you need to import three classes, SQLException, DriverManager, and Connection, from the
java.sql package.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
Second, you call the getConnection() method of the DriverManager class to get the Connection
object. There are three parameters you need to pass to the getConnection() method:
1. url: the database URL in the form jdbc:subprotocol:subname. For MySQL, you use
jdbc:mysql://localhost:3306/mysqljdbc, i.e., you are connecting to MySQL on server localhost, port
3306, database mysqljdbc.
2. user: the database user that will be used to connect to MySQL.
3. password: the password of the database user.

JDBC Program:

Connection conn = null;
try {
    // db parameters
    String url = "jdbc:mysql://localhost:3306/mysqljdbc";
    String user = "root";
    String password = "secret";
    // create a connection to the database
    conn = DriverManager.getConnection(url, user, password);
    // more processing here
    // ...
} catch (SQLException e) {
    System.out.println(e.getMessage());
} finally {
    try {
        if (conn != null)
            conn.close();    // always release the connection
    } catch (SQLException ex) {
        System.out.println(ex.getMessage());
    }
}

When connecting to MySQL, many things can go wrong, e.g., the database server is not available, or
the user name or password is wrong. In such cases, JDBC throws an SQLException. Therefore, when you
create a Connection object, you should always put it inside a try-catch block. You should also always
close the database connection once you have finished interacting with the database, by calling the
close() method of the Connection object.
ODBC

ODBC is Open Database Connectivity. Like JDBC, ODBC is also an API that acts as an interface
between an application on the client side and the database on the server side. Microsoft introduced
ODBC in the year 1992.

ODBC helps an application access the data in a database. An application written in any language can
use ODBC to access different types of databases; hence, ODBC is said to be language and platform
independent. Like JDBC, ODBC also provides ODBC drivers that convert the requests of an application
written in any language into a form the database understands.

ODBC is widely used and can be used from many different programming languages, but its code is
complex and hard to understand.

The four different components of ODBC are:

• Application: processes and calls the ODBC functions and submits the SQL statements;
• Driver manager: loads drivers for each application;
• Driver: handles ODBC function calls, and then submits each SQL request to a data source; and
• Data source: the data being accessed and its associated database management system (DBMS) and
operating system.

Difference between ODBC and JDBC

ODBC | JDBC
ODBC stands for Open Database Connectivity. | JDBC stands for Java Database Connectivity.
Introduced by Microsoft in 1992. | Introduced by Sun Microsystems in 1997.
ODBC can be used with any language, such as C, C++, Java, etc. | JDBC can be used only with Java.
ODBC is used mainly on the Windows platform. | JDBC can be used on any platform.
ODBC drivers are mostly developed in native languages such as C and C++. | JDBC drivers are mostly developed in Java.
ODBC is not recommended for Java applications, because performance suffers from internal conversions and the applications become platform dependent. | JDBC is highly recommended for Java applications, because it has no such performance or platform-dependence problems.
ODBC is procedural. | JDBC is object oriented.
