
CSMI14: Database Management Systems
Dr. R. Bala Krishnan
Asst. Prof.
Dept. of CSE
NIT, Trichy – 620 015
Ph: 999 470 4853 E-mail: [email protected]
Course Content

Books
• Text Books (TB)
- TB1: Abraham Silberschatz, Henry F. Korth, S. Sudarshan, “Database System
Concepts”, Fifth Edition, Tata McGraw Hill, 2006.
- TB2: C. J. Date, A. Kannan, S. Swamynathan, “An Introduction to Database
Systems”, Eighth Edition, Pearson Education, 2006.

• Reference Books (RB)
- RB1: Ramez Elmasri, Shamkant B. Navathe, “Fundamentals of Database
Systems”, Fourth Edition, Pearson/Addison Wesley, 2007.
- RB2: Raghu Ramakrishnan, “Database Management Systems”, Third
Edition, McGraw Hill, 2003.
- RB3: S. K. Singh, “Database Systems Concepts, Design and Applications”,
First Edition, Pearson Education, 2006.
Books & Chapters
Unit   Book   Chapter
1      TB1    1
1_     TB1    6
2      RB2    3, 4
2_     TB1    3
3      RB2    19
4      RB2    16, 17, 18
5      TB1    11, 12

• https://www.databasestar.com/sql-practice/
• http://sqlfiddle.com/#!9/7379d5/1
Unit III

Functional Dependency
• A functional dependency is a many-to-one relationship from one set of
attributes to another within a given relation variable
• X → Y; XY → Z, where X and XY are called Determinant Attributes and Y
and Z are called Dependent Attributes
• X → Y holds when any two tuples of the relation that agree on their “X”
value also agree on their “Y” value
• Eg: Branch No. / Serial No. = {S1, S2, S3, S4}; City = {London, Paris};
Product No. = {P1, P2, P4, P5}; Quantity = {100, 200, 400}

SCP
Serial No.   City     Product No.   Quantity
S1           London   P1            100
S1           London   P2            100
S2           Paris    P1            200
S2           Paris    P2            200
S3           Paris    P2            200
S4           London   P2            400
S4           London   P4            400
S4           London   P5            400

Possible Functional Dependencies:
- Serial No. → City
- Serial No. → Quantity
- {Serial No., Product No.} → Quantity
- {Serial No., Product No.} → City
- {Serial No., Product No.} → {City, Quantity}
- And so on . . .
- Note: Quantity → Serial No. does not hold for the tuples shown, because
Quantity 200 appears with both S2 and S3
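The "agree on X, therefore agree on Y" test can be tried directly on the SCP tuples. The following is a minimal sketch, not part of the original slides: the function name `fd_holds` and the lower-case column names are assumptions made for illustration.

```python
# SCP sample tuples from the slide; column names are illustrative assumptions.
SCP = [
    {"serial": "S1", "city": "London", "product": "P1", "quantity": 100},
    {"serial": "S1", "city": "London", "product": "P2", "quantity": 100},
    {"serial": "S2", "city": "Paris", "product": "P1", "quantity": 200},
    {"serial": "S2", "city": "Paris", "product": "P2", "quantity": 200},
    {"serial": "S3", "city": "Paris", "product": "P2", "quantity": 200},
    {"serial": "S4", "city": "London", "product": "P2", "quantity": 400},
    {"serial": "S4", "city": "London", "product": "P4", "quantity": 400},
    {"serial": "S4", "city": "London", "product": "P5", "quantity": 400},
]


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


print(fd_holds(SCP, ["serial"], ["city"]))
print(fd_holds(SCP, ["serial", "product"], ["city", "quantity"]))
print(fd_holds(SCP, ["quantity"], ["serial"]))
```

A check like this only says whether a dependency holds in one sample of data; a functional dependency proper is a constraint on every legal state of the relation.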
Lossy / Lossless Decomposition
• Projections of SCP:

Serial No.   City
S1           London
S2           Paris
S3           Paris
S4           London

Serial No.   Quantity
S1           100
S2           200
S3           200
S4           400

Serial No.   Product No.   Quantity
S1           P1            100
S1           P2            100
S2           P1            200
S2           P2            200
S3           P2            200
S4           P2            400
S4           P4            400
S4           P5            400
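Whether a decomposition is lossless can be checked by joining the projections back together and comparing with the original relation. The sketch below assumes the SCP tuples from the slide; the helper names (`project`, `natural_join`, `is_lossless`) and column names are illustrative, not from the original deck.

```python
SCP = [
    {"serial": "S1", "city": "London", "product": "P1", "quantity": 100},
    {"serial": "S1", "city": "London", "product": "P2", "quantity": 100},
    {"serial": "S2", "city": "Paris", "product": "P1", "quantity": 200},
    {"serial": "S2", "city": "Paris", "product": "P2", "quantity": 200},
    {"serial": "S3", "city": "Paris", "product": "P2", "quantity": 200},
    {"serial": "S4", "city": "London", "product": "P2", "quantity": 400},
    {"serial": "S4", "city": "London", "product": "P4", "quantity": 400},
    {"serial": "S4", "city": "London", "product": "P5", "quantity": 400},
]


def project(rows, attrs):
    """Project dict rows onto attrs; duplicates collapse because a set is returned."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Natural join of two sets of rows, each row a tuple of (attribute, value) pairs."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # Rows combine only if they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(r.items())) for r in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original


# Split on Serial No. (which determines City): nothing is lost or invented.
print(is_lossless(SCP, ["serial", "city"], ["serial", "product", "quantity"]))
# Split on City: the join creates spurious tuples such as (S1, London, P2, 400).
print(is_lossless(SCP, ["serial", "city"], ["city", "product", "quantity"]))
```

The second split is lossy in the usual sense: no tuple disappears, but the join produces extra tuples that were never in SCP.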
Armstrong’s Axioms
• If F is a set of functional dependencies then the closure of F, denoted
as F+, is the set of all functional dependencies logically implied by F
• Armstrong’s Axioms are a set of inference rules that, when applied
repeatedly, generate the closure of a set of functional dependencies
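The rule table from this slide did not survive in the text. For reference, the three primary rules as they are usually stated are given below, for attribute sets X, Y, Z of a relation; the notation is the standard textbook one and is an assumption about what the slide showed.

```latex
\begin{align*}
\text{Reflexivity:}  &\quad Y \subseteq X \;\Rightarrow\; X \to Y \\
\text{Augmentation:} &\quad X \to Y \;\Rightarrow\; XZ \to YZ \\
\text{Transitivity:} &\quad X \to Y,\; Y \to Z \;\Rightarrow\; X \to Z
\end{align*}
```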
• Secondary Rules -> Can be derived from the above axioms
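The secondary rules themselves are not reproduced in the text. The commonly listed ones are shown below in standard notation (an assumption about the slide's content), followed by a short derivation of the union rule from the primary rules.

```latex
\begin{align*}
\text{Union:}              &\quad X \to Y,\; X \to Z \;\Rightarrow\; X \to YZ \\
\text{Decomposition:}      &\quad X \to YZ \;\Rightarrow\; X \to Y,\; X \to Z \\
\text{Pseudotransitivity:} &\quad X \to Y,\; WY \to Z \;\Rightarrow\; WX \to Z
\end{align*}

% Union derived from the primary rules:
\begin{align*}
X \to Y &\;\Rightarrow\; X \to XY   && \text{(augment with } X\text{)} \\
X \to Z &\;\Rightarrow\; XY \to YZ  && \text{(augment with } Y\text{)} \\
X \to XY,\; XY \to YZ &\;\Rightarrow\; X \to YZ && \text{(transitivity)}
\end{align*}
```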
• Armstrong’s axioms are Sound and Complete
- By sound, we mean that given a set of functional dependencies F
specified on a relation schema R, any dependency that we can infer from F
by using the primary rules of Armstrong’s axioms holds in every relation
state r of R that satisfies the dependencies in F
- By complete, we mean that using the primary rules of Armstrong’s axioms
repeatedly, until no more dependencies can be inferred, results in the
complete set of all possible dependencies that can be inferred from F,
i.e., F+
Keys
(Candidate Key, Super Key & Minimal Super Key)
• A super key is a set of one or more attributes that allows entities (or
relationships) to be uniquely identified
• If there are 5 attributes in a relation, there are 2^5 - 1 = 31 non-empty
attribute combinations, each of which may or may not be a super key
- Unique key is a super key
• A candidate key is a closely related concept where the super key is reduced
to the minimum number of columns required to uniquely identify each row
- Candidate key is a super key that has no super keys as proper subsets
- Candidate key is a minimal super key
  - All candidate keys are super keys but the reverse is not true
- Adding any attribute to a minimal super key gives another super key
• Techniques to Identify all these Keys
- If the closure of an attribute set contains all the attributes of the
relation, then the set is a super key. Otherwise it is not
- If it is not a super key, try applying Armstrong’s Axioms to get an
attribute combination that is a super key
- The minimal super keys obtained are the candidate keys for that
relation

Attribute combinations of ABCDE = {A, B, C, D, E, AB, AC, AD, AE, BC, BD, BE,
CD, CE, DE, ABC, ABD, ABE, ACD, ACE, ADE, BCD, . . .}
Closure of a Set
• Columns in Table: A, B, C, D, E
• Functional Dependency: A → BC; CD → E; B → D; E → A

A+  = ABC → ABCD → ABCDE = {ABCDE}
CD+ = CDE → CDEA → CDEAB = {ABCDE}
B+  = BD (Not a Super Key. So, try applying Axioms)
      B → D gives BC → DC
BC+ = BCD → BCDE → BCDEA = {ABCDE}
E+  = EA → EABC → EABCD = {ABCDE}
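The closure computation above is a simple fixed-point loop: keep adding the right side of any dependency whose left side is already covered. A minimal sketch follows; the representation of each FD as a pair of attribute strings and the name `closure` are illustrative assumptions.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is covered and rhs adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# A -> BC; CD -> E; B -> D; E -> A
FDS = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]

for start in ["A", "CD", "B", "BC", "E"]:
    print(start, "+ =", "".join(sorted(closure(start, FDS))))
```

An attribute set is a super key exactly when its closure is the full attribute set, which is the test used on this slide.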
• A+, CD+, BC+ and E+ => All are super keys
- Along with the identified super keys, any attribute combination
containing A or CD or BC or E is also a super key
• How to find minimal super key?
- A and E are already minimal super keys
- Check whether CD and BC are minimal super keys or not
  - CD => C+, D+ => C and D are not super keys => C and D cannot be
  minimal super keys
  - Hence, CD is a minimal super key
  - BC => B+, C+ => B and C are not super keys => B and C cannot be
  minimal super keys
  - Hence, BC is a minimal super key
• Every minimal super key is a candidate key
• One candidate key is chosen as the Primary Key; a minimal super key that
contains only one attribute is a natural choice
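The minimality check can be automated by trying attribute combinations from smallest to largest and skipping any combination that already contains a key. This is a brute-force sketch under the same assumptions as before (FDs as pairs of attribute strings; `closure` and `candidate_keys` are illustrative names), fine for a handful of attributes but exponential in general.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal super keys of a relation with attributes attrs."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it cannot be minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


FDS = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(sorted("".join(sorted(k)) for k in candidate_keys("ABCDE", FDS)))
```

For the running example this reports A, BC, CD and E, matching the result worked out on the slide.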
Equivalence of Functional Dependency
• A set of Functional Dependencies (FDs) “F” is said to cover another set of
FDs “E” if every FD in E is also in F+
- That is, if every dependency in E can be inferred from F
• Alternatively, we can say that E is “covered by” F
• We can determine whether F covers E by calculating X+ w.r.t. F for each FD
X → Y in E and then checking whether this X+ includes the attributes in Y
• If this is the case for every FD in E, then F covers E
• E and F are equivalent if E covers F and F covers E
• Note: the covering must hold in both directions, i.e., E+ = F+
• Let, F = {A → B, B → C, AC → D}
       G = {A → B, B → C, A → D}
• Solution: We can conclude that F and G are equivalent, if we prove that
all FDs in F can be inferred from the set of FDs in G and vice versa
• Find the closure for the LHS elements of F, using the FDs of G
• Find the closure for the LHS elements of G, using the FDs of F

Using F (LHS elements of G):
A+ = AB → ABC → ABCD    gives A → A, A → B, A → C, A → D
B+ = BC                 gives B → B, B → C
=> A → B, B → C and A → D are all inferred from F

Using G (LHS elements of F):
A+  = ABD → ABDC        gives A → A, A → B, A → D, A → C
B+  = BC                gives B → B, B → C
AC+ = AC → ACBD         gives AC → A, AC → C, AC → B, AC → D
=> A → C and A → D => AC → D, so A → B, B → C and AC → D are all inferred
from G

F and G are Equivalent
• Let, F = {A → B, A → C}; G = {A → B; B → C}

Using F (LHS elements of G):
A+ = ABC                gives A → A, A → B, A → C
B+ = B                  gives B → B only, so B → C cannot be inferred from F

Using G (LHS elements of F):
A+ = AB → ABC           gives A → A, A → B, A → C

G covers F, but F does not cover G => Not Equivalent
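The cover test described above reduces to one closure computation per dependency. The sketch below applies it to both examples; the function names (`covers`, `equivalent`) and the string representation of FDs are assumptions for illustration.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, e):
    """True if every FD X -> Y in e can be inferred from f (Y is inside X+ w.r.t. f)."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)


def equivalent(f, g):
    """Two FD sets are equivalent when each covers the other."""
    return covers(f, g) and covers(g, f)


# First example: equivalent.
F1 = [("A", "B"), ("B", "C"), ("AC", "D")]
G1 = [("A", "B"), ("B", "C"), ("A", "D")]
print(equivalent(F1, G1))

# Second example: G covers F, but F cannot infer B -> C.
F2 = [("A", "B"), ("A", "C")]
G2 = [("A", "B"), ("B", "C")]
print(covers(G2, F2), covers(F2, G2), equivalent(F2, G2))
```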
Canonical Cover
• Eg: {ABCDE}; Functional Dependency: {A → BC; CD → E; B → D; E → A}
• A canonical cover is a simplified and reduced version of the given set of
functional dependencies
• Since it is a reduced version, it is also called an Irreducible Set
• Characteristics
- Canonical cover is free from all the extraneous functional
dependencies
- Closure of the canonical cover is the same as that of the given set of
functional dependencies
- Canonical cover is not unique; there may be more than one for a given
set of functional dependencies
• Need
- Working with a set containing extraneous functional dependencies
increases the computation time
- The given set is reduced by eliminating the useless functional
dependencies
- This reduces the computation time, and working with the irreducible set
becomes easier
Steps to Find Canonical Cover
• Step 1
- Write the given set of functional dependencies in such a way that
each functional dependency contains exactly one attribute on its
right side
- Eg: The functional dependency X → YZ will be written as X → Y; X → Z
{A → BC; CD → E; B → D; E → A} => {A → B; A → C; CD → E; B → D; E → A}

• Step 2
- Consider each functional dependency one by one from the set
obtained in Step 1
- Determine whether it is essential or non-essential
- To determine whether a functional dependency is essential or not,
compute the closure of its left side
  - Once by considering that the particular functional
  dependency is present in the set
  - Once by considering that the particular functional
  dependency is not present in the set
  - Case 1: Results come out to be the same
    - The presence or absence of that functional dependency does not
    create any difference -> Non-essential
    - Eliminate that functional dependency from the set
  - Case 2: Results come out to be different
    - The presence or absence of that functional dependency creates a
    difference -> Essential
    - Do not eliminate that functional dependency from the set
    - Mark that functional dependency as essential
{A → B; A → C; CD → E; B → D; E → A} => every FD is essential, so the set is
unchanged
- Eg: with E → A in the set, E+ = {A, B, C, D, E}; without it, E+ = {E}
• Step 3
- Consider the set of functional dependencies obtained after performing
Step 2
- Check if there is any functional dependency that contains more than one
attribute on its left side
  - Case 1: No
    - There exists no functional dependency containing more than
    one attribute on its left side -> Set obtained in Step 2 is the
    canonical cover
  - Case 2: Yes
    - There exists at least one functional dependency containing
    more than one attribute on its left side
    - Consider all such functional dependencies one by one and check if
    their left side can be reduced
    - Compute the closure of all the possible subsets of the left side of
    that functional dependency
    - If any of the subsets produces the same closure result as produced by
    the entire left side, then replace the left side with that subset
    - After this step is complete, the set obtained is the canonical cover
• Eg: in {A → B; A → C; CD → E; B → D; E → A}, only CD → E has more than one
attribute on its left side: {CD} => {C}, {D}
- CD+ = {A, B, C, D, E}
- C+ = {C}
- D+ = {D}
- Neither subset gives the same closure as CD, so CD → E is kept and the
canonical cover is {A → B; A → C; CD → E; B → D; E → A}
https://www.gatevidyalay.com/tag/irreducible-set-of-functional-dependencies-in-dbms/
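The three steps can be followed literally in code. The sketch below is one possible reading of them, with FDs given as pairs of attribute strings and `canonical_cover` an illustrative name. Many textbook algorithms reduce left sides before removing redundant dependencies and repeat until nothing changes, so treat this as an illustration of the slide's procedure rather than a definitive algorithm.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    # Step 1: exactly one attribute on each right side (duplicates dropped).
    work = list(dict.fromkeys((frozenset(l), r) for l, rhs in fds for r in rhs))

    # Step 2: an FD is non-essential if the closure of its left side is the
    # same with and without it; drop such FDs one at a time.
    for fd in list(work):
        rest = [f for f in work if f != fd]
        if closure(fd[0], rest) == closure(fd[0], work):
            work = rest

    # Step 3: replace a multi-attribute left side by the smallest proper
    # subset that has the same closure.
    for i, (lhs, rhs) in enumerate(work):
        full = closure(lhs, work)
        smaller = [
            sub
            for size in range(1, len(lhs))
            for sub in combinations(sorted(lhs), size)
            if closure(sub, work) == full
        ]
        if smaller:
            work[i] = (frozenset(smaller[0]), rhs)
    return work


def show(fds):
    """Format FDs as readable strings such as 'CD -> E'."""
    return sorted("".join(sorted(l)) + " -> " + r for l, r in fds)


print(show(canonical_cover([("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")])))
print(show(canonical_cover([("A", "B"), ("B", "C"), ("A", "C")])))
print(show(canonical_cover([("A", "B"), ("AB", "C")])))
```

The first call is the running example, where nothing can be removed. The other two inputs are made-up examples showing a redundant dependency (A → C) being dropped and a left side (AB) being reduced to A.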
Normal Forms
• Normalization is the process of minimizing redundancy from a relation or
set of relations
• Redundancy in a relation may cause insertion, deletion, and update
anomalies
• Normal forms are used to eliminate or reduce redundancy in database
tables
https://www.geeksforgeeks.org/normal-forms-in-dbms/

• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF) or 3.5NF
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
• Sixth Normal Form (6NF)
1NF
• A relation is in first normal form if every attribute in that relation is
a single-valued (atomic) attribute
• If a relation contains a composite or multi-valued attribute, it violates
first normal form
2NF

Candidate Key = {STUD_NO, COURSE_NO}

• Note that there are many courses having the same course fee
- COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO
- COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO
- COURSE_FEE together with COURSE_NO cannot decide the value of
STUD_NO
• COURSE_FEE is a non-prime attribute, as it does not belong to the only
candidate key {STUD_NO, COURSE_NO}
• But COURSE_NO → COURSE_FEE, i.e., COURSE_FEE is dependent only on
COURSE_NO, which is a proper subset of the candidate key {STUD_NO,
COURSE_NO}
• Non-prime attribute COURSE_FEE is dependent on a proper subset of the
candidate key, which is a partial dependency
• So, this relation is not in 2NF
• To be in second normal form, a relation must be in first normal form and
must not contain any partial dependency
- Partial Dependency: If a proper subset of a candidate key
determines a non-prime attribute, it is called a partial dependency
  - Non-Prime Attribute -> Attribute that does not belong to any
  candidate key
• A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime
attribute is dependent on any proper subset of any candidate key of the
table
• Every non-prime attribute must be fully dependent on each candidate key.
If not, the relation violates 2NF
• To convert the above relation to 2NF, we need to split the table into two
tables:
- Table 1: STUD_NO, COURSE_NO
- Table 2: COURSE_NO, COURSE_FEE
• Note: 2NF tries to reduce the redundant data getting stored
- For instance, if there are 100 students taking course C1, we don’t
need to store its fee as 1000 in all the 100 records; instead, we
store once in the second table that the course fee for C1 is 1000
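Partial dependencies can be found mechanically: take every proper subset of every candidate key and see which non-prime attributes its closure reaches. The sketch below assumes the relation (STUD_NO, COURSE_NO, COURSE_FEE) with the single dependency COURSE_NO → COURSE_FEE from the slide; FDs are written as pairs of sets, and `partial_dependencies` is an illustrative name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(candidate_keys, fds):
    """List (subset, attribute) pairs where a proper subset of a key determines a non-prime attribute."""
    prime = set().union(*candidate_keys)
    found = []
    for key in candidate_keys:
        for size in range(1, len(key)):
            for sub in combinations(sorted(key), size):
                # Non-prime attributes reachable from this proper subset.
                for attr in sorted(closure(sub, fds) - set(sub) - prime):
                    found.append((set(sub), attr))
    return found


KEYS = [{"STUD_NO", "COURSE_NO"}]
FDS = [({"COURSE_NO"}, {"COURSE_FEE"})]

# Reports that COURSE_NO alone determines COURSE_FEE, so the relation is not in 2NF.
print(partial_dependencies(KEYS, FDS))
```

An empty result would mean the relation has no partial dependency with respect to the given keys and dependencies.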
3NF

• A relation is in third normal form if it is in second normal form and
there is no transitive dependency for non-prime attributes
- Transitive Dependency: If A → B and B → C are two FDs, then A → C is
called a transitive dependency
  - 3NF is violated when B is not a super key and C is a non-prime
  attribute
• A relation R is not in 3NF when some non-prime attribute depends on a set
of attributes that is not a super key
3NF

• Candidate Key: {STUD_NO}
• FD: {STUD_NO → STUD_NAME, STUD_NO → STUD_STATE, STUD_STATE
→ STUD_COUNTRY, STUD_NO → STUD_AGE}
• For this relation, STUD_NO → STUD_STATE and STUD_STATE →
STUD_COUNTRY are true
• STUD_COUNTRY is transitively dependent on STUD_NO -> Violates the
third normal form
• To convert it to third normal form, we decompose the relation
- STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE) as
  - STUDENT (STUD_NO, STUD_NAME, STUD_PHONE,
  STUD_STATE, STUD_AGE)
  - STATE_COUNTRY (STATE, COUNTRY)
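The 3NF condition can be checked per dependency: X → A is acceptable if X contains a candidate key or A is a prime attribute. The sketch below uses the candidate key and the four dependencies listed for STUDENT; the function name `third_nf_violations` and the set-based representation are assumptions for illustration.

```python
def third_nf_violations(candidate_keys, fds):
    """Return (lhs, attribute) pairs that break 3NF, given the candidate keys."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        # X is a super key exactly when it contains some candidate key.
        lhs_is_super_key = any(key <= lhs for key in candidate_keys)
        for attr in sorted(rhs - lhs):  # skip trivial parts of the FD
            if not lhs_is_super_key and attr not in prime:
                violations.append((lhs, attr))
    return violations


KEYS = [{"STUD_NO"}]
FDS = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
]

# Only STUD_STATE -> STUD_COUNTRY is reported: STUD_STATE is not a super key
# and STUD_COUNTRY is non-prime.
print(third_nf_violations(KEYS, FDS))
```

After the decomposition, STUD_STATE → STUD_COUNTRY lives in its own relation where the state is the key, so the same check on each new relation finds nothing.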
BCNF or 3.5NF
• A relation is in BCNF if, in every non-trivial functional dependency
X → Y, X is a super key
- 3NF also accepts X → Y when Y is a prime attribute (each element of Y is
part of some candidate key); BCNF does not
• In other words, for a relation R to be in BCNF, no FD X → Y may have an X
that is not a super key, even when Y is a prime attribute
• To determine the highest normal form of a given relation R with functional
dependencies, the first step is to check whether the BCNF condition holds
• If R is found to be in BCNF, it can be safely deduced that the relation is
also in 3NF, 2NF and 1NF, as the hierarchy shows
- 1NF has the least restrictive constraint: it only requires a relation R
to have atomic values in each tuple
- 2NF has a slightly more restrictive constraint
- 3NF has a more restrictive constraint than the first two normal forms
but is less restrictive than BCNF
• Restriction increases as we traverse down the hierarchy
https://www.geeksforgeeks.org/boyce-codd-normal-form-bcnf/
Example 1
• Find the highest normal form of a relation R(A, B, C, D, E) with FD set
{BC → D, AC → BE, B → E}
- Step 1 (Identify Candidate Key): (AC)+ = {A, C, B, E, D}, but none of
its proper subsets can determine all attributes of the relation, so AC is
a candidate key
  - A and C can’t be derived from any other attribute of the relation,
  so there is only 1 candidate key {AC}
- Step 2 (Identify Prime and Non-Prime Attributes): Prime attributes
are those that are part of a candidate key, {A, C} in this example; the
others are non-prime, {B, D, E} in this example
• Relation R is in 1st normal form, as a relational DBMS does not allow
multi-valued or composite attributes
• Relation is in 2nd normal form because
- BC → D is in 2nd normal form (BC is not a proper subset of candidate
key AC)
- AC → BE is in 2nd normal form (AC is candidate key)
- B → E is in 2nd normal form (B is not a proper subset of candidate key
AC)
• Relation is not in 3rd normal form because
- BC → D (neither BC is a super key nor D is a prime attribute)
- B → E (neither B is a super key nor E is a prime attribute)
- But to satisfy 3rd normal form, either the LHS of an FD should be a
super key or the RHS should be a prime attribute
• So, the highest normal form of the relation is 2NF
Example 2

• Consider relation R(A, B, C) with FD: {A → BC, B → A}
• A and B both are super keys
• So, the above relation is in BCNF
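The procedure used in Examples 1 and 2 (check BCNF first, then fall back to 3NF and 2NF) can be sketched as follows. It assumes the relation is already in 1NF, represents FDs as pairs of attribute strings, and checks only the given dependencies; `highest_normal_form` and the helpers are illustrative names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for minimal super keys, smallest combinations first."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if not any(k <= set(combo) for k in keys) and closure(combo, fds) == set(attrs):
                keys.append(set(combo))
    return keys


def highest_normal_form(attrs, fds):
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)

    def is_super_key(x):
        return closure(x, fds) == set(attrs)

    # Non-trivial FDs with a single attribute on the right side.
    simple = [(lhs, a) for lhs, rhs in fds for a in rhs if a not in lhs]
    if all(is_super_key(lhs) for lhs, _ in simple):
        return "BCNF"
    if all(is_super_key(lhs) or a in prime for lhs, a in simple):
        return "3NF"
    # 2NF fails if a proper subset of some key determines a non-prime attribute.
    partial = any(
        closure(sub, fds) - set(sub) - prime
        for key in keys
        for size in range(1, len(key))
        for sub in combinations(sorted(key), size)
    )
    return "1NF" if partial else "2NF"


print(highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")]))  # Example 1
print(highest_normal_form("ABC", [("A", "BC"), ("B", "A")]))                  # Example 2
```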
Discussion
• For a relation with only one candidate key, 3NF and BCNF are equivalent
• BCNF is slightly stronger than 3NF
• A dependency-preserving BCNF decomposition is not always possible;
however, a BCNF decomposition that satisfies the lossless join condition
always exists
- Eg: Relation R (V, W, X, Y, Z), with functional dependencies:
{VW → X; YZ → X; W → Y}
  - Would not satisfy dependency preserving BCNF decomposition
https://www.geeksforgeeks.org/boyce-codd-normal-form-bcnf/
Dependency Preserving
• A decomposition D = {R1, R2, R3, …, Rn} of R is dependency preserving
w.r.t. a set F of functional dependencies if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+,
where Fi is the set of dependencies in F+ that involve only attributes of Ri
• Eg: Consider a relation R with a set F of functional dependencies (FDs)
• R is decomposed into R1 with FDs f1 and R2 with FDs f2; then there can be
three cases:
- (f1 ∪ f2)+ = F+ -----> Decomposition is dependency preserving
- (f1 ∪ f2)+ is a proper subset of F+ -----> Not dependency preserving
- (f1 ∪ f2)+ is a proper superset of F+ -----> This case is not possible
https://www.geeksforgeeks.org/data-base-dependency-preserving-decomposition/
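Dependency preservation can be tested without listing each projected FD set explicitly: for every X → Y in F, grow X using only closures restricted to one sub-relation at a time and see whether Y is reached. The sketch below uses a small made-up relation R(A, B, C) with F = {A → B, B → C}, which is not from the slides; `preserves_dependencies` is an illustrative name.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(fds, decomposition):
    """True if every FD in fds follows from the FDs projected onto the sub-relations."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # What the projection onto ri lets us derive from what we already have.
                gained = (closure(result & set(ri), fds) & set(ri)) - result
                if gained:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


F = [("A", "B"), ("B", "C")]  # hypothetical example, not from the slides
print(preserves_dependencies(F, ["AB", "BC"]))  # both FDs survive
print(preserves_dependencies(F, ["AB", "AC"]))  # B -> C cannot be checked in either table
```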
Example for Dependency Preserving Decomposition
https://www.geeksforgeeks.org/data-base-dependency-preserving-decomposition/
4NF
• 4NF deals with multi-valued dependencies (MVDs), written A →→ B
- 4NF tries to handle the redundancy created when multi-valued attributes
are flattened to satisfy 1NF
Department   Job        Part
d1           {J1, J2}   {P1, P2}
d2           {J1, J2}   {P1, P2}

Flattened to 1NF:
Department   Job   Part
d1           J1    P1
d1           J1    P2
d1           J2    P1
d1           J2    P2
d2           J1    P1
d2           J1    P2
d2           J2    P1
d2           J2    P2

=> Decomposed into:
Department   Job
d1           J1
d1           J2
d2           J1
d2           J2
+
Department   Part
d1           P1
d1           P2
d2           P1
d2           P2
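The flattened table satisfies Department →→ Job because, for each department, every job appears with every part. That cross-product property can be checked directly. The sketch below is illustrative only; the function name `mvd_holds` and the lower-case column names are assumptions.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Check the multi-valued dependency X ->> Y on rows given as dicts."""
    z = [a for a in rows[0] if a not in x and a not in y]  # remaining attributes
    groups = {}
    for r in rows:
        key = tuple(r[a] for a in x)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        y_val = tuple(r[a] for a in y)
        z_val = tuple(r[a] for a in z)
        ys.add(y_val)
        zs.add(z_val)
        pairs.add((y_val, z_val))
    # For each X value, the (Y, Z) pairs must be the full cross product.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())


DJP = [
    {"department": d, "job": j, "part": p}
    for d in ("d1", "d2")
    for j in ("J1", "J2")
    for p in ("P1", "P2")
]

print(mvd_holds(DJP, ["department"], ["job"]))       # all 8 rows present
print(mvd_holds(DJP[:-1], ["department"], ["job"]))  # one combination missing
```

When the MVD holds, splitting into (Department, Job) and (Department, Part) loses nothing, which is why the 8-row table can be replaced by two 4-row tables.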
5NF
• Any relation in order to be in the fifth normal form must satisfy the
following conditions:
- Must be in Fourth Normal Form (4NF)
- Should have no non-trivial join dependency other than those implied by
its candidate keys, and any decomposition must be lossless
  - A join dependency is a generalization of a multi-valued
  dependency
  - A JD {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2,
  R3, ..., Rn is a lossless-join decomposition of R
• In the fifth normal form the relation must be decomposed into as many
sub-relations as possible so as to avoid any kind of redundancy, and there
must be no extra tuples generated when the sub-relations are combined
together by using natural join
• A relation in 5NF cannot be decomposed further without any kind of
modification in the meaning or facts
• 5NF is also known as Project Join Normal Form (PJNF)
• Deals with the Join / Projection Anomaly

[Figure: sub-relations combined by natural join (+ +) to give the original
relation below]

Original:
Student-ID   Mobile Number   Hobby
123          9999900000      Dancing
123          9999900000      Singing
123          8975622122      Singing
124          8999900000      Singing
124          8999900000      Dancing

Lossless Decomposition:
• All the tuples of the original relation are successfully recreated through
the join operation
• No extra tuples are formed during the join operation
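A join dependency over several sub-relations can be checked the same way as a two-way lossless decomposition, by joining all projections and comparing with the original. The sketch below uses the five tuples shown above; the three projections chosen here are an assumption, since the sub-relations from the slide's figure are not available in the text, and the function names are illustrative.

```python
from functools import reduce


def project(rows, attrs):
    """Project dict rows onto attrs; each result row is a frozenset of (attribute, value)."""
    return {frozenset((a, r[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Natural join of two sets of rows represented as frozensets of pairs."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


def join_dependency_holds(rows, components):
    """True if joining the projections onto each component recreates rows exactly."""
    original = {frozenset(r.items()) for r in rows}
    joined = reduce(natural_join, (project(rows, c) for c in components))
    return joined == original


STUDENT = [
    {"id": 123, "mobile": "9999900000", "hobby": "Dancing"},
    {"id": 123, "mobile": "9999900000", "hobby": "Singing"},
    {"id": 123, "mobile": "8975622122", "hobby": "Singing"},
    {"id": 124, "mobile": "8999900000", "hobby": "Singing"},
    {"id": 124, "mobile": "8999900000", "hobby": "Dancing"},
]

# Three-way split: every original tuple is recreated and no extra tuple appears.
print(join_dependency_holds(STUDENT, [("id", "mobile"), ("mobile", "hobby"), ("id", "hobby")]))
# Two-way split on Student-ID alone: the join invents (123, 8975622122, Dancing).
print(join_dependency_holds(STUDENT, [("id", "mobile"), ("id", "hobby")]))
```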
4NF vs 5NF

6NF

THANK YOU