
Normal Forms in DBMS

Database normalization is the process of organizing the attributes of a database to reduce or
eliminate data redundancy (the same data stored in more than one place).
Problems caused by data redundancy
Data redundancy unnecessarily increases the size of the database because the same data is
repeated in many places. Inconsistencies can also arise during insert, delete and update
operations.

Normalization is the process of minimizing redundancy from a relation or set of relations.


Redundancy in a relation may cause insertion, deletion and update anomalies, so normalization
helps to minimize redundancy in relations. Normal forms are used to eliminate or reduce
redundancy in database tables.

1. First Normal Form –

If a relation contains a composite or multi-valued attribute, it violates first normal form.
Equivalently, a relation is in first normal form if every attribute in it is a single-valued
(atomic) attribute.

• Example 1 – Relation STUDENT in table 1 is not in 1NF because of the multi-valued
attribute STUD_PHONE. Its decomposition into 1NF has been shown in table 2.

• Example 2 –
ID  Name  Courses
1   A     c1, c2
2   E     c3
3   M     c2, c3

In the above table, Courses is a multi-valued attribute, so the table is not in 1NF.


The table below is in 1NF, as it has no multi-valued attribute:
ID  Name  Course
1   A     c1
1   A     c2
2   E     c3
3   M     c2
3   M     c3
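
The conversion above can be sketched in a few lines of Python. This is only an illustration: the list `unnormalized` and the function name `to_first_normal_form` are hypothetical, and the multi-valued Courses column is assumed to be stored as a comma-separated string.

```python
# Hypothetical in-memory copy of the un-normalized table (ID, Name, Courses).
unnormalized = [
    (1, "A", "c1, c2"),
    (2, "E", "c3"),
    (3, "M", "c2, c3"),
]

def to_first_normal_form(rows):
    """Emit one (ID, Name, Course) row per course so every value is atomic."""
    flat = []
    for student_id, name, courses in rows:
        for course in courses.split(","):
            flat.append((student_id, name, course.strip()))
    return flat

rows_1nf = to_first_normal_form(unnormalized)
for row in rows_1nf:
    print(row)
```

Each student now appears once per course, which matches the five rows of the 1NF table.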

2. Second Normal Form –

To be in second normal form, a relation must be in first normal form and must not contain any
partial dependency. That is, no non-prime attribute (an attribute that is not part of any
candidate key) may depend on a proper subset of any candidate key of the table.
Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, it
is called a partial dependency.
• Example 1 – Consider table 3 below.
STUD_NO  COURSE_NO  COURSE_FEE
1        C1         1000
2        C2         1500
1        C4         2000
4        C3         1000
4        C1         1000
2        C5         2000

{Note that many courses have the same course fee.}
Here,
COURSE_FEE alone cannot determine the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot determine the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot determine the value of STUD_NO;
Hence,
COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate key
{STUD_NO, COURSE_NO}.
But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a
proper subset of the candidate key. A non-prime attribute that depends on a proper subset of
the candidate key is a partial dependency, so this relation is not in 2NF.
To convert the above relation to 2NF, we need to split the table into two tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
Table 1                 Table 2
STUD_NO  COURSE_NO      COURSE_NO  COURSE_FEE
1        C1             C1         1000
2        C2             C2         1500
1        C4             C3         1000
4        C3             C4         2000
4        C1             C5         2000
2        C5
NOTE: 2NF reduces the redundant data stored in the database. For instance, if 100 students
take course C1, we don't need to store its fee of 1000 in all 100 records; instead, we store it
once in the second table as the course fee for C1.
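
As a rough check of the reasoning above, the following Python sketch tests which functional dependencies hold in the sample rows and confirms that the two-table split can be joined back without loss. The helper names `holds` and `project` are made up for this illustration, and an FD that holds in one sample is only evidence, not proof, that it is a rule of the schema.

```python
COLS = ("STUD_NO", "COURSE_NO", "COURSE_FEE")
rows = [
    (1, "C1", 1000), (2, "C2", 1500), (1, "C4", 2000),
    (4, "C3", 1000), (4, "C1", 1000), (2, "C5", 2000),
]

def holds(rows, cols, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in these rows."""
    li = [cols.index(c) for c in lhs]
    ri = [cols.index(c) for c in rhs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in li)
        val = tuple(row[i] for i in ri)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(rows, cols, keep):
    """Projection onto the columns in `keep`, with duplicates removed."""
    idx = [cols.index(c) for c in keep]
    return {tuple(row[i] for i in idx) for row in rows}

print(holds(rows, COLS, ["COURSE_NO"], ["COURSE_FEE"]))             # partial dependency
print(holds(rows, COLS, ["STUD_NO", "COURSE_FEE"], ["COURSE_NO"]))  # does not hold

table1 = project(rows, COLS, ["STUD_NO", "COURSE_NO"])
table2 = project(rows, COLS, ["COURSE_NO", "COURSE_FEE"])

# Natural join of the two tables on COURSE_NO.
rejoined = {(s, c, f) for (s, c) in table1 for (c2, f) in table2 if c == c2}
print(rejoined == set(rows))
```

The second table holds one row per course, so each fee is stored only once, and the join on COURSE_NO reproduces the original six rows.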
• Example 2 – Consider the following functional dependencies in relation R(A, B, C, D):
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial dependency,
i.e., no proper subset of AB determines any non-prime attribute. Hence the relation is in 2NF.
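
The partial-dependency test can be sketched with an attribute-closure helper. The functions `closure` and `partial_dependencies` below are illustrative names, attributes are assumed to be single letters, and the second call abbreviates table 3 as S = STUD_NO, C = COURSE_NO, F = COURSE_FEE.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything derivable from `attrs` using `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, non_prime, fds):
    """List (subset, attributes) pairs where a proper subset of the
    candidate key determines one or more non-prime attributes."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = (closure(subset, fds) - set(subset)) & set(non_prime)
            if determined:
                found.append(("".join(subset), "".join(sorted(determined))))
    return found

# Example 2: R(A, B, C, D) with AB -> C and BC -> D, candidate key AB.
print(partial_dependencies("AB", "CD", [("AB", "C"), ("BC", "D")]))  # []

# Table 3 with abbreviated names: key SC, non-prime F, and C -> F.
print(partial_dependencies("SC", "F", [("C", "F")]))  # [('C', 'F')]
```

An empty list means no proper subset of the key determines a non-prime attribute, which is the 2NF condition for that key.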

3. Third Normal Form –

A relation is in third normal form if it is in second normal form and no non-prime attribute is
transitively dependent on a candidate key.
Equivalently, a relation is in 3NF if at least one of the following conditions holds for every
non-trivial functional dependency X -> Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).

Transitive dependency – If A -> B and B -> C are two FDs, then A -> C is called a transitive
dependency.
• Example 1 – In relation STUDENT given in Table 4,
FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE ->
STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY hold, so STUD_COUNTRY is transitively dependent on STUD_NO. This
violates third normal form. To convert it to third normal form, we decompose the relation
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
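
The two 3NF conditions can be checked mechanically against the FD set of this example. The sketch below uses hypothetical helper names (`closure`, `third_nf_violations`) and leaves STUD_PHONE out, because the FD set given for the example does not mention it.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Assumption: STUD_PHONE is omitted, as the FD set above does not cover it.
ATTRS = {"STUD_NO", "STUD_NAME", "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"}
FDS = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
]
PRIME = {"STUD_NO"}  # attributes of the only candidate key

def third_nf_violations(attrs, fds, prime):
    """Return the FDs X -> Y where X is not a super key and Y is not prime."""
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs          # ignore the trivial part of the FD
        if not extra:
            continue
        is_super_key = closure(lhs, fds) == attrs
        if not is_super_key and not extra <= prime:
            bad.append((lhs, rhs))
    return bad

violations = third_nf_violations(ATTRS, FDS, PRIME)
print(violations)  # [({'STUD_STATE'}, {'STUD_COUNTRY'})]
```

Only STUD_STATE -> STUD_COUNTRY is reported, which is the dependency that the decomposition moves into its own relation.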
• Example 2 – Consider relation R(A, B, C, D, E)
A -> BC,
CD -> E,
B -> D,
E -> A
All possible candidate keys in the above relation are {A, E, CD, BC}. All attributes on the
right-hand sides of the functional dependencies are prime, so the relation is in 3NF.
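
The candidate keys listed for this example can be reproduced by brute force: try every attribute subset from smallest to largest and keep those whose closure is the whole relation and that contain no smaller key. This is a minimal sketch with illustrative function names; it is exponential in the number of attributes, so it only suits small textbook relations.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure covers every attribute."""
    everything = set(attrs)
    keys = []
    for size in range(1, len(everything) + 1):
        for combo in combinations(sorted(everything), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == everything:
                keys.append(candidate)
    return keys

fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
keys = candidate_keys("ABCDE", fds)
print(["".join(sorted(k)) for k in keys])  # ['A', 'E', 'BC', 'CD']

prime = set().union(*keys)
print(sorted(prime))  # ['A', 'B', 'C', 'D', 'E']
```

Every attribute belongs to some candidate key, so the second 3NF condition is satisfied by every dependency.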

4. Boyce-Codd Normal Form (BCNF) –

A relation R is in BCNF if R is in third normal form and, for every FD, the LHS is a super key.
Equivalently, a relation is in BCNF iff in every non-trivial functional dependency X -> Y, X is a
super key.
• Example 1 – Find the highest normal form of a relation R(A, B, C, D, E) with FD set
{BC -> D, AC -> BE, B -> E}.
Step 1. As we can see, (AC)+ = {A, C, B, E, D}, but none of its proper subsets can determine
all attributes of the relation, so AC is a candidate key. A and C cannot be derived from any
other attribute of the relation, so there is only one candidate key, {AC}.
Step 2. Prime attributes are those attributes that are part of a candidate key ({A, C} in this
example); the others are non-prime ({B, D, E} in this example).
Step 3. The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued
or composite attributes.
The relation is in 2nd normal form because BC -> D is in 2nd normal form (BC is not a
proper subset of candidate key AC), AC -> BE is in 2nd normal form (AC is the candidate
key) and B -> E is in 2nd normal form (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC -> D neither is BC a super key nor is D
a prime attribute, and in B -> E neither is B a super key nor is E a prime attribute; to
satisfy 3rd normal form, either the LHS of an FD must be a super key or the RHS must be a
prime attribute.
So the highest normal form of the relation is 2nd normal form.
• Example 2 – Consider relation R(A, B, C)
A -> BC,
B ->
A and B are both super keys, so the above relation is in BCNF.
Key Points –
1. BCNF is free from redundancy caused by functional dependencies.
2. If a relation is in BCNF, then 3NF is also satisfied.
3. If all attributes of a relation are prime attributes, then the relation is always in 3NF.
4. A relation in a relational database is always at least in 1NF.
5. Every binary relation (a relation with only 2 attributes) is always in BCNF.
6. If a relation has only singleton candidate keys (i.e., every candidate key consists of only 1
attribute), then the relation is always in 2NF (because no partial functional dependency is
possible).
7. Sometimes decomposing into BCNF may not preserve every functional dependency. In that
case, go for BCNF only if the lost FD(s) are not required; otherwise normalize only up to 3NF.
8. There are more normal forms beyond BCNF, such as 4NF and 5NF, but in real-world
database systems it is generally not required to go beyond BCNF.

Exercise 1: Find the highest normal form of R(A, B, C, D, E) under the following functional
dependencies.
ABC -> D
CD -> AE
Important points for solving this type of question:
1) It is always a good idea to start checking from BCNF, then 3NF and so on.
2) If a functional dependency satisfies a normal form, there is no need to check it against
lower normal forms. For example, ABC -> D is in BCNF (note that ABC is a super key), so there
is no need to check this dependency for lower normal forms.
Candidate keys in the given relation are {ABC, BCD}.
BCNF: ABC -> D is in BCNF. Let us check CD -> AE: CD is not a super key, so this
dependency is not in BCNF. So, R is not in BCNF.
3NF: We don't need to check ABC -> D, as it already satisfies BCNF. Consider CD -> AE:
since E is not a prime attribute, the relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependency. CD is a proper subset of the candidate
key BCD and it determines E, which is a non-prime attribute. So, the given relation is also not
in 2NF. So, the highest normal form is 1NF.
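
The top-down procedure from the important points (check BCNF first, then 3NF, then 2NF) can be sketched as follows. All function names are illustrative, attributes are assumed to be single letters, the relation is assumed to be in 1NF already, and the brute-force key search only suits small exercises.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure covers every attribute."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(attrs):
                keys.append(s)
    return keys

def highest_normal_form(attrs, fds):
    """Return 'BCNF', '3NF', '2NF' or '1NF' (1NF is assumed to hold)."""
    attrs = set(attrs)
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    # Split each FD into single-attribute right-hand sides, dropping trivial parts.
    unit = [(set(lhs), a) for lhs, rhs in fds for a in rhs if a not in lhs]

    def is_super_key(x):
        return closure(x, fds) == attrs

    if all(is_super_key(lhs) for lhs, _ in unit):
        return "BCNF"
    if all(is_super_key(lhs) or a in prime for lhs, a in unit):
        return "3NF"
    # 2NF fails if a proper subset of some candidate key determines
    # a non-prime attribute (a partial dependency).
    for key in keys:
        for size in range(1, len(key)):
            for subset in combinations(sorted(key), size):
                if closure(subset, fds) - prime:
                    return "1NF"
    return "2NF"

print(highest_normal_form("ABCDE", [("ABC", "D"), ("CD", "AE")]))  # 1NF
```

Calling the same function with the FD set {BC -> D, AC -> BE, B -> E} from the earlier BCNF example returns 2NF, in line with the manual derivation there.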
Fourth normal form (4NF):

Fourth normal form (4NF) is a level of database normalization in which every non-trivial
multivalued dependency has a super key as its determinant. It builds on the first three normal
forms (1NF, 2NF and 3NF) and the Boyce-Codd Normal Form (BCNF): in addition to meeting the
requirements of BCNF, a relation must not contain any non-trivial multivalued dependency
X ->-> Y in which X is not a super key.
Properties – A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any non-trivial multivalued dependency whose determinant is not
a super key.
A table with such a multivalued dependency violates Fourth Normal Form (4NF) because it
creates unnecessary redundancy and can contribute to inconsistent data. To bring it up to 4NF,
it is necessary to break the information into two tables.
Example – Consider the database of a class, which has two relations: R1 contains student ID
(SID) and student name (SNAME), and R2 contains course ID (CID) and course name (CNAME).

Table – R1(SID, SNAME)
SID  SNAME
S1   A
S2   B

Table – R2(CID, CNAME)
CID  CNAME
C1   C
C2   D
When their cross product is taken, it results in multivalued dependencies:

Table – R1 X R2
SID SNAME CID CNAME
S1 A C1 C
S1 A C2 D
S2 B C1 C
S2 B C2 D
Multivalued dependencies (MVD) are:
SID ->-> {CID, CNAME}; SNAME ->-> {CID, CNAME}
(CID and CNAME are taken together because each CID always appears with the same CNAME.)
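
A multivalued dependency X ->-> Y can be tested on a table instance directly from its definition: whenever two rows agree on X, the row that takes its Y values from the first and all other values from the second must also be present. The sketch below applies this to the cross product above; `mvd_holds` is a hypothetical helper, and the check treats the course attributes CID and CNAME as one group, since they vary together in R2.

```python
from itertools import product

R1 = [("S1", "A"), ("S2", "B")]   # (SID, SNAME)
R2 = [("C1", "C"), ("C2", "D")]   # (CID, CNAME)
COLS = ("SID", "SNAME", "CID", "CNAME")

# Cross product R1 X R2: every student row paired with every course row.
cross = {s + c for s, c in product(R1, R2)}

def mvd_holds(rows, cols, lhs, rhs):
    """True if the multivalued dependency lhs ->-> rhs holds in `rows`."""
    x = [cols.index(c) for c in lhs]
    y = [cols.index(c) for c in rhs]
    for t1 in rows:
        for t2 in rows:
            if all(t1[i] == t2[i] for i in x):
                # Y values from t1, every other value from t2.
                mixed = tuple(t1[i] if i in y else t2[i] for i in range(len(cols)))
                if mixed not in rows:
                    return False
    return True

print(len(cross))
print(mvd_holds(cross, COLS, ["SID"], ["CID", "CNAME"]))
```

The check passes because each student is paired with every course independently of the student's other attributes, which is the kind of redundancy 4NF is meant to remove.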
Join dependency – A join dependency is a further generalization of multivalued dependencies.
If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency
(JD) exists, where R1(A, B, C) and R2(C, D) are the decomposition of a given relation
R(A, B, C, D). Alternatively, R1 and R2 are a lossless decomposition of R. A JD
⋈ {R1, R2, …, Rn} is said to hold over a relation R if R1, R2, …, Rn is a lossless-join
decomposition of R. Thus *((A, B, C), (C, D)) is a JD of R if the join of the projections of R
on these attribute sets is equal to the relation R. Here, *(R1, R2, R3, …) is used to indicate
that relations R1, R2, R3 and so on are a JD of R.
Let R be a relation schema and R1, R2, R3, …, Rn be a decomposition of R. r(R) is said to
satisfy the join dependency if and only if
π_R1(r) ⋈ π_R2(r) ⋈ … ⋈ π_Rn(r) = r

Example –

Table – R1
COMPANY PRODUCT
C1 pendrive
C1 mic
C2 speaker
C2 speaker
Company->->Product

Table – R2

AGENT COMPANY
AMAN C1
AMAN C2
MOHAN C1
Agent->->Company

Table – R3
AGENT PRODUCT
AMAN PENDRIVE
AMAN MIC
AMAN SPEAKER
MOHAN SPEAKER
Agent->->Product

Table – R1⋈R2⋈R3
COMPANY PRODUCT AGENT
C1 pendrive Aman
C1 mic Aman
C2 speaker Aman
C1 speaker Aman
Agent->->Product

Fifth Normal Form / Project-Join Normal Form (5NF):

A relation R is in 5NF if and only if every join dependency in R is implied by the candidate keys
of R. A decomposition of a relation must have the lossless-join property, which ensures that no
spurious or extra tuples are generated when the relations are reunited through a natural join.
Properties – A relation R is in 5NF if and only if it satisfies the following conditions:
1. R should already be in 4NF.
2. It cannot be decomposed any further without loss (it has no join dependency that is not
implied by its candidate keys).
Example – Consider the above schema, with a case as “if a company makes a product and an
agent is an agent for that company, then he always sells that product for the company”. Under
these circumstances, the ACP table is shown as:

Table – ACP
AGENT COMPANY PRODUCT
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut

The relation ACP is decomposed into three relations, R1, R2 and R3, which are shown below:

Table – R1
AGENT COMPANY
A1 PQR
A1 XYZ
A2 PQR

Table – R2
AGENT PRODUCT
A1 Nut
A1 Bolt
A2 Nut

Table – R3
COMPANY PRODUCT
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
Result of the natural join of R1 and R3 over 'Company', followed by the natural join of R13 and
R2 over 'Agent' and 'Product', will be table ACP.
Hence, in this example, all the redundancies are eliminated, and the decomposition of ACP is
a lossless-join decomposition. Therefore, the decomposed relations R1, R2 and R3 are in 5NF,
and the decomposition does not violate the lossless-join property.
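
The lossless-join claim can be checked by projecting ACP onto the three attribute pairs and joining the projections back together. This is a minimal sketch using Python sets; the variable names are chosen for this illustration only.

```python
# ACP rows as (AGENT, COMPANY, PRODUCT).
ACP = {
    ("A1", "PQR", "Nut"), ("A1", "PQR", "Bolt"),
    ("A1", "XYZ", "Nut"), ("A1", "XYZ", "Bolt"),
    ("A2", "PQR", "Nut"),
}

# The three projections R1, R2 and R3.
agent_company = {(a, c) for a, c, p in ACP}
agent_product = {(a, p) for a, c, p in ACP}
company_product = {(c, p) for a, c, p in ACP}

# Natural join of R1 and R3 over Company.
r13 = {(a, c, p)
       for (a, c) in agent_company
       for (c2, p) in company_product
       if c == c2}

# Natural join of R13 and R2 over Agent and Product.
rejoined = {(a, c, p) for (a, c, p) in r13 if (a, p) in agent_product}

print(sorted(r13 - ACP))   # tuples that exist only in the intermediate join
print(rejoined == ACP)
```

The intermediate join R13 contains one extra tuple, (A2, PQR, Bolt), which the final join with R2 removes, so the three projections together reproduce ACP exactly.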
