Normalization and Demoralization
Normalization is a process of eliminating the redundant data and storing the related information in a
table.
The key points of Normalization are as below:
Eliminating Redundant Data
Faster Update
Improve Performance
Performance in Indexes
Below are the different Normal Forms
First Normal Form:
If the table is said to be in the 1st Normal Form it should follow the below rules
Each cell must have one value. Eliminating Duplicate Columns
Create separate table for the group of the related data and each row should be identified by
Primary Key.
Let us take an example here:
Name Departmen Phone Number Salary TAX
t
Name 1 Computers 12354588, 6887890, 123456 4000 40
Name 2 Electronics 12345678 2345666,7896543333 5000 50
Name 3 Civil 4567890 3000 30
In the above table we see that there are different phone numbers for the single Name and we have
to remove these duplicates by uniquely identifying each of them and giving a unique identification
by giving them the Primary Key.
Now the below table is redesigned such that it is in the First Normal Form.
ID Name Departmen Phone Salary TAX
t Number
1 Name 1 Computers 12354588 4000 40
2 Name 1 Computers 6887890 4000 40
3 Name 1 Computers 123456 4000 40
4 Name 2 Electronics 12345678 4000 40
5 Name 2 Electronics 2345666 4000 40
6 Name 2 Electronics 789654333 4000 40
7 Name 3 Civil 4567890 3000 30
Second Normal Form:
If the table is said to be in the 2st Normal Form it should follow the below rules
It should satisfy the 1st Normal Form.
Separate the particular Columns, values are duplicated in each row should be placed in the
separate Table.
Create the relationship between the tables.
Here in the above table we see that the Name and the department columns are duplicated and in
order to handle this we need to maintain the duplicates in the different table as below:
EMPID Name Department Salary TAX
1 Name 1 Computers 4000 40
2 Name 2 Electronics 4000 40
3 Name 3 Civil 3000 30
ID EMPID Phone
Number
1 1 12354588
2 1 6887890
3 1 123456
4 2 12345678
5 2 2345666
6 2 789654333
7 3 4567890
Here in these tables above EMPID is treated as the primary Key for the First Table and the Foreign
Key for the Second Table.
Third Normal Form:
If the table is said to be in the 3rd Normal Form it should follow the below rules
It should satisfy the 2nd Normal Form.
Separate the particular Columns that are not dependent on the primary key of the table.
EMPID Name Department Salary
1 Name 1 Computers 4000
2 Name 2 Electronics 4000
3 Name 3 Civil 3000
ID EMPID Phone
Number
1 1 12354588
2 1 6887890
3 1 123456
4 2 12345678
5 2 2345666
6 2 789654333
7 3 4567890
Salary TAX
4000 40
3000 30
Fourth Normal Form:
If the table is said to be in the 4thNormal Form it should follow the below rules
It should satisfy the 3rdNormal Form.
The Non-Key columns should be dependent on fully primary key instead of partial key, fits
not separate the tables.
Here in the above as we have already put a primary key in the table during the First Normal Form the
above tables are in Fourth Normal Form also.
De-Normalization:
De-normalization is a database optimization technique in which we add redundant (repetitive) data
to one or more tables. This can help us avoid costly joins in a relational database. Note that de-
normalization does not mean not doing normalization. It is an optimization technique that is applied
after doing normalization.
Example:
We have two table students and branch after performing normalization. The student table has the
attributes Roll_no, stud-name, age, and Branch_id.
Roll_No Student_Name Age Branch_ID
11 Stevan 23 1
12 Mark 22 2
13 Nishem 25 1
14 Taylor 21 3
15 Kane 26 4
Additionally, the branch table is related to the student table with branch_id as the student table's
foreign key.
Branch_I Branch_Name Total_students
D
1 EC 150
2 CSE 100
3 Civil 60
4 Mech 80
A JOIN operation between these two tables is needed when we need to retrieve all student names
as well as the branch name. Suppose we want to change the student name only, then it is great if
the table is small. The issue here is that if the tables are big, joins on tables can take an excessively
long time.
In this case, we'll update the database with de-normalization, redundancy, and extra effort to
maximize the efficiency benefits of fewer joins. Therefore, we can add the branch name's data from
the Branch table to the student table and optimizing the database.