Normalization is part of the process of designing database structures in a relational database. The point of normalization is to reduce duplicate information and avoid anomalies (i.e. insert, update and delete anomalies). In this post, I want to explain the three main normalization forms using a real table. But first, I will provide a brief overview of the three main normal forms.
First Normal Form (1NF):
>All non-key attributes must depend on the key
>Without repeating or nested groups
Second Normal Form (2NF):
>All non-key attributes must depend on the whole key (not just part of it)
>Partial functional dependencies must be removed into separate tables
Third Normal Form (3NF):
>All non-key attributes must depend on nothing but the key
>Transitive dependencies must be removed into separate tables
I will hereby apply these three normal forms to the table below.
Explanation: To arrive at 1NF all non-key attributes must depend on the key, so we should remove all repeating or nested groups. we can tell from the table above that EMP_ID and Proj_ID are the key attributes becasue every employee requires an EMP_ID and every employee must be associated with at least one project.
In the table above some employees are involved in more than one project. Notice that in order to take this table to 1NF, I have placed each project in a seperate row along with the role and rate associated with that project for each employee. The table below is in the First Normal Form.
Explanation: To arrive at 2NF all non-key attributes must depend on the whole key (not just part of it). note that 2NF must only be applied to those tables where the key is more than one column.
To go into 2NF we identify all of the functional dependencies and then remove all partial functional dependencies into seperate tables. For the table above to go into 2NF, the Job and Full Name columns must be moved into a seperate table. The illustration below shows our data in the Second Normal Form.
Explanation: To arrive at 3NF all non-key attributes must depend on nothing but the key, and transitive dependencies must be removed. In the two tables above (i.e . in 2NF ) ,EMP_ID and Proj_ID determine the role and role determines the rate, and this is a transitive dependecy. This menas that while role directly determines the rate, EMP_ID and Proj_ID also determine the rate.
To get rid of this transitive dependency, I have created a seperate table for Role and Rate and created a new primary key for that table (i.e Role_ID) which is reflected in other tables where the role column exists. The tables below illustrate the data in theThird Normal Form.