[ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ]

The Normal Forms

The next sections of this paper will describe each of the normal forms and how they are applied.  There will be examples used to describe the form and its application.  The examples chosen are obviously wrong and are designed to clearly demonstrate the normal form being discussed.

In your actual design work the normalization problems will probably be more subtle and require a much more careful study to discover and repair.

1st Normal Form (1NF)

Reduce entities to first normal form (1NF) by removing repeating or multi-valued attributes to another, child entity.

To understand 1st Normal Form we will use the table design below.

To discover the problem in this design we must consider the domains for the fields in the table.  The CustID is defined as the customer Primary key ID, the Name is the name of the customer, Contact1 is the name of a contact person, Contact2 is the name of a contact person, and Contact3 is the name of a contact person.

The fact that Contact1, 1, and 3 all have the same domain definition proves that in fact there is only one attribute, contact person, and that we need multiple values for that attribute.  This is a multi-valued attribute.

The 1st NF design for this situation is shown below.

Notice the creation of the new entity for Contacts and the relation of that entity to the original Customer entity.  Using this new design the customer can have any number of contacts from none to the capacity of the table storing the contact names.

What about the client who tells us that their customer will never have more than three contact names?  Do we really need to do this for those situations?

Well, reread what I said earlier about clients and the word never.  Besides that, if we provide the three fields for contact names and most customers have only one name, we are wasting a lot of space.  For a contact name of 40 characters and 1 million customer records that would amount to approximately 40 MB of wasted space.

Also, the first customer that comes along with four or more contact names would require that the user either use two customer records, not store all of the contact names, or pay for a revision to the data design to allow the fourth name.  With the 1st Normal Form structure none of these things are an issue.  If the customer has only one contact then there is only one record in the Contacts table.  If the customer has 300 contact names, then there are 300 records in the contacts table.

Reduce entities in 1NF to 2NF by removing attributes that are not dependent on the whole primary key.

2nd Normal Form (2NF)

The figure below will be used to study this normal form.

The primary key for the invoice details table in the figure is the combination of InvNo and LineNo.  The two fields together comprise the primary key.  2nd NF deals with non-key attributes that are not dependent on the entire primary key but rather only on part of it.

The ItemID and Price Quantity are dependent on the whole primary key.  You cannot know the item sold or its quantity price break without knowing the invoice and which line of the invoice you are interested in.

However the CustID will remain the same for all lines on an invoice.  This means that CustID is dependent on the InvNo only and not ion the LineNo.  CustID is dependent on part of the primary key.

To fix this we move the CustID field to another table where it is dependent on the whole primary key.

3rd Normal Form (3NF)

Reduce entities in 2NF to 3NF by removing attributes that depend on other, non-key attributes (other than alternate keys).

The golden rule of relational databases is, “the key, the whole key, and nothing but the key”.  The 3rd normal form deals with attributes that are codependent on the primary key and another, non-key, attribute.  The figure below shows a table design that violates the 3rd normal form.

With the 3rd normal form we are trying to identify non-key attributes that have a dependency on other non-key attributes (other than alternate keys).  In figure 13 the there are four non-key attributes that are all dependent on the primary key, that is to know the VendorID, VendorCity, Date, or Terms of a purchase order you must know which purchase order you are looking at.  However the VendorCity is also dependent on the VendorID for its value.  That is if you change the VendorID on a purchase order the VendorCity will also need to change.

The solution for this example is shown in below.

We have moved the VendorCity out of the purchase order table and put it in the Vendor table where the VendorID is the primary key.

Perhaps you have heard someone say that it is not a good design, in a relational database, to store the results of a calculation in a table.  Why not?  What rule does this break?  It violates 3rd normal form.

If I have a table for invoice detail lines and it has a UnitPrice field, a quantity field, and a TotalPrice field (which is calculated by multiplying the UnitPrice by the Quantity) then I have at least one field that is codependent, the TotalPrice field.  The TotalPrice for a line is dependent on the line number, but it is also dependent on both the UnitPrice and the Quantity.   If either UnitPrice or Quantity changes then the TotalPrice will also need to change.

Y)   Is 3rd Normal Form good enough?

I have often heard people say that 3rd normal form is good enough; perhaps you have too.  Is this true?  Is 3rd normal form good enough?  Well, I would have to ask that if 3rd normal form was as far as it is necessary to go with normalization then why are there three more normal forms after 3rd?

n truth, the next three normal forms only apply in certain specific situations and if none of those situations exist in the data design, then 3rd normal form is 5th normal form an fully normalized.

Z)   Boyce-Codd Normal Form (BCNF)

Reduce entities in 3NF to BCNF by ensuring that they are in 3NF for any feasible choice of candidate key as primary key.

The next normal form is named after the two people who first described it, Boyce and Codd.  This normal form is only required for tables that have more than one candidate for the primary key.  The rule is simple; if the table is in 3rd normal form for the primary key being used, insure that it is also in 3rd normal form for any of the alternate keys as well.

Imagine an employee table that has attributes for Social Security Number, Employee Clock Number, and Employee ID (a surrogate primary key).  3rd normal form would apply the first three rules using the Employee ID as the primary key.  Boyce-Codd normal form would go back and apply the first three rules using the Social Security Number and then using the Employee Clock Number as the primary key.  When the table structure is in 3rd normal form no matter which candidate for primary key is used, then it is in Boyce-Codd normal form.

4th Normal Form (4NF)

Reduce entities in BCNF to 4NF by removing any independently multi-valued components of the primary key to multiple new parent entities.

4th normal form is only applicable when the primary key is comprised of two or more attributes.  With a primary key of only one attribute there is no need to check 4th normal form.  4th and 5th normal forms resolve problems within the primary key itself.

In figure 15 we have a design that is meant to record and track employees, their skills, and their objectives.  The primary key for the table is the combination of the Employee ID, the Skill ID, and the Objective ID.  The problem with this design is the independence of the skill and objective attributes comprising the primary key.

To really understand the nature of the problem, let’s consider some data from this table:
EmpID Skill Objective
Jones Accounting More Money
Jones Accounting Master’s Degree
Jones Public Speaking More Money
Jones Public Speaking Master’s Degree

Looking at the sample data, what would need to happen if Jones was to tell you he had an objective of getting a doctorate degree too?  How many record would you need to ad for that change?  What if he received his Masters Degree?  Again how many records would need to change?  Both situations require that more than one record change in order to record the change in the data.

Below is shown the same information being recorded, but the design is in 4th normal form.  Any of the events asked about in the previous paragraph will only involve one record in the new design.

5th Normal Form (5NF)

Reduce entities in 4NF to 5NF by removing pair-wise cyclic dependencies (appearing within composite primary keys with three or more component attributes) to three or more new parent entities.

The 5th normal form is another one that is only required when the primary key has more than one attribute.  In fact, with 5th normal form the primary key must use three or more attributes.

Reading the definition for this normal form can be stress inducing for sure.  If you take it apart and understand each piece separately it really isn’t that complex.  The definition refers to pair-wise cyclic dependencies.  Pair-wise means taking two attributes at a time, dependencies is referring to the value of one attribute being dependent on the value of another.  The cyclic is simply saying that in a primary key of three attributes you need the value of the other two to determine the value of any one of them.  The figure below shows an example of a 5th normal form problem.

This design is to record information about a retail buying operation.  The requirement is to track the buyers, from whom do they buy, and what do they buy.  The table design has the combination of Buyer, Vendor, and Item as the primary key.

If you analyze the relationship between the components of the primary key in this design you will realize that if you want to know the buyer, you must first determine the vendor and item.  If you want to know the vendor, you need the buyer and item.  Finally if you want the item, you must know the vendor and buyer.  Notice the pair wise (you always need to know two) cyclic (no matter which one you need it is the other two that it depends on) dependency.

To appreciate the nature of the difficulty having a table that is in violation of 5th normal form will present to you, consider the following sample data.
Buyer Vendor Item
Mary Jordache Jeans
Mary Jordache Sneakers
Sally Jordache Jeans
Mary Liz Claiborne Blouses
Sally Liz Claiborne Blouses

Like 4th normal form, the major problem areas with 5th normal form have to do with data updates.  For example, if Liz Claiborne were to introduce a new line of Jeans, how many records would need to be added to this table to reflect that change?  Two, since both Mary and Sally buy from Claiborne and both Mary and Sally buy Jeans.  What if Jordache dropped their line of jeans?  Again, two records need to be modified (actually deleted) to reflect this change.

Below is the design reduced to the 5th normal form.

[ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ]