As with anything, there is a correct way of creating a database and an incorrect way; more notably, there is an efficient way and an inefficient way. So far, many of the databases you have created will fall into the latter category. This was, and still is, a major problem for database designers. Understanding what makes a database efficient, scalable and robust is fundamental to creating a stable and long-lasting one. As databases provide the meat of most projects, it is little wonder that there are formal methods to follow to help create a good database. The process we will be looking at is normalisation. The aims of normalising a database are to:

  • Eliminate data redundancy
  • Reduce or eliminate update and delete anomalies
  • Protect data consistency

Normalisation was first introduced in 1972 by E. F. Codd and takes the form of a series of three tests on a database. In order for the database to pass the three tests it must have certain desirable properties. These tests are called first, second and third normal form. We shall look at each form, and how to apply it, later. First, however, we shall look at the three desirable properties and see why a database should exhibit them.

Eliminating data redundancy and anomalies

  Name    Age   DVD owned
  Bob     15    Shrek
  Bob     15    Star Wars
  Sally   14    Shrek
Above we have a simple database showing people and the DVDs they own. As we can see, Bob has Shrek and Star Wars, while Sally has just Shrek. Of course, it is likely that Bob actually owns many more DVDs. If that were the case, each DVD he owned would have a row in the table with his name in it. We are effectively repeating his name over and over again. This is an example of data redundancy, and it causes a large number of problems. For example, consider the situation where Bob changes his name to “Fred”. In the current database we would have to update a large number of rows in order to keep our data consistent. This raises the question: what happens if not all the rows are updated? The answer is that data becomes lost. It will still physically be in the database, but a query looking for all of Fred's DVDs will no longer return the correct result.
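The update anomaly described above can be demonstrated in a few lines. The following is a minimal sketch using an in-memory SQLite database via Python's sqlite3 module; the table and column names are illustrative, not part of any real schema.

```python
import sqlite3

# Build the unnormalised table from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dvds (name TEXT, age INTEGER, dvd TEXT)")
conn.executemany(
    "INSERT INTO dvds VALUES (?, ?, ?)",
    [("Bob", 15, "Shrek"), ("Bob", 15, "Star Wars"), ("Sally", 14, "Shrek")],
)

# Bob changes his name to Fred, but suppose only one of his rows is
# updated -- say the update was mistakenly written against a single DVD.
conn.execute(
    "UPDATE dvds SET name = 'Fred' WHERE name = 'Bob' AND dvd = 'Shrek'"
)

# A query for all of Fred's DVDs now misses Star Wars, which is still
# recorded under the old name: the data has become inconsistent.
rows = conn.execute("SELECT dvd FROM dvds WHERE name = 'Fred'").fetchall()
print([r[0] for r in rows])  # ['Shrek'] -- Star Wars is effectively lost
```

Nothing failed and no error was raised, yet the database now quietly holds two names for the same person, which is exactly why redundancy is dangerous.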

The efficiency of the database suffers as well. Every update and delete has to examine numerous records. If we had a similar database showing books published by a publisher, we could potentially have thousands of records to update.

This problem also occurs when inserting data. When we add a new DVD for Bob we must ensure that we enter exactly the same values for his name and age. If we do not, we will have inconsistent data. This has several consequences. Firstly, not all of the data will be found by a search. Secondly, the data that is returned may not be accurate or trustworthy. Deleting has the same problem, as we will need to delete every matching row rather than just a single one.
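To preview the kind of fix normalisation provides, here is a sketch of the same data with the repeated name and age factored out into their own table. Again this uses Python's sqlite3 module, and the table names, column names and id values are illustrative assumptions.

```python
import sqlite3

# Split the data into two tables: one row per person, one row per
# DVD ownership, linked by a person_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (person_id INTEGER PRIMARY KEY,
                         name TEXT, age INTEGER);
    CREATE TABLE dvds (person_id INTEGER REFERENCES people(person_id),
                       dvd TEXT);
    INSERT INTO people VALUES (1, 'Bob', 15), (2, 'Sally', 14);
    INSERT INTO dvds VALUES (1, 'Shrek'), (1, 'Star Wars'), (2, 'Shrek');
""")

# Renaming Bob now touches exactly one row, so the data cannot end up
# half-updated the way the single-table version could.
conn.execute("UPDATE people SET name = 'Fred' WHERE person_id = 1")

# Joining the tables finds both of Fred's DVDs under the new name.
rows = conn.execute("""
    SELECT d.dvd FROM dvds d
    JOIN people p ON p.person_id = d.person_id
    WHERE p.name = 'Fred'
""").fetchall()
print(sorted(r[0] for r in rows))  # ['Shrek', 'Star Wars']
```

Inserting a new DVD for Fred now only needs his person_id, so there is no chance of mistyping his name or age, and deleting a person is a single row in the people table plus their rows in the dvds table.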

By now it should be fairly clear that there is a case for normalisation. We shall now look at each of the normal forms and how to carry them out.