Normalize

Relational database design is supported by a rigorous process called normalization. Normalization is based upon applied mathematics, specifically Set Relationship Mathematics or the mathematics of sets. Set relationship mathematics was referred to in the 60's and 70's as "the New Math."

Set relationship mathematics was first applied to the design of databases by Dr. E. F. Codd in the late 1960s and early 1970s at IBM. Dr. Codd published the original paper, "A Relational Model for Large Shared Data Banks" in the Communications of the ACM Journal, Volume 13, Number 6, in June 1970.

"The whole key and nothing but the key, so help me Codd."

Although normalization has been around for 50 years, it is still not broadly understood and usually poorly executed. Everybody has a reason to "denormalize" or just not do it in the first palce because [insert bullshit reason here]. Then there's always, "normalize, what's that?"

In the design of databases, sets are collections of data that describe an organization, person, location, activity or thing. Sets may be related to other sets. The relationships that exist between sets, called "Relations", produce additional data that describe, date, qualify and quantify the relationship. Data sets are stored as tables in a relational database management system.

When a table is normalized, it is said to be "free from defects." A normalized table is also simple, stable and easily changed with little or no impact upon the application it supports. A normalized database design is also very SQL-friendly. SQL-friendly means the Structured Query Language can correctly store and retrieve data with little or no additional programming algorithms.

There are three tests you apply to an entity (table) to determine if it is normalized (the 3rd Normal Form).

• Test for 1st Normal Form - Test for Repeating Groups - eliminates complex processing algorithms

• Test for 2nd Normal Form - Test for Partial Dependency - eliminates redundant data

• Test for 3rd Normal Form - Test for Non-Dependency - validates data is in the correct entity

There is one additional test, that is not part of Normalization but extremely necessary and that is the test for "Chronological Correctness." This is a test to ensure that data is being stored at the same time that it is created in the business process.

The main reason to normalize is to identify all the implied data and relationships that are not explicitly stated in the requirements. A form is only a view of a database. It only contains some of the data from the database. It is the role of the Data Analyst to identify all the explicit data, but also to identify all of the data and relationships that are implied by the application and to foresee all of the ways those data and relationships will be used.