databases
How to Normalize Databases (II)
author
Luis Paredes
published
Jul 25, 2023
Welcome to the second and final installment of our blog series on database normalization. In this part, we will build upon the foundational concepts discussed in the previous article and explore the remaining normal forms used to remove redundancy in database tables.
So, without further ado, let's get started!
Multivalued dependencies
So far, the previous NF have dealt with functional dependencies, i.e. dependencies involving only two items in a given record, however, there's another type of dependency that can arise involving more than only two columns, and 4NF and 5NF are meant to address this extra type of dependency.
The type of dependency we're talking about is called multivalued dependency and is defined as follows: a multivalued dependency exists when there are at least three attributes, let's say attributes X, Y and Z, where for a value of X there are well defined values of Y and Z, but Y values and Z values are independent from one another.
A typical example occurs when creating a database for products or services that have feature sets tied to some feature set but independent from another:
Let's consider a scenario in which we have a bicycle shop that sells different bicycle models, and depending on the model there's a limited set of color and rim size options:
- Model 1
- Available colors
- Yellow
- Blue
- Available rim sizes
- 16 in
- 20 in
- Available colors
- Model 2
- Available colors
- Black
- Grey
- Available rim sizes
- 24 in
- 26 in
- Available colors
- Model 3
- Available colors
- Black
- Red
- Available rim sizes
- 27.5 in
- 29 in
- Available colors
In this scenario we have a set X (Model #) that determines both the available colors and the rim sizes of each record, but the colors and the rim sizes are independent from one another.
Fourth Normal Form (4NF)
Using the example introduced in the previous section, we can create a table that holds the information regarding the bicycles available like this:
Model | Color | Rim Size |
---|---|---|
Model 1 | Yellow | 16 in |
Model 1 | Yellow | 20 in |
Model 1 | Blue | 16 in |
Model 1 | Blue | 20 in |
Model 2 | Black | 24 in |
Model 2 | Black | 26 in |
Model 2 | Grey | 24 in |
Model 2 | Grey | 26 in |
Model 3 | Black | 27.5 in |
Model 3 | Black | 29 in |
Model 3 | Red | 27.5 in |
Model 3 | Red | 29 in |
Notice that:
- Model + Color + Rim Size uniquely identify each record, i.e. they make the key
- The table is in 3NF because there are no non-key attributes
However, if we were to add a new rim size or color for any of the models we risk leaving the table in an inconsistent state if we forget to add all the possible combinations, and this is exactly the reason why the fourth normal form exists (4NF).
For a table to be in 4NF: any multivalue dependency in a table must be a multivalued dependency on the key.
If we use the definition to analyze the example, we find that there are two multivalued dependencies, namely:
- Model ↠ Colors
- Model ↠ Rim sizes
And we also notice that both depend on Model instead of the key (Model + Color + Rim size), which means that the table doesn't satisfy 4NF.
Conforming to 4NF
If we split the table as follows:
Model | Color |
---|---|
Model 1 | Yellow |
Model 1 | Blue |
Model 2 | Black |
Model 2 | Grey |
Model 3 | Black |
Model 3 | Red |
Model | Rim Size |
---|---|
Model 1 | 16 in |
Model 1 | 20 in |
Model 2 | 24 in |
Model 2 | 26 in |
Model 3 | 27.5 in |
Model 3 | 29 in |
We guarantee that each multivalued dependency depends only on the key, since the key for each of the tables is the Model.
As a result of this, we eliminate the possibility of overlooking any possible combination if we introduce a new Color or a new Rim Size for any of the models.
Join dependencies
If you're familiar with databases you've may heard of joins, either as relational algebra operation or as an SQL JOIN
operation. In both scenarios, the concept revolves around the idea of combining the records of two or more tables based on their common attributes.
The existence of this operator introduces a new type of dependency called join dependency, which can be described as follows: a table T is subject to a join dependency if that table can be recreated by joining several tables, each having a subset of the table T.
Notice that, if one of the tables in the dependencies has all of the attributes of T, the join dependency is called trivial.
Join dependencies are addressed by the last normal form focused on removing redundancy in databases.
Fifth Normal Form (5NF)
If we take into account the definitions in the previous section, the fifth normal form (5NF), also known as projection–join normal form (PJ/NF) can be described as follows: the table cannot be described as the logical result of joining some other tables together.
Resources
- Functional dependency on Wikipedia
- Multivalued dependency on Wikipedia
- Fourth Normal Form on Wikipedia
- Fifth Normal Form on Wikipedia
- Join dependency on Wikipedia
- Learn Database Normalization - 1NF, 2NF, 3NF, 4NF, 5NF by Decomplexify on YouTube
Conclusion
In conclusion, database normalization is a crucial process for organizing data efficiently and reducing redundancy in tables. In this second part of our blog series, we delved into the concepts of multivalued dependencies and join dependencies, which led us to explore the Fourth Normal Form (4NF) to address the former and the Fifth Normal Form (5NF) to address the latter.
By adhering to these normal forms, we can create well-structured and efficient databases that minimize data redundancy and improve data integrity.
I hope this series has provided valuable insights into database normalization and helps you apply these principles to your own projects.