databases

How to Normalize Databases (II)

author

Luis Paredes

published

Jul 25, 2023

Welcome to the second and final installment of our blog series on database normalization. In this part, we will build upon the foundational concepts discussed in the previous article and explore the remaining normal forms used to remove redundancy in database tables.

So, without further ado, let's get started!

Multivalued dependencies

So far, the previous NF have dealt with functional dependencies, i.e. dependencies involving only two items in a given record, however, there's another type of dependency that can arise involving more than only two columns, and 4NF and 5NF are meant to address this extra type of dependency.

The type of dependency we're talking about is called multivalued dependency and is defined as follows: a multivalued dependency exists when there are at least three attributes, let's say attributes X, Y and Z, where for a value of X there are well defined values of Y and Z, but Y values and Z values are independent from one another.

A typical example occurs when creating a database for products or services that have feature sets tied to some feature set but independent from another:

Let's consider a scenario in which we have a bicycle shop that sells different bicycle models, and depending on the model there's a limited set of color and rim size options:

  • Model 1
    • Available colors
      • Yellow
      • Blue
    • Available rim sizes
      • 16 in
      • 20 in
  • Model 2
    • Available colors
      • Black
      • Grey
    • Available rim sizes
      • 24 in
      • 26 in
  • Model 3
    • Available colors
      • Black
      • Red
    • Available rim sizes
      • 27.5 in
      • 29 in

In this scenario we have a set X (Model #) that determines both the available colors and the rim sizes of each record, but the colors and the rim sizes are independent from one another.

Fourth Normal Form (4NF)

Using the example introduced in the previous section, we can create a table that holds the information regarding the bicycles available like this:

ModelColorRim Size
Model 1Yellow16 in
Model 1Yellow20 in
Model 1Blue16 in
Model 1Blue20 in
Model 2Black24 in
Model 2Black26 in
Model 2Grey24 in
Model 2Grey26 in
Model 3Black27.5 in
Model 3Black29 in
Model 3Red27.5 in
Model 3Red29 in

Notice that:

  • Model + Color + Rim Size uniquely identify each record, i.e. they make the key
  • The table is in 3NF because there are no non-key attributes

However, if we were to add a new rim size or color for any of the models we risk leaving the table in an inconsistent state if we forget to add all the possible combinations, and this is exactly the reason why the fourth normal form exists (4NF).

For a table to be in 4NF: any multivalue dependency in a table must be a multivalued dependency on the key.

If we use the definition to analyze the example, we find that there are two multivalued dependencies, namely:

  • Model ↠ Colors
  • Model ↠ Rim sizes

And we also notice that both depend on Model instead of the key (Model + Color + Rim size), which means that the table doesn't satisfy 4NF.

Conforming to 4NF

If we split the table as follows:

Available Colors
ModelColor
Model 1Yellow
Model 1Blue
Model 2Black
Model 2Grey
Model 3Black
Model 3Red
Available Rim Sizes
ModelRim Size
Model 116 in
Model 120 in
Model 224 in
Model 226 in
Model 327.5 in
Model 329 in

We guarantee that each multivalued dependency depends only on the key, since the key for each of the tables is the Model.

As a result of this, we eliminate the possibility of overlooking any possible combination if we introduce a new Color or a new Rim Size for any of the models.

Join dependencies

If you're familiar with databases you've may heard of joins, either as relational algebra operation or as an SQL JOIN operation. In both scenarios, the concept revolves around the idea of combining the records of two or more tables based on their common attributes.

The existence of this operator introduces a new type of dependency called join dependency, which can be described as follows: a table T is subject to a join dependency if that table can be recreated by joining several tables, each having a subset of the table T.

Notice that, if one of the tables in the dependencies has all of the attributes of T, the join dependency is called trivial.

Join dependencies are addressed by the last normal form focused on removing redundancy in databases.

Fifth Normal Form (5NF)

If we take into account the definitions in the previous section, the fifth normal form (5NF), also known as projection–join normal form (PJ/NF) can be described as follows: the table cannot be described as the logical result of joining some other tables together.

Resources

Conclusion

In conclusion, database normalization is a crucial process for organizing data efficiently and reducing redundancy in tables. In this second part of our blog series, we delved into the concepts of multivalued dependencies and join dependencies, which led us to explore the Fourth Normal Form (4NF) to address the former and the Fifth Normal Form (5NF) to address the latter.

By adhering to these normal forms, we can create well-structured and efficient databases that minimize data redundancy and improve data integrity.

I hope this series has provided valuable insights into database normalization and helps you apply these principles to your own projects.