Relational Databases: Functional Dependency and Normalization

Abstract

Functional dependencies and Normalization play an important role in relational database design. Functional dependencies are key to establishing the relationship between key and non-key attributes in a relation. Normalization process works towards removing anomalies in a relation and prevents data redundancy. This paper, intended to be a graduate research paper, establishes the definitions of these concepts. The paper introduces Functional Dependency and explains its inference rules. The paper also introduces Normalization and various normal forms 1NF thru 5NF including the BCNF. The paper also explains how Functional Dependencies and Normalization are related, why they are important with regards to relational databases, and advantages of designing a normalized database.

Relational Databases: Functional Dependency and Normalization

Definitions and Concepts

Functional Dependency

A functional dependency is a constraint between two sets of attributes from the database. A functional dependency, represented by X → Y, between two sets of attributes X and Y that are subsets of a relation R specifies a constraint that, for any two tuples t1 and t2 in R that have t1[X] = t2[X],they must also have t1[Y] = t2[Y].

This means the values of the set of attributes Y of a tuple in R are determined by the set of attributes X. In other words, the values of the set of attributes X functionally determine the values of the set of attributes Y. We can also say that Y is functionally dependent on Y.

The set of attributes X on the left-hand side of the functional dependency is called determinant and the set of attributes Y on the right-hand side of the functional dependency is called dependent. Despite the mathematical definition a functional dependency cannot be determined automatically. It is a property of the semantics of attributes – the database designers will have to understand how the attributes are related to each other to specify a functional dependency. (Elmasri, Ramez and Shamkant B. Navathe. 2006)

Example

Consider the example of an SSN (Social Security Number) database. Every individual has a unique SSN. So, the other attributes of the relation, like name, address etc. can be determined using the SSN. That makes SSN a determinant, and name, address, the dependents – thus establishing the functional dependencies:

SSN  name

SSN  address

Inference Rules for Functional Dependency

Armstrong’s axioms are a set of inference rules used to infer all the functional dependencies on a relational database. They were developed by William W. Armstrong.

Axiom of reflexivity: if Y is a subset of X, then X determines Y

If Y is a subset of X then X  Y

Axiom of augmentation: if X determines Y, then XZ determines YZ for any Z

 If X  Y, then XZ  YZ

Axiom of transitivity: if X determines Y and Y determines Z, then X must determine Z

 If X  Y and Y  Z, then X  Z

Union: if X determines Y and X determines Z then X must also determine Y and Z

 If X  Y and X  Z, then X  YZ

Decomposition: if X determines Y and Z, then X determines Y and X determines Z separately

 If X  YZ, then X  Y and X  Z

Normalization

Database Normalization is a process that allows the storage of data without unnecessary redundancy and thereby eliminate data inconsistency. A normalized database eliminates anomalies in updating, inserting, and deleting data, which improves the efficiency and effectiveness of the database. Users can maintain and retrieve data from a normalized database without difficulty. Data Normalization can be used by the designer of a database to identify and group together related data elements (attributes) and establish relationships between the groups.

Database Normalization concept and its ‘Normal Forms’ were originally invented by Edgar Codd, the inventor of the relational model. The ‘Normal Forms’ provide the criteria for determining a table’s degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is.

First Normal Form (1NF)

An entity type or table is in 1NF when each of its attributes contain simple values which are atomic and contains no repeating groups of data. The domain of an attribute in an 1NF table must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.

Example

Consider the address attribute in a sales database. It is not an atomic attribute, because it is made up of atomic attributes as street, city, state and zip. For the relation to be in 1NF, the appropriate database design should have the atomic attributes street, city, state and zip instead of an address attribute.

Un-Normalized: sales (date, order_no, product_no, product_description, price, quantity_sold, cust_name, cust_address)

1NF: sales (date, order_no, product_no, product_description, price, quantity_sold, cust_name, cust_street, cust_city, cust_state, cust_zip)

Second Normal Form (2NF)

An entity type or table is in 2NF when it is in 1NF and all its non-key attributes depend on the whole key (i.e., functional dependency). There cannot be partial dependencies.

Example

Continuing with the sales database, the order_no and the product_no form the composite key for the table. There are partial dependencies – date is dependent on order_no, but not product_no – which violates the requirement for 2NF. The product_description is dependent on product_no and not on order_no. Removing these partial dependencies will result in 2NF.

1NF: sales (date, order_no, product_no, product_description, price, quantity_sold, customer_name, customer_street, customer_city, customer_state, customer_zip)

2NF: order (date, order_no, cust_no);

product (product_no, product_description, price);

order_detail (order_no, product_no, quantity_sold);

customer (cust_no, cust_name, cust_street, cust_city, cust_state, cust_zip)

Third Normal Form (3NF)

An entity type or table is in 3NF when it is in 2NF and non-key attributes do not depend on other non-key attributes (i.e., there is no transitive dependency).

Example

Continuing with the sales database, the non-key attributes cust_city and cust_state are dependent on cust_zip which is a non-key attribute. Creating a separate zip table will transform the design into 3NF, where in there are no more dependencies between non-key attributes.

2NF: order (date, order_no, cust_no);

product (product_no, product_description, price);

order_detail (order_no, product_no, quantity_sold);

customer (cust_no, cust_name, cust_street, cust_city, cust_state, cust_zip)

3NF: order (date, order_no, cust_no);

product (product_no, product_description, price);

order_detail (order_no, product_no, quantity_sold);

customer (cust_no, cust_name, cust_street, zip_code);

zip (zip_code, city, state)

Boyce Codd Normal Form (BCNF)

An entity type or table is in BCNF when it is in 3NF and all candidate keys defined for the relation satisfy the test for third normal form.

Example

Continuing with the sales database, all the candidate keys already satisfy the 3NF requirements.

Fourth Normal Form (4NF)

An entity type or table is in 4NF when it is in BCNF and there are no non-trivial multi-valued dependencies. To move from BCNF to 4NF, remove any independently multi-valued components of the primary key to two new parent entities.

Example

For example, a professor can teach multiple subjects and can also mentor multiple students. To be in 4NF, the professor to subjects should be a separate relation and professor to students should be a separate relation – since they are independent of each other

Fifth Normal Form (5NF)

To be in 5NF, a relation decomposed into two relations must have lossless-join property, which ensures that no spurious tuples are generated when relations are reunited through a natural join.

Example

In the sales database example, when the sales database was split into order and product, the natural join of those two tables does not result in loss of data (tuples).

(Russell, Gordon. Chapter 4; Nguyen Kim Anh, Relational Design Theory)

Importance of Functional Dependency and Normalization to Relational Model

How are they related

Normalization theory draws heavily on the theory of functional dependencies. When a database designer sets out to design a database, it is essential to understand the semantics of the data – how the attributes are related to one another. This helps in establishing the functional dependencies between attributes. Once the functional dependencies are identified, the design the database in to a ‘normal form’ of the highest order possible is easier. Rules for each normal form, starting from the 1NF are invariably framed around maintaining the functional dependencies and are also based on the inference rules for functional dependencies (refer Inference Rules section). For example, to be in 2NF the non-key attributes should be dependent on the whole-key, which means the functional dependencies should be satisfied. Similarly, to be in 3NF, transitive dependency should be removed, which can be done if the functional dependencies are established correctly.

In other words, database normalization process ensures an efficient organization of data in database tables, which results in guaranteeing that data dependencies make sense, and also reducing the space occupied by the database via eliminating redundant data.

Why are they necessary for Relational Database model?

Functional dependencies play an important role in relational database design. They are used to establish keys that are used to define normal forms for relations. In addition, they help in deriving constraints based on the relationships between attributes. As a database grows in size and complexity it is essential that order and organization be maintained to control these complexities and minimize errors and redundancy in the associated data. This goal is managed by normalization. Database normalization minimizes data duplication to safeguard databases against logical and structural problems, such as data anomalies.

Get Help With Your Essay
If you need assistance with writing your essay, our professional essay writing service is here to help!
Essay Writing Service

Normalization can help keep the data free of errors and can also help ensure that the size of the database doesn’t grow large with duplicated data. Normalization permits us to design our relational database tables so that they “(1) contain all the data necessary for the purposes that the database is to serve, (2) have as little redundancy as possible, (3) accommodate multiple values for types of data that require them, (4) permit efficient updates of the data in the database, and (5) avoid the danger of losing data unknowingly (Wyllys, R. E., 2002).”

The resulting normalized database is highly efficient, which can be characterized by –

Increased Consistency: Information is stored in one place and one place only, reducing the possibility of inconsistent data.

Easier object-to-data mapping: Highly-normalized data schemas in general are closer conceptually to object-oriented schemas because the object-oriented goals of promoting high cohesion and loose coupling between classes results in similar solutions.

Moreover, a normalized database is advantageous when operations will be write-intensive or when ACID (Atomicity, Consistency, Isolation, Durability) compliance is required. Some advantages include:

Updates run quickly since no data being duplicated in multiple locations.

Inserts run quickly since there is only a single insertion point for a piece of data and no duplication is required.

Tables are typically smaller than the tables found in non-normalized databases. This usually allows the tables to fit into the buffer, thus offering faster performance.

Data integrity and consistency is an absolute must if the database must be ACID compliant. A normalized database helps immensely with such an undertaking.

Searching, sorting, and creating indexes can be faster, since tables are narrower, and more rows fit on a data page.

Minimizes/avoids data modification issues.

(https://en.wikipedia.org/wiki/ACID_(computer_science))

Summary

The paper defined the concept of functional dependency, which is the basic tool for analyzing relational schemas, and discussed some of its properties. Functional dependencies specify semantic constraints among the attributes of a relation schema. Next it described the normalization process for achieving good designs It presented examples to illustrate how by using the general definition of the normal forms, a given relation may be analyzed and decomposed to eventually yield a set of relations in 3NF. The paper also touches not often used BCNF, 4NF and 5NF normal forms.

Then the paper explains how functional dependencies and normalization are inter-related in the design of a relational model database. It explains the importance of functional dependency and normalization in the design of a relational database. A normalized database is highly efficient and has many advantages.

References

Wyllys, R. E., 2002. Database management principles and applications

Elmasri, Ramez and Shamkant B. Navathe. 2006. Fundamentals of Database Systems. 5th ed. Reading, MA: Addison-Wesley

Russell, Gordon. Chapter 4 – Normalization. Database eLearning

Nguyen Kim Anh, Relational Design Theory. OpenStax CNX

Gaikwad, A.S., Kadri, F.A., Khandagle, S.S., Tava, N.I. (2017) Review on Automation Tool for ERD Normalization. International Research Journal of Engineering and Technology (IRJET) [Online]. 4 (2), pp. 1323-1325. [Accessed 07 May 2017]. Available from: https://www.irjet.net/archives/V4/i2/IRJET-V4I2259.pdf

https://en.wikipedia.org/wiki/ACID_(computer_science)

Tables

Un-normalized Table: sales

sales

date

order_no

cust_no

product_description

price

quantity_sold

cust_name

cust_address

12/12/2018

1001

A320

MP3

10.00

8

Tom

1 Main St, Hartford, CT 06106

12/12/2015

1001

B101

Ipod

100.00

4

Tom

1 Main St, Hartford, CT 06106

01/05/2019

1002

C101

Blu Ray

80.00

3

Aaron

1 Holy Lane, Manchester, 06040

1NF Table: sales

sales

date

order_no

product_no

product_description

price

quantity_sold

cust_name

….

12/12/2018

1001

A320

MP3

10.00

8

Tom

12/12/2015

1001

B101

Ipod

100.00

4

Tom

01/05/2019

1002

C101

Blu Ray

80.00

3

Aaron

Continued..

sales

….

cust_street

cust_city

cust_state

cust_zip

1 Main St

Hartford

CT

06106

1 Main St

Hartford

CT

06106

1 Holy Lane

Manchester

CT

06040

2NF Table: order

order

date

order_no

cust_no

12/12/2018

1001

101

01/05/2019

1002

102

2NF Table: product

product

product_no

product_description

price

A320

MP3

10.00

B101

Ipod

100.00

C101

Blu Ray

80.00

2NF Table: order_detail

order_detail

order_no

product_no

quantity_sold

1001

A320

8

1001

B101

4

1002

C101

3

2NF Table: customer

customer

cust_no

cust_name

cust_street

cust_city

cust_state

cust_zip

101

Tom

1 Main Street

Hartford

CT

06106

102

Aaron

1 Holy Lane

Manchester

CT

06040

3NF Table: customer

customer

cust_no

cust_name

cust_street

zip_code

101

Tom

1 Main Street

06106

102

Aaron

1 Holy Lane

06040

3NF Table: zip

zip

zip_code

city

State

06106

Hartford

CT

06040

Manchester

CT