Data Table Relation's Normalization
Introduces functional dependencies, and the database paradigms 1NF, 2NF, 3NF, BCNF, and 4NF. and use examples to explain their concepts and connections.
Functional Dependencies
In relational databases, functional dependencies describe the dependency relationships between attributes in a relational schema. If, given a relation , the values of attribute set uniquely determine (through conditions or function arguments, for example) a value in attribute set , then we say that is functionally dependent on , typically denoted by the symbol .
Specifically, if for every record in relation , whenever two records have the same values for attribute set , they must also have the same values for attribute set , then we say that is functionally dependent on .
For example, suppose we have a relation containing attribute set , where the values of attribute uniquely determine the values of attribute , i.e., . Then, we say that attribute is functionally dependent on attribute .
Functional dependency is an important concept in relational database design. It helps in understanding the relationships between data, normalizing database designs to reduce redundant data, while ensuring the consistency and integrity of data.
Functional dependencies can be classified into trivial functional dependencies and non-trivial functional dependencies.
If and , then is called a trivial functional dependency. Otherwise, it is a non-trivial functional dependency.
For functional dependencies, the dependency relationship between and can also be further classified into partial functional dependency, full functional dependency, and transitive functional dependency.
- Partial Functional Dependency: Let be two attribute sets of relation . If and , where there exists such that , then is said to be partially functionally dependent on , denoted as . In other words, partial functional dependency means that some attributes can be removed without affecting the dependency.
For example: In a student basic information table (student ID, passport number, name), of course, the student ID attribute values are unique. In relation , (student ID, passport number) → (name), (student ID) → (name), (passport number) → (name); therefore, the name is partially functionally dependent on (student ID, passport number).
- Full Functional Dependency: Let be two attribute sets of relation . If , , and for all , , then is said to be fully functionally dependent on , denoted as . In other words, for full functional dependency, no extra attributes can be deleted, otherwise, the property of dependency will not be maintained.
Example: In a student basic information table (student ID, class, name), assuming different classes can have the same student ID, and student IDs within a class cannot be the same. In relation , (student ID, class) → (name), but (student ID) → (name) does not hold, and (class) → (name) does not hold either. Therefore, the name is fully functionally dependent on (student ID, class).
- Transitive Functional Dependency: Let be three attribute sets of relation , where , and for all , . Then is said to be transitively functionally dependent on . This means that indirectly depends on .
Example: In relation (student ID, dormitory, fee), (student ID) → (dormitory), dormitory ≠ student ID, (dormitory) → (fee), fee ≠ dormitory. Thus, it meets the requirements of transitive functional dependency.
Multi-Valued Dependencies
The previous section introduced functional dependencies, which are actually a special case of multi-valued dependencies. Multi-valued dependencies extend the concept of functional dependencies.
Let be a relational schema over the attribute set , and let , , and be subsets of such that . A multi-valued dependency holds if and only if for any relation in , each value on corresponds to a set of values on , and this set of values depends only on the values of and is independent of the values of .
Consider a relational schema with the attribute set , representing students' course enrollments and the textbooks they use. If a student uses multiple textbooks for a single course, and the combination of student and course uniquely determines the textbooks used, while the combination of student and course is independent, then we have a multi-valued dependency.
Suppose we have the following relation instance:
In this example, the choice of textbook depends on the combination of student and course. For instance, Alice uses both Calculus and Linear Algebra textbooks for her Math course, while she uses Mechanics for her Physics course. However, whether Linear Algebra and Calculus are chosen depends solely on whether the course is Math. This scenario demonstrates a multi-valued dependency, where the combination of student and course uniquely determines the set of textbooks used, while the relationship between student and course is independent, yet textbooks are not independent of courses as one course can have multiple textbooks.
Therefore, in this scenario, there exists a multi-valued dependency relationship .
Keys
- Let be an attribute or a combination of attributes in , and , then is a candidate key for .
- If there are multiple candidate keys, one of them can be designated as the primary key.
- Any attribute set that contains a candidate key is called a prime attribute. Otherwise, it is a non-prime attribute.
- If the entire attribute set constitutes the candidate key, then this attribute set is called a superkey.
- Both primary keys and candidate keys are commonly referred to as keys.
Database Normalization
1NF
(First Normal Form) is one of the fundamental normal forms in relational databases. It requires that each attribute in a relation schema is atomic, meaning it cannot be further divided. In other words, each attribute in the relation schema should be single-valued, rather than containing multiple values or complex data types.
Specifically, requires that each cell in the relation contains only one value, rather than multiple values or complex data types. This helps ensure the atomicity of data, simplifying data processing and querying.
For example, consider a student table containing names and phone numbers. If phone numbers are stored as a comma-separated list of multiple numbers, then this table does not conform to . To adhere to , phone numbers should be split into separate attributes, with each attribute containing only one phone number.
2NF
(Second Normal Form) is another normal form in relational databases, built upon the foundation of the first normal form (). requires that each non-prime attribute in a relation schema is fully functionally dependent on the candidate key, rather than partially dependent on the candidate key.
Specifically, if a relation schema has multiple attributes forming the candidate key, then each non-prime attribute should depend on all combinations of these attributes, rather than just depending on some of them.
For example, consider a relation schema containing the following attributes: StudentID, CourseID, CourseName, StudentName, and Grade. In this schema, (StudentID, CourseID) forms the candidate key, representing the enrollment of students in courses.
In this example, the StudentName attribute is a non-prime attribute and is partially dependent on the candidate key (StudentID), rather than fully dependent on all attributes of the candidate key (StudentID, CourseID). For instance, the name of student Alice is only associated with Student ID 101, irrespective of the course enrollment.
Therefore, this relation schema does not satisfy the second normal form (). To adhere to , we should remove the StudentName attribute from this relation schema and place it in a separate table, where the candidate key is . This way, the student name becomes fully dependent on the candidate key, rather than partially dependent on it.
3NF
Given a relation schema and no set of attributes , attribute group , and non-candidate attribute such that , it belongs to . In other words, if a relation is in and each non-candidate attribute does not transitively depend on the candidate key, then it is in .
Let's continue with the example of student course enrollment mentioned earlier and attempt to conform to the third normal form ().
In this example, we have already removed the student name from the main table and placed it in a separate table as follows:
Student Table:
Enrollment Table:
Now, let's see if it conforms to . In this relation schema, we have a transitive dependency: Student ID (StudentID) determines Student Name (StudentName), while Course ID (CourseID) determines Student ID (StudentID). Hence, there exists a transitive functional dependency: .
To conform to , we need to eliminate this transitive dependency. One way to do this is by creating a new table that maps and to to eliminate this transitive dependency. This way, we can ensure that each non-prime attribute is fully dependent on the primary key.
Student Enrollment Relation Table:
Student Table:
Course Table:
Now, each table conforms to the third normal form (). The student table and the course table are associated with the student enrollment relation table without any transitive dependency.
BCNF
(Boyce-Codd Normal Form) is a more stringent form of normalization in relational databases, resulting from normalizing relation schemas further from the third normal form (). requires that every non-trivial functional dependency (non-trivial , where is not a superset of ) is determined by the keys of the relation.
In other words, if is a relation schema and is a non-trivial functional dependency of , and does not contain any candidate keys, then does not conform to , meaning every determinant must include a key.
For example, consider a scheduling table now, recording student ID, course ID, and teacher ID. A teacher teaches only one course, and a course can be taught by multiple teachers, and each student selects a specific course, which corresponds to a fixed teacher. Assuming the candidate key is student ID, course ID. We can list the following functional relationships:
This relation does not conform to because we find that in the third functional dependency, does not contain any keys. While the first and second have keys.
4NF
is a form of database normalization that further emphasizes the elimination of multi-valued dependencies on top of . A relation schema satisfies if, in addition to satisfying the requirements of , it also has no non-trivial multi-valued dependencies. Non-trivial multi-valued dependencies refer to those where neither the left side nor the right side is a superkey of the relation.
Let's consider an example that complies with but not with .
Suppose we have a relation schema containing attributes: Student, Course, Grade, and Teacher. And there exist the following functional dependencies:
Assume that the candidate key for the relation schema is .
This schema conforms to because every functional dependency is determined by the candidate key.
However, it does not conform to due to the presence of the third multi-valued dependency. This is because each student may attend multiple courses, and each course may have different teachers.
Therefore, although this relation schema satisfies , it does not satisfy .
Relationship
Understanding from to is progressive. The relationship between them is as follows:
- eliminates partial functional dependencies of non-prime attributes on the candidate key to achieve .
- eliminates transitive functional dependencies of non-prime attributes on the candidate key to achieve .
- eliminates partial and transitive functional dependencies of prime attributes on the candidate key to achieve .
- eliminates non-trivial and functional dependencies of multi-valued dependencies to achieve .