Database Normalization

Normalization is the systematic process of organizing a database schema to reduce redundancy and maintain data integrity. It is achieved through step-by-step 'lossless decomposition'—breaking down large, poorly designed tables into smaller, well-structured ones without losing any information.

When a database is not normalized, data is repeated unnecessarily. This redundancy inevitably leads to Anomalies, which are critical errors that occur when you try to insert, update, or delete data.

The Three Deadly Anomalies

Insertion Anomaly: You cannot insert a new piece of data because it requires unrelated data to exist first. For example, if a table stores both Course and Student data together, you cannot create a new Course until at least one Student enrolls in it!
Update Anomaly: Because data is duplicated across multiple rows, changing a single fact (like a professor's phone number) requires updating every single row they appear in. If you miss even one row, your database becomes inconsistent and corrupt.
Deletion Anomaly: Deleting one piece of data accidentally wipes out unrelated data. For example, deleting the only student enrolled in a course might completely erase the course's existence from the database.

First Normal Form (1NF): Atomicity

The Golden Rule of 1NF: Every attribute (column) in a relation must contain only atomic, single-valued elements.

You cannot have nested groups, lists, or comma-separated strings inside a single cell.

Why is violating 1NF bad?

Imagine a Student table where the Phone_Numbers column contains multiple phone numbers separated by commas (e.g., '123-4567, 987-6543').

If you need to search for the student who owns the number '987-6543', you cannot do a simple SQL WHERE Phone = '987-6543'. You are forced to do expensive substring string-matching like LIKE '%987-6543%'. It also makes joining tables based on phone numbers completely impossible.

Student_Unnormalized (Violation)

Student_ID	Name	Phone_Numbers
101	Alice	123-4567, 987-6543
102	Bob	555-1234

The Solution: To achieve 1NF, we must split these multi-valued attributes. We extract the phone numbers into a separate child table that references the parent table.

Student (1NF)

Student_ID	Name
101	Alice
102	Bob

Student_Phones (1NF)

Student_ID	Phone
101	123-4567
101	987-6543
102	555-1234

Second Normal Form (2NF): No Partial Dependencies

The Golden Rule of 2NF: The table must be in 1NF, and it must contain no partial dependencies.

A partial dependency can *only* occur if your table has a Composite Primary Key (a primary key made of two or more columns). It happens when a non-key column depends on only *part* of that composite primary key, rather than the entire key.

Why is violating 2NF bad?

Consider an Enrollment table tracking which student is taking which course. The Primary Key must be the combination of {Student_ID, Course_ID} because one student takes many courses, and one course has many students.

However, the column Course_Name depends ONLY on Course_ID. It has absolutely nothing to do with the Student_ID.

The Anomaly: If 100 students enroll in 'Databases', the string 'Databases' is written into the database 100 times. If the course name changes to 'Advanced Databases', you have to update 100 rows. If you miss one, your database is corrupted (Update Anomaly)!

Enrollment_Unnormalized (Violation)

Student_ID	Course_ID	Course_Name
101	CS101	Databases
102	CS101	Databases

The Solution: We decompose the table. Everything that depends ONLY on Course_ID gets moved into its own Course table.

Enrollment (2NF)

Student_ID	Course_ID
101	CS101
102	CS101

Course (2NF)

Course_ID	Course_Name
CS101	Databases

Third Normal Form (3NF): No Transitive Dependencies

The Golden Rule of 3NF: The table must be in 2NF, and it must contain no transitive dependencies.

A transitive dependency occurs when a non-key column depends on another non-key column, which in turn depends on the primary key. Put simply: A → B and B → C. Therefore, C is transitively dependent on the primary key A through B.

Why is violating 3NF bad?

Consider an Employee table where the Primary Key is Emp_ID. We also store the employee's Dept_ID and the Dept_Location.

The Dept_Location depends entirely on the Dept_ID (not the employee!). But Dept_ID depends on Emp_ID.

The Anomaly: If the Sales Department (D1) moves from New York to Chicago, we have to update the Dept_Location column for every single employee in the Sales department. Furthermore, if a new department is created but has no employees yet, we cannot store its location (Insertion Anomaly).

Employee_Unnormalized (Violation)

Emp_ID	Dept_ID	Dept_Location
1	D1	New York
2	D1	New York

The Solution: We decompose the table to break the chain. The intermediate dependency (Dept_ID) becomes the primary key of a new Department table.

Employee (3NF)

Emp_ID	Dept_ID
1	D1
2	D1

Department (3NF)

Dept_ID	Dept_Location
D1	New York

Boyce-Codd Normal Form (BCNF): The Stricter 3NF

The Golden Rule of BCNF: For every non-trivial functional dependency α → β, the determinant (α) MUST be a Super Key.

BCNF is a stricter version of 3NF. It handles very specific edge cases where a table has overlapping candidate keys.

Why is violating BCNF bad?

Consider a Booking(Student_ID, Course_ID, Professor_ID) table.

Rule 1: A student has exactly 1 professor per course. So, the Primary Key is {Student_ID, Course_ID}.

Rule 2: A professor only teaches one specific course. This means Professor_ID → Course_ID.

The Violation: We have a dependency Professor_ID → Course_ID. But is Professor_ID a super key? No! A professor might teach multiple students, so their ID alone does not uniquely identify a row. Because a determinant is NOT a super key, it violates BCNF.

Booking_Unnormalized (Violation)

Student_ID	Course_ID	Professor_ID
101	CS101	P1
102	CS101	P1

The Solution: We decompose it so that Professor_ID becomes a proper primary key in its own table.

Student_Prof (BCNF)

Student_ID	Professor_ID
101	P1
102	P1

Prof_Course (BCNF)

Professor_ID	Course_ID
P1	CS101

Relationship Between Normal Forms Diagram

Placement Takeaways

Normalization is one of the most frequently tested topics in DBMS interviews. You must be able to instantly spot partial and transitive dependencies.

Pro Tip for Interviews

If you are asked 'What is the difference between 3NF and BCNF?', simply state: '3NF allows a non-prime attribute to determine a prime attribute. BCNF strictly forbids this by forcing the determinant to ALWAYS be a super key. Therefore, BCNF is strictly stronger than 3NF.'

Fill in the Blank

Sort the Concepts

Drag the dependency elimination rule into its correct Normal Form bucket:

1NF

2NF

3NF

Unsorted Items:

Eliminate multi-valued columns

Eliminate partial dependencies

Ensure atomic values

Eliminate transitive dependencies

Database Normalization
The Three Deadly Anomalies
First Normal Form (1NF): Atomicity
Why is violating 1NF bad?
Second Normal Form (2NF): No Partial Dependencies
Why is violating 2NF bad?
Third Normal Form (3NF): No Transitive Dependencies
Why is violating 3NF bad?
Boyce-Codd Normal Form (BCNF): The Stricter 3NF
Why is violating BCNF bad?
Placement Takeaways