Database Design Methodology - Information and Communication Technology - HKDSE

* The content provided by thinka is generated by AI and may not always be accurate or up-to-date. Please use it as a supplementary resource and verify with official materials.

Database Design Methodology: Your Blueprint for Awesome Databases!

Hey everyone! Welcome to the world of database design. Think of yourself as an architect, but instead of buildings, you're designing organized systems to store information. A good design is everything! It makes sure your data is reliable, efficient, and easy to use. A bad design? Well, that's like building a house with no doors – a total mess!

In this chapter, we'll learn how to plan and structure a database from scratch. We'll cover how to draw blueprints for our data (called ER diagrams), how to organize it neatly (using a process called normalisation), and how to keep it secure. Don't worry if it sounds tricky, we'll break it down step-by-step with easy examples. Let's get started!

The Blueprint: Entity-Relationship (ER) Diagrams

Before you build anything, you need a plan. In database design, our plan is the Entity-Relationship Diagram (ERD or ER diagram). It's a visual way to show what data we need to store and how the different pieces of data are connected to each other.

The Building Blocks of an ER Diagram

An ER diagram has a few key components. Let's learn the symbols!

Entity: A person, place, object, or concept we want to store information about. Think of it as a noun. Examples: Student, Book, Course.
Attribute: A property or characteristic of an entity. Examples: A Student has a Student Name, a Book has a Title.
Key Attribute: An attribute that uniquely identifies an entity. This becomes the Primary Key. No two entities can have the same value for this attribute. Example: Student ID.
Relationship: How two or more entities are connected. Think of it as a verb. Example: A Student borrows a Book.

ER Diagram Symbols Cheat Sheet

Here are the official symbols you need to know for your exam. Make sure you use them correctly!

Entity: A rectangle

Attribute: An oval

Key Attribute: An oval with the name underlined

Relationship: A diamond

How Things Connect: Types of Relationships

The lines connecting entities in an ER diagram tell us the type of relationship, also known as cardinality. It describes how many instances of one entity can be related to instances of another entity.

1. One-to-One (1:1) Relationship

One instance of Entity A is connected to exactly one instance of Entity B.

Example: In a school, one Principal is in charge of one School.

2. One-to-Many (1:M) Relationship

One instance of Entity A can be connected to many instances of Entity B, but each instance of Entity B is connected to only one instance of Entity A.

Example: One Teacher can teach many Students in a class.

3. Many-to-Many (M:N) Relationship

Many instances of Entity A can be connected to many instances of Entity B, and vice-versa.

Example: Many Students can enrol in many different Courses. One student can take multiple courses, and one course can have multiple students.

What about those little lines and circles? Participation Constraints

Sometimes you'll see extra symbols on the lines connecting to the relationship diamond. These tell us if the relationship is compulsory or optional.

Mandatory Participation ( | ): The entity MUST participate in the relationship. It's like saying, "Every single student must be enrolled in at least one course."
Optional Participation ( O ): The entity does NOT have to participate. It's like saying, "A professor can be employed by the university, but they might be on research leave and not currently teaching any course."

Key Takeaway: Relationships

Relationships define the rules of how your data interacts. Getting the cardinality (1:1, 1:M, M:N) right is one of the most important steps in database design!

Creating an ER Diagram: A Step-by-Step Guide

Let's design a simple database for a school library. The scenario is: "Students can borrow many books, and a book can be borrowed by many students over time."

Step 1: Identify the Entities (the Nouns)

The main things are Student and Book.

Step 2: Identify the Attributes for each Entity

For Student: StudentID, StudentName, Class
For Book: BookID, Title, Author

Step 3: Identify the Primary Keys

StudentID uniquely identifies each student. BookID uniquely identifies each book. Let's underline them.

Step 4: Determine the Relationship (the Verb)

A student borrows a book. So, the relationship is Borrows.

Step 5: Determine the Cardinality

Can one student borrow many books? Yes.
Can one book be borrowed by many students? Yes (over time).

This is a classic Many-to-Many (M:N) relationship.

Step 6: Draw it!

You would draw a rectangle for 'Student', another for 'Book', and a diamond 'Borrows' in between, connecting them with lines. Add the ovals for the attributes around their entities. Finally, label the relationship lines with 'M' and 'N'.

Key Takeaway: Drawing ERDs

Follow the steps: Entities -> Attributes -> Primary Keys -> Relationships -> Cardinality (EAP-RC). This process turns a real-world problem into a clear database plan.

Cleaning Up the Mess: Data Redundancy and Normalisation

Imagine writing your full address on every single page of your textbook. That would be a waste of space and if you moved house, you'd have to update every single page! This problem is called data redundancy.

Data Redundancy is the unnecessary repetition of data in a database. It's bad because it can lead to:

Update Anomaly: If you change data in one place, you might forget to change it everywhere.
Insertion Anomaly: You can't add new information unless another piece of information is already there.
Deletion Anomaly: Deleting one piece of data might accidentally delete other, unrelated data.

The process we use to fix this and reduce redundancy is called Normalisation. Think of it as tidying up a very messy room and putting everything into the correct drawers.

There are several levels, or "normal forms", but for HKDSE, you need to know the first three.

First Normal Form (1NF): One Value Per Cell

The Rule: A table is in 1NF if every cell contains a single, atomic (indivisible) value. There should be no repeating groups.

Bad Example (Not in 1NF):

A student has multiple phone numbers in one cell.

How to fix it: Create a separate row for each phone number, repeating the student's information.

Second Normal Form (2NF): No Partial Dependencies

The Rule: The table must be in 1NF, AND every non-key attribute must depend on the entire primary key. This rule only matters when you have a composite primary key (a primary key made of two or more columns).

Partial Dependency: This is when a non-key attribute only depends on part of the composite primary key.

How to fix it: If you find a partial dependency, split the table. Move the partially dependent columns into a new table, along with the part of the key they depend on.

Third Normal Form (3NF): No Transitive Dependencies

The Rule: The table must be in 2NF, AND there are no transitive dependencies.

Transitive Dependency: This is when a non-key attribute depends on another non-key attribute, instead of depending on the primary key. (Think: A -> B -> C. C depends on B, and B depends on A. The key is A).

Example: A table has StudentID, TeacherName, and TeacherOffice. The TeacherOffice depends on the TeacherName, not the StudentID. This is a transitive dependency.

How to fix it: Split the table again! Move the transitively dependent columns (and the column they depend on) into a new table.

Memory Aid: "The Key, The Whole Key, and Nothing But the Key"

1NF: The data depends on the key. (Single values)
2NF: The data depends on the whole key. (No partial dependencies)
3NF: The data depends on nothing but the key. (No transitive dependencies)

Key Takeaway: Normalisation

Normalisation is a crucial process to create an efficient and reliable database by eliminating data redundancy and anomalies. Most well-designed databases are in 3NF.

Going Backwards: When to use Denormalisation

Wait, after all that cleaning, why would we want to make it messy again? Sometimes, having too many split-up tables (highly normalised) can make retrieving data slow. To get a simple report, the computer might have to join many tables together, which takes time.

Denormalisation is the process of intentionally adding some redundancy back into a database to improve query performance. It's a trade-off: you sacrifice some data purity to gain speed. This is an advanced technique used when speed is absolutely critical.

From Blueprint to Reality: Transforming ER Diagrams to Tables

Once your ERD is perfect and your data structure is normalised, it's time to create the actual database tables. Here are the rules:

Rule 1: Map Entities
Every entity in your ERD becomes a table in your database. The entity's name becomes the table's name.

Rule 2: Map Attributes
Every attribute becomes a column in that table. The key attribute becomes the primary key column.

Rule 3: Map Relationships (The Important Part!)

For a 1:M Relationship: Take the primary key from the '1' side and add it as a new column in the table on the 'M' side. This new column is called a foreign key. It links the two tables together.
For a M:N Relationship: You cannot link them directly! You must create a new table, called a linking table or junction table.
- This new table will contain the primary keys from BOTH of the original tables. These keys act as foreign keys.
- Together, these two foreign keys usually form a composite primary key for the linking table.
- This process successfully "resolves" the M:N relationship into two 1:M relationships.

Example: Our Library ERD

Our `Student` (M) -- `Borrows` -- (N) `Book` ERD would become THREE tables:

1. Student Table (StudentID, StudentName, Class)

2. Book Table (BookID, Title, Author)

3. Borrows Table (Linking Table) (StudentID, BookID, DateBorrowed)

Now, the `Student` table has a 1:M relationship with the `Borrows` table, and the `Book` table also has a 1:M relationship with the `Borrows` table. Problem solved!

Key Takeaway: Transforming ERDs

Mapping ERDs to tables is a mechanical process with clear rules. The most important rule to remember is how to handle M:N relationships by creating a linking table.

Who Gets the Keys? Data Privacy and Access Rights

A database contains valuable information. You wouldn't want a student to be able to see or change another student's grades, right? This is where data privacy comes in.

We protect data by controlling who can do what. This is managed through access rights (also called permissions).

Think of it like keycards in an office building. The CEO's card opens every door, while an intern's card might only open the main entrance and the break room.

In a database, we can grant specific permissions to different users, such as:

SELECT: The right to read/view data.
INSERT: The right to add new data.
UPDATE: The right to change existing data.
DELETE: The right to remove data.

By giving users only the minimum permissions they need to do their jobs (this is called the Principle of Least Privilege), we can ensure data privacy and security. For example, a library front-desk staff might have `SELECT` and `INSERT` rights on the borrowing records table, but they won't have the `DELETE` right on the main student information table.

Key Takeaway: Access Rights

Using access rights is fundamental to protecting data. It ensures that only authorised people can view or modify sensitive information, which is a core part of building a secure and trustworthy system.