Welcome to the World of Relational Databases!
Hey there! Ever wondered how websites like Instagram or YouTube keep track of millions of users, photos, and comments so efficiently? They don't just use a giant spreadsheet! They use something much smarter and more organised: a relational database. In this chapter, we're going to pull back the curtain and see how it all works. Don't worry if this sounds complicated; we'll break it down with simple examples you can relate to. Let's get started!
The Basic Building Blocks of a Database
Imagine you're building with LEGO. You need to know what the different bricks are before you can build a castle. It's the same with databases! Let's learn the basic bricks.
Entity: The 'Things' We Store Data About
An entity is simply a person, place, object, event, or concept that you want to store information about. It's the main subject of a table.
- Example: In a school database, the main entities would be Student, Teacher, and Course.
- Example: In a library database, the entities might be Book and Member.
Attribute: The Details of the 'Things'
An attribute is a property or characteristic of an entity. Think of attributes as the columns in a table that describe the entity.
- Example: For the Student entity, the attributes could be StudentID, Name, DateOfBirth, and Class.
- Example: For the Book entity, attributes could be BookID, Title, and Author.
Analogy Time: Your Phone Contacts
Think about your phone's contact list. Each contact is an entity. The details you save for each contact, like their name, phone number, and email address, are the attributes. Simple, right?
Key Takeaway
Entities are the nouns (the "things") you want to track, and attributes are the adjectives (the "descriptions") of those things.
The Heart of the Database: Keys!
Imagine a school with three students named "Chan Tai Man". How do you know which one you're talking about? You need a unique identifier. In databases, we use keys for this.
Primary Key (The Boss Key)
A primary key (PK) is an attribute (or a set of attributes) that uniquely identifies every single record in a table. It's the ultimate unique ID.
A Primary Key has two very important rules:
- It must be UNIQUE for every record. (No two students can have the same Student ID).
- It CANNOT BE NULL (empty). (Every student MUST have a Student ID).
- Example: In a `Students` table, the StudentID would be a perfect primary key. A student's name isn't a good choice because names can be repeated.
Candidate Key (The Potential Bosses)
A candidate key is any attribute that could potentially be the primary key. It must also be unique for every record. A table can have several candidate keys, but you can only choose one to be the official primary key.
- Example: In a `Students` table, both StudentID and HKID_Number are unique for each student. Both are candidate keys. We might choose StudentID as the primary key because it's more relevant to the school.
Foreign Key (The Bridge Key)
A foreign key (FK) is the secret to making databases "relational". It's a key used to link two tables together. A foreign key is simply a copy of the primary key from one table that you place into another table.
- Example: Let's say we have a `Book_Loans` table to track which student borrowed which book. We would include StudentID (from the `Students` table) and BookID (from the `Books` table). Here, StudentID and BookID are foreign keys in the `Book_Loans` table. They act as bridges, connecting a specific loan record to a specific student and a specific book.
Memory Aid: Keys
Primary Key = The Principal, unique ID for a table.
Foreign Key = A key from a Foreign table, used to build a bridge.
Quick Review: Key Types
Primary Key: The main, unique ID for a table (e.g., `StudentID` in `Students` table).
Candidate Key: Any field that could be a primary key (e.g., `StudentID` or `HKID_Number`).
Foreign Key: A primary key from another table used to link them (e.g., `StudentID` in `Book_Loans` table).
Connecting the Dots: Relationships & Organisation
The whole point of a "relational" database is to store data in separate, organised tables and then create relationships between them. This avoids chaos and keeps data tidy.
Creating a Simple Relational Database
Imagine you run a small shop and use a single spreadsheet to track sales. It might look like this:
SaleID: 101, CustomerName: Tom, CustomerPhone: 555-1234, Item: Apple, Price: 5, Quantity: 10
SaleID: 102, CustomerName: Mary, CustomerPhone: 555-5678, Item: Orange, Price: 4, Quantity: 20
SaleID: 103, CustomerName: Tom, CustomerPhone: 555-1234, Item: Orange, Price: 4, Quantity: 15
Notice the problem? Tom's name and phone number are repeated. If he changes his number, you have to find and update it everywhere! This is called data redundancy. A relational database fixes this.
Step 1: Identify the entities. Here, we have `Customers`, `Products`, and `Sales`.
Step 2: Create a separate table for each entity.
- `Customers` Table: `CustomerID` (PK), `CustomerName`, `CustomerPhone`
- `Products` Table: `ProductID` (PK), `ItemName`, `Price`
- `Sales` Table: `SaleID` (PK), `CustomerID` (FK), `ProductID` (FK), `Quantity`
Step 3: Establish relationships using foreign keys.
The `Sales` table uses `CustomerID` to point to the `Customers` table and `ProductID` to point to the `Products` table. Now, Tom's information is stored only once. If he changes his phone number, you only need to update it in one place! This is how we organise data sensibly and establish relationships.
Keeping Data Clean and Trustworthy: Data Integrity
Data integrity is all about making sure the data in your database is accurate, consistent, and reliable. Think of these as the strict "rules" the database must follow.
Entity Integrity (The 'No Nameless Citizen' Rule)
This rule is very simple: the primary key of a table cannot be NULL (empty). This ensures that every record has a unique identity and can be found. It's like saying every citizen in a country must have an ID number.
Referential Integrity (The 'No Ghost Reference' Rule)
This is a rule for foreign keys. It states that a foreign key must either:
- Match an existing primary key value in the table it's pointing to.
- Be NULL (if allowed).
Example: You cannot add a record to the `Book_Loans` table with a `StudentID` of 'S1234' if there is no student with that ID in the `Students` table. This prevents "orphan" records that point to nothing, ensuring all links are valid.
Domain Integrity (The 'Right Box' Rule)
This rule ensures that all values in a column are of the correct data type and are taken from a set of allowed values, known as the domain.
- A domain is the set of all possible legal values for an attribute.
- Example: The domain for a 'Gender' attribute could be the set {'M', 'F'}. Domain integrity would prevent someone from entering 'Cat' into this field.
- Example: A 'Score' attribute might have a domain of integers from 0 to 100. You couldn't enter 101 or "Hello".
Key Takeaway
Integrity rules are the police of the database! Entity Integrity protects the Primary Key, Referential Integrity protects the Foreign Key relationships, and Domain Integrity protects the data in the columns.
Making Things Fast: The Index
Imagine trying to find a specific word in a 1,000-page book with no index. You'd have to read every page! An index in a database works the same way as the index at the back of a textbook.
- It's a special lookup table that the database search engine can use to speed up data retrieval.
- Instead of searching through every row in the table (a "full table scan"), the database can use the index to find the location of the data directly.
- Indexes are usually created on columns that are frequently used in searches, like names, IDs, or dates.
Did you know? Databases automatically create an index on the primary key, because it's the most common way to look up a record!
Oops! The Undo Button: Rollback
What happens if you're in the middle of a critical process and the power goes out? Think about a bank transfer: money is taken from your account, but before it's put into your friend's account, the system crashes. You wouldn't want your money to just disappear!
This is where transactions and rollbacks come in. A transaction is a sequence of database operations that must be completed as a single, all-or-nothing unit.
Purpose of a Rollback
A rollback is a command that undoes all the changes made by a transaction that was not successfully completed. Its main purposes are:
- To Undo Changes: If any part of a transaction fails (or if the user cancels it), a rollback is issued to return the database to the state it was in before the transaction started.
- To Ensure Data Consistency: It prevents the database from being left in an incomplete or inconsistent state. In our bank example, the rollback would put the money back into your account, as if the transfer never happened.
- To Maintain Integrity: It protects the integrity of the data by ensuring that only complete, successful operations are permanently saved.
Key Takeaway
A rollback is the ultimate "undo" button for a database transaction. It's a safety net that ensures your data stays consistent and correct, even when things go wrong.