Welcome to the World of Relational Databases!
Databases are the backbone of almost every modern business, from your favourite streaming service to your school's records system. This chapter dives into Relational Databases, which are all about organising data efficiently and logically so that computers can retrieve information quickly and reliably.
Don't worry if terms like 'normalisation' sound complicated—we'll break them down using simple analogies. By the end of this section, you'll understand how data models are built, how tables link together, and how to talk to a database using the powerful language called SQL. Ready to become a data architect? Let’s go!
1. Conceptual Data Models and Entity Relationship Modelling
What is an Entity?
An Entity is simply a real-world object or concept about which we want to store data. Think of it like a major noun in your database structure.
Examples: CUSTOMER, PRODUCT, ORDER, COURSE.
Attributes (Fields)
Attributes are the characteristics or properties of an entity. These are the pieces of information you actually store.
In an implemented database (the real software), attributes are called fields or columns.
Example: The STUDENT entity might have attributes like: StudentID, Name, DateOfBirth.
Entity Identifier (Primary Key)
The Entity Identifier (or Primary Key, PK) is one or more attributes that uniquely identify every record (row) within the entity (table).
In data modelling, we indicate the entity identifier by underlining the attribute.
Example: The full entity description is written as:
STUDENT(StudentID, Name, DateOfBirth)
Sometimes, a unique identifier must be formed by combining several attributes. This is called a Composite Entity Identifier or Composite Primary Key.
Example: If a class list doesn't use StudentID, you might use (FirstName, LastName, DateOfBirth) to uniquely identify a person.
Degrees of Relationship
Relationships define how entities interact with each other. The degree of relationship tells us how many instances of one entity can relate to how many instances of another entity.
In Entity Relationship Diagrams (ERDs), these are typically shown with specific notation:
One-to-Many (1:M)
An instance in Entity A relates to zero, one, or many instances in Entity B. But, an instance in Entity B relates to only one instance in Entity A.
Analogy: A single HEAD OF DEPARTMENT manages MANY TEACHERS.
Notation Example: HEAD_OF_DEPT --< TEACHER
Many-to-One (M:1)
This is the reverse of One-to-Many.
Analogy: MANY TEACHERS report to ONE HEAD OF DEPARTMENT.
Notation Example: TEACHER >-- HEAD_OF_DEPT
One-to-One (1:1)
An instance in Entity A relates to exactly one instance in Entity B, and vice versa.
Analogy: A PERSON holds ONE PASSPORT. A PASSPORT belongs to ONE PERSON.
Notation Example: PERSON ---- PASSPORT
Many-to-Many (M:M)
An instance in Entity A can relate to many instances in Entity B, and an instance in Entity B can relate to many instances in Entity A.
Note: M:M relationships must be resolved into two 1:M relationships in an actual relational database design (often using a junction table), but you must be familiar with the conceptual model.
Analogy: A STUDENT registers for MANY COURSES. A COURSE is taken by MANY STUDENTS.
Notation Example: STUDENT >--< COURSE
2. Key Concepts of Relational Databases
The Relational Database Concept
A Relational Database is a collection of structured data items organised as a set of formally described tables (relations), from which data can be accessed or reassembled in many different ways without having to reorganise the database tables themselves.
The key idea is that data is stored in separate tables (relations) that are linked together using special attributes called Foreign Keys.
Relating Abstract Terms to Implementation Terms
When discussing abstract data models, we use certain terms. When we talk about an actual implemented database, we use different (but equivalent) terms:
- Relation/Table: A database table where data is stored.
- Attribute/Field: A column heading in the table.
- Entity Identifier/Primary Key (PK): The field(s) that uniquely identify a row.
The Foreign Key (FK)
A Foreign Key is an attribute in one table that refers to the Primary Key of another table. It is the crucial link that establishes relationships between relations (tables).
Memory Aid: A Foreign Key is "foreign" because it belongs to a different table, but it's used here to link back home.
Example: If the TEACHER table has a DeptID (Foreign Key), this DeptID must match an existing DeptID (Primary Key) in the DEPARTMENT table.
3. Database Design and Normalisation Techniques
Why Normalise?
Normalisation is the process of designing a database structure to reduce data redundancy (storing the same data multiple times) and improve data integrity (ensuring data is accurate and consistent).
If you don't normalise, you risk:
- Update Anomalies: Having to change the same piece of information in multiple places. (If the Course name changes, you must update every student's record).
- Insertion Anomalies: Not being able to add new data unless you also have data for the primary key.
- Deletion Anomalies: Deleting one piece of data causing the loss of other, non-related data.
Third Normal Form (3NF)
The syllabus requires you to understand Third Normal Form (3NF) and why databases are normalised. You do not need to differentiate between First, Second, and Third Normal Forms.
A relation (table) is in 3NF if it satisfies certain properties, the most important of which is eliminating Transitive Dependencies.
The Key Property of 3NF:
In a relation in 3NF, every non-key attribute is dependent only on the whole primary key. Essentially, no non-key field should depend on another non-key field.
Example of a problem (not in 3NF):
A table ORDER(OrderID, CustomerID, CustomerName, ItemPrice).
Here, CustomerName depends on CustomerID, not directly on the OrderID (the Primary Key). If CustomerID changes their name, you would have to update this fact for every order they have ever placed. To fix this, we split it into two tables: ORDER and CUSTOMER.
By achieving 3NF, we ensure that the database is robust, flexible, and efficient to update.
4. Structured Query Language (SQL)
SQL is the standard language used to interact with relational databases. It allows you to define data structures (DDL) and manipulate the data (DML).
Data Definition Language (DDL): Defining Tables
You must know how to use SQL to define a database table, specifying keys and data types.
Key Data Types to Know:
- Integer and Real/Float numbers.
- String (text) and Boolean values (True/False).
- Date/Time values.
Example of creating a table:
CREATE TABLE Products (
ProductID INTEGER PRIMARY KEY,
Name STRING,
Price REAL
);
You also need to know how to add a simple Foreign Key constraint using a single field:
CREATE TABLE Orders (
OrderID INTEGER PRIMARY KEY,
CustomerID INTEGER,
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
Data Manipulation Language (DML): Managing Data
You must be familiar with the following basic SQL commands:
1. Retrieving Data (SELECT)
This is used to read data from the database.
- SELECT [attributes] FROM [table] WHERE [condition] ORDER BY [attribute]
- Example:
SELECT Name, Price FROM Products WHERE Price > 100.00 ORDER BY Price DESC;
2. Inserting Data (INSERT)
Used to add new records (rows).
- INSERT INTO [table] VALUES (value1, value2, ...);
- Example:
INSERT INTO Products VALUES (101, 'Laptop', 1200.00);
3. Updating Data (UPDATE)
Used to change existing records. ALWAYS use WHERE, or you will update every record!
- UPDATE [table] SET [attribute] = [new value] WHERE [condition];
- Example:
UPDATE Products SET Price = 1250.00 WHERE ProductID = 101;
4. Deleting Data (DELETE)
Used to remove records. Again, ALWAYS use WHERE!
- DELETE FROM [table] WHERE [condition];
- Example:
DELETE FROM Products WHERE ProductID = 101;
Aggregate SQL Functions
These functions perform calculations across a set of rows and return a single summary value. You often use them with GROUP BY to apply the calculation to subsets of data.
- COUNT: Returns the number of rows that match a specified criteria.
- SUM: Returns the total sum of a numeric column.
- AVG: Calculates the average value of a numeric column.
- MIN: Returns the smallest value in a column.
- MAX: Returns the largest value in a column.
Example: SELECT COUNT(StudentID) FROM Students WHERE Course = 'CS';
Example with GROUP BY: SELECT Course, AVG(Score) FROM Results GROUP BY Course;
5. Client-Server Databases
In most large systems, the database is hosted on a central server, accessed by many different users (clients). This setup is known as a Client Server Database System.
Concurrent Access
Concurrent Access means that multiple clients (users) can attempt to access and modify the same data simultaneously. This is great for efficiency, but it introduces a major risk: the Lost Update Problem.
The Lost Update Problem
This problem occurs when two or more users try to update the same record at almost the exact same time, and the updates interfere with each other.
Step-by-step example:
- Client A reads a bank account balance: $100.
- Client B reads the exact same bank account balance: $100.
- Client A calculates a deposit ($100 + $50 = $150) and writes $150 back to the database.
- Client B calculates a withdrawal ($100 - $20 = $80) and writes $80 back to the database.
Result: Client A’s $50 deposit is lost. The final balance is $80, when it should have been $130.
Record Locks
To maintain data integrity (ensuring the data is correct), concurrent access must be controlled using Record Locks.
A record lock prevents other users from accessing or modifying a record while one user is currently processing or updating it.
How it works: When Client A starts editing the bank balance, the system places a lock on that record. Client B must wait until Client A has committed (saved) their change and the lock is released before they can access the record.
6. Big Data
Did you know? We generate more data every two days now than we did from the dawn of civilisation up until 2003!
Big Data is a catch-all term for datasets that are so large or complex that they cannot be easily stored, processed, or analysed using traditional database methods (like the relational databases we just discussed).
The Three Vs of Big Data
Big Data is typically defined by three major characteristics:
- Volume: The sheer amount of data is too big to fit onto a single server or be processed by traditional tools.
- Velocity: The data is often streaming in very quickly (e.g., stock market prices, social media feeds), requiring real-time response (milliseconds to seconds).
- Variety: The data comes in many forms (structured, unstructured text, videos, sensor readings, etc.). This lack of uniform structure makes relational tables unsuitable.
Distributed Processing and Functional Programming
Since the data volume is too large for one server, processing must be distributed across multiple machines.
Functional Programming (FP) languages (which you may study later) are often used to process Big Data because they are excellent for writing correct and efficient distributed code. This is because FP relies on:
- Immutable Data Structures: Data cannot be changed after creation, preventing concurrency problems across servers.
- Statelessness: Functions don't rely on or change external state, making them easy to run independently on different servers.
- Higher-Order Functions (Map-Reduce): Techniques like Map-Reduce (mapping a function to all elements, then reducing the results into a single value) are used to combine results from processing done on many different servers.
Data Models for Big Data (Non-Relational)
Because Big Data often lacks strict structure, non-relational models are needed.
Fact-Based Model
In this model, every single piece of information is captured as a single fact.
Example: Instead of a row containing (Name, Age, Address), you store three separate facts: (Name: John), (Age: 30), (Address: London).
Graph Schema (Nodes, Edges, Properties)
Graph databases are used to represent complex relationships (like social networks or recommendation systems).
- Node (Entity): Represents an individual entity (shown as an oval).
- Edge (Relationship): A solid line connecting two nodes, labelled with text describing the relationship (e.g., 'Likes', 'Is Friends With').
- Properties (Attributes): Characteristics of the entity (shown as rectangles attached to the oval).
Think of Facebook: You are a Node. Your city is a Property. The relationship "Is Friends With" is an Edge linking you to another Node.