Unit 4: Database Solutions – Comprehensive Study Notes
Welcome to the fascinating world of databases! Don't worry if this topic feels a bit technical at first. Databases are simply organised electronic filing cabinets, and mastering them is one of the most valuable skills in Information Technology today.
In this chapter, we will learn how to structure, manage, and design robust database systems, ensuring data is accurate, efficient, and easy to retrieve. Let’s get started!
1. Understanding Data and Database Structures
1.1 Data versus Information
Before diving into databases, it's essential to understand the difference between raw facts and usable knowledge.
- Data: Raw, unprocessed facts and figures. Data has no meaning on its own.
Example: 90210, J. Smith, 45 - Information: Data that has been processed, organised, structured, or presented in a given context to make it meaningful and useful.
Example: The student with ID 90210 is J. Smith, who scored 45 marks in the exam.
1.2 Flat File vs. Relational Databases
How we structure our data determines how efficient and robust our system is.
1. Flat File Database
A flat file database is essentially one big table or spreadsheet. All the data is stored in a single structure.
- Pros: Simple to set up for small, basic needs.
- Cons: Highly prone to data redundancy (storing the same information multiple times) and data inconsistency (if one copy is updated but another isn't).
- Analogy: Imagine writing every detail about every student, including their full address, repeatedly on every single report card they receive.
2. Relational Database (RDB)
A relational database uses multiple tables (or relations) linked together by common fields. This is the industry standard for complexity and scalability.
- Pros: Reduces redundancy, ensures data consistency, flexible, and powerful.
- Cons: More complex to design and set up initially.
- Analogy: Instead of repeating the full address on every report card, you store the address once in an "Address Table" and just link the report card to that specific address using a unique ID number.
Flat File = One big table.
Relational = Many linked tables (better!).
2. Building Blocks of a Relational Database
Let’s define the key terms we use when referring to the parts of a table (or relation).
2.1 Entities, Attributes, Records, and Fields
- Entity: A real-world object, person, place, or event about which data is stored. In database terms, an entity becomes a Table.
Example Entities: STUDENT, COURSE, TEACHER. - Attribute: A characteristic or property of an entity. In database terms, an attribute becomes a Column or Field header.
Example Attributes for the STUDENT entity: Name, Date of Birth, ID Number. - Record (Tuple): A complete set of attributes for a single instance of the entity. A Row in the table.
Example: One specific student’s complete details (Name, DOB, ID, etc.). - Field: The intersection of a record and an attribute. The smallest piece of data stored.
2.2 Understanding Database Keys
Keys are critical! They are special fields used to uniquely identify records and establish links between tables.
1. Primary Key (PK)
A field (or combination of fields) that uniquely identifies every record in a table.
- It must be unique for every record.
- It cannot be Null (empty).
- Example: A Student ID Number.
2. Foreign Key (FK)
A field in one table that refers to the Primary Key in another table. The Foreign Key is the mechanism used to create the link (relationship) between tables.
- A Foreign Key does not need to be unique within its own table.
- Example: In the ENROLMENT table, the StudentID field is a Foreign Key, linking back to the Student table's Primary Key.
3. Composite Key
A Primary Key that is made up of two or more attributes combined. This is often necessary when no single attribute can uniquely identify a record.
- Example: In a table tracking exam scores, the primary key might be the combination of (StudentID + CourseID). Neither field alone is unique, but the combination is.
Primary Key = Protector (Unique Identifier).
Foreign Key = Follower (Link to the Protector).
3. Relationships and Referential Integrity
Relationships define how entities connect, and integrity rules ensure those connections remain valid.
3.1 Types of Relationships
The type of relationship dictates where you place the Foreign Key.
1. One-to-One (1:1)
- A single record in Table A relates to only one record in Table B, and vice versa.
- Example: A single passport is issued to one person, and that person only has one valid passport. (PERSON 1:1 PASSPORT)
- Implementation: The foreign key can be placed in either table.
2. One-to-Many (1:M)
- A single record in Table A relates to one or more records in Table B. This is the most common type.
- Example: One customer can place many orders. (CUSTOMER 1:M ORDER)
- Implementation: The foreign key goes into the "Many" side table (e.g., CustomerID goes into the ORDER table).
3. Many-to-Many (M:N or M:M)
- A record in Table A relates to many records in Table B, and a record in Table B relates to many records in Table A.
- Example: A student can enrol in many courses, and a course has many students. (STUDENT M:N COURSE)
- Crucial Step: M:N relationships cannot be implemented directly. They must be broken down into two 1:M relationships using an intermediate table, often called a Junction Table or Linking Table.
In the example above, the linking table would be ENROLMENT.
Don't worry if M:N seems tricky. Just remember the rule: M:N relationships require an extra table in the middle!
3.2 Referential Integrity
This is a rule set by the Database Management System (DBMS) to ensure that relationships between tables are valid and that you don't accidentally create errors by deleting or changing data.
- Definition: Referential Integrity ensures that every value in a Foreign Key column corresponds to an existing value in the Primary Key column it references.
- What it prevents:
- You cannot add an Order record (child table) if the CustomerID (Foreign Key) doesn't exist in the Customer table (parent table).
- You cannot delete a Customer (parent record) if there are still active Orders linked to that Customer (child records). This prevents orphan records (records that reference a parent that no longer exists).
4. Achieving Quality Design: Normalization
Normalization is the systematic process of organising the tables and fields in a relational database to minimise redundancy and dependency. It leads to a much more stable and reliable database.
4.1 The Problems Normalization Solves
If a database is not normalized, it suffers from redundancy, leading to three types of anomalies:
- Insertion Anomaly: Cannot add new data unless related data is also available.
Example: Can't add a new course until a student enrols in it. - Deletion Anomaly: Deleting one piece of data accidentally deletes other, unrelated data.
Example: Deleting the last student enrolled in a course might also delete all the descriptive information about the course. - Update Anomaly: Having to update the same information in multiple places, risking inconsistency.
Example: If a department name is stored five times, you must update all five occurrences if the name changes.
4.2 The Normal Forms (1NF, 2NF, 3NF)
We aim for Third Normal Form (3NF) for most practical business applications.
Step 1: First Normal Form (1NF)
A table is in 1NF if:
- Every attribute (field) contains atomic (single, indivisible) values.
Example: Separate the Address field into Street, City, and Postal Code. - There are no repeating groups of data.
Example: Do not have Course1, Course2, Course3 as separate columns in the Student table. Instead, move courses to a separate, linked table.
Step 2: Second Normal Form (2NF)
A table is in 2NF if:
- It is already in 1NF.
- All non-key attributes are fully dependent on the entire Primary Key. (This rule only applies if the table uses a Composite Key.)
The Issue to Fix (Partial Dependency): If you have a composite key (A+B), and attribute C only depends on A (and not B), this is a partial dependency. You must move C into a separate table where A is the PK.
Example: If the PK is (StudentID, CourseID), and the field StudentName only depends on StudentID, not the CourseID, then StudentName must be moved to the separate STUDENT table.
Step 3: Third Normal Form (3NF)
A table is in 3NF if:
- It is already in 2NF.
- There are no transitive dependencies.
The Issue to Fix (Transitive Dependency): A non-key attribute depends on another non-key attribute. If A determines B, and B determines C, then C is transitively dependent on A.
Example: In the STUDENT table, suppose we have fields (StudentID, City, City_Post_Code). StudentID determines City, and City determines the City_Post_Code. Since Post_Code depends on City (another non-key field), this dependency must be moved to a separate POSTAL_CODE table.
1NF: Atomic values, no repeating groups.
2NF: Must be 1NF + No Partial Dependencies (Only if Composite Key).
3NF: Must be 2NF + No Transitive Dependencies.
5. Database Management Systems (DBMS)
5.1 What is a DBMS?
A Database Management System (DBMS) is the software package that allows users and other applications to interact with a database. It handles the storage, organisation, retrieval, and security of data.
- Common DBMS Examples: Microsoft Access, MySQL, Oracle, SQL Server.
5.2 Key Functions of a DBMS
The DBMS performs several vital roles, acting as the intermediary between the user/application and the raw data files.
- Data Definition: Defining the structure (schema) of the database, including creating tables, setting field data types (e.g., Text, Integer, Date), and defining keys.
- Data Manipulation: Allowing users to add, modify, delete, and retrieve data (using languages like SQL).
- Data Security and Integrity: Enforcing rules (like Referential Integrity), managing user access permissions, and ensuring data consistency.
- Data Dictionary (Metadata Management): Storing information about the data itself (e.g., field names, data types, constraints, and descriptions).
- Data Backup and Recovery: Providing tools to save the database structure and data, and restoring it in case of failure.
5.3 The Role of SQL
The primary language used to communicate with a Relational DBMS is Structured Query Language (SQL).
- SQL Purpose: Used to manage and query data in the database.
- DDL (Data Definition Language): Used to define the database structure (e.g., CREATE TABLE, DROP TABLE).
- DML (Data Manipulation Language): Used to manage the data within the structure (e.g., INSERT, UPDATE, DELETE, SELECT).
Did you know? Even complex applications like Instagram or Amazon rely heavily on powerful relational databases (or similar NoSQL structures) to handle billions of pieces of data and ensure fast retrieval times.
You have now mastered the fundamental concepts of designing and structuring relational databases. This knowledge forms the foundation for secure and efficient information systems!