Unit 1: Data and Databases – Comprehensive Study Notes

Welcome! This chapter is the foundation of modern Information Technology. Everything you use online—from your school registration system to streaming services like Netflix—relies on databases. Don't worry if concepts like 'keys' and 'relationships' sound complex; we will break them down into simple, manageable steps. By the end of these notes, you’ll understand exactly how vast amounts of raw data are turned into useful, structured information!

Section 1: The Basics – Data, Information, and Structures

1.1 Data versus Information

In IT, these two terms are often confused, but the difference is critical.

  • Data: Raw, unprocessed facts and figures. Data has no meaning on its own. Example: The number "175," the name "Sarah," or the date "2023-01-15."
  • Information: Data that has been processed, organized, structured, or presented in context, making it meaningful and useful. Example: "Student ID 175 belongs to Sarah, who achieved an A grade in the exam taken on 2023-01-15."

Analogy: Think of data as ingredients (flour, sugar, eggs). Information is the finished cake (the useful product). Processing is the baking.

1.2 Database Structures: Flat File vs. Relational

A database is an organized collection of structured data, typically stored electronically. How that data is structured determines the type of database system used.

1. Flat File Database:

  • All data is stored in a single table or file.
  • It is simple and easy to set up for small amounts of data.
  • Limitation: Leads to high data redundancy (repeated data) and makes updating information difficult and error-prone.
  • Example: A simple contacts list in a spreadsheet program like Excel, where you repeat the company address for every employee.

2. Relational Database:

  • Data is divided into multiple, separate tables, which are then linked together by shared fields (keys).
  • This is the standard for almost all modern, large systems.
  • Advantage: Minimizes data redundancy, improves data integrity (accuracy), and allows for complex queries and reporting.
  • Example: A database for a school where student details are in one table, course details in another, and their enrolment link is in a third table.
Quick Review:

The main purpose of moving from a Flat File to a Relational Database is to reduce data redundancy (stops us typing the same thing over and over) and improve data integrity (makes sure the data is accurate).

Section 2: The Anatomy of a Table

To build a database, we use specific terminology to describe the pieces of data inside a table.

2.1 Fields, Records, and Tables

Imagine a table storing student details.

  1. Table (File / Entity): The complete collection of related data. This is the main structure. Example: The 'Students' Table.
  2. Field (Attribute): A column in the table. It describes a single characteristic of the entity. Every record must have a value for this field. Example: 'FirstName', 'DateOfBirth', 'StudentID'.
  3. Record (Tuple): A row in the table. It represents a complete set of data about one item, person, or event. Example: All the information (Name, DOB, ID) relating to student Jane Doe.

Memory Aid: A Field runs up and down (like a column). A Record runs left and right (like a row).

2.2 Data Types

Every field must be assigned a data type. This tells the database what kind of information is expected in that column, which helps with validation and storage efficiency.

  • Text / String: Used for names, addresses, or codes that won't be used in calculations (even if they contain numbers, like a product code).
  • Numeric / Integer / Real: Used for numbers that need to be calculated (e.g., quantity, price, age).
  • Boolean / Logical: Stores only one of two values: True/False, Yes/No, 1/0. Example: IsAdmin? (Yes/No).
  • Date/Time: Stores dates and times in a standard format, allowing for sorting and calculations (e.g., calculating age).

Did you know? Using the correct data type is essential for saving disk space and speeding up database operations. Storing an ID number as Text when it should be Numeric is a common beginner mistake!

Section 3: The Power of Relationships – Keys

This is the core concept of a relational database. Keys are special fields that uniquely identify records and link tables together.

3.1 Primary Keys (PK)

The Primary Key is the ultimate identifier.

  • A field (or combination of fields) that uniquely identifies every single record in a table.
  • It must be unique (no two records can have the same PK value).
  • It cannot contain null values (i.e., it cannot be empty).
  • Example: In a 'Students' table, the 'StudentID' is the perfect Primary Key.

Don't worry if this seems tricky at first: The Primary Key is like your national ID card number—it is the one thing that differentiates you from every other person in the country.

3.2 Foreign Keys (FK)

The Foreign Key is the link between tables.

  • A Foreign Key is a field in one table that refers to the Primary Key in another table.
  • It establishes the relationship between the two tables.
  • The values in the Foreign Key field must match a value that already exists in the related Primary Key field.
  • Example: If the 'Courses' table has 'CourseID' as its PK, we include 'CourseID' in the 'Students' table to show which course the student is enrolled in. In the Students table, 'CourseID' is the Foreign Key.
3.3 Understanding Relationships

Relationships define how data flows between tables using Primary and Foreign Keys. The most important relationships are:

  1. One-to-One (1:1): One record in Table A relates to exactly one record in Table B, and vice-versa. (Rarely used in practice for core data). Example: A Person table linked to a Passport Details table (one person has one passport).
  2. One-to-Many (1:M): One record in Table A can relate to zero, one, or many records in Table B. This is the most common and essential relationship. Example: One Teacher (Table A) teaches Many Students (Table B). The Primary Key of the Teacher table is used as the Foreign Key in the Students table.
  3. Many-to-Many (M:M): Many records in Table A can relate to many records in Table B. (This requires a third, 'linking' or 'junction' table to function correctly). Example: Many Students enrol on Many Courses.
Key Takeaway on Keys:

PK (Primary Key) = Unique ID for THIS table.
FK (Foreign Key) = The link, a copy of another table’s PK.

Section 4: Ensuring Data Quality (Integrity, Validation, Verification)

For data to be useful (to become information), it must be trustworthy. This is achieved through strict controls over quality.

4.1 Data Integrity and Redundancy
  • Data Integrity: The overall accuracy, completeness, and consistency of data. If the data is correct and reliable, it has high integrity.
  • Data Redundancy: Storing the same piece of data multiple times (e.g., typing the college address 100 times for 100 students). Redundancy severely lowers integrity because if you update the address in one place, you might forget the other 99. Relational databases are designed to eliminate this.
4.2 Validation (Making sure data is sensible)

Data Validation checks that the data entered is reasonable, sensible, and meets specific rules. It does not check if the data is factually correct.

Common Validation Checks (Step-by-Step):

  1. Range Check: Ensures a number or date falls within acceptable limits. Example: An 'Age' field must be between 18 and 65.
  2. Format Check (Picture Check): Ensures the data follows a specific pattern. Example: A UK Postcode must follow the format LLNN NLL (where L=Letter, N=Number).
  3. Presence Check: Ensures the field is not left blank (must contain data). Example: The 'Surname' field is required.
  4. Length Check: Ensures the input is the correct number of characters (minimum or maximum). Example: A 'Password' must be at least 8 characters long.
  5. Look-up Check (or Restricted Choice): Compares input against a pre-approved list of values. Example: The 'Department' field must be one of 'IT', 'Marketing', or 'HR'.
4.3 Verification (Making sure data is correct)

Data Verification checks that the data entered into the system matches the original source data exactly.

  • Visual Check (Proofreading): A person checks the data entered on screen against the source document.
  • Double Entry: The data is typed in twice by either the same person or two different people. The computer compares the two entries, and if they don't match, the user is prompted to check again.

Common Mistake to Avoid: A Range Check (Validation) confirms an age is between 1 and 100. Double Entry (Verification) confirms that if the user typed "42," the computer received "42." Validation checks the rules; Verification checks the typing accuracy.

Section 5: Database Management Systems (DBMS)

A Database Management System (DBMS) is the software that allows users and other applications to interact with the database. Examples include Microsoft Access, MySQL, and Oracle.

5.1 The Role of the DBMS

The DBMS does a lot of heavy lifting behind the scenes. Its main functions include:

  • Data Definition: Defining the structure of the database (creating tables, defining fields and data types, setting up relationships).
  • Data Manipulation: Allowing users to insert new data, modify existing data, delete data, and retrieve data (using queries like SQL).
  • Data Security: Managing user accounts, permissions, and access rights (e.g., ensuring a student can only view their own grades, not everyone else’s).
  • Data Integrity Management: Enforcing the rules and constraints you set up (like Primary Key uniqueness and referential integrity—ensuring Foreign Keys point to valid Primary Keys).


The DBMS acts as a powerful security guard and librarian, making sure only the right people access the right books, and that the books are shelved correctly.

Chapter Wrap-up and Final Check

You have successfully navigated the foundational concepts of databases! If you understand the difference between a Primary Key (PK) and a Foreign Key (FK), and why we link tables (to reduce redundancy), you are ready for the next stage.

  • Remember the goal: Turn raw data into useful information.
  • Crucial Distinction: Validation (Is the data sensible?) vs. Verification (Is the data typed correctly?).

Keep practising these definitions, and soon, you'll be building your own efficient relational systems!