IT 9626 Study Notes: Database and File Concepts (Topic 10)
Hello future IT expert! This chapter, Database and File Concepts, is crucial for both your theoretical knowledge (Paper 1) and your practical skills (Paper 2), especially for data management tasks.
Don't worry if terms like 'Normalisation' sound complicated—we're going to break them down using simple analogies. By the end of these notes, you'll understand how to design and manage robust, efficient databases, which is a key skill for any organization!
10.1 Creating a Database: The Building Blocks
A database is essentially an organized collection of structured data, typically stored electronically. Before you build one, you need to understand the data fields.
Data Types and Field Size
Choosing the correct data type and size ensures data integrity (accuracy) and saves storage space.
- Text/Alphanumeric: Used for names, addresses, product codes. Alphanumeric allows letters, numbers, and symbols.
- Numeric: Stores numbers only.
- Integer: Whole numbers (e.g., quantity of items).
- Decimal (or Float/Real): Numbers with fractional parts (e.g., price $19.99).
- Date/Time: Stores calendar dates and times (e.g., date of birth, order time).
- Boolean/Logical: Stores only two values (Yes/No, True/False, 0/1). Example: IsPaid (Yes/No).
The Field Size defines the maximum number of characters or bytes allowed. If a student ID is always 6 characters, setting the field size to 6 prevents accidental long entries and saves storage.
Understanding Relationships
Databases often consist of multiple tables linked together—this is a Relational Database. The links are called relationships:
- One-to-One (1:1): One record in Table A relates to exactly one record in Table B. Example: A single employee might have one company parking spot assigned.
- One-to-Many (1:M): One record in Table A relates to multiple records in Table B. This is the most common type. Example: One customer can place many orders.
- Many-to-Many (M:M): Multiple records in Table A relate to multiple records in Table B. (These require a third 'linking' table, often called an Intersection Table or Junction Table, to resolve them into two 1:M relationships). Example: Many students can take many courses.
Quick Tip: Think of the relationship direction. Customer (One) -> Order (Many).
The Function of Key Fields
Keys are fields used to identify records uniquely and link tables.
- Primary Key (PK): A field (or combination of fields) that uniquely identifies each record in a table. It must contain unique, non-null values. Example: EmployeeID.
- Foreign Key (FK): A field in one table that refers to the Primary Key in another table. It forms the link between the tables. Example: In the Orders table, the CustomerID is the Foreign Key linking back to the Customer table.
- Compound Key: A primary key made up of two or more fields where the combination of values is unique, but individual values may not be unique. (Used interchangeably with Composite Key in many contexts.) Example: A timetable might use (Day + Time) as a unique identifier for a lesson slot.
- Composite Key: Often used synonymously with Compound Key, referring to a primary key composed of multiple attributes.
Referential Integrity and Its Importance
Referential Integrity is a set of rules that ensures the relationships between tables remain consistent and valid.
- It prevents you from entering data into a Foreign Key field if that value doesn't already exist as a Primary Key in the linked table. Example: You cannot place an order for a CustomerID (FK) that doesn't exist in the Customer table (PK).
- It prevents accidental deletion of records that other records depend on. Example: If a customer (PK) has placed 10 orders, referential integrity might stop you from deleting the customer record until those orders are deleted or reassigned.
Key Takeaway: Referential integrity keeps your links reliable and prevents "orphan records" (data without a parent reference).
Flat File vs. Relational Database
The structure of your data depends on complexity and need:
| Feature | Flat File | Relational Database |
| Structure | Single table or file. | Multiple interconnected tables (relations). |
| Data Redundancy | High (data repeated in many records). | Low (data stored once, linked via keys). |
| Complexity | Simple to set up, suitable for small data sets. | Complex to design, robust for large systems. |
| Suitability | Simple mailing lists, single-purpose data storage. | Transactional systems, payroll, large businesses. |
A Flat File might be more appropriate for very small, non-critical data sets where quick setup is prioritised, but a Relational Database is superior for data integrity, reduced redundancy, and complex querying.
Entity Relationship Diagrams (ERDs)
An ERD is a visual map showing the structure of a database, including entities (tables) and the relationships between them.
- Conceptual ERD: High-level, abstract view of entities and relationships, independent of any specific DBMS software.
- Logical ERD: Defines entities, attributes (fields), and primary/foreign keys, but still doesn't specify data types or field sizes (yet).
- Physical ERD: The detailed design used for implementation, specifying all tables, fields, data types, indexes, and keys.
10.2 Normalisation to Third Normal Form (3NF)
Normalisation is the process of structuring a relational database in order to minimize data redundancy and improve data integrity. We strive to reach Third Normal Form (3NF).
The Normal Forms (UNF, 1NF, 2NF, 3NF)
Don't worry if this feels tricky! Normalisation is just tidying up your data step-by-step.
1. Unnormalised Form (UNF):
- Characteristics: Contains repeating groups of data or multiple values in a single cell. It looks like a messy spreadsheet.
2. First Normal Form (1NF):
- Requirements:
- Eliminate repeating groups (i.e., each cell must contain only one value—data must be atomic).
- Identify a Primary Key.
- How to achieve it: Separate repeating data into a new table and link it back to the original using the primary key.
3. Second Normal Form (2NF):
- Requirements:
- Must already be in 1NF.
- No Partial Dependency: All non-key attributes must depend on the entire Primary Key. (This rule only matters if you have a Compound Key).
- How to achieve it: If a non-key field depends on only *part* of a compound key, move that partial dependency into a new table.
4. Third Normal Form (3NF):
- Requirements:
- Must already be in 2NF.
- No Transitive Dependency: No non-key attributes can depend on another non-key attribute.
- How to achieve it: If a non-key field determines the value of another non-key field, move both into a new table. Example: If the PostCode determines the City, you would separate PostCode and City into a separate table.
Advantages of Normalisation (to 3NF):
- Reduces data redundancy (less storage space required).
- Improves data integrity (data is only entered once).
- Simplifies querying and sorting.
- Makes maintenance (updates, insertions, deletions) easier and safer.
Disadvantages of Normalisation:
- Can be more complex to set up.
- Queries sometimes take longer because the system has to link (join) multiple tables to retrieve simple data.
10.3 Data Dictionary
A Data Dictionary (or data definition) is a central repository of information about data, such as meaning, relationships to other data, origin, usage, and format. It describes the structure of the database, not the actual data itself.
Components of a Data Dictionary
When you create a data dictionary for a field, you must include:
- Field Name: (e.g., StudentID, FirstName)
- Data Type: (e.g., Text, Integer, Date, Currency, Boolean)
- Field Size: (e.g., 6 characters, 8 bytes)
- Description/Purpose: (e.g., Unique identifier for the student)
- Validation Rules: (e.g., >0, IsNotNull)
- Format: (e.g., ###-##, DD/MM/YYYY)
- Index Status: (Yes/No, Duplicates OK/No Duplicates)
Did you know? Creating a comprehensive data dictionary before building the database is essential for team projects, ensuring everyone uses the data correctly.
10.4 File and Data Management
File Types and Access Methods
We need to understand how files are stored and accessed, especially large data files.
1. File Formats:
- Generic File Formats: Non-proprietary formats that can be read by a wide variety of software and systems. Example: CSV, TXT, PDF, JPEG.
- Proprietary File Formats: Owned and controlled by a specific company (e.g., Adobe .psd, Microsoft .docx). They offer specific features but require the vendor's software.
- Open-Source File Formats: Formats where the specifications are publicly available, allowing anyone to develop software to use them. These are often preferred for interoperability. Example: ODF, PNG.
2. File Access Methods:
- Sequential Access: Data is accessed in the order it was written (like tracks on a tape). To reach the 100th record, you must read the first 99. Used for batch processing, like calculating monthly utility bills.
- Direct File Access (Random Access): Data can be retrieved immediately without reading prior records. This uses an address or key. Used for real-time systems, like retrieving a specific customer's bank balance.
- Indexed Sequential Access: A compromise. Records are stored sequentially, but an index (a lookup table) is maintained to allow faster direct access. If you need all records, you read sequentially; if you need one, you use the index.
Database Management Systems (DBMS)
A DBMS is software that handles the creation, maintenance, and use of a database. We need to know the different models:
- Relational DBMS (RDBMS): The standard model (covered in 10.1). Data is stored in tables linked by keys. (Most common today).
- Hierarchical DBMS: Data is structured in a tree-like manner (parent-child relationship). If the parent node is deleted, all child nodes are lost. Fast, but rigid.
- Network DBMS: Similar to Hierarchical, but allows a record to have multiple parent records (more flexible connections).
- Object-Oriented DBMS: Stores data as objects, incorporating both the data and the methods (instructions) to manipulate that data.
Advantages of RDBMSs: Reduced redundancy, consistent data, easier data manipulation via SQL.
Management Information System (MIS)
An MIS is a system used by organisations to collect data from various sources and present it in reports and summaries for decision-making.
- Features of an MIS:
- Data collection and integration from operational systems.
- Data reporting, often using dashboards, charts, and summaries.
- Decision-support tools (e.g., forecasting, what-if analysis).
- Use of an MIS by Organisations:
- Monitoring sales performance (Are we hitting targets?).
- Tracking inventory and supply chain efficiency.
- Financial reporting and budgeting.
Quick Review: Practical Database Tasks (10.1 continued)
Queries (Searching and Retrieval)
Queries allow you to extract specific information from the database.
- Simple Query: Uses a single criterion (e.g., Show me all students from London).
- Complex Query: Uses multiple criteria linked by Boolean operators (AND, OR, NOT). (e.g., Show me students older than 18 AND who study IT).
- Nested Queries: One query is run based on the results of another query.
- Summary Queries (Cross-tab/Pivot Tables): Used to summarise data by grouping and performing calculations (SUM, COUNT, AVG) across rows and columns. Example: Total sales per region, broken down by month.
Parameters:
- Static Parameter Query: The criterion value is fixed and defined when the query is designed (e.g., always searching for 'UK').
- Dynamic Parameter Query: The user is prompted to enter the criterion value each time the query is run (e.g., Enter the Region: [User Input]).
Forms and Reports
Forms and reports are the primary ways users interact with the data.
- Forms: Used for inputting, viewing, and editing data. Design requires appropriate font styles and sizes, good spacing (use of white space), and appropriate controls (radio buttons, drop-down menus, form controls) for usability. They can include linked subforms to show related data (e.g., Customer form showing a subform of all their Orders).
- Reports: Used for presenting data outputs, often grouped (e.g., grouping sales by department) and including calculated controls (total sums, averages). Reports are designed for printing or screen viewing.
Data Entry Checks (Validation and Verification)
Ensuring data quality is essential in a database.
- Validation Rules: Automated checks performed by the system to ensure data is *reasonable* and *adheres to rules*.
- Examples: Presence check (field is not empty), Range check (number is between 1 and 100), Format check (matches a defined pattern like DD/MM/YY), Lookup check (value matches a list of approved values).
- Verification: Checks performed manually or through redundancy to ensure data is *accurate* (correctly copied or transcribed).
- Examples: Double data entry (entering data twice and comparing), Visual checking (a person checks the source document against the entered data).