✨ Advanced Data Structures: Graphs – Study Notes

Welcome to the exciting world of Graphs! You’ve already mastered linear structures like Arrays, Lists, Stacks, and Queues, and now we are moving onto structures that handle much more complex, non-linear relationships.

Graphs are one of the most powerful concepts in Computer Science, essential for modelling real-world problems from social networks to navigation systems. Don't worry if this seems tricky at first—we will break down the terminology and methods step-by-step!

1. Defining the Graph Data Structure

A graph is a data structure used to represent complex relationships or connections between objects.

Think of a graph like a map of a city's transport system.

1.1 Core Terminology
  • Vertex (or Node): This is an individual entity or point on the graph.
    (Analogy: A city or a bus stop on the map.)
  • Edge (or Arc): This is the connection or relationship between two vertices.
    (Analogy: The road or bus route connecting two cities.)
  • Graph: The collection of all the vertices and edges.
1.2 Types of Graphs

Graphs come in several flavours, defined by the properties of their edges:

a) Weighted vs. Unweighted Graphs
  • Weighted Graph: Each edge has an associated value, or weight (often called cost).
    (Example: If the vertices are cities, the weight might be the distance, travel time, or cost of the flight between them.)
  • Unweighted Graph: All edges are considered equal (or have a weight of 1). The goal is usually just finding the fewest number of steps.
b) Directed vs. Undirected Graphs
  • Undirected Graph: Edges only show that a connection exists, but not the flow of movement. If Vertex A is connected to Vertex B, you can travel from A to B, and B to A.
    (Example: A country road that allows two-way traffic.)
  • Directed Graph (Digraph): Edges have an arrow or direction, meaning travel is only possible in one direction. If there is an edge A → B, you can only go from A to B, not B to A (unless a separate edge B → A exists).
    (Example: A one-way street or a social media follower relationship.)

Key Takeaway: Graphs model relationships. Vertices are items, Edges are connections. Weights add cost, and Direction adds flow control.

2. Representing Graphs in Memory

To store a graph structure inside a computer, we typically use one of two main methods: the Adjacency Matrix or the Adjacency List.

2.1 Adjacency Matrix

The adjacency matrix uses a two-dimensional array to store the connections between vertices.

  • If the graph has \(N\) vertices, the matrix will be \(N \times N\).
  • The entry at position (i, j) represents the edge from Vertex \(i\) to Vertex \(j\).
  • For Unweighted Graphs: (i, j) is typically 1 (connected) or 0 (not connected).
  • For Weighted Graphs: (i, j) stores the weight (cost) of the edge. If there is no connection, we use 0 or a special value like infinity.

Did you know? In an undirected graph, an adjacency matrix is always symmetric (the value at (i, j) equals the value at (j, i)).

2.2 Adjacency List

The adjacency list uses an array (or list) where each index corresponds to a vertex. Each index then stores a list of its adjacent neighbours.

Example: If Vertex A is connected to B and C, the entry for A in the main list would contain a list holding [B, C]. For weighted graphs, this list would store pairs: [(B, 5), (C, 10)], where the second number is the weight.

2.3 Comparing Matrix vs. List (Crucial for Exams!)

Choosing the right representation depends heavily on the graph's properties and what operations are performed most often.

Adjacency Matrix Advantages (O(1) lookups)
  • Testing Edge Presence: Checking if an edge exists between two specific vertices is extremely fast (just check one array position). This is frequently needed.
  • Adding/Removing Edges: Easy and quick—just change one cell value.
  • When to use: When the graph is dense (meaning it has many edges relative to the number of vertices).
Adjacency List Advantages (Space Efficiency)
  • Space Efficiency: It only stores actual edges. If the graph is sparse (meaning it has few edges), the list saves a huge amount of memory compared to the matrix (which always uses \(N^2\) space, regardless of how few edges exist).
  • When to use: When the graph is sparse (few edges relative to vertices).
  • Finding Neighbours: Efficient for finding all neighbours of a specific vertex.
Quick Review: Matrix vs. List

Matrix: Best for DENSE graphs and checking if an edge exists (fast presence test).
List: Best for SPARSE graphs (saves memory).

3. Graph Traversal Algorithms

Traversal means systematically visiting every vertex in the graph. We need methods to ensure we don't get lost or visit the same vertex twice.

3.1 Breadth-First Search (BFS)

BFS is like dropping a pebble in a pond—it explores all vertices at the current depth (or distance) before moving to the next level.

  • Strategy: Uses a Queue (FIFO) to manage the order of vertices to visit.
  • Process:
    1. Start at a source vertex (A). Add A to the queue and mark it as visited.
    2. While the queue is not empty:
    3. Dequeue the current vertex (V).
    4. Find all unvisited neighbours of V, mark them as visited, and Enqueue them.
  • Typical Application: Finding the shortest path in an unweighted graph (since it checks level by level, the first time it finds the destination, it guarantees the minimum number of edges).
3.2 Depth-First Search (DFS)

DFS is like going down one corridor in a maze as far as you can until you hit a dead end, then backtracking to try a different path.

  • Strategy: Uses a Stack (LIFO), or more commonly, recursion, to manage the process.
  • Process:
    1. Start at a source vertex (A). Mark A as visited.
    2. Explore a neighbour (B) of A.
    3. Recursively apply DFS on B.
    4. If B hits a dead end, backtrack to A and try A’s next unvisited neighbour.
  • Typical Application: Navigating a maze or checking connectivity (whether all parts of a graph are reachable).

Tricky Point Alert: Both BFS and DFS require a mechanism (like a Boolean array) to track which nodes have been visited. This prevents infinite loops in graphs that contain cycles (paths that lead back to themselves).

4. Shortest Path Algorithms (Dijkstra's)

What if the graph is weighted? Now, just counting edges isn't enough; we need to minimise the total weight (cost). This is where Dijkstra's algorithm comes in.

4.1 Dijkstra's Shortest Path Algorithm

Dijkstra’s algorithm finds the shortest path from a single starting vertex to all other vertices in a weighted graph (as long as those weights are non-negative).

  • Purpose: To find the minimum total weight (cost) required to travel from a starting node to every other node.
  • Application: Used extensively in network routing (like GPS and internet traffic routing) to calculate the cheapest or fastest route.

Important Note for Students (Syllabus 3.11.1):
You must be able to trace Dijkstra's shortest path algorithm using a set of data.
However, you are not expected to recall the steps or write program code to implement it, unless the algorithm steps are explicitly given to you in the exam question. Focus on understanding the *concept* of accumulating minimum costs.

Key Takeaway: BFS finds shortest paths in unweighted graphs (fewest edges). Dijkstra's finds shortest paths in weighted graphs (lowest cost).

Common Mistake to Avoid

Do not use BFS to find the shortest path on a weighted graph. BFS only guarantees the path with the fewest edges, not the lowest total cost. You must use Dijkstra's algorithm for weighted shortest path problems.