Welcome to Advanced Data Structures: Trees!
Hello future Computer Scientists! You've already mastered basic data structures like arrays, queues, and stacks. Now, we are entering the world of hierarchical data. This chapter on Trees is crucial because trees are one of the most powerful ways to store and organize information for super-efficient searching and sorting.
Don't worry if this seems tricky at first. Think of a tree not just as a mathematical concept, but as a structure you use every day, like a file system on your computer or a family tree. We’ll break down the jargon and focus on the fundamental concepts required by the 9645 syllabus. Let’s get started!
Quick Review: What is a Tree? (The Basic Definition)
Before diving into the specifics, it helps to know that trees are a special type of structure derived from Graphs (which you studied in Section 3.10.1).
A Tree is formally defined as a connected, undirected graph with no cycles.
- Connected: You can get from any node to any other node.
- Undirected: The links (edges) don't have arrows; movement can go both ways.
- No Cycles: There are no loops or closed paths. If you start at one node and follow the edges, you cannot return to that node without retracing your steps.
Did you know? While a general "tree" doesn't have a designated starting point, most trees used in Computer Science (especially in the exam) are Rooted Trees.
4.1 Rooted Trees and Essential Terminology (Syllabus 3.10.2)
4.1.1 Defining a Rooted Tree
A Rooted Tree is a tree in which one specific vertex (node) has been designated as the root. This simple designation creates a clear hierarchy and introduces important relationships.
Key Characteristics of a Rooted Tree:
1. The Root is the only node with no parent.
2. All other nodes are descendants of the root.
3. The structure establishes parent-child relationships between nodes.
Analogy: Think of the CEO or founder at the top of an organization chart. They are the Root, and everyone else descends from them.
4.1.2 Important Tree Terminology
You must be familiar with the following terms:
- Node (or Vertex): An element that stores data within the tree.
- Edge (or Arc): The connection between two nodes.
- Root: The starting node of the tree (has no parent).
- Parent: A node that has branches leading to other nodes immediately below it.
- Child: A node immediately below another node (its parent).
- Leaf Node (or Terminal Node): A node that has no children. These are at the "end" of the branches.
- Subtree: A section of the tree that is itself a tree, rooted at one of the original tree's nodes.
- Ancestor/Descendant: All the nodes on the path from the root to node X are its ancestors. All nodes below node X are its descendants.
Quick Review Box: Rooted Trees
The key takeaway here is hierarchy. Rooted trees establish relationships (parent-child) which are essential for applications like Object-Oriented Programming (OOP) class hierarchies, where Object is often the root class from which all others descend.
4.2 Binary Trees and Binary Search Trees (BST)
4.2.1 Binary Trees (Syllabus 3.10.2)
A Binary Tree is a special type of rooted tree defined by a strict rule on the number of connections:
Definition: A rooted tree in which each node has at most two children.
The two children are typically distinguished as the left child and the right child. If a node only has one child, it must be designated as either left or right.
4.2.2 Binary Search Trees (BSTs) (Syllabus 3.10.2)
While a general binary tree simply dictates structure (max two children), a Binary Search Tree (BST) dictates order. This ordering rule is what makes them incredibly useful for rapid data retrieval.
Definition: A binary tree where, for any given node, all values in its left subtree are less than the node's value, and all values in its right subtree are greater than the node's value.
The BST Rule:
- Left child < Parent
- Right child > Parent
Common Mistake to Avoid: The rule applies to the *entire subtree*, not just the immediate children. For example, the great-great-grandchild in the left subtree must still be less than the root.
Why use a BST?
BSTs dramatically speed up searching. If you are looking for the number 50 and the root is 75, you immediately know you only need to check the left subtree. This eliminates about half the data in one step, similar to how a Binary Search works on a sorted array, making the search time very efficient (logarithmic complexity, O(log n)).
Key Takeaway: BST Efficiency
A Binary Search Tree is an ordered data structure used for efficient searching, insertion, and deletion of data. The ordering rules allow for quick elimination of half the remaining nodes in each comparison.
4.3 Tree Traversal Algorithms (Syllabus 3.11.2)
Tree Traversal refers to the process of visiting (processing or examining) every node in the tree exactly once, in a systematic order. Unlike linear structures (like lists or queues), which have only one obvious order, trees have many, but we focus on three main types based on the relative position of the root/parent node in the output sequence.
4.3.1 The Three Traversal Methods
All three methods are typically implemented recursively (a subroutine calling itself) because the definitions apply naturally to a node and its subtrees.
To remember the three types, focus on when the Parent node is visited relative to its Left and Right children:
1. Pre-order Traversal (P L R)
The current node (Root/Parent) is visited before its children.
Steps:
1. Visit the Root node (Process data).
2. Recursively traverse the Left subtree.
3. Recursively traverse the Right subtree.
2. In-order Traversal (L P R)
The current node (Parent) is visited in between its Left and Right children.
Steps:
1. Recursively traverse the Left subtree.
2. Visit the Root node (Process data).
3. Recursively traverse the Right subtree.
3. Post-order Traversal (L R P)
The current node (Parent) is visited after both its children.
Steps:
1. Recursively traverse the Left subtree.
2. Recursively traverse the Right subtree.
3. Visit the Root node (Process data).
Memory Aid for Traversal Order
Pre, In, Post describes the location of the Parent (P).
P: P L R (Parent is first)
I: L P R (Parent is in the middle)
T: L R P (Parent is last – think Terminal or last)
4.3.2 Tracing Traversal (Step-by-Step Example)
Imagine a small BST:
50 (Root)
/ \
25 75
Tracing Pre-order (P L R):
1. Start at 50 (P). Print 50.
2. Go Left (L) to 25. Print 25.
3. Go Left from 25 (None). Go Right from 25 (None).
4. Back up to 50. Go Right (R) to 75. Print 75.
Result: 50, 25, 75
Tracing In-order (L P R):
1. Start at 50 (P). Go Left (L) to 25.
2. From 25, go Left (L) (None).
3. Print 25 (P).
4. Go Right (R) from 25 (None).
5. Back up to 50. Print 50 (P).
6. Go Right (R) to 75. Go Left from 75 (None).
7. Print 75 (P).
Result: 25, 50, 75
Important Connection: Tracing a BST using In-order traversal (L P R) always outputs the elements in ascending order. This is the primary use case for In-order traversal.
Tracing Post-order (L R P):
1. Start at 50 (P). Go Left (L) to 25.
2. From 25, go Left (L) (None). Go Right (R) (None).
3. Print 25 (P).
4. Back up to 50. Go Right (R) to 75.
5. From 75, go Left (L) (None). Go Right (R) (None).
6. Print 75 (P).
7. Back up to 50. Print 50 (P).
Result: 25, 75, 50
4.4 Uses of Tree Traversal (Syllabus 3.11.2)
Different traversal methods are useful depending on the task you need to perform.
4.4.1 Applications of In-order Traversal
The main use is specifically for a Binary Search Tree:
1. Outputting the contents of a BST in ascending order.
4.4.2 Applications of Pre-order Traversal
Since the parent is visited first, this method is useful for setting up the structure before dealing with the details.
1. Copying a tree: If you traverse in pre-order, you create the root first, then its left branch, then its right branch—exactly the order needed to reconstruct a copy.
2. Producing a prefix expression: This applies to expression trees, where operators are placed before their operands (e.g., \(+ A B\)).
4.4.3 Applications of Post-order Traversal
Since the parent is visited last, this is often used for operations where you need to process all children before manipulating the parent (or destroying the structure).
1. Emptying or deleting a tree: You must delete all children (leaf nodes) before you delete the parent node (otherwise you lose access to the children first). Post-order ensures that deletion starts at the lowest level.
2. Producing a postfix expression: This applies to expression trees, where operators are placed after their operands (e.g., \(A B +\)). This is also known as Reverse Polish Notation (RPN).
Key Takeaway: Traversal Uses
Remember the critical link: In-order traversal of a BST = Sorted Output.
P-reorder (Parent first) is great for P-reparing structure (Copying).
Post-order (Parent last) is great for P-ost-mortem cleanup (Emptying/Deleting).