Study Notes: Instruction Format (Computer Science 9645)
Welcome to Machine Code!
Hi there! This chapter is your key to understanding the very core of how a computer works. We are moving beyond high-level code like Python and stepping into the raw language the Central Processing Unit (CPU) actually understands: Machine Code. Don't worry if it seems complex—we'll break down the blueprint of every computer command: the Instruction Format.
Understanding instruction format is crucial because it explains:
- How a simple command like "Add 5 to Register R1" is translated into a sequence of bits.
- How the computer knows *what* operation to perform and *where* to find the data it needs.
Section 1: The Processor Instruction Set
When you write a program, the CPU needs to know exactly which actions it can perform. This list of valid actions is known as the Processor Instruction Set.
What is an Instruction Set?
An Instruction Set is the complete collection of all the instructions (commands) that a specific CPU can understand and execute. Think of it as the CPU's dictionary and vocabulary.
- Every command, from basic arithmetic (add, subtract) to memory access (load, store), is defined within this set.
Processor Specificity
It is very important to remember that the instruction set is processor specific. This means:
- The instructions used by an Intel CPU (like those in many desktop computers) are different from the instructions used by an ARM CPU (like those commonly found in smartphones).
- If you write a program using the instruction set for Processor A, it generally cannot run directly on Processor B. It can only run there if it is translated into Processor B's instruction set or run through an emulator (software that imitates Processor A).
Key Takeaway: The CPU's language (the Instruction Set) is specific to that family of processors.
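To see why machine code is processor specific, here is a minimal Python sketch with two made-up processors, A and B. The opcode tables are purely hypothetical (they are not real Intel or ARM encodings). The point is only that the same bit pattern can mean different things to different instruction sets.

```python
# Two HYPOTHETICAL processors whose instruction sets assign
# different meanings to the same 4-bit opcode patterns.
PROCESSOR_A_OPCODES = {0b0010: "ADD", 0b0011: "SUBTRACT", 0b0100: "LOAD"}
PROCESSOR_B_OPCODES = {0b0010: "STORE", 0b0011: "ADD", 0b0100: "HALT"}


def decode_opcode(opcode: int, opcode_table: dict[int, str]) -> str:
    """Look up what a processor does with a given opcode bit pattern."""
    return opcode_table.get(opcode, "INVALID")


program = [0b0010, 0b0100]  # machine code written for Processor A

for opcode in program:
    print(f"{opcode:04b}: A runs {decode_opcode(opcode, PROCESSOR_A_OPCODES)}, "
          f"B runs {decode_opcode(opcode, PROCESSOR_B_OPCODES)}")
```

Running Processor A's program on Processor B would store instead of add, then halt instead of load.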
Section 2: The Anatomy of an Instruction
Every instruction (whether in machine code or assembly language) follows a specific format built from up to three fundamental components:
- Opcode (Operation Code)
- Addressing Mode
- Operand(s)
Analogy: A Recipe Instruction
Think of an instruction as a step in a recipe:
"Mix (Opcode) the contents directly (Addressing Mode) from the bowl labeled 'Ingredients' (Operand)."
1. The Opcode (Operation Code)
The opcode tells the CPU what operation to perform.
- It is the function of the instruction, such as ADD, SUBTRACT, LOAD, or STORE.
- In machine code, the opcode is represented by a specific pattern of bits (e.g., 0010 could mean ADD).
2. The Operand(s)
The operand specifies what data the instruction should operate on, or where to find that data.
The operand can represent one of three things:
- A Value (the actual data itself, e.g., the number 5).
- A Memory Address (a location in RAM).
- A Register (a small, fast storage location inside the CPU).
3. The Addressing Mode
The addressing mode tells the CPU how to interpret the operand—is the operand the data itself, or is it a location?
The format (the layout and size of the bits) of a machine code instruction can vary greatly with the type of instruction. For example, a simple HALT instruction might need only an opcode, while a calculation needs an opcode, an addressing mode and one or more operands.
Key Takeaway: Instruction = Opcode (What to do) + Addressing Mode (How to find data) + Operand(s) (Data or Location).
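The three components can be modelled as a small record in which the addressing mode and operand are optional, which captures why a HALT instruction can be shorter than an ADD. This is a minimal sketch; the class name `Instruction` and its field names are illustrative, not part of any real assembler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instruction:
    opcode: str                            # what to do, e.g. "ADD"
    addressing_mode: Optional[str] = None  # how to interpret the operand
    operand: Optional[int] = None          # a value, an address or a register number

    def fields_used(self) -> int:
        """Count how many of the three components this instruction needs."""
        return 1 + (self.addressing_mode is not None) + (self.operand is not None)


halt = Instruction("HALT")                # opcode only
add = Instruction("ADD", "immediate", 5)  # opcode + addressing mode + operand

print(halt.fields_used())  # 1
print(add.fields_used())   # 3
```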
Section 3: Instruction Representation
We need two ways to represent instructions, one for the machine and one for us humans.
Machine Code (Binary)
In its native form, machine code is expressed entirely in binary (sequences of 0s and 1s).
- This is the lowest-level language; it is directly executable by the CPU.
- Example: 0010 0101 00001010 (this might mean: opcode 0010 = ADD, addressing mode 0101 = immediate, operand 00001010 = the value 10).
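A CPU's decoder splits an instruction into its fields by position. The sketch below assumes the hypothetical layout of the binary example (a 4-bit opcode, then a 4-bit addressing mode, then an 8-bit operand) and uses bit shifts and masks to pull each field out. Real instruction formats differ from processor to processor.

```python
def decode(word: str) -> tuple[int, int, int]:
    """Split a 16-bit instruction into (opcode, addressing mode, operand).

    Assumes a hypothetical layout: 4-bit opcode | 4-bit mode | 8-bit operand.
    """
    bit_string = word.replace(" ", "")
    if len(bit_string) != 16:
        raise ValueError("expected exactly 16 bits")
    bits = int(bit_string, 2)          # "0010010100001010" -> 9482
    opcode = (bits >> 12) & 0b1111     # top 4 bits
    mode = (bits >> 8) & 0b1111        # next 4 bits
    operand = bits & 0b1111_1111       # bottom 8 bits
    return opcode, mode, operand


opcode, mode, operand = decode("0010 0101 00001010")
print(f"opcode={opcode:04b} mode={mode:04b} operand={operand}")
# opcode=0010 mode=0101 operand=10
```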
Assembly Language (Mnemonics)
Assembly language uses mnemonics—short, human-readable codes—to represent machine code instructions.
- Mnemonics are much easier for programmers to read and write than long binary strings.
- Example: ADD #10, R5 (add the value 10 to Register R5).
Did You Know? Mnemonics are not understood by the CPU directly. They must be translated into binary machine code by a piece of software called an Assembler.
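An assembler's core job is a table lookup: each mnemonic becomes its opcode bits, and the operand syntax decides the addressing-mode bits. The minimal sketch below assumes the same hypothetical 16-bit layout as the binary example (4-bit opcode, 4-bit mode, 8-bit operand). The opcode and mode codes are invented, except that ADD = 0010 and immediate = 0101 follow the examples in these notes. It handles only single-operand instructions such as `ADD #10`; a two-operand form like `ADD #10, R5` would need a register field that this layout does not have.

```python
# HYPOTHETICAL tables: a real assembler uses its processor's own encodings.
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "STORE": 0b0011}
MODES = {"immediate": 0b0101, "direct": 0b0001}


def assemble(line: str) -> str:
    """Translate one line like 'ADD #10' or 'LOAD 100' into 16 bits of machine code."""
    mnemonic, operand_text = line.split()
    if operand_text.startswith("#"):      # '#' marks immediate addressing
        mode, operand = MODES["immediate"], int(operand_text[1:])
    else:                                 # otherwise treat it as a memory address
        mode, operand = MODES["direct"], int(operand_text)
    if not 0 <= operand <= 255:
        raise ValueError("operand does not fit in 8 bits")
    return f"{OPCODES[mnemonic]:04b} {mode:04b} {operand:08b}"


print(assemble("ADD #10"))   # 0010 0101 00001010
print(assemble("LOAD 100"))  # 0001 0001 01100100
```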
Key Takeaway: Machine code is binary (0s/1s); Assembly Language uses mnemonics (like ADD, LOAD, STORE).
Section 4: Understanding Addressing Modes (Crucial!)
The Addressing Mode determines how the CPU interprets the operand field to locate the data it needs. The syllabus requires you to understand and apply three main types.
1. Immediate Addressing
In immediate addressing, the operand field is the actual value (data) to be used by the instruction.
Analogy: The Note on the Fridge
If you tell someone, "The value is 5," you don't need to tell them where to find it; you gave them the value immediately.
- How it works: The instruction contains the value itself.
- Use: Loading constants or immediate values into a register.
- Example (Assembly):
LOAD #5, R1 (Load the value 5 into Register R1). The '#' symbol often indicates immediate addressing.
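In a simulator, immediate addressing needs no lookup at all: the operand is copied straight into the register. The `registers` dictionary and `load_immediate` function below are hypothetical names for a minimal sketch, not a real CPU interface.

```python
registers = {"R1": 0}  # hypothetical register file


def load_immediate(value: int, destination: str) -> None:
    """LOAD #value, destination: the operand IS the data, so nothing is looked up."""
    registers[destination] = value


load_immediate(5, "R1")  # LOAD #5, R1
print(registers["R1"])   # 5
```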
2. Direct Addressing
In direct addressing, the operand field is the address (location) where the data can be found.
Analogy: The Postbox Number
If you tell someone, "Find the value at address 100," they must go to location 100 to fetch the item.
- How it works: The CPU interprets the operand as a main memory address or a register number. It then goes to that location to fetch the data.
- Use: Accessing variables stored in memory.
- Example (Assembly):
LOAD 100, R1 (Load the value stored at memory address 100 into Register R1).
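Direct addressing adds exactly one lookup. In this minimal sketch, main memory is assumed to be a Python list of 256 cells, the value 42 at address 100 is made up for the demonstration, and `load_direct` is a hypothetical name. For simplicity the operand is treated only as a memory address, not a register number.

```python
memory = [0] * 256     # hypothetical 256-cell main memory
memory[100] = 42       # pretend a variable holding 42 lives at address 100
registers = {"R1": 0}  # hypothetical register file


def load_direct(address: int, destination: str) -> None:
    """LOAD address, destination: one lookup, straight to the given address."""
    registers[destination] = memory[address]


load_direct(100, "R1")  # LOAD 100, R1
print(registers["R1"])  # 42
```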
3. Indirect Addressing
In indirect addressing, the operand field specifies a register, and that register holds the actual address of the data in main memory.
Once the operand has been read to find the register number (R_X), this is a two-step process:
- Go to R_X to find the memory address (A).
- Go to memory address A to find the actual data.
Analogy: The Envelope
You are told, "Look in the envelope (Register R1). Inside the envelope is a piece of paper that says 'Address 500'. Now go to Address 500 to find the item."
- How it works: The operand points to a register, and that register indirectly points to the required memory location.
- Use: Useful for iterating through arrays or complex data structures, as the register holding the address can be easily incremented (increased).
- Example (Assembly):
LOAD (R2), R1 (The brackets indicate indirect addressing. Load the value from the memory address stored inside Register R2, and put that value into Register R1).
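The two lookups of indirect addressing, and the array-walking use described above, can be sketched as follows. Everything here is an illustrative assumption: memory is a 1024-cell list, a three-element array sits at address 500 (echoing the analogy), and each element is assumed to occupy exactly one cell, so incrementing R2 by 1 moves to the next element.

```python
memory = [0] * 1024               # hypothetical main memory
memory[500:503] = [7, 8, 9]       # a small array stored at addresses 500..502
registers = {"R1": 0, "R2": 500}  # R2 holds the address of the first element


def load_indirect(pointer_register: str, destination: str) -> None:
    """LOAD (pointer_register), destination."""
    address = registers[pointer_register]     # step 1: read the address from the register
    registers[destination] = memory[address]  # step 2: fetch the data at that address


total = 0
for _ in range(3):
    load_indirect("R2", "R1")  # LOAD (R2), R1
    total += registers["R1"]
    registers["R2"] += 1       # point at the next array element

print(total)  # 24
```

The instruction itself never changes inside the loop; only the address held in R2 does, which is why indirect addressing suits arrays.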
Comparing the Addressing Modes
| Mode | Operand Contains... | Steps to Data |
|---|---|---|
| Immediate | The Value itself | 0 steps (data is provided immediately) |
| Direct | The Address of the data (in memory or register) | 1 step (go straight to the address) |
| Indirect | A Register that holds the address of the data | 2 steps (go to register, then to memory address) |
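The table's key idea is that the addressing mode, not the operand bits, decides what the operand means. This minimal sketch feeds the same operand (2) through all three modes and reports the data found plus the number of lookup steps. The memory contents, register values and the `resolve` function are hypothetical, and direct addressing is simplified to memory addresses only.

```python
memory = [0] * 256
memory[2] = 99          # hypothetical data at address 2
memory[50] = 123        # hypothetical data at address 50
registers = [0, 0, 50]  # R0, R1, R2; R2 holds the address 50


def resolve(mode: str, operand: int) -> tuple[int, int]:
    """Return (data, lookup steps) for an operand under the given addressing mode."""
    if mode == "immediate":
        return operand, 0              # the operand is the data
    if mode == "direct":
        return memory[operand], 1      # the operand is a memory address
    if mode == "indirect":
        address = registers[operand]   # the operand is a register number
        return memory[address], 2
    raise ValueError(f"unknown addressing mode: {mode}")


for mode in ("immediate", "direct", "indirect"):
    print(mode, resolve(mode, 2))
# immediate (2, 0)
# direct (99, 1)
# indirect (123, 2)
```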
Key Takeaway: Mastering the three addressing modes (Immediate, Direct, Indirect) is essential for understanding how instructions locate data.