Assembly Language Programming: Talking Directly to the CPU
Welcome to the world of Assembly Language! Don't worry if this sounds intimidating. This chapter bridges the gap between the high-level code you might write in Python or Java and the raw binary instructions (Machine Code) that the Central Processing Unit (CPU) actually executes.
Understanding assembly language programming is vital because it shows you exactly how a computer works at its most fundamental level—how data moves, how calculations happen, and how control flows through a program. It's the ultimate insider's look!
Key Takeaway from the Introduction:
Assembly language is a low-level programming language that uses simple text commands (mnemonics) to represent machine code instructions.
1. Machine Code and Assembly Language: The Low-Level View
Programming languages are categorised by how close they are to the hardware.
Low-Level Languages (Close to the Hardware)
These languages require deep knowledge of the specific computer architecture (like the CPU's design and registers). They include:
1. Machine Code:
- This is the language of the CPU itself.
- It is represented entirely in binary (0s and 1s).
- Every instruction is unique to a specific type of processor (it is processor specific).
- It is extremely difficult for humans to read or write.
2. Assembly Language:
- This is a symbolic representation of machine code.
- It uses short, memorable codes called mnemonics (like ADD, LOAD, SUB) instead of binary digits.
- There is usually a one-to-one relationship: one assembly instruction translates into one machine code instruction.
- It is also processor specific, meaning assembly code written for one type of CPU (e.g., ARM) will not work on another (e.g., x86) without modification.
Did you know? The word mnemonic comes from a Greek word meaning 'of memory'. A mnemonic helps you remember what an instruction does (e.g., ADD for addition).
Comparing Low-Level vs. High-Level Languages
You might be wondering why anyone would use assembly when languages like Python are so much easier.
Advantages of Assembly Language:
- Speed/Efficiency: Carefully written assembly can run faster and use less memory than equivalent high-level code, because the programmer has direct control over the hardware and can optimise for both time efficiency and memory/space efficiency.
- Access to Hardware: Allows direct manipulation of registers and memory, which is essential for writing operating system components or device drivers.
- Size: Produces very small program files.
Disadvantages of Assembly Language:
- Difficulty: Much harder and slower to write and debug than high-level code.
- Portability: Code is not portable; it must be rewritten for different processor types.
- Maintainability: Hard to maintain, because often only the original programmer fully understands the logic.
Quick Review: The Translator Role
A low-level program needs a translator before the CPU can run it:
Assembly Language (Mnemonics) \(\rightarrow\) Assembler \(\rightarrow\) Machine Code (Binary)
2. The Instruction Format: The Blueprint
Every instruction, whether in binary machine code or assembly mnemonics, follows a standard blueprint.
A processor's entire vocabulary is called the processor instruction set. All instructions in this set typically consist of three main parts:
1. Opcode (Operation Code)
This is the part of the instruction that specifies *what* operation is to be performed.
- Example: ADD, SUB, LOAD, JMP (Jump).
- Analogy: The verb in a sentence (e.g., "Add").
2. Addressing Mode
This specifies *how* the operand (the data) should be interpreted—is it the value itself, or is it a memory address?
- We cover this tricky concept in detail in Section 3!
3. Operands
These are the items (data, memory addresses, or registers) that the instruction will operate on.
- Example: If the instruction is "ADD R1, #5", the operands are Register R1 and the value 5.
- Analogy: The nouns or objects in the sentence (e.g., "5" and "Register One").
Although you don't need to memorise specific formats, know that instructions are often split into bit groups. For example, the first 4 bits might be the Opcode, the next 1 bit is the Addressing Mode, and the remaining 27 bits are the Operands.
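The bit-group split described above can be sketched in Python. This is only an illustration: the 4/1/27 field widths come from the example in the text, while the function names `encode` and `decode` and the opcode number used are hypothetical and do not belong to any real processor.

```python
# Field widths taken from the example split above: 4 + 1 + 27 = 32 bits.
OPCODE_BITS = 4
MODE_BITS = 1
OPERAND_BITS = 27


def encode(opcode: int, mode: int, operand: int) -> int:
    """Pack the three fields into one 32-bit instruction word."""
    if not (0 <= opcode < (1 << OPCODE_BITS)):
        raise ValueError("opcode does not fit in 4 bits")
    if not (0 <= mode < (1 << MODE_BITS)):
        raise ValueError("addressing mode does not fit in 1 bit")
    if not (0 <= operand < (1 << OPERAND_BITS)):
        raise ValueError("operand does not fit in 27 bits")
    return (opcode << (MODE_BITS + OPERAND_BITS)) | (mode << OPERAND_BITS) | operand


def decode(word: int) -> tuple:
    """Split a 32-bit instruction word back into (opcode, mode, operand)."""
    opcode = word >> (MODE_BITS + OPERAND_BITS)
    mode = (word >> OPERAND_BITS) & ((1 << MODE_BITS) - 1)
    operand = word & ((1 << OPERAND_BITS) - 1)
    return opcode, mode, operand


# Hypothetical instruction: opcode 0b0010, addressing mode 1, operand 5.
word = encode(0b0010, 1, 5)
print(f"{word:032b}")
print(decode(word))
```

Decoding is what the CPU's control unit does in hardware: it separates the bit groups so it knows which operation to perform and how to interpret the operand.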
3. Addressing Modes: Finding the Data
The addressing mode tells the CPU where to find the data required for the operation. It is crucial to get these three types clear!
1. Immediate Addressing (IMM)
In immediate addressing, the operand itself is the actual value to be used in the instruction.
- The value is ready instantly (immediately).
- Mnemonic Example: LOAD R1, #5 (Load the number 5 into Register R1).
- Analogy: Giving someone an object directly: "Take this 5 pounds."
2. Direct Addressing (DIR)
In direct addressing, the operand is the memory address where the data is stored.
- The CPU must go to that address to fetch the value.
- Mnemonic Example: ADD R1, 100 (Add the value stored at memory address 100 to R1).
- Analogy: Giving someone a location: "Go to Bank Box 100 and take the value inside."
Note: In some contexts, Direct Addressing might refer to using a register number instead of a main memory address.
3. Indirect Addressing (IND)
This is the trickiest one, but incredibly powerful! In indirect addressing, the operand (usually a register or a memory location) holds the address of the actual data.
- The CPU reads the operand to find the address, and then goes to that address to fetch the value (a two-step lookup).
- Mnemonic Example: LOAD R1, (R2) (Load into R1 the value stored at the memory address currently held in Register R2).
- Analogy: Giving someone a key to a locker, where the locker contains the address of the treasure chest: "Go to Locker R2, read the address written inside (e.g., 500), and then go to address 500 to find the actual value."
Common Mistake Alert!
Students often confuse Direct and Indirect Addressing:
- Direct: Operand = Value's Address.
- Indirect: Operand = Location (a register or memory address) that holds the Value's Address.
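The difference between the three modes can be sketched as a small Python model. The memory contents, register values, and the helper name `fetch_operand` are all made up for illustration; a real CPU resolves operands in hardware.

```python
# Toy machine state; every value here is invented for illustration.
memory = {100: 42, 500: 7}
registers = {"R1": 0, "R2": 500}


def fetch_operand(mode: str, operand):
    """Return the value an instruction would use under each addressing mode."""
    if mode == "immediate":
        return operand                  # the operand is the value itself
    if mode == "direct":
        return memory[operand]          # the operand is the value's address
    if mode == "indirect":
        address = registers[operand]    # step 1: read the address held in the register
        return memory[address]          # step 2: read the value at that address
    raise ValueError(f"unknown addressing mode: {mode}")


print(fetch_operand("immediate", 5))    # like LOAD R1, #5   -> 5
print(fetch_operand("direct", 100))     # like ADD R1, 100   -> 42
print(fetch_operand("indirect", "R2"))  # like LOAD R1, (R2) -> 7
```

Notice that immediate addressing needs no memory access, direct needs one, and indirect through a register needs one register read followed by one memory access.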
4. Basic Assembly Language Operations (3.8.2)
Assembly instructions can be grouped into categories based on their function. You must understand and be able to apply these basic operations when writing or tracing programs.
4.1 Data Transfer Operations
- LOAD: Moves data from memory (or an immediate value) into a register in the CPU.
- STORE: Moves data from a register into a specific location in main memory.
Example: LOAD R1, #10 (Immediate load)
Example: STORE R1, 200 (Store the contents of R1 into memory address 200)
4.2 Arithmetic Operations
These perform mathematical calculations, usually involving values held in registers.
- ADD: Adds two values (operands) and places the result in a specified location (often a register).
- SUB: Subtracts two values.
4.3 Control Flow Operations (Branching and Comparison)
These instructions alter the sequence in which instructions are executed, enabling decisions, loops, and subroutines.
- COMPARE (CMP): Compares two values. This does not change the general-purpose registers or memory; it only sets flags in the Status Register (SR).
- BRANCH (Conditional/Unconditional): Changes the value of the Program Counter (PC) to jump to a new instruction address.
- Unconditional Branch (e.g., JMP, B): Jumps regardless of the comparison result.
- Conditional Branch (e.g., JNE, BEQ): Jumps only if a certain condition (set by the previous COMPARE instruction) is met (e.g., Jump if Not Equal, Branch if Equal).
- HALT: Stops the execution of the program.
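The interaction between COMPARE and a conditional branch can be sketched in Python as follows. This is a simplified model: the flag names `zero` and `negative` and the function names are assumptions, real status registers usually hold more flags (such as carry and overflow), and `pc + 1` assumes every instruction occupies exactly one address.

```python
def compare(a: int, b: int) -> dict:
    """Model COMPARE: work out a - b, discard the result, keep only the flags."""
    diff = a - b
    return {"zero": diff == 0, "negative": diff < 0}


def branch_if_equal(flags: dict, pc: int, target: int) -> int:
    """Model a conditional branch: return the new Program Counter value."""
    # Branch taken: the PC is overwritten with the target address.
    # Branch not taken: the PC simply moves on to the next instruction.
    return target if flags["zero"] else pc + 1


flags = compare(7, 7)
print(branch_if_equal(flags, pc=3, target=10))   # equal, so the branch is taken

flags = compare(7, 8)
print(branch_if_equal(flags, pc=3, target=10))   # not equal, so execution continues
```

An unconditional branch would be the same function without the flag test: it always returns `target`.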
4.4 Logical and Bitwise Operations
These instructions perform Boolean operations on individual bits within a byte or word. They are essential for masking, setting, or checking specific flags/bits.
- Logical Bitwise Operators: AND, OR, NOT, XOR (Exclusive OR).
- Example: Bitwise AND can be used to check if a specific bit in a status register is set to 1.
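Masking can be tried out with Python's bitwise operators, which behave like the assembly instructions on each bit. The 8-bit status value below is made up for illustration.

```python
status = 0b10100100        # made-up 8-bit status value
BIT2_MASK = 0b00000100     # 1 only at bit 2 (bit 0 is the rightmost bit)

bit2_is_set = (status & BIT2_MASK) != 0    # AND keeps bit 2 and clears every other bit
bit3_is_set = (status & 0b00001000) != 0   # same test for bit 3
set_bit0 = status | 0b00000001             # OR forces bit 0 to 1
toggled_bit7 = status ^ 0b10000000         # XOR flips bit 7
inverted = ~status & 0xFF                  # NOT; & 0xFF trims Python's unbounded int to 8 bits

print(bit2_is_set, bit3_is_set)            # True False
print(f"{set_bit0:08b}")                   # 10100101
print(f"{toggled_bit7:08b}")               # 00100100
print(f"{inverted:08b}")                   # 01011011
```

The pattern to remember is: AND with a mask to test or clear bits, OR to set bits, XOR to toggle bits.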
4.5 Shift Operations
These move all the bits in a register either to the left or to the right. This is an extremely efficient way to perform multiplication or division by powers of two.
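- SHIFT RIGHT: Shifts all bits to the right. Each one-place shift performs integer division by 2 on an unsigned value (any remainder is lost).
- SHIFT LEFT: Shifts all bits to the left. Each one-place shift performs multiplication by 2, provided no 1 bit is shifted out of the register.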
Note: When shifting, the bit that falls off the end is usually discarded, and a zero is inserted into the newly empty position (e.g., the leftmost position for a right shift).
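A short Python sketch shows both the doubling/halving effect and the lost bits. It assumes an 8-bit register and logical (zero-filling) shifts; the function names are hypothetical.

```python
WIDTH_MASK = 0xFF   # assumption: an 8-bit register


def shift_left(value: int, places: int = 1) -> int:
    """Logical shift left: zeros enter on the right, bits leaving the top are dropped."""
    return (value << places) & WIDTH_MASK


def shift_right(value: int, places: int = 1) -> int:
    """Logical shift right: zeros enter on the left, bits leaving the bottom are dropped."""
    return (value & WIDTH_MASK) >> places


print(shift_left(0b00010110))    # 22 doubled            -> 44
print(shift_right(0b00010110))   # 22 halved             -> 11
print(shift_right(0b00001011))   # 11 halved, 1 bit lost -> 5
print(shift_left(0b11000000))    # top bit falls off     -> 128, not 384
```

The last two lines are the cases to watch for when tracing: a right shift of an odd number loses the remainder, and a left shift can overflow the register.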
Key Takeaway from Operations:
Assembly operations are simple and atomic (do one small thing). Complex programs are built by combining hundreds or thousands of these simple instructions in sequence.
5. Tracing and Converting Assembly Programs
In the exam, you will need to demonstrate that you can follow logic by tracing an assembly program and also write simple assembly programs based on requirements or pseudocode.
The Role of Registers in Tracing
When tracing assembly code, you must keep track of the contents of the key CPU components, especially the general-purpose registers and memory addresses. Instructions typically modify these locations.
- Analogy: Registers are like small, super-fast scratch pads on the CPU used for immediate calculations.
Converting Pseudocode to Assembly (and Vice Versa)
You must be able to convert algorithms expressed in pseudocode into assembly language and interpret assembly language back into pseudocode.
Example Conversion Scenario (Pseudocode to Assembly):
Pseudocode:
IF NumberA > NumberB THEN Output = NumberA
Simplified Assembly Logic:
- LOAD R1, NumberA (Directly address NumberA)
- LOAD R2, NumberB (Directly address NumberB)
- COMPARE R1, R2
- BRANCH_LE END_IF (Branch if Less than or Equal to)
- STORE R1, Output (Store the value of NumberA into Output address)
- END_IF: HALT
Encouragement: Converting loops and selections requires careful use of COMPARE and BRANCH instructions to manage the flow of the Program Counter (PC).
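To practise tracing, the simplified program above can be run through a tiny Python interpreter. This is a sketch only: the tuple representation of instructions, the `LABELS` table, the single `less_or_equal` flag, and the sample values in memory are all assumptions made for illustration.

```python
# The simplified program from above, one tuple per instruction.
PROGRAM = [
    ("LOAD", "R1", "NumberA"),
    ("LOAD", "R2", "NumberB"),
    ("COMPARE", "R1", "R2"),
    ("BRANCH_LE", "END_IF"),
    ("STORE", "R1", "Output"),
    ("HALT",),
]
LABELS = {"END_IF": 5}   # END_IF labels the HALT instruction at index 5


def run(memory: dict) -> list:
    """Execute PROGRAM and return a trace of (pc, opcode, registers after execution)."""
    registers = {"R1": 0, "R2": 0}
    less_or_equal = False        # stands in for the status flags set by COMPARE
    pc = 0
    trace = []
    while True:
        current = pc
        op, *args = PROGRAM[current]
        pc += 1                  # by default, move on to the next instruction
        if op == "LOAD":
            registers[args[0]] = memory[args[1]]
        elif op == "COMPARE":
            less_or_equal = registers[args[0]] <= registers[args[1]]
        elif op == "BRANCH_LE":
            if less_or_equal:
                pc = LABELS[args[0]]     # branch taken: overwrite the PC
        elif op == "STORE":
            memory[args[1]] = registers[args[0]]
        trace.append((current, op, dict(registers)))
        if op == "HALT":
            return trace


memory = {"NumberA": 9, "NumberB": 4, "Output": 0}
for row in run(memory):
    print(row)
print("Output =", memory["Output"])
```

With NumberA = 9 and NumberB = 4 the branch is not taken, so STORE runs and Output becomes 9. Swapping the values makes the branch skip STORE, leaving Output unchanged.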
6. Program Translation: The Assembler (3.6.3.2)
To run assembly language, we need a translator called an assembler.
What is an Assembler?
An assembler is a type of system software whose sole purpose is to convert assembly language (source code written using mnemonics) into machine code (object/executable code).
Source Code vs. Object Code
- Source Code: The human-readable program code (e.g., the assembly mnemonics ADD R1, #5).
- Object (Executable) Code: The binary code output by the translator that the CPU can execute directly (e.g., 01001001 00000101).
Because assembly language has a direct one-to-one mapping with machine code, the translation performed by an assembler is typically simpler and faster than the compilation or interpretation required for high-level languages.
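The one-to-one mapping means a very basic assembler is little more than a lookup table. The Python sketch below assumes an invented 16-bit format (4-bit opcode, 4-bit register number, 8-bit immediate value) and invented opcode numbers; it does not match any real processor or the sample binary shown above, and it only handles lines of the form "MNEMONIC Rn, #value".

```python
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "SUB": 0b0011}   # invented numbering


def assemble_line(line: str) -> str:
    """Translate one 'MNEMONIC Rn, #value' line into a 16-bit binary string."""
    mnemonic, rest = line.split(maxsplit=1)
    register, value = (part.strip() for part in rest.split(","))
    reg_number = int(register[1:])        # "R1" -> 1
    immediate = int(value.lstrip("#"))    # "#5" -> 5
    if not (0 <= reg_number < 16 and 0 <= immediate < 256):
        raise ValueError(f"operand out of range: {line}")
    word = (OPCODES[mnemonic] << 12) | (reg_number << 8) | immediate
    return f"{word:016b}"


source = ["LOAD R1, #10", "ADD R1, #5"]
machine_code = [assemble_line(line) for line in source]

for text, binary in zip(source, machine_code):
    print(f"{text:<14} -> {binary}")
```

Each source line produces exactly one machine code word, which is why the output list is always the same length as the input list.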
Chapter Summary: Assembly Language Core Concepts
1. Assembly is a low-level language using mnemonics, translated by an assembler.
2. Instructions are composed of an Opcode, an Addressing Mode, and Operands.
3. The three addressing modes are: Immediate (the operand is the value), Direct (the operand is the address of the value), and Indirect (the operand, usually a register, holds the address of the value).
4. Core operations involve data movement (LOAD/STORE), arithmetic (ADD/SUB), flow control (COMPARE/BRANCH), and bit manipulation (Logical/Shift).