Chapter: Data Representation - From Our World to the Computer's World
Hello everyone! Ever wondered how your computer understands your photos, music, and even the words you're reading right now? It's not magic! Computers have their own special language, and in this chapter, we're going to learn how to speak it. We'll explore how all kinds of information, from a simple number to a complex video, get translated into a format that a computer can understand.
Understanding this is a fundamental skill in ICT. It's like learning the alphabet before you can read. Let's dive in and decode the digital world together!
1. Analog vs. Digital Data: The Two Worlds
In our world, most things are analog. In the computer's world, everything is digital. Let's see what that means.
What is Analog Data?
Think of a dimmer switch for a light. You can turn the knob smoothly to get any level of brightness, from completely off to fully on, and all the infinite levels in between.
Analog data is continuous, meaning it can have an infinite number of values within a range.
Real-world examples: The sound waves from your voice, the temperature in a room, the colours in a real-life sunset.
What is Digital Data?
Now, think of a standard light switch. It has only two positions: ON or OFF. There is nothing in between.
Digital data is discrete, meaning it's represented by a fixed number of separate values. In computers, this is simplified to just two values: 0 (Off) and 1 (On). Each 0 or 1 is called a bit (short for binary digit).
Real-world examples: The text in a book (made of separate letters), the score in a football game, the number of steps you've walked today.
Converting Between Worlds
Computers can't understand the smooth, continuous nature of the real world. So, we need to convert analog data into digital data for a computer to process it.
Analog to Digital Conversion (ADC): This is when we capture real-world data and turn it into 0s and 1s.
Example: Recording your voice with a microphone on your phone. The microphone captures the analog sound waves, and the phone's hardware converts them into a digital audio file.
Digital to Analog Conversion (DAC): This is when the computer turns its 0s and 1s back into something we can experience.
Example: Playing an MP3 file. A converter in the computer or phone turns the digital data into an electrical signal, and your headphones or speakers turn that signal into analog sound waves that you can hear.
Why Do Computers Love Digital Data?
So, why go through all this trouble? Why don't we just build analog computers? IT systems use digital data for several key reasons:
Reliability: Digital signals are more resistant to interference (noise). A '1' is still a '1' even if it gets a little distorted, but any distortion to an analog signal changes the information itself.
Perfect Copies: You can copy a digital file millions of times, and the last copy will be identical to the first. Think about copying an MP3 file versus an old cassette tape - the tape's quality gets worse with each copy!
Easy to Process: Computers are fundamentally built on billions of tiny switches (transistors) that can be either ON or OFF. This matches perfectly with the 1s and 0s of digital data.
Efficient Storage: Massive amounts of digital data can be stored in very small physical spaces (like on a tiny memory card).
The Power of Bits
The number of bits you use determines how many different things you can represent. The relationship is simple: with n bits, you can create $$2^n$$ unique combinations.
1 bit can represent $$2^1$$ = 2 things (e.g., On/Off, True/False)
2 bits can represent $$2^2$$ = 4 things (e.g., 00, 01, 10, 11)
3 bits can represent $$2^3$$ = 8 things (e.g., eight different colours)
A group of 8 bits is called a byte. A byte can represent $$2^8$$ = 256 different combinations.
This simple idea is the foundation of everything that follows!
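If you'd like to check the $$2^n$$ rule yourself, here is a small Python sketch. The function names (combinations_for, all_patterns) are made up for this illustration.

```python
from itertools import product


def combinations_for(n_bits: int) -> int:
    """Number of distinct patterns that n_bits can form (2 to the power n)."""
    return 2 ** n_bits


def all_patterns(n_bits: int) -> list[str]:
    """List every bit pattern of the given length, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


for n in (1, 2, 3, 8):
    print(n, "bit(s) ->", combinations_for(n), "combinations")

print(all_patterns(2))  # ['00', '01', '10', '11']
```

Each extra bit doubles the number of patterns, which is why the count grows so quickly.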
Key Takeaway
Data in the real world is mostly analog (continuous), but computers only understand digital data (discrete values, 0s and 1s). We convert between these two forms to interact with computers. Computers use digital data because it's reliable, easy to copy perfectly, and efficient to process and store.
2. Number Systems: The Language of Computers
We are used to the denary (or decimal, base-10) system, which uses ten digits (0-9). Computers use other systems based on powers of 2. Let's learn to translate between them. Don't worry, it's just like learning a new code!
Denary (Base-10) to Binary (Base-2)
To convert a denary number to binary, we use the method of repeated division by 2.
Step-by-Step Guide: Convert Denary 25 to Binary
1. Divide the number by 2 and write down the remainder.
2. Use the result (quotient) as your new number and repeat the process.
3. Keep going until the result is 0.
4. The binary number is the remainders read from the bottom up.
25 ÷ 2 = 12 remainder 1
12 ÷ 2 = 6 remainder 0
6 ÷ 2 = 3 remainder 0
3 ÷ 2 = 1 remainder 1
1 ÷ 2 = 0 remainder 1
Reading the remainders from the bottom up, we get 11001. So, $$25_{10} = 11001_2$$.
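The repeated-division steps above can be sketched in Python like this. The function name denary_to_binary is just a label chosen for this example, and the sketch only handles whole numbers that are zero or larger.

```python
def denary_to_binary(number: int) -> str:
    """Convert a non-negative denary integer to binary by repeated division by 2."""
    if number < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        # divmod gives the quotient and the remainder in one step
        number, remainder = divmod(number, 2)
        remainders.append(str(remainder))
    # The remainders were collected top to bottom, so read them in reverse
    return "".join(reversed(remainders))


print(denary_to_binary(25))  # 11001
```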
Binary (Base-2) to Denary (Base-10)
To convert binary to denary, we use place values, which are powers of 2.
Step-by-Step Guide: Convert Binary 11001 to Denary
1. Write down the binary number.
2. Above each digit (from right to left), write its place value, starting with $$2^0=1$$, then $$2^1=2$$, $$2^2=4$$, and so on.
3. Add up the place values for every digit that is a 1.
Place Values: 16 8 4 2 1
Binary Number: 1 1 0 0 1
Now, add the values where there is a '1': $$16 + 8 + 1 = 25$$. So, $$11001_2 = 25_{10}$$. It works!
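Here is the same place-value method as a short Python sketch. The name binary_to_denary is chosen for this example only.

```python
def binary_to_denary(bits: str) -> int:
    """Add up the place values (1, 2, 4, 8, ...) of every digit that is a 1."""
    total = 0
    place_value = 1  # the rightmost digit is worth 2^0 = 1
    for digit in reversed(bits):
        if digit == "1":
            total += place_value
        place_value *= 2  # each step to the left doubles the place value
    return total


print(binary_to_denary("11001"))  # 25
```

Python's built-in int("11001", 2) does the same job, but writing it out shows where each place value comes from.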
Hexadecimal (Base-16)
Binary numbers can get very long and hard for humans to read. Hexadecimal (or Hex) is a base-16 system used as a shorthand for binary. It uses digits 0-9 and then letters A-F to represent values 10-15.
Hex Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A (10), B (11), C (12), D (13), E (14), F (15)
Converting Between Binary and Hexadecimal
This is the easy part! Because $$16 = 2^4$$, each hexadecimal digit corresponds to exactly four binary digits.
Memory Aid: Binary to Hex Conversion Table
0000 = 0 | 0100 = 4 | 1000 = 8 | 1100 = C (12)
0001 = 1 | 0101 = 5 | 1001 = 9 | 1101 = D (13)
0010 = 2 | 0110 = 6 | 1010 = A (10) | 1110 = E (14)
0011 = 3 | 0111 = 7 | 1011 = B (11) | 1111 = F (15)
Step-by-Step Guide: Binary to Hex
Convert the binary number 11011010 to Hex.
1. Split the binary number into groups of 4 digits, starting from the right.
1101 1010
2. Convert each 4-bit group into its hex equivalent using the table.
1101 = D
1010 = A
3. Combine the results. So, $$11011010_2 = DA_{16}$$.
Step-by-Step Guide: Hex to Binary
Convert the hex number 5F to Binary.
1. Take each hex digit individually.
5 F
2. Convert each one to its 4-bit binary group.
5 = 0101
F = 1111
3. Combine the results. So, $$5F_{16} = 01011111_2$$.
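Both conversions can be sketched in a few lines of Python. The names below (HEX_DIGITS, binary_to_hex, hex_to_binary) are invented for this illustration. The sketch pads short binary numbers with leading zeros so that they split cleanly into groups of four.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Split the bits into groups of 4 (from the right) and convert each group."""
    padding = (-len(bits)) % 4          # leading zeros needed to reach a multiple of 4
    bits = "0" * padding + bits
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_binary(hex_digits: str) -> str:
    """Convert each hex digit to its own 4-bit group."""
    return "".join(format(HEX_DIGITS.index(d), "04b") for d in hex_digits.upper())


print(binary_to_hex("11011010"))  # DA
print(hex_to_binary("5F"))        # 01011111
```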
Did you know? Hexadecimal is often used to represent colours in web design. For example, #FF0000 represents pure red. The first two digits (FF) are for Red, the next two (00) for Green, and the last two (00) for Blue.
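As a small illustration, this Python sketch splits a colour code into its red, green, and blue parts. The function name parse_hex_colour is hypothetical, and it only handles the six-digit form shown above.

```python
def parse_hex_colour(code: str) -> tuple[int, int, int]:
    """Turn a colour code such as '#FF0000' into (red, green, blue) values 0-255."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hex digits, e.g. #FF0000")
    red = int(digits[0:2], 16)
    green = int(digits[2:4], 16)
    blue = int(digits[4:6], 16)
    return red, green, blue


print(parse_hex_colour("#FF0000"))  # (255, 0, 0)
```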
Key Takeaway
Computers use binary (base-2). We use denary (base-10). Hexadecimal (base-16) is a convenient shorthand for long binary numbers. You must be able to convert between these three systems. The key is to remember the conversion methods: repeated division (denary to binary), place values (binary to denary), and groups of four bits (binary to and from hex).
3. Handling Negative Numbers: Two's Complement
So far, we've only looked at positive numbers. But how does a computer represent -25? It can't just put a minus sign in front of the 1s and 0s. The most common method is called Two's Complement.
Don't worry if this seems tricky at first. It's a process that you can learn step-by-step. We'll use an 8-bit number as our example. In an 8-bit signed number, the first bit (the Most Significant Bit or MSB) tells us the sign: 0 for positive, 1 for negative.
How to Find the Two's Complement
Step-by-Step Guide: Represent -25 using 8-bit Two's Complement
1. Start with the positive binary number.
First, find the binary for +25. We know it's 11001. In 8 bits, we add leading zeros: $$00011001$$.
2. Invert all the bits.
Flip every 0 to a 1, and every 1 to a 0. This step is called the "one's complement".
From: 0001 1001
To: 1110 0110
3. Add 1 to the result.
Take the inverted number and add 1 to it.
  1110 0110
+         1
-----------------
= 1110 0111
So, in 8-bit two's complement, -25 is represented as 11100111. Notice the first bit is a 1, indicating a negative number!
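The three steps can be sketched in Python as follows. The function name twos_complement and its bits parameter are assumptions made for this example, not part of any standard library.

```python
def twos_complement(value: int, bits: int = 8) -> str:
    """Return the two's complement bit pattern of value using the given number of bits."""
    lowest, highest = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {bits} bits")
    if value >= 0:
        return format(value, f"0{bits}b")   # positive numbers just get leading zeros
    # Step 1: start with the positive binary number, padded to the full width
    positive = format(-value, f"0{bits}b")
    # Step 2: invert all the bits (the one's complement)
    inverted = "".join("1" if bit == "0" else "0" for bit in positive)
    # Step 3: add 1, keeping only the lowest `bits` bits
    plus_one = (int(inverted, 2) + 1) % (2 ** bits)
    return format(plus_one, f"0{bits}b")


print(twos_complement(25))   # 00011001
print(twos_complement(-25))  # 11100111
```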
Why use this weird method?
The main reason is that it makes subtraction incredibly simple for the computer. To subtract a number, the computer just adds its negative version. So, $$10 - 5$$ becomes $$10 + (-5)$$. This means the computer's circuitry (the ALU) only needs to know how to add, which simplifies its design.
Key Takeaway
Two's complement is the standard method for representing negative integers in computers. To find the two's complement of a number, you start with its positive binary form, invert the bits, and then add 1.
4. Binary Arithmetic: How Computers Calculate
Now that we know how to represent positive and negative numbers, let's see how the computer performs simple calculations.
Binary Addition
Binary addition follows four simple rules:
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 0, and you carry a 1 over to the next column.
Example: Add 9 (1001) and 5 (0101)
      1      (carry)
  1 0 0 1    (9)
+ 0 1 0 1    (5)
--------------
  1 1 1 0    (14)
Let's check: 8+4+2 = 14. Correct!
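To see the carry rule at work, here is a Python sketch that adds two binary strings column by column, from right to left. The function name add_binary is chosen for this illustration.

```python
def add_binary(a: str, b: str) -> str:
    """Add two unsigned binary strings one column at a time, carrying as needed."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)   # pad the shorter number with leading zeros
    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        result.append(str(total % 2))       # the digit written in this column
        carry = total // 2                  # 1 if the column added up to 2 or 3
    if carry:
        result.append("1")                  # a final carry becomes a new leftmost digit
    return "".join(reversed(result))


print(add_binary("1001", "0101"))  # 1110
```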
Binary Subtraction using Two's Complement
As we mentioned, computers "cheat" at subtraction. To calculate A - B, the computer actually does A + (-B), where -B is the two's complement of B.
Example: Calculate 9 - 5 using 8-bit two's complement
1. Represent the numbers:
+9 = 0000 1001
+5 = 0000 0101
2. Find the two's complement of the number you are subtracting (-5):
+5 = 0000 0101
Invert: 1111 1010
Add 1: 1111 1011 (This is -5)
3. Add the first number and the two's complement result:
 1 1111  11   (carry)
   0000 1001  (9)
 + 1111 1011  (-5)
-----------------
(1)0000 0100  (4)
We get the answer 0000 0100, which is 4. Notice that there was an extra 'carry' bit at the end. In subtraction using two's complement, we simply ignore this final carry bit.
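The same trick can be sketched in Python. The function name subtract_8bit is hypothetical, and the sketch assumes both inputs are small positive numbers that fit in 8 bits (0 to 127).

```python
def subtract_8bit(a: int, b: int) -> str:
    """Compute a - b by adding the 8-bit two's complement of b to a."""
    mask = 0xFF                              # keeps only the lowest 8 bits
    negative_b = ((b ^ mask) + 1) & mask     # invert the bits of b, then add 1
    total = a + negative_b                   # plain addition, just like the ALU
    return format(total & mask, "08b")       # ignore the final carry bit


print(subtract_8bit(9, 5))  # 00000100
```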
Overflow Errors
Computers use a fixed number of bits to store numbers (e.g., 8-bit, 16-bit). What happens if the result of a calculation is too big to fit? This is called an overflow error.
Quick Review: Number Ranges
An n-bit two's complement number can hold values from $$-2^{n-1}$$ to $$2^{n-1} - 1$$. For example, with 8 bits using two's complement:
The smallest number is -128 ($$-2^{7}$$)
The largest number is +127 ($$2^{7} - 1$$)
Example of an Overflow Error
Let's add 100 + 50 using 8 bits. The result should be 150.
100 = 0110 0100
50 = 0011 0010
  11          (carry)
  0110 0100   (100)
+ 0011 0010   (50)
-----------------
  1001 0110
The answer we get is 10010110. Look at the first bit! It's a 1, which means the computer thinks this is a negative number (read as two's complement, 10010110 is -106). We added two positive numbers and got a negative result. This is a clear sign of an overflow error. The number 150 is too large to fit into an 8-bit signed integer.
Common Mistakes to Avoid
An overflow can only happen when:
You add two positive numbers and the result's sign bit is 1 (negative).
You add two negative numbers and the result's sign bit is 0 (positive).
Adding a positive and a negative number can NEVER cause an overflow.
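The sign-bit rules above can be turned into a small checker. This is only a sketch: the function name add_signed_8bit is invented here, and it assumes both inputs are already within the 8-bit range of -128 to +127.

```python
def add_signed_8bit(a: int, b: int) -> tuple[int, bool]:
    """Add two 8-bit signed numbers; return (stored result, overflow flag)."""
    raw = (a + b) & 0xFF                             # keep only the 8 bits that fit
    result = raw - 256 if raw & 0x80 else raw        # a sign bit of 1 means negative
    same_sign_inputs = (a >= 0) == (b >= 0)
    sign_flipped = (result >= 0) != (a >= 0)
    return result, same_sign_inputs and sign_flipped


print(add_signed_8bit(100, 50))   # (-106, True)
print(add_signed_8bit(100, -50))  # (50, False)
```

Overflow is flagged only when both inputs share a sign and the result's sign differs from it, which matches the two cases listed above.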
Key Takeaway
Computers perform addition using simple rules and handle subtraction by adding the two's complement. An overflow error occurs when the result of a calculation is outside the range of values that can be stored in the available number of bits.
5. Representing Characters: From 'A' to '字'
Numbers are great, but what about text? How does the computer store the letter 'A' or the character '字'? The answer is simple: every character is assigned a unique binary number. These assignments are defined in standards called character sets.
ASCII (American Standard Code for Information Interchange)
This is one of the earliest and simplest standards.
It uses 7 bits, allowing for $$2^7 = 128$$ different characters.
This is enough for all uppercase and lowercase English letters, numbers 0-9, and common punctuation symbols.
An "Extended ASCII" later used 8 bits ($$2^8 = 256$$ characters) to add symbols and characters from some European languages.
Problem with ASCII: 256 characters is not nearly enough to represent characters from all the world's languages!
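You can look up character codes yourself with Python's built-in ord() and chr() functions. The helper name ascii_bits below is made up for this sketch.

```python
def ascii_bits(character: str) -> str:
    """Return the 7-bit ASCII code of a single character as a binary string."""
    code = ord(character)                 # the number assigned to this character
    if code > 127:
        raise ValueError(f"{character!r} is not an ASCII character")
    return format(code, "07b")


for character in ("A", "a", "0", "?"):
    print(character, ord(character), ascii_bits(character))

print(chr(65))  # A
```

Remember that you don't need to memorise these codes. The point is simply that each character has its own number.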
Big-5 and Guobiao (GB)
To solve the problem for Chinese, different character sets were created.
Big-5: A popular standard in Taiwan and Hong Kong for representing Traditional Chinese characters.
Guobiao (GB): A standard used in mainland China for representing Simplified Chinese characters.
Problem with these: These systems could conflict. A document created using the Big-5 code would look like gibberish on a system that expected a GB code. This created a lot of confusion.
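Here is a rough sketch of that conflict. It uses the "big5" and "gb2312" codecs that ship with Python, picking gb2312 as one example of a GB standard. The function name misread and the sample text are chosen only for this illustration.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Store text with one character set, then read the bytes back with another."""
    stored_bytes = text.encode(written_as)
    # errors="replace" swaps any byte sequence the reader cannot decode for a marker
    return stored_bytes.decode(read_as, errors="replace")


original = "中文"
print(misread(original, "big5", "big5"))    # read with the right character set
print(misread(original, "big5", "gb2312"))  # read with the wrong one: not the same text
```

The bytes on disk are identical in both cases. Only the character set used to interpret them changes.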
Unicode
Unicode is the modern, universal solution.
Its goal is to provide a unique number for every character in every language, no matter what platform or program is being used.
It can represent over 140,000 characters, including modern and historic scripts, symbols, and even emojis! 😊
UTF-8 is the most common way to implement Unicode and is the dominant character encoding on the World Wide Web.
Unicode solves the problem of conflicting standards by creating one giant, universal character set for everyone.
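A quick Python sketch shows the idea: every character has one Unicode number (its code point), and UTF-8 stores that number using between one and four bytes. The function name utf8_length is hypothetical.

```python
def utf8_length(character: str) -> int:
    """Number of bytes UTF-8 uses to store a single character."""
    return len(character.encode("utf-8"))


for character in ("A", "字", "😊"):
    code_point = ord(character)
    print(character, f"U+{code_point:04X}", utf8_length(character), "byte(s)")
```

Common English letters take a single byte, which keeps plain English text as small as it was in ASCII.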
Key Takeaway
Characters are represented by unique binary codes defined in character sets. ASCII was an early standard for English. Big-5 and GB were developed for Chinese, but could conflict. Unicode is the modern, universal standard that aims to include every character from every language, solving these conflicts. The more bits a character set uses, the more characters it can represent.
Important: For your exam, you need to know the purpose of these standards, not memorize the specific binary codes for any letters!
6. Digitising Multimedia: Bringing Images, Sound, and Video to Life
We've covered numbers and text, but what about the rich media we use every day? All multimedia must also be digitised (converted to binary).
Representing Images
The most common type of digital image is a bitmap.
Imagine the image is a grid. Each square in the grid is a pixel (picture element).
The computer stores the colour value for each and every pixel.
Resolution: The number of pixels in the grid (e.g., 1920 x 1080). Higher resolution means a sharper image, but a larger file size.
Colour Depth: The number of bits used to store the colour of each pixel. More bits mean more possible colours, but also a larger file size. For example, 24-bit "True Color" uses 8 bits for Red, 8 for Green, and 8 for Blue, allowing for $$2^{24}$$ (about 16.7 million) different colours!
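Resolution and colour depth together decide the size of an uncompressed bitmap. As a rough sketch (the function name bitmap_size_bytes is made up, and real files also carry a small header that is ignored here):

```python
def bitmap_size_bytes(width: int, height: int, colour_depth_bits: int) -> int:
    """Uncompressed pixel data size: pixels x bits per pixel, converted to bytes."""
    total_bits = width * height * colour_depth_bits
    return total_bits // 8


size = bitmap_size_bytes(1920, 1080, 24)
print(size, "bytes")  # 6220800 bytes, roughly 6.2 MB
```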
Image File Formats
BMP (Bitmap): Uncompressed. Stores data for every single pixel. This means perfect quality but very large file sizes.
JPG/JPEG (Joint Photographic Experts Group): Uses lossy compression. It cleverly throws away some image data that the human eye is least likely to notice. This results in much smaller file sizes, which is great for photos on the web. However, some quality is lost forever.
PNG (Portable Network Graphics): Uses lossless compression. It finds clever ways to store the pixel data more efficiently without throwing any of it away. The quality is perfect (like BMP), but the file size is smaller. It's great for graphics, logos, and images that need a transparent background.
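To get a feel for lossless compression, this sketch uses Python's zlib module, which implements the DEFLATE method that PNG is built on. The "image" here is just made-up data: 10,000 identical pure red pixels.

```python
import zlib

# 10,000 pure red pixels, 3 bytes each (red, green, blue)
pixels = bytes([255, 0, 0]) * 10_000

packed = zlib.compress(pixels)
restored = zlib.decompress(packed)

print(len(pixels), "bytes before,", len(packed), "bytes after")
print("identical after decompression:", restored == pixels)
```

Nothing is thrown away: the decompressed data matches the original byte for byte. A lossy format like JPG could not pass that check.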
Representing Sound
To digitise a continuous sound wave, a process called sampling is used.
The height (amplitude) of the sound wave is measured at thousands of regular intervals every second.
Each measurement, or "sample," is stored as a binary number.
When you play the sound back, the computer reads these numbers and generates a sound wave that is a very close approximation of the original.
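Sampling can be sketched in a few lines of Python. This example measures a pure tone (a sine wave) at regular intervals and stores each measurement as a whole number. The function name sample_wave and the chosen values (a 440 Hz tone, 8,000 samples per second, 8 bits per sample) are assumptions for illustration only.

```python
import math


def sample_wave(frequency_hz: float, sample_rate_hz: int,
                duration_s: float, bits: int = 8) -> list[int]:
    """Measure a sine wave at regular intervals and store each height as an integer."""
    levels = 2 ** bits                              # e.g. 256 possible values for 8 bits
    count = round(sample_rate_hz * duration_s)      # how many measurements to take
    samples = []
    for n in range(count):
        t = n / sample_rate_hz                      # the time of this measurement
        amplitude = math.sin(2 * math.pi * frequency_hz * t)   # between -1 and 1
        # Map -1..1 onto the whole numbers 0..levels-1
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples


samples = sample_wave(440, 8000, 0.01)
print(len(samples), "samples")
print(samples[:10])
```

Taking more samples per second, or using more bits per sample, gives a closer approximation of the original wave but a larger file.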
Sound File Formats
WAV (Waveform Audio File): Uncompressed audio. It stores the value of every single sample. This gives excellent, studio-level quality but results in very large files (about 10MB for one minute of music).
MP3 (MPEG Audio Layer-3): Uses lossy compression. Like JPGs for images, it removes parts of the sound that humans are least likely to hear. This dramatically reduces the file size (about 1MB for one minute of music), making it perfect for portable music players and streaming.
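Where do those file sizes come from? The sketch below works them out under some assumed settings that are not stated above: CD-style audio (44,100 samples per second, 16 bits per sample, 2 channels) for the WAV file, and a bit rate of 128,000 bits per second for the MP3. The function names are made up for this example.

```python
def wav_size_bytes(sample_rate_hz: int, bits_per_sample: int,
                   channels: int, seconds: int) -> int:
    """Uncompressed audio size: every sample of every channel is stored."""
    return sample_rate_hz * bits_per_sample * channels * seconds // 8


def mp3_size_bytes(bit_rate_bps: int, seconds: int) -> int:
    """Compressed audio size at a constant bit rate."""
    return bit_rate_bps * seconds // 8


print(wav_size_bytes(44_100, 16, 2, 60))  # 10584000 bytes, about 10 MB
print(mp3_size_bytes(128_000, 60))        # 960000 bytes, about 1 MB
```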
Representing Video
A video is simpler than you might think: it's just a sequence of images (called frames) displayed very quickly, combined with an audio track.
The frame rate (frames per second, or fps) determines how smooth the motion appears. Common frame rates are 24, 30, and 60 fps.
Video files are almost always compressed because uncompressed video data is enormous.
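To see just how enormous, here is a rough calculation in Python. It assumes 1920 x 1080 frames, 24-bit colour, and 30 fps, and it ignores the audio track. The function name raw_video_bytes is invented for this sketch.

```python
def raw_video_bytes(width: int, height: int, colour_depth_bits: int,
                    fps: int, seconds: int) -> int:
    """Uncompressed video size: one full bitmap for every frame."""
    bytes_per_frame = width * height * colour_depth_bits // 8
    return bytes_per_frame * fps * seconds


one_second = raw_video_bytes(1920, 1080, 24, 30, 1)
one_minute = raw_video_bytes(1920, 1080, 24, 30, 60)
print(one_second, "bytes per second")   # 186624000, about 187 MB
print(one_minute, "bytes per minute")   # 11197440000, about 11 GB
```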
Video File Formats
AVI (Audio Video Interleave): An older container format. It can hold video and audio that is compressed in many different ways. File sizes can vary wildly.
MPEG-4 (or MP4): A modern format that compresses video very efficiently. It provides good quality video at relatively small file sizes, making it the most popular format for online video (e.g., on YouTube) and mobile devices.
Document Formats
TXT (Plain Text): Stores only the characters themselves (using ASCII or Unicode). No formatting like bold, italics, or font sizes. Very small file size.
PDF (Portable Document Format): Stores text, fonts, images, and formatting in a fixed layout. The document will look the same on any computer, which is its main advantage.
DOCX / ODT: These are complex formats used by word processors like Microsoft Word (DOCX) and LibreOffice Writer (ODT). They store text, formatting information, images, tables, and much more.
Key Takeaway
All multimedia is digitised. Images are grids of pixels. Sound is captured by sampling a sound wave. Video is a sequence of image frames with audio. File formats use different compression techniques to manage file size. Lossless compression reduces size with no quality loss (PNG). Lossy compression makes files much smaller but permanently removes some data (JPG, MP3).