Welcome to the Internet: Networking Core Concepts (9645)
Hello! In this chapter, we move from programming concepts to the exciting world of Networking and Cyber Security. We’re diving deep into how the Internet actually works: the fundamental architecture, addressing systems, and the crucial protocols that allow billions of devices worldwide to communicate seamlessly.
Understanding the Internet isn't just theory; it’s the backbone of modern Computer Science. Let's get started!
1. The Structure of the Internet and Data Delivery (3.14.3)
1.1 Packet Switching and Routers
The Internet is not a single entity; it is a giant network of interconnected networks. To handle the massive volume of data flowing through it, the Internet uses a mechanism called Packet Switching.
What is Packet Switching?
Imagine you want to send a long letter (your data) across the world. Instead of sending the whole letter in one heavy envelope, you tear it up into tiny, manageable postcards (packets).
- Definition: Packet switching breaks down data into small, manageable blocks called packets.
- These packets are then sent individually across the network, potentially taking different routes, before being reassembled at the destination.
Analogy: Think of old telephone calls (Circuit Switching) where you needed a dedicated line the whole time, versus sending numerous individual messages (Packet Switching) that share resources, making the network much more efficient!
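To make the postcard idea concrete, here is a minimal sketch in Python. The function names and the packet size of 8 bytes are assumptions chosen for illustration; real packets carry far more header information.

```python
import random

def split_into_packets(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split data into (sequence_number, payload) pairs of at most `size` bytes."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Put packets back in sequence order and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))

message = b"Packets may take different routes."
packets = split_into_packets(message, size=8)
random.shuffle(packets)            # simulate packets arriving out of order
restored = reassemble(packets)     # identical to the original message
```

Because every packet carries its own sequence number, the order of arrival does not matter to the receiver.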
The Role of Routers
If packets take different routes, who directs them? That’s the job of a Router.
- Routers are specialized devices that connect networks (like your home Wi-Fi network to the Internet Service Provider network).
- They read the destination address contained within each incoming packet.
- Using complex routing tables, they determine the best path (the most efficient next hop) for that packet to reach its final destination.
1.2 The Anatomy of a Packet
For a packet to find its way and be successfully reassembled, it must carry essential information, known as metadata, along with the actual data (the payload).
The main components found in a standard Internet packet include:
- Source Address: The IP address of the sender's device.
- Destination Address: The IP address of the recipient's device.
- Packet Sequence Number: Since packets can arrive out of order, this number tells the destination device the original position of the packet so it can reconstruct the data correctly.
- Time to Live (TTL): A counter that limits the number of hops (routers) a packet can pass through. Each router reduces the TTL by one, and if it reaches zero the router discards the packet. This prevents undeliverable packets from looping around the network forever.
- Payload: The actual chunk of data being transmitted (e.g., part of a web page, or a piece of an email).
- Error Detection/Correction Information: Data (like a checksum) used by the receiver to check if the packet arrived intact and undamaged.
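The fields listed above can be modelled as a small data structure. This is a simplified sketch, not the real IP header layout: the class and function names are made up for illustration, the addresses come from ranges reserved for documentation, and a CRC-32 over the payload stands in for whatever error-detection scheme a real protocol uses.

```python
import zlib
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Packet:
    source: str        # sender's IP address
    destination: str   # recipient's IP address
    sequence: int      # original position of this packet
    ttl: int           # remaining hops
    payload: bytes     # the data being carried
    checksum: int      # error-detection value (CRC-32 here, as an assumption)

def make_packet(source: str, destination: str, sequence: int,
                payload: bytes, ttl: int = 64) -> Packet:
    return Packet(source, destination, sequence, ttl, payload, zlib.crc32(payload))

def forward(packet: Packet) -> Optional[Packet]:
    """Decrement the TTL as a router would; return None if the packet is discarded."""
    if packet.ttl <= 1:            # TTL would reach zero at this hop
        return None
    return replace(packet, ttl=packet.ttl - 1)

def is_intact(packet: Packet) -> bool:
    """Recompute the checksum and compare it with the one the packet carries."""
    return zlib.crc32(packet.payload) == packet.checksum
```

A receiver would call `is_intact` on each arriving packet and could ask for a resend when it returns `False`.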
1.3 How Routing is Achieved
Routing is the process of selecting paths in a network along which to send network traffic.
When a router receives a packet, it looks up the destination IP address in its internal routing table. This table contains mappings of destination networks to the next router (the 'next hop') the packet should be sent to. This process is repeated hop-by-hop until the packet reaches the local network of the destination device.
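A rough sketch of that lookup is shown here, assuming a tiny routing table whose networks and next-hop addresses are made up for illustration. When several entries match, this sketch picks the most specific one (the longest prefix), which is how IP routers typically choose between overlapping entries.

```python
import ipaddress

# Hypothetical routing table: destination network -> next hop.
ROUTING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "203.0.113.1",          # default route
    ipaddress.ip_network("198.51.100.0/24"): "203.0.113.2",
    ipaddress.ip_network("198.51.100.128/25"): "203.0.113.3",
}

def next_hop(destination: str) -> str:
    """Return the next hop for a destination using the most specific matching entry."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in ROUTING_TABLE if address in network]
    best = max(matches, key=lambda network: network.prefixlen)
    return ROUTING_TABLE[best]
```

Each router along the path repeats this kind of lookup with its own table, which is what makes routing a hop-by-hop process.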
Quick Review: Packet Essentials
Routers use the Destination Address in the packet header to decide the next hop, and the Time to Live (TTL) counter stops packets from looping forever.
2. Naming and Addressing on the Internet (3.14.3, 3.14.4.3, 3.14.4.4)
If you want to visit www.google.com, you type a name, but computers communicate using numbers. This section explains how those names and numbers work together.
2.1 Uniform Resource Locators (URLs) and Domain Names
A Uniform Resource Locator (URL) is the human-readable address used to locate a resource (like a web page or file) on the Internet.
Example URL: https://www.oxfordaqa.com/cs/index.html
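The example URL can be pulled apart with Python's standard library to see its pieces. This is just an illustration of the structure; the variable names are arbitrary.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://www.oxfordaqa.com/cs/index.html")

scheme = parts.scheme      # the protocol to use: "https"
host = parts.hostname      # the host and domain name: "www.oxfordaqa.com"
path = parts.path          # the resource on that server: "/cs/index.html"
```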
Domain Names and FQDNs
- A Domain Name is the unique name identifying a website (e.g., oxfordaqa.com).
- A Fully Qualified Domain Name (FQDN) includes the host name and the full domain name, so it identifies one exact host. Strictly, it ends with a final dot representing the root, though this is usually omitted in browsers.
- Example: www.oxfordaqa.com. (The www is the host, oxfordaqa is the second-level domain, and com is the top-level domain).
How Domain Names are Organised
Domain names are hierarchical and are organised from right to left:
- Root: (Invisible top level).
- Top-Level Domain (TLD): e.g., .com, .org, .uk.
- Second-Level Domain: e.g., google in google.com.
- Host Name (Subdomain): e.g., www or mail.
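Reading a name from right to left can be sketched in a few lines. The function name is made up for illustration.

```python
def domain_levels(fqdn: str) -> list[str]:
    """Return the labels of a domain name ordered from the TLD down to the host."""
    labels = fqdn.rstrip(".").split(".")   # drop the optional final (root) dot
    return list(reversed(labels))

levels = domain_levels("www.oxfordaqa.com.")
# levels[0] is the top-level domain, levels[1] the second-level domain,
# and levels[2] the host name.
```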
2.2 The Domain Name System (DNS)
The core purpose of the Domain Name System (DNS) is to translate human-readable FQDNs into machine-readable IP addresses.
Analogy: DNS is the Internet's phonebook. You look up a person's name (Domain Name) to find their phone number (IP Address).
How DNS Works via Domain Name Servers
- A user types an FQDN (e.g., www.example.com) into their browser.
- The device sends a request to a Domain Name Server (DNS Server).
- If the local DNS server doesn't know the IP address, it queries other servers (starting with the root servers) until it finds the authoritative server for that domain.
- The authoritative server returns the IP address corresponding to the FQDN.
- The IP address is returned to the user’s device, allowing it to communicate directly with the web server.
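The lookup chain above can be imitated with a few dictionaries. This is a toy model under stated assumptions: the server names, the three-level structure, and the address 192.0.2.44 are all invented for illustration, and real DNS uses network queries rather than dictionary lookups.

```python
# Hypothetical data standing in for the root, TLD, and authoritative servers.
ROOT = {"com": "tld-server-com"}
TLD_SERVERS = {"tld-server-com": {"example.com": "ns-example"}}
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.44"}}

cache: dict[str, str] = {}   # answers the local DNS server already knows

def resolve(fqdn: str) -> str:
    """Resolve a name by asking the root, then the TLD, then the authoritative server."""
    name = fqdn.rstrip(".")
    if name in cache:                      # local server already knows the answer
        return cache[name]
    labels = name.split(".")
    tld_server = ROOT[labels[-1]]                                # root points to the TLD server
    auth_server = TLD_SERVERS[tld_server][".".join(labels[-2:])] # TLD points to the authoritative server
    address = AUTHORITATIVE[auth_server][name]                   # authoritative server gives the IP
    cache[name] = address
    return address
```

The cache is why a second request for the same name is usually answered much faster than the first.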
2.3 IP Addressing (3.14.4.3)
An IP address (Internet Protocol address) is a unique numerical label assigned to every device participating in a computer network.
Network Identifier vs. Host Identifier
An IP address is split into two parts:
- Network Identifier: Identifies the specific network the device is on. All devices on the same local network share this part.
- Host Identifier: Identifies the specific device (host) within that network. This must be unique for every device on that network.
Example: If your IP address is 192.168.1.50 and the network uses the first three numbers as the network identifier, then 192.168.1 identifies your home network and 50 identifies your laptop.
The Role of the Subnet Mask
How does a computer know which part is the network and which is the host? It uses a subnet mask.
A subnet mask is a number used to "mask" the IP address. By performing a bitwise AND operation, the device can instantly isolate the network identifier part of the IP address.
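Here is a small worked example of that bitwise AND, assuming the address 192.168.1.50 and a subnet mask of 255.255.255.0 (the mask is an assumption chosen for illustration). The function names are made up.

```python
import ipaddress

def network_id(ip: str, mask: str) -> str:
    """Isolate the network identifier: IP AND mask."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))

def host_id(ip: str, mask: str) -> int:
    """Isolate the host identifier: IP AND (NOT mask), kept to 32 bits."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return ip_int & ~mask_int & 0xFFFFFFFF

network = network_id("192.168.1.50", "255.255.255.0")
host = host_id("192.168.1.50", "255.255.255.0")
```

In binary, every 1 bit in the mask keeps the matching bit of the address and every 0 bit clears it, which leaves only the network part.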
IPv4 vs IPv6
- IPv4 (IP version 4): Uses 32 bits, typically displayed as four decimal numbers separated by dots (e.g., 192.168.1.1). It provides approximately 4.3 billion unique addresses.
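- IPv6 (IP version 6): Uses 128 bits, typically displayed as groups of hexadecimal digits separated by colons. It provides about 3.4 × 10^38 unique addresses, which is effectively unlimited for practical purposes.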
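The size of each address space follows directly from the number of bits. A quick check in Python (the IPv6 address shown is from the range reserved for documentation, used here only as an example):

```python
import ipaddress

ipv4_total = 2 ** 32     # number of possible 32-bit addresses
ipv6_total = 2 ** 128    # number of possible 128-bit addresses

example_v4 = ipaddress.ip_address("192.168.1.1")
example_v6 = ipaddress.ip_address("2001:db8::1")
```

Here `ipv4_total` is 4,294,967,296 (roughly 4.3 billion), while `ipv6_total` is a 39-digit number.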
Why was IPv6 introduced?
The world ran out of available IPv4 addresses due to the sheer number of devices connecting to the Internet (phones, IoT devices, etc.). IPv6 was necessary to ensure the continued growth and functionality of the Internet.
Routable and Non-Routable IP Addresses
- Routable IP Addresses (Public IPs): These are unique globally on the Internet. They are assigned by an ISP and are used by routers to direct packets between different networks across the world.
- Non-Routable IP Addresses (Private IPs): These are reserved for use within private networks (like your home or school LAN). They are not visible or routable directly on the public Internet. They allow internal devices to communicate without needing a globally unique address (saving the precious public addresses).
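For IPv4, the private ranges are 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. Python's standard library can tell you whether an address falls in a reserved range. This is a minimal sketch: the function name is made up, and note that `is_private` also reports some other reserved ranges as private, so it is slightly broader than those three blocks.

```python
import ipaddress

def is_private(address: str) -> bool:
    """Return True if the address belongs to a private or otherwise reserved range."""
    return ipaddress.ip_address(address).is_private

home_laptop = is_private("192.168.1.50")   # a typical home LAN address
public_dns = is_private("8.8.8.8")         # a well-known public address
```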
2.4 Internet Registries
Internet registries (like ICANN, IANA, and Regional Internet Registries) are essential services that manage the allocation of IP addresses and domain names globally.
They ensure that every assigned IP address and domain name is globally unique, preventing conflicts and maintaining the structured organisation of the Internet.
3. The Transmission Control Protocol/Internet Protocol (TCP/IP) Model (3.14.4)
Communication requires rules. On the Internet, these rules are called protocols. The most fundamental set of protocols is the TCP/IP stack, which governs how data is packaged, addressed, sent, and received.
3.1 Why Protocol Layering is Used
Networking protocols like TCP/IP use layers for several key reasons:
- Decomposition and Simplification: Complex networking tasks (like sending a file) are broken down into smaller, simpler, independent tasks handled by different layers.
- Standardisation and Interoperability: Each layer is responsible only for its specific function, allowing developers to create compatible hardware and software without needing to worry about the entire stack simultaneously.
- Modularity: If one layer's technology changes (e.g., upgrading from copper cables to fibre optic), only the protocol in that specific layer needs updating, leaving the other layers unaffected.
Analogy: Think of building a house. The Foundation layer handles the ground, the Frame layer handles the walls, etc. Each layer has specific rules and doesn't interfere with the others.
3.2 The Four Layers of the TCP/IP Stack
The TCP/IP stack is typically described using four layers, working from top (closest to the user) to bottom (closest to the physical hardware):
- Application Layer:
  - Role: Provides services directly to the user's application (like a web browser or email client).
  - Protocols used here include HTTP, FTP, SMTP, POP3, IMAP, etc.
- Transport Layer (TCP or UDP):
  - Role: Manages the connection end-to-end. It breaks the data into segments (in TCP) and ensures reliable data transfer (correct order, no loss) or fast, unreliable transfer (in UDP).
  - TCP (Transmission Control Protocol) is responsible for reassembling the packets and confirming successful delivery.
- Internet Layer (IP):
  - Role: Handles addressing and routing. This is where IP addresses are used to ensure the packet reaches the correct destination network.
  - The main protocol here is IP (Internet Protocol). Routers operate primarily at this layer.
- Link Layer (Network Access Layer):
  - Role: Responsible for the physical transmission of data between devices on the same local network segment (e.g., handling Wi-Fi or Ethernet cables).
  - This layer uses MAC addresses.
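One way to picture the layers working together is encapsulation: each layer wraps the data from the layer above with its own header, and the receiver unwraps them in reverse order. The sketch below is a simplified illustration only. The bracketed text headers, the addresses, and the function names are invented, and real headers are binary structures with many more fields.

```python
def encapsulate(data: str) -> str:
    """Wrap application data with one made-up header per lower layer."""
    segment = "[TCP dst_port=80]" + data               # Transport layer adds port information
    packet = "[IP dst=198.51.100.7]" + segment         # Internet layer adds IP addressing
    frame = "[ETH dst=aa:bb:cc:dd:ee:ff]" + packet     # Link layer adds MAC addressing
    return frame

def decapsulate(frame: str) -> str:
    """Strip the headers in the order the receiver meets them: Link, Internet, Transport."""
    for layer in ("ETH", "IP", "TCP"):
        if not frame.startswith("[" + layer):
            raise ValueError("expected a " + layer + " header")
        frame = frame[frame.index("]") + 1:]           # drop this layer's header
    return frame

frame = encapsulate("GET /")
data = decapsulate(frame)    # back to the original application data
```

Notice that each function only touches its own header, which is the modularity benefit described in section 3.1.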
3.3 Sockets and MAC Addresses
Role of Sockets
In the TCP/IP stack (specifically the Transport layer), sockets define the endpoints of communication.
A socket is defined by the combination of an IP address and a port number (IP:Port). This ensures that data doesn't just reach the right computer (IP address), but also reaches the right *application* running on that computer (Port number).
Example: If you access a website, the IP address gets the data to the server, and the port number 80 (for HTTP) or 443 (for HTTPS) ensures the data goes to the web server software.
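To see an IP:Port pair in action, here is a small sketch that opens a TCP connection from a program to itself over the loopback address 127.0.0.1. The function name is made up, and binding to port 0 (which asks the operating system to pick any free port) is a choice made so the example runs without special permissions.

```python
import socket

def loopback_demo(message: bytes):
    """Send a message over a local TCP connection and report both socket endpoints."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))            # port 0: let the OS choose a free port
        server.listen(1)
        server_endpoint = server.getsockname()   # (IP, port) the server listens on
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(server_endpoint)
            conn, client_endpoint = server.accept()   # (IP, port) of the client side
            with conn:
                client.sendall(message)
                received = conn.recv(1024)
    return received, server_endpoint, client_endpoint

received, server_endpoint, client_endpoint = loopback_demo(b"hello")
```

Both endpoints share the same IP address here, so it is only the differing port numbers that tell the two ends of the conversation apart.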
Role of MAC Addresses
The MAC (Media Access Control) address is a unique physical address permanently embedded into a network interface card (NIC) of a device.
While IP addresses handle routing between different networks (the Internet layer, often called Layer 3), MAC addresses handle communication within the same local network segment (the Link layer, often called Layer 2).
Remember: IP addresses are like postal addresses (logical, can change), while MAC addresses are like a car's engine serial number (physical, usually permanent).
4. Standard Application Layer Protocols (3.14.4.2)
These protocols sit at the top of the TCP/IP stack and define the specific rules for common Internet services like browsing, email, and file sharing.
4.1 Web Communication: HTTP and HTTPS
The core protocols for retrieving and displaying web pages:
- HTTP (Hypertext Transfer Protocol): The standard protocol used by browsers to retrieve web pages from a web server.
- Web Server Role: Stores web pages, serves them up (usually in text form) when requested by a browser.
- Web Browser Role: Retrieves the web page and all associated resources (images, scripts) using HTTP/HTTPS, and then renders (displays) them accordingly to the user.
- HTTPS (Hypertext Transfer Protocol Secure): This is the secure version of HTTP. It uses SSL/TLS encryption to secure the communication channel, protecting sensitive data like passwords and credit card details.
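Under the hood, an HTTP request is just structured text sent to the web server. The sketch below builds a minimal HTTP/1.1 GET request; the function names are made up for illustration, and the port numbers are the standard defaults for each scheme.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request as the bytes a browser would send."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, resource, protocol version
        f"Host: {host}",          # which site on the server is wanted
        "Connection: close",      # ask the server to close the connection afterwards
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def default_port(scheme: str) -> int:
    """Return the standard port for a web scheme."""
    return {"http": 80, "https": 443}[scheme]

request = build_get_request("www.oxfordaqa.com", "/cs/index.html")
```

With HTTPS the same request text is sent, but only after an encrypted channel has been set up, so anyone intercepting the traffic cannot read it.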
4.2 File Transfer and Remote Management
- FTP (File Transfer Protocol): Used to transfer files between an FTP client and an FTP server. Access can be anonymous (public files) or authenticated (requiring a username and password).
- Note on FTP Security: FTP is becoming obsolete due to security risks, as it transmits data (including passwords) in plain text. It is being replaced by alternatives like SFTP (SSH File Transfer Protocol, often called Secure FTP), which encrypts the connection.
- SSH (Secure Shell): Used for secure remote management. It allows users to log in securely to a remote computer and execute commands through an encrypted connection.
4.3 Email Protocols
Sending and receiving email happens in two key steps, each using protocols that talk to an email server:
- Sending Email:
  - The sender’s email client uses SMTP (Simple Mail Transfer Protocol) to send the email to their local email server.
  - The local server then uses SMTP to communicate with the recipient's email server.
- Receiving Email:
  - The recipient’s email client retrieves the email from their email server using either POP3 or IMAP.
POP3 vs. IMAP (The Difference)
- POP3 (Post Office Protocol v3): Generally downloads the email from the server to one device and often deletes it from the server. Suitable if you only check email on one machine.
- IMAP (Internet Message Access Protocol): Synchronises email between the server and multiple devices. The email remains on the server, allowing access from a phone, laptop, or desktop.
5. Dynamic Host Configuration Protocol (DHCP) (3.14.4.4)
5.1 Purpose and Operation of DHCP
The Dynamic Host Configuration Protocol (DHCP) system automates the network configuration process.
- Purpose: DHCP automatically assigns unique IP addresses, subnet masks, gateway addresses, and DNS server information to devices connecting to a network.
- How it works: When a device connects, it sends a broadcast request. The DHCP server responds by offering a configuration lease (a temporary assignment of an IP address) to the device.
5.2 Advantages of DHCP
DHCP offers significant advantages over manual (static) configuration:
- Reduced Administration: Network administrators do not have to manually assign IP configurations to every single device, saving immense time.
- Elimination of Errors: Manual configuration frequently leads to errors, such as assigning the same IP address to two different devices (an IP conflict), which stops both devices from working. DHCP prevents these conflicts.
- Flexibility: When a device leaves the network, the DHCP server reclaims its IP address, allowing it to be reused by a new device later.
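The way a DHCP server hands out and reclaims addresses can be sketched as a simple pool. This is a toy model under stated assumptions: the class name, the method names, and the address pool are invented for illustration, and a real DHCP server also tracks lease times and sends the subnet mask, gateway, and DNS settings.

```python
class DhcpServer:
    """Toy DHCP server that leases addresses from a fixed pool."""

    def __init__(self, pool: list[str]) -> None:
        self.free = list(pool)             # addresses not currently leased
        self.leases: dict[str, str] = {}   # device MAC address -> leased IP address

    def request(self, mac: str) -> str:
        """Lease an address to a device, reusing its current lease if it has one."""
        if mac in self.leases:
            return self.leases[mac]
        if not self.free:
            raise RuntimeError("address pool exhausted")
        ip = self.free.pop(0)
        self.leases[mac] = ip
        return ip

    def release(self, mac: str) -> None:
        """Reclaim a device's address so it can be leased to another device."""
        ip = self.leases.pop(mac, None)
        if ip is not None:
            self.free.append(ip)

server = DhcpServer(["192.168.1.100", "192.168.1.101"])
laptop_ip = server.request("aa:aa:aa:aa:aa:01")
phone_ip = server.request("aa:aa:aa:aa:aa:02")
```

Because the server is the only place that hands out addresses, two devices can never be given the same one, which is how the IP conflicts of manual configuration are avoided.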
Key Takeaway for Networking
The Internet relies on Packet Switching directed by Routers. The TCP/IP stack organises communication into four layers: Application (user services), Transport (reliability/sequencing), Internet (IP addressing/routing), and Link (physical transfer/MAC). DNS translates names (FQDNs) into addresses (IPs).