Network Sockets

Oct 24

Written By Michael Day

“The network is the computer.” - Scott McNealy

Computer rack with network cables — Photo by Taylor Vick on Unsplash

This page is a basic introduction to network socket programming in general.

Network Socket Definition

According to Wikipedia, a network socket (hereafter sometimes simplified to just “socket”) is defined as:

A network socket is a software structure within a network node of a computer network that serves as an endpoint for sending and receiving data across the network. The structure and properties of a socket are defined by an application programming interface (API) for the networking architecture. Sockets are created only during the lifetime of a process of an application running in the node.
Because of the standardization of the TCP/IP protocols in the development of the Internet, the term network socket is most commonly used in the context of the Internet protocol suite, and is therefore often also referred to as Internet socket. In this context, a socket is externally identified to other hosts by its socket address, which is the triad of transport protocol, IP address, and port number.
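
To make the socket address triad concrete, here is a minimal Python sketch (Python is my choice for illustration; the quoted definition does not prescribe a language). It binds a TCP socket to the loopback address, lets the operating system pick a free port, and reads back the three parts. The helper name describe_socket is hypothetical.

```python
import socket


def describe_socket(sock: socket.socket) -> tuple:
    """Return the (transport protocol, IP address, port) triad of an IPv4 socket."""
    protocol = "TCP" if sock.type == socket.SOCK_STREAM else "UDP"
    ip_address, port = sock.getsockname()
    return protocol, ip_address, port


# Port 0 asks the operating system to choose any free port.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp_sock:
    tcp_sock.bind(("127.0.0.1", 0))
    triad = describe_socket(tcp_sock)
    print(triad)  # e.g. ('TCP', '127.0.0.1', <some free port>)
```

The port printed will differ from run to run because the operating system assigns it.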

So what is a transport protocol? Read on!

Transport Protocol

In the context of network sockets, the term "transport protocol" refers to a set of rules and conventions that dictate how data is transmitted, received, and managed across a network. Transport protocols are a crucial part of the Internet and networking, as they ensure that data is delivered reliably and efficiently from one point to another.

It might be helpful to think of the layers of the TCP/IP protocol stack like layers of cake.

Think of the TCP/IP protocols like layers of cake.
Photo by Mikhail Nilov: https://www.pexels.com/photo/a-tall-cake-with-colorful-layers-8245039/

The Transport layer forms one of the 4 layers of the TCP/IP protocol stack. A good summary is found here: What are the 4 layers of the TCP IP model?. In a nutshell the 4 layers are:

1. Application Layer

2. Transport Layer

3. Internet Layer

4. Link Layer

The layers are further abstracted to 7 in the OSI model but the TCP/IP model is sufficient for our present tutorial and henceforth we will refer only to that model.

Application Layer

The Application Layer is the topmost layer in the TCP/IP (Transmission Control Protocol/Internet Protocol) model and is responsible for providing network services directly to end users or applications. This makes it the closest layer to the end-user or application software.

Key functions and characteristics of the Application Layer include:

Application Protocols: This layer defines various application protocols that allow communication between different applications and services. Examples of application layer protocols include HTTP (Hypertext Transfer Protocol), SMTP (Simple Mail Transfer Protocol), FTP (File Transfer Protocol), and more.
User Interface: Provides the user interface elements that allow applications to interact with the network. This can include things like input fields, buttons, and displays.
Error Handling and Recovery: Error handling and recovery mechanisms are typically implemented at this layer. For example, in HTTP, if a web page is not found (HTTP 404 error), this is handled at the Application Layer.
Authentication and Security: The Application Layer can also handle user authentication and data encryption to ensure the security of data as it is transmitted over the network.

In summary, the Application Layer of the TCP/IP model plays a crucial role in enabling communication between different applications and services over a network. It provides the necessary protocols and services to facilitate this communication while also ensuring security and data integrity.
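
As a rough illustration of what an application protocol looks like on the wire, the sketch below builds a bare-bones HTTP/1.1 GET request and parses the status line of a response. Nothing is sent over the network here. The function names, the host example.com, and the sample response bytes are all made up for the example.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # the empty line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple:
    """Return (version, status code, reason) from the first line of a response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_get_request("example.com", "/missing-page")
# Hand-written sample response, not captured from a real server.
sample_response = b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
status = parse_status_line(sample_response)
print(request.decode("ascii"))
print(status)  # ('HTTP/1.1', 404, 'Not Found')
```

The 404 case mirrors the error handling example above: the status code is plain text that the application itself has to interpret.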

Transport Layer

The Transport Layer is a critical component of the TCP/IP (Transmission Control Protocol/Internet Protocol) suite, responsible for providing end-to-end communication and data transfer services. This layer serves as a bridge between the Application Layer and the Internet Layer in the TCP/IP model.

Key functions and characteristics of the Transport Layer include:

End-to-End Communication: The Transport Layer ensures reliable communication between two devices on a network, even if they are not directly connected. It abstracts the complexities of the underlying network and provides a logical connection between applications running on different devices.

Segmentation and Reassembly: It divides large messages or data into smaller segments for efficient transmission. Each segment is assigned a Sequence Number, which allows for proper reassembly at the receiving end. This segmentation and reassembly process helps optimize data transfer and manage congestion.
Error Detection: The Transport Layer is responsible for error detection. It uses various mechanisms like Checksums and Acknowledgements to ensure that data is transmitted accurately. In the case of errors, it can request retransmission of lost or corrupted segments.
Flow Control: Flow control mechanisms help in managing data transmission rates between sender and receiver. This prevents the sender from overwhelming the receiver with data, which could lead to congestion or data loss.
Multiplexing and Demultiplexing: The Transport Layer uses port numbers to multiplex (combine) data from different applications on the same device and demultiplex (separate) data at the receiving end to deliver it to the appropriate application.
Congestion Control: The Transport Layer implements mechanisms to monitor and alleviate network congestion. For TCP these are:
- Slow Start: When a new TCP connection is established or after a period of inactivity, TCP uses the "slow start" mechanism to gradually increase the sending rate. This allows the sender to test the network's capacity without overwhelming it.
- Congestion Avoidance: After the slow start phase, TCP enters a congestion avoidance phase, where it incrementally increases its sending rate, but not as aggressively as in the slow start phase. If congestion is detected, TCP will reduce its sending rate to avoid further congestion.
- Congestion Notification: TCP relies on explicit or implicit congestion notifications. Explicit Congestion Notification (ECN) is a mechanism in which routers mark packets to indicate network congestion, and this information is passed back to the sender through acknowledgments. Implicit congestion notification involves monitoring packet loss and delays, which are interpreted as signs of congestion.
- Additive increase/multiplicative decrease (AIMD): TCP uses an AIMD approach to adjust the sending rate. When congestion is detected, TCP decreases its sending rate multiplicatively, but it increases it additively when network conditions are favorable.
Session Management: The Transport Layer can establish, maintain, and terminate communication sessions between devices. For example, in TCP, a connection is established using a 3-Way Handshake Process (SYN, SYN-ACK, ACK), and it can be gracefully terminated.
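
The slow start, congestion avoidance, and AIMD behavior described above can be sketched with a toy simulation. This is a deliberately simplified model and not how any real TCP stack is implemented: the window is counted in whole segments, one step equals one round trip, and the function name and parameters are hypothetical.

```python
def simulate_cwnd(rounds: int, ssthresh: int, loss_rounds: set) -> list:
    """Return the congestion window (in segments) at the start of each round trip."""
    cwnd = 1
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            # Multiplicative decrease: halve the window when loss signals congestion.
            ssthresh = max(cwnd // 2, 1)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: the window doubles each round trip, up to the threshold.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: additive increase of one segment per round trip.
            cwnd += 1
    return history


window_sizes = simulate_cwnd(rounds=10, ssthresh=8, loss_rounds={6})
print(window_sizes)  # [1, 2, 4, 8, 9, 10, 11, 5, 6, 7]
```

The output shows the familiar sawtooth: fast growth to the threshold, slow linear growth after it, then a sharp cut when loss is detected in round 6.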

Essentially, the transport layer moves data between applications on different hosts. Its two basic protocols are Transmission Control Protocol (TCP), which delivers data reliably and in order, and User Datagram Protocol (UDP), which guarantees neither delivery nor ordering but is useful in many scenarios.

Even though the standard says “reliable”, sometimes that is not what is desired. For example, in streaming video a simpler protocol like UDP can ignore missing packets, and that is what most streaming video applications need. TCP, on the other hand, builds reliability in through mechanisms like TCP Sequence Numbers and the Sliding window protocol. Those topics are outside the scope of this article but are covered in brief on StackOverflow: https://stackoverflow.com/questions/6187456/tcp-vs-udp-on-video-stream.
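
For a feel of how lightweight UDP is, here is a small sketch that sends one datagram over the loopback interface. There is no connection setup and no acknowledgment. The payload text and the 2 second timeout are arbitrary choices for the example.

```python
import socket

# Receiver: bind a UDP socket to loopback; port 0 lets the OS choose a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # do not block forever if the datagram never arrives
receiver_address = receiver.getsockname()

# Sender: no connect() or handshake is needed before sending a datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"video frame 1", receiver_address)

# recvfrom() returns the payload and the address it came from.
payload, sender_address = receiver.recvfrom(1024)
print(payload, "from", sender_address)

sender.close()
receiver.close()
```

On a real network the datagram could be lost, and UDP itself would not retry. Any recovery is left to the application.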

Internet Layer

In the TCP/IP (Transmission Control Protocol/Internet Protocol) model, the Internet Layer is responsible for routing packets of data between different networks and devices.

Key functions and characteristics of the Internet Layer include:

Routing: The primary function of the Internet Layer is to route data packets from the source to the destination across multiple interconnected networks. It makes decisions about the best path for data transmission based on routing algorithms and destination addresses.
Routing in the Internet Layer involves several steps:
- Routing Table: Each router maintains a routing table that contains entries for various networks and information about how to reach them. These entries include network addresses and the next-hop router (gateway) that should be used to forward data to the destination network.
- Routing Protocols: Routers use routing protocols to exchange information about the state of the network and to update their routing tables. Common routing protocols include RIP (Routing Information Protocol), OSPF (Open Shortest Path First), BGP (Border Gateway Protocol), and others.
- Route Selection: Routers use routing algorithms to determine the best path for routing data. These algorithms consider factors like the network's reachability, the number of hops, link quality, and administrative preferences.
- Subnet Masking: Subnet masks are used to identify the network portion and host portion of an IP address. Routers apply subnet masks to IP addresses to determine the network to which the address belongs. There is a whole subsection on subnetting further down in this article.
Logical Addressing: Devices on the Internet are identified by logical addresses known as IP (Internet Protocol) addresses. These addresses are hierarchical and serve to uniquely identify devices on a global scale. IP addresses generally come in two flavors: IPv4 (Internet Protocol version 4) and IPv6 (Internet Protocol version 6).
Packet Forwarding: The Internet Layer forwards data in the form of packets. Each packet typically contains a source and destination IP address, as well as other information like the Time to live (TTL) field, which limits the lifespan of a packet to prevent it from circulating indefinitely.
Fragmentation and Reassembly: The Internet Layer can fragment packets into smaller pieces for transmission and then reassemble them at the destination. This is essential for efficient data transfer across networks with different Maximum transmission unit (MTU) sizes.
Internet Control Message Protocol (ICMP): ICMP is an Internet Layer protocol used for error reporting and diagnostic functions. It is often used by tools like ping to check network connectivity and diagnose network problems.
Quality of service (QoS): The Internet Layer may support Quality of Service features that allow certain packets to be prioritized over others, ensuring that real-time traffic, such as Voice over IP or video conferencing (e.g., Zoom) is given higher priority.
Network address translation (NAT): In many cases, devices on a local network share a single public IP address using NAT. The Internet Layer plays a role in translating private IP addresses to a single public IP address for outgoing traffic and vice versa for incoming traffic.
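
The routing table lookup described above can be sketched with Python's standard ipaddress module. When several entries match a destination, routers prefer the most specific one (the longest prefix). The table contents and next-hop addresses below are invented for the example, and real routers use far more efficient data structures than a list scan.

```python
import ipaddress

# Hypothetical routing table: (destination network, next-hop gateway).
ROUTING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "203.0.113.1"),   # default route
    (ipaddress.ip_network("10.0.0.0/8"), "10.255.0.1"),
    (ipaddress.ip_network("10.1.0.0/16"), "10.1.255.1"),
]


def next_hop(destination: str) -> str:
    """Pick the next hop of the matching network with the longest prefix."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTING_TABLE if address in net]
    best_network, best_hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return best_hop


print(next_hop("10.1.2.3"))  # 10.1.255.1
print(next_hop("10.9.9.9"))  # 10.255.0.1
print(next_hop("8.8.8.8"))   # 203.0.113.1
```

The default route 0.0.0.0/0 matches everything, so a lookup always finds at least one entry.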

In summary, the Internet Layer in the TCP/IP model plays a crucial role in routing data packets across networks, ensuring that they reach their intended destinations. It relies on IP addressing to uniquely identify devices and utilizes routing protocols to determine the best path for data transmission. This layer is essential for the functioning of the global Internet and facilitates end-to-end communication between devices on different networks.

Link & Physical

In the TCP/IP model these are included as a single layer, but they are treated separately here.

Link

In the TCP/IP (Transmission Control Protocol/Internet Protocol) model, the Link Layer is closely connected to the physical layer and serves as an interface between the Internet layer and the physical medium.

Key functions and characteristics of the Link Layer in the TCP/IP model include:

Media Access Control (see MAC address): This layer manages access to the physical transmission medium, determining which device can transmit data at a given time. Protocols like Ethernet use CSMA/CD (Carrier-sense multiple access with collision detection) to coordinate access on shared networks.
Framing is a crucial function performed by the Link Layer in the TCP/IP model. It involves the division of data into smaller, manageable units called frames for transmission over a network — it provides structure and organization to the data being sent. An overview of framing:
- Frames: Frames are the basic units of data at the Link Layer. Each frame consists of a header and a trailer. The header contains control and addressing information, while the trailer typically includes error-checking information like a Frame check sequence (FCS).
- Header: The header of a frame contains essential information for the Link Layer to process the frame, such as the source and destination MAC addresses, control information, and frame type. This information helps switches and bridges determine where to forward the frame.
- Trailer: The trailer usually includes an error-detection mechanism, such as a Cyclic redundancy check (CRC) or a Checksum, which allows the receiver to check for transmission errors. If errors are detected, the frame can be discarded or corrected.
- Framing is fundamental for data communication, allowing devices on a network to correctly interpret and process the data being transmitted. Different Link Layer protocols, such as Ethernet, Wi-Fi, and PPP, have their own framing formats, but the core purpose of framing is consistent across these protocols.
Error Detection and Correction: The Link Layer may include error detection and correction mechanisms to ensure the integrity of data frames during transmission. This is particularly important for wired and wireless communication where data can be corrupted or lost due to interference.
- Error correction at the Link Layer is a crucial mechanism used to ensure the integrity of data frames during their transmission over a network. The primary goal of error correction is to detect and correct errors that may occur due to noise, interference, or other factors in the network. Various error correction techniques are used to achieve this, including the following:
- Automatic repeat request (ARQ) is a common error correction technique used in the Link Layer. It operates by requesting the retransmission of frames that are detected as having errors. The basic steps in ARQ include:
  - Frame transmission: The sender transmits a data frame.
  - Frame reception: The receiver checks the frame for errors.
  - Error detection: If errors are detected, the receiver discards the frame and requests retransmission.
  - Frame retransmission: The sender retransmits the frame.
  - This process continues until the frame is received without errors or until a predetermined number of retransmissions is reached.
- Forward error correction (FEC) is a proactive error correction technique where the sender adds redundant information to the data before transmission. This redundant information allows the receiver to detect and correct errors without the need for retransmission. Common FEC methods include Reed–Solomon error correction and Hamming codes.
- Selective Repeat ARQ: In this variant of ARQ, the receiver requests retransmission of only the specific frames that were lost or damaged, rather than having the sender resend every frame from the point of the error onward. This approach can improve the efficiency of error correction in some cases.
- Hybrid Approaches: Some error correction techniques in the Link Layer combine aspects of both ARQ and FEC. For example, a system may use FEC to correct common errors but employ ARQ to handle exceptional cases.
- The specific error correction mechanism used can vary depending on the Link Layer protocol and the nature of the network. For instance, Ethernet typically relies on a CRC (Cyclic Redundancy Check) for error detection and simply discards damaged frames, leaving retransmission to higher layers such as TCP. In contrast, wireless networks such as Wi-Fi may utilize FEC in conjunction with ARQ for more robust error correction.
Address Resolution Protocol (ARP) is a protocol used in the Link Layer to map IP addresses to MAC addresses. Devices use ARP to discover the MAC address of the destination device on the local network.
Logical link control (LLC): The LLC sublayer manages communication between the Network Layer and the MAC sublayer. It is responsible for encapsulating network layer packets and adding control information before transmission.
IEEE 802.3 (Ethernet) and IEEE 802.11 (Wi-Fi): Ethernet is one of the most common Link Layer technologies for wired LANs, while IEEE 802.11 (Wi-Fi) is commonly used for wireless LANs.
Switching (see Network switch): In modern networks, switches operate at the Link Layer. They use MAC addresses to make forwarding decisions, which improves network performance by reducing network collisions and segmenting network traffic.
Bridging (see Network bridge): Bridges are devices that operate at the Link Layer and connect different network segments. They use MAC Address Tables to determine whether to forward a frame to another segment.
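
The framing idea above (header, payload, error-checking trailer) can be sketched in a few lines. The layout is loosely modeled on an Ethernet frame, but it is a simplification: the bit and byte ordering of a real Ethernet FCS differs, and the function names and MAC addresses are made up for the example.

```python
import struct
import zlib


def build_frame(dst_mac: bytes, src_mac: bytes, frame_type: int, payload: bytes) -> bytes:
    """Return header + payload + a 4-byte CRC-32 trailer."""
    header = dst_mac + src_mac + struct.pack("!H", frame_type)
    fcs = zlib.crc32(header + payload)
    return header + payload + struct.pack("!I", fcs)


def frame_is_intact(frame: bytes) -> bool:
    """Recompute the CRC-32 over header and payload and compare with the trailer."""
    body, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(body) == struct.unpack("!I", trailer)[0]


frame = build_frame(
    bytes.fromhex("ffffffffffff"),  # destination MAC (broadcast)
    bytes.fromhex("020000000001"),  # source MAC (made up)
    0x0800,                         # frame type field
    b"hello",
)
# Simulate corruption in transit: the payload changes but the trailer does not.
corrupted = frame[:14] + b"jello" + frame[-4:]
print(frame_is_intact(frame))      # True
print(frame_is_intact(corrupted))  # False
```

A receiver that sees the check fail would typically discard the frame.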

It's important to note that the specific technologies and protocols used in the Link Layer can vary depending on the type of network (wired or wireless) and the physical medium (e.g., Ethernet, Wi-Fi, fiber optics). The Link Layer is responsible for ensuring that data frames are transmitted reliably within a local network or network segment, whereas the Internet Layer is responsible for routing data between different networks. Together, these layers enable end-to-end data communication in the TCP/IP model.
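
The ARQ steps listed earlier (transmit, check, request retransmission, repeat up to a limit) can be sketched as a stop-and-wait loop. This is a toy model under loud assumptions: the channel is a plain Python function, the checksum is compared directly instead of travelling inside the frame, and all names are hypothetical.

```python
import zlib


def send_with_arq(payload: bytes, channel, max_retries: int = 3) -> tuple:
    """Stop-and-wait ARQ: resend until the checksum verifies or retries run out."""
    checksum = zlib.crc32(payload)
    # One initial transmission plus up to max_retries retransmissions.
    for attempt in range(1, max_retries + 2):
        received = channel(payload, attempt)
        if zlib.crc32(received) == checksum:
            return received, attempt  # a real receiver would acknowledge here
    raise TimeoutError("retransmission limit reached")


def noisy_channel(data: bytes, attempt: int) -> bytes:
    """Hypothetical channel that flips one bit on the first two attempts."""
    if attempt <= 2:
        return bytes([data[0] ^ 0x01]) + data[1:]
    return data


delivered, attempts = send_with_arq(b"frame-42", noisy_channel)
print(delivered, attempts)  # b'frame-42' 3
```

The first two transmissions arrive damaged and are rejected, and the third one gets through.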

Physical:

The Physical Layer is responsible for the actual transmission of data over a physical medium. It deals with hardware and physical characteristics of the network. Key functions and characteristics of the Physical Layer include:

Physical Medium: This layer involves the actual physical transmission medium, such as cables, fibers, or wireless radio waves.
Data Encoding: Specifies how digital data is encoded into analog signals for transmission over the physical medium and vice versa.
Bit Rate and Data Rate: The Physical Layer defines the rate at which bits are transmitted (Bit rate) and the rate at which meaningful data is transmitted (Data signaling rate).
Physical Topology: May define the physical layout and organization of devices and cables within a network.
Signal Timing and Synchronization: The Physical Layer ensures that transmitting and receiving devices are synchronized in terms of signal timing.
Signal Quality and Noise: Addresses signal quality and minimizes the impact of noise and interference.
Repeater and Hub Devices: Repeaters and hubs operate at the Physical Layer and amplify or regenerate signals.

In summary, the Link Layer is responsible for framing data, error detection, MAC addressing, and logical link control. The Physical Layer is responsible for encoding data into physical signals, signal transmission, synchronization, and handling the physical medium. These two layers work together to ensure the reliable transmission of data over a network.
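
As one illustration of data encoding at the Physical Layer, the sketch below implements Manchester encoding, a line code in which every bit becomes a pair of signal levels with a transition in the middle. That transition also helps the receiver stay synchronized. The mapping shown (0 as high then low, 1 as low then high) is one common convention; signal levels are represented here as the characters "1" (high) and "0" (low), which is purely an assumption of this example.

```python
def manchester_encode(bits: str) -> str:
    """Encode a bit string; each bit becomes two half-bit signal levels."""
    symbols = {"0": "10", "1": "01"}  # 0 -> high,low and 1 -> low,high
    return "".join(symbols[bit] for bit in bits)


def manchester_decode(signal: str) -> str:
    """Decode pairs of half-bit signal levels back into bits."""
    pairs = {"10": "0", "01": "1"}
    return "".join(pairs[signal[i:i + 2]] for i in range(0, len(signal), 2))


encoded = manchester_encode("1011")
decoded = manchester_decode(encoded)
print(encoded)  # 01100101
print(decoded)  # 1011
```

Note that the encoded signal is twice as long as the data, which is the price paid for embedding the clock in the signal.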

Port Numbers

Ports are primarily associated with the Application Layer and Transport Layer in the TCP/IP protocol suite.

The Wikipedia definition for a port is pretty good:

In computer networking, a port or port number is a number assigned to uniquely identify a connection endpoint and to direct data to a specific service. At the software level, within an operating system, a port is a logical construct that identifies a specific process or a type of network service.

Port numbers range from 0 to 65535. This is a result of TCP and UDP storing the port as an unsigned 16-bit integer, whose largest value is 2¹⁶ - 1 = 65535. Certain network services have default ports assigned to them and they are well known across the Internet. For example: Default Port Numbers You Need to Know as a Sysadmin.
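
The 16-bit limit is easy to check for yourself. The short sketch below uses Python's struct module, where the format "!H" means an unsigned 16-bit integer in network byte order. Port 443 is used only as a sample value.

```python
import struct

MAX_PORT = 2**16 - 1
print(MAX_PORT)  # 65535

# A port number fits in exactly two bytes.
packed = struct.pack("!H", 443)
print(packed.hex())  # 01bb

# One past the maximum no longer fits in 16 bits.
try:
    struct.pack("!H", MAX_PORT + 1)
    fits = True
except struct.error:
    fits = False
print(fits)  # False
```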

Servers and Clients

In the context of network socket programming, the distinction between a client and a server is important:

Server-side:

On the server side, you create a socket, bind it to a specific IP address and port, and then listen for incoming connections using functions like socket(), bind(), and listen(). When a client attempts to connect, the server's listening socket accepts the connection, creating a new socket specifically for that client. This new socket is used for communication with the client. The server can use the accepted socket to send and receive data from the client. Common functions for communication include accept() and send()/recv().

Client-side:

On the client side, you create a socket and then initiate a connection to the server using its IP address and port using functions like socket() and connect(). Once the connection is established, you can use the socket to send requests to the server and receive responses. Communication on the client side also involves using functions like send() and recv().

In summary, a server provides services or resources to clients, while clients request and use those services or resources from the server. Both the client and server sides of a network communication involve socket creation, connection establishment, and data exchange, but they play distinct roles in the communication process.
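
The server and client steps above can be sketched in one small Python script. To keep it self-contained, the server runs in a background thread on the loopback address and handles a single client. The function name serve_one_client and the choice to reply with the message in upper case are assumptions made for the example.

```python
import socket
import threading


def serve_one_client(listener: socket.socket) -> None:
    """Accept a single connection and reply with the received data in upper case."""
    connection, client_address = listener.accept()  # new socket for this client
    with connection:
        data = connection.recv(1024)
        connection.sendall(data.upper())


# Server side: socket() -> bind() -> listen(), then accept() in the thread.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen()
server_address = listener.getsockname()
server_thread = threading.Thread(target=serve_one_client, args=(listener,))
server_thread.start()

# Client side: socket() -> connect() -> send()/recv().
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect(server_address)
    client.sendall(b"hello server")
    reply = client.recv(1024)

server_thread.join()
listener.close()
print(reply)  # b'HELLO SERVER'
```

Notice that the listening socket is never used for data. The socket returned by accept() carries the conversation with the client.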

Diagrams

The following diagrams should help visualize what’s occurring:

Client requesting connection from server. In most cases the client has no control over the port chosen. — Client requesting to connect to the server.

The server accepts the client's connection and then duplicates itself (if allowed). — Server duplicates itself then accepts connection from the client.

Note that the server generally MUST be run before the client.

Subnet Prefix Length or Mask

Subnet masks are a result of the limited IP address space available in IP version 4. IP version 6 has not run into such limitations to date, but there are still a lot of IPv4 machines on the Internet and in local intranets today.

There are 5 classes of IPv4 addresses (see 5 Classes of IPv4 Addresses [Class A, B, C, D and E] (meridianoutpost.com)):

Class A Public & Private IP Address Range (Large Networks)
Public IP Range: 1.0.0.0 to 127.0.0.0
First octet value range from 1 to 127
Private IP Range: 10.0.0.0 to 10.255.255.255
Special IP Range: 127.0.0.1 to 127.255.255.255 (loopback)
Subnet Mask: 255.0.0.0 (8 bits)
Number of Networks: 126
Number of Hosts per Network: 16,777,214

Class B Public & Private IP Address Range (Medium-Sized Networks)
Public IP Range: 128.0.0.0 to 191.255.0.0
First octet value range from 128 to 191
Private IP Range: 172.16.0.0 to 172.31.255.255
Subnet Mask: 255.255.0.0 (16 bits)
Number of Networks: 16,382
Number of Hosts per Network: 65,534

Class C Public & Private IP Address Range (Small Networks)
Public IP Range: 192.0.0.0 to 223.255.255.0
First octet value range from 192 to 223
Private IP Range: 192.168.0.0 to 192.168.255.255
Subnet Mask: 255.255.255.0 (24 bits)
Number of Networks: 2,097,150
Number of Hosts per Network: 254

Classes D and E are special. They are specifically for multicast (D) and restricted use (E). Multicast is described in detail on Wikipedia: Multicast, and IP multicast specifically is covered at IP multicast.

Class D IP Address Range
Class D IP addresses are not allocated to hosts and are used for multicasting. Multicasting allows a single host to send a single stream of data to thousands of hosts across the Internet at the same time. It is often used for audio and video streaming, such as IP-based cable TV networks. Another example is the delivery of real-time stock market data from one source to many brokerage companies.
Range: 224.0.0.0 to 239.255.255.255
First octet value range from 224 to 239
Number of Networks: N/A
Number of Hosts per Network: Multicasting

Class E IP Address Range
Class E IP addresses are not allocated to hosts and are not available for general use. These are reserved for research purposes.
Range: 240.0.0.0 to 255.255.255.255
First octet value range from 240 to 255
Number of Networks: N/A
Number of Hosts per Network: Research/Reserved/Experimental
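
Since the class is determined entirely by the first octet, the ranges above translate into a very short function. This is only a sketch of the historical classful scheme; the function name address_class is hypothetical.

```python
def address_class(ip: str) -> str:
    """Return the historical class (A to E) of a dotted-quad IPv4 address."""
    first_octet = int(ip.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError(f"invalid first octet in {ip!r}")
    if first_octet <= 127:
        return "A"  # includes the reserved 0.x.x.x and 127.x.x.x (loopback) blocks
    if first_octet <= 191:
        return "B"
    if first_octet <= 223:
        return "C"
    if first_octet <= 239:
        return "D"  # multicast
    return "E"      # reserved / experimental


for sample in ["10.0.0.1", "172.16.5.4", "192.168.1.1", "224.0.0.251", "250.1.1.1"]:
    print(sample, address_class(sample))
```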

An IP subnet calculator is available at IP Subnet Calculator.

A more comprehensive description of subnets is given at Subnet - Wikipedia and classful networks get treatment at Classful network - Wikipedia.
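
If you would rather compute subnets locally, Python's standard ipaddress module can do the same arithmetic as an online calculator. The network 192.168.1.0/24 below is just a sample value.

```python
import ipaddress

network = ipaddress.ip_network("192.168.1.0/24")
print(network.netmask)    # 255.255.255.0
print(network.prefixlen)  # 24

# Subtract the network and broadcast addresses to get usable host addresses.
usable_hosts = network.num_addresses - 2
print(usable_hosts)       # 254

print(network.is_private)                               # True
print(ipaddress.ip_address("192.168.1.77") in network)  # True

# Split the /24 into four /26 subnets.
subnets = [str(subnet) for subnet in network.subnets(new_prefix=26)]
print(subnets)
# ['192.168.1.0/26', '192.168.1.64/26', '192.168.1.128/26', '192.168.1.192/26']
```

The prefix length (/24) and the dotted mask (255.255.255.0) are two ways of writing the same thing: the number of leading bits that identify the network.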

Host Order vs Network Order

In the book Gulliver's Travels we find 2 groups of people who are adamant about which end of the egg they open first. One group says to open the egg on the little end and the other says the big end is best. See Lilliput and Blefuscu.

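In essence, that is what engineers of yesteryear had to decide: when a value spans several bytes, which end comes first, the most significant byte (big-endian) or the least significant byte (little-endian)? Different machines made different choices, which caused problems once they had to talk to each other over a network, so big-endian was settled on as the standard “network order” for things like TCP/IP. See network protocols - Big endian or Little endian on net? - Stack Overflow.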

The gory details of endianness are covered at Endianness - Wikipedia.
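
Here is a small sketch of what the two byte orders look like in practice, using only the Python standard library. The value 0x1234 is arbitrary. The sys.byteorder line prints whatever your own machine uses, so its output will vary.

```python
import socket
import struct
import sys

value = 0x1234

big = value.to_bytes(2, "big")        # most significant byte first
little = value.to_bytes(2, "little")  # least significant byte first
network = struct.pack("!H", value)    # "!" means network (big-endian) order
print(big.hex(), little.hex(), network.hex())  # 1234 3412 1234

# The byte order of the machine running this script ("little" or "big").
print(sys.byteorder)

# htons() converts host order to network order and ntohs() converts back,
# so a round trip returns the original value on any machine.
round_trip = socket.ntohs(socket.htons(value))
print(hex(round_trip))  # 0x1234
```

In C socket code you call htons() and friends yourself before filling in a port or address. In Python the socket module takes ordinary integers and handles the conversion for you, so these functions mostly matter when you pack binary data by hand.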