CA644, System Software

C, Part 2 - Communication with Sockets

Dr. Niall McMahon

2022-10-25

If you print these slides, think about using two pages per sheet although don't worry too much about it!

Credits

Dr. Niall McMahon
Drawing on previous work by:
Dr. Michael Scriney
Dr. Long Cheng
And sources credited in the references.
Autumn 2022.

C

Official Documentation

  • You can find the official C specification documents at C - Project status and milestones.
  • C17 (ISO/IEC 9899:2018) is the current version; it was published in 2018.
  • All the reference material you need is in the C17 document.

Sockets

What is a Socket?

  • Sockets allow communication between processes, i.e. inter-process communication or IPC.
  • These can be running on the same machine or on different machines.
  • In Unix, every input and output reads or writes using a file descriptor id, an integer that represents an open file, network connection or terminal session etc.
  • System calls such as read() and write() work with sockets.
  • Sockets have several different types; two are covered here, stream sockets and datagram sockets.

Why are Sockets Needed?

  • Remember that each process runs in isolation.
  • Sockets allow processes to share data.
  • Sockets also enable network communication.
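As a minimal sketch of two socket endpoints sharing data through ordinary file descriptors, the function below sends a message into one end of a connected pair and reads it back from the other. The function name socketpair_roundtrip is hypothetical, and socketpair() is an assumption made here only to keep both ends in one process; it is not part of the course examples.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends msg into one end of a connected socket pair and reads it back
 * from the other end. Returns the number of bytes received, or -1 on
 * error. Hypothetical helper for illustration only. */
ssize_t socketpair_roundtrip(const char *msg, char *out, size_t out_size)
{
    int fds[2]; /* two file descriptors, one per end of the connection */
    size_t len = strlen(msg);
    ssize_t received = -1;

    if (out_size == 0 || len == 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    /* The same read() and write() calls used for files work here. */
    if (write(fds[0], msg, len) == (ssize_t)len) {
        received = read(fds[1], out, out_size - 1);
        if (received >= 0)
            out[received] = '\0';
    }

    close(fds[0]);
    close(fds[1]);
    return received;
}
```

In a real program the two descriptors would normally belong to two different processes, for example a parent and a child after fork().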

Network Layering

Overview

  1. OSI (Open Systems Interconnection) model:

    OSI is a conceptual model that characterises and standardises the communication functions of a telecommunication or computing system without regard to its underlying internal structure and technology. See OSI model at Wikipedia for more.

    The OSI model has many layers; this increases the complexity of networking.

  2. Transmission Control Protocol (TCP)/IP (Internet Protocol Suite): simplifies OSI and merges some layers.

    OSI            TCP/IP
    -------------  -----------------
    Application    Application
    Presentation
    Session
    Transport      Transport
    Network        Internet
    Data link      Network Interface
    Physical

Advantages

  • Layering makes networking easier.
  • The programmer does not need to worry about lower level mechanics, e.g. IP packets or ethernet frames.
  • The programmer does not need to know about how the layer is actually implemented.
  • In a similar way to syscalls, sockets act as an API to the transport layer.

Technologies and Protocols

TCP/IP Layer        Technologies and Protocols
------------------  ---------------------------------------------------
Application         HTTP, FTP, SMTP
Transport           TCP, UDP
Internet            IP, ARP (Address Resolution Protocol)
Network Interface   Ethernet, FDDI (Fiber), ATM (Asynch. Transfer Mode)

Layers

In the OSI seven-layer model, the lowest layer is Layer-1 (L1). The most abstracted layer is Layer-7 (L7).

  • An application layer is the topmost of the abstraction layers; it specifies the shared communications protocols and interface methods used by hosts in a communications network. The application layer abstraction is used in both of the standard models of computer networking: the Internet Protocol Suite (TCP/IP) and the OSI model. Although both models use the same term for their respective highest level layer, the detailed definitions and purposes are different. See application layer at Wikipedia for more.
  • Layer-2 (L2): in the seven-layer OSI model of computer networking, the data link layer is Layer-2. This layer is the protocol layer that transfers data between adjacent network nodes in a wide area network (WAN) or between nodes on the same local area network (LAN) segment. See data link layer at Wikipedia for more.
  • Layer-3 (L3): in the seven-layer OSI model of computer networking, the network layer is Layer-3, just above Layer-2. The network layer is responsible for packet forwarding including routing through intermediate routers. The TCP/IP Internet Layer is a subset of the OSI Network Layer. See network layer at Wikipedia for more.

Streams

A stream, SOCK_STREAM, is a reliable, two-way communication data stream.

  • The data integrity is ensured during transmission, i.e. data will not be lost.
  • Data are transmitted in order, without boundaries.
  • Send and receive is synchronous, i.e. the send blocks until it is complete.
  • It's based on TCP.

Datagrams

A datagram socket, SOCK_DGRAM, gives no guarantee of delivery or ordering but has a higher efficiency.

  • Fast rather than sequential transmission.
  • The transmitted data may be lost or damaged.
  • Send and receive is asynchronous, i.e. non-blocking or send and forget.
  • Based on UDP, the user datagram protocol. (More about UDP another time.)
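The choice between a stream and a datagram socket is made when the socket is created. The sketch below opens an IPv4 socket of the requested type and asks the kernel to report the type back; the helper name open_and_query_type is hypothetical and the use of getsockopt() with SO_TYPE is an assumption added for illustration.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Opens an IPv4 socket of the given type (SOCK_STREAM or SOCK_DGRAM),
 * queries its type with getsockopt(), then closes it.
 * Returns the type reported by the kernel, or -1 on error. */
int open_and_query_type(int type)
{
    int reported = -1;
    socklen_t len = sizeof(reported);
    int fd = socket(AF_INET, type, 0); /* 0: default protocol for the type */

    if (fd == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &reported, &len) == -1)
        reported = -1;

    close(fd);
    return reported;
}
```

With AF_INET and protocol 0, SOCK_STREAM selects TCP and SOCK_DGRAM selects UDP.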

TCP Framing

TCP operates on streams, not packets: a single send does not send a single packet. A single receive does not necessarily receive the same amount of data. One machine may send twice, each time sending a stream of 5 bytes. The receiving machine may only receive once, a stream of 10 bytes.

A single message can be framed by sending information about its length or by using delimiters.
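One possible way to frame by length is sketched below: each message is preceded by a 4-byte length in network byte order, so the receiver can split a 10-byte stream back into two 5-byte messages. The names frame_encode and frame_decode and the 4-byte header are assumptions for illustration, not a standard API. The functions work on memory buffers so that the idea can be tried without a network.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER 4 /* bytes used for the length prefix (assumed) */

/* Writes a length prefix followed by the message into out.
 * Returns the total number of bytes written, or 0 if out is too small. */
size_t frame_encode(const void *msg, uint32_t len,
                    unsigned char *out, size_t out_size)
{
    uint32_t be = htonl(len); /* length in network byte order */

    if (out_size < FRAME_HEADER || len > out_size - FRAME_HEADER)
        return 0;
    memcpy(out, &be, FRAME_HEADER);
    memcpy(out + FRAME_HEADER, msg, len);
    return FRAME_HEADER + (size_t)len;
}

/* Looks for one complete frame at the start of the received bytes.
 * On success, sets *payload and *len and returns the bytes consumed.
 * Returns 0 if the frame is not yet complete, i.e. more data is needed. */
size_t frame_decode(const unsigned char *stream, size_t avail,
                    const unsigned char **payload, uint32_t *len)
{
    uint32_t be, n;

    if (avail < FRAME_HEADER)
        return 0;
    memcpy(&be, stream, FRAME_HEADER);
    n = ntohl(be);
    if (avail - FRAME_HEADER < n)
        return 0;
    *payload = stream + FRAME_HEADER;
    *len = n;
    return FRAME_HEADER + (size_t)n;
}
```

A receiver would keep appending whatever read() returns to a buffer and call frame_decode() repeatedly until it returns 0.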

Socket Programming

Socket programming is based on the transport layer, i.e. TCP/UDP. When two computers communicate:

  • Communication must be at the same level; data can only be transferred from one process to another at the same level.
  • The functions of each layer in each system must be the same; they must use the same network model.
  • Each layer can use the services provided by the lower layer and provide services to the upper layer.

Addressing

The destination process is identified using:

  • Internet protocol (IP) address for identification of the machine on the network.
  • Media access control (MAC) address, (in principle) unique for each network card.
  • Port number to identify a specific process.

High Level Algorithm

  • Set up the socket connection, defining local and remote sockets:
    • Remote machine IP address and host name.
    • Remote process port.
  • Send and receive:
    • Operates like other I/O in Linux.
  • Close socket connection.

Client and Server

See marked up code for a client and server.

Client                                   Server
---------------------------------------  ---------------------------------------
create socket                            create socket
define server socket (name)              define server socket (name)
                                         bind
                                         (local socket is bound to server name)
                                         listen
(Request connection between local
and server sockets)
connect                                  (Accept connection)
                                         accept
(Client/server session)                  (Client/server session)
write                                    read
read                                     write
(EOF)                                    (EOF)
close                                    read
                                         close
                                         Back to listen/accept step
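The sequence above can be sketched in a single process over the loopback interface. This is a simplified illustration, not the course's marked up client and server: the function name loopback_session is hypothetical, both roles run in one process, and port 0 is used so that the kernel picks a free port. It relies on the kernel queuing the connection after listen(), so connect() can complete before accept() is called.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Runs one client/server exchange over 127.0.0.1 in a single process.
 * Returns the number of bytes the server side received, or -1 on error. */
int loopback_session(const char *msg, char *out, size_t out_size)
{
    int server = -1, client = -1, conn = -1, result = -1;
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);
    size_t len = strlen(msg);

    if (out_size == 0 || len == 0)
        return -1;

    /* Define the server socket (name): loopback address, any free port. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);

    /* Server: create socket, bind, listen, then look up the chosen port. */
    server = socket(AF_INET, SOCK_STREAM, 0);
    if (server == -1 ||
        bind(server, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(server, 20) == -1 ||
        getsockname(server, (struct sockaddr *)&addr, &addr_len) == -1)
        goto done;

    /* Client: create socket and request a connection to the server. */
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client == -1 ||
        connect(client, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        goto done;

    /* Server: accept the queued connection. */
    conn = accept(server, NULL, NULL);
    if (conn == -1)
        goto done;

    /* Session: the client writes, the server reads. */
    if (write(client, msg, len) == (ssize_t)len) {
        ssize_t n = read(conn, out, out_size - 1);
        if (n >= 0) {
            out[n] = '\0';
            result = (int)n;
        }
    }

done:
    if (conn != -1) close(conn);
    if (client != -1) close(client);
    if (server != -1) close(server);
    return result;
}
```

A real server would loop back to accept() after closing conn, and the client would normally be a separate program.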

Structures

struct - Definition

Structures or struct are data types that are built using things of other types.

struct - Example

This example is close to what Deitel and Deitel use in their C++ How to Program, in the chapter on classes and data abstraction.

struct Time {
    int hour;   // 0 - 23
    int minute; // 0 - 59
    int second; // 0 - 59
};

struct begins the definition. Time is the structure tag, i.e. the name of this new structure type. Variables are declared using the structure tag.

In this example, hour, minute and second are the members of Time. Members can be of any type but structures cannot include an instance of themselves, i.e. in this example a structure of type Time. However! Structures can contain a pointer to another Time structure, i.e. struct Time *timeptr.

To declare a structure variable of type Time in C, write struct Time NewTimeVariable; in C++, the struct keyword can be dropped, i.e. Time NewTimeVariable. Here, NewTimeVariable is created and is of type Time. The hour member of NewTimeVariable is accessed by writing NewTimeVariable.hour.

Structs are another kind of abstraction, another example of how things can be layered to create higher-level structures that improve user experience.
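A short sketch of the Time structure in use follows. The next member and the time_to_seconds function are hypothetical additions, included only to show a pointer member and access through a pointer with the -> operator.

```c
#include <stddef.h>

struct Time {
    int hour;          /* 0 - 23 */
    int minute;        /* 0 - 59 */
    int second;        /* 0 - 59 */
    struct Time *next; /* a pointer to another struct Time is allowed */
};

/* Converts a time to seconds since midnight.
 * Members are reached through a pointer with -> instead of the dot. */
int time_to_seconds(const struct Time *t)
{
    return t->hour * 3600 + t->minute * 60 + t->second;
}
```

For example, after struct Time lunch = {13, 30, 15, NULL}; the expression lunch.hour is 13 and time_to_seconds(&lunch) gives 48615.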

The sockaddr_in Struct Type

sockaddr_in - Definition

From IBM, the sockaddr_in struct is defined as follows:

struct sockaddr_in {
    short sin_family;
    u_short sin_port;
    struct in_addr sin_addr;
    char sin_zero[8];
};

sockaddr_in - Description

  • sin_len : This field contains the length of the address for UNIX 98 specifications. Note: The sin_len field is provided only for BSD 4.4 compatibility. It is not necessary to use this field even for BSD 4.4/ UNIX 98 compatibility. The field is ignored on input addresses.
  • sin_family: This field contains the address family, which is always AF_INET when TCP or User Datagram Protocol (UDP) is used.
  • sin_port: This field contains the port number.
  • sin_addr: This field contains the IP address. sin_addr is of type struct in_addr. Historically this wrapped a C union, i.e. a value that may have any of several representations or formats within the same position in memory, so when setting the IP address the exact member needs to be specified. s_addr specifies the IP address as one (4 byte) integer, in network byte order.
  • sin_zero: This field is reserved. Set this field to hexadecimal zeros.

There's a little more here about the memory allocated to the socket address structure.
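As a sketch of how the fields are typically filled in, the function below builds an IPv4 socket address from a dotted-decimal string and a port. The name make_ipv4_address is hypothetical; using inet_pton() to parse the address is an assumption here, since the course code may set s_addr in another way.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fills *addr for the given IPv4 address string and port.
 * Returns 0 on success, or -1 if ip is not a valid dotted-decimal address. */
int make_ipv4_address(struct sockaddr_in *addr, const char *ip,
                      unsigned short port)
{
    memset(addr, 0, sizeof(*addr)); /* also clears sin_zero */
    addr->sin_family = AF_INET;     /* IPv4 */
    addr->sin_port = htons(port);   /* port in network byte order */

    /* inet_pton() stores the address in network byte order in sin_addr. */
    return inet_pton(AF_INET, ip, &addr->sin_addr) == 1 ? 0 : -1;
}
```

The resulting structure can then be passed, cast to struct sockaddr *, to calls such as connect() or bind().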

memset()

memset() - Definition

From IBM:

void *memset(void *dest, int c, size_t count)

In this definition, memset is defined with a pointer to an unspecified type, i.e. void *dest; dest contains an address in memory. The memset() function sets the first count bytes at dest to the value c. The value of c is passed in as an int but it is converted to an unsigned character.

The memset() function returns a pointer to dest.

There's a nice description at A detailed tutorial on Memset in C/C++ with usage and examples.

memset() - Example

memset(buffer, 0, sizeof(buffer));

In this example, memset() is used to set all bits of a character array buffer to 0. The array name buffer is the memory address of the starting point of the array.
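A small sketch of the same call in context follows. The names BUFFER_SIZE, count_bytes and zero_fill_demo are hypothetical. Note that sizeof(buffer) gives the array size only where buffer is declared as an array; for a pointer it would give the size of the pointer.

```c
#include <stddef.h>
#include <string.h>

#define BUFFER_SIZE 40 /* assumed size, for illustration */

/* Counts how many of the first n bytes at p are equal to value. */
size_t count_bytes(const unsigned char *p, size_t n, unsigned char value)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i] == value)
            count++;
    }
    return count;
}

/* Fills a local buffer with 'x', clears it with memset(), and returns
 * the number of zero bytes found afterwards. */
size_t zero_fill_demo(void)
{
    char buffer[BUFFER_SIZE]; /* automatic storage: indeterminate contents */

    memset(buffer, 'x', sizeof(buffer)); /* every byte becomes 'x' */
    memset(buffer, 0, sizeof(buffer));   /* every byte becomes 0 */
    return count_bytes((const unsigned char *)buffer, sizeof(buffer), 0);
}
```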

From the International C Standard:

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:

  • if it has pointer type, it is initialized to a null pointer;
  • if it has arithmetic type, it is initialized to (positive or unsigned) zero;
  • if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
  • if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits

connect()

connect() - Definition

From IBM:

#include <sys/types.h>
#include <sys/socket.h>
int connect(int socket, struct sockaddr *address, int address_len);

Here, the first parameter is this socket's file descriptor id, sock. address is the start location of a socket address structure that contains the address of the second (target) socket. address_len is the length of the second (target) socket address structure.

connect() - Example

From our client example,

connect(sock, (struct sockaddr*) &serv_addr, sizeof(serv_addr));

Here, the first parameter is the socket file descriptor id, sock, of the client process. The second parameter is the address of the socket address structure of the second (server) process, its name. Its type is cast - i.e. changed - to a pointer to a struct sockaddr using (struct sockaddr*). Finally, the size occupied by the socket address structure describing the second socket is passed as the third parameter.
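The call above ignores the return value. A slightly more defensive sketch is given below; connect() returns -1 on failure and sets errno, which perror() can report. The function name connect_to is hypothetical and the address parsing with inet_pton() is an assumption.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP socket and connects it to ip:port.
 * Returns the connected socket's file descriptor, or -1 on failure. */
int connect_to(const char *ip, unsigned short port)
{
    struct sockaddr_in serv_addr;
    int sock;

    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &serv_addr.sin_addr) != 1)
        return -1; /* not a valid IPv4 address string */

    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == -1) {
        perror("socket");
        return -1;
    }

    if (connect(sock, (struct sockaddr *)&serv_addr,
                sizeof(serv_addr)) == -1) {
        perror("connect"); /* e.g. connection refused */
        close(sock);       /* do not leak the descriptor */
        return -1;
    }
    return sock;
}
```

The caller is responsible for calling close() on the returned descriptor when the session ends.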

bind()

bind() - Definition

From IBM:

#include <sys/types.h>
#include <sys/socket.h>
int bind(int socket, struct sockaddr *address, int address_len);

Here, the first parameter is this socket's file descriptor id, sock. address is the start location of a socket address structure that contains the local address, i.e. the name, to be bound to this socket. address_len is the length of that socket address structure.

bind() - Example

From our server example,

bind(serv_sock, (struct sockaddr*) &serv_addr, sizeof(serv_addr));

Here, the first parameter is the socket file descriptor id, serv_sock, of the server process. The second parameter is the address of the socket address structure that holds the server's own address, its name. Its type is cast - i.e. changed - to a pointer to a struct sockaddr using (struct sockaddr*). Finally, the size occupied by that socket address structure is passed as the third parameter.

listen()

listen() - Definition

From IBM:

#include <sys/types.h>
#include <sys/socket.h>
int listen(int socket, int backlog);

The first parameter is this socket's file descriptor id, sock. The second is the number of client connections that are allowed to wait before further connection requests are rejected. listen() indicates that sock is where connection requests will be accepted.

listen() - Example

From our server example,

listen(serv_sock, 20);

So serv_sock is the socket to direct connection requests to and there are to be no more than 20 connection requests waiting for serv_sock.

read()

read() - Definition

From IBM:

#include <unistd.h>
ssize_t read(int socket, void *buf, ssize_t N);

Where the first parameter is the file descriptor id of the target socket, buf is the memory address of the buffer to write the received information into and N is the maximum number of bytes to read, at most the length in bytes of the buffer that buf points to. read() returns a signed size in bytes (ssize_t): the number of bytes actually read, 0 at end of file, or -1 on error.

As a note:

  • After each socket is created, two buffers are allocated, an input buffer and an output buffer.
  • write() does not immediately transmit data to the network, but first writes the data into the output buffer, and then the TCP protocol sends the data from the buffer to the target machine.
  • The TCP protocol is independent of the write() function.
  • The default size of the input and output buffers depends on the system; 8 kilobytes is a commonly quoted figure.
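The buffer sizes on a particular machine can be checked rather than assumed. The sketch below queries a new TCP socket with getsockopt(); the function name tcp_buffer_size is hypothetical, and the values reported depend on the operating system and its configuration.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Returns the size in bytes of the output buffer (option SO_SNDBUF) or
 * input buffer (option SO_RCVBUF) of a newly created TCP socket,
 * or -1 on error. */
int tcp_buffer_size(int option)
{
    int size = -1;
    socklen_t len = sizeof(size);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, option, &size, &len) == -1)
        size = -1;

    close(fd);
    return size;
}
```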

read() - Example

From our client example,

read(sock, buffer, sizeof(buffer)-1);

Here, the first parameter is the socket file descriptor id, sock, of the client process. The second parameter is the address of the start of the buffer array - remember an array name is the start address of the array. Finally, the maximum number of bytes to read is passed as the third parameter. One character in buffer is reserved for the termination character, meaning that the available size of buffer is sizeof(buffer) - 1. Character arrays used as strings must be null terminated, i.e. the final character must be '\0'; read() does not add this itself.
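A sketch of a read that checks the return value and terminates the string explicitly is given below. The helper name read_text is hypothetical. It works on any file descriptor, so it can be tried with a pipe as well as a socket.

```c
#include <sys/types.h>
#include <unistd.h>

/* Reads at most size - 1 bytes from fd into buffer and null terminates
 * the result. Returns the number of bytes read, 0 at end of file,
 * or -1 on error. */
ssize_t read_text(int fd, char *buffer, size_t size)
{
    ssize_t n;

    if (size == 0)
        return -1; /* no room even for the terminator */

    n = read(fd, buffer, size - 1); /* keep one byte for '\0' */
    if (n < 0)
        return -1;

    buffer[n] = '\0'; /* read() itself does not terminate the data */
    return n;
}
```

Because a stream has no message boundaries, a single call may return only part of what the other side sent; the caller loops until it has a complete message.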

write()

Similar to read(). As a note:

  • If the free space in the write() buffer is less than the data to be sent, then write() will be blocked until the data in the buffer is sent to the target machine.
  • If TCP is sending data to the network, the output buffer will be locked, and writing is not allowed.
  • If the data to be written is greater than the maximum length of the buffer, then it will be written in batches.
  • The write() will not return until all data is written into the buffer.
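In some situations, for example on a non-blocking socket or when a signal interrupts the call, write() can return after writing only part of the data. A common defensive pattern, sketched here under the hypothetical name write_all, is to loop until everything has been handed to the output buffer.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes all len bytes of data to fd, retrying after partial writes
 * and after interruption by a signal.
 * Returns len on success, or -1 on error. */
ssize_t write_all(int fd, const void *data, size_t len)
{
    const char *p = data;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing: try again */
            return -1;
        }
        p += n;            /* skip the bytes already written */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```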

htonl() and htons()

TCP/IP uses network byte ordering. A 16-bit integer (short) or a 32-bit integer (long) is sent from a host (a server is a kind of host) using host to network (HTON) with htons() or htonl(). Similarly, integers are received from the network to the host using network to host (NTOH) with ntohs() or ntohl().

Byte ordering:

  • Network order is big-endian.
  • Host order can be big- or little-endian, e.g. x86 is little-endian while SPARC is big-endian.

Conversion:

  • htons(), htonl(): host to network short/long. Short is 16 bit, long is 32 bit.
  • ntohs(), ntohl(): network to host short/long. Short is 16 bit, long is 32 bit.

What is converted?

  • Addresses.
  • Ports.
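The effect of the conversion can be seen by looking at the bytes in memory. In this sketch the hypothetical function port_to_wire converts a port number with htons() and copies out its two bytes in the order they would be sent.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Converts port to network byte order and copies its two bytes,
 * in memory order, into out. */
void port_to_wire(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port); /* host order -> network order */
    memcpy(out, &net, sizeof(net));
}
```

For port 0x1234 the bytes are 0x12 then 0x34 on any host, because network order is big-endian. On a big-endian host htons() changes nothing; on a little-endian host it swaps the two bytes.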

Byte Ordering

Denary  Binary  Hexadecimal
0       0000    0
1       0001    1
2       0010    2
3       0011    3
4       0100    4
5       0101    5
6       0110    6
7       0111    7
8       1000    8
9       1001    9
10      1010    A
11      1011    B
12      1100    C
13      1101    D
14      1110    E
15      1111    F

Big- and Little-Endian

By R. S. Shaw - Own work, Public Domain (Wikipedia).

A computer uses the same endianness to store and to find an integer value, so the value it reads back is the same as it would be on another machine using the other endianness.

However, problems can happen when memory is addressed using bytes instead of integers, or when memory contents are transmitted between computers with different endianness.
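Addressing memory by bytes makes the host's byte order visible. The sketch below, using the hypothetical names host_is_little_endian and swap32, inspects the first byte of a known 32-bit value and shows the byte reversal that converts between the two orders.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte of an integer is stored at
 * the lowest address (little-endian), otherwise 0. */
int host_is_little_endian(void)
{
    uint32_t value = 0x0A0B0C0D;
    unsigned char bytes[4];

    memcpy(bytes, &value, sizeof(value)); /* look at the raw bytes */
    return bytes[0] == 0x0D;
}

/* Reverses the byte order of a 32-bit integer. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}
```

For example, swap32(0x0A0B0C0D) gives 0x0D0C0B0A. In socket code the portable choice is htonl() and ntohl(), which perform this swap only when the host order differs from network order.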