Dr. Niall McMahon

C, Part 3 - Further Notes

CA644, System Software

Dr. Niall McMahon

2022-11-01

If you print these slides, think about using two pages per sheet although don't worry too much about it!

Credits

Dr. Niall McMahon
Drawing on previous work by:
Dr. Michael Scriney
Dr. Long Cheng
With help from Mark Humphrys. And sources credited in the references.
Autumn 2022.

More About Compiling

In these notes, I draw on material in Structured Computer Organization by Andrew S. Tanenbaum as well as C++ How to Program by Deitel and Deitel, among other sources.

Compiling C

Machine Language

  • Defined by the hardware design of a machine.
  • Strings of numbers, 1s/0s.
  • Machine dependent.
  • Cumbersome.

Assembly Language

  • English-like abbreviations.
  • Translator programs called assemblers convert assembly programs into machine language.
  • Assembly is very close to the instruction set architecture.

Interpreters

  • Directly execute high-level language programs instruction by instruction - on the fly - without conversion to machine code.
  • Useful during development.
  • Compiled versions run more efficiently.

Back to Machine Levels

L5 - High-Level Languages

  • Provides enough abstraction to concentrate on solving a problem without worrying about the detail of implementing it on a computer.

L4 - Assembly

  • Can be written by anybody on top of the ISA instruction set.
  • English-like syntax.
  • Very close to the ISA instruction set, often with a 1:1 mapping.
  • There are, however, abstract instructions that map to several ISA instructions.

L3 - OS Level

  • Programs written at L4 are assembled into L3 instructions.
  • Some L3 instructions are identical to L2 instructions and are carried out directly at L2.
  • Others are interpreted by the OS into sequences of L2 instructions. These are system calls.

L2 - ISA Level

  • Defined by the manufacturer.
  • The ISA instructions are published and used for all interaction with the processor.
  • They contain all necessary information to write a program.
  • Especially important are the basic operations implemented in the processor.
  • Manufacturers provide a mnemonic (or English-like) version of the instructions alongside their hexadecimal encoding, to make programs written for the processor easier to understand.
  • Lines and lines of binary are hard for people to understand!
  • The mnemonic version of the ISA instructions forms the basis of any assembly language.
  • The assembler takes the assembly/mnemonics and creates the equivalent machine code.

L1 - Microarchitecture

  • Gates arranged into memory units and the arithmetic logic unit (ALU).

L0 - Components and Basic Structures

  • Individual components put together to make gates and other structures.

Compilation Process

  • The program is made in a high-level language like C.
  • It's compiled into assembly.
  • From there, the assembler translates it into machine code (an object file), and the linker combines this with any library code to produce an executable.
  • When the program is run, the operating system loads it into the right place in memory by creating a process and running it.
  • Some of the instructions are carried out directly by the processor and others are interpreted by the OS.

C Code

  • Preprocess: gcc -E hello_world.c > hello_world.i. The -E flag stops gcc after the preprocessor has run.
    Preprocessing replaces #include directives with the contents of the named headers and expands macros.
  • Compile: gcc -S hello_world.i. The -S flag stops gcc after compilation and writes assembly code to a .s file, here hello_world.s.
  • Assemble: gcc -c hello_world.s. The -c flag stops gcc after assembly and writes an object file, here hello_world.o.
    This is machine code.
  • Link: gcc -o hello_world hello_world.o. With none of -E, -S or -c, gcc links the object file into an executable, finding the missing functions, for example, printf() and scanf(), in the standard C library, libc. The -o flag names the output file.
  • You can create the executable in one go using gcc -o hello_world hello_world.c.
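To see what the preprocessor does, a small file like the one below can be run through gcc -E: the #include line is replaced by the whole of stdio.h, and each use of the hypothetical SQUARE macro is expanded in place, so square_of_sum ends up returning ((a + b) * (a + b)). The macro and function names are illustrative assumptions, and the functions are meant to be called from your own main().

```c
#include <stdio.h>   /* gcc -E pastes the contents of this header in here */

/* Hypothetical macro for illustration: the preprocessor substitutes the
   argument text literally, so x may be evaluated more than once. */
#define SQUARE(x) ((x) * (x))

/* After gcc -E, the body below reads: return ((a + b) * (a + b)); */
int square_of_sum(int a, int b)
{
    return SQUARE(a + b);
}

/* Never pass an argument with side effects: SQUARE(n++) expands to
   ((n++) * (n++)), which modifies n twice without sequencing, which is
   undefined behaviour. The same caution applies to the FD_* macros. */
void print_square_of_sum(int a, int b)
{
    printf("(%d + %d)^2 = %d\n", a, b, square_of_sum(a, b));
}
```

For example, square_of_sum(1, 2) returns 9. The parentheses in the macro definition matter: without them, SQUARE(a + b) would expand to a + b * a + b.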

The select Function

select is used when waiting for input or output.

These notes are drawn heavily from Section 13.9, Waiting for Input or Output, in The GNU C Library manual. Other useful references include IBM's select() reference.

Multiple Inputs

  • Programs might need to accept input on multiple input channels whenever input arrives.
  • Servers might need to respond to several other processes via pipes or sockets.
  • Responding immediately is best practice.

Read Is Not Suitable

  • You cannot normally use read for this purpose.
  • read blocks the program until input is available on one particular file descriptor; input on other channels won't interrupt it.
  • Using nonblocking mode and polling each file descriptor in turn is possible but very inefficient.
  • A better solution is to use select.
  • This blocks the program until input or output is ready on a specified set of file descriptors, or until a timer expires, whichever comes first.
  • This facility is declared in the header file sys/types.h (POSIX specifies sys/select.h).
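As a minimal sketch of the idea, assuming two already-open descriptors such as pipe read ends or connected sockets, the helper below blocks in select until either has input or a timeout passes. The names wait_for_either and WAIT_TIMEOUT are hypothetical; the header used is the POSIX sys/select.h.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>  /* select, fd_set, FD_* macros, struct timeval */
#include <unistd.h>      /* pipe(), read(), write() for trying it out */

#define WAIT_TIMEOUT (-2)  /* hypothetical code: 0 is a valid descriptor */

/* Block until fd_a or fd_b has input, or until timeout_sec seconds pass.
   Returns the readable descriptor (fd_a if both are), WAIT_TIMEOUT on
   timeout, or -1 on error with errno set by select. */
int wait_for_either(int fd_a, int fd_b, long timeout_sec)
{
    fd_set readable;
    struct timeval tv;

    FD_ZERO(&readable);
    FD_SET(fd_a, &readable);
    FD_SET(fd_b, &readable);

    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* nfds is one more than the highest descriptor being watched. */
    int nfds = (fd_a > fd_b ? fd_a : fd_b) + 1;
    int ready = select(nfds, &readable, NULL, NULL, &tv);
    if (ready == 0)
        return WAIT_TIMEOUT;
    if (ready < 0)
        return -1;

    return FD_ISSET(fd_a, &readable) ? fd_a : fd_b;
}
```

For instance, after writing one byte into the second of two pipes, wait_for_either on the two read ends returns the second read end. A separate timeout code is used because 0 is itself a valid descriptor (standard input).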

Sockets

  • For server sockets, "input" is available when there are pending connections that could be accepted.
  • accept blocks and interacts with select in the same way that read does for normal input.
  • The file descriptor sets for the select function are specified as fd_set objects.
Here is the description of the data type and some macros for manipulating these objects.

fd_set Data Type

  • The fd_set data type is a bit array that represents file descriptor sets for the select function. (A bit array is a simple array of bits, i.e. 1 or 0, that represent some information about another set of objects arrayed in the same order.)

Macros

int FD_SETSIZE

  • The value of this macro is the maximum number of file descriptors that a fd_set object can hold information about.
  • On systems with a fixed maximum number, FD_SETSIZE is at least that number.
  • On some systems, including GNU, there is no absolute limit on the number of descriptors open, but this macro still has a constant value which controls the number of bits in an fd_set.
  • If you get a file descriptor with a value as high as FD_SETSIZE, you cannot put that descriptor into an fd_set.
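Because FD_SET does no range checking, a defensive wrapper can refuse descriptors that do not fit. This is a sketch; fd_set_add_checked is a hypothetical name, and falling back to poll() for large descriptors is one option rather than something prescribed by the GNU manual.

```c
#include <stdbool.h>
#include <sys/select.h>  /* fd_set, FD_SET, FD_SETSIZE */

/* Add fd to set only if it fits. FD_SET on a negative descriptor or one
   >= FD_SETSIZE writes outside the bit array: undefined behaviour. */
bool fd_set_add_checked(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return false;   /* caller needs another mechanism, e.g. poll() */
    FD_SET(fd, set);
    return true;
}
```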

void FD_ZERO (fd_set *set)

  • This macro initialises the file descriptor set set to be the empty set.

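void FD_SET (int filedes, fd_set *set)

  • This macro adds filedes to the file descriptor set set.
  • The filedes parameter must not have side effects since it is evaluated more than once.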

void FD_CLR (int filedes, fd_set *set)

  • This macro removes filedes from the file descriptor set set.
  • The filedes parameter must not have side effects since it is evaluated more than once.

int FD_ISSET (int filedes, const fd_set *set)

  • This macro returns a nonzero value (true) if filedes is a member of the file descriptor set set, and zero (false) otherwise.
  • The filedes parameter must not have side effects since it is evaluated more than once.
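The four macros are typically used together as below: clear the set, add descriptors, and test membership. fill_fd_set and count_members are hypothetical helpers, and the sketch assumes every descriptor in the array is between 0 and FD_SETSIZE - 1.

```c
#include <stddef.h>
#include <sys/select.h>

/* Build a set from an array of descriptors. Returns the highest one, or -1
   if the array is empty; select's nfds argument would be that value + 1. */
int fill_fd_set(fd_set *set, const int *fds, size_t count)
{
    int max_fd = -1;

    FD_ZERO(set);                 /* always start from the empty set */
    for (size_t i = 0; i < count; i++) {
        /* Pass a plain expression: FD_SET(fds[i++], set) would be wrong,
           because the macro may evaluate its argument more than once. */
        FD_SET(fds[i], set);
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }
    return max_fd;
}

/* Count how many of the descriptors 0 .. max_fd are members of set. */
int count_members(const fd_set *set, int max_fd)
{
    int n = 0;
    for (int fd = 0; fd <= max_fd; fd++)
        if (FD_ISSET(fd, set))
            n++;
    return n;
}
```

For example, filling a set from {0, 3, 5} returns 5 (so nfds would be 6), and after FD_CLR(3, &set) only two members remain.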

select()

int select (int nfds, fd_set *read-fds, fd_set *write-fds, fd_set *except-fds, struct timeval *timeout)

  • The select function blocks the calling process until there is activity on any of the specified sets of file descriptors, or until the timeout period has expired.
  • The file descriptors specified by the read-fds argument are checked to see if they are ready for reading; the write-fds file descriptors are checked to see if they are ready for writing; and the except-fds file descriptors are checked for exceptional conditions. You can pass a null pointer for any of these arguments if you are not interested in checking for that kind of condition.
  • A file descriptor is considered ready for reading if a read call will not block. This usually includes the cases where the read offset is at the end of the file or there is an error to report. A server socket is considered ready for reading if there is a pending connection which can be accepted with accept (see Accepting Connections in The GNU C Library manual). A client socket is ready for writing when its connection is fully established.
  • "Exceptional conditions" does not mean errors - errors are reported immediately when an erroneous system call is executed, and do not constitute a state of the descriptor. Rather, they include conditions such as the presence of an urgent message on a socket.
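  • The select function checks only the first nfds file descriptors, that is, descriptors 0 to nfds - 1. The usual thing is to pass FD_SETSIZE as the value of this argument; passing one more than the highest-numbered descriptor in any of the sets is also correct and saves the kernel from scanning descriptors you are not using.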
  • The timeout specifies the maximum time to wait. If you pass a null pointer for this argument, it means to block indefinitely until one of the file descriptors is ready. Otherwise, you should provide the time in struct timeval format. Specify zero as the time (a struct timeval containing all zeros) if you want to find out which descriptors are ready without waiting if none are ready.
  • The normal return value from select is the total number of ready file descriptors in all of the sets. Each of the argument sets is overwritten with information about the descriptors that are ready for the corresponding operation. Thus, to see if a particular descriptor desc has input, use FD_ISSET (desc, read-fds) after select returns. Because the sets are overwritten, rebuild them before every call to select.

If select returns because the timeout period expires, it returns a value of zero.
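Putting the readiness rules together, the sketch below checks whether a listening TCP socket is ready for reading before and after a client connects. It assumes an IPv4 loopback interface is available; readable_within, listen_on_loopback and connect_to_loopback are hypothetical helpers. Passing 0 milliseconds gives the zero timeout described above, so select reports the current state without waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>    /* htonl */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK, in_port_t */
#include <stdbool.h>
#include <sys/select.h>
#include <sys/socket.h>   /* socket, bind, listen, accept, connect */
#include <unistd.h>       /* close */

/* True if fd becomes readable within ms milliseconds (0 = just poll). */
bool readable_within(int fd, long ms)
{
    fd_set rfds;
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    FD_ZERO(&rfds);               /* rebuilt on every call: select overwrites it */
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return false;             /* 0 = timed out, -1 = error */
    return FD_ISSET(fd, &rfds);
}

/* Listen on 127.0.0.1 at a kernel-chosen port. Stores the port (network
   byte order) in *port_out; returns the listening fd, or -1 on error. */
int listen_on_loopback(in_port_t *port_out)
{
    struct sockaddr_in addr = {0};
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;            /* 0 = let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = addr.sin_port;
    return fd;
}

/* Connect a new client socket to 127.0.0.1:port; returns the fd or -1. */
int connect_to_loopback(in_port_t port)
{
    struct sockaddr_in addr = {0};
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = port;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Used in sequence, readable_within(listener, 0) is false straight after listen, should become true once connect_to_loopback has returned (a short timeout such as one second allows for the handshake), and a following accept on the listener then returns without blocking.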

  • A signal caught by a signal handler will cause select to return immediately, with a value of -1 and errno set to EINTR. So if your program uses signals, you can't rely on select to keep waiting for the full time specified. If you want to be sure of waiting for a particular amount of time, you must check for EINTR and repeat the select with a newly calculated timeout based on the current time.
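To wait for a full period despite signals, the remaining time can be recomputed after each EINTR. This sketch assumes a single descriptor and uses CLOCK_MONOTONIC so that changes to the system clock do not distort the deadline; wait_readable_ms and now_ms are hypothetical names. Linux also updates the timeval with the time left, but POSIX only says select may modify it, so the sketch does not rely on that.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <time.h>         /* clock_gettime, CLOCK_MONOTONIC */
#include <unistd.h>       /* pipe(), write() for trying it out */

/* Current monotonic time in milliseconds (unaffected by clock changes). */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Wait up to timeout_ms for fd to become readable, restarting select
   after EINTR with whatever time is left. Returns 1 if readable,
   0 on timeout, -1 on any other error (errno set by select). */
int wait_readable_ms(int fd, long timeout_ms)
{
    long long deadline = now_ms() + timeout_ms;

    for (;;) {
        long long left = deadline - now_ms();
        if (left < 0)
            left = 0;             /* deadline passed: poll once more */

        fd_set rfds;              /* rebuilt each time: select overwrites it */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        struct timeval tv;
        tv.tv_sec = left / 1000;
        tv.tv_usec = (left % 1000) * 1000;

        int ready = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (ready >= 0)
            return ready;         /* 1 = readable, 0 = timed out */
        if (errno != EINTR)
            return -1;            /* a real error such as EBADF or EINVAL */
        /* Interrupted by a signal handler: loop and recompute the timeout. */
    }
}
```

On an empty pipe, wait_readable_ms(read_end, 50) returns 0 after about 50 ms; once a byte has been written to the pipe, it returns 1.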

Errors

If an error occurs, select returns -1 and does not modify the argument file descriptor sets. The following errno error conditions are defined for this function:

  • EBADF: one of the file descriptor sets specified an invalid file descriptor.
  • EINTR: the operation was interrupted by a signal.
  • EINVAL: the timeout argument is invalid; one of the components is negative or too large.
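A caller can distinguish these error conditions by inspecting errno straight after select returns -1. The sketch below polls one descriptor with a zero timeout and reports the cause; poll_once_reporting is a hypothetical name and the messages are illustrative.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>       /* strerror */
#include <sys/select.h>
#include <unistd.h>       /* pipe(), close() for trying it out */

/* Poll fd once for readability. Returns select's result (>0 ready,
   0 not ready, -1 error) and prints the cause of any error to stderr. */
int poll_once_reporting(int fd)
{
    fd_set rfds;
    struct timeval zero = {.tv_sec = 0, .tv_usec = 0};

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    int ready = select(fd + 1, &rfds, NULL, NULL, &zero);
    if (ready == -1) {
        int err = errno;          /* save it: fprintf may change errno */
        switch (err) {
        case EBADF:  fprintf(stderr, "select: descriptor %d is not open\n", fd); break;
        case EINTR:  fprintf(stderr, "select: interrupted by a signal\n"); break;
        case EINVAL: fprintf(stderr, "select: invalid timeout or nfds\n"); break;
        default:     fprintf(stderr, "select: %s\n", strerror(err)); break;
        }
        errno = err;              /* leave errno as select set it */
    }
    return ready;
}
```

Calling it on a descriptor that has just been closed should return -1 with errno set to EBADF.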

Portability Note: The select function is a BSD Unix feature; it is now also specified by POSIX.