C, Additional Exercises

CA644, System Software

Dr. Niall McMahon

2022-11-22

If you print these slides, think about using two pages per sheet although don't worry too much about it!

Credits

Dr. Niall McMahon
Drawing on previous work by:
Dr. Michael Scriney
Dr. Long Cheng
With help from Mark Humphrys. And sources credited in the references.
Autumn 2022.

There are various ways of writing C programs and when faced with a task, you should choose the simplest possible approach consistent with sensible efficiency. Make the program work in the most straightforward way and relax. Later you may discover that the program needs to be more efficient and then you can consider ways of doing that.

Working With Chars

getchar Returns an int

The following is a typical C program which copies its input to its output:

#include <stdio.h>

int main()
{
   int c;
   c = getchar();
   while(c != EOF)
   {
      putchar(c);
      c = getchar();
   }

   return 0;
}

c holds the character just read: the program reads characters from the input and writes each one back out. The program loops as long as there is more input, i.e. until getchar() returns the special value EOF, which signals end of file rather than being a character itself. You might wonder why c is declared to be an int. It is because the getchar() function returns an int by definition. This, in turn, may cause you to wonder why getchar() returns an int.

Exercise: why does getchar() return an int?

Concise Code

In fact, C programmers are notoriously concise and frequently do the following:

#include <stdio.h>

int main()
{
   int c;
   while((c = getchar()) != EOF)
   {
      putchar(c);
   }

   return 0;
}

In this case, c is assigned the return value of getchar() before being tested against EOF. The inner parentheses matter: != binds more tightly than =, so without them c would be assigned the result of the comparison. This makes the code shorter. Once you are used to the idiom, it is just as easy to read. You should be familiar with this convention because it is used a lot in system software.

Braces are not required when a single instruction is controlled by the while loop. Many instructors say it is bad practice to leave them out, but again, system programmers often leave out the braces to give:

#include <stdio.h>

int main()
{
   int c;
   while((c = getchar()) != EOF)
      putchar(c);

   return 0;
}

getchar Reads a Byte

Take the following C program and call it 1.c:

#include <stdio.h>

int main()
{
   int c;
   while((c = getchar()) != EOF)
      putchar(c);

   return 0;
}

Compile it using gcc:

$ gcc 1.c -o 1

And you can run it using:

$ ./1

The program runs and the terminal sits waiting for you to type. As you type, it echoes back what you type. You can also pipe the output into xxd, a hex dump tool, to see what is going on:

$ ./1 | xxd

Although the getchar function looks as if it should read a character, it actually reads a byte. Try entering a special character. Alternatively, get the program to print out the int value of each character.

As an exercise, how would you do that?

So, let's take the following as our starting program:

#include <stdio.h>

int main()
{
   int c;
   while((c = getchar()) != EOF)
      putchar(c);

   return 0;
}

Adapt it so that it adds one to the character that it outputs:

#include <stdio.h>

int main()
{
   int c;
   while((c = getchar()) != EOF)
      putchar(c + 1);

   return 0;
}

Now, let's modify the code a bit. Let's read two characters at a time and make the while loop go forever:

#include <stdio.h>
#include <stdbool.h>

int main()
{
   while(true)
   {
      int c1 = getchar();
      if(c1 == EOF)
         break;
      int c2 = getchar();
      if(c2 == EOF)
         break;
      putchar(c1);
      putchar(c2);
   }

   return 0;
}

This is a pretty terrible program. It does almost the same thing as the original program but uses more variables and function calls.

Under certain situations it produces a different output to the original program. What circumstances are those?

Big-endian to Little-endian

A slight modification to the above program can make it turn 16-bit big-endian Unicode into 16-bit little-endian Unicode.

Once you have written the 16-bit big-endian to little-endian converter, compile it as 2 and check that it works:

$ ./2

Type some characters, e.g. the digits one to nine. You should notice that each pair of characters is swapped.

Try some Unicode. First, see what UTF-8 Unicode looks like:

$ printf "áıôÇ" | xxd

Now, run the program on it:

$ printf "áıôÇ" | ./2

The output may not make much sense.

Examine the hex values:

$ printf "áıôÇ" | ./2 | xxd

This should be the original hex dump with each pair of bytes swapped.

Finally, what happens if you run the program twice in succession on the input?

Pointers

Using Pointers

A pointer variable points to memory. When you create a pointer variable, you have to specify the type:

#include <stdio.h>

int main()
{
   int * int_ptr = 0; // this is not sensible.

   // Set that location to zero
   *int_ptr = 0;

   return 0;
}

You can compile and run this program and on my computer it generates a serious error, i.e. a segmentation fault or a core dump.

Although a C program allows you to write to various memory locations, the operating system will stop you. The sensible way is to make the pointer point to an existing object:

#include <stdio.h>

int main()
{
   int num = 5;
   int *int_ptr = &num; // this is not naughty. Use the addressof operator (&) to get the address of a variable

   printf("num has the value %d\n", *int_ptr);

   return 0;
}

This time, the pointer points to a variable that actually exists in memory. Now you can access the num variable through the pointer using *, i.e. the dereference operator. This time you are also accessing memory, but it is memory that the operating system has given to your program, so you are allowed to do that: num is a local variable, and local variables are stored in the stack area.

You can also modify the variable pointed to using the pointer:

#include <stdio.h>

int main()
{
   int num = 5;
   int * int_ptr = &num; // this is sensible.

   *int_ptr = 7;
   printf("num has the value %d\n", num);

   return 0;
}

Now there are two ways to access the variable: you can use *int_ptr or num. This is called aliasing.

Adding and Subtracting From Pointer Variables

A pointer is a variable and often you just make it point to an existing variable. However, you can also add integer values to, or subtract them from, a pointer variable. This might seem bizarre and, in general, it is. It only makes sense where variables are grouped together in memory, such as in an array or a string.

Here is an example of a program that demonstrates pointer arithmetic:

#include <stdio.h>

int main()
{
   int * int_ptr = 0; // This is OK if we don't try and write to the memory contents
   char * char_ptr = 0; // a pointer to a character 

   printf("int pointer + 1 is %d, char ptr + 1 is %d\n", (int) (int_ptr+1), (int) (char_ptr+1));

   return 0;
}

Here we add one to int_ptr (int_ptr+1) and then convert it to an integer so that we can print its value using a cast, i.e. (int)(int_ptr+1). We do the same with the char_ptr. This will generate scary looking compiler warning messages, but should not generate any error messages. Hackers ignore warnings and so that's what we will do.

Run the program and notice the difference between the int_ptr and the char_ptr. This is very important. How long would you say that an int is on this computer?

Change the int_ptr so that it actually points to a variable of size long. What does it now print?

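You can remove the integer casts, and with them the warnings, by using %p in the printf statement to print a pointer. Strictly speaking, %p expects a void *, so each argument is cast to that type instead. With int_ptr changed to point to a long, the code would be:

#include <stdio.h>

int main()
{
   long * int_ptr = 0;
   char * char_ptr = 0;

   printf("int pointer + 1 is %p, char ptr + 1 is %p\n", (void *) (int_ptr+1), (void *) (char_ptr+1));

   return 0;
}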

You can create and use an array of ints as in the following code:

#include <stdio.h>

int main()
{
   int a[] = {10, 20, 30, 40};

   for(int i = 0; i < 4; i++)
      printf("%d\n", a[i]);

   return 0;
}

You can make the same code work with a pointer as follows:

#include <stdio.h>

int main()
{
   int a[] = {10, 20, 30, 40};
   int * ptr = &a[0];

   for(ptr = &a[0]; ptr < &a[4]; ptr++)
      printf("%d\n", *ptr);

   return 0;
}

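Note that the pointer is set to point to the first element of the array and then keeps being incremented until it reaches &a[4], i.e. one past the end of the array, at which point the loop stops. Taking the address one past the last element is allowed, as long as you never dereference it.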

The contents of the for loop could be changed to:

#include <stdio.h>

int main()
{
   int a[] = {10, 20, 30, 40};
   int * ptr = &a[0];

   for(int i = 0; i < 4; i++)
      printf("%d\n", *ptr++);

   return 0;
}

The expression *ptr++ retrieves the value pointed to by ptr and then increments the pointer itself, not the value.

You could also rewrite the code to work backward through the array with:

#include <stdio.h>

int main()
{
   int a[] = {10, 20, 30, 40};
   int * ptr;

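   // Start one past the end and step back before each read, so that
   // ptr never moves to before the start of the array.
   for(ptr = &a[4]; ptr > &a[0]; )
      printf("%d\n", *--ptr);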

   return 0;
}

Be sure to understand how these examples work as the concepts are used frequently in system software.

Pointers Are Like Arrays

Finally, you should note that pointers can be treated in a very similar manner to arrays. In particular, int *ptr = &a[0]; is equivalent to int *ptr = a;. In other words, when an array name is used in an expression like this, it is converted to the address of the first element of the array. Secondly, assuming ptr = a and you want to access the second element of the array, you could use *(ptr+1), that is, add 1 to the pointer and get the contents of that address, but you could also treat ptr as an array and simply use ptr[1].

In C, a string is an array of characters terminated by a zero. You could print out every character in a string using:

#include <stdio.h>

int main()
{
   char str[] = "hello";

   int i = 0;
   while(str[i] != 0)
   {
      printf("%c\n", str[i]);
      i++;
   }

   return 0;
}

You can do this more concisely with a pointer variable:

#include <stdio.h>

int main()
{
   char *str = "hello";

   while(*str != 0)
   {
      printf("%c\n", *str++);
   }

   return 0;
}

Pointer variables are much used when dealing with strings. For example, you could calculate the length of a string using:

#include <stdio.h>

int length(char * str)
{
   int i = 0;
   while(str[i] != 0)
      i++;

   return i;
}

int main()
{
   char *str = "hello";

   printf("%d\n", length(str));

   return 0;
}

In practice the i variable is not necessary if you use pointer arithmetic. The length would then be calculated as follows:

#include <stdio.h>

int length(char * str)
{
   char * ptr = str;
   while(*ptr++ != 0)
      ;

   return ptr - str - 1; // ptr ends one past the terminating zero
}

int main()
{
   char *str = "hello";

   printf("%d\n", length(str));

   return 0;
}

Study the code and figure out how it works.