Writing Reentrant and Thread-Safe Code

AIX Version 4.3 General Programming Concepts: Writing and Debugging Programs

In single-threaded processes there is only one flow of control. The code executed by these processes therefore need not be reentrant or thread-safe. In multi-threaded programs, the same functions and the same resources may be accessed concurrently by several flows of control. To protect resource integrity, code written for multi-threaded programs must be reentrant and thread-safe.

This section provides information for writing reentrant and thread-safe programs. It does not cover the topic of writing thread-efficient programs. Thread-efficient programs are programs that are efficiently parallelized, and that efficiency must be planned when the program is designed. Existing single-threaded programs can be made thread-efficient, but only by completely redesigning and rewriting them.

Read the following to learn more about writing reentrant and thread-safe code:

Understanding Reentrance and Thread-Safety

Making a Function Reentrant

Making a Function Thread-Safe

Reentrant and Thread-Safe Libraries

Understanding Reentrance and Thread-Safety

Reentrance and thread-safety are both related to the way functions handle resources. They are separate concepts: a function can be reentrant, thread-safe, both, or neither.

Reentrance

A reentrant function does not hold static data over successive calls, nor does it return a pointer to static data. All data is provided by the caller of the function. A reentrant function must not call non-reentrant functions.

A non-reentrant function can often, but not always, be identified by its external interface and its usage. For example, the strtok subroutine is not reentrant, because it holds the string to be broken into tokens. The ctime subroutine is also not reentrant; it returns a pointer to static data that is overwritten by each call.
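
The second pattern can be illustrated with a small sketch. The item_label function below is hypothetical (it is not an AIX or C library routine); it is assumed to return a pointer to a static buffer in the same way the ctime subroutine does, so a second call silently overwrites the result of the first:

```c
#include <stdio.h>

/* hypothetical non-reentrant function: the result lives in static storage */
char *item_label(int id)
{
        static char buffer[32];

        snprintf(buffer, sizeof buffer, "item-%d", id);
        return buffer;
}

/* returns 1 if two successive results share the same storage */
int labels_share_storage(void)
{
        char *first = item_label(1);
        char *second = item_label(2);   /* overwrites the text behind first */

        return first == second;
}
```

A caller that keeps the first pointer and then calls the function again finds the text of the second call behind both pointers.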

Thread-Safety

A thread-safe function protects shared resources from concurrent access by locks. Thread-safety concerns only the implementation of a function and does not affect its external interface.

In C, local variables are dynamically allocated on the stack. Therefore, any function that does not use static data or other shared resources is trivially thread-safe. For example, the following function is thread-safe:

/* thread-safe function */
int diff(int x, int y)
{
        int delta;
 
        delta = y - x;
        if (delta < 0)
                delta = -delta;
 
        return delta;
}

The use of global data is thread-unsafe. Global data should be maintained per thread or encapsulated, so that access to it can be serialized. For example, if an error code were kept in a single global variable, a thread could read an error code corresponding to an error caused by another thread. For this reason, in AIX each thread has its own errno value.
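
A minimal sketch of data maintained per thread follows. It assumes a C11 compiler (the _Thread_local storage class) and the POSIX threads interface; the last_error variable and both function names are hypothetical. On systems without _Thread_local, thread-specific data (pthread_key_create and pthread_setspecific) can serve the same purpose.

```c
#include <pthread.h>
#include <stddef.h>

/* hypothetical per-thread error code; each thread gets its own copy */
static _Thread_local int last_error = 0;

static void *failing_worker(void *arg)
{
        (void)arg;
        last_error = 42;        /* visible only in this worker thread */
        return NULL;
}

/*
 * Runs a worker that sets its own last_error, then returns the value
 * seen by the calling thread (0 if untouched, -1 if a thread call fails).
 */
int error_seen_by_caller(void)
{
        pthread_t thread;

        last_error = 0;
        if (pthread_create(&thread, NULL, failing_worker, NULL) != 0)
                return -1;
        if (pthread_join(thread, NULL) != 0)
                return -1;
        return last_error;
}
```

Because each thread has its own copy of the variable, the value written by the worker never reaches the caller.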

Making a Function Reentrant

In most cases, non-reentrant functions must be replaced by functions with a modified interface to be reentrant. Non-reentrant functions cannot be used by multiple threads. Furthermore, it may be impossible to make a non-reentrant function thread-safe.

Returning Data

Many non-reentrant functions return a pointer to static data. This can be avoided in two ways:

Returning dynamically allocated data. In this case, it will be the caller's responsibility to free the storage. The benefit is that the interface does not need to be modified. However, backward compatibility is not ensured; existing single-threaded programs using the modified functions without changes would not free the storage, leading to memory leaks.
Using caller-provided storage. This method is recommended, although the interface needs to be modified.

For example, a strtoupper function, converting a string to uppercase, could be implemented as in the following code fragment:

/* non-reentrant function */
char *strtoupper(char *string)
{
        static char buffer[MAX_STRING_SIZE];
        int index;

        for (index = 0; string[index]; index++)
                buffer[index] = toupper((unsigned char)string[index]);
        buffer[index] = 0;

        return buffer;
}

This function is not reentrant (nor thread-safe). Using the first method to make the function reentrant, the function would be similar to the following code fragment:

/* reentrant function (a poor solution) */
char *strtoupper(char *string)
{
        char *buffer;
        int index;

        /* the caller must free the returned storage */
        buffer = malloc(MAX_STRING_SIZE);
        if (buffer == NULL)
                return NULL;

        for (index = 0; string[index]; index++)
                buffer[index] = toupper((unsigned char)string[index]);
        buffer[index] = 0;

        return buffer;
}

A better solution consists of modifying the interface. The caller must provide the storage for both input and output strings, as in the following code fragment:

/* reentrant function (a better solution) */
char *strtoupper_r(char *in_str, char *out_str)
{
        int index;

        for (index = 0; in_str[index]; index++)
                out_str[index] = toupper((unsigned char)in_str[index]);
        out_str[index] = 0;

        return out_str;
}

The non-reentrant standard C library subroutines were made reentrant using the second method. This is discussed in Reentrant and Thread-Safe Libraries.
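
Caller-provided storage is easier to use safely when the caller also passes the size of the storage. The following sketch shows one possible bounded variant; the strtoupper_rn name and its out_size parameter are hypothetical and are not part of any library:

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical bounded variant of strtoupper_r: converts in_str into
 * out_str, writing at most out_size bytes including the terminator.
 * Returns out_str, or NULL if the buffer is too small or missing.
 */
char *strtoupper_rn(const char *in_str, char *out_str, size_t out_size)
{
        size_t index;

        if (out_str == NULL || out_size == 0)
                return NULL;

        for (index = 0; in_str[index] != '\0'; index++) {
                if (index + 1 >= out_size) {
                        out_str[0] = '\0';      /* leave a valid empty string */
                        return NULL;
                }
                out_str[index] = (char)toupper((unsigned char)in_str[index]);
        }
        out_str[index] = '\0';

        return out_str;
}
```

The function remains reentrant because it keeps no data of its own; the size check only prevents it from writing past the storage the caller supplied.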

Keeping Data over Successive Calls

No data should be kept over successive calls, because different threads may successively call the function. If a function needs to maintain some data over successive calls, such as a working buffer or a pointer, this data should be provided by the caller.

Consider the following example. A function returns the successive lowercase characters of a string. The string is provided only on the first call, as with the strtok subroutine. The function returns 0 when it reaches the end of the string. The function could be implemented as in the following code fragment:

/* non-reentrant function */
char lowercase_c(char *string)
{
        static char *buffer;
        static int index;
        char c = 0;

        /* stores the string on first call */
        if (string != NULL) {
                buffer = string;
                index = 0;
        }

        /* searches for a lowercase character */
        for (; (c = buffer[index]) != 0; index++) {
                if (islower((unsigned char)c)) {
                        index++;
                        break;
                }
        }
        return c;
}

This function is not reentrant. To make it reentrant, the static data (the string pointer and the index variable) needs to be maintained by the caller. The reentrant version of the function could be implemented as in the following code fragment:

/* reentrant function */
char reentrant_lowercase_c(char *string, int *p_index)
{
        char c = 0;
 
        /* no initialization - the caller should have done it */
 
        /* searches for a lowercase character */
        for (; (c = string[*p_index]) != 0; (*p_index)++) {
                if (islower((unsigned char)c)) {
                        (*p_index)++;
                        break;
                }
        }
        return c;
}

The interface of the function changed and so did its usage. The caller must provide the string on each call and must initialize the index to 0 before the first call, as in the following code fragment:

char *my_string;
char my_char;
int my_index;
...
my_index = 0;
while (my_char = reentrant_lowercase_c(my_string, &my_index)) {
        ...
}
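
Because the state now lives with the caller, several scans can be in progress at once without disturbing each other. The following sketch repeats the reentrant function (with a const-qualified string parameter) so that it is self-contained, and adds a hypothetical count_lowercase_interleaved helper that walks two strings in alternation:

```c
#include <ctype.h>

/* reentrant function from the text, repeated so the sketch is self-contained */
char reentrant_lowercase_c(const char *string, int *p_index)
{
        char c = 0;

        for (; (c = string[*p_index]) != 0; (*p_index)++) {
                if (islower((unsigned char)c)) {
                        (*p_index)++;
                        break;
                }
        }
        return c;
}

/*
 * Walks two strings in alternation, each with its own index, and
 * returns how many lowercase characters were found in total.
 */
int count_lowercase_interleaved(const char *first, const char *second)
{
        int first_index = 0, second_index = 0;
        int count = 0;
        char a, b;

        do {
                a = reentrant_lowercase_c(first, &first_index);
                b = reentrant_lowercase_c(second, &second_index);
                count += (a != 0) + (b != 0);
        } while (a != 0 || b != 0);

        return count;
}
```

The same interleaving with the non-reentrant lowercase_c function would not work, because its single static index cannot track two strings.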

Making a Function Thread-Safe

In multi-threaded programs, all functions called by multiple threads must be thread-safe. However, there is a workaround for using thread-unsafe subroutines in multi-threaded programs. Note also that non-reentrant functions usually are thread-unsafe, but making them reentrant often makes them thread-safe, too.

Locking Shared Resources

Functions that use static data or any other shared resources, such as files or terminals, must serialize the access to these resources by locks in order to be thread-safe. For example, the following function is thread-unsafe:

/* thread-unsafe function */
int increment_counter()
{
        static int counter = 0;
 
        counter++;
        return counter;
}

To be thread-safe, the static variable counter needs to be protected by a static lock, as in the following (pseudo-code) example:

/* pseudo-code thread-safe function */
int increment_counter()
{
        static int counter = 0;
        static lock_type counter_lock = LOCK_INITIALIZER;
        int value;

        lock(counter_lock);
        value = ++counter;      /* read the result while holding the lock */
        unlock(counter_lock);
        return value;
}
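
With the POSIX threads library, the pseudo-code lock could be written as a statically initialized mutex. The following is a sketch under that assumption; the run_workers helper, the worker function, and MAX_WORKERS are hypothetical names added only to exercise the counter from several threads:

```c
#include <pthread.h>
#include <stddef.h>

/* thread-safe counter: the static variable is guarded by a static mutex */
int increment_counter(void)
{
        static int counter = 0;
        static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
        int value;

        pthread_mutex_lock(&counter_lock);
        value = ++counter;      /* copy while the lock is still held */
        pthread_mutex_unlock(&counter_lock);

        return value;
}

#define MAX_WORKERS 16

static void *worker(void *arg)
{
        int iterations = *(int *)arg;

        while (iterations-- > 0)
                increment_counter();
        return NULL;
}

/*
 * Hypothetical helper: runs up to MAX_WORKERS threads, each incrementing
 * the counter iterations times. Returns the number of threads started.
 */
int run_workers(int nthreads, int iterations)
{
        pthread_t threads[MAX_WORKERS];
        int started = 0, i;

        if (nthreads > MAX_WORKERS)
                nthreads = MAX_WORKERS;
        for (i = 0; i < nthreads; i++) {
                if (pthread_create(&threads[i], NULL, worker, &iterations) != 0)
                        break;
                started++;
        }
        for (i = 0; i < started; i++)
                pthread_join(threads[i], NULL);
        return started;
}
```

The value is copied into a local variable before the mutex is released, so the function returns the result of its own increment rather than whatever another thread has written in the meantime.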

In a multi-threaded application program using the threads library, mutexes should be used for serializing shared resources. Independent libraries may need to work outside the context of threads and, thus, use other kinds of locks.

A Workaround for Thread-Unsafe Functions

A workaround makes it possible for multiple threads to call thread-unsafe functions. This may be useful, especially when using a thread-unsafe library in a multi-threaded program, for testing or while waiting for a thread-safe version of the library to be available. The workaround leads to some overhead, because it consists of serializing the entire function or even a group of functions.

Use a global lock for the library, and lock it each time you use the library (calling a library routine or using a library global variable), as in the following pseudo-code fragments:
```
/* this is pseudo-code! */
 
lock(library_lock);
library_call();
unlock(library_lock);
 
lock(library_lock);
x = library_var;
unlock(library_lock);
```
This solution can create performance bottlenecks because only one thread can access any part of the library at any given time. The solution is acceptable only if the library is seldom accessed, or as an initial, quickly implemented workaround.

Use a lock for each library component (routine or global variable) or group of components, as in the following pseudo-code fragments:
```
/* this is pseudo-code! */
 
lock(library_moduleA_lock);
library_moduleA_call();
unlock(library_moduleA_lock);
 
lock(library_moduleB_lock);
x = library_moduleB_var;
unlock(library_moduleB_lock);
```
This solution is somewhat more complicated to implement than the first one, but it can improve performance.

Because this workaround should only be used in application programs and not in libraries, mutexes can be used for locking the library.
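
As a concrete sketch of the workaround, an application could wrap a non-reentrant routine such as the ctime subroutine behind a mutex and copy the result into caller-provided storage before releasing the lock. The locked_ctime name and the ctime_lock variable are hypothetical, and the POSIX threads interface is assumed:

```c
#include <pthread.h>
#include <string.h>
#include <time.h>

/* one application-level lock for every use of the thread-unsafe routine */
static pthread_mutex_t ctime_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical wrapper: calls ctime under the lock and copies the result
 * into caller-provided storage before releasing the lock.
 * Returns buffer, or NULL if ctime fails or the buffer is too small.
 */
char *locked_ctime(const time_t *when, char *buffer, size_t size)
{
        char *text;
        char *result = NULL;

        pthread_mutex_lock(&ctime_lock);
        text = ctime(when);
        if (text != NULL && strlen(text) < size) {
                strcpy(buffer, text);
                result = buffer;
        }
        pthread_mutex_unlock(&ctime_lock);

        return result;
}
```

The protection holds only if every thread in the program goes through the wrapper; a direct call to the wrapped routine elsewhere bypasses the lock.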

Reentrant and Thread-Safe Libraries

Reentrant and thread-safe libraries are useful in a wide range of parallel (and asynchronous) programming environments, not just within threads. Thus it is a good programming practice to always use and write reentrant and thread-safe functions.

Using Libraries

Several libraries shipped with the AIX Base Operating System are thread-safe. In the current version of AIX, the following libraries are thread-safe:

Standard C library (libc.a)
Berkeley compatibility library (libbsd.a)

Some of the standard C subroutines are non-reentrant, such as the ctime and strtok subroutines. The reentrant versions of these subroutines have the name of the original subroutine with the suffix _r (underscore r).

When writing multi-threaded programs, the reentrant versions of subroutines should be used instead of the original version. For example, the following code fragment:

token[0] = strtok(string, separators);
i = 0;
do {
        i++;
        token[i] = strtok(NULL, separators);
} while (token[i] != NULL);

should be replaced in a multi-threaded program by the following code fragment:

char *pointer;
...
token[0] = strtok_r(string, separators, &pointer);
i = 0;
do {
        i++;
        token[i] = strtok_r(NULL, separators, &pointer);
} while (token[i] != NULL);
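
Because the strtok_r subroutine keeps its position in a caller-provided pointer, two tokenizations can also be nested, which is not possible with the strtok subroutine. The following sketch assumes the POSIX strtok_r interface; the count_words function and its input format are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r */
#include <string.h>

/*
 * Counts the words in a text made of ';'-separated records, each record
 * holding space-separated words. The text is modified in place.
 * Two tokenizations are in progress at once, each with its own pointer.
 */
int count_words(char *text)
{
        char *record, *word;
        char *record_state, *word_state;
        int count = 0;

        for (record = strtok_r(text, ";", &record_state);
             record != NULL;
             record = strtok_r(NULL, ";", &record_state)) {
                for (word = strtok_r(record, " ", &word_state);
                     word != NULL;
                     word = strtok_r(NULL, " ", &word_state))
                        count++;
        }
        return count;
}
```

The text must be a writable array, because the subroutine replaces separators with null characters as it goes.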

A thread-unsafe library can safely be used by only one thread in a program. The programmer must ensure that no other thread uses the library; otherwise, the program may behave unpredictably or even crash.

Converting Libraries

This information highlights the main steps in converting an existing library to a reentrant and thread-safe library. It applies only to C language libraries.

Identifying exported global variables. These variables are usually declared in a header file with the extern keyword.
Exported global variables should be encapsulated. The variable should be made private (defined with the static keyword in the library source code). Access (read and write) subroutines should be created.
Identifying static variables and other shared resources. Static variables are usually defined with the static keyword.
Locks should be associated with any shared resource. The granularity of the locking, thus choosing the number of locks, impacts the performance of the library. To initialize the locks, the one-time initialization facility may be used.
Identifying non-reentrant functions and making them reentrant. See Making a Function Reentrant.
Identifying thread-unsafe functions and making them thread-safe. See Making a Function Thread-Safe.
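
The first steps above can be sketched as follows, assuming the POSIX threads interface. A formerly exported variable becomes private to the library source file, access subroutines replace direct access, and the lock is set up with the one-time initialization facility (pthread_once). All names beginning with lib_ are hypothetical:

```c
#include <pthread.h>
#include <stddef.h>

/* formerly an exported global; now private to the library source file */
static int lib_debug_level = 0;

static pthread_mutex_t lib_lock;
static pthread_once_t lib_once = PTHREAD_ONCE_INIT;

/* runs exactly once, whichever thread calls into the library first */
static void lib_init(void)
{
        pthread_mutex_init(&lib_lock, NULL);
}

/* write access subroutine */
void lib_set_debug_level(int level)
{
        pthread_once(&lib_once, lib_init);
        pthread_mutex_lock(&lib_lock);
        lib_debug_level = level;
        pthread_mutex_unlock(&lib_lock);
}

/* read access subroutine */
int lib_get_debug_level(void)
{
        int level;

        pthread_once(&lib_once, lib_init);
        pthread_mutex_lock(&lib_lock);
        level = lib_debug_level;
        pthread_mutex_unlock(&lib_lock);
        return level;
}
```

A single lock is used here for simplicity; a library with many independent resources might use one lock per resource, at the cost of more complex code.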

Related Information

Parallel Programming Overview

Thread Programming Concepts