Thread-safety and POSIX.1

Thread-safe Versions of POSIX.1 and C-language Functions

POSIX.1 and C-language functions were written to work in an environment of single-threaded processes. Reentrancy was not an issue in their design: the possibility of a process attempting to "re-enter" a function through concurrent invocations was not considered, because threads, the enabler of concurrency within a process, were not anticipated.

As a result, some POSIX.1 and C-language functions are inherently non-reentrant with respect to threads; that is, their interface specifications preclude reentrancy.¹ For example, some functions (such as asctime()) return a pointer to a result stored in memory space allocated by the function on a per-process basis. Such a function is non-reentrant, because its result can be overwritten by successive invocations. Other POSIX.1 and C-language functions, while not inherently non-reentrant, may be implemented in ways that lead to non-reentrancy. For example, some functions (such as rand()) store state information (such as a seed value, which survives multiple function invocations) in memory space allocated by the function on a per-process basis. The implementation of such a function is non-reentrant if it fails to synchronize invocations of the function and thus fails to protect the state information. When the state information is not protected, concurrent invocations can interfere with one another (for example, by reading the same seed value).

Functions must be reentrant in an environment of multithreaded processes, in order to ensure that they can be safely invoked by concurrently executing threads. POSIX.1c takes three actions in the pursuit of reentrancy. First, POSIX.1c imposes reentrancy as a general rule: all functions, unless explicitly singled out as exceptions to the rule, must be implemented in a way that preserves reentrancy. Second, POSIX.1c redefines errno, as described below in "Redefinition of errno". Third, for those functions whose interface specifications preclude reentrancy, POSIX.1c defines alternative "reentrant" versions as follows:

As previously noted, some functions are non-reentrant because they return results in per-process library-allocated structures that may be static and thus subject to overwriting by successive calls. These include:
- The POSIX.1 process environment functions getlogin() and ttyname() (see ISO/IEC 9945:1-1996, §4.2.4 and 4.7.2)
- The C-language functions asctime(), ctime(), gmtime() and localtime() (see ISO/IEC 9945:1-1996, §8.3.4-8.3.7)
- The POSIX.1 system database functions getgrgid(), getgrnam(), getpwuid() and getpwnam() (see ISO/IEC 9945:1-1996, §9.2.1 and 9.2.2).
POSIX.1c defines reentrant versions of these functions; the new functions have "_r" appended to the function names (that is, "asctime_r()", and so on). To achieve reentrancy, the new "_r" functions replace library-allocated structures with application-allocated structures that are passed as arguments to the functions at invocation.
Some functions can be reentrant or non-reentrant, depending on their arguments. These include the C-language function tmpnam() and the POSIX.1 process environment function ctermid(). These functions have pointers to character strings as arguments. If the pointers are not NULL, the functions store their results in the character string; however, if the pointers are NULL, the functions store their results in an area that may be static and thus subject to overwriting by successive calls.
To ensure reentrancy of these functions, POSIX.1c simply restricts their arguments to non-NULL (ISO/IEC 9945:1-1996, §4.7.1 and 8.2.5).
As previously noted, some functions are non-reentrant because they communicate across multiple function invocations by maintaining state information in static library-allocated storage, which is shared by all the threads of a process, possibly without the benefit of synchronization. These include the C-language function rand(), which is used to generate a process-wide pseudorandom number sequence. The function rand(), which is called with no arguments, returns the next pseudorandom number in a sequence determined by an initial seed value (set via the function srand()). As a side effect, the function rand() updates the seed value, enabling the sequence to progress. The seed value is held in a library-allocated static memory location. In a multithreaded process, two or more threads might concurrently invoke rand(), read the same seed value, and thus acquire the same pseudorandom number.
POSIX.1c defines a reentrant version, rand_r(), of this function (ISO/IEC 9945:1-1996, §8.3.3). To ensure reentrancy, the rand_r() function is required to synchronize (that is, serialize) calls to itself, so that a thread is forced to "finish" acquiring one pseudorandom number in a sequence before another thread can begin to acquire the next number in the sequence.
In addition to reentrancy, the rand_r() function offers applications flexibility in generating pseudorandom number sequences. It does so through the introduction of an argument: a pointer to an application-supplied memory location that is used to hold the seed value. As indicated above, an application can use rand_r() to generate a "reliable" process-wide pseudorandom number sequence (that is, a sequence without replicates). Alternatively, an application can use rand_r() to generate per-thread pseudorandom number sequences, by having each thread use a distinct seed as its rand_r() argument. In fact, an application can use rand_r() to generate an arbitrary number of uncorrelated sequences of pseudorandom numbers (each sequence governed by a distinct seed), which could prove to be useful in Monte Carlo simulations and other similar applications.
Other functions in this class include:
- The C-language function strtok() (see ISO/IEC 9945:1-1996, §8.3.3), which is used to find the next token in a string.
- The POSIX.1 file and directory function readdir(), which is used to read the next entry in a directory stream. Note that this function also suffers from the problem of returning its result in a library-allocated structure. Both deficiencies are resolved in the reentrant version readdir_r() (ISO/IEC 9945:1-1996, §5.1.2).
The POSIX.1 and C-language functions that operate on character streams (represented by pointers to objects of type FILE) are required by POSIX.1c to be implemented in such a way that reentrancy is achieved (see ISO/IEC 9945:1-1996, §8.2). This requirement has a drawback; it imposes substantial performance penalties because of the synchronization that must be built into the implementations of the functions for the sake of reentrancy. POSIX.1c addresses this tradeoff between reentrancy (safety) and performance by introducing high-performance, but non-reentrant (potentially unsafe), versions of the following C-language standard I/O functions: getc(), getchar(), putc() and putchar(). The non-reentrant versions are named getc_unlocked(), and so on, to stress their unsafeness.
To make it possible for multithreaded applications to use the non-reentrant versions of the standard I/O functions safely, POSIX.1c introduces the following character stream locking functions: flockfile(), ftrylockfile() and funlockfile(). An application thread can use these functions to ensure that a sequence of I/O operations on a given character stream is executed as a unit (without interference from other threads).²
As stated in the description of the character stream locking functions, all standard I/O functions that reference character streams shall behave as if they use flockfile() and funlockfile() internally to obtain ownership of the character streams. Thus, when an application thread locks a character stream, the standard I/O functions cannot be used by other threads to operate on the character stream until the thread holding the lock releases it.

The specifications introduced by POSIX.1c for the purpose of ensuring reentrancy of POSIX.1 and C-language functions are mandatory for operating system implementations that support threads. They are optional for implementations that do not support threads. This is accomplished in the standard by associating the reentrancy specifications with a separate option, {_POSIX_THREAD_SAFE_FUNCTIONS}, which is declared to be mandatory for implementations supporting the threads option. Accordingly, this option is mandatory for conformance to ISO/IEC 9945:1-1996.

Redefinition of errno

In POSIX.1, errno is defined as an external global variable. This definition is unacceptable in a multithreaded environment, because it can produce nondeterministic results. The problem is that two or more threads can encounter errors, all causing the same errno to be set. Under these circumstances, a thread might end up checking errno after it has already been updated by another thread.

To circumvent the resulting nondeterminism, POSIX.1c redefines errno as a service that can access the per-thread error number as follows (ISO/IEC 9945:1-1996, §2.4):

"Some functions may provide the error number in a variable accessed through the symbol errno. The symbol errno is defined by including the header <errno.h>, as specified by the C Standard ... For each thread of a process, the value of errno shall not be affected by function calls or assignments to errno by other threads."

In addition, all POSIX.1c functions avoid using errno and, instead, return the error number directly as the function return value, with a return value of zero indicating that no error was detected. This strategy is, in fact, being followed on a POSIX-wide basis for all new functions.

Footnotes

1. In POSIX.1c, a "reentrant function" is defined as a "function whose effect, when called by two or more threads, is guaranteed to be as if the threads each executed the function one after another in an undefined order, even if the actual execution is interleaved" (ISO/IEC 9945:1-1996, §2.2.2).
2. It should be noted that the flockfile() function, like the pthread_mutex_lock() function, can lead to priority inversion. The application developer should take this into account when designing an application and analyzing its performance.

This text is extracted from Chapter 10 of the Authorized Guide to the Single UNIX Specification Version 2.

UNIX is a registered trademark of The Open Group.