The Single UNIX Specification, Version 2 includes in its System Interfaces Specification (XSH) the ISO/IEC 9899:1990/Amendment 1:1995 (E) to ISO/IEC 9899:1990, Programming Languages - C (ISO C). This paper is a brief introduction to this extension. It is assumed that the reader is familiar with the C language, and has some basic understanding of internationalization concepts and character encoding methods.
The Multibyte Support Extension (MSE) was part of the first amendment made to the ISO C standard (ISO C Amendment 1). The MSE is a relatively complete and consistent set of library functions for application programming with multibyte and wide characters.
The other major items included in this amendment are digraphs, alternate spellings for several C tokens, and the header <iso646.h>. These items are not discussed here since they are outside the scope of this paper.
The ISO C standard laid some groundwork for multibyte and wide-character programming by providing a small number of multibyte and wide character functions. The working group decided to wait for the C developer community to acquire more experience with implementing multibyte and wide-character libraries before extending this model further.
A working group (ISO/JTC1/SC22/WG14) was set up to study the various existing implementations and developed the Multibyte Support Extension as part of the first amendment (called C Integrity) to the ISO C standard.
The System Interfaces Specification, XSH, Issue 4, Version 2, which was developed in 1994, incorporated a draft version of the MSE. XSH, Issue 5 incorporates the final version of the MSE. There were a small number of differences between the draft and final versions of the MSE and these are detailed later in this paper.
We traditionally think of characters as one-byte entities represented by the char data type. This is simple, but with an 8-bit byte it allows a maximum of 256 distinct characters.
In the MSE model, the concept of a character has been extended. Extended characters can be represented in three ways: as multibyte characters, as wide characters, and as generalized multibyte characters.
A multibyte character is a sequence of one or more bytes that can be represented as an array of type char; in other words, a single character may occupy one or more consecutive bytes. An example of such an encoding is EUC (Extended UNIX Code). EUC provides a structure by which several codesets (up to four) may be encoded into a single multibyte encoding.
The primary advantage of the one-byte/one-character model is that it is very easy to process data in fixed-width chunks. Multibyte encodings give up this advantage, because characters occupy varying numbers of bytes. To regain it, the concept of the wide character was invented. A wide character is an abstract data type large enough to contain the largest character that is supported on a particular platform. To date, most system implementors have chosen 32 bits, although there are implementations with 16-bit and 8-bit wide characters. Note that although many vendors have chosen a 32-bit wide character, the wide character is an abstract type, so its size is not guaranteed to be the same across all platforms.
To support the concept of wide characters, the MSE defines the integral type wchar_t. However, it does not define the size of wchar_t, but states that it shall be wide enough to hold the largest character in the code sets of the locales that an implementation supports.
In addition to the traditional concept of the multibyte character, the MSE has added the concept of the generalized multibyte character - see below.
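There are many different multibyte encoding schemes, but these can be broken down into three basic categories: restartable multibyte encodings, stateful multibyte encodings, and generalized multibyte encodings.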
Restartable multibyte encodings are defined such that if you were to process a multibyte data stream it would be possible to determine the correct separation of characters no matter where you were positioned in the data stream. In the case of stateful encodings, you need one extra piece of information to be able to correctly process characters in the data stream. This extra piece of information is commonly referred to as the state of the data stream.
Why must we be able to unambiguously restart a data stream? If any byte sequence can have more than one meaning as a sequence of characters, then the multibyte code is ambiguous; that is, you could have multiple meanings for the same data stream depending upon where you started in the data stream.
For example, the following multibyte encoding is not restartable:
0x41 0x42 0x61 0x62 0x43
In this particular encoding, the combination of 0x61 and 0x62 produces an F. If we start processing this string at the beginning, all the characters are processed correctly and the result is the string:
A B F C
If we start processing the string at 0x62, the result is the partial string:
b C
In a restartable encoding, the conversion interfaces would have recognized the 0x62 as an illegal multibyte character, and our program could choose to ignore that illegal character and move on, or perhaps try to back up and see whether it could form a complete multibyte character.
In restartable multibyte encodings, each byte sequence in a particular encoding scheme stands for one character; the same character regardless of context. Stateful multibyte encoding schemes have a concept of shift state; certain codes called shift sequences effectively change the data stream to a different shift state, and the meaning of byte sequences is changed according to the current shift state.
If we take the same multibyte encoding and make it a stateful encoding, we introduce two new operators called shift state operators, SS0 and SS1. The default shift state for this particular codeset is SS0. In this example, 0x61 produces an F in the shifted state (SS1) and an a in the default state (SS0):
0x41 0x42 SS1 0x61 SS0 0x43 0x61
Since the default shift state is SS0, the above sequence of bytes should produce the string:
A B F C a
Stateful multibyte encodings are not restartable either: if we started processing the string just after a shift state operator, without having seen that operator, we could get the wrong string. For example, starting at the first 0x61 yields a C a instead of F C a.
Normally, if you pass a string containing multibyte characters to a function that does not know about them, the function treats the string as a sequence of bytes and interprets certain byte values specially, for example the null byte and the slash character. Since a multibyte character may not use any of these special byte values as part of its encoding, the function can pass the string through as if it were a single-byte string. [Note: The multibyte encoding may still use the slash or null byte as characters in their own right; it just cannot use them as part of another multibyte character.]
This is where the concept of the generalized multibyte encoding arises. Traditionally, we think of multibyte encodings as file code and wide characters as process code, where file code resides on disk and process code is used by an application. This is not to say that multibyte encodings are not used by applications. Indeed, many applications today use multibyte encodings routinely, but because they do not need to process characters as discrete chunks, they have no need to convert the multibyte encodings to wide characters.
In summary, generalized multibyte characters can be encoded in any way. The special byte values discussed above have no meaning in generalized multibyte encodings, so functions that have no concept of multibyte encodings would fail if they tried to process them. By defining the concept of generalized multibyte encodings, we provide a method by which we can say a particular file is associated with a particular locale, and can only be processed by specific routines running in this locale. Generalized multibyte encodings are more of a logical grouping than a specific definition. They provide us with a way to associate files with specific locales and codesets, and allow us to safely operate on those files as long as we are in the proper locale. The important restriction is that generalized multibyte characters can never be processed directly; they can exist only on disk. [Note: Processed refers to the parsing routines available in C. Any file may be processed as binary data.]
As an example of a generalized multibyte encoding, consider Unicode, a 16-bit codeset that can be found on Windows 95 and Windows NT. One of the problems with Unicode is that it has null bytes embedded in its encoding. For example, the string:
a b c
is actually encoded as follows (high-order byte first):
0x00 0x61 0x00 0x62 0x00 0x63 0x00 0x00
Those who are familiar with the string handling routines in C can see that these routines will have problems with this string. Similarly, if you tried to read this file from disk as a text file, you would have problems. However, with the concept of generalized multibyte encodings, we can say that this file is associated with a Unicode locale, and the stdio routines can be smart enough to know that when they are in the Unicode locale they can read the Unicode file properly.
The MSE defines two headers to support the new functionality:
Header | Contents |
---|---|
<wctype.h> | The declarations for the functions analogous to those in <ctype.h>; that is, the classification and mapping functions. |
<wchar.h> | The remaining declarations. |
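The header <wchar.h> declares the following types:
Type | Description |
---|---|
wchar_t | An integer type whose range is large enough to represent all distinct values in any extended character set in the supported locales. Known as the wide character type. |
mbstate_t | A non-array type that stores the conversion state, such as the current shift state, needed to convert between multibyte and wide characters; for a stream, it holds the current parse state. |
wint_t | An integer type that can hold any wide character and WEOF. |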
The following macros are declared:
Macro | Description |
---|---|
WCHAR_MAX | Maximum value representable by an object of type wchar_t. |
WCHAR_MIN | Minimum value representable by an object of type wchar_t. |
WEOF | Wide-character end-of-file. |
In addition, the following error macro was added to the header <errno.h>:
Macro | Description |
---|---|
EILSEQ | An invalid wide-character code, or a sequence of bytes that does not form a valid multibyte character, was encountered. |
Two standard macros can be used to find out the maximum possible number of bytes in a multibyte character:
Macro | Description |
---|---|
MB_LEN_MAX | The maximum length, in bytes, of a multibyte character for any supported locale, as a positive integer. It is defined in <limits.h>. |
MB_CUR_MAX | The maximum number of bytes in a multibyte character in the current locale, as a positive integer. The value is never greater than MB_LEN_MAX. It is defined in <stdlib.h>. |
Character classification determines whether a particular character code refers to an upper-case alphabetic, lower-case alphabetic, alphanumeric, digit, punctuation, control or space character, or any one of a number of other groupings.
In the past, macros were often used to classify character codes. This was possible because applications were assumed to deal only with ASCII characters. Today, classification functions are used instead, and the wide-character versions classify wide-character codes according to the type rules defined by the LC_CTYPE category of the application's current locale.
In the ISO C standard, the behavior of the character classification functions is affected by the current locale, and some functions have implementation-dependent behavior when not in the POSIX locale. For example, in the POSIX locale, isupper() returns true (non-zero) only for upper-case letters. The MSE contains no description of how the POSIX locale affects the behavior of the wide-character classification functions, but states that when a character c causes an isxxx(c) function to return true, the corresponding wide character wc shall cause the corresponding iswxxx(wc) function to return true. Note, however, that the converse does not necessarily hold.
The ISO C standard defines 11 classification (also known as character testing) functions. The MSE defines an analogous set of wide-character classification functions, which return non-zero for true and zero for false; for example, iswalnum() is analogous to isalnum().
As the number of defined locales increased, so did the need for additional character classes. For example, while a classification function such as isupper() makes perfect sense in the English language, it does not make any sense in a language such as Japanese, which has no concept of case. Conversely, a function such as iskana() makes perfect sense for Japanese, but does not make any sense in English. For this reason, the MSE defined two extensible wide-character classification functions, wctype() and iswctype(), as a general-purpose solution to this problem.
Name | Purpose | Syntax |
---|---|---|
wctype() | Construct a value of type wctype_t that describes a class of wide characters identified by the string argument property. | wctype_t wctype(const char *property); |
iswctype() | Determine whether the wide character wc has the property described by desc. | int iswctype(wint_t wc, wctype_t desc); |
These two functions are generally used in combination. However, an application sometimes uses wctype() on its own to test whether a character classification is available in a specific locale; wctype() returns zero for a class that the current locale does not define. If the LC_CTYPE category of the current locale changes between the call to wctype() and a call to iswctype() that uses its result, the behavior is undefined.
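The MSE specifies that iswctype(wc, wctype("alnum")) is equivalent to iswalnum(wc), and likewise for each of the class names "alpha", "cntrl", "digit", "graph", "lower", "print", "punct", "space", "upper" and "xdigit" with the corresponding iswxxx() function.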
Mapping functions are sometimes called case conversion functions, because the original mapping functions simply mapped upper-case to lower-case and vice versa.
In the past, case conversion was often handled by means of macros. This was possible because applications were assumed to deal only with ASCII characters. Today, mapping functions provide case conversion according to the mappings defined in the LC_CTYPE category of the application's current locale.
The following wide-character mapping functions are provided:
MSE | ISO C | Purpose |
---|---|---|
towlower() | tolower() | Convert an upper-case letter to its corresponding lower-case letter if iswupper() is true and there is a corresponding wide character for which iswlower() is true. |
towupper() | toupper() | Convert a lower-case letter to its corresponding upper-case letter if iswlower() is true and there is a corresponding wide character for which iswupper() is true. |
As the number of defined locales increased, so did the need for additional character mappings. For example, while a function such as toupper() makes perfect sense in the English language, it does not make any sense in a language such as Japanese, which has no concept of case. Conversely, a function such as tokana() makes no sense in the English language.
For this reason, the MSE defined two extensible wide-character mapping functions, wctrans() and towctrans(), as a general-purpose solution to this problem. The name of the required character mapping is passed as a string argument to wctrans(), which avoids polluting the name space with new function names.
Name | Purpose | Syntax |
---|---|---|
wctrans() | Construct a value with type wctrans_t that describes the mapping between wide characters identified by property | wctrans_t wctrans(const char *property); |
towctrans() | Map the wide character wc using the mapping described by desc. | wint_t towctrans(wint_t wc, wctrans_t desc); |
In addition, the MSE specifies that towctrans(wc, wctrans("tolower")) is equivalent to towlower(wc), and that towctrans(wc, wctrans("toupper")) is equivalent to towupper(wc).
The wctrans() function also enables an application to test whether a character mapping is available in a specific locale.
Three new functions are included to facilitate conversion from wide-character strings (also known as wide strings) to a variety of numeric formats.
MSE | ISO C | Purpose | Syntax |
---|---|---|---|
wcstod() | strtod() | Convert the initial portion of a wide string to a double. | double wcstod(const wchar_t *n, wchar_t **end); |
wcstol() | strtol() | Convert the initial portion of a wide string to a long integer. | long wcstol(const wchar_t *n, wchar_t **end, int base); |
wcstoul() | strtoul() | Convert the initial portion of a wide string to an unsigned long integer. | unsigned long wcstoul(const wchar_t *n, wchar_t **end, int base); |
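These functions work in the same way as their char-based counterparts. Each decomposes the input wide string into three parts: an initial, possibly empty, sequence of white-space wide characters (as specified by iswspace()); a subject sequence resembling a floating-point constant (wcstod()) or an integer constant in the given base (wcstol() and wcstoul()); and a final wide string of one or more unrecognized wide characters, including the terminating null wide character. The subject sequence is converted to the result, and a pointer to the final wide string is stored in *end, provided that end is not a null pointer.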
In locales other than the POSIX locale, additional implementation-dependent forms of the subject sequence may be accepted.
The function wcstod() depends on the value of the RADIXCHAR item in the application's current locale. In locales where the radix character is not defined, it defaults to a period.
Sixteen new wide-character string functions are defined. Most are similar to their char-based counterparts. For example, wcscpy() is analogous to strcpy(), but operates on wide strings. In general, the data types of some parameters differ, but the purpose of the parameters is the same.
The comparison functions wcscmp() and wcsncmp() compare two wide-character strings using the encoded values of their wide characters, while the wcscoll() function compares them according to the collating sequence specified by the LC_COLLATE category of the current locale.
The wcsxfrm() function transforms a wide-character string and places the result in an array of wide characters. The transformation is such that if the wcscmp() function is applied to two transformed wide-character strings, the result is the same as if the two wide-character strings were compared using wcscoll(). Both wide-character strings must be transformed using wcsxfrm(). It is invalid to compare a transformed string to a non-transformed string. Note that no function is defined to restore a transformed string to its original layout.
When wide-character strings are likely to be compared more than once, it is more efficient to transform them using wcsxfrm(), compare them using wcscmp(), and retain the transformed strings for subsequent comparisons.
The MSE also defines a number of wide-character array functions. These functions operate on arrays of type wchar_t whose size is specified by a separate count argument. These functions are not affected by locale and all wchar_t values are treated identically, including the null wide character and wide characters not corresponding to valid multibyte characters. Thus, the wmemcmp() function compares each wide-character array element using the encoded value of each wide-character.
MSE | ISO C | Purpose | Syntax |
---|---|---|---|
wmemchr() | memchr() | Locate first occurrence of wide character c in the initial n wide characters of s | wchar_t *wmemchr(const wchar_t *s, wchar_t c, size_t n); |
wmemcmp() | memcmp() | Compare first n wide characters of s1 and s2 | int wmemcmp(const wchar_t *s1, const wchar_t *s2, size_t n); |
wmemcpy() | memcpy() | Copy first n wide characters from s2 to s1 | wchar_t *wmemcpy(wchar_t *s1, const wchar_t *s2, size_t n); |
wmemmove() | memmove() | Copy first n wide characters from s2 to s1; the objects may overlap | wchar_t *wmemmove(wchar_t *s1, const wchar_t *s2, size_t n); |
wmemset() | memset() | Set first n wide characters of s to wide character c | wchar_t *wmemset(wchar_t *s, wchar_t c, size_t n); |
The wmemmove() function copies the specified number of wide characters from the object pointed to by s2 into s1. However, unlike wmemcpy(), the objects s1 and s2 can safely overlap. Copying occurs as if the required elements of the object s2 are first copied into a temporary area and then copied into the object pointed to by s1.
The MSE input/output model assumes that characters are handled as wide-characters within an application and stored as multibyte characters in files, and that all the wide-character input/output functions begin executing with the stream positioned at the boundary between two multibyte characters.
The definition of a stream was changed to include the concept of an orientation for both text and binary streams. After a stream is associated with a file, but before any operations are performed on the stream, the stream is without orientation. If a wide-character input or output function is applied to a stream without orientation, the stream becomes wide-oriented. Likewise, if a byte input or output operation is applied to a stream without orientation, the stream becomes byte-oriented. Thereafter, only the fwide() or freopen() functions can alter the orientation of a stream; in practice, fwide() can set the orientation only of a stream that has none, and a successful freopen() removes any orientation.
Byte input/output functions shall not be applied to a wide-oriented stream and wide-character input/output functions shall not be applied to a byte-oriented stream.
While wide-oriented streams are sequences of wide characters, the external file associated with a wide-oriented stream may be an implementation-dependent multibyte encoding. Furthermore, it is acceptable that the file associated with this stream is a generalized multibyte encoding such as Unicode.
The following function is specified to enable applications to determine and/or set the orientation of a stream:
Name | Purpose | Syntax |
---|---|---|
fwide() | Determine and/or set the orientation of a stream. | int fwide(FILE *stream, int mode); |
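If mode is zero, the stream orientation is not altered. If mode is greater than zero, the function first attempts to make the stream wide-oriented; if mode is less than zero, it first attempts to make the stream byte-oriented. These attempts succeed only for a stream that has no orientation yet; fwide() does not change an orientation that has already been determined. The function returns a value greater than zero if, after the call, the stream is wide-oriented, a value less than zero if it is byte-oriented, and zero if it has no orientation.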
Note that the input/output model does not preclude applications from storing data in external files as wide characters.
Wide-character input functions read multibyte characters from a stream and convert them to wide characters. An encoding error occurs if the byte sequence read does not form a valid multibyte character in the current locale.
The following table lists the wide-character input functions specified in the MSE together with their equivalent char-based functions:
MSE | ISO C | Purpose | Syntax |
---|---|---|---|
getwc() | getc() | Get a wide character from a stream. | wint_t getwc(FILE *stream); |
getwchar() | getchar() | Get a wide character from stdin | wint_t getwchar(void); |
fgetwc() | fgetc() | Get a wide character from a stream. | wint_t fgetwc(FILE *stream); |
fgetws() | fgets() | Get a wide-character string from a stream. | wchar_t *fgetws(wchar_t *s, int n, FILE *stream); |
fwscanf() | fscanf() | Get formatted input from a stream. | int fwscanf(FILE *stream, const wchar_t *format, ...); |
swscanf() | sscanf() | Get formatted input from a wide-character string. | int swscanf(const wchar_t *s, const wchar_t *format, ...); |
wscanf() | scanf() | Get formatted input from stdin | int wscanf(const wchar_t *format, ...); |
ungetwc() | ungetc() | Push a wide character back on a stream. | wint_t ungetwc(wint_t c, FILE *stream); |
All of the above functions work in a similar manner to their corresponding char-based functions, except that format strings must be wide-character strings.
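The c, s and [ conversion specifiers also accept an l (ell) qualifier. With the qualifier (%lc, %ls and %l[), the characters read are stored as wide characters in an array of wchar_t; without it, they are converted to multibyte characters and stored in an array of char.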
Wide-character output functions convert wide characters to multibyte characters and write them to the stream. An encoding error occurs if the wide character does not correspond to a valid multibyte character in the current locale.
The following table lists the wide-character output functions specified in the MSE together with their equivalent char-based functions:
MSE | ISO C | Purpose | Syntax |
---|---|---|---|
putwc() | putc() | Write a wide character to a stream. | wint_t putwc(wchar_t c, FILE *stream); |
putwchar() | putchar() | Write a wide character to stdout. | wint_t putwchar(wchar_t c); |
fputwc() | fputc() | Write a wide character to a stream. | wint_t fputwc(wchar_t c, FILE *stream); |
fputws() | fputs() | Write a wide-character string to a stream. | int fputws(const wchar_t *s, FILE *stream); |
fwprintf() | fprintf() | Write to a stream using a wide-character format specification. | int fwprintf(FILE *stream, const wchar_t *format, ...); |
wprintf() | printf() | Write to stdout using a wide-character format specification. | int wprintf(const wchar_t *format, ...); |
swprintf() | sprintf() | Write to a wide-character array using a wide-character format specification. | int swprintf(wchar_t *s, size_t n, const wchar_t *format, ...); |
vfwprintf() | vfprintf() | Equivalent to fwprintf() except using va_list syntax. | int vfwprintf(FILE *stream, const wchar_t *format, va_list arg); |
vwprintf() | vprintf() | Equivalent to wprintf() except using va_list syntax. | int vwprintf(const wchar_t *format, va_list arg); |
vswprintf() | vsprintf() | Equivalent to swprintf() except using va_list syntax. | int vswprintf(wchar_t *s, size_t n, const wchar_t *format, va_list arg); |
All of the above functions work in a similar manner to their corresponding char-based functions, except that format strings must be wide-character strings.
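The c and s conversion specifiers also accept an l (ell) qualifier: %lc converts a wint_t argument and %ls converts a wide-character string (wchar_t *) argument. Without the qualifier, %c and %s take char-based arguments, which are converted to wide characters before being written.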
As discussed earlier, multibyte character streams may have state-dependent encodings. To handle state-dependent encodings, the MSE includes the concept of a conversion state, associated with each FILE object, that affects the behavior of conversions between multibyte and wide-character encodings.
The conversion state information augments the FILE object's information about the current position of the multibyte character stream with information about the parse state for the next multibyte character to be obtained from the stream. For state-dependent encodings, the remembered shift state is part of this parse state. Every wide-character input or output function makes use of this state information and updates its corresponding FILE object's conversion state accordingly.
The non-array type mbstate_t is defined to encode the conversion state under the rules of the current locale and provide a character accumulator. This implies that encoding rule information is part of the conversion state. No initialization function is provided to initialize mbstate_t. A zero-valued mbstate_t is assumed to describe the initial conversion state. Such a zero-valued mbstate_t object is said to be unbound. Once a multibyte or wide-character conversion function is called with the mbstate_t object as an argument, the object becomes bound and holds the conversion state information which it obtains from the LC_CTYPE category of the current locale. No comparison function is specified for comparing two mbstate_t objects.
The MSE assumes that only wide-character input/output functions can maintain consistency between a stream and its corresponding conversion state. Byte input/output functions do not manipulate or use conversion state information. Wide-character input/output functions are assumed to begin processing a stream at the boundary between two multibyte characters. Seek operations reset the conversion state corresponding to the new file position.
The function mbsinit() is specified because many conversion functions treat the initial shift state as a special case and need a portable means of determining whether an mbstate_t object is at initial conversion state.
Name | Purpose | Syntax |
---|---|---|
mbsinit() | Determine whether the referenced mbstate_t object describes an initial conversion state. | int mbsinit(const mbstate_t *ps); |
The MSE provides a method to distinguish between an invalid sequence of bytes in a multibyte stream and a valid prefix to a still incomplete multibyte character. Upon encountering such an incomplete multibyte sequence, the functions mbrlen() and mbrtowc() return (size_t)-2 instead of (size_t)-1, and the character accumulator in the mbstate_t object may store the partial character information. This allows applications to convert streams one byte at a time, or even to suspend and resume conversion if required. The conversion functions are thus said to be restartable.
The MSE specifies the following single-byte/wide-character conversion functions:
Name | Purpose | Syntax |
---|---|---|
btowc() | Determine whether c is a valid single-byte multibyte character in the initial shift state and, if so, return the corresponding wide character. | wint_t btowc(int c); |
wctob() | Determine whether c corresponds to a member of the extended character set whose multibyte representation is a single byte in the initial shift state and, if so, return that single-byte character. | int wctob(wint_t c); |
The function btowc() returns WEOF if the character has a value of EOF or if it is not a valid multibyte character in the initial shift state.
The function wctob() returns EOF if the character does not correspond to a valid multibyte character of length 1 in the initial shift state.
The MSE specifies the following restartable functions, which take as their last argument a pointer to an object of type mbstate_t. If the pointer is a null pointer, each function uses its own internal mbstate_t object instead, which is initialized at program startup to the initial conversion state. Note that, unlike those of the corresponding ISO C standard functions, the return values of these functions do not indicate whether the encoding is state-dependent.
MSE | ISO C | Purpose | Syntax |
---|---|---|---|
mbrlen() | mblen() | Determine the length in bytes of a multibyte character. | size_t mbrlen(const char *mbs, size_t n, mbstate_t *ps); |
mbrtowc() | mbtowc() | Convert a multibyte character into a wide character. | size_t mbrtowc(wchar_t *pwc, const char *s, size_t n, mbstate_t *ps); |
wcrtomb() | wctomb() | Convert a wide character into a multibyte character. | size_t wcrtomb(char *s, wchar_t wc, mbstate_t *ps); |
mbsrtowcs() | mbstowcs() | Convert a multibyte string into a wide-character string. | size_t mbsrtowcs(wchar_t *dst, const char **src, size_t len, mbstate_t *ps); |
wcsrtombs() | wcstombs() | Convert a wide-character string into a multibyte string. | size_t wcsrtombs(char *dst, const wchar_t **src, size_t len, mbstate_t *ps); |
A more detailed explanation of two of the above functions will help to clarify the concept of restartable functions.
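The function mbrtowc() inspects at most n bytes to determine the number of bytes needed to complete the next multibyte character. If a multibyte character can be completed, mbrtowc() determines the corresponding wide character and stores it in *pwc. If the corresponding wide character is the null wide character, the conversion state is reset to the initial conversion state. This function returns one of the following:
Return value | Meaning |
---|---|
0 | The next n or fewer bytes complete the multibyte character that corresponds to the null wide character. |
Between 1 and n | The next n or fewer bytes complete a valid multibyte character; the value is the number of bytes that complete it. |
(size_t)-2 | The next n bytes contribute to an incomplete but potentially valid multibyte character, and all n bytes have been processed. |
(size_t)-1 | An encoding error occurred; errno is set to EILSEQ and the conversion state is undefined. |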
The function mbsrtowcs() is a restartable string conversion routine which converts a sequence of multibyte characters, beginning with the conversion state described by the mbstate_t object pointed to by ps, from the array indirectly pointed to by src into a sequence of corresponding wide characters pointed to by dst. Conversion continues up to and including a terminating null character which is also stored in dst. Each conversion takes place as if by a call to the mbrtowc() function. If an error occurs, errno is set to the macro EILSEQ and mbsrtowcs() returns (size_t)-1.
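Conversion stops when one of the following occurs: a terminating null character is converted, in which case the resulting conversion state is the initial conversion state and, if dst is not a null pointer, *src is set to a null pointer; len wide characters have been stored in the array pointed to by dst, in which case *src is set to point just past the last multibyte character converted (len is ignored when dst is a null pointer, so that only the required length is computed); or an invalid multibyte character is encountered.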
MSE | ISO C | Purpose | Syntax |
---|---|---|---|
wcsftime() | strftime() | Convert a date and time to a wide-character string. | size_t wcsftime(wchar_t *wcs, size_t maxsize, const wchar_t *format, const struct tm *tptr); |
The wcsftime() function behaves as if the character string generated by the strftime() function is passed to the mbstowcs() function as the character string parameter, and the mbstowcs() function places the result in the wcs parameter of wcsftime(), up to the limit of the number of wide characters specified by maxsize.
This function uses the local timezone information. The format parameter is a wide-character string consisting of a sequence of wide-character format codes that specify the format of the date and time to be written to wcs.
When XSH, Issue 4, Version 2 was under development, the MSE had not yet been ratified as an amendment to the ISO C standard, but the working draft used (ISO Working Paper SC22/WG14/N204 dated 31st March 1992) was regarded as being stable.
Unfortunately, a number of interfaces changed slightly before the MSE became part of ISO/IEC 9899:1990/Amendment 1:1995 (E), and three functions, wcswcs(), wcswidth(), and wcwidth(), were dropped.
The differences between XSH, Issue 4, Version 2 and XSH, Issue 5 are detailed in the following table:
Name | Purpose | XSH, Issue 4, Version 2 | XSH, Issue 5 |
---|---|---|---|
wcswcs() | Find a wide-character substring in a wide-character string. | Included per draft MSE. | Included but marked EX. Application developers are strongly encouraged to use wcsstr() instead. |
wcswidth() | Number of column positions of a wide-character string. | Included per draft MSE. | Included as an extension. |
wcwidth() | Number of column positions of a wide-character code. | Included per draft MSE. | Included as an extension. |
fputwc() | Put a wide character code on a stream. | wint_t fputwc(wint_t wc, FILE *stream); | wint_t fputwc(wchar_t wc, FILE *stream); |
putwc() | Put a wide character on a stream. | wint_t putwc(wint_t wc, FILE *stream); | wint_t putwc(wchar_t wc, FILE *stream); |
putwchar() | Put a wide character on stdout. | wint_t putwchar(wint_t wc); | wint_t putwchar(wchar_t wc); |
wcsftime() | Convert date and time to wide-character string. | size_t wcsftime(wchar_t *wcs, size_t maxsize, const char *format, const struct tm *timptr); | size_t wcsftime(wchar_t *wcs, size_t maxsize, const wchar_t *format, const struct tm *timptr); |
wcstok() | Split wide-character string into tokens. | wchar_t *wcstok(wchar_t *ws1, const wchar_t *ws2); | wchar_t *wcstok(wchar_t *ws1, const wchar_t *ws2, wchar_t **ptr); |
More information on the Single UNIX Specification, Version 2 can be obtained from the following sources:
David Lindner is a Principal Engineer with Digital Equipment Corporation and a former member of The Open Group Internationalization Technical Working Group.
Finnbarr P. Murphy is a principal software engineer with Digital Equipment Corporation and is Vice-Chair of The Open Group Base Technical Working Group.
Read or download the complete Single UNIX Specification from http://www.UNIX-systems.org/go/unix.
Copyright © 1997-1998 The Open Group
UNIX is a registered trademark of The Open Group.