(libc.info.gz) Shift State
Info Catalog
(libc.info.gz) Non-reentrant String Conversion
(libc.info.gz) Non-reentrant Conversion
6.4.3 States in Non-reentrant Functions
---------------------------------------
In some multibyte character codes, the _meaning_ of any particular byte
sequence is not fixed; it depends on what other sequences have come
earlier in the same string. Typically there are just a few sequences
that can change the meaning of other sequences; these few are called
"shift sequences" and we say that they set the "shift state" for other
sequences that follow.
To illustrate shift state and shift sequences, suppose we decide that
the single byte '0200' (octal) enters Japanese mode, in which pairs of
bytes in the range from '0240' to '0377' are single characters, while
the single byte '0201' enters Latin-1 mode, in which single bytes in
the range from '0240' to '0377' are characters, interpreted according
to the ISO Latin-1 character set. This is a multibyte code that has two
alternative shift states ("Japanese mode" and "Latin-1 mode"), and two
shift sequences that specify particular shift states.
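The toy code described above can be sketched as a small decoder that counts characters. This is only an illustration under stated assumptions: the names 'toy_state' and 'toy_count_chars' are hypothetical, bytes below '0200' are assumed to be single characters in every state, and high bytes seen before any shift sequence, as well as the bytes '0202' to '0237', are assumed to be invalid.

```c
#include <stddef.h>

/* Hypothetical shift states for the toy encoding. */
enum toy_state { TOY_INITIAL, TOY_JAPANESE, TOY_LATIN1 };

/* Count the characters in the LEN bytes at S under the toy encoding.
   Return -1 if a byte sequence is invalid or truncated. */
long
toy_count_chars (const unsigned char *s, size_t len)
{
  enum toy_state state = TOY_INITIAL;
  long count = 0;
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = s[i];

      if (c == 0200)
        {
          /* Shift sequence: enter Japanese mode. */
          state = TOY_JAPANESE;
          i++;
        }
      else if (c == 0201)
        {
          /* Shift sequence: enter Latin-1 mode. */
          state = TOY_LATIN1;
          i++;
        }
      else if (c >= 0240)
        {
          if (state == TOY_JAPANESE)
            {
              /* A pair of bytes in 0240..0377 is one character. */
              if (i + 1 >= len || s[i + 1] < 0240)
                return -1;
              i += 2;
            }
          else if (state == TOY_LATIN1)
            i += 1;
          else
            return -1;  /* Assumption: invalid before any shift. */
          count++;
        }
      else if (c < 0200)
        {
          /* Assumption: plain single-byte character in any state. */
          i++;
          count++;
        }
      else
        return -1;      /* Assumption: 0202..0237 are unassigned. */
    }
  return count;
}
```

With this sketch, the bytes '0241 0242' count as one character after '0200' but as two characters after '0201', which is exactly the dependence on earlier sequences that the shift state records.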
When the multibyte character code in use has shift states, then
'mblen', 'mbtowc', and 'wctomb' must maintain and update the current
shift state as they scan the string. To make this work properly, you
must follow these rules:
   * Before starting to scan a string, call the function with a null
     pointer for the multibyte character address--for example,
     'mblen (NULL, 0)'. This initializes the shift state to its
     standard initial value.

   * Scan the string one character at a time, in order. Do not "back
     up" and rescan characters already scanned, and do not intersperse
     the processing of different strings.
Here is an example of using 'mblen' following these rules:
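     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>

     void
     scan_string (const char *s)
     {
       size_t length = strlen (s);

       /* Initialize shift state. */
       mblen (NULL, 0);

       /* Stop once every byte has been consumed, so that mblen
          is never asked to examine zero bytes. */
       while (length > 0)
         {
           int thischar = mblen (s, length);

           /* Deal with end of string and invalid characters. */
           if (thischar == 0)
             break;
           if (thischar == -1)
             {
               fprintf (stderr, "invalid multibyte character\n");
               break;
             }

           /* Advance past this character. */
           s += thischar;
           length -= thischar;
         }
     }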
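The same two rules apply to 'mbtowc' and 'wctomb'. The following is a minimal sketch, not a definitive implementation; the helper names 'convert_string' and 'store_string' are hypothetical. Each helper resets the shift state with a null-pointer call and then processes one string from start to finish without interleaving any other conversion.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Convert the multibyte string S into at most MAX wide characters
   stored in OUT.  Return the number of wide characters stored, or
   (size_t) -1 on an invalid sequence.  OUT is not null-terminated. */
size_t
convert_string (const char *s, wchar_t *out, size_t max)
{
  size_t remaining = strlen (s);
  size_t count = 0;

  /* Initialize shift state. */
  mbtowc (NULL, NULL, 0);

  while (count < max && remaining > 0)
    {
      int n = mbtowc (&out[count], s, remaining);
      if (n == 0)
        break;
      if (n == -1)
        return (size_t) -1;
      s += n;
      remaining -= n;
      count++;
    }
  return count;
}

/* Convert the wide string WS into a null-terminated multibyte string
   in OUT, which holds SIZE bytes.  Return the number of bytes stored
   before the terminating null, or (size_t) -1 on failure. */
size_t
store_string (const wchar_t *ws, char *out, size_t size)
{
  char buf[MB_LEN_MAX];
  size_t used = 0;
  int n;

  /* Initialize shift state. */
  wctomb (NULL, 0);

  for (; *ws != L'\0'; ws++)
    {
      n = wctomb (buf, *ws);
      if (n == -1 || used + (size_t) n > size)
        return (size_t) -1;
      memcpy (out + used, buf, (size_t) n);
      used += (size_t) n;
    }

  /* Converting the null wide character stores any shift sequence
     needed to return to the initial state, then a null byte. */
  n = wctomb (buf, L'\0');
  if (n == -1 || used + (size_t) n > size)
    return (size_t) -1;
  memcpy (out + used, buf, (size_t) n);
  return used + (size_t) n - 1;
}
```

Neither helper should be called while another scan that uses the same function is still in progress, since both scans would then share a single shift state.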
The functions 'mblen', 'mbtowc' and 'wctomb' are not reentrant when
using a multibyte code that uses a shift state, because each keeps the
current shift state in internal static storage between calls. However,
no other library functions call these functions, so you don't have to
worry that the shift state will be changed mysteriously.
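Whether the restriction matters depends on the multibyte code selected by the current locale. ISO C specifies that a null-pointer call returns a nonzero value if the code has state-dependent encodings and zero otherwise, so a program can check. A minimal sketch follows; the helper name 'encoding_has_shift_states' is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: return nonzero if the multibyte code of the
   current locale uses shift states.  As a side effect, the call
   resets the shift state kept by mblen to its initial value. */
int
encoding_has_shift_states (void)
{
  return mblen (NULL, 0) != 0;
}
```

In the default "C" locale this is expected to report that no shift states are in use.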