National Language Support (NLS) provides a base for internationalization to allow data to be changed from one code set to another. You may need to convert text files or message catalogs. There are several standard converters for this purpose.
This section discusses the following aspects of conversion:
When a program sends data to another program residing on a remote host, the data can require conversion from the code set of the source machine to that of the receiver. For example, when communicating with an IBM VM system, the system converts its ISO8859-1 data to EBCDIC. Code sets define character and control function assignments to code points. These coded characters must be converted when a program receives data in one code set but displays it in another code set.
There are two interfaces for doing conversions: the iconv command and the iconv subroutines.
The system provides a library of converters that is ready to use. You supply the name of the converter you want to use. The converter libraries are found in the following directories: /usr/lib/nls/loc/iconv/* and /usr/lib/nls/loc/iconvTable/*.
In addition to code set converters, the converter library also provides a set of network interchange converters. In a network environment, the code sets of the communicating systems and the protocols of communication determine how the data should be converted.
Interchange converters are used to convert data sent from one system to another. Conversions done from one internal code set to another require code set converters. Whether data must be converted from a sender's code set to a receiver's code set, or 8-bit data must be converted into 7-bit data form, a uniform interface is required. The iconv subroutines provide this interface.
There are standard converters for use with the iconv command and subroutines. The following list describes the different types of converters.
The iconv facility consists of a set of functions that contain the data and logic to convert from one code set to another. The facility also includes the iconv command, which converts data. A single system can have several converters. The LOCPATH environment variable determines the converter that the iconv subroutines use.
Note: All setuid and setgid programs ignore the LOCPATH environment variable.
Any converter installed in the system can be used through the iconv command, which uses the iconv library. The iconv command acts as a filter for converting from one code set to another. For example, the following command filters data from PC Code (IBM-850) to ISO8859-1:
cat File | iconv -f IBM-850 -t ISO8859-1 | tftp -p - host /tmp/fo
The iconv command converts the encoding of characters read from either standard input or the specified file and then writes the results to standard output.
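The filter behavior described above can be sketched in Python. This is an illustration only, not the AIX implementation: the convert_stream function and the AIX_TO_PYTHON table are hypothetical names, and the mapping of AIX code set names to Python codec names is an assumption.

```python
import codecs

# Hypothetical mapping from the AIX code set names used in this section to
# Python codec names. This table is an assumption made for illustration.
AIX_TO_PYTHON = {
    "IBM-850": "cp850",
    "ISO8859-1": "latin-1",
}


def convert_stream(src, dst, from_code, to_code, chunk_size=4096):
    """Read bytes from src, re-encode them, and write the result to dst.

    src and dst are binary file objects, so the same function can serve a
    named file or standard input, as the iconv command does.
    """
    decoder = codecs.getincrementaldecoder(AIX_TO_PYTHON[from_code])()
    encoder = codecs.getincrementalencoder(AIX_TO_PYTHON[to_code])()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(encoder.encode(decoder.decode(chunk)))
    # Flush any state the decoder or encoder still holds at end of input.
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))
```

To use the sketch as a filter, pass sys.stdin.buffer and sys.stdout.buffer as src and dst. With the codes "IBM-850" and "ISO8859-1", the input byte 0x82 (e with acute accent in IBM-850) is written as 0xE9.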
UCS-2 is a universal 16-bit encoding (see the code set overview in AIX Version 4.3 General Programming Concepts: Writing and Debugging Programs) that can be used as an interchange medium to provide conversion capability between virtually any code sets. The conversion can be accomplished using the Universal UCS Converter, which converts between any two code sets XXX and YYY as follows:
XXX <-> UCS-2 <-> YYY
The XXX and YYY conversions must be included in the supported List of UCS-2 Interchange Converters, and must be installed on the system.
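The two-step path through UCS-2 can be sketched in Python as follows. This is a minimal illustration and not the Universal UCS Converter itself. The function names and the AIX_TO_PYTHON table are hypothetical, the codec name mapping is an assumption, and Python's "utf-16-be" codec stands in for UCS-2 (the two agree byte for byte for characters in the Basic Multilingual Plane).

```python
# Hypothetical mapping from AIX code set names to Python codec names.
# This table is an assumption made for illustration.
AIX_TO_PYTHON = {
    "IBM-850": "cp850",
    "IBM-437": "cp437",
    "ISO8859-1": "latin-1",
}

# Python has no codec named UCS-2. UTF-16 big-endian without a byte order
# mark is used here as a stand-in (assumption).
UCS2 = "utf-16-be"


def to_ucs2(data, from_code):
    """XXX -> UCS-2: convert bytes in from_code to 16-bit UCS-2 units."""
    return data.decode(AIX_TO_PYTHON[from_code]).encode(UCS2)


def from_ucs2(ucs2_data, to_code):
    """UCS-2 -> YYY: convert 16-bit UCS-2 units to bytes in to_code."""
    return ucs2_data.decode(UCS2).encode(AIX_TO_PYTHON[to_code])


def convert_via_ucs2(data, from_code, to_code):
    """XXX -> UCS-2 -> YYY, using UCS-2 as the interchange medium."""
    return from_ucs2(to_ucs2(data, from_code), to_code)
```

For example, the cent sign is 0x9B in IBM-437 and 0xBD in IBM-850. The sketch converts it to the UCS-2 value 0x00A2 first and then to the target code point, so only one pair of converters per code set is needed rather than one converter for each pair of code sets.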
The universal converter is installed as the file /usr/lib/nls/loc/iconv/Universal_UCS_Conv. A new conversion can be supported by creating a new link with the appropriate name in the /usr/lib/nls/loc/iconv directory. For example, to support new converters between IBM-850 and IBM-437, you can execute the following commands:
ln -s /usr/lib/nls/loc/iconv/Universal_UCS_Conv /usr/lib/nls/loc/iconv/IBM-850_IBM-437
ln -s /usr/lib/nls/loc/iconv/Universal_UCS_Conv /usr/lib/nls/loc/iconv/IBM-437_IBM-850
Attention: If a converter link is created for incompatible code sets (for example, ISO8859-1 and IBM-eucJP) and the source data contains characters that do not exist in the target code set, significant data loss can result.
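The kind of data loss described in this warning can be demonstrated with a short Python sketch. It is an illustration under stated assumptions: lossy_convert is a hypothetical helper, Python's "euc_jp" and "latin-1" codecs stand in for IBM-eucJP and ISO8859-1, and the substitution of a question mark is a choice made here. How an AIX converter reacts to an unmappable character is not covered by this section.

```python
def lossy_convert(data, from_codec, to_codec, substitute=b"?"):
    """Convert data and count the characters the target cannot represent.

    Each character with no equivalent in to_codec is replaced by the
    substitute byte, so the original character cannot be recovered.
    """
    text = data.decode(from_codec)
    out = bytearray()
    lost = 0
    for ch in text:
        try:
            out += ch.encode(to_codec)
        except UnicodeEncodeError:
            # The target code set has no code point for this character.
            out += substitute
            lost += 1
    return bytes(out), lost
```

Converting a string of three Japanese characters followed by ASCII text from "euc_jp" to "latin-1" with this sketch keeps the ASCII text but replaces each of the three Japanese characters with the substitute byte.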
The conversion between multibyte and wide character code depends on the current locale setting. Do not exchange wide character codes between two processes unless you know that each locale that might be used handles wide character codes in a consistent fashion. Most AIX locales use the Unicode character value as a wide character code, except locales based on the IBM-850 and IBM-eucTW code sets.