1 Internationalization Overview
Internationalization is the process of designing an application so that it can be adapted to
various languages and regions without engineering changes. Sometimes the term
internationalization is abbreviated as i18n, because there are 18 letters between the first "i"
and the last "n."
An internationalized program has the following characteristics:
• With the addition of localization data, the same executable can run worldwide.
• Textual elements, such as status messages and the GUI component labels, are not
hardcoded in the program. Instead they are stored outside the source code and retrieved
dynamically.
• Support for new languages does not require recompilation.
• Culturally dependent data, such as dates and currencies, appear in formats that conform
to the end user's region and language.
• It can be localized quickly.
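As a rough illustration of the second and third characteristics above, the following sketch looks up user-visible text by key instead of hardcoding it at the point of use. The class name ExternalizedLabels, the keys, and the two inline strings are hypothetical; the strings stand in for separate properties files (for example Labels_en.properties and Labels_de.properties) that a real application would typically load with ResourceBundle.getBundle().

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class ExternalizedLabels {
    // Hypothetical stand-ins for the contents of two properties files.
    static final String ENGLISH = "status.saved=File saved\nbutton.cancel=Cancel\n";
    static final String GERMAN = "status.saved=Datei gespeichert\nbutton.cancel=Abbrechen\n";

    // Parses properties-format text into a bundle, as if it had been read from a file.
    static ResourceBundle load(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The program refers to text only by key; the wording comes from the bundle.
    static String label(ResourceBundle bundle, String key) {
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        System.out.println(label(load(ENGLISH), "status.saved"));
        System.out.println(label(load(GERMAN), "status.saved"));
    }
}
```

Adding another language in this arrangement means supplying another set of key-value pairs; the lookup code itself does not change.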
The internet demands global software - that is, software that can be developed independently
of the countries or languages of its users, and then localized for multiple countries or regions.
The Java Platform provides a rich set of APIs for developing global applications. These
internationalization APIs are based on the Unicode standard and include the ability to adapt
text, numbers, dates, currency, and user-defined objects to any country's conventions.
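A minimal sketch of adapting numbers to a country's conventions, using java.text.NumberFormat. The class name LocaleNumbers and the sample value are illustrative only; exact output for some other locales can depend on the locale data shipped with the runtime.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleNumbers {
    // Formats the same value using the grouping and decimal symbols of the given locale.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.891;
        System.out.println(format(value, Locale.US));      // 1,234,567.891
        System.out.println(format(value, Locale.GERMANY)); // 1.234.567,891
    }
}
```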
This guide summarizes the internationalization APIs and features of the Java Platform,
Standard Edition. For coding examples and step-by-step instructions, see the
Internationalization Trail in the Java Tutorials.
Text Representation
The Java programming language is based on the Unicode character set, and several libraries
implement the Unicode standard. Unicode is an international character set standard that
supports all of the major scripts of the world, as well as common technical symbols. The
original Unicode specification defined characters as fixed-width 16-bit entities, but the
Unicode standard has since been changed to allow for characters whose representation
requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF.
UTF-16, an encoding defined by the standard, represents every Unicode code point using
one or two 16-bit code units.
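The one-or-two-unit rule can be checked with static methods of java.lang.Character. This is a small sketch; the class name Utf16Units and the sample code points are chosen for illustration.

```java
public class Utf16Units {
    // Number of 16-bit code units UTF-16 needs for this code point (1 or 2).
    static int unitCount(int codePoint) {
        return Character.charCount(codePoint);
    }

    // The UTF-16 code units for this code point, rendered as hexadecimal.
    static String unitsAsHex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(unitsAsHex(0x0041));   // 0041 (one code unit)
        System.out.println(unitsAsHex(0x1F600));  // D83D DE00 (a surrogate pair)
        System.out.println(Character.isValidCodePoint(0x10FFFF)); // true
        System.out.println(Character.isValidCodePoint(0x110000)); // false
    }
}
```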
The primitive data type char in the Java programming language is an unsigned 16-bit integer
that can represent a Unicode code point in the range U+0000 to U+FFFF, or a single code unit
of UTF-16. The various types and classes in the Java platform that represent character
sequences - char[], implementations of java.lang.CharSequence (such as the String
class), and implementations of java.text.CharacterIterator - are UTF-16 sequences.
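Because a String is a UTF-16 sequence, its length() counts code units rather than code points. The sketch below, with the illustrative class name CodeUnitsVsCodePoints, builds a string containing one supplementary character (U+1D11E, MUSICAL SYMBOL G CLEF) to show the difference.

```java
public class CodeUnitsVsCodePoints {
    // "a" followed by U+1D11E, which needs two char values in UTF-16.
    static String sample() {
        return "a" + new String(Character.toChars(0x1D11E));
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());      // 3 code units
        System.out.println(codePoints(s));   // 2 code points
        // charAt sees one half of the surrogate pair; codePointAt sees the full character.
        System.out.println(Character.isHighSurrogate(s.charAt(1)));  // true
        System.out.println(Integer.toHexString(s.codePointAt(1)));   // 1d11e
    }
}
```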
Most Java source code is written in ASCII, a 7-bit character encoding, or ISO-8859-1, an 8-bit
character encoding, but is translated into UTF-16 before processing.
The Character class is an object wrapper for the char primitive type. The Character
class also contains static methods such as isLowerCase() and isDigit() for