Character: (1) The smallest component of a writing system or script that has semantic value. A character refers to an abstract idea rather than to the specific glyph or shape it might take when rendered or displayed. (2) A code element.
Charset: Stands for “character set.” A set of characters used in Windows. Charsets refer to the same collections of characters as those defined by Windows code pages.
CJK/CJKV: A reference to Chinese, Japanese, and Korean languages. Sometimes a “V” for Vietnamese is added to the acronym.
Code page: An ordered set of characters of a given script in which a numeric index (code-point value) is associated with each character. In this book, this term is generally used in the context of code pages defined by Windows and can also be called a “character set” or “charset.”
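As a rough illustration of how a code page ties a numeric index to a character, the sketch below decodes the same byte value under two Windows code pages. It assumes Python's built-in `cp1252` (Western European) and `cp1251` (Cyrillic) codecs as stand-ins for the Windows code pages; the variable names are illustrative only.

```python
# The same index (0xE9) names a different character in each code page.
raw = bytes([0xE9])

western = raw.decode("cp1252")   # Windows Western European code page
cyrillic = raw.decode("cp1251")  # Windows Cyrillic code page

# Compare the Unicode code points each code page maps the byte to.
western_code_point = ord(western)    # 0x00E9, LATIN SMALL LETTER E WITH ACUTE
cyrillic_code_point = ord(cyrillic)  # 0x0439, CYRILLIC SMALL LETTER SHORT I
```

The byte itself carries no meaning until a code page is chosen, which is why text exchanged without its code page can display as the wrong characters.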
Code point, or code element: (1) The minimum bit combination that can represent a unit of encoded text for processing or exchange. (2) An index into a code page or the Unicode standard.
Double-byte character set (DBCS): A character encoding in which the code points can be either 1 or 2 bytes. Used, for example, to encode Chinese, Japanese, and Korean languages.
GB 2312-80: A multibyte encoding standardized by the People’s Republic of China.
Multibyte character set (MBCS): A character encoding in which the code points can be 1, 2, or more bytes.
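The variable-width behavior described in the DBCS and MBCS entries can be sketched by counting the bytes each character occupies once encoded. This is a minimal example, assuming Python's `gb2312` codec as the double-byte case and UTF-8 as a convenient stand-in for an encoding with characters longer than 2 bytes; `encoded_lengths` is a hypothetical helper, not part of any library.

```python
def encoded_lengths(text, encoding):
    """Return (character, byte count) pairs for text in the given encoding."""
    return [(ch, len(ch.encode(encoding))) for ch in text]

# Double-byte case: ASCII letters take 1 byte, Chinese characters take 2.
dbcs = encoded_lengths("A\u4e2d", "gb2312")

# Multibyte stand-in (UTF-8): characters take 1, 2, 3, or 4 bytes.
mbcs = encoded_lengths("A\u00e9\u4e2d\U0001F600", "utf-8")

dbcs_sizes = [size for _, size in dbcs]  # [1, 2]
mbcs_sizes = [size for _, size in mbcs]  # [1, 2, 3, 4]
```

Because the byte count varies per character, code that steps through such text one byte at a time can land in the middle of a character.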
Unicode: A worldwide character encoding that includes most of the world’s scripts; it is developed, maintained, and promoted by the Unicode Consortium, a nonprofit computer industry organization. (The official Unicode Consortium Web site is http://www.unicode.org.)
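To make the idea of a Unicode code point concrete, the short sketch below looks up the numeric value and official name of a character. It assumes Python's standard `unicodedata` module; `describe` is a hypothetical helper introduced only for this illustration.

```python
import unicodedata

def describe(ch):
    """Format a character as its U+XXXX code point plus its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

latin = describe("A")     # "U+0041 LATIN CAPITAL LETTER A"
han = describe("\u4e2d")  # "U+4E2D CJK UNIFIED IDEOGRAPH-4E2D"
```

Unlike a code page index, the code point here does not depend on which script or locale is in use.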