History
Based on experiences with the Xerox Character Code Standard (XCCS) since 1980, the origins of Unicode can be traced back to 1987, when Joe Becker from Xerox, together with Lee Collins and Mark Davis from Apple, started investigating the practicalities of creating a universal character set. Unicode Bulldog Award recipients include many people influential in the development of Unicode, among them Tatsuo Kobayashi, Thomas Milo, Roozbeh Pournader, Ken Lunde, and Michael Everson.
Unicode, in intent, encodes the underlying characters (graphemes and grapheme-like units) rather than the variant glyphs (renderings) for such characters. In the case of Chinese characters, this sometimes leads to controversies over distinguishing the underlying character from its variant glyphs (see Han unification). In text processing, Unicode takes the role of providing a unique code point (a number, not a glyph) for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering (size, shape, font, or style) to other software, such as a web browser or word processor.
This simple aim becomes complicated, however, because of concessions made by Unicode's designers in the hope of encouraging more rapid adoption of Unicode. The first 256 code points were made identical to the content of ISO/IEC 8859-1 so as to make it trivial to convert existing Western text. Many essentially identical characters were encoded multiple times at different code points to preserve distinctions used by legacy encodings and therefore allow conversion from those encodings to Unicode (and back) without losing any information. For example, the "fullwidth forms" section of code points encompasses a full duplicate of the Latin alphabet because Chinese, Japanese, and Korean (CJK) fonts contain two versions of these letters: "fullwidth" matching the width of the CJK characters, and normal width. For other examples, see duplicate characters in Unicode.
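The two compatibility concessions described above, the ISO/IEC 8859-1 overlap and the fullwidth duplicates, can be checked with a short sketch. It assumes Python 3 and its standard `unicodedata` module; the variable names are illustrative only and do not come from the Unicode Standard.

```python
import unicodedata

# The first 256 code points coincide with ISO/IEC 8859-1 (Latin-1),
# so every Latin-1 byte value decodes to the code point with the same number.
latin1_bytes = bytes(range(256))
latin1_text = latin1_bytes.decode("latin-1")
assert [ord(ch) for ch in latin1_text] == list(range(256))

# Converting back loses nothing (the round trip mentioned in the text).
assert latin1_text.encode("latin-1") == latin1_bytes

# A fullwidth letter is a separate code point from its ordinary counterpart.
fullwidth_a = "\uFF21"
print(f"U+{ord(fullwidth_a):04X}", unicodedata.name(fullwidth_a))
print("East Asian width:", unicodedata.east_asian_width(fullwidth_a))

# Compatibility normalization (NFKC) folds the duplicate onto plain "A".
folded = unicodedata.normalize("NFKC", fullwidth_a)
print("NFKC folds to ASCII A:", folded == "A")
```

Because the fullwidth letter keeps its own code point, text converted from a legacy CJK encoding can be converted back without guessing which width was meant. Folding with NFKC discards that distinction, so it only suits cases such as search or comparison.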
Unicode has the explicit aim of transcending the limitations of traditional character encodings, such as those defined by the ISO/IEC 8859 standard, which find wide usage in various countries of the world but remain largely incompatible with each other. Many traditional character encodings share a common problem in that they allow bilingual computer processing (usually using Latin characters and the local script), but not multilingual computer processing (computer processing of mixed arbitrary scripts).
Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including modern operating systems, XML, JSON, and most modern programming languages, sometimes only in UTF-8 form. The Unicode character repertoire is synchronized with ISO/IEC 10646, each being code-for-code identical to the other.
The Unicode Standard, however, includes more than just the base code. Alongside the character encodings, the Consortium's official publication includes a wide variety of details about the scripts and how to display them: normalization rules, decomposition, collation, rendering, bidirectional text display order for multilingual texts, and so on. The Standard also includes reference data files and visual charts to help developers and designers correctly implement the repertoire.
Unicode can be stored using several different encodings, which translate the character codes into sequences of bytes. The Unicode Standard defines three encodings (UTF-8, UTF-16, and UTF-32), but several others exist, mostly variable-length encodings. The most common encodings are the ASCII-compatible UTF-8, the ASCII-incompatible UTF-16 (compatible with the obsolete UCS-2), and GB18030, the Chinese encoding standard, which is not an official Unicode encoding but is used in China and implements Unicode fully.
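To make the difference between a code point and its stored bytes concrete, here is a minimal sketch assuming Python 3 and its built-in codecs. The helper name `encoded_forms` and the choice of sample characters are hypothetical and used only for illustration. The big-endian codec variants are chosen so that no byte order mark is prepended.

```python
def encoded_forms(text):
    """Return the bytes produced for `text` by several encodings (illustrative helper)."""
    codec_names = ["utf-8", "utf-16-be", "utf-32-be", "gb18030"]
    return {name: text.encode(name) for name in codec_names}

# One ASCII letter, one BMP symbol (euro sign), one character outside the BMP.
for ch in ["A", "\u20ac", "\U0001F600"]:
    forms = encoded_forms(ch)
    print(f"U+{ord(ch):04X}", {name: data.hex() for name, data in forms.items()})
    # Each byte sequence decodes back to the same code point.
    assert all(data.decode(name) == ch for name, data in forms.items())
```

The same code point yields byte sequences of different lengths depending on the encoding: "A" is a single byte in UTF-8 and GB18030 but two bytes in UTF-16, while a character outside the Basic Multilingual Plane needs a surrogate pair (four bytes) in UTF-16.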
Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines, as of version 15.0, 149,186 characters covering 161 modern and historic scripts, as well as symbols, 3,664 emoji (including in color), and non-visual control and formatting codes.