Private Use (Unicode)
In Unicode, Private Use is a concept to allow characters to be defined and used by private agreement between parties (that is, not involving the Unicode Consortium), using unspecified code points in a Private Use Area or range. The private agreement may be published, and often is. Such publication may include a font that supports the definition (showing the glyphs), and processes to support privately-defined graphic or even control effects (e.g. a clickable <do print> character). As a stability rule, the Unicode Standard guarantees these Private Use code points will never be assigned regular characters, so Unicode will never interfere with the private agreement.
For example, Apple Inc. has published the Apple logo to be encoded at Private-use code point U+F8FF <private-use-F8FF>, and maintains this in its fonts and systems.
By definition, multiple private parties may assign different characters to the same code point, with the consequence that a user whose system uses the wrong font may see characters from another definition set.
Definition
Unicode defines that Private-use code points are assigned characters (as opposed to, say, reserved code points), but no specifics are defined, and properties can be overruled by the private agreement. Part of the stability of the standard is that these code points will never be assigned a regular Unicode character:
Characters in these [Private Use] areas will never be defined by the Unicode Standard. These code points can be freely used for characters of any purpose, but successful interchange requires an agreement between sender and receiver on their interpretation.[1][2]
All Private-use characters have General Category=Other, private use (Co).
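The General Category test can be sketched in Python with the standard `unicodedata` module. This is a minimal illustration rather than a definitive implementation: the helper name `is_private_use` and the sample list are assumptions introduced here, and the result reflects the Unicode data bundled with the Python interpreter in use.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """Return True if the single character ch has General Category Co.

    Hypothetical helper name; it only wraps unicodedata.category().
    """
    return unicodedata.category(ch) == "Co"


# One sample from each Private Use Area boundary, plus an ordinary letter.
samples = ["\uE000", "\uF8FF", "\U000F0000", "\U0010FFFD", "A"]
flags = [is_private_use(ch) for ch in samples]

for ch, flag in zip(samples, flags):
    print(f"U+{ord(ch):04X} private use: {flag}")
```

Note that the category says nothing about what the character means; that is still determined only by the private agreement.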
Private Use Area
There are three blocks of private-use code points, each called a Private Use Area. The Basic Multilingual Plane (plane 0) contains the block Private Use Area with 6,400 code points, and planes 15 and 16 contain the blocks Supplemental Private Use Area-A and Supplemental Private Use Area-B respectively, with 65,534 code points each. In UTF-16, code points in the two private-use planes are represented by surrogate pairs taken from the BMP. The high surrogates are those in the BMP block High Private Use Surrogates (U+DB80..U+DBFF, 128 code points), combined with any of the low surrogates (U+DC00..U+DFFF, 1,024 code points). The one-to-one mapping between a surrogate pair and a U+xxxxxx code point is defined by UTF-16.
Unicode: Private Use Areas (Definition: General Category=Co)
Range | Plane | Block name | Number of code points | Note |
---|---|---|---|---|
U+E000..U+F8FF | BMP (0) | Private Use Area | 6400 | |
U+F0000..U+FFFFD | PUP (15) | Supplemental Private Use Area-A | 65534 | Based on block High Private Use Surrogates (U+DB80..U+DBFF) in BMP, using UTF-16. |
U+100000..U+10FFFD | PUP (16) | Supplemental Private Use Area-B | 65534 | Based on block High Private Use Surrogates (U+DB80..U+DBFF) in BMP, using UTF-16. |
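The ranges in the table and the surrogate-pair arithmetic of UTF-16 can be checked with a short Python sketch. The names `PRIVATE_USE_RANGES`, `in_private_use_area` and `utf16_surrogates` are assumptions made for this illustration. The range values are taken from the table, and the surrogate computation is the standard UTF-16 formula.

```python
# Inclusive code point ranges of the three Private Use Areas (from the table).
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),       # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),     # Supplemental Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),   # Supplemental Private Use Area-B (plane 16)
]


def in_private_use_area(cp: int) -> bool:
    """Return True if code point cp lies in one of the three ranges."""
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Return the (high, low) UTF-16 surrogate pair for a supplementary code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP use surrogate pairs")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# Number of code points in each range.
sizes = [hi - lo + 1 for lo, hi in PRIVATE_USE_RANGES]

first_pair = utf16_surrogates(0xF0000)    # first code point of plane 15
last_pair = utf16_surrogates(0x10FFFD)    # last private-use code point of plane 16
print(sizes, [hex(u) for u in first_pair + last_pair])
```

The first and last high surrogates produced this way are U+DB80 and U+DBFF, which matches the bounds of the High Private Use Surrogates block.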
Private Use in other encodings
In earlier encodings, the concept of private use was present. East Asian systems used End User Character Definition (EUCD)[1].
In ISO-8859-1 (and many other ASCII-compatible character encodings), the C1 control block contains two codes intended for private use "control functions" by ECMA-48: 0x91 private use one (PU1) and 0x92 private use two (PU2).[3][4] Unicode includes these at U+0091 <control-0091> and U+0092 <control-0092> but defines them as control characters (category Cc), not private use characters (category Co).[5][6]
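As a small illustration of this distinction, the following Python sketch decodes the two bytes from ISO-8859-1 and looks up their General Category. The variable names are assumptions made for this example, and the categories come from the Unicode data shipped with the Python interpreter.

```python
import unicodedata

# Decode the ECMA-48 private-use control bytes PU1 and PU2 as ISO-8859-1.
pu1 = bytes([0x91]).decode("iso-8859-1")
pu2 = bytes([0x92]).decode("iso-8859-1")

# Map each resulting code point to its General Category.
categories = {f"U+{ord(ch):04X}": unicodedata.category(ch) for ch in (pu1, pu2)}
print(categories)
```

Both entries report Cc rather than Co, so a test based on General Category alone will not treat PU1 and PU2 as private-use characters.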
The Chinese National Standard 11643 (CNS 11643) is an encoding independent of Unicode. Within this standard, planes 12 to 15 are designated for user-defined characters.
Usage
Coordinated private use, and publication in Unicode
Many people and institutions have created character collections for the PUA. Some of these private use agreements are published, so other PUA implementers can aim for unused or less used code points to prevent overlaps. Several characters and scripts previously encoded in private use agreements have since been fully encoded in Unicode, necessitating mappings from the PUA to other Unicode code points.
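Such a mapping can be applied to existing text with a simple translation table. The Python sketch below is only an illustration: the table `PUA_TO_STANDARD` is entirely hypothetical (it maps two private-use code points to the letters A and B so the effect is easy to see) and does not reproduce any published agreement, and the function name `migrate` is likewise an assumption.

```python
# HYPOTHETICAL mapping from private-use code points to standard code points.
# A real table would come from the published private agreement.
PUA_TO_STANDARD = {
    0xE000: 0x0041,  # placeholder target: LATIN CAPITAL LETTER A
    0xE001: 0x0042,  # placeholder target: LATIN CAPITAL LETTER B
}


def migrate(text: str) -> str:
    """Replace mapped private-use characters; leave everything else untouched."""
    return text.translate(PUA_TO_STANDARD)


# U+E002 is not in the table, so it survives unchanged.
converted = migrate("x\uE000\uE001\uE002")
print([f"U+{ord(ch):04X}" for ch in converted])
```

Unmapped private-use characters are deliberately left in place, since their meaning cannot be recovered without the agreement that defined them.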
One of the more well-known and broadly implemented PUA agreements is maintained by the ConScript Unicode Registry (CSUR). The CSUR, which is not officially endorsed or associated with the Unicode Consortium, provides a mapping for constructed scripts, such as Klingon pIqaD and Ferengi script (Star Trek), Tengwar and Cirth (J.R.R. Tolkien's cursive and runic scripts), Alexander Melville Bell's Visible Speech, and Dr. Seuss' alphabet from On Beyond Zebra. The CSUR previously encoded the undeciphered or constructed scripts Phaistos, Shavian, and Deseret, which have all been accepted for official encoding in Unicode.
Another common PUA agreement is maintained by the Medieval Unicode Font Initiative (MUFI). This project is attempting to support all of the scribal abbreviations, ligatures, and alternate letterforms found in medieval texts written in the Latin alphabet. The express purpose of MUFI is to experimentally determine which characters are necessary to represent these texts, and to have those characters officially encoded in Unicode. As of Unicode version 5.1, 152 MUFI characters have been incorporated into the official Unicode encoding.
Example code point U+F8FF
Unicode code point U+F8FF is the last code point in the Private Use Area of the BMP. Its meaning and appearance vary depending on the font in use, but its usage in several fonts makes it the most notable code point in the private use area.
- Some early Tengwar fonts map Elvish characters to it.
- The Imitari font draws it as a capital eth.
- The font Luxi draws it as the euro sign.
- The font "Standard Symbols L" uses it as one of the box drawing characters.
- The official PRC standard on precomposed Tibetan uses the codepoint for the Tibetan syllable "hwo".
- Some font makers place a copyright statement or other creator's mark at that code point.
- For example, the dingbats font "DavysDingbats" uses it to display a face, presumably that of the font's creator.
- In most Apple-supplied fonts, it represents the Apple logo, or an early version of the command key.
- The ConScript Unicode Registry suggests it be used for the Klingon glyph "KLINGON MUMMIFICATION GLYPH." This is followed by e.g. Code2000.
- In Wingdings 1, U+F8FF is the Windows logo. On some computers, however, the logo is at U+F000 instead of U+F8FF.
References
- [1] Unicode Standard, chapter 16.5: Private Use characters
- [2] Unicode Standard, chapter 2: General Structure
- [3] Standard ECMA-48, Fifth Edition, June 1991, §8.2.14 Miscellaneous control functions, §8.3.100, §8.3.101
- [4] ISO C1 Control Character Set of ISO 6429 (1983)
- [5] UnicodeData.txt
- [6] Unicode 6.1.0, Chapter 4, Table 4-9