Unicode input

Unicode input is the insertion of a specific Unicode character on a computer by a user. Unicode characters can be entered in two ways: by selecting the character on screen from an applet, or by typing a certain key sequence on the keyboard. Many systems provide support for Unicode input in some form.

A Unicode input system should provide a large repertoire of characters, ideally all valid Unicode code points. This differs from a keyboard layout, which defines keys and key combinations only for a limited number of characters appropriate for a certain locale.

KCharSelect picks some of Unicode Mathematical Operators

1 Unicode numbers
- 1.1 Decimal input
- 1.2 Unicode in HTML
2 Availability
3 Selection from a screen
4 Hexadecimal code input
5 Character mnemonics
6 Specialized tools
7 See also
8 External links
9 References

Unicode numbers

Unicode characters are distinguished by code points, which are conventionally written as "U+" followed by four to six hexadecimal digits, for example U+00AE or U+1D310. Characters in the Basic Multilingual Plane (BMP), which contains modern scripts – including many Chinese and Japanese characters – and many symbols, have a 4-digit code. Historic scripts, but also many modern symbols and pictographs (such as emoticons, playing cards and many CJK characters), have 5-digit codes.
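The notation can be illustrated with a short Python sketch. The function name describe_code_point is made up for this example; the only facts it relies on are that each plane holds 0x10000 code points and that the code space ends at U+10FFFF.

```python
def describe_code_point(cp: int) -> str:
    """Format a code point in U+ notation and say which plane it is in."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp}")
    plane = cp >> 16  # each plane holds 0x10000 code points
    where = "BMP" if plane == 0 else f"plane {plane}"
    # Pad to at least four hex digits; larger values use as many as needed.
    return f"U+{cp:04X} ({where})"


print(describe_code_point(0xAE))     # U+00AE (BMP)
print(describe_code_point(0x1D310))  # U+1D310 (plane 1)
```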

Decimal input

In some applications on Microsoft Windows, particularly those using the RichEdit control, decimal Unicode code points (for example, 256 for U+0100) are supported with Alt codes.
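The relationship between the decimal and hexadecimal forms is plain base conversion. The following Python sketch shows only that arithmetic; it does not model how Windows or RichEdit process Alt codes, and the function name is hypothetical.

```python
def decimal_to_char(decimal_digits: str) -> str:
    """Interpret a string of decimal digits as a Unicode code point."""
    cp = int(decimal_digits, 10)
    return chr(cp)


ch = decimal_to_char("256")
# 256 decimal is 0x100, so the result is U+0100.
print(f"256 -> U+{ord(ch):04X} -> {ch}")
```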

Unicode in HTML

Main article: Numeric character reference

HTML uses a different syntax for code points. A character code is written after an ampersand (&) and a number sign (#), and is followed by a semicolon (;). The number can be given in either decimal or hexadecimal. Leading zeros may be omitted. If the number is hexadecimal, it is preceded by an "x". Some characters can also be referenced by an "entity name".
Example: The HTML code of the copyright sign U+00A9 (©) can be:
&#169; (decimal input)
&#xA9; (hexadecimal input)
&copy; (entity name)
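The three forms, written out as &#169;, &#xA9; and &copy;, can be checked with Python's standard html module. The helper numeric_references below is a name invented for this sketch.

```python
import html


def numeric_references(ch: str):
    """Return the decimal and hexadecimal numeric references for one character."""
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"


dec_ref, hex_ref = numeric_references("\u00a9")
print(dec_ref, hex_ref)  # &#169; &#xA9;

# All of these decode to the copyright sign, including forms with leading zeros.
for ref in (dec_ref, hex_ref, "&copy;", "&#0169;", "&#x00A9;"):
    assert html.unescape(ref) == "\u00a9"
```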

Availability

To display a Unicode character, the chosen font must contain it.^[1] Each font covers only its own subset of characters, so for any given font most characters will not be available. If a desired character is not present in the available fonts, a suitable font should be installed on the system. The character can then be displayed on that system, but if it is a relatively rare one, it will not be visible on many other systems, which will show an empty box, a question mark or another replacement instead: �.

Older web browsers can only display text supported by the current font associated with the character encoding of the page. Some modern browsers, such as Mozilla Firefox, Opera, Safari and Internet Explorer (version ≥ 7), are able to display multilingual web pages by choosing a suitable font for each individual character on the page. They will correctly display any mix of Unicode blocks, as long as appropriate fonts are present in the operating system. After a missing font is installed, the browser will find the correct character automatically once it has been restarted.

Selection from a screen

Applet for character selection

Many systems provide a way to select Unicode characters visually. ISO 14755 refers to this as a screen-selection entry method.

Microsoft Windows has provided a Unicode version of the Character Map program since version NT 4.0 – appearing in the consumer edition since XP. This is limited to characters in the Basic Multilingual Plane (BMP). Characters are searchable by Unicode character name, and the table can be limited to a particular code block.

Andrew West's BabelMap freeware is a more advanced character table with good support for CJK languages. Output options include Unicode, Numeric character references or character names. It can be used through a web interface as well.

Mac OS X provides a "character palette" with much the same functionality, along with searching by related characters, glyph tables in a font, etc. It can be enabled in the input menu in the menu bar under System Preferences → International → Input Menu (or System Preferences → Language and Text → Input Sources) or can be viewed under Edit → Special Characters... while Finder is in the foreground.

Equivalent tools – such as gucharmap (GNOME) or kcharselect (KDE) – exist on most Linux desktop environments.

Hexadecimal code input

Different glyphs of Unicode U+0061.

Clause 5.1 of ISO 14755 describes a Basic method whereby a beginning sequence is followed by the hex number representation of the code point and the ending sequence. On some systems, this is limited to the BMP (characters up to U+FFFF).
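What happens to the digits typed between the beginning and ending sequences can be sketched in Python as follows. This is an illustration only, not text from ISO 14755: the function name and the bmp_only flag are assumptions that model a system restricted to the BMP.

```python
import string


def decode_hex_entry(hex_digits: str, bmp_only: bool = False) -> str:
    """Turn the hex digits typed by the user into a single character."""
    if not hex_digits or any(c not in string.hexdigits for c in hex_digits):
        raise ValueError("expected one or more hexadecimal digits")
    cp = int(hex_digits, 16)
    if cp > 0x10FFFF:
        raise ValueError("beyond the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    if bmp_only and cp > 0xFFFF:
        raise ValueError("this system accepts BMP code points only")
    return chr(cp)


print(decode_hex_entry("61"))     # a
print(decode_hex_entry("1D310"))  # a character outside the BMP
```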

Different fonts have different glyphs for the same Unicode code point, so the appearance of a character depends on the font selected in the web browser or application. Also, not every code point is available in every font.

In Microsoft Windows

In Microsoft Windows, if the registry key HKEY_CURRENT_USER\Control Panel\Input Method\EnableHexNumpad has a string value of "1", a character can be entered by holding down Alt, pressing the + key on the numeric keypad, typing the hex code (using the main letter keys and any of the number keys), and then releasing Alt.^[1] (After setting this registry key, one must log off and back on in Windows 7, or reboot on earlier systems, before this input method starts working.)

The RichEdit control on Microsoft Windows (as used, for example, in WordPad) supports the following input method: one first enters the character’s hexadecimal code (between two and six hexadecimal digits), then immediately presses Alt+x. For example, entering f1 and then pressing the combination will produce the character ñ. The code must not be preceded by any digit or by the letters a–f, as these would be treated as part of the code to be converted. This also works in Microsoft Word 2002/2003 for Windows.
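The pitfall with preceding digits or letters a–f can be seen in a simplified Python model. This is a sketch of the behaviour described above, not RichEdit's actual algorithm; the function name and the rule of taking the longest trailing run of up to six hex digits are assumptions.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def alt_x(text_before_cursor: str, max_digits: int = 6) -> str:
    """Replace the trailing run of hex digits with the character it encodes."""
    start = len(text_before_cursor)
    # Walk backwards over at most max_digits hexadecimal digits.
    while (start > 0
           and len(text_before_cursor) - start < max_digits
           and text_before_cursor[start - 1] in HEX_DIGITS):
        start -= 1
    digits = text_before_cursor[start:]
    if len(digits) < 2:
        return text_before_cursor  # too short to count as a code
    cp = int(digits, 16)
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return text_before_cursor  # not a valid character
    return text_before_cursor[:start] + chr(cp)


print(alt_x("f1"))     # ñ
print(alt_x("be f1"))  # "be ñ": the space ends the run of hex digits
print(alt_x("bef1"))   # "be" is absorbed into the code, giving U+BEF1
```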

In Mac OS

In Mac OS X, the "Special Characters…" command (⌘ Command+⌥ Option+T) can be found in the Edit menu of every program. It brings up the Characters palette, which lets the user choose any character from a variety of views. The user can also search for a character or Unicode plane by name.^[2]

In Mac OS 8.5 and later, one can choose the Unicode Hex Input keyboard layout. While holding down ⌥ Option, one types the four-digit hexadecimal code point, and the equivalent character appears; one can then release the ⌥ Option key.^[3]

Characters outside the BMP exceed the four-digit limit of the Unicode Hex Input mechanism, but they can be entered using the search box in the Character Viewer (Edit → Special Characters…) or by using surrogate pairs.^[4] To use surrogate pairs, hold down the ⌥ Option key, type the first surrogate, the + key (the shift key is ignored) and the second surrogate, then release the ⌥ Option key.
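The two surrogates for a character outside the BMP follow from the standard UTF-16 arithmetic, sketched here in Python. The function name is chosen for this example.

```python
def to_surrogate_pair(cp: int):
    """Split a supplementary code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF have surrogate pairs")
    offset = cp - 0x10000            # a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1D310)
print(f"{high:04X} {low:04X}")  # D834 DF10
```

For U+1D310, the digits to type would therefore be D834 and DF10.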

In X11 (Linux and other Unix variants)

The possibility of hexadecimal code input on operating systems using the X Window System depends on the system and applications. Hex input is not implemented in the common X.Org Server^[5]; in practice the mnemonic Compose key subsystem of X supplants it. Individual input methods and GUI toolkits can provide hex input independent of the X server.

Qt and KDE rely on the standard X Input Method (XIM) framework, and do not implement their own solutions.^[6]

GTK+ is an ISO 14755-conformant system. The beginning sequence is Ctrl+⇧ Shift+U and the ending sequence is ↵ Enter. Programs based on GTK+, such as GNOME applications, support Unicode input.

There are two common methods for direct input of Unicode characters:

Hold Ctrl+⇧ Shift and type u followed by the four hex digits. Then release Ctrl + Shift.
Enter Ctrl+⇧ Shift+u, then type the four hex digits, and press ↵ Enter.

In OpenOffice.org and Inkscape, for example, only the second method works.

In a terminal, these input methods may not be supported, but using escape sequences is an alternative.
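As an illustration of the escape-sequence approach, the Python string literals below name a character by its code point or its Unicode name. Python is used here only as an example; the escape syntax of a particular shell may differ.

```python
gear_4 = "\u2699"        # four hex digits, for BMP characters
gear_8 = "\U00002699"    # eight hex digits, for any code point
gear_name = "\N{GEAR}"   # by Unicode character name

# All three spellings denote the same one-character string.
assert gear_4 == gear_8 == gear_name == chr(0x2699)

beyond_bmp = "\U0001D310"  # characters above U+FFFF need the eight-digit form
print(gear_4, beyond_bmp)
```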

In platform independent applications

In Emacs, one uses Meta+x ucs-insert and then enters the hexadecimal code point.

In Opera (browser), enter the hexadecimal number of the desired symbol or character and then press Ctrl+⇧ Shift+x (alternative shortcut Meta+⇧ Shift+x on OS X).

In the Vim editor, the user first types Ctrl+V u, then types in the hexadecimal number of the symbol or character desired, and it will be converted into the symbol. (On Microsoft Windows, Ctrl+Q may be required instead of Ctrl+V.^[7])

Vim can also define custom mnemonics on an ad hoc basis, as described below; these definitions require the code point in decimal.

Character mnemonics

RFC 1345 defines a large number (1,893) of suggested mnemonics for code points in Unicode 1.0 (as well as characters in ISO 2DIS 10646 and many other character sets in use at the time of publication). Although the document does not restrict the length of a mnemonic (for example, "10000R" for U+2821), most of the mnemonics (1,338) are two characters long, and most of the remainder (416) are three characters long. Although the list was never complete and targets obsolescent character set definitions, the mnemonics themselves can still be used.

Vim allows mnemonics entry (confusingly called "digraphs" by Vim developers) in insert mode (the regular mode for typing text) with Ctrl+K followed by a two-keystroke RFC 1345 mnemonic; or, in addition, if the digraph option is set, by entering the first character followed by a backspace followed by the second character. Custom mnemonics can also be defined for arbitrary code points. (For example, "dig Gr 9881" associates "Gr" with U+2699 ⚙ gear.)
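Because the digraph definition takes a decimal number, a code point usually quoted in hexadecimal has to be converted first. The following Python sketch builds such a command line; the helper name is hypothetical, and the output is simply text to be typed or pasted into Vim.

```python
def vim_digraph_command(mnemonic: str, code_point_hex: str) -> str:
    """Build a Vim digraph definition from a two-character mnemonic and a hex code point."""
    if len(mnemonic) != 2:
        raise ValueError("a Vim digraph mnemonic has exactly two characters")
    decimal = int(code_point_hex, 16)  # Vim expects the code point in decimal
    return f"digraph {mnemonic} {decimal}"


print(vim_digraph_command("Gr", "2699"))  # digraph Gr 9881
```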

GNU Emacs allows mnemonics entry by switching to rfc1345 input mode (by default Ctrl+x Ctrl+\).

GNU Screen allows mnemonics entry with (by default) Ctrl+A Ctrl+V.

Zsh allows mnemonics entry using the insert-composed-char widget.

RFC 1345 predates the introduction of the Euro sign (€, U+20AC), but the above applications included it as the mnemonic "Eu".

Specialized tools

There are several tools that allow quick input of Unicode characters in applications. The free tool ЮNICODE Keyboard Enhancer uses hotkeys: to type a Unicode character, one presses and holds the modifier key and then presses the selected symbol key. The user defines which physical key acts as the modifier key and which keys serve as symbol keys.

Another free tool is UnicodeIt. It converts LaTeX expressions like \alpha into Unicode. On the Mac, this works in most programs (including Keynote and Mail) using a keyboard shortcut. This tool also has an online version at http://www.unicodeit.net which works on most platforms including smart phones.
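The general idea of such a conversion can be sketched with a lookup table in Python. This is not UnicodeIt's implementation; the table below is a tiny hypothetical sample and the function name is invented for the example.

```python
import re

# A small illustrative sample; a real tool would cover far more commands.
SYMBOLS = {"alpha": "α", "beta": "β", "to": "→", "infty": "∞"}


def latex_to_unicode(text: str) -> str:
    """Replace known LaTeX commands with Unicode characters; leave others untouched."""
    def replace(match):
        return SYMBOLS.get(match.group(1), match.group(0))
    return re.sub(r"\\([A-Za-z]+)", replace, text)


print(latex_to_unicode(r"\alpha \to \infty"))  # α → ∞
```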

There is also a free web tool called Shapecatcher that can be used to find Unicode characters by drawing them.

External links

Unicode Code Converter
Interactive Unicode Converter
ЮNICODE Keyboard Enhancer – type Unicode characters in (almost) any Unicode-compatible application
How to enter Unicode characters in Microsoft Windows

References
