One of the questions that I see come up a lot is about how Japanese text can be written using a computer, mobile phone, or other electronic device. Since it is a fairly important process for understanding the development of technology in Japan, I thought I would detail the process here.
Basics – Types of Japanese Characters
Japanese uses three separate writing systems: katakana and hiragana, which are phonetic syllabaries (collectively referred to as kana) that each consist of roughly 50 base characters (and which are phonetically equivalent); and kanji, which is a logographic writing system originating from China, of which 6,355 characters are defined in the Japanese Industrial Standards (JIS X 0208).
Traditionally, all Japanese characters are written in a square grid of non-proportional, 1:1 aspect ratio characters (school notebooks for practicing writing use a square grid). By comparison, 7-bit ASCII characters (i.e. English letters and numbers) are generally rectangular with a roughly 1:2 aspect ratio. In fixed-width computer systems, it is therefore convenient to render Japanese characters in the same area as two consecutive English characters. From a Japanese perspective, the square Japanese characters are the normal characters and are therefore called full-width characters, while English characters that only take up half of that width are called half-width characters. However, due to all kinds of technical limitations, early computer systems actually used a half-width version of the katakana writing system that was developed specifically for computers, with the earliest Japanese encoding standard JIS X 0201 consisting only of half-width alphanumerics (i.e. 7-bit ASCII) and half-width katakana characters. Full-width katakana characters were only added later, along with full-width hiragana and kanji, as multi-byte encodings were developed. The JIS standards also define various Greek, Cyrillic and symbol characters in the standard Japanese encodings, which are displayed as full-width characters in Japanese fonts. These character sets are summarized by the following screen shot (taken from the Windows XP command prompt).
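The width categories described above can also be inspected programmatically. The following is a minimal Python sketch, not part of the original setup: it uses the Unicode East Asian Width property (via the standard `unicodedata` module) as a stand-in for "full-width" versus "half-width", and Python's `shift_jis` codec as one example of an encoding that combines the single-byte JIS X 0201 characters with the double-byte JIS X 0208 characters. The sample characters and the `display_cells` helper are my own choices for illustration.

```python
import unicodedata

# Sample characters chosen for illustration only.
SAMPLES = {
    "A": "half-width Latin letter",
    "ｱ": "half-width katakana",
    "ア": "full-width katakana",
    "あ": "hiragana",
    "漢": "kanji",
    "Ａ": "full-width Latin letter",
}


def display_cells(ch):
    """Return 2 if a fixed-width display would draw ch as full-width, else 1.

    Uses the Unicode East Asian Width property: W (wide) and F (full-width)
    are treated as two cells, everything else as one.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def sjis_length(ch):
    """Number of bytes ch occupies when encoded as Shift_JIS."""
    return len(ch.encode("shift_jis"))


if __name__ == "__main__":
    for ch, label in SAMPLES.items():
        print(f"{ch}  {label}: {display_cells(ch)} cell(s), "
              f"{sjis_length(ch)} byte(s) in Shift_JIS")
```

In this sketch the half-width katakana come out as one cell and one byte, while the full-width kana and kanji come out as two cells and two bytes, which mirrors the "two English characters per Japanese character" layout described above.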
Because of the large number of characters in languages such as Japanese and Chinese (Japanese education standards define around 2000 characters for high school students and JIS standards define more than 6000 characters), characters cannot be entered directly, but instead need to be entered via some kind of conversion process. Modern computers and electronic equipment therefore employ a system whereby words are entered phonetically, and then converted into kanji through an interactive conversion process. I’m going to describe this process by using the Japanese IME included in Microsoft Windows XP.
Japanese Input Method Editor (IME)
In Windows, input method editor is a general term used by Microsoft for their abstraction of multi-stage text entry systems (specifically for Japanese, Chinese and Korean). When the IME is disabled, the keyboard functions exactly the same as a regular US English keyboard (except for the slightly different key layout), with keydown, keypress, and keyup events sent directly to the control that has the focus, as usual. When the IME is active, however, key events are intercepted by the IME, which then interacts with the user in a seamless way to provide character conversion according to the conversion mode selected by the user. In the Japanese IME, the most common conversion mode is a multi-stage conversion process employing romaji-kana conversion in the first stage and kana-kanji conversion in the second stage.
In the following sections, I’m going to take 特急電車 (which means “express train”) as an example to examine the multi-stage conversion from “tokkyuudennsha” to “とっきゅうでんしゃ”, and then to “特急電車”.
Romaji-kana conversion is the process of converting Roman alphabetic key presses into Japanese phonetic characters. This is a relatively straightforward process in which the IME looks up Romaji character sequences in a simple table, and replaces sequences with the corresponding kana as soon as they are matched. There are individual conversion codes for each character, such as “si” -> “し”, “xya” -> “ゃ”, sequences that convert into multiple characters, such as “sya” -> “しゃ”, and alternate sequences such as “sha” -> “しゃ”. The default conversion table contains around 300 sequences. For our example, typing “tokkyuudennsha” produces:
Note that the underline shown above actually appears on the screen during input, and serves to highlight the text that is currently undergoing conversion. For example, if we were typing this in the middle of a sentence, it would look something like this:
This underlined section is known as the composition string. This is an important concept when dealing with the IME, and warrants a more detailed explanation.
The composition string is the only part of the text in a control to which the IME has access. Once conversion begins, the cursor is trapped within the composition string until the conversion is complete (when the current string is accepted by the user hitting enter or the control losing focus) or is cancelled (by the user deleting all of the characters in the composition string or hitting escape). An important aspect of the composition string is that the user can move the cursor around inside the string to correct typos or make other changes before proceeding to the second part of the conversion, kana-kanji conversion.
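The romaji-kana table lookup that fills the composition string can be sketched roughly as follows. This is only an illustration under several assumptions: `ROMAJI_TABLE` is a tiny hand-picked subset of the roughly 300 sequences, the function name `romaji_to_kana` is hypothetical, the doubled-consonant rule for the small っ is my own simplification of however the real table handles it, and the whole string is converted in one pass, whereas the IME converts incrementally as each key arrives.

```python
# Illustrative subset of a romaji-to-kana table (the real table is far larger).
ROMAJI_TABLE = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "to": "と", "ti": "ち", "chi": "ち", "de": "で",
    "si": "し", "shi": "し",
    "kyu": "きゅ", "sya": "しゃ", "sha": "しゃ",
    "xya": "ゃ", "nn": "ん",
}
MAX_KEY_LEN = max(len(key) for key in ROMAJI_TABLE)
NOT_DOUBLED = "aiueon"  # vowels and "n" do not produce a small tsu


def romaji_to_kana(text):
    """Convert a romaji string to kana by greedy longest-match table lookup."""
    out = []
    i = 0
    while i < len(text):
        # Assumed rule: a doubled consonant (other than "n") yields a small tsu
        # and consumes only the first of the two letters.
        if (i + 1 < len(text) and text[i] == text[i + 1]
                and text[i].isalpha() and text[i] not in NOT_DOUBLED):
            out.append("っ")
            i += 1
            continue
        # Otherwise take the longest table entry that matches at this position.
        for length in range(MAX_KEY_LEN, 0, -1):
            chunk = text[i:i + length]
            if chunk in ROMAJI_TABLE:
                out.append(ROMAJI_TABLE[chunk])
                i += len(chunk)
                break
        else:
            # No entry matches: keep the raw letter, like an unfinished sequence.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(romaji_to_kana("tokkyuudennsha"))
    print(romaji_to_kana("tio"))
```

With this subset, "tokkyuudennsha" is split as to / k(k) / kyu / u / de / nn / sha, and the alternate spellings "sya" and "sha" both map to the same kana.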
Now that the target word is spelled out phonetically in the composition string, the next step is to enter the kana-kanji stage of the process. This is achieved by simply hitting the space bar. What goes on inside the IME at this point is quite sophisticated and largely beyond the scope of this article. However, the basics are that the IME analyzes the grammar of our text, attempts to identify the separate words in the text (a process known as segmentation that is necessary because there are no spaces in Japanese), and then performs a context-sensitive lookup of each of those words in its built-in dictionaries. The IME then picks the best matches for each segment, and displays them like so:
In this case, we’ve used such a common phrase that the IME has no trouble identifying the segmentation and the best candidates for each word. At this point, the IME has still not accepted our input, but is waiting for our approval of the suggested candidate characters. We can now either hit enter to accept, hit escape to return to the kana composition string, look through the other candidates that the IME has dug out of its dictionary and choose alternatives if necessary, or even adjust the location of the break between the two words.
An important point to note is that the selection of the best candidate is influenced by the context, which we can demonstrate by looking at the other candidates. The heavier underline under the first segment indicates that it is selected, and so hitting the space bar (or the up/down arrows or 変換 key) will open up a list showing the other candidates for tokkyuu, like so:
Topping the list is 特急 (express), followed by 特級 (premium) and 特休 (special holiday), and then the two different phonetic alphabet versions. However, the IME knew that 特急 (express) was a better match for 電車 (train) than the other options. Grammar and past selections are also taken into account, which gives the IME the complexity of a grammar checker, contextual dictionary, and adaptive predictive text engine all rolled into one.
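To make the idea of segmentation plus context-sensitive candidate selection a little more concrete, here is a toy Python sketch. It is in no way how the real IME works internally. The dictionary, the collocation bonuses, the scoring formula, and all function names are invented for illustration, and the second phrase (とっきゅうしゅ, converted as 特級 followed by 酒) is my own example added to show the context effect.

```python
from itertools import product

# Hypothetical mini-dictionary: reading -> kanji candidates in default order.
DICTIONARY = {
    "とっきゅう": ["特急", "特級", "特休"],
    "でんしゃ": ["電車"],
    "しゅ": ["酒", "種"],
}
# Hypothetical bonuses for word pairs that commonly appear together.
COLLOCATION_BONUS = {
    ("特急", "電車"): 2.0,
    ("特級", "酒"): 2.0,
}


def to_katakana(hiragana):
    """Shift hiragana code points (U+3041..U+3096) into the katakana block."""
    return "".join(chr(ord(c) + 0x60) if "ぁ" <= c <= "ゖ" else c
                   for c in hiragana)


def candidates(reading):
    """Dictionary candidates followed by the hiragana and katakana spellings."""
    return DICTIONARY.get(reading, []) + [reading, to_katakana(reading)]


def segment(reading):
    """Split a reading into dictionary words, trying longer words first."""
    if not reading:
        return []
    for end in range(len(reading), 0, -1):
        head = reading[:end]
        if head in DICTIONARY:
            rest = segment(reading[end:])
            if rest is not None:
                return [head] + rest
    return None  # no full segmentation found


def convert(reading):
    """Pick the candidate combination with the best toy score."""
    segments = segment(reading) or [reading]
    best, best_score = None, float("-inf")
    for combo in product(*(candidates(s) for s in segments)):
        # Small penalty for candidates further down each list,
        # plus a bonus for adjacent pairs that go well together.
        score = -0.1 * sum(candidates(s).index(w)
                           for s, w in zip(segments, combo))
        score += sum(COLLOCATION_BONUS.get(pair, 0.0)
                     for pair in zip(combo, combo[1:]))
        if score > best_score:
            best, best_score = combo, score
    return list(best)


if __name__ == "__main__":
    print(convert("とっきゅうでんしゃ"))
    print(candidates("とっきゅう"))
    print(convert("とっきゅうしゅ"))
```

Even in this toy version, the same reading とっきゅう resolves to a different kanji depending on the word that follows it, which is the behavior the candidate list above illustrates.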
The process as described above may sound straightforward from the perspective of the user, but there are actually a great number of subtleties that arise during real-world processing.
1) Correcting typing errors is a costly operation. For example, mistakenly typing “tio” instead of “to” during romaji-kana conversion results in “ちお” instead of “と”, and the corrective operation of deleting “i” has grown into deleting everything and starting again. Automatic romaji-kana conversion thereby amplifies the labor required to correct typing errors. The labor of typing a few extra keystrokes, however, is almost insignificant compared to the cognitive load that errors create, which arises from having to deal with three different representations of the same text.
In our example, the target text 特急電車 (which relates to the actual thing we are thinking about) contains 4 characters, our intermediate text とっきゅうでんしゃ (which is tied to the pronunciation that we are forced to think about while typing) contains 9 characters, and the text we actually enter on the keyboard tokkyuudennsha (which is tied to our actual motor function) contains 14 characters. So, when a single character typo produces a result like this: と kk ッ 雄電社, it can be difficult to mentally process what went wrong.
2) One consequence of the IME operating totally seamlessly without cooperation from applications is that the IME can only maintain the state of a single composition string at a time. This means that if a control loses focus while conversion is in progress, the semi-converted string is accepted as-is into the control, and all relevant state information is lost. Unfortunately, this also means that windows that steal focus, even for a split second, create a headache for any text that is in mid-conversion at the moment when the focus is stolen. (And it’s amazing how many apps actually do steal focus for just a split second.)
3) The ability to type a word relies on that word being in the IME’s dictionary. This becomes a minor problem for scientific and field-specific terminology, and a major problem for people’s names. The general solution is to try and think of other more common words that use each of the same kanji, and then delete any excess kanji.
Direct Kana Input
One immediate question that arises when we consider the complexities of text entry described above is why not simply use a kana keyboard containing the native Japanese phonetic characters to simplify the process? In fact, the JIS standard keyboard (Wikipedia has a nice picture) does define the key layout for direct kana entry, and the Japanese IME fully supports this keyboard layout. However, as a poll cited on the Japanese Wikipedia states, around 84% of Japanese people use Romaji input compared to only around 8% for direct kana input. Why is this?
The first thing to note is that back in the early 80s when computers were starting to become widespread, the complexity of the writing system led to a boom in dedicated word processing devices that could handle text entry far better than the personal computers of the time, and these dedicated word processors were almost 100% focused on direct kana entry. PCs, on the other hand, were almost exclusively used by enthusiasts who were interested in things like programming, or in businesses that used alphanumeric-oriented programs such as spreadsheets.
The other thing to note is that the Japanese phonetic alphabet consists of 82 separate kana (when accented characters are included). In order to allow touch-typing, the Japanese word processors employed systems such as the OASYS thumb-shift layout (親指シフト in Japanese, which literally means “thumb shift”, referring to the two additional thumb-actuated shift keys), which used complex shifting systems so that all of the kana could be squeezed into a more touch-typing friendly layout.
While these kinds of systems were very popular at the time (OASYS keyboards are still available today), by the time PCs finally caught up to dedicated word processors in the mid 90s, the computer keyboard layout had already been standardized around the JIS layout. The JIS layout is able to fit the large number (82) of kana onto a regular US-like keyboard without additional shift keys by separating out the two accent marks into modifier characters, giving 56 kana plus 2 accent keys, which are scattered around the keyboard in one of the most touch-typing unfriendly layouts imaginable (11 regular kana keys are assigned to the right pinky, including the very frequently used accent characters). This scheme also offers better compatibility with the JIS X 0201 encoding, where accent marks are encoded as separate characters.
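The idea of treating the two accent marks as separate modifier characters can be illustrated with Unicode normalization in Python. This is only an analogy, not a model of the JIS keyboard driver: NFD decomposition splits an accented kana into a base kana plus a combining mark, much as the JIS layout splits it into a kana key plus an accent key, and NFKC recombines a half-width katakana and its separately encoded accent mark into one full-width character. The function names are hypothetical, and the keystroke count ignores shift keys.

```python
import unicodedata

DAKUTEN = "\u3099"      # combining voiced sound mark (as in が)
HANDAKUTEN = "\u309a"   # combining semi-voiced sound mark (as in ぱ)


def split_accent(kana):
    """Split one kana into (base kana, combining accent mark or '')."""
    decomposed = unicodedata.normalize("NFD", kana)
    return decomposed[0], decomposed[1:]


def accent_key_presses(text):
    """Rough key count when each accent mark is typed as a separate key.

    Shift keys (needed for small kana on a real keyboard) are ignored.
    """
    return len(unicodedata.normalize("NFD", text))


def halfwidth_to_fullwidth(text):
    """Merge half-width katakana and separate accent marks into full-width kana."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    print(split_accent("が") == ("か", DAKUTEN))
    print(accent_key_presses("でんしゃ"))
    print(halfwidth_to_fullwidth("ｶﾞ"))
    # Half-width ｶ and its accent mark are two separate single-byte characters.
    print(len("ｶﾞ".encode("shift_jis")))
```

Here でんしゃ is four characters on screen but five key presses under this counting, because で is entered as て followed by the accent key.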
In short, users of modern computers need to learn the QWERTY layout anyway due to the prevalence of ASCII-only characters in things like usernames, passwords, programming languages, spreadsheets, etc., and there is no real incentive to learn the touch-typing unfriendly direct kana layout, regardless of the burden of romaji-kana conversion. The touch-typing friendly OASYS layout is also rarely used, somewhat like the Japanese version of the Dvorak keyboard.
In mobile phones, however, the scales tip the other way. Japanese kana are taught from elementary school as a 5 x 10 grid as follows:
This makes direct kana input, with the number keys 1 through 0 assigned to the columns of this grid, a very intuitive system for Japanese speakers. When this is combined with the adaptive predictive kana-kanji conversion systems employed by Japanese mobile phone manufacturers (which offer a lot of features that are not available in the Windows IME), text entry on a mobile phone is almost comparable in speed to using a computer and full keyboard.
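As a rough illustration of the keypad scheme, the sketch below maps each number key to one column of the kana grid. How the individual kana within a column is selected is not described above, so the multi-tap convention used here (pressing the same key repeatedly to cycle through the column) is an assumption, as are the contents of the や and わ keys, the space-separated input format, and the function name `multitap`.

```python
# One column of the kana grid per number key, in the traditional order.
KEY_COLUMNS = {
    "1": "あいうえお",
    "2": "かきくけこ",
    "3": "さしすせそ",
    "4": "たちつてと",
    "5": "なにぬねの",
    "6": "はひふへほ",
    "7": "まみむめも",
    "8": "やゆよ",
    "9": "らりるれろ",
    "0": "わをん",
}


def multitap(presses):
    """Decode space-separated runs of key presses, e.g. "3 222 9".

    Each run is one key pressed one or more times; the number of presses
    selects the kana within that key's column (wrapping around at the end).
    """
    kana = []
    for run in presses.split():
        key = run[0]
        if key not in KEY_COLUMNS or set(run) != {key}:
            raise ValueError(f"invalid key run: {run!r}")
        column = KEY_COLUMNS[key]
        kana.append(column[(len(run) - 1) % len(column)])
    return "".join(kana)


if __name__ == "__main__":
    print(multitap("3 222 9"))  # one press of 3, three of 2, one of 9
```

The resulting kana string would then be handed to the phone's kana-kanji conversion, in the same way that the composition string is on a PC.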