User:Grapesurgeon/Automatic Korean romanization

Task

An anonymous IP user and User:seefooddiet recently wrote an automatic Korean transliteration module: Module:Ko-translit. It works for both Revised Romanization (RR) and McCune–Reischauer (MR).

Using this module, we created {{Korean/auto}} and {{Infobox Korean name/auto}}: alternate versions of {{Korean}} and {{Infobox Korean name}} that use automatic romanization. These templates have simple boolean (true/false) checkboxes that control whether a given romanization is displayed. Over time, our goal is to convert every use of the older templates to the automatic form. Once that happens, we'll replace each old template with its automatic counterpart.

This task will impact almost every page that uses {{Infobox Korean name}} (~16,000) and {{Korean}} (~22,000); the two sets of pages likely overlap substantially.

Background

We want automatic romanization because virtually nobody on enwiki (or even in Korean studies academia; ctrl+f "mistake") understands how Korean romanization works in detail. The two of us have fixed thousands of mistakes, but too many remain. A significant proportion of romanizations of Korean on Wikipedia are plainly incorrect. See the background section of this essay for some context on why romanizing Korean is so messy and difficult.

How does the module work?

See the module documentation.

Don't let the details of the system intimidate you. Most symbols are only rarely used. Also, it's almost certainly easier to learn this syntax than it is to learn either of these romanization systems. We'll make sure the basic syntax is clearly documented in various places, including in the template UI in VisualEditor.

The syntax in the module covers basically every plausible edge case. If people want to make modifications that our syntax doesn't cover, they're straying far enough away from RR/MR to be using an ad-hoc romanization. In that case, the general purpose {{translit}} may be more appropriate.

We've also developed {{Ko-translit}}, which allows you to just get automatic romanization without displaying Hangul.

How do the templates work?

See the documentation pages for both templates and this draft page, where the templates are employed extensively.

Proposed project outline

  1. Use a combination of AutoWikiBrowser (AWB) and manual editing to switch uses of the old templates to the new versions
  2. Modify MOS:KO to introduce the module and give recommendations on its use and syntax
  3. Once all uses of {{Korean}} and {{Infobox Korean name}} have been converted to the automatic forms, replace the old templates with the automatic ones

Why AWB/manual editing and not bots?

Our module has various special symbols that are placed in the Hangul and modify the output romanization. For example, putting % before a Hangul string activates "name mode", and the following Hangul string is romanized as if it's a person's name. Other symbols do things like capitalize letters, insert spaces or hyphens where none exist in Hangul, etc. A bot cannot reliably decide where these symbols belong, so each conversion needs a human check.

Furthermore, some cases are complicated. For example, romanized personal names have spaces in them, but Hangul names do not. While most Koreans have single-syllable surnames, some have multi-syllable surnames. For example, should "남궁지" be romanized "Namgung Ji" or "Nam Gungji"? "Namgung" and "Nam" are both Korean surnames.
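To illustrate the ambiguity, here is a minimal Python sketch that lists every surname/given-name split a surname table would allow. The table contents and the function names (SURNAMES, candidate_splits, needs_human_review) are hypothetical and are not part of Module:Ko-translit; a real table would be much larger.

```python
import unicodedata

# Hypothetical, deliberately tiny surname table (an assumption for illustration).
SURNAMES = {"김", "이", "박", "남", "남궁"}


def candidate_splits(hangul_name, surnames=SURNAMES):
    """Return every (surname, given name) split whose surname is in the table."""
    # Normalize to precomposed syllables so that slicing works per syllable.
    name = unicodedata.normalize("NFC", hangul_name)
    splits = []
    for length in (1, 2):  # one- and two-syllable surnames only
        surname, given = name[:length], name[length:]
        if surname in surnames and given:
            splits.append((surname, given))
    return splits


def needs_human_review(hangul_name, surnames=SURNAMES):
    """A name is unambiguous only if exactly one split is possible."""
    return len(candidate_splits(hangul_name, surnames)) != 1


print(candidate_splits("남궁지"))    # [('남', '궁지'), ('남궁', '지')]
print(needs_human_review("김상준"))  # False
```

With this table, "남궁지" yields two valid splits, so a script can only flag it; an editor still has to decide which reading is correct.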

Personal notes

AWB tasks

For all tasks:

  • Symbols in Hangul strings that overlap with the module's syntax will need to be escaped with a \
    • e.g. hangul=돈돈$$ @here* -> hangul=돈돈\$\$ \@here\*
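The escaping step could be sketched in Python roughly as follows. The set of characters treated as syntax is an assumption: it contains only the symbols mentioned on this page (%, ^, _, $, @, *) plus the backslash itself, and the module documentation should be checked for the full list. The name escape_hangul is hypothetical.

```python
# Assumed syntax characters: only those mentioned on this page, plus the
# backslash (escaping the escape character itself is also an assumption).
SYNTAX_CHARS = set("\\%^_$@*")


def escape_hangul(text):
    """Prefix each assumed syntax character with a backslash."""
    return "".join("\\" + ch if ch in SYNTAX_CHARS else ch for ch in text)


print(escape_hangul("돈돈$$ @here*"))  # 돈돈\$\$ \@here\*
```

This should run once on the raw Hangul value, before any syntax symbols are added on purpose; running it on already-marked text would escape the intended symbols too.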

AWB tasks:

  • Find all uses of {{Korean}}. If the template is used in parentheses next to what appears to be a person's name (e.g. Kim Sang-jun ({{Korean|hangul=김상준|rr=Gim Sangjun|labels=no}})), prepend % to the Hangul name and replace the manual romanization with the automatic one.
  • For {{Infobox Korean name}}, if a Hangul name is a mononym (e.g. stage names, art names), prepend %_ to the Hangul name and replace the manual romanization with the automatic one.
    • To check whether the first character of the name is a surname, compare it against a surname table.
  • For {{Infobox Korean name}} on biographies, if a Hangul name has a two-character surname, prepend % to the Hangul name and insert _ after the Hangul surname. Replace the manual romanization with the automatic one.
    • A table of two-character surnames can be used for this check.
  • For {{Infobox Korean name}} on biographies, if a Westernized name is being romanized, prepend ^ to the Hangul and insert % before the surname.
    • e.g. Esther Park (physician) 에스더 박 -> ^에스더 %박 -> Eseudeo Bak/Esŭdŏ Pak. Uses Western name and surname order, with space in Hangul name
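The marking rules in the list above can be sketched as small Python helpers. This is only an illustration under stated assumptions: the function names and the surname table are hypothetical, the table is a tiny subset, and the helpers only build the marked Hangul string. They do not touch the template call or decide which rule applies, which remains an editor's judgment.

```python
# Hypothetical subset of two-character surnames (an assumption for illustration).
TWO_CHAR_SURNAMES = {"남궁", "황보", "제갈", "선우"}


def mark_person(hangul):
    """Ordinary personal name: prepend % to activate name mode."""
    return "%" + hangul


def mark_mononym(hangul):
    """Mononym (stage name, art name): prepend %_."""
    return "%_" + hangul


def mark_two_char_surname(hangul, surnames=TWO_CHAR_SURNAMES):
    """Prepend % and insert _ after a two-character surname, else return None."""
    surname, given = hangul[:2], hangul[2:]
    if surname in surnames and given:
        return "%" + surname + "_" + given
    return None


def mark_western_order(hangul):
    """Given name first, surname last, separated by a space: ^given %surname."""
    given, sep, surname = hangul.rpartition(" ")
    if not sep or not given or not surname:
        return None  # no space to split on; leave for manual handling
    return "^" + given + " %" + surname


print(mark_person("김상준"))            # %김상준
print(mark_two_char_surname("남궁지"))  # %남궁_지
print(mark_western_order("에스더 박"))  # ^에스더 %박
```

Because a name such as "남궁지" can also be read with the one-character surname "남", the two-character helper should only be applied after an editor has confirmed the surname.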