Template talk:Lang/Archive 15


Broken usage of langx

I'm not sure how this template works, but this page is complaining about a missing parameter "p", and I'm not sure how to fix it. x42bn6 Talk Mess 18:24, 15 November 2024 (UTC)

The page was calling {{lang-ru}} with |p=. The template has been deleted, so I don't know if |p= (for "pronunciation", possibly) was a valid parameter. An admin will be able to check. – Jonesey95 (talk) 18:51, 15 November 2024 (UTC)

Some history – I didn't go back to the very beginning:

changed from {{lang-ru}} to {{lang-rus}} at this edit – {{lang-rus}} supports the |p= parameter
changed from {{lang-rus}} to {{lang-ru}} at this edit – {{lang-ru}} ignored the unsupported |p= parameter
changed from {{lang-ru}} to {{langx|ru|...}} at this edit – {{langx}} ignored the unsupported |p= parameter until just a day or so ago; now it emits an error message when editors give it parameters that it does not support.

—Trappist the monk (talk) 19:00, 15 November 2024 (UTC)
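
For readers following along, the change in behaviour described above (silently ignoring an unsupported parameter versus emitting an error for it) can be sketched roughly as follows. This is only an illustration in Python, not the Lua in Module:Lang. The parameter sets are assumptions, apart from |p= being supported by {{lang-rus}} as stated in the thread, and the message format only imitates the errors quoted on this page.

```python
# Hypothetical parameter sets: only "p" for lang-rus is confirmed by the thread.
SUPPORTED = {
    "langx": {"1", "2", "3"},
    "lang-rus": {"1", "2", "3", "p"},
}


def unknown_params(template, args):
    """Return the sorted names of parameters the template does not support."""
    supported = SUPPORTED[template]
    return sorted(name for name in args if name not in supported)


def render(template, args):
    """Emit an error for unknown parameters instead of silently ignoring them."""
    bad = unknown_params(template, args)
    if bad:
        listed = ", ".join(f"|{name}=" for name in bad)
        return f"Error: {{{{{template}}}}}: invalid parameter: {listed}"
    # Stand-in for the real rendering: just return the text parameter.
    return args.get("2", "")


call = {"1": "ru", "2": "Москва", "p": "mɐˈskva"}
print(render("langx", call))     # error: |p= is not in the assumed langx set
print(render("lang-rus", call))  # renders, because |p= is supported
```

The same call succeeds or fails depending only on which template receives it, which is why changing the transclusion back to {{lang-rus}} clears the error.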

So it looks like one possible fix is to change the template transclusion back to {{lang-rus}}. Or is that creating more work in the future? This error is present in other articles, such as Denis Cheryshev. – Jonesey95 (talk) 19:17, 15 November 2024 (UTC)

For now, changing it back is the fix. I did, however, propose that we either disentangle the unsupported features from -rus or add support for them so that other languages can use them. There is really almost no reason at all for any specific-language template to stay after the creation of langx. Gonnym (talk) 19:24, 15 November 2024 (UTC)

Pending more granular tracking categories or sorting within the category, an insource search shows 63 articles with this particular error. Most appear to be using lang|ru, but at least a few are using lang|zh, which I have not investigated. – Jonesey95 (talk) 14:32, 16 November 2024 (UTC)

It looks like there is also an error message with "sc", which presumably refers to script. Mellk (talk) 13:35, 22 November 2024 (UTC)

Thanks, but it is not necessary for you to report each instance of unknown parameters causing error messages. They are all collected in Category:Lang and lang-xx template errors which at present lists 92 pages.

—Trappist the monk (talk) 13:57, 22 November 2024 (UTC)

Since this is related to lang-rus, the issue is not just "p=". Mellk (talk) 14:06, 22 November 2024 (UTC)

The 'issue' is {{lang}} and {{langx}} with parameters that are not known to those templates. The issue is not confined to {{lang-rus}} or {{lang-zh}} templates that have been improperly changed to {{lang}} or {{langx}}. Here are searches that are not parameter specific for both templates:

{{lang}} ~680 articles
{{langx}} ~190 articles

Yep, there is a lot of junk out there. You still don't need to make a report here for every subgroup of errors that you encounter out there.

—Trappist the monk (talk) 14:43, 22 November 2024 (UTC)

I did not plan to make a report for every error. I also did not say that the errors are confined to lang-rus (that is pretty obvious when the search above showed that it was not just ru). I was referring to the fix suggested above. Mellk (talk) 14:59, 22 November 2024 (UTC)

I think it is also possible to move pronunciation to the IPA template. I was under the impression that lang-rus would eventually be replaced, but it seems like this is not the case yet? Mellk (talk) 09:38, 22 November 2024 (UTC)

Lang error category without error message?

Resolved

Church Slavonic is in Category:Lang and lang-xx template errors, but I am unable to find a red error message. Maybe I just can't see it. – Jonesey95 (talk) 19:30, 15 November 2024 (UTC)

Do you see it here:^[a]

[ⱌⱃⰽⰲⰰⱀⱁⱄⰾⱁⰲⱑⱀⱄⰽⱜ ⰵⰸⰻⰽⱜ] Error: {{Langx}}: invalid parameter: |script= (help)

^ [ⱌⱃⰽⰲⰰⱀⱁⱄⰾⱁⰲⱑⱀⱄⰽⱜ ⰵⰸⰻⰽⱜ] Error: [undefined] Error: {{Langx}}: missing language tag (help): invalid parameter: |script= (help)

Fixing the deprecated |script= parameter (cu → cu-Glab) resolves the problem.^[a]

Croatian Church Slavonic: ⱌⱃⰽⰲⰰⱀⱁⱄⰾⱁⰲⱑⱀⱄⰽⱜ ⰵⰸⰻⰽⱜ, romanized: crkavnoslověnskь jezikь

^ Croatian Church Slavonic: ⱌⱃⰽⰲⰰⱀⱁⱄⰾⱁⰲⱑⱀⱄⰽⱜ ⰵⰸⰻⰽⱜ, romanized: crkavnoslověnskь jezikь

It has been a while, but I've seen these before and if my failing memory is correct, always associated with {{efn}}. I was never able to figure out why the invalid error message gets sandwiched into and corrupts the maintenance message.

—Trappist the monk (talk) 20:04, 15 November 2024 (UTC)

No, I do not see an error message in this talk page section. Maybe my custom CSS is suppressing it? When I inspect the page, I see

<span class="lang-comment" style="font-style: normal; display: none; color: #33aa33; margin-left: 0.3em;">{{langx}} uses deprecated parameter(s) </span>

Note the display:none. – Jonesey95 (talk) 14:33, 16 November 2024 (UTC)

I can see error messages above now, and in the 20 October 2024 version of Church Slavonic. This appears to be resolved. – Jonesey95 (talk) 18:47, 22 November 2024 (UTC)

Non-latn text/Latn script subtag mismatch errors in ancient Iranian articles

Articles regarding ancient Iranian society like Mithra, Mantra (Zoroastrianism)#Etymology and Saoshyant#Etymology are showing this error recently, and I'm not sure how to fix them. —CX Zoom[he/him] ^{(let's talk • {C•X})} 13:09, 26 November 2024 (UTC)

Do you really mean to romanize Miθra and Miθraʰ with 'θ' (Greek small letter theta)? Do you really mean to romanize Astwat̰-әrәta and astvat-әrәta with 'ә' (Cyrillic small letter schwa)?

Apparently there is no unicode for Latin theta so that may require some sort of modification to Module:Lang if, in fact, you did really mean to use the Greek theta character. There is a Latin small letter schwa: 'ə'. Wouldn't that be the correct choice when romanizing Astwat̰-әrәta and astvat-әrәta?

—Trappist the monk (talk) 15:15, 26 November 2024 (UTC)

Sorry, I don't know much about how romanization works, but I believe you are correct about the schwa symbol. For Latin theta, I think there needs to be an exception. Or maybe {{transliteration}} would fit better here? I saw it work fine in some other articles. —CX Zoom[he/him] ^{(let's talk • {C•X})} 17:41, 26 November 2024 (UTC)

{{transliteration|ae|Miθra}} should emit an error message because Greek theta is not Latin theta and in the rendering, 'Miθra' is marked up as Latin text:

<span title="Avestan-language romanization"><i lang="ae-Latn">Miθra</i></span>

Miθra

For the same reason, were we using {{langx}}, there should be an error message:

{{langx|ae|𐬨𐬌𐬚𐬭𐬀|Miθra}}

[[Avestan language|Avestan]]: <span lang="ae" dir="rtl">𐬨𐬌𐬚𐬭𐬀</span>, <small>romanized:&nbsp;</small><span title="Avestan-language romanization"><i lang="ae-Latn">Miθra</i></span>

Avestan: 𐬨𐬌𐬚𐬭𐬀, romanized: Miθra

These need to be fixed.

I think that I have a solution to the {{lang|ae-Latn|Miθra}} where 'θ' is the Greek form but I'll hold off on implementing that until I've fixed the missing transliteration error messaging.

—Trappist the monk (talk) 19:21, 26 November 2024 (UTC)

I have tweaked the sandbox so that when the Greek theta (U+03B8) is the only non-Latin character in a string of text, it is assumed to represent the non-existent (in Unicode) Latin theta. Here are a variety of illustrations:

For {{lang}}:

{{Lang/sandbox|ae-Latn|Miθraʰ}} → Miθraʰ – assume Latin theta because Latn script specified and all other characters in <text> are Latin script
{{Lang/sandbox|ae-Cyrl|Miθraʰ}} → [Miθraʰ] Error: {{Lang}}: Latn text/non-Latn script subtag mismatch (help) – assume Latin theta because all other characters in <text> are Latin script; script/text mismatch: Cyrl script specified but <text> is Latin script
{{Lang/sandbox|ae|Miθraʰ}} → Miθraʰ – assume Latin theta because all other characters in <text> are Latin script

When theta is the only character in <text>:

{{Lang/sandbox|ae-Latn|θ}} → θ – assume Latin theta because Latn script specified
{{Lang/sandbox|ae-Cyrl|θ}} → [θ] Error: {{Lang}}: Latn text/non-Latn script subtag mismatch (help) – assume Cyrillic theta because Cyrl script specified – Greek/Cyrillic Unicode mismatch not checked
{{Lang/sandbox|ae|θ}} → θ – assume Greek theta because script not specified

For {{langx}}:

{{Langx/sandbox|ae-Latn|Miθraʰ}} → Avestan: Miθraʰ – assume Latin theta because Latn script specified and all other characters in <text> are Latin script
{{Langx/sandbox|ae-Cyrl|Miθraʰ}} → [Miθraʰ] Error: {{Langx}}: Latn text/non-Latn script subtag mismatch (help) – assume Latin theta because all other characters in <text> are Latin script; script/text mismatch: Cyrl script specified but <text> is Latin script
{{Langx/sandbox|ae|Miθraʰ}} → Avestan: Miθraʰ – assume Latin theta because all other characters in <text> are Latin script

When theta is the only character in <text>:

{{Langx/sandbox|ae-Latn|θ}} → Avestan: θ – assume Latin theta because Latn script specified
{{Langx/sandbox|ae-Cyrl|θ}} → [θ] Error: {{Langx}}: Latn text/non-Latn script subtag mismatch (help) – assume Cyrillic theta because Cyrl script specified – Greek/Cyrillic Unicode mismatch not checked
{{Langx/sandbox|ae|θ}} → Avestan: θ – assume Greek theta because script not specified

For {{langx}} with <translit>:

{{langx/sandbox|ae|𐬨𐬌𐬚𐬭𐬀|Miθra}} → Avestan: 𐬨𐬌𐬚𐬭𐬀, romanized: Miθra – assume Latin theta because all other characters in <translit> are Latin script
{{langx/sandbox|ae|𐬚|θ}} → Avestan: 𐬚, romanized: θ – assume Latin theta because this is the <translit> parameter

For {{transliteration}}:

{{transliteration/sandbox|ae|Miθra}} → Miθra – assume Latin theta because <code> is a language tag
{{transliteration/sandbox|ae|θ}} → θ – assume Latin theta because <code> is a language tag
{{transliteration/sandbox|latn|θ}} → [θ] Error: {{Transliteration}}: transliteration text not Latin script (pos 1: θ) (help) – assume Latin theta because <code> is a script tag
{{transliteration/sandbox|cyrl|θ}} → [θ] Error: {{Transliteration}}: transliteration text not Latin script (pos 1: θ) (help) – assume Latin theta because <code> is a script tag
{{transliteration/sandbox|ru|ш}} → [ш] Error: {{Transliteration}}: transliteration text not Latin script (pos 1: ш) (help) – error because <translit> not Latin script
{{transliteration/sandbox|cyrl|ш}} → [ш] Error: {{Transliteration}}: transliteration text not Latin script (pos 1: ш) (help) – error because <translit> not Latin script

Without objection, I shall update the live module.

—Trappist the monk (talk) 20:38, 27 November 2024 (UTC)
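
The theta rule illustrated above might be sketched like this. It is a rough Python approximation, not the sandbox code: the function names are made up for illustration, and "Latin" is approximated from Unicode character names because Python's standard library does not expose the Script property. Only the {{lang}} cases are modelled.

```python
import unicodedata

GREEK_THETA = "\u03b8"


def looks_latin(ch):
    """Approximate a Latin-script test using the Unicode character name."""
    if not unicodedata.category(ch).startswith("L"):
        return True  # marks, digits, punctuation, spaces: treated as neutral
    name = unicodedata.name(ch, "")
    return name.startswith(("LATIN", "MODIFIER LETTER"))


def is_latin_text(text, script=None):
    """Decide whether text counts as Latin, tolerating Greek theta."""
    rest = [ch for ch in text if ch != GREEK_THETA]
    if not all(looks_latin(ch) for ch in rest):
        return False
    if GREEK_THETA not in text:
        return True
    has_other_letters = any(unicodedata.category(ch).startswith("L") for ch in rest)
    if has_other_letters:
        return True  # every other letter is Latin, so assume Latin theta
    # Theta is the only letter: assume Latin theta only when Latn is specified.
    return script is not None and script.lower() == "latn"


print(is_latin_text("Miθraʰ", "Latn"))  # theta among Latin letters
print(is_latin_text("θ"))               # lone theta, no script given
```

A caller would then compare the result with the script subtag to decide whether to report a script/text mismatch, as in the Cyrl examples above.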

Updated.

—Trappist the monk (talk) 17:33, 28 November 2024 (UTC)

Category:Transliteration template errors $2

The article First Sino-Japanese War, in the sidebar box entitled "First Sino-Japanese War", contains a transliteration error and also appears to be assigning the nonexistent category Category:Transliteration template errors $2. I suspect that recent changes to this module or one of its subpages has caused this new, nonexistent category to appear. – Jonesey95 (talk) 18:45, 28 November 2024 (UTC)

Fixed I think; the miscoding (on my part) also added articles to Category:Lang and lang-xx template errors $2. The article count in Category:Lang and lang-xx template errors was going down, which is an expected result of the change. On the other hand, Category:Transliteration template errors was not changing so I was beginning to wonder why. Now I know why.

—Trappist the monk (talk) 19:35, 28 November 2024 (UTC)
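
The thread does not show the actual miscoding, so the following is only one plausible way a literal "$2" can end up in a category name: a message pattern with two placeholders being filled with a single argument. The pattern string and the substitute() helper are hypothetical, written in Python rather than the module's Lua.

```python
import re


def substitute(pattern, *args):
    """Replace $1, $2, ... with positional arguments.

    Placeholders without a matching argument are left in place.
    """
    def repl(match):
        index = int(match.group(1)) - 1
        if 0 <= index < len(args):
            return args[index]
        return match.group(0)

    return re.sub(r"\$(\d+)", repl, pattern)


# Hypothetical pattern: the second placeholder is never supplied.
CATEGORY_PATTERN = "Category:$1 template errors $2"

print(substitute(CATEGORY_PATTERN, "Transliteration"))
```

With only one argument the placeholder survives into the output, producing a category name that does not exist.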

I figured it was a small typo like this. I don't go looking for these things, but I look at a lot of pages with errors in my travels, and I often stumble across new entries in error reports and categories that are caused by template and module changes. – Jonesey95 (talk) 21:03, 28 November 2024 (UTC)

lang sandbox edits

@Gonnym: Something about this edit broke Module:Lang/sandbox so that the testcases fail.

Also: maker_error_span() should be make_error_span()?

—Trappist the monk (talk) 15:41, 30 November 2024 (UTC)

Fixed both. make_error_span() could probably be replaced with make_error_msg() which also handles the span. I just created it to have that code be in one place while it was there. Gonnym (talk) 22:37, 30 November 2024 (UTC)

Wrong font for lzh (Literary Chinese)

When using "lang|lzh" for Literary Chinese texts, it seems to be using a Taiwanese font?

For example, 有 is typically written as 月 which is also seen in historical texts such as in the Kangxi dictionary (inherited glyphs). But in the Taiwanese standard, they prefer to write it as ⺼ which is modern orthography (Traditional Chinese characters ≠ Literary Chinese characters). Another example would be 遣 where the radical ⻌ would be written as ⻍ according to the inherited glyphs, while the Taiwanese standard is ⻎. The template uses ⻎ instead of ⻍. How would one change it so that the template would use fonts (such as I.Ming) that are based on the inherited glyphs rather than the Taiwanese Traditional characters fonts (which are based on handwriting and their own standard)? Lachy70 (talk) 07:25, 4 December 2024 (UTC)

This is only related to the fonts your system picks to render specific languages, and has nothing to do with Wikipedia. Remsense ‥ 论 08:04, 4 December 2024 (UTC)

It doesn't use a Taiwanese font for me. Unfortunately, browsers allow configuring default fonts only for a handful of languages (if at all), and lzh isn't among them. System configuration might be difficult as well. The easiest way is to add something like [lang]:lang(lzh){font-family:"I.Ming"} to your user style, either in your browser or, for when you're logged in to Wikipedia, at common.css or global.css. The template documentation already covers that at § Applying styles. But if you are thinking of something like overriding default fonts for everyone regardless of their system configuration, then this is something that definitely should not be done (MOS:FONTFAMILY). – MwGamera (talk) 05:51, 5 December 2024 (UTC)

Update to Module:Lang/sandbox

I've modified Module:Lang/sandbox to allow {{Wikt-lang}} to use the language html attribute logic instead of having to duplicate the entire code. Testcases at Module talk:Lang/testcases have all passed so nothing seems to have been broken. Let me know if you have any comments before I update. Gonnym (talk) 09:58, 16 December 2024 (UTC)

Transliteration whitelist

@Trappist the monk I don't think having a blanket whitelist of arbitrary non-Latin script characters makes sense, and especially not one which is as random as [ʻʼʾʿΔαβγδθσφχϑьᾱῑ῾上入去平]. This is totally unsustainable, since it will constantly need to be expanded (e.g. I can already see that ъ is missing, which crops up in various Slavicist transcriptions), and it also opens the door to false negatives, because most of these will not be acceptable characters in the vast majority of languages. This seems like an artificially-imposed maintenance burden for increasingly little gain.

What I suggest is:

Convert to form NFD before checking, which removes the need to have precomposed characters like ᾱῑ.
Allow all common script characters.
Allow any characters marked with Latn in the Unicode ScriptExtension.txt file.
Generate a warning message via mw.message instead of a big error message, as it's overkill.
Create a maintenance category, and add all transcriptions containing non-Latin-script characters to it by default.
Allow language-specific exceptions, specified in the data somewhere. These should only be added for really common cases.
Implement an override, which can be specified using a parameter. This should be used in all other cases. Suggest it in the warning, too ("If this is correct, please...").

Theknightwho (talk) 16:34, 2 January 2025 (UTC)
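
A rough sketch of how the first few suggestions and the override could fit together, in Python rather than Lua and under several loudly stated assumptions. The standard library has no access to Script or ScriptExtensions data, so character names and general categories stand in for "Latin" and "Common"; the exception table and every function name are hypothetical.

```python
import unicodedata

# Hypothetical per-language exceptions; real data would live in the module's data.
LANGUAGE_EXCEPTIONS = {
    "ae": {"\u03b8"},            # Greek theta
    "cu": {"\u044c", "\u044a"},  # Cyrillic soft and hard signs
}


def non_latin_characters(text, lang=None):
    """List letters that are neither Latin-like nor allowed for this language."""
    allowed = LANGUAGE_EXCEPTIONS.get(lang, set())
    flagged = []
    # NFD splits precomposed letters, so no precomposed forms need listing.
    for ch in unicodedata.normalize("NFD", text):
        if not unicodedata.category(ch).startswith("L"):
            continue  # marks, digits, punctuation, spaces: stand-in for Common
        if unicodedata.name(ch, "").startswith(("LATIN", "MODIFIER LETTER")):
            continue
        if ch in allowed:
            continue
        flagged.append(ch)
    return flagged


def check_transliteration(text, lang=None, override=False):
    """Return a warning string, or None if the text passes or is overridden."""
    if override:
        return None
    flagged = non_latin_characters(text, lang)
    if not flagged:
        return None
    return ("possible non-Latin characters: " + " ".join(flagged)
            + " (if this is correct, set the override)")


print(check_transliteration("crkavnoslověnskь jezikь", "cu"))
print(check_transliteration("crkavnoslověnskь jezikь"))
```

The result is a warning rather than an error, and the override parameter covers cases that no exception table could reasonably anticipate.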

Yeah, I know, really crude. I did that for the avoidance of conflict.

Hadn't thought about NFD and ScriptExtensions; I will.

mw.message? Not sure how that would be used. My experience with mw.message() is limited to rendering error messages with $1, $2, etc replacements. Can it be used to render messages someplace other than directly in the rendered article? Or were you perhaps thinking of mw.addWarning()?

Maintenance categories are problematic because quite often, {{transliteration}} is used in wikilinks and {{ill}} templates:

[[wikt:نسیم|{{Transliteration|ar|nasim}}]] → nasim

[[Yupei#Jinbu (禁步)|{{Transliteration|zh|Jinbu}}]] → Jinbu

{{ill|yuta (priestess)|lt={{transliteration|ryu|yuta}}|ja|ユタ}} → yuta [ja]

Emitting a category wikilink inside another wikilink breaks the rendering.

Yep, overrides are necessary because stuff like this: {{Transliteration|ja|Ama Kakeru ﾐ☆ Jōshikōsei}}.

—Trappist the monk (talk) 19:57, 2 January 2025 (UTC)

I strongly agree with User:Theknightwho on the problems with the whitelist. I think the underlying problem with this breaking change stems from mixing two separate uses of the term 'Latn', without being clear about transliteration requirements.

-Latn is the script portion of the IETF language tag, which is used to set the lang= attribute (RFC-4646), which affects the display style of the inline text containing element (among other things, as noted by Template:Lang#Rationale). It is important that a single transliterated string has a consistent display style across all its characters, and with other transliterations in the same document. For an en-wiki transliteration template, where 'romanization' is a near synonym, it's a sensible requirement to use a 'Latn' display style.
Latn is also used in Unicode for the "predominant" [script value] of a single code-point. If the predominant use of the character is in one script, but it is also used in others, then it takes the Script property value associated with that predominant use. This is a different (glyph level) classification, and doesn't directly relate to transliteration.

It's hard to find a concrete example in the specs, so this could perhaps be explained better, but it is in fact completely reasonable to have a Greek theta character displayed side-by-side with Latn characters, all using the same Latn display style. This is what is required for Etruscan transliterations, and all the other non-Latn Unicode-script-class examples previously mentioned, including the "modifier" half circles used for Arabic, and the ъ mentioned above.

The same string could be displayed using Greek display rules, but it would look wrong. It would also be wrong to use mixed styles in the same string. A 'Latin theta' is a semantically different symbol, which is why it has a different Unicode code point, and is also incorrect to substitute.

The number of characters, or the Unicode script classification of any adjacent characters, are irrelevant for the display purposes if the transliteration is valid. Single character transliterations are totally valid. Ironically, the most obvious use is in transliteration tables.

[ʻʼʾʿΔαβγδθσφχϑьᾱῑ῾上入去平] demonstrates that Unicode Script classification of individual glyphs is a different concern from a consistent transliteration display style. I have no idea what the CJK glyphs are doing there, I cannot verify any of it. It looks like nonsense. I know what the Greek symbols are, and don't even doubt those CJK are valid in some transliteration of something, but this partial list has no value AFAICT.

The current IS-LATIN whitelist function is misnamed. It's more of a is-valid-transliteration-string/char, but as stated above is of little value, impossible to maintain, and additionally seems to be based on misunderstandings.

Not only is it prone to false-positives, but every "true"-positive error it catches mis-characterises the problem. It's not that the string contains a non-"Unicode Script = Latn" character, rather the character possibly is not a valid transliteration symbol. At best, this is a heuristic for maintenance purposes, but even then it needs to be considerably smarter and have a better idea of what is and isn't valid transliteration. It is not appropriate for this to be raising error messages. Warnings at most, but it'd still be annoyingly noisy.

Template:Transliteration/testcases are appallingly light. Most of the basic transliteration cases that broke, seemingly to total surprise, should be covered there. Salpynx (talk) 21:08, 3 January 2025 (UTC)

@Salpynx Just FYI, 上入去平 refer to the four tones of Middle Chinese, which are of fundamental importance in Chinese linguistics, so it's not that weird that they've come up. No modern variety has retained the Middle Chinese tone system (Mandarin having 4 tones is a coincidence - it's not a one-to-one conversion), and they're diaphonemic anyway (so IPA is out), so you sometimes see them given next to readings in a similar fashion to the tone numbers used in Wade-Giles or Jyutping. Theknightwho (talk) 02:16, 4 January 2025 (UTC)

The re-use of 'is-latin' with the unjustified whitelist now breaks more things:

The Chinese character {{lang|und-Hani|上}} has 3 strokes (modelled after an example from Template:Lang#Undetermined_language)

The Chinese character 上 has 3 strokes

{{lang|und-Grek|σαβαθ}}

σαβαθ

{{lang|ota-Grek|χαβα}} / {{lang|ota-Arab|هوا}} (Weather)

χαβα / هوا (Weather)

It's not clear what the original change was for, but it broke things without pre-discussion (AFAICT), despite the warnings on Template:Lang and Template:Transliteration. Patching it up piecemeal doesn't seem to be helping. Salpynx (talk) 05:30, 5 January 2025 (UTC)

