Discussion:
SUBJECT: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
Egor Kobylkin
2018-07-17 19:34:34 UTC
Permalink
Dear locale maintainers,

fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

add Cyrillic transliteration table translit_cyrillic file

https://sourceware.org/bugzilla/attachment.cgi?id=8591 [7]

to localedata/locales/ and include it in all your locales going forward.

Patch included inline below.


From this patch I have excluded locales that already mention cyrillic or
have a transliteration table for it:
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
***@cyrillic

Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.



Current bug effect:

The glibc wiki explicitly lists this use case as the test example

https://sourceware.org/glibc/wiki/Locales#Testing_Locales :

LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt

currently it fails on Cyrillic texts in most locales including ru_RU [1]
[8] [9]:

LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

- It produces a string of question marks and spaces.

This is what it should produce and it does so after the patch applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


Root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compliation time to be used by iconv.



COMMIT MESSAGE:
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.

While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration has only ASCII codes but still can be read by a native
speaker. Among other things it is useful for processing the Cyrillic
texts and filenames by programs or on systems that are not specifically
prepared to work with Cyrillic, don't have corresponding fonts installed
or can't handle UTF-8.

The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on GOST 7.79-2000 official source
(Federal Agency on Technical Regulating and Metrology Of Russian
Federation [2]). Technically an independent but identical source [3] was
used and prepared in a spreadsheet [6].

The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that have a
translit_start/end stance and generated a patch for them.

The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.
However it would not be the standard Cyrillic transliteration as
described above.
I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files but have
received not reply so far except from Volodymyr Lisivka
<***@gmail.com> (uk_UA) who has confirmed the exclusion.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=8590
[7] translit_cyrillic https://sourceware.org/bugzilla/attachment.cgi?id=8591
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=8618

Best regards,
Egor Kobylkin

---
2018-07-17 Egor Kobylkin <***@kobylkin.com>

[BZ #2872]
* locales/translit_cyrillic: add Russian GOST 7.79-2000 transliteration
table from Cyrillic to Latin.
* locales/C: add include "translit_cyrillic";"" to LC_CTYPE translit
section.
* locales/aa_DJ: likewise
* locales/af_ZA: likewise
* locales/ak_GH: likewise
* locales/am_ET: likewise
* locales/ar_EG: likewise
* locales/be_BY: likewise
* locales/bem_ZM: likewise
* locales/ber_DZ: likewise
* locales/ber_MA: likewise
* locales/bg_BG: likewise
* locales/bi_VU: likewise
* locales/bn_BD: likewise
* locales/bo_CN: likewise
* locales/ca_ES: likewise
* locales/ce_RU: likewise
* locales/cs_CZ: likewise
* locales/cv_RU: likewise
* locales/cy_GB: likewise
* locales/da_DK: likewise
* locales/de_DE: likewise
* locales/dv_MV: likewise
* locales/dz_BT: likewise
* locales/el_GR: likewise
* locales/en_GB: likewise
* locales/en_NG: likewise
* locales/en_ZM: likewise
* locales/es_CU: likewise
* locales/es_ES: likewise
* locales/et_EE: likewise
* locales/fa_IR: likewise
* locales/ff_SN: likewise
* locales/fi_FI: likewise
* locales/fr_FR: likewise
* locales/ga_IE: likewise
* locales/gd_GB: likewise
* locales/gu_IN: likewise
* locales/gv_GB: likewise
* locales/he_IL: likewise
* locales/hi_IN: likewise
* locales/hif_FJ: likewise
* locales/hr_HR: likewise
* locales/ht_HT: likewise
* locales/hu_HU: likewise
* locales/hy_AM: likewise
* locales/id_ID: likewise
* locales/is_IS: likewise
* locales/it_IT: likewise
* locales/ja_JP: likewise
* locales/kk_KZ: likewise
* locales/km_KH: likewise
* locales/kn_IN: likewise
* locales/ko_KR: likewise
* locales/ks_IN: likewise
* locales/kw_GB: likewise
* locales/lb_LU: likewise
* locales/lg_UG: likewise
* locales/lij_IT: likewise
* locales/ln_CD: likewise
* locales/lo_LA: likewise
* locales/lt_LT: likewise
* locales/lv_LV: likewise
* locales/mg_MG: likewise
* locales/mhr_RU: likewise
* locales/mk_MK: likewise
* locales/ml_IN: likewise
* locales/ms_MY: likewise
* locales/mt_MT: likewise
* locales/***@latin: likewise
* locales/nb_NO: likewise
* locales/ne_NP: likewise
* locales/nhn_MX: likewise
* locales/niu_NU: likewise
* locales/niu_NZ: likewise
* locales/nl_NL: likewise
* locales/nr_ZA: likewise
* locales/oc_FR: likewise
* locales/om_KE: likewise
* locales/or_IN: likewise
* locales/os_RU: likewise
* locales/pa_IN: likewise
* locales/pa_PK: likewise
* locales/pl_PL: likewise
* locales/pt_PT: likewise
* locales/quz_PE: likewise
* locales/ro_RO: likewise
* locales/ru_RU: likewise
* locales/rw_RW: likewise
* locales/sa_IN: likewise
* locales/sd_IN: likewise
* locales/***@devanagari: likewise
* locales/sd_PK: likewise
* locales/se_NO: likewise
* locales/sgs_LT: likewise
* locales/si_LK: likewise
* locales/sk_SK: likewise
* locales/sl_SI: likewise
* locales/sm_WS: likewise
* locales/so_SO: likewise
* locales/sq_AL: likewise
* locales/ss_ZA: likewise
* locales/st_ZA: likewise
* locales/sv_SE: likewise
* locales/sw_KE: likewise
* locales/ta_IN: likewise
* locales/te_IN: likewise
* locales/th_TH: likewise
* locales/ti_ET: likewise
* locales/tn_ZA: likewise
* locales/to_TO: likewise
* locales/tpi_PG: likewise
* locales/tr_TR: likewise
* locales/ts_ZA: likewise
* locales/unm_US: likewise
* locales/ur_IN: likewise
* locales/ur_PK: likewise
* locales/ve_ZA: likewise
* locales/vi_VN: likewise
* locales/wa_BE: likewise
* locales/wo_SN: likewise
* locales/xh_ZA: likewise
* locales/yi_US: likewise
* locales/zh_CN: likewise
* locales/zu_ZA: likewise


diff -uNr a/localedata/locales/C b/localedata/locales/C
--- a/localedata/locales/C 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/C 2018-07-17 17:55:47.000000000 +0000
@@ -2292,6 +2292,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/aa_DJ b/localedata/locales/aa_DJ
--- a/localedata/locales/aa_DJ 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/aa_DJ 2018-07-17 17:55:47.000000000 +0000
@@ -70,6 +70,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/af_ZA b/localedata/locales/af_ZA
--- a/localedata/locales/af_ZA 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/af_ZA 2018-07-17 17:55:47.000000000 +0000
@@ -72,6 +72,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ak_GH b/localedata/locales/ak_GH
--- a/localedata/locales/ak_GH 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/ak_GH 2018-07-17 17:55:47.000000000 +0000
@@ -56,6 +56,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
--- a/localedata/locales/am_ET 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/am_ET 2018-07-17 17:55:47.000000000 +0000
@@ -1396,6 +1396,7 @@
<U137A> <U0060><U0039><U0030>
<U137B> <U0060><U0031><U0030><U0030>
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/ar_EG b/localedata/locales/ar_EG
--- a/localedata/locales/ar_EG 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/ar_EG 2018-07-17 17:55:48.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/be_BY b/localedata/locales/be_BY
--- a/localedata/locales/be_BY 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/be_BY 2018-07-17 17:55:48.000000000 +0000
@@ -69,6 +69,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bem_ZM b/localedata/locales/bem_ZM
--- a/localedata/locales/bem_ZM 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bem_ZM 2018-07-17 17:55:48.000000000 +0000
@@ -42,6 +42,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ber_DZ b/localedata/locales/ber_DZ
--- a/localedata/locales/ber_DZ 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ber_DZ 2018-07-17 17:55:48.000000000 +0000
@@ -166,6 +166,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ber_MA b/localedata/locales/ber_MA
--- a/localedata/locales/ber_MA 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ber_MA 2018-07-17 17:55:48.000000000 +0000
@@ -86,6 +86,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bg_BG b/localedata/locales/bg_BG
--- a/localedata/locales/bg_BG 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bg_BG 2018-07-17 17:55:48.000000000 +0000
@@ -49,6 +49,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bi_VU b/localedata/locales/bi_VU
--- a/localedata/locales/bi_VU 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bi_VU 2018-07-17 17:55:48.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bn_BD b/localedata/locales/bn_BD
--- a/localedata/locales/bn_BD 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bn_BD 2018-07-17 17:55:48.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bo_CN b/localedata/locales/bo_CN
--- a/localedata/locales/bo_CN 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bo_CN 2018-07-17 17:55:48.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ca_ES b/localedata/locales/ca_ES
--- a/localedata/locales/ca_ES 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ca_ES 2018-07-17 17:55:48.000000000 +0000
@@ -72,6 +72,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ce_RU b/localedata/locales/ce_RU
--- a/localedata/locales/ce_RU 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ce_RU 2018-07-17 17:55:48.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cs_CZ b/localedata/locales/cs_CZ
--- a/localedata/locales/cs_CZ 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/cs_CZ 2018-07-17 17:55:48.000000000 +0000
@@ -2311,6 +2311,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cv_RU b/localedata/locales/cv_RU
--- a/localedata/locales/cv_RU 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/cv_RU 2018-07-17 17:55:48.000000000 +0000
@@ -109,6 +109,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cy_GB b/localedata/locales/cy_GB
--- a/localedata/locales/cy_GB 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/cy_GB 2018-07-17 17:55:48.000000000 +0000
@@ -69,6 +69,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/da_DK b/localedata/locales/da_DK
--- a/localedata/locales/da_DK 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/da_DK 2018-07-17 17:55:48.000000000 +0000
@@ -167,6 +167,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/de_DE b/localedata/locales/de_DE
--- a/localedata/locales/de_DE 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/de_DE 2018-07-17 17:55:48.000000000 +0000
@@ -78,6 +78,7 @@
% DOUBLE HIGH-REVERSED-9 QUOTATION MARK
<U201F> <U00AB>;<U0022>

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/dv_MV b/localedata/locales/dv_MV
--- a/localedata/locales/dv_MV 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/dv_MV 2018-07-17 17:55:48.000000000 +0000
@@ -52,6 +52,7 @@
include "translit_combining";""


+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/dz_BT b/localedata/locales/dz_BT
--- a/localedata/locales/dz_BT 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/dz_BT 2018-07-17 17:55:48.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/el_GR b/localedata/locales/el_GR
--- a/localedata/locales/el_GR 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/el_GR 2018-07-17 17:55:48.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_GB b/localedata/locales/en_GB
--- a/localedata/locales/en_GB 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/en_GB 2018-07-17 17:55:48.000000000 +0000
@@ -55,6 +55,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_NG b/localedata/locales/en_NG
--- a/localedata/locales/en_NG 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/en_NG 2018-07-17 17:55:48.000000000 +0000
@@ -50,6 +50,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_ZM b/localedata/locales/en_ZM
--- a/localedata/locales/en_ZM 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/en_ZM 2018-07-17 17:55:48.000000000 +0000
@@ -42,6 +42,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/es_CU b/localedata/locales/es_CU
--- a/localedata/locales/es_CU 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/es_CU 2018-07-17 17:55:48.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/es_ES b/localedata/locales/es_ES
--- a/localedata/locales/es_ES 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/es_ES 2018-07-17 17:55:49.000000000 +0000
@@ -73,6 +73,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/et_EE b/localedata/locales/et_EE
--- a/localedata/locales/et_EE 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/et_EE 2018-07-17 17:55:49.000000000 +0000
@@ -109,6 +109,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fa_IR b/localedata/locales/fa_IR
--- a/localedata/locales/fa_IR 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/fa_IR 2018-07-17 17:55:49.000000000 +0000
@@ -79,6 +79,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ff_SN b/localedata/locales/ff_SN
--- a/localedata/locales/ff_SN 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/ff_SN 2018-07-17 17:55:49.000000000 +0000
@@ -42,6 +42,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fi_FI b/localedata/locales/fi_FI
--- a/localedata/locales/fi_FI 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/fi_FI 2018-07-17 17:55:49.000000000 +0000
@@ -137,6 +137,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fr_FR b/localedata/locales/fr_FR
--- a/localedata/locales/fr_FR 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/fr_FR 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
% In France, accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/ga_IE b/localedata/locales/ga_IE
--- a/localedata/locales/ga_IE 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/ga_IE 2018-07-17 17:55:49.000000000 +0000
@@ -54,6 +54,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gd_GB b/localedata/locales/gd_GB
--- a/localedata/locales/gd_GB 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gd_GB 2018-07-17 17:55:49.000000000 +0000
@@ -47,6 +47,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gu_IN b/localedata/locales/gu_IN
--- a/localedata/locales/gu_IN 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gu_IN 2018-07-17 17:55:49.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gv_GB b/localedata/locales/gv_GB
--- a/localedata/locales/gv_GB 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gv_GB 2018-07-17 17:55:49.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/he_IL b/localedata/locales/he_IL
--- a/localedata/locales/he_IL 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/he_IL 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hi_IN b/localedata/locales/hi_IN
--- a/localedata/locales/hi_IN 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hi_IN 2018-07-17 17:55:49.000000000 +0000
@@ -61,6 +61,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hif_FJ b/localedata/locales/hif_FJ
--- a/localedata/locales/hif_FJ 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hif_FJ 2018-07-17 17:55:49.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hr_HR b/localedata/locales/hr_HR
--- a/localedata/locales/hr_HR 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hr_HR 2018-07-17 17:55:49.000000000 +0000
@@ -153,6 +153,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ht_HT b/localedata/locales/ht_HT
--- a/localedata/locales/ht_HT 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/ht_HT 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hu_HU b/localedata/locales/hu_HU
--- a/localedata/locales/hu_HU 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hu_HU 2018-07-17 17:55:49.000000000 +0000
@@ -478,6 +478,7 @@
<U00FC> "<U0075><U0308>";"<U0075><U00A8>";"<U0075><U003A>"
<U0171> "<U0075><U030B>";"<U0075><U02DD>";"<U0075><U0022>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/hy_AM b/localedata/locales/hy_AM
--- a/localedata/locales/hy_AM 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/hy_AM 2018-07-17 17:55:49.000000000 +0000
@@ -77,6 +77,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/id_ID b/localedata/locales/id_ID
--- a/localedata/locales/id_ID 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/id_ID 2018-07-17 17:55:49.000000000 +0000
@@ -55,6 +55,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/is_IS b/localedata/locales/is_IS
--- a/localedata/locales/is_IS 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/is_IS 2018-07-17 17:55:49.000000000 +0000
@@ -2161,6 +2161,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/it_IT b/localedata/locales/it_IT
--- a/localedata/locales/it_IT 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/it_IT 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ja_JP b/localedata/locales/ja_JP
--- a/localedata/locales/ja_JP 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ja_JP 2018-07-17 17:55:49.000000000 +0000
@@ -1682,6 +1682,7 @@
include "translit_combining";""
include "translit_cjk_variants";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/kk_KZ b/localedata/locales/kk_KZ
--- a/localedata/locales/kk_KZ 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kk_KZ 2018-07-17 17:55:50.000000000 +0000
@@ -158,6 +158,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/km_KH b/localedata/locales/km_KH
--- a/localedata/locales/km_KH 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/km_KH 2018-07-17 17:55:50.000000000 +0000
@@ -873,6 +873,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/kn_IN b/localedata/locales/kn_IN
--- a/localedata/locales/kn_IN 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kn_IN 2018-07-17 17:55:50.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ko_KR b/localedata/locales/ko_KR
--- a/localedata/locales/ko_KR 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ko_KR 2018-07-17 17:55:50.000000000 +0000
@@ -6099,6 +6099,7 @@
include "translit_combining";""
include "translit_hangul";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/ks_IN b/localedata/locales/ks_IN
--- a/localedata/locales/ks_IN 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ks_IN 2018-07-17 17:55:50.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kw_GB b/localedata/locales/kw_GB
--- a/localedata/locales/kw_GB 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kw_GB 2018-07-17 17:55:50.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lb_LU b/localedata/locales/lb_LU
--- a/localedata/locales/lb_LU 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lb_LU 2018-07-17 17:55:50.000000000 +0000
@@ -78,6 +78,7 @@
% LATIN SMALL LETTER E WITH CIRCUMFLEX
<U00EA> "<U0065><U005E>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/lg_UG b/localedata/locales/lg_UG
--- a/localedata/locales/lg_UG 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lg_UG 2018-07-17 17:55:50.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lij_IT b/localedata/locales/lij_IT
--- a/localedata/locales/lij_IT 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lij_IT 2018-07-17 17:55:50.000000000 +0000
@@ -47,6 +47,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ln_CD b/localedata/locales/ln_CD
--- a/localedata/locales/ln_CD 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ln_CD 2018-07-17 17:55:50.000000000 +0000
@@ -39,6 +39,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lo_LA b/localedata/locales/lo_LA
--- a/localedata/locales/lo_LA 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lo_LA 2018-07-17 17:55:50.000000000 +0000
@@ -51,6 +51,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lt_LT b/localedata/locales/lt_LT
--- a/localedata/locales/lt_LT 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lt_LT 2018-07-17 17:55:50.000000000 +0000
@@ -77,6 +77,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lv_LV b/localedata/locales/lv_LV
--- a/localedata/locales/lv_LV 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lv_LV 2018-07-17 17:55:50.000000000 +0000
@@ -2122,6 +2122,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mg_MG b/localedata/locales/mg_MG
--- a/localedata/locales/mg_MG 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mg_MG 2018-07-17 17:55:50.000000000 +0000
@@ -55,6 +55,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/mhr_RU b/localedata/locales/mhr_RU
--- a/localedata/locales/mhr_RU 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mhr_RU 2018-07-17 17:55:50.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mk_MK b/localedata/locales/mk_MK
--- a/localedata/locales/mk_MK 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mk_MK 2018-07-17 17:55:50.000000000 +0000
@@ -49,6 +49,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ml_IN b/localedata/locales/ml_IN
--- a/localedata/locales/ml_IN 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ml_IN 2018-07-17 17:55:50.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
%
diff -uNr a/localedata/locales/ms_MY b/localedata/locales/ms_MY
--- a/localedata/locales/ms_MY 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ms_MY 2018-07-17 17:55:50.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mt_MT b/localedata/locales/mt_MT
--- a/localedata/locales/mt_MT 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mt_MT 2018-07-17 17:55:50.000000000 +0000
@@ -47,6 +47,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/***@latin
b/localedata/locales/***@latin
--- a/localedata/locales/***@latin 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/***@latin 2018-07-17 17:55:50.000000000 +0000
@@ -53,6 +53,7 @@
% accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/nb_NO b/localedata/locales/nb_NO
--- a/localedata/locales/nb_NO 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nb_NO 2018-07-17 17:55:50.000000000 +0000
@@ -154,6 +154,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ne_NP b/localedata/locales/ne_NP
--- a/localedata/locales/ne_NP 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ne_NP 2018-07-17 17:55:50.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nhn_MX b/localedata/locales/nhn_MX
--- a/localedata/locales/nhn_MX 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nhn_MX 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/niu_NU b/localedata/locales/niu_NU
--- a/localedata/locales/niu_NU 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/niu_NU 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/niu_NZ b/localedata/locales/niu_NZ
--- a/localedata/locales/niu_NZ 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/niu_NZ 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nl_NL b/localedata/locales/nl_NL
--- a/localedata/locales/nl_NL 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nl_NL 2018-07-17 17:55:51.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nr_ZA b/localedata/locales/nr_ZA
--- a/localedata/locales/nr_ZA 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/nr_ZA 2018-07-17 17:55:51.000000000 +0000
@@ -66,6 +66,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/oc_FR b/localedata/locales/oc_FR
--- a/localedata/locales/oc_FR 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/oc_FR 2018-07-17 17:55:51.000000000 +0000
@@ -62,6 +62,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/om_KE b/localedata/locales/om_KE
--- a/localedata/locales/om_KE 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/om_KE 2018-07-17 17:55:51.000000000 +0000
@@ -140,6 +140,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/or_IN b/localedata/locales/or_IN
--- a/localedata/locales/or_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/or_IN 2018-07-17 17:55:51.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/os_RU b/localedata/locales/os_RU
--- a/localedata/locales/os_RU 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/os_RU 2018-07-17 17:55:51.000000000 +0000
@@ -70,6 +70,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/pa_IN b/localedata/locales/pa_IN
--- a/localedata/locales/pa_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pa_IN 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pa_PK b/localedata/locales/pa_PK
--- a/localedata/locales/pa_PK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pa_PK 2018-07-17 17:55:51.000000000 +0000
@@ -58,6 +58,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pl_PL b/localedata/locales/pl_PL
--- a/localedata/locales/pl_PL 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pl_PL 2018-07-17 17:55:51.000000000 +0000
@@ -142,6 +142,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pt_PT b/localedata/locales/pt_PT
--- a/localedata/locales/pt_PT 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pt_PT 2018-07-17 17:55:51.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/quz_PE b/localedata/locales/quz_PE
--- a/localedata/locales/quz_PE 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/quz_PE 2018-07-17 17:55:51.000000000 +0000
@@ -57,6 +57,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ro_RO b/localedata/locales/ro_RO
--- a/localedata/locales/ro_RO 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/ro_RO 2018-07-17 17:55:51.000000000 +0000
@@ -144,6 +144,7 @@
<U0162> "<U021A>";"<U0054>"
<U0163> "<U021B>";"<U0074>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ru_RU b/localedata/locales/ru_RU
--- a/localedata/locales/ru_RU 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/ru_RU 2018-07-17 17:55:51.000000000 +0000
@@ -74,6 +74,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/rw_RW b/localedata/locales/rw_RW
--- a/localedata/locales/rw_RW 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/rw_RW 2018-07-17 17:55:51.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sa_IN b/localedata/locales/sa_IN
--- a/localedata/locales/sa_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sa_IN 2018-07-17 17:55:51.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sd_IN b/localedata/locales/sd_IN
--- a/localedata/locales/sd_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sd_IN 2018-07-17 17:55:51.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/***@devanagari
b/localedata/locales/***@devanagari
--- a/localedata/locales/***@devanagari 2018-07-17 17:49:19.000000000
+0000
+++ b/localedata/locales/***@devanagari 2018-07-17 17:55:51.000000000
+0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sd_PK b/localedata/locales/sd_PK
--- a/localedata/locales/sd_PK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sd_PK 2018-07-17 17:55:51.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/se_NO b/localedata/locales/se_NO
--- a/localedata/locales/se_NO 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/se_NO 2018-07-17 17:55:51.000000000 +0000
@@ -205,6 +205,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sgs_LT b/localedata/locales/sgs_LT
--- a/localedata/locales/sgs_LT 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sgs_LT 2018-07-17 17:55:52.000000000 +0000
@@ -59,6 +59,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/si_LK b/localedata/locales/si_LK
--- a/localedata/locales/si_LK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/si_LK 2018-07-17 17:55:52.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sk_SK b/localedata/locales/sk_SK
--- a/localedata/locales/sk_SK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sk_SK 2018-07-17 17:55:52.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sl_SI b/localedata/locales/sl_SI
--- a/localedata/locales/sl_SI 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sl_SI 2018-07-17 17:55:52.000000000 +0000
@@ -91,6 +91,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sm_WS b/localedata/locales/sm_WS
--- a/localedata/locales/sm_WS 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sm_WS 2018-07-17 17:55:52.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/so_SO b/localedata/locales/so_SO
--- a/localedata/locales/so_SO 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/so_SO 2018-07-17 17:55:52.000000000 +0000
@@ -70,6 +70,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sq_AL b/localedata/locales/sq_AL
--- a/localedata/locales/sq_AL 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sq_AL 2018-07-17 17:55:52.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ss_ZA b/localedata/locales/ss_ZA
--- a/localedata/locales/ss_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ss_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/st_ZA b/localedata/locales/st_ZA
--- a/localedata/locales/st_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/st_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -64,6 +64,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sv_SE b/localedata/locales/sv_SE
--- a/localedata/locales/sv_SE 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sv_SE 2018-07-17 17:55:52.000000000 +0000
@@ -139,6 +139,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sw_KE b/localedata/locales/sw_KE
--- a/localedata/locales/sw_KE 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sw_KE 2018-07-17 17:55:52.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ta_IN b/localedata/locales/ta_IN
--- a/localedata/locales/ta_IN 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ta_IN 2018-07-17 17:55:52.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/te_IN b/localedata/locales/te_IN
--- a/localedata/locales/te_IN 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/te_IN 2018-07-17 17:55:52.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/th_TH b/localedata/locales/th_TH
--- a/localedata/locales/th_TH 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/th_TH 2018-07-17 17:55:52.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ti_ET b/localedata/locales/ti_ET
--- a/localedata/locales/ti_ET 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ti_ET 2018-07-17 17:55:52.000000000 +0000
@@ -866,6 +866,7 @@
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>

include "translit_combining";""
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/tn_ZA b/localedata/locales/tn_ZA
--- a/localedata/locales/tn_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/tn_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -69,6 +69,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/to_TO b/localedata/locales/to_TO
--- a/localedata/locales/to_TO 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/to_TO 2018-07-17 17:55:52.000000000 +0000
@@ -36,6 +36,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/tpi_PG b/localedata/locales/tpi_PG
--- a/localedata/locales/tpi_PG 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/tpi_PG 2018-07-17 17:55:52.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/tr_TR b/localedata/locales/tr_TR
--- a/localedata/locales/tr_TR 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/tr_TR 2018-07-17 17:55:52.000000000 +0000
@@ -2430,6 +2430,7 @@

% TURKISH LIRA SIGN
<U20BA> "<U0054><U004C>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-07-17 17:55:52.000000000
+0000
@@ -0,0 +1,151 @@
+escape_char /
+comment_char %
+
+% Transliterations that converts cyrillic letters to ascii symbols
inspired by GOST 7.79-2000
+% https://sourceware.org/bugzilla/show_bug.cgi?id=2872
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=8590
+% Up to three characters are required to do a reversible transliteration.
+
+LC_CTYPE
+
+translit_start
+
+
+% CYRILLIC CAPITAL LETTER IO
+<U0401> "<U0059><U004F>";<U0059>
+% CYRILLIC CAPITAL LETTER A
+<U0410> <U0041>
+% CYRILLIC CAPITAL LETTER BE
+<U0411> <U0042>
+% CYRILLIC CAPITAL LETTER VE
+<U0412> <U0056>
+% CYRILLIC CAPITAL LETTER GHE
+<U0413> <U0047>
+% CYRILLIC CAPITAL LETTER DE
+<U0414> <U0044>
+% CYRILLIC CAPITAL LETTER IE
+<U0415> <U0045>
+% CYRILLIC CAPITAL LETTER ZHE
+<U0416> "<U005A><U0048>";<U005A>
+% CYRILLIC CAPITAL LETTER ZE
+<U0417> <U005A>
+% CYRILLIC CAPITAL LETTER I
+<U0418> <U0049>
+% CYRILLIC CAPITAL LETTER SHORT I
+<U0419> <U004A>
+% CYRILLIC CAPITAL LETTER KA
+<U041A> <U004B>
+% CYRILLIC CAPITAL LETTER EL
+<U041B> <U004C>
+% CYRILLIC CAPITAL LETTER EM
+<U041C> <U004D>
+% CYRILLIC CAPITAL LETTER EN
+<U041D> <U004E>
+% CYRILLIC CAPITAL LETTER O
+<U041E> <U004F>
+% CYRILLIC CAPITAL LETTER PE
+<U041F> <U0050>
+% CYRILLIC CAPITAL LETTER ER
+<U0420> <U0052>
+% CYRILLIC CAPITAL LETTER ES
+<U0421> <U0053>
+% CYRILLIC CAPITAL LETTER TE
+<U0422> <U0054>
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC CAPITAL LETTER EF
+<U0424> <U0046>
+% CYRILLIC CAPITAL LETTER HA
+<U0425> <U0058>
+% CYRILLIC CAPITAL LETTER TSE
+<U0426> "<U0043><U005A>";<U0043>
+% CYRILLIC CAPITAL LETTER CHE
+<U0427> "<U0043><U0048>";<U0043>
+% CYRILLIC CAPITAL LETTER SHA
+<U0428> "<U0053><U0048>";<U0053>
+% CYRILLIC CAPITAL LETTER SHCHA
+<U0429> "<U0053><U0048><U0048>";<U0053>
+% CYRILLIC CAPITAL LETTER HARD SIGN
+<U042A> "<U0060><U0060>";<U0060>
+% CYRILLIC CAPITAL LETTER YERU
+<U042B> "<U0059><U0027>";<U0059>
+% CYRILLIC CAPITAL LETTER SOFT SIGN
+<U042C> <U0060>
+% CYRILLIC CAPITAL LETTER E
+<U042D> "<U0045><U0060>";<U0045>
+% CYRILLIC CAPITAL LETTER YU
+<U042E> "<U0059><U0055>";<U0059>
+% CYRILLIC CAPITAL LETTER YA
+<U042F> "<U0059><U0041>";<U0059>
+% CYRILLIC SMALL LETTER A
+<U0430> <U0061>
+% CYRILLIC SMALL LETTER BE
+<U0431> <U0062>
+% CYRILLIC SMALL LETTER VE
+<U0432> <U0076>
+% CYRILLIC SMALL LETTER GHE
+<U0433> <U0067>
+% CYRILLIC SMALL LETTER DE
+<U0434> <U0064>
+% CYRILLIC SMALL LETTER IE
+<U0435> <U0065>
+% CYRILLIC SMALL LETTER ZHE
+<U0436> "<U007A><U0068>";<U007A>
+% CYRILLIC SMALL LETTER ZE
+<U0437> <U007A>
+% CYRILLIC SMALL LETTER I
+<U0438> <U0069>
+% CYRILLIC SMALL LETTER SHORT I
+<U0439> <U006A>
+% CYRILLIC SMALL LETTER KA
+<U043A> <U006B>
+% CYRILLIC SMALL LETTER EL
+<U043B> <U006C>
+% CYRILLIC SMALL LETTER EM
+<U043C> <U006D>
+% CYRILLIC SMALL LETTER EN
+<U043D> <U006E>
+% CYRILLIC SMALL LETTER O
+<U043E> <U006F>
+% CYRILLIC SMALL LETTER PE
+<U043F> <U0070>
+% CYRILLIC SMALL LETTER ER
+<U0440> <U0072>
+% CYRILLIC SMALL LETTER ES
+<U0441> <U0073>
+% CYRILLIC SMALL LETTER TE
+<U0442> <U0074>
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC SMALL LETTER EF
+<U0444> <U0066>
+% CYRILLIC SMALL LETTER HA
+<U0445> <U0078>
+% CYRILLIC SMALL LETTER TSE
+<U0446> "<U0063><U007A>";<U0063>
+% CYRILLIC SMALL LETTER CHE
+<U0447> "<U0063><U0068>";<U0063>
+% CYRILLIC SMALL LETTER SHA
+<U0448> "<U0073><U0068>";<U0073>
+% CYRILLIC SMALL LETTER SHCHA
+<U0449> "<U0073><U0068><U0068>";<U0073>
+% CYRILLIC SMALL LETTER HARD SIGN
+<U044A> "<U0060><U0060>";<U0060>
+% CYRILLIC SMALL LETTER YERU
+<U044B> "<U0079><U0027>";<U0079>
+% CYRILLIC SMALL LETTER SOFT SIGN
+<U044C> <U0060>
+% CYRILLIC SMALL LETTER E
+<U044D> "<U0065><U0060>";<U0065>
+% CYRILLIC SMALL LETTER YU
+<U044E> "<U0079><U0075>";<U0079>
+% CYRILLIC SMALL LETTER YA
+<U044F> "<U0079><U0061>";<U0079>
+% CYRILLIC SMALL LETTER IO
+<U0451> "<U0079><U006F>";<U0079>
+
+
+translit_end
+
+END LC_CTYPE
diff -uNr a/localedata/locales/ts_ZA b/localedata/locales/ts_ZA
--- a/localedata/locales/ts_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ts_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -64,6 +64,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/unm_US b/localedata/locales/unm_US
--- a/localedata/locales/unm_US 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/unm_US 2018-07-17 17:55:52.000000000 +0000
@@ -48,6 +48,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ur_IN b/localedata/locales/ur_IN
--- a/localedata/locales/ur_IN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ur_IN 2018-07-17 17:55:53.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ur_PK b/localedata/locales/ur_PK
--- a/localedata/locales/ur_PK 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ur_PK 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ve_ZA b/localedata/locales/ve_ZA
--- a/localedata/locales/ve_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ve_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -67,6 +67,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/vi_VN b/localedata/locales/vi_VN
--- a/localedata/locales/vi_VN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/vi_VN 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
% dong sign -> d// -> dd
<U20AB> "<U0111>";"<U0064><U0064>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/wa_BE b/localedata/locales/wa_BE
--- a/localedata/locales/wa_BE 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/wa_BE 2018-07-17 17:55:53.000000000 +0000
@@ -69,6 +69,7 @@
<U00C5> "<U0041><U030A>";"<U0041>";"<U0041><U0055>"
<U00E5> "<U0061><U030A>";"<U0061>";"<U0061><U0075>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/wo_SN b/localedata/locales/wo_SN
--- a/localedata/locales/wo_SN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/wo_SN 2018-07-17 17:55:53.000000000 +0000
@@ -55,6 +55,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/xh_ZA b/localedata/locales/xh_ZA
--- a/localedata/locales/xh_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/xh_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -66,6 +66,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/yi_US b/localedata/locales/yi_US
--- a/localedata/locales/yi_US 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/yi_US 2018-07-17 17:55:53.000000000 +0000
@@ -73,6 +73,7 @@
<U05F0> "<U05D5><U05D5>";"<U0077><U0077>"
<U05F1> "<U05D5><U05D9>";"<U0077><U006A>"
<U05F2> "<U05D9><U05D9>";"<U006A><U006A>"
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/zh_CN b/localedata/locales/zh_CN
--- a/localedata/locales/zh_CN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/zh_CN 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

class "hanzi"; /
diff -uNr a/localedata/locales/zu_ZA b/localedata/locales/zu_ZA
--- a/localedata/locales/zu_ZA 2018-07-17 17:49:22.000000000 +0000
+++ b/localedata/locales/zu_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -70,6 +70,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
Carlos O'Donell
2018-07-17 19:40:54 UTC
Permalink
Post by Egor Kobylkin
Dear locale maintainers,
fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"
We are currently preparing for the 2.28 release and it may take
a while to review this change and the structure of the changes,
and the data itself.

Is it OK if this material is reviewed for 2.29 inclusion (after
August 1st)?

Cheers,
Carlos.
Egor Kobylkin
2018-07-17 19:50:40 UTC
Permalink
Post by Carlos O'Donell
Post by Egor Kobylkin
Dear locale maintainers,
fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"
We are currently preparing for the 2.28 release and it may take
a while to review this change and the structure of the changes,
and the data itself.
Is it OK if this material is reviewed for 2.29 inclusion (after
August 1st)?
It's fine with me to postpone it for for 2.29 inclusion (after August 1st).
Should I send a reminder in August?

Bests,
Egor
Carlos O'Donell
2018-07-17 19:59:27 UTC
Permalink
Post by Egor Kobylkin
Post by Carlos O'Donell
Post by Egor Kobylkin
Dear locale maintainers,
fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"
We are currently preparing for the 2.28 release and it may take
a while to review this change and the structure of the changes,
and the data itself.
Is it OK if this material is reviewed for 2.29 inclusion (after
August 1st)?
It's fine with me to postpone it for for 2.29 inclusion (after August 1st).
Should I send a reminder in August?
Yes please, ping the original patches again in August and we can
review. In the meantime others may feel free to review, but we won't
consider them for inclusion yet e.g. don't block the release.
--
Cheers,
Carlos.
Egor Kobylkin
2018-08-06 19:00:30 UTC
Permalink
Dear locale maintainers,

fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

add Cyrillic transliteration table translit_cyrillic file

https://sourceware.org/bugzilla/attachment.cgi?id=8591 [7]

to localedata/locales/ and include it in all your locales going forward.

Patch included inline below.

This is a re-submission for the consideration for 2.29 on a request from
Carlos O'Donell https://sourceware.org/ml/libc-alpha/2018-07/msg00506.html

From this patch I have excluded locales that already mention cyrillic or
have a transliteration table for it:
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
***@cyrillic

Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.



Current bug effect:

The glibc wiki explicitly lists this use case as the test example

https://sourceware.org/glibc/wiki/Locales#Testing_Locales :

LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt

currently it fails on Cyrillic texts in most locales including ru_RU [1]
[8] [9]:

LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

- It produces a string of question marks and spaces.

This is what it should produce and it does so after the patch applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


Root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.



COMMIT MESSAGE:
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.

While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration has only ASCII codes but still can be read by a native
speaker. Among other things it is useful for processing the Cyrillic
texts and filenames by programs or on systems that are not specifically
prepared to work with Cyrillic, don't have corresponding fonts installed
or can't handle UTF-8.

The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on GOST 7.79-2000 official source
(Federal Agency on Technical Regulating and Metrology Of Russian
Federation [2]). Technically an independent but identical source [3] was
used and prepared in a spreadsheet [6].

The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that have a
translit_start/end stance and generated a patch for them.

The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.
However it would not be the standard Russian Cyrillic transliteration as
described above.
I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
Lisivka <***@gmail.com>, Max Kutny <***@gmail.com> (uk_UA),
ДаМОлП КегаМ <***@gnome.org> (sr_YU, sr_CS) have confirmed the
exclusion.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=8590
[7] translit_cyrillic https://sourceware.org/bugzilla/attachment.cgi?id=8591
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=8618

Best regards,
Egor Kobylkin

---
2018-07-17 Egor Kobylkin <***@kobylkin.com>

[BZ #2872]
* locales/translit_cyrillic: add Russian GOST 7.79-2000 transliteration
table from Cyrillic to Latin.
* locales/C: add include "translit_cyrillic";"" to LC_CTYPE translit
section.
* locales/aa_DJ: likewise
* locales/af_ZA: likewise
* locales/ak_GH: likewise
* locales/am_ET: likewise
* locales/ar_EG: likewise
* locales/be_BY: likewise
* locales/bem_ZM: likewise
* locales/ber_DZ: likewise
* locales/ber_MA: likewise
* locales/bg_BG: likewise
* locales/bi_VU: likewise
* locales/bn_BD: likewise
* locales/bo_CN: likewise
* locales/ca_ES: likewise
* locales/ce_RU: likewise
* locales/cs_CZ: likewise
* locales/cv_RU: likewise
* locales/cy_GB: likewise
* locales/da_DK: likewise
* locales/de_DE: likewise
* locales/dv_MV: likewise
* locales/dz_BT: likewise
* locales/el_GR: likewise
* locales/en_GB: likewise
* locales/en_NG: likewise
* locales/en_ZM: likewise
* locales/es_CU: likewise
* locales/es_ES: likewise
* locales/et_EE: likewise
* locales/fa_IR: likewise
* locales/ff_SN: likewise
* locales/fi_FI: likewise
* locales/fr_FR: likewise
* locales/ga_IE: likewise
* locales/gd_GB: likewise
* locales/gu_IN: likewise
* locales/gv_GB: likewise
* locales/he_IL: likewise
* locales/hi_IN: likewise
* locales/hif_FJ: likewise
* locales/hr_HR: likewise
* locales/ht_HT: likewise
* locales/hu_HU: likewise
* locales/hy_AM: likewise
* locales/id_ID: likewise
* locales/is_IS: likewise
* locales/it_IT: likewise
* locales/ja_JP: likewise
* locales/kk_KZ: likewise
* locales/km_KH: likewise
* locales/kn_IN: likewise
* locales/ko_KR: likewise
* locales/ks_IN: likewise
* locales/kw_GB: likewise
* locales/lb_LU: likewise
* locales/lg_UG: likewise
* locales/lij_IT: likewise
* locales/ln_CD: likewise
* locales/lo_LA: likewise
* locales/lt_LT: likewise
* locales/lv_LV: likewise
* locales/mg_MG: likewise
* locales/mhr_RU: likewise
* locales/mk_MK: likewise
* locales/ml_IN: likewise
* locales/ms_MY: likewise
* locales/mt_MT: likewise
* locales/***@latin: likewise
* locales/nb_NO: likewise
* locales/ne_NP: likewise
* locales/nhn_MX: likewise
* locales/niu_NU: likewise
* locales/niu_NZ: likewise
* locales/nl_NL: likewise
* locales/nr_ZA: likewise
* locales/oc_FR: likewise
* locales/om_KE: likewise
* locales/or_IN: likewise
* locales/os_RU: likewise
* locales/pa_IN: likewise
* locales/pa_PK: likewise
* locales/pl_PL: likewise
* locales/pt_PT: likewise
* locales/quz_PE: likewise
* locales/ro_RO: likewise
* locales/ru_RU: likewise
* locales/rw_RW: likewise
* locales/sa_IN: likewise
* locales/sd_IN: likewise
* locales/***@devanagari: likewise
* locales/sd_PK: likewise
* locales/se_NO: likewise
* locales/sgs_LT: likewise
* locales/si_LK: likewise
* locales/sk_SK: likewise
* locales/sl_SI: likewise
* locales/sm_WS: likewise
* locales/so_SO: likewise
* locales/sq_AL: likewise
* locales/ss_ZA: likewise
* locales/st_ZA: likewise
* locales/sv_SE: likewise
* locales/sw_KE: likewise
* locales/ta_IN: likewise
* locales/te_IN: likewise
* locales/th_TH: likewise
* locales/ti_ET: likewise
* locales/tn_ZA: likewise
* locales/to_TO: likewise
* locales/tpi_PG: likewise
* locales/tr_TR: likewise
* locales/ts_ZA: likewise
* locales/unm_US: likewise
* locales/ur_IN: likewise
* locales/ur_PK: likewise
* locales/ve_ZA: likewise
* locales/vi_VN: likewise
* locales/wa_BE: likewise
* locales/wo_SN: likewise
* locales/xh_ZA: likewise
* locales/yi_US: likewise
* locales/zh_CN: likewise
* locales/zu_ZA: likewise


diff -uNr a/localedata/locales/C b/localedata/locales/C
--- a/localedata/locales/C 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/C 2018-07-17 17:55:47.000000000 +0000
@@ -2292,6 +2292,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/aa_DJ b/localedata/locales/aa_DJ
--- a/localedata/locales/aa_DJ 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/aa_DJ 2018-07-17 17:55:47.000000000 +0000
@@ -70,6 +70,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/af_ZA b/localedata/locales/af_ZA
--- a/localedata/locales/af_ZA 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/af_ZA 2018-07-17 17:55:47.000000000 +0000
@@ -72,6 +72,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ak_GH b/localedata/locales/ak_GH
--- a/localedata/locales/ak_GH 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/ak_GH 2018-07-17 17:55:47.000000000 +0000
@@ -56,6 +56,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
--- a/localedata/locales/am_ET 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/am_ET 2018-07-17 17:55:47.000000000 +0000
@@ -1396,6 +1396,7 @@
<U137A> <U0060><U0039><U0030>
<U137B> <U0060><U0031><U0030><U0030>
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/ar_EG b/localedata/locales/ar_EG
--- a/localedata/locales/ar_EG 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/ar_EG 2018-07-17 17:55:48.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/be_BY b/localedata/locales/be_BY
--- a/localedata/locales/be_BY 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/be_BY 2018-07-17 17:55:48.000000000 +0000
@@ -69,6 +69,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bem_ZM b/localedata/locales/bem_ZM
--- a/localedata/locales/bem_ZM 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bem_ZM 2018-07-17 17:55:48.000000000 +0000
@@ -42,6 +42,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ber_DZ b/localedata/locales/ber_DZ
--- a/localedata/locales/ber_DZ 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ber_DZ 2018-07-17 17:55:48.000000000 +0000
@@ -166,6 +166,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ber_MA b/localedata/locales/ber_MA
--- a/localedata/locales/ber_MA 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ber_MA 2018-07-17 17:55:48.000000000 +0000
@@ -86,6 +86,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bg_BG b/localedata/locales/bg_BG
--- a/localedata/locales/bg_BG 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bg_BG 2018-07-17 17:55:48.000000000 +0000
@@ -49,6 +49,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bi_VU b/localedata/locales/bi_VU
--- a/localedata/locales/bi_VU 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bi_VU 2018-07-17 17:55:48.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bn_BD b/localedata/locales/bn_BD
--- a/localedata/locales/bn_BD 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bn_BD 2018-07-17 17:55:48.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bo_CN b/localedata/locales/bo_CN
--- a/localedata/locales/bo_CN 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bo_CN 2018-07-17 17:55:48.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ca_ES b/localedata/locales/ca_ES
--- a/localedata/locales/ca_ES 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ca_ES 2018-07-17 17:55:48.000000000 +0000
@@ -72,6 +72,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ce_RU b/localedata/locales/ce_RU
--- a/localedata/locales/ce_RU 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ce_RU 2018-07-17 17:55:48.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cs_CZ b/localedata/locales/cs_CZ
--- a/localedata/locales/cs_CZ 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/cs_CZ 2018-07-17 17:55:48.000000000 +0000
@@ -2311,6 +2311,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cv_RU b/localedata/locales/cv_RU
--- a/localedata/locales/cv_RU 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/cv_RU 2018-07-17 17:55:48.000000000 +0000
@@ -109,6 +109,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cy_GB b/localedata/locales/cy_GB
--- a/localedata/locales/cy_GB 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/cy_GB 2018-07-17 17:55:48.000000000 +0000
@@ -69,6 +69,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/da_DK b/localedata/locales/da_DK
--- a/localedata/locales/da_DK 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/da_DK 2018-07-17 17:55:48.000000000 +0000
@@ -167,6 +167,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/de_DE b/localedata/locales/de_DE
--- a/localedata/locales/de_DE 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/de_DE 2018-07-17 17:55:48.000000000 +0000
@@ -78,6 +78,7 @@
% DOUBLE HIGH-REVERSED-9 QUOTATION MARK
<U201F> <U00AB>;<U0022>

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/dv_MV b/localedata/locales/dv_MV
--- a/localedata/locales/dv_MV 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/dv_MV 2018-07-17 17:55:48.000000000 +0000
@@ -52,6 +52,7 @@
include "translit_combining";""


+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/dz_BT b/localedata/locales/dz_BT
--- a/localedata/locales/dz_BT 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/dz_BT 2018-07-17 17:55:48.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/el_GR b/localedata/locales/el_GR
--- a/localedata/locales/el_GR 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/el_GR 2018-07-17 17:55:48.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_GB b/localedata/locales/en_GB
--- a/localedata/locales/en_GB 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/en_GB 2018-07-17 17:55:48.000000000 +0000
@@ -55,6 +55,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_NG b/localedata/locales/en_NG
--- a/localedata/locales/en_NG 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/en_NG 2018-07-17 17:55:48.000000000 +0000
@@ -50,6 +50,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_ZM b/localedata/locales/en_ZM
--- a/localedata/locales/en_ZM 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/en_ZM 2018-07-17 17:55:48.000000000 +0000
@@ -42,6 +42,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/es_CU b/localedata/locales/es_CU
--- a/localedata/locales/es_CU 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/es_CU 2018-07-17 17:55:48.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/es_ES b/localedata/locales/es_ES
--- a/localedata/locales/es_ES 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/es_ES 2018-07-17 17:55:49.000000000 +0000
@@ -73,6 +73,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/et_EE b/localedata/locales/et_EE
--- a/localedata/locales/et_EE 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/et_EE 2018-07-17 17:55:49.000000000 +0000
@@ -109,6 +109,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fa_IR b/localedata/locales/fa_IR
--- a/localedata/locales/fa_IR 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/fa_IR 2018-07-17 17:55:49.000000000 +0000
@@ -79,6 +79,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ff_SN b/localedata/locales/ff_SN
--- a/localedata/locales/ff_SN 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/ff_SN 2018-07-17 17:55:49.000000000 +0000
@@ -42,6 +42,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fi_FI b/localedata/locales/fi_FI
--- a/localedata/locales/fi_FI 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/fi_FI 2018-07-17 17:55:49.000000000 +0000
@@ -137,6 +137,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fr_FR b/localedata/locales/fr_FR
--- a/localedata/locales/fr_FR 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/fr_FR 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
% In France, accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/ga_IE b/localedata/locales/ga_IE
--- a/localedata/locales/ga_IE 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/ga_IE 2018-07-17 17:55:49.000000000 +0000
@@ -54,6 +54,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gd_GB b/localedata/locales/gd_GB
--- a/localedata/locales/gd_GB 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gd_GB 2018-07-17 17:55:49.000000000 +0000
@@ -47,6 +47,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gu_IN b/localedata/locales/gu_IN
--- a/localedata/locales/gu_IN 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gu_IN 2018-07-17 17:55:49.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gv_GB b/localedata/locales/gv_GB
--- a/localedata/locales/gv_GB 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gv_GB 2018-07-17 17:55:49.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/he_IL b/localedata/locales/he_IL
--- a/localedata/locales/he_IL 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/he_IL 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hi_IN b/localedata/locales/hi_IN
--- a/localedata/locales/hi_IN 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hi_IN 2018-07-17 17:55:49.000000000 +0000
@@ -61,6 +61,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hif_FJ b/localedata/locales/hif_FJ
--- a/localedata/locales/hif_FJ 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hif_FJ 2018-07-17 17:55:49.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hr_HR b/localedata/locales/hr_HR
--- a/localedata/locales/hr_HR 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hr_HR 2018-07-17 17:55:49.000000000 +0000
@@ -153,6 +153,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ht_HT b/localedata/locales/ht_HT
--- a/localedata/locales/ht_HT 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/ht_HT 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hu_HU b/localedata/locales/hu_HU
--- a/localedata/locales/hu_HU 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hu_HU 2018-07-17 17:55:49.000000000 +0000
@@ -478,6 +478,7 @@
<U00FC> "<U0075><U0308>";"<U0075><U00A8>";"<U0075><U003A>"
<U0171> "<U0075><U030B>";"<U0075><U02DD>";"<U0075><U0022>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/hy_AM b/localedata/locales/hy_AM
--- a/localedata/locales/hy_AM 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/hy_AM 2018-07-17 17:55:49.000000000 +0000
@@ -77,6 +77,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/id_ID b/localedata/locales/id_ID
--- a/localedata/locales/id_ID 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/id_ID 2018-07-17 17:55:49.000000000 +0000
@@ -55,6 +55,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/is_IS b/localedata/locales/is_IS
--- a/localedata/locales/is_IS 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/is_IS 2018-07-17 17:55:49.000000000 +0000
@@ -2161,6 +2161,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/it_IT b/localedata/locales/it_IT
--- a/localedata/locales/it_IT 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/it_IT 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ja_JP b/localedata/locales/ja_JP
--- a/localedata/locales/ja_JP 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ja_JP 2018-07-17 17:55:49.000000000 +0000
@@ -1682,6 +1682,7 @@
include "translit_combining";""
include "translit_cjk_variants";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/kk_KZ b/localedata/locales/kk_KZ
--- a/localedata/locales/kk_KZ 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kk_KZ 2018-07-17 17:55:50.000000000 +0000
@@ -158,6 +158,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/km_KH b/localedata/locales/km_KH
--- a/localedata/locales/km_KH 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/km_KH 2018-07-17 17:55:50.000000000 +0000
@@ -873,6 +873,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/kn_IN b/localedata/locales/kn_IN
--- a/localedata/locales/kn_IN 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kn_IN 2018-07-17 17:55:50.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ko_KR b/localedata/locales/ko_KR
--- a/localedata/locales/ko_KR 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ko_KR 2018-07-17 17:55:50.000000000 +0000
@@ -6099,6 +6099,7 @@
include "translit_combining";""
include "translit_hangul";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/ks_IN b/localedata/locales/ks_IN
--- a/localedata/locales/ks_IN 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ks_IN 2018-07-17 17:55:50.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kw_GB b/localedata/locales/kw_GB
--- a/localedata/locales/kw_GB 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kw_GB 2018-07-17 17:55:50.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lb_LU b/localedata/locales/lb_LU
--- a/localedata/locales/lb_LU 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lb_LU 2018-07-17 17:55:50.000000000 +0000
@@ -78,6 +78,7 @@
% LATIN SMALL LETTER E WITH CIRCUMFLEX
<U00EA> "<U0065><U005E>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/lg_UG b/localedata/locales/lg_UG
--- a/localedata/locales/lg_UG 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lg_UG 2018-07-17 17:55:50.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lij_IT b/localedata/locales/lij_IT
--- a/localedata/locales/lij_IT 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lij_IT 2018-07-17 17:55:50.000000000 +0000
@@ -47,6 +47,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ln_CD b/localedata/locales/ln_CD
--- a/localedata/locales/ln_CD 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ln_CD 2018-07-17 17:55:50.000000000 +0000
@@ -39,6 +39,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lo_LA b/localedata/locales/lo_LA
--- a/localedata/locales/lo_LA 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lo_LA 2018-07-17 17:55:50.000000000 +0000
@@ -51,6 +51,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lt_LT b/localedata/locales/lt_LT
--- a/localedata/locales/lt_LT 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lt_LT 2018-07-17 17:55:50.000000000 +0000
@@ -77,6 +77,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lv_LV b/localedata/locales/lv_LV
--- a/localedata/locales/lv_LV 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lv_LV 2018-07-17 17:55:50.000000000 +0000
@@ -2122,6 +2122,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mg_MG b/localedata/locales/mg_MG
--- a/localedata/locales/mg_MG 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mg_MG 2018-07-17 17:55:50.000000000 +0000
@@ -55,6 +55,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/mhr_RU b/localedata/locales/mhr_RU
--- a/localedata/locales/mhr_RU 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mhr_RU 2018-07-17 17:55:50.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mk_MK b/localedata/locales/mk_MK
--- a/localedata/locales/mk_MK 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mk_MK 2018-07-17 17:55:50.000000000 +0000
@@ -49,6 +49,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ml_IN b/localedata/locales/ml_IN
--- a/localedata/locales/ml_IN 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ml_IN 2018-07-17 17:55:50.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
%
diff -uNr a/localedata/locales/ms_MY b/localedata/locales/ms_MY
--- a/localedata/locales/ms_MY 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ms_MY 2018-07-17 17:55:50.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mt_MT b/localedata/locales/mt_MT
--- a/localedata/locales/mt_MT 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mt_MT 2018-07-17 17:55:50.000000000 +0000
@@ -47,6 +47,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/***@latin
b/localedata/locales/***@latin
--- a/localedata/locales/***@latin 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/***@latin 2018-07-17 17:55:50.000000000 +0000
@@ -53,6 +53,7 @@
% accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/nb_NO b/localedata/locales/nb_NO
--- a/localedata/locales/nb_NO 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nb_NO 2018-07-17 17:55:50.000000000 +0000
@@ -154,6 +154,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ne_NP b/localedata/locales/ne_NP
--- a/localedata/locales/ne_NP 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ne_NP 2018-07-17 17:55:50.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nhn_MX b/localedata/locales/nhn_MX
--- a/localedata/locales/nhn_MX 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nhn_MX 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/niu_NU b/localedata/locales/niu_NU
--- a/localedata/locales/niu_NU 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/niu_NU 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/niu_NZ b/localedata/locales/niu_NZ
--- a/localedata/locales/niu_NZ 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/niu_NZ 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nl_NL b/localedata/locales/nl_NL
--- a/localedata/locales/nl_NL 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nl_NL 2018-07-17 17:55:51.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nr_ZA b/localedata/locales/nr_ZA
--- a/localedata/locales/nr_ZA 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/nr_ZA 2018-07-17 17:55:51.000000000 +0000
@@ -66,6 +66,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/oc_FR b/localedata/locales/oc_FR
--- a/localedata/locales/oc_FR 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/oc_FR 2018-07-17 17:55:51.000000000 +0000
@@ -62,6 +62,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/om_KE b/localedata/locales/om_KE
--- a/localedata/locales/om_KE 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/om_KE 2018-07-17 17:55:51.000000000 +0000
@@ -140,6 +140,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/or_IN b/localedata/locales/or_IN
--- a/localedata/locales/or_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/or_IN 2018-07-17 17:55:51.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/os_RU b/localedata/locales/os_RU
--- a/localedata/locales/os_RU 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/os_RU 2018-07-17 17:55:51.000000000 +0000
@@ -70,6 +70,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/pa_IN b/localedata/locales/pa_IN
--- a/localedata/locales/pa_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pa_IN 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pa_PK b/localedata/locales/pa_PK
--- a/localedata/locales/pa_PK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pa_PK 2018-07-17 17:55:51.000000000 +0000
@@ -58,6 +58,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pl_PL b/localedata/locales/pl_PL
--- a/localedata/locales/pl_PL 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pl_PL 2018-07-17 17:55:51.000000000 +0000
@@ -142,6 +142,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pt_PT b/localedata/locales/pt_PT
--- a/localedata/locales/pt_PT 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pt_PT 2018-07-17 17:55:51.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/quz_PE b/localedata/locales/quz_PE
--- a/localedata/locales/quz_PE 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/quz_PE 2018-07-17 17:55:51.000000000 +0000
@@ -57,6 +57,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ro_RO b/localedata/locales/ro_RO
--- a/localedata/locales/ro_RO 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/ro_RO 2018-07-17 17:55:51.000000000 +0000
@@ -144,6 +144,7 @@
<U0162> "<U021A>";"<U0054>"
<U0163> "<U021B>";"<U0074>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ru_RU b/localedata/locales/ru_RU
--- a/localedata/locales/ru_RU 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/ru_RU 2018-07-17 17:55:51.000000000 +0000
@@ -74,6 +74,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/rw_RW b/localedata/locales/rw_RW
--- a/localedata/locales/rw_RW 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/rw_RW 2018-07-17 17:55:51.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sa_IN b/localedata/locales/sa_IN
--- a/localedata/locales/sa_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sa_IN 2018-07-17 17:55:51.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sd_IN b/localedata/locales/sd_IN
--- a/localedata/locales/sd_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sd_IN 2018-07-17 17:55:51.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/***@devanagari
b/localedata/locales/***@devanagari
--- a/localedata/locales/***@devanagari 2018-07-17 17:49:19.000000000
+0000
+++ b/localedata/locales/***@devanagari 2018-07-17 17:55:51.000000000
+0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sd_PK b/localedata/locales/sd_PK
--- a/localedata/locales/sd_PK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sd_PK 2018-07-17 17:55:51.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/se_NO b/localedata/locales/se_NO
--- a/localedata/locales/se_NO 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/se_NO 2018-07-17 17:55:51.000000000 +0000
@@ -205,6 +205,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sgs_LT b/localedata/locales/sgs_LT
--- a/localedata/locales/sgs_LT 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sgs_LT 2018-07-17 17:55:52.000000000 +0000
@@ -59,6 +59,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/si_LK b/localedata/locales/si_LK
--- a/localedata/locales/si_LK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/si_LK 2018-07-17 17:55:52.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sk_SK b/localedata/locales/sk_SK
--- a/localedata/locales/sk_SK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sk_SK 2018-07-17 17:55:52.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sl_SI b/localedata/locales/sl_SI
--- a/localedata/locales/sl_SI 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sl_SI 2018-07-17 17:55:52.000000000 +0000
@@ -91,6 +91,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sm_WS b/localedata/locales/sm_WS
--- a/localedata/locales/sm_WS 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sm_WS 2018-07-17 17:55:52.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/so_SO b/localedata/locales/so_SO
--- a/localedata/locales/so_SO 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/so_SO 2018-07-17 17:55:52.000000000 +0000
@@ -70,6 +70,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sq_AL b/localedata/locales/sq_AL
--- a/localedata/locales/sq_AL 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sq_AL 2018-07-17 17:55:52.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ss_ZA b/localedata/locales/ss_ZA
--- a/localedata/locales/ss_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ss_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/st_ZA b/localedata/locales/st_ZA
--- a/localedata/locales/st_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/st_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -64,6 +64,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sv_SE b/localedata/locales/sv_SE
--- a/localedata/locales/sv_SE 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sv_SE 2018-07-17 17:55:52.000000000 +0000
@@ -139,6 +139,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sw_KE b/localedata/locales/sw_KE
--- a/localedata/locales/sw_KE 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sw_KE 2018-07-17 17:55:52.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ta_IN b/localedata/locales/ta_IN
--- a/localedata/locales/ta_IN 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ta_IN 2018-07-17 17:55:52.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/te_IN b/localedata/locales/te_IN
--- a/localedata/locales/te_IN 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/te_IN 2018-07-17 17:55:52.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/th_TH b/localedata/locales/th_TH
--- a/localedata/locales/th_TH 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/th_TH 2018-07-17 17:55:52.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ti_ET b/localedata/locales/ti_ET
--- a/localedata/locales/ti_ET 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ti_ET 2018-07-17 17:55:52.000000000 +0000
@@ -866,6 +866,7 @@
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>

include "translit_combining";""
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/tn_ZA b/localedata/locales/tn_ZA
--- a/localedata/locales/tn_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/tn_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -69,6 +69,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/to_TO b/localedata/locales/to_TO
--- a/localedata/locales/to_TO 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/to_TO 2018-07-17 17:55:52.000000000 +0000
@@ -36,6 +36,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/tpi_PG b/localedata/locales/tpi_PG
--- a/localedata/locales/tpi_PG 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/tpi_PG 2018-07-17 17:55:52.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/tr_TR b/localedata/locales/tr_TR
--- a/localedata/locales/tr_TR 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/tr_TR 2018-07-17 17:55:52.000000000 +0000
@@ -2430,6 +2430,7 @@

% TURKISH LIRA SIGN
<U20BA> "<U0054><U004C>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-07-17 17:55:52.000000000
+0000
@@ -0,0 +1,151 @@
+escape_char /
+comment_char %
+
+% Transliterations that converts cyrillic letters to ascii symbols
inspired by GOST 7.79-2000
+% https://sourceware.org/bugzilla/show_bug.cgi?id=2872
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=8590
+% Up to three characters are required to do a reversible transliteration.
+
+LC_CTYPE
+
+translit_start
+
+
+% CYRILLIC CAPITAL LETTER IO
+<U0401> "<U0059><U004F>";<U0059>
+% CYRILLIC CAPITAL LETTER A
+<U0410> <U0041>
+% CYRILLIC CAPITAL LETTER BE
+<U0411> <U0042>
+% CYRILLIC CAPITAL LETTER VE
+<U0412> <U0056>
+% CYRILLIC CAPITAL LETTER GHE
+<U0413> <U0047>
+% CYRILLIC CAPITAL LETTER DE
+<U0414> <U0044>
+% CYRILLIC CAPITAL LETTER IE
+<U0415> <U0045>
+% CYRILLIC CAPITAL LETTER ZHE
+<U0416> "<U005A><U0048>";<U005A>
+% CYRILLIC CAPITAL LETTER ZE
+<U0417> <U005A>
+% CYRILLIC CAPITAL LETTER I
+<U0418> <U0049>
+% CYRILLIC CAPITAL LETTER SHORT I
+<U0419> <U004A>
+% CYRILLIC CAPITAL LETTER KA
+<U041A> <U004B>
+% CYRILLIC CAPITAL LETTER EL
+<U041B> <U004C>
+% CYRILLIC CAPITAL LETTER EM
+<U041C> <U004D>
+% CYRILLIC CAPITAL LETTER EN
+<U041D> <U004E>
+% CYRILLIC CAPITAL LETTER O
+<U041E> <U004F>
+% CYRILLIC CAPITAL LETTER PE
+<U041F> <U0050>
+% CYRILLIC CAPITAL LETTER ER
+<U0420> <U0052>
+% CYRILLIC CAPITAL LETTER ES
+<U0421> <U0053>
+% CYRILLIC CAPITAL LETTER TE
+<U0422> <U0054>
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC CAPITAL LETTER EF
+<U0424> <U0046>
+% CYRILLIC CAPITAL LETTER HA
+<U0425> <U0058>
+% CYRILLIC CAPITAL LETTER TSE
+<U0426> "<U0043><U005A>";<U0043>
+% CYRILLIC CAPITAL LETTER CHE
+<U0427> "<U0043><U0048>";<U0043>
+% CYRILLIC CAPITAL LETTER SHA
+<U0428> "<U0053><U0048>";<U0053>
+% CYRILLIC CAPITAL LETTER SHCHA
+<U0429> "<U0053><U0048><U0048>";<U0053>
+% CYRILLIC CAPITAL LETTER HARD SIGN
+<U042A> "<U0060><U0060>";<U0060>
+% CYRILLIC CAPITAL LETTER YERU
+<U042B> "<U0059><U0027>";<U0059>
+% CYRILLIC CAPITAL LETTER SOFT SIGN
+<U042C> <U0060>
+% CYRILLIC CAPITAL LETTER E
+<U042D> "<U0045><U0060>";<U0045>
+% CYRILLIC CAPITAL LETTER YU
+<U042E> "<U0059><U0055>";<U0059>
+% CYRILLIC CAPITAL LETTER YA
+<U042F> "<U0059><U0041>";<U0059>
+% CYRILLIC SMALL LETTER A
+<U0430> <U0061>
+% CYRILLIC SMALL LETTER BE
+<U0431> <U0062>
+% CYRILLIC SMALL LETTER VE
+<U0432> <U0076>
+% CYRILLIC SMALL LETTER GHE
+<U0433> <U0067>
+% CYRILLIC SMALL LETTER DE
+<U0434> <U0064>
+% CYRILLIC SMALL LETTER IE
+<U0435> <U0065>
+% CYRILLIC SMALL LETTER ZHE
+<U0436> "<U007A><U0068>";<U007A>
+% CYRILLIC SMALL LETTER ZE
+<U0437> <U007A>
+% CYRILLIC SMALL LETTER I
+<U0438> <U0069>
+% CYRILLIC SMALL LETTER SHORT I
+<U0439> <U006A>
+% CYRILLIC SMALL LETTER KA
+<U043A> <U006B>
+% CYRILLIC SMALL LETTER EL
+<U043B> <U006C>
+% CYRILLIC SMALL LETTER EM
+<U043C> <U006D>
+% CYRILLIC SMALL LETTER EN
+<U043D> <U006E>
+% CYRILLIC SMALL LETTER O
+<U043E> <U006F>
+% CYRILLIC SMALL LETTER PE
+<U043F> <U0070>
+% CYRILLIC SMALL LETTER ER
+<U0440> <U0072>
+% CYRILLIC SMALL LETTER ES
+<U0441> <U0073>
+% CYRILLIC SMALL LETTER TE
+<U0442> <U0074>
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC SMALL LETTER EF
+<U0444> <U0066>
+% CYRILLIC SMALL LETTER HA
+<U0445> <U0078>
+% CYRILLIC SMALL LETTER TSE
+<U0446> "<U0063><U007A>";<U0063>
+% CYRILLIC SMALL LETTER CHE
+<U0447> "<U0063><U0068>";<U0063>
+% CYRILLIC SMALL LETTER SHA
+<U0448> "<U0073><U0068>";<U0073>
+% CYRILLIC SMALL LETTER SHCHA
+<U0449> "<U0073><U0068><U0068>";<U0073>
+% CYRILLIC SMALL LETTER HARD SIGN
+<U044A> "<U0060><U0060>";<U0060>
+% CYRILLIC SMALL LETTER YERU
+<U044B> "<U0079><U0027>";<U0079>
+% CYRILLIC SMALL LETTER SOFT SIGN
+<U044C> <U0060>
+% CYRILLIC SMALL LETTER E
+<U044D> "<U0065><U0060>";<U0065>
+% CYRILLIC SMALL LETTER YU
+<U044E> "<U0079><U0075>";<U0079>
+% CYRILLIC SMALL LETTER YA
+<U044F> "<U0079><U0061>";<U0079>
+% CYRILLIC SMALL LETTER IO
+<U0451> "<U0079><U006F>";<U0079>
+
+
+translit_end
+
+END LC_CTYPE
diff -uNr a/localedata/locales/ts_ZA b/localedata/locales/ts_ZA
--- a/localedata/locales/ts_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ts_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -64,6 +64,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/unm_US b/localedata/locales/unm_US
--- a/localedata/locales/unm_US 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/unm_US 2018-07-17 17:55:52.000000000 +0000
@@ -48,6 +48,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ur_IN b/localedata/locales/ur_IN
--- a/localedata/locales/ur_IN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ur_IN 2018-07-17 17:55:53.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ur_PK b/localedata/locales/ur_PK
--- a/localedata/locales/ur_PK 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ur_PK 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ve_ZA b/localedata/locales/ve_ZA
--- a/localedata/locales/ve_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ve_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -67,6 +67,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/vi_VN b/localedata/locales/vi_VN
--- a/localedata/locales/vi_VN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/vi_VN 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
% dong sign -> d// -> dd
<U20AB> "<U0111>";"<U0064><U0064>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/wa_BE b/localedata/locales/wa_BE
--- a/localedata/locales/wa_BE 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/wa_BE 2018-07-17 17:55:53.000000000 +0000
@@ -69,6 +69,7 @@
<U00C5> "<U0041><U030A>";"<U0041>";"<U0041><U0055>"
<U00E5> "<U0061><U030A>";"<U0061>";"<U0061><U0075>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/wo_SN b/localedata/locales/wo_SN
--- a/localedata/locales/wo_SN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/wo_SN 2018-07-17 17:55:53.000000000 +0000
@@ -55,6 +55,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/xh_ZA b/localedata/locales/xh_ZA
--- a/localedata/locales/xh_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/xh_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -66,6 +66,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/yi_US b/localedata/locales/yi_US
--- a/localedata/locales/yi_US 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/yi_US 2018-07-17 17:55:53.000000000 +0000
@@ -73,6 +73,7 @@
<U05F0> "<U05D5><U05D5>";"<U0077><U0077>"
<U05F1> "<U05D5><U05D9>";"<U0077><U006A>"
<U05F2> "<U05D9><U05D9>";"<U006A><U006A>"
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/zh_CN b/localedata/locales/zh_CN
--- a/localedata/locales/zh_CN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/zh_CN 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

class "hanzi"; /
diff -uNr a/localedata/locales/zu_ZA b/localedata/locales/zu_ZA
--- a/localedata/locales/zu_ZA 2018-07-17 17:49:22.000000000 +0000
+++ b/localedata/locales/zu_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -70,6 +70,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
Egor Kobylkin
2018-10-03 08:26:40 UTC
Permalink
Ping.

Absent of feedback I am wondering if anything could be missing in this
patch from the maintainers standpoint. More than two months have passed
since the original submission.

If I can be of assistance, please do not hesitate to contact me,
Egor Kobylkin
Post by Egor Kobylkin
Dear locale maintainers,
fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"
https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]
add Cyrillic transliteration table translit_cyrillic file
https://sourceware.org/bugzilla/attachment.cgi?id=8591 [7]
to localedata/locales/ and include it in all your locales going forward.
Patch included inline below.
This is a re-submission for the consideration for 2.29 on a request from
Carlos O'Donell https://sourceware.org/ml/libc-alpha/2018-07/msg00506.html
From this patch I have excluded locales that already mention cyrillic or
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.
The glibc wiki explicitly lists this use case as the test example
LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt
currently it fails on Cyrillic texts in most locales including ru_RU [1]
LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC
CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.
- It produces a string of question marks and spaces.
CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.
The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.
While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration has only ASCII codes but still can be read by a native
speaker. Among other things it is useful for processing the Cyrillic
texts and filenames by programs or on systems that are not specifically
prepared to work with Cyrillic, don't have corresponding fonts installed
or can't handle UTF-8.
The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on GOST 7.79-2000 official source
(Federal Agency on Technical Regulating and Metrology Of Russian
Federation [2]). Technically an independent but identical source [3] was
used and prepared in a spreadsheet [6].
The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that have a
translit_start/end stance and generated a patch for them.
The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.
However it would not be the standard Russian Cyrillic transliteration as
described above.
I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
exclusion.
[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=8590
[7] translit_cyrillic https://sourceware.org/bugzilla/attachment.cgi?id=8591
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=8618
Best regards,
Egor Kobylkin
---
[BZ #2872]
* locales/translit_cyrillic: add Russian GOST 7.79-2000 transliteration
table from Cyrillic to Latin.
* locales/C: add include "translit_cyrillic";"" to LC_CTYPE translit
section.
* locales/aa_DJ: likewise
* locales/af_ZA: likewise
* locales/ak_GH: likewise
* locales/am_ET: likewise
* locales/ar_EG: likewise
* locales/be_BY: likewise
* locales/bem_ZM: likewise
* locales/ber_DZ: likewise
* locales/ber_MA: likewise
* locales/bg_BG: likewise
* locales/bi_VU: likewise
* locales/bn_BD: likewise
* locales/bo_CN: likewise
* locales/ca_ES: likewise
* locales/ce_RU: likewise
* locales/cs_CZ: likewise
* locales/cv_RU: likewise
* locales/cy_GB: likewise
* locales/da_DK: likewise
* locales/de_DE: likewise
* locales/dv_MV: likewise
* locales/dz_BT: likewise
* locales/el_GR: likewise
* locales/en_GB: likewise
* locales/en_NG: likewise
* locales/en_ZM: likewise
* locales/es_CU: likewise
* locales/es_ES: likewise
* locales/et_EE: likewise
* locales/fa_IR: likewise
* locales/ff_SN: likewise
* locales/fi_FI: likewise
* locales/fr_FR: likewise
* locales/ga_IE: likewise
* locales/gd_GB: likewise
* locales/gu_IN: likewise
* locales/gv_GB: likewise
* locales/he_IL: likewise
* locales/hi_IN: likewise
* locales/hif_FJ: likewise
* locales/hr_HR: likewise
* locales/ht_HT: likewise
* locales/hu_HU: likewise
* locales/hy_AM: likewise
* locales/id_ID: likewise
* locales/is_IS: likewise
* locales/it_IT: likewise
* locales/ja_JP: likewise
* locales/kk_KZ: likewise
* locales/km_KH: likewise
* locales/kn_IN: likewise
* locales/ko_KR: likewise
* locales/ks_IN: likewise
* locales/kw_GB: likewise
* locales/lb_LU: likewise
* locales/lg_UG: likewise
* locales/lij_IT: likewise
* locales/ln_CD: likewise
* locales/lo_LA: likewise
* locales/lt_LT: likewise
* locales/lv_LV: likewise
* locales/mg_MG: likewise
* locales/mhr_RU: likewise
* locales/mk_MK: likewise
* locales/ml_IN: likewise
* locales/ms_MY: likewise
* locales/mt_MT: likewise
* locales/nb_NO: likewise
* locales/ne_NP: likewise
* locales/nhn_MX: likewise
* locales/niu_NU: likewise
* locales/niu_NZ: likewise
* locales/nl_NL: likewise
* locales/nr_ZA: likewise
* locales/oc_FR: likewise
* locales/om_KE: likewise
* locales/or_IN: likewise
* locales/os_RU: likewise
* locales/pa_IN: likewise
* locales/pa_PK: likewise
* locales/pl_PL: likewise
* locales/pt_PT: likewise
* locales/quz_PE: likewise
* locales/ro_RO: likewise
* locales/ru_RU: likewise
* locales/rw_RW: likewise
* locales/sa_IN: likewise
* locales/sd_IN: likewise
* locales/sd_PK: likewise
* locales/se_NO: likewise
* locales/sgs_LT: likewise
* locales/si_LK: likewise
* locales/sk_SK: likewise
* locales/sl_SI: likewise
* locales/sm_WS: likewise
* locales/so_SO: likewise
* locales/sq_AL: likewise
* locales/ss_ZA: likewise
* locales/st_ZA: likewise
* locales/sv_SE: likewise
* locales/sw_KE: likewise
* locales/ta_IN: likewise
* locales/te_IN: likewise
* locales/th_TH: likewise
* locales/ti_ET: likewise
* locales/tn_ZA: likewise
* locales/to_TO: likewise
* locales/tpi_PG: likewise
* locales/tr_TR: likewise
* locales/ts_ZA: likewise
* locales/unm_US: likewise
* locales/ur_IN: likewise
* locales/ur_PK: likewise
* locales/ve_ZA: likewise
* locales/vi_VN: likewise
* locales/wa_BE: likewise
* locales/wo_SN: likewise
* locales/xh_ZA: likewise
* locales/yi_US: likewise
* locales/zh_CN: likewise
* locales/zu_ZA: likewise
diff -uNr a/localedata/locales/C b/localedata/locales/C
--- a/localedata/locales/C 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/C 2018-07-17 17:55:47.000000000 +0000
@@ -2292,6 +2292,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/aa_DJ b/localedata/locales/aa_DJ
--- a/localedata/locales/aa_DJ 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/aa_DJ 2018-07-17 17:55:47.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/af_ZA b/localedata/locales/af_ZA
--- a/localedata/locales/af_ZA 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/af_ZA 2018-07-17 17:55:47.000000000 +0000
@@ -72,6 +72,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ak_GH b/localedata/locales/ak_GH
--- a/localedata/locales/ak_GH 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/ak_GH 2018-07-17 17:55:47.000000000 +0000
@@ -56,6 +56,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
--- a/localedata/locales/am_ET 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/am_ET 2018-07-17 17:55:47.000000000 +0000
@@ -1396,6 +1396,7 @@
<U137A> <U0060><U0039><U0030>
<U137B> <U0060><U0031><U0030><U0030>
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/ar_EG b/localedata/locales/ar_EG
--- a/localedata/locales/ar_EG 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/ar_EG 2018-07-17 17:55:48.000000000 +0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/be_BY b/localedata/locales/be_BY
--- a/localedata/locales/be_BY 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/be_BY 2018-07-17 17:55:48.000000000 +0000
@@ -69,6 +69,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bem_ZM b/localedata/locales/bem_ZM
--- a/localedata/locales/bem_ZM 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bem_ZM 2018-07-17 17:55:48.000000000 +0000
@@ -42,6 +42,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ber_DZ b/localedata/locales/ber_DZ
--- a/localedata/locales/ber_DZ 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ber_DZ 2018-07-17 17:55:48.000000000 +0000
@@ -166,6 +166,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ber_MA b/localedata/locales/ber_MA
--- a/localedata/locales/ber_MA 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ber_MA 2018-07-17 17:55:48.000000000 +0000
@@ -86,6 +86,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bg_BG b/localedata/locales/bg_BG
--- a/localedata/locales/bg_BG 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bg_BG 2018-07-17 17:55:48.000000000 +0000
@@ -49,6 +49,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bi_VU b/localedata/locales/bi_VU
--- a/localedata/locales/bi_VU 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bi_VU 2018-07-17 17:55:48.000000000 +0000
@@ -39,6 +39,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bn_BD b/localedata/locales/bn_BD
--- a/localedata/locales/bn_BD 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bn_BD 2018-07-17 17:55:48.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bo_CN b/localedata/locales/bo_CN
--- a/localedata/locales/bo_CN 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bo_CN 2018-07-17 17:55:48.000000000 +0000
@@ -43,6 +43,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ca_ES b/localedata/locales/ca_ES
--- a/localedata/locales/ca_ES 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ca_ES 2018-07-17 17:55:48.000000000 +0000
@@ -72,6 +72,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ce_RU b/localedata/locales/ce_RU
--- a/localedata/locales/ce_RU 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ce_RU 2018-07-17 17:55:48.000000000 +0000
@@ -39,6 +39,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/cs_CZ b/localedata/locales/cs_CZ
--- a/localedata/locales/cs_CZ 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/cs_CZ 2018-07-17 17:55:48.000000000 +0000
@@ -2311,6 +2311,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/cv_RU b/localedata/locales/cv_RU
--- a/localedata/locales/cv_RU 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/cv_RU 2018-07-17 17:55:48.000000000 +0000
@@ -109,6 +109,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/cy_GB b/localedata/locales/cy_GB
--- a/localedata/locales/cy_GB 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/cy_GB 2018-07-17 17:55:48.000000000 +0000
@@ -69,6 +69,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/da_DK b/localedata/locales/da_DK
--- a/localedata/locales/da_DK 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/da_DK 2018-07-17 17:55:48.000000000 +0000
@@ -167,6 +167,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/de_DE b/localedata/locales/de_DE
--- a/localedata/locales/de_DE 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/de_DE 2018-07-17 17:55:48.000000000 +0000
@@ -78,6 +78,7 @@
% DOUBLE HIGH-REVERSED-9 QUOTATION MARK
<U201F> <U00AB>;<U0022>
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/dv_MV b/localedata/locales/dv_MV
--- a/localedata/locales/dv_MV 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/dv_MV 2018-07-17 17:55:48.000000000 +0000
@@ -52,6 +52,7 @@
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/dz_BT b/localedata/locales/dz_BT
--- a/localedata/locales/dz_BT 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/dz_BT 2018-07-17 17:55:48.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/el_GR b/localedata/locales/el_GR
--- a/localedata/locales/el_GR 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/el_GR 2018-07-17 17:55:48.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/en_GB b/localedata/locales/en_GB
--- a/localedata/locales/en_GB 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/en_GB 2018-07-17 17:55:48.000000000 +0000
@@ -55,6 +55,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/en_NG b/localedata/locales/en_NG
--- a/localedata/locales/en_NG 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/en_NG 2018-07-17 17:55:48.000000000 +0000
@@ -50,6 +50,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/en_ZM b/localedata/locales/en_ZM
--- a/localedata/locales/en_ZM 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/en_ZM 2018-07-17 17:55:48.000000000 +0000
@@ -42,6 +42,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/es_CU b/localedata/locales/es_CU
--- a/localedata/locales/es_CU 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/es_CU 2018-07-17 17:55:48.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/es_ES b/localedata/locales/es_ES
--- a/localedata/locales/es_ES 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/es_ES 2018-07-17 17:55:49.000000000 +0000
@@ -73,6 +73,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/et_EE b/localedata/locales/et_EE
--- a/localedata/locales/et_EE 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/et_EE 2018-07-17 17:55:49.000000000 +0000
@@ -109,6 +109,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/fa_IR b/localedata/locales/fa_IR
--- a/localedata/locales/fa_IR 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/fa_IR 2018-07-17 17:55:49.000000000 +0000
@@ -79,6 +79,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ff_SN b/localedata/locales/ff_SN
--- a/localedata/locales/ff_SN 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/ff_SN 2018-07-17 17:55:49.000000000 +0000
@@ -42,6 +42,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/fi_FI b/localedata/locales/fi_FI
--- a/localedata/locales/fi_FI 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/fi_FI 2018-07-17 17:55:49.000000000 +0000
@@ -137,6 +137,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/fr_FR b/localedata/locales/fr_FR
--- a/localedata/locales/fr_FR 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/fr_FR 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
% In France, accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ga_IE b/localedata/locales/ga_IE
--- a/localedata/locales/ga_IE 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/ga_IE 2018-07-17 17:55:49.000000000 +0000
@@ -54,6 +54,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/gd_GB b/localedata/locales/gd_GB
--- a/localedata/locales/gd_GB 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gd_GB 2018-07-17 17:55:49.000000000 +0000
@@ -47,6 +47,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/gu_IN b/localedata/locales/gu_IN
--- a/localedata/locales/gu_IN 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gu_IN 2018-07-17 17:55:49.000000000 +0000
@@ -62,6 +62,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/gv_GB b/localedata/locales/gv_GB
--- a/localedata/locales/gv_GB 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gv_GB 2018-07-17 17:55:49.000000000 +0000
@@ -57,6 +57,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/he_IL b/localedata/locales/he_IL
--- a/localedata/locales/he_IL 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/he_IL 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hi_IN b/localedata/locales/hi_IN
--- a/localedata/locales/hi_IN 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hi_IN 2018-07-17 17:55:49.000000000 +0000
@@ -61,6 +61,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hif_FJ b/localedata/locales/hif_FJ
--- a/localedata/locales/hif_FJ 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hif_FJ 2018-07-17 17:55:49.000000000 +0000
@@ -37,6 +37,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hr_HR b/localedata/locales/hr_HR
--- a/localedata/locales/hr_HR 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hr_HR 2018-07-17 17:55:49.000000000 +0000
@@ -153,6 +153,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ht_HT b/localedata/locales/ht_HT
--- a/localedata/locales/ht_HT 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/ht_HT 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hu_HU b/localedata/locales/hu_HU
--- a/localedata/locales/hu_HU 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hu_HU 2018-07-17 17:55:49.000000000 +0000
@@ -478,6 +478,7 @@
<U00FC> "<U0075><U0308>";"<U0075><U00A8>";"<U0075><U003A>"
<U0171> "<U0075><U030B>";"<U0075><U02DD>";"<U0075><U0022>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hy_AM b/localedata/locales/hy_AM
--- a/localedata/locales/hy_AM 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/hy_AM 2018-07-17 17:55:49.000000000 +0000
@@ -77,6 +77,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/id_ID b/localedata/locales/id_ID
--- a/localedata/locales/id_ID 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/id_ID 2018-07-17 17:55:49.000000000 +0000
@@ -55,6 +55,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/is_IS b/localedata/locales/is_IS
--- a/localedata/locales/is_IS 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/is_IS 2018-07-17 17:55:49.000000000 +0000
@@ -2161,6 +2161,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/it_IT b/localedata/locales/it_IT
--- a/localedata/locales/it_IT 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/it_IT 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ja_JP b/localedata/locales/ja_JP
--- a/localedata/locales/ja_JP 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ja_JP 2018-07-17 17:55:49.000000000 +0000
@@ -1682,6 +1682,7 @@
include "translit_combining";""
include "translit_cjk_variants";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/kk_KZ b/localedata/locales/kk_KZ
--- a/localedata/locales/kk_KZ 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kk_KZ 2018-07-17 17:55:50.000000000 +0000
@@ -158,6 +158,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/km_KH b/localedata/locales/km_KH
--- a/localedata/locales/km_KH 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/km_KH 2018-07-17 17:55:50.000000000 +0000
@@ -873,6 +873,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/kn_IN b/localedata/locales/kn_IN
--- a/localedata/locales/kn_IN 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kn_IN 2018-07-17 17:55:50.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ko_KR b/localedata/locales/ko_KR
--- a/localedata/locales/ko_KR 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ko_KR 2018-07-17 17:55:50.000000000 +0000
@@ -6099,6 +6099,7 @@
include "translit_combining";""
include "translit_hangul";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ks_IN b/localedata/locales/ks_IN
--- a/localedata/locales/ks_IN 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ks_IN 2018-07-17 17:55:50.000000000 +0000
@@ -46,6 +46,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/kw_GB b/localedata/locales/kw_GB
--- a/localedata/locales/kw_GB 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kw_GB 2018-07-17 17:55:50.000000000 +0000
@@ -58,6 +58,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lb_LU b/localedata/locales/lb_LU
--- a/localedata/locales/lb_LU 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lb_LU 2018-07-17 17:55:50.000000000 +0000
@@ -78,6 +78,7 @@
% LATIN SMALL LETTER E WITH CIRCUMFLEX
<U00EA> "<U0065><U005E>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lg_UG b/localedata/locales/lg_UG
--- a/localedata/locales/lg_UG 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lg_UG 2018-07-17 17:55:50.000000000 +0000
@@ -57,6 +57,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lij_IT b/localedata/locales/lij_IT
--- a/localedata/locales/lij_IT 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lij_IT 2018-07-17 17:55:50.000000000 +0000
@@ -47,6 +47,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ln_CD b/localedata/locales/ln_CD
--- a/localedata/locales/ln_CD 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ln_CD 2018-07-17 17:55:50.000000000 +0000
@@ -39,6 +39,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lo_LA b/localedata/locales/lo_LA
--- a/localedata/locales/lo_LA 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lo_LA 2018-07-17 17:55:50.000000000 +0000
@@ -51,6 +51,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lt_LT b/localedata/locales/lt_LT
--- a/localedata/locales/lt_LT 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lt_LT 2018-07-17 17:55:50.000000000 +0000
@@ -77,6 +77,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lv_LV b/localedata/locales/lv_LV
--- a/localedata/locales/lv_LV 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lv_LV 2018-07-17 17:55:50.000000000 +0000
@@ -2122,6 +2122,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mg_MG b/localedata/locales/mg_MG
--- a/localedata/locales/mg_MG 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mg_MG 2018-07-17 17:55:50.000000000 +0000
@@ -55,6 +55,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mhr_RU b/localedata/locales/mhr_RU
--- a/localedata/locales/mhr_RU 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mhr_RU 2018-07-17 17:55:50.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mk_MK b/localedata/locales/mk_MK
--- a/localedata/locales/mk_MK 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mk_MK 2018-07-17 17:55:50.000000000 +0000
@@ -49,6 +49,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ml_IN b/localedata/locales/ml_IN
--- a/localedata/locales/ml_IN 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ml_IN 2018-07-17 17:55:50.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
%
diff -uNr a/localedata/locales/ms_MY b/localedata/locales/ms_MY
--- a/localedata/locales/ms_MY 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ms_MY 2018-07-17 17:55:50.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mt_MT b/localedata/locales/mt_MT
--- a/localedata/locales/mt_MT 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mt_MT 2018-07-17 17:55:50.000000000 +0000
@@ -47,6 +47,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
@@ -53,6 +53,7 @@
% accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nb_NO b/localedata/locales/nb_NO
--- a/localedata/locales/nb_NO 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nb_NO 2018-07-17 17:55:50.000000000 +0000
@@ -154,6 +154,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ne_NP b/localedata/locales/ne_NP
--- a/localedata/locales/ne_NP 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ne_NP 2018-07-17 17:55:50.000000000 +0000
@@ -43,6 +43,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nhn_MX b/localedata/locales/nhn_MX
--- a/localedata/locales/nhn_MX 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nhn_MX 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/niu_NU b/localedata/locales/niu_NU
--- a/localedata/locales/niu_NU 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/niu_NU 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/niu_NZ b/localedata/locales/niu_NZ
--- a/localedata/locales/niu_NZ 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/niu_NZ 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nl_NL b/localedata/locales/nl_NL
--- a/localedata/locales/nl_NL 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nl_NL 2018-07-17 17:55:51.000000000 +0000
@@ -57,6 +57,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nr_ZA b/localedata/locales/nr_ZA
--- a/localedata/locales/nr_ZA 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/nr_ZA 2018-07-17 17:55:51.000000000 +0000
@@ -66,6 +66,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/oc_FR b/localedata/locales/oc_FR
--- a/localedata/locales/oc_FR 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/oc_FR 2018-07-17 17:55:51.000000000 +0000
@@ -62,6 +62,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/om_KE b/localedata/locales/om_KE
--- a/localedata/locales/om_KE 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/om_KE 2018-07-17 17:55:51.000000000 +0000
@@ -140,6 +140,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/or_IN b/localedata/locales/or_IN
--- a/localedata/locales/or_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/or_IN 2018-07-17 17:55:51.000000000 +0000
@@ -62,6 +62,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/os_RU b/localedata/locales/os_RU
--- a/localedata/locales/os_RU 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/os_RU 2018-07-17 17:55:51.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pa_IN b/localedata/locales/pa_IN
--- a/localedata/locales/pa_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pa_IN 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pa_PK b/localedata/locales/pa_PK
--- a/localedata/locales/pa_PK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pa_PK 2018-07-17 17:55:51.000000000 +0000
@@ -58,6 +58,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pl_PL b/localedata/locales/pl_PL
--- a/localedata/locales/pl_PL 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pl_PL 2018-07-17 17:55:51.000000000 +0000
@@ -142,6 +142,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pt_PT b/localedata/locales/pt_PT
--- a/localedata/locales/pt_PT 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pt_PT 2018-07-17 17:55:51.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/quz_PE b/localedata/locales/quz_PE
--- a/localedata/locales/quz_PE 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/quz_PE 2018-07-17 17:55:51.000000000 +0000
@@ -57,6 +57,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ro_RO b/localedata/locales/ro_RO
--- a/localedata/locales/ro_RO 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/ro_RO 2018-07-17 17:55:51.000000000 +0000
@@ -144,6 +144,7 @@
<U0162> "<U021A>";"<U0054>"
<U0163> "<U021B>";"<U0074>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ru_RU b/localedata/locales/ru_RU
--- a/localedata/locales/ru_RU 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/ru_RU 2018-07-17 17:55:51.000000000 +0000
@@ -74,6 +74,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/rw_RW b/localedata/locales/rw_RW
--- a/localedata/locales/rw_RW 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/rw_RW 2018-07-17 17:55:51.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sa_IN b/localedata/locales/sa_IN
--- a/localedata/locales/sa_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sa_IN 2018-07-17 17:55:51.000000000 +0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sd_IN b/localedata/locales/sd_IN
--- a/localedata/locales/sd_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sd_IN 2018-07-17 17:55:51.000000000 +0000
@@ -46,6 +46,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
+0000
+0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sd_PK b/localedata/locales/sd_PK
--- a/localedata/locales/sd_PK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sd_PK 2018-07-17 17:55:51.000000000 +0000
@@ -39,6 +39,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/se_NO b/localedata/locales/se_NO
--- a/localedata/locales/se_NO 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/se_NO 2018-07-17 17:55:51.000000000 +0000
@@ -205,6 +205,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sgs_LT b/localedata/locales/sgs_LT
--- a/localedata/locales/sgs_LT 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sgs_LT 2018-07-17 17:55:52.000000000 +0000
@@ -59,6 +59,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/si_LK b/localedata/locales/si_LK
--- a/localedata/locales/si_LK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/si_LK 2018-07-17 17:55:52.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sk_SK b/localedata/locales/sk_SK
--- a/localedata/locales/sk_SK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sk_SK 2018-07-17 17:55:52.000000000 +0000
@@ -68,6 +68,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sl_SI b/localedata/locales/sl_SI
--- a/localedata/locales/sl_SI 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sl_SI 2018-07-17 17:55:52.000000000 +0000
@@ -91,6 +91,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sm_WS b/localedata/locales/sm_WS
--- a/localedata/locales/sm_WS 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sm_WS 2018-07-17 17:55:52.000000000 +0000
@@ -37,6 +37,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/so_SO b/localedata/locales/so_SO
--- a/localedata/locales/so_SO 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/so_SO 2018-07-17 17:55:52.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sq_AL b/localedata/locales/sq_AL
--- a/localedata/locales/sq_AL 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sq_AL 2018-07-17 17:55:52.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ss_ZA b/localedata/locales/ss_ZA
--- a/localedata/locales/ss_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ss_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -68,6 +68,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/st_ZA b/localedata/locales/st_ZA
--- a/localedata/locales/st_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/st_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -64,6 +64,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sv_SE b/localedata/locales/sv_SE
--- a/localedata/locales/sv_SE 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sv_SE 2018-07-17 17:55:52.000000000 +0000
@@ -139,6 +139,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sw_KE b/localedata/locales/sw_KE
--- a/localedata/locales/sw_KE 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sw_KE 2018-07-17 17:55:52.000000000 +0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ta_IN b/localedata/locales/ta_IN
--- a/localedata/locales/ta_IN 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ta_IN 2018-07-17 17:55:52.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/te_IN b/localedata/locales/te_IN
--- a/localedata/locales/te_IN 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/te_IN 2018-07-17 17:55:52.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/th_TH b/localedata/locales/th_TH
--- a/localedata/locales/th_TH 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/th_TH 2018-07-17 17:55:52.000000000 +0000
@@ -58,6 +58,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ti_ET b/localedata/locales/ti_ET
--- a/localedata/locales/ti_ET 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ti_ET 2018-07-17 17:55:52.000000000 +0000
@@ -866,6 +866,7 @@
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/tn_ZA b/localedata/locales/tn_ZA
--- a/localedata/locales/tn_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/tn_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -69,6 +69,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/to_TO b/localedata/locales/to_TO
--- a/localedata/locales/to_TO 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/to_TO 2018-07-17 17:55:52.000000000 +0000
@@ -36,6 +36,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/tpi_PG b/localedata/locales/tpi_PG
--- a/localedata/locales/tpi_PG 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/tpi_PG 2018-07-17 17:55:52.000000000 +0000
@@ -37,6 +37,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/tr_TR b/localedata/locales/tr_TR
--- a/localedata/locales/tr_TR 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/tr_TR 2018-07-17 17:55:52.000000000 +0000
@@ -2430,6 +2430,7 @@
% TURKISH LIRA SIGN
<U20BA> "<U0054><U004C>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-07-17 17:55:52.000000000
+0000
@@ -0,0 +1,151 @@
+escape_char /
+comment_char %
+
+% Transliterations that converts cyrillic letters to ascii symbols
inspired by GOST 7.79-2000
+% https://sourceware.org/bugzilla/show_bug.cgi?id=2872
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=8590
+% Up to three characters are required to do a reversible transliteration.
+
+LC_CTYPE
+
+translit_start
+
+
+% CYRILLIC CAPITAL LETTER IO
+<U0401> "<U0059><U004F>";<U0059>
+% CYRILLIC CAPITAL LETTER A
+<U0410> <U0041>
+% CYRILLIC CAPITAL LETTER BE
+<U0411> <U0042>
+% CYRILLIC CAPITAL LETTER VE
+<U0412> <U0056>
+% CYRILLIC CAPITAL LETTER GHE
+<U0413> <U0047>
+% CYRILLIC CAPITAL LETTER DE
+<U0414> <U0044>
+% CYRILLIC CAPITAL LETTER IE
+<U0415> <U0045>
+% CYRILLIC CAPITAL LETTER ZHE
+<U0416> "<U005A><U0048>";<U005A>
+% CYRILLIC CAPITAL LETTER ZE
+<U0417> <U005A>
+% CYRILLIC CAPITAL LETTER I
+<U0418> <U0049>
+% CYRILLIC CAPITAL LETTER SHORT I
+<U0419> <U004A>
+% CYRILLIC CAPITAL LETTER KA
+<U041A> <U004B>
+% CYRILLIC CAPITAL LETTER EL
+<U041B> <U004C>
+% CYRILLIC CAPITAL LETTER EM
+<U041C> <U004D>
+% CYRILLIC CAPITAL LETTER EN
+<U041D> <U004E>
+% CYRILLIC CAPITAL LETTER O
+<U041E> <U004F>
+% CYRILLIC CAPITAL LETTER PE
+<U041F> <U0050>
+% CYRILLIC CAPITAL LETTER ER
+<U0420> <U0052>
+% CYRILLIC CAPITAL LETTER ES
+<U0421> <U0053>
+% CYRILLIC CAPITAL LETTER TE
+<U0422> <U0054>
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC CAPITAL LETTER EF
+<U0424> <U0046>
+% CYRILLIC CAPITAL LETTER HA
+<U0425> <U0058>
+% CYRILLIC CAPITAL LETTER TSE
+<U0426> "<U0043><U005A>";<U0043>
+% CYRILLIC CAPITAL LETTER CHE
+<U0427> "<U0043><U0048>";<U0043>
+% CYRILLIC CAPITAL LETTER SHA
+<U0428> "<U0053><U0048>";<U0053>
+% CYRILLIC CAPITAL LETTER SHCHA
+<U0429> "<U0053><U0048><U0048>";<U0053>
+% CYRILLIC CAPITAL LETTER HARD SIGN
+<U042A> "<U0060><U0060>";<U0060>
+% CYRILLIC CAPITAL LETTER YERU
+<U042B> "<U0059><U0027>";<U0059>
+% CYRILLIC CAPITAL LETTER SOFT SIGN
+<U042C> <U0060>
+% CYRILLIC CAPITAL LETTER E
+<U042D> "<U0045><U0060>";<U0045>
+% CYRILLIC CAPITAL LETTER YU
+<U042E> "<U0059><U0055>";<U0059>
+% CYRILLIC CAPITAL LETTER YA
+<U042F> "<U0059><U0041>";<U0059>
+% CYRILLIC SMALL LETTER A
+<U0430> <U0061>
+% CYRILLIC SMALL LETTER BE
+<U0431> <U0062>
+% CYRILLIC SMALL LETTER VE
+<U0432> <U0076>
+% CYRILLIC SMALL LETTER GHE
+<U0433> <U0067>
+% CYRILLIC SMALL LETTER DE
+<U0434> <U0064>
+% CYRILLIC SMALL LETTER IE
+<U0435> <U0065>
+% CYRILLIC SMALL LETTER ZHE
+<U0436> "<U007A><U0068>";<U007A>
+% CYRILLIC SMALL LETTER ZE
+<U0437> <U007A>
+% CYRILLIC SMALL LETTER I
+<U0438> <U0069>
+% CYRILLIC SMALL LETTER SHORT I
+<U0439> <U006A>
+% CYRILLIC SMALL LETTER KA
+<U043A> <U006B>
+% CYRILLIC SMALL LETTER EL
+<U043B> <U006C>
+% CYRILLIC SMALL LETTER EM
+<U043C> <U006D>
+% CYRILLIC SMALL LETTER EN
+<U043D> <U006E>
+% CYRILLIC SMALL LETTER O
+<U043E> <U006F>
+% CYRILLIC SMALL LETTER PE
+<U043F> <U0070>
+% CYRILLIC SMALL LETTER ER
+<U0440> <U0072>
+% CYRILLIC SMALL LETTER ES
+<U0441> <U0073>
+% CYRILLIC SMALL LETTER TE
+<U0442> <U0074>
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC SMALL LETTER EF
+<U0444> <U0066>
+% CYRILLIC SMALL LETTER HA
+<U0445> <U0078>
+% CYRILLIC SMALL LETTER TSE
+<U0446> "<U0063><U007A>";<U0063>
+% CYRILLIC SMALL LETTER CHE
+<U0447> "<U0063><U0068>";<U0063>
+% CYRILLIC SMALL LETTER SHA
+<U0448> "<U0073><U0068>";<U0073>
+% CYRILLIC SMALL LETTER SHCHA
+<U0449> "<U0073><U0068><U0068>";<U0073>
+% CYRILLIC SMALL LETTER HARD SIGN
+<U044A> "<U0060><U0060>";<U0060>
+% CYRILLIC SMALL LETTER YERU
+<U044B> "<U0079><U0027>";<U0079>
+% CYRILLIC SMALL LETTER SOFT SIGN
+<U044C> <U0060>
+% CYRILLIC SMALL LETTER E
+<U044D> "<U0065><U0060>";<U0065>
+% CYRILLIC SMALL LETTER YU
+<U044E> "<U0079><U0075>";<U0079>
+% CYRILLIC SMALL LETTER YA
+<U044F> "<U0079><U0061>";<U0079>
+% CYRILLIC SMALL LETTER IO
+<U0451> "<U0079><U006F>";<U0079>
+
+
+translit_end
+
+END LC_CTYPE
diff -uNr a/localedata/locales/ts_ZA b/localedata/locales/ts_ZA
--- a/localedata/locales/ts_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ts_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -64,6 +64,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/unm_US b/localedata/locales/unm_US
--- a/localedata/locales/unm_US 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/unm_US 2018-07-17 17:55:52.000000000 +0000
@@ -48,6 +48,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ur_IN b/localedata/locales/ur_IN
--- a/localedata/locales/ur_IN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ur_IN 2018-07-17 17:55:53.000000000 +0000
@@ -46,6 +46,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ur_PK b/localedata/locales/ur_PK
--- a/localedata/locales/ur_PK 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ur_PK 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ve_ZA b/localedata/locales/ve_ZA
--- a/localedata/locales/ve_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ve_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -67,6 +67,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/vi_VN b/localedata/locales/vi_VN
--- a/localedata/locales/vi_VN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/vi_VN 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
% dong sign -> d// -> dd
<U20AB> "<U0111>";"<U0064><U0064>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/wa_BE b/localedata/locales/wa_BE
--- a/localedata/locales/wa_BE 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/wa_BE 2018-07-17 17:55:53.000000000 +0000
@@ -69,6 +69,7 @@
<U00C5> "<U0041><U030A>";"<U0041>";"<U0041><U0055>"
<U00E5> "<U0061><U030A>";"<U0061>";"<U0061><U0075>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/wo_SN b/localedata/locales/wo_SN
--- a/localedata/locales/wo_SN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/wo_SN 2018-07-17 17:55:53.000000000 +0000
@@ -55,6 +55,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/xh_ZA b/localedata/locales/xh_ZA
--- a/localedata/locales/xh_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/xh_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -66,6 +66,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/yi_US b/localedata/locales/yi_US
--- a/localedata/locales/yi_US 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/yi_US 2018-07-17 17:55:53.000000000 +0000
@@ -73,6 +73,7 @@
<U05F0> "<U05D5><U05D5>";"<U0077><U0077>"
<U05F1> "<U05D5><U05D9>";"<U0077><U006A>"
<U05F2> "<U05D9><U05D9>";"<U006A><U006A>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/zh_CN b/localedata/locales/zh_CN
--- a/localedata/locales/zh_CN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/zh_CN 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
class "hanzi"; /
diff -uNr a/localedata/locales/zu_ZA b/localedata/locales/zu_ZA
--- a/localedata/locales/zu_ZA 2018-07-17 17:49:22.000000000 +0000
+++ b/localedata/locales/zu_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
Keld Simonsen
2018-10-03 09:19:49 UTC
Permalink
Hi

Please note that translitteration of Cyrillic to latin is not universal.
There are different schemes for for example German, English and Danish, and
there is also an ISO standard for it.

But do go forward with fixing this bug.

Best regards
Keld
Post by Egor Kobylkin
Ping.
Absent of feedback I am wondering if anything could be missing in this
patch from the maintainers standpoint. More than two months have passed
since the original submission.
If I can be of assistance, please do not hesitate to contact me,
Egor Kobylkin
Post by Egor Kobylkin
Dear locale maintainers,
fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"
https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]
add Cyrillic transliteration table translit_cyrillic file
https://sourceware.org/bugzilla/attachment.cgi?id=8591 [7]
to localedata/locales/ and include it in all your locales going forward.
Patch included inline below.
This is a re-submission for the consideration for 2.29 on a request from
Carlos O'Donell https://sourceware.org/ml/libc-alpha/2018-07/msg00506.html
From this patch I have excluded locales that already mention cyrillic or
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.
The glibc wiki explicitly lists this use case as the test example
LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt
currently it fails on Cyrillic texts in most locales including ru_RU [1]
LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC
CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.
- It produces a string of question marks and spaces.
CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.
The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.
While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration has only ASCII codes but still can be read by a native
speaker. Among other things it is useful for processing the Cyrillic
texts and filenames by programs or on systems that are not specifically
prepared to work with Cyrillic, don't have corresponding fonts installed
or can't handle UTF-8.
The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on GOST 7.79-2000 official source
(Federal Agency on Technical Regulating and Metrology Of Russian
Federation [2]). Technically an independent but identical source [3] was
used and prepared in a spreadsheet [6].
The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that have a
translit_start/end stance and generated a patch for them.
The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.
However it would not be the standard Russian Cyrillic transliteration as
described above.
I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
exclusion.
[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=8590
[7] translit_cyrillic https://sourceware.org/bugzilla/attachment.cgi?id=8591
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=8618
Best regards,
Egor Kobylkin
---
[BZ #2872]
* locales/translit_cyrillic: add Russian GOST 7.79-2000 transliteration
table from Cyrillic to Latin.
* locales/C: add include "translit_cyrillic";"" to LC_CTYPE translit
section.
* locales/aa_DJ: likewise
* locales/af_ZA: likewise
* locales/ak_GH: likewise
* locales/am_ET: likewise
* locales/ar_EG: likewise
* locales/be_BY: likewise
* locales/bem_ZM: likewise
* locales/ber_DZ: likewise
* locales/ber_MA: likewise
* locales/bg_BG: likewise
* locales/bi_VU: likewise
* locales/bn_BD: likewise
* locales/bo_CN: likewise
* locales/ca_ES: likewise
* locales/ce_RU: likewise
* locales/cs_CZ: likewise
* locales/cv_RU: likewise
* locales/cy_GB: likewise
* locales/da_DK: likewise
* locales/de_DE: likewise
* locales/dv_MV: likewise
* locales/dz_BT: likewise
* locales/el_GR: likewise
* locales/en_GB: likewise
* locales/en_NG: likewise
* locales/en_ZM: likewise
* locales/es_CU: likewise
* locales/es_ES: likewise
* locales/et_EE: likewise
* locales/fa_IR: likewise
* locales/ff_SN: likewise
* locales/fi_FI: likewise
* locales/fr_FR: likewise
* locales/ga_IE: likewise
* locales/gd_GB: likewise
* locales/gu_IN: likewise
* locales/gv_GB: likewise
* locales/he_IL: likewise
* locales/hi_IN: likewise
* locales/hif_FJ: likewise
* locales/hr_HR: likewise
* locales/ht_HT: likewise
* locales/hu_HU: likewise
* locales/hy_AM: likewise
* locales/id_ID: likewise
* locales/is_IS: likewise
* locales/it_IT: likewise
* locales/ja_JP: likewise
* locales/kk_KZ: likewise
* locales/km_KH: likewise
* locales/kn_IN: likewise
* locales/ko_KR: likewise
* locales/ks_IN: likewise
* locales/kw_GB: likewise
* locales/lb_LU: likewise
* locales/lg_UG: likewise
* locales/lij_IT: likewise
* locales/ln_CD: likewise
* locales/lo_LA: likewise
* locales/lt_LT: likewise
* locales/lv_LV: likewise
* locales/mg_MG: likewise
* locales/mhr_RU: likewise
* locales/mk_MK: likewise
* locales/ml_IN: likewise
* locales/ms_MY: likewise
* locales/mt_MT: likewise
* locales/nb_NO: likewise
* locales/ne_NP: likewise
* locales/nhn_MX: likewise
* locales/niu_NU: likewise
* locales/niu_NZ: likewise
* locales/nl_NL: likewise
* locales/nr_ZA: likewise
* locales/oc_FR: likewise
* locales/om_KE: likewise
* locales/or_IN: likewise
* locales/os_RU: likewise
* locales/pa_IN: likewise
* locales/pa_PK: likewise
* locales/pl_PL: likewise
* locales/pt_PT: likewise
* locales/quz_PE: likewise
* locales/ro_RO: likewise
* locales/ru_RU: likewise
* locales/rw_RW: likewise
* locales/sa_IN: likewise
* locales/sd_IN: likewise
* locales/sd_PK: likewise
* locales/se_NO: likewise
* locales/sgs_LT: likewise
* locales/si_LK: likewise
* locales/sk_SK: likewise
* locales/sl_SI: likewise
* locales/sm_WS: likewise
* locales/so_SO: likewise
* locales/sq_AL: likewise
* locales/ss_ZA: likewise
* locales/st_ZA: likewise
* locales/sv_SE: likewise
* locales/sw_KE: likewise
* locales/ta_IN: likewise
* locales/te_IN: likewise
* locales/th_TH: likewise
* locales/ti_ET: likewise
* locales/tn_ZA: likewise
* locales/to_TO: likewise
* locales/tpi_PG: likewise
* locales/tr_TR: likewise
* locales/ts_ZA: likewise
* locales/unm_US: likewise
* locales/ur_IN: likewise
* locales/ur_PK: likewise
* locales/ve_ZA: likewise
* locales/vi_VN: likewise
* locales/wa_BE: likewise
* locales/wo_SN: likewise
* locales/xh_ZA: likewise
* locales/yi_US: likewise
* locales/zh_CN: likewise
* locales/zu_ZA: likewise
diff -uNr a/localedata/locales/C b/localedata/locales/C
--- a/localedata/locales/C 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/C 2018-07-17 17:55:47.000000000 +0000
@@ -2292,6 +2292,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/aa_DJ b/localedata/locales/aa_DJ
--- a/localedata/locales/aa_DJ 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/aa_DJ 2018-07-17 17:55:47.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/af_ZA b/localedata/locales/af_ZA
--- a/localedata/locales/af_ZA 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/af_ZA 2018-07-17 17:55:47.000000000 +0000
@@ -72,6 +72,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ak_GH b/localedata/locales/ak_GH
--- a/localedata/locales/ak_GH 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/ak_GH 2018-07-17 17:55:47.000000000 +0000
@@ -56,6 +56,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
--- a/localedata/locales/am_ET 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/am_ET 2018-07-17 17:55:47.000000000 +0000
@@ -1396,6 +1396,7 @@
<U137A> <U0060><U0039><U0030>
<U137B> <U0060><U0031><U0030><U0030>
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/ar_EG b/localedata/locales/ar_EG
--- a/localedata/locales/ar_EG 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/ar_EG 2018-07-17 17:55:48.000000000 +0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/be_BY b/localedata/locales/be_BY
--- a/localedata/locales/be_BY 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/be_BY 2018-07-17 17:55:48.000000000 +0000
@@ -69,6 +69,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bem_ZM b/localedata/locales/bem_ZM
--- a/localedata/locales/bem_ZM 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bem_ZM 2018-07-17 17:55:48.000000000 +0000
@@ -42,6 +42,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ber_DZ b/localedata/locales/ber_DZ
--- a/localedata/locales/ber_DZ 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ber_DZ 2018-07-17 17:55:48.000000000 +0000
@@ -166,6 +166,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ber_MA b/localedata/locales/ber_MA
--- a/localedata/locales/ber_MA 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ber_MA 2018-07-17 17:55:48.000000000 +0000
@@ -86,6 +86,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bg_BG b/localedata/locales/bg_BG
--- a/localedata/locales/bg_BG 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bg_BG 2018-07-17 17:55:48.000000000 +0000
@@ -49,6 +49,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bi_VU b/localedata/locales/bi_VU
--- a/localedata/locales/bi_VU 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bi_VU 2018-07-17 17:55:48.000000000 +0000
@@ -39,6 +39,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bn_BD b/localedata/locales/bn_BD
--- a/localedata/locales/bn_BD 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bn_BD 2018-07-17 17:55:48.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bo_CN b/localedata/locales/bo_CN
--- a/localedata/locales/bo_CN 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bo_CN 2018-07-17 17:55:48.000000000 +0000
@@ -43,6 +43,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ca_ES b/localedata/locales/ca_ES
--- a/localedata/locales/ca_ES 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ca_ES 2018-07-17 17:55:48.000000000 +0000
@@ -72,6 +72,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ce_RU b/localedata/locales/ce_RU
--- a/localedata/locales/ce_RU 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ce_RU 2018-07-17 17:55:48.000000000 +0000
@@ -39,6 +39,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/cs_CZ b/localedata/locales/cs_CZ
--- a/localedata/locales/cs_CZ 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/cs_CZ 2018-07-17 17:55:48.000000000 +0000
@@ -2311,6 +2311,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/cv_RU b/localedata/locales/cv_RU
--- a/localedata/locales/cv_RU 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/cv_RU 2018-07-17 17:55:48.000000000 +0000
@@ -109,6 +109,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/cy_GB b/localedata/locales/cy_GB
--- a/localedata/locales/cy_GB 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/cy_GB 2018-07-17 17:55:48.000000000 +0000
@@ -69,6 +69,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/da_DK b/localedata/locales/da_DK
--- a/localedata/locales/da_DK 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/da_DK 2018-07-17 17:55:48.000000000 +0000
@@ -167,6 +167,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/de_DE b/localedata/locales/de_DE
--- a/localedata/locales/de_DE 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/de_DE 2018-07-17 17:55:48.000000000 +0000
@@ -78,6 +78,7 @@
% DOUBLE HIGH-REVERSED-9 QUOTATION MARK
<U201F> <U00AB>;<U0022>
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/dv_MV b/localedata/locales/dv_MV
--- a/localedata/locales/dv_MV 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/dv_MV 2018-07-17 17:55:48.000000000 +0000
@@ -52,6 +52,7 @@
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/dz_BT b/localedata/locales/dz_BT
--- a/localedata/locales/dz_BT 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/dz_BT 2018-07-17 17:55:48.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/el_GR b/localedata/locales/el_GR
--- a/localedata/locales/el_GR 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/el_GR 2018-07-17 17:55:48.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/en_GB b/localedata/locales/en_GB
--- a/localedata/locales/en_GB 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/en_GB 2018-07-17 17:55:48.000000000 +0000
@@ -55,6 +55,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/en_NG b/localedata/locales/en_NG
--- a/localedata/locales/en_NG 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/en_NG 2018-07-17 17:55:48.000000000 +0000
@@ -50,6 +50,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/en_ZM b/localedata/locales/en_ZM
--- a/localedata/locales/en_ZM 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/en_ZM 2018-07-17 17:55:48.000000000 +0000
@@ -42,6 +42,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/es_CU b/localedata/locales/es_CU
--- a/localedata/locales/es_CU 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/es_CU 2018-07-17 17:55:48.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/es_ES b/localedata/locales/es_ES
--- a/localedata/locales/es_ES 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/es_ES 2018-07-17 17:55:49.000000000 +0000
@@ -73,6 +73,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/et_EE b/localedata/locales/et_EE
--- a/localedata/locales/et_EE 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/et_EE 2018-07-17 17:55:49.000000000 +0000
@@ -109,6 +109,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/fa_IR b/localedata/locales/fa_IR
--- a/localedata/locales/fa_IR 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/fa_IR 2018-07-17 17:55:49.000000000 +0000
@@ -79,6 +79,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ff_SN b/localedata/locales/ff_SN
--- a/localedata/locales/ff_SN 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/ff_SN 2018-07-17 17:55:49.000000000 +0000
@@ -42,6 +42,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/fi_FI b/localedata/locales/fi_FI
--- a/localedata/locales/fi_FI 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/fi_FI 2018-07-17 17:55:49.000000000 +0000
@@ -137,6 +137,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/fr_FR b/localedata/locales/fr_FR
--- a/localedata/locales/fr_FR 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/fr_FR 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
% In France, accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ga_IE b/localedata/locales/ga_IE
--- a/localedata/locales/ga_IE 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/ga_IE 2018-07-17 17:55:49.000000000 +0000
@@ -54,6 +54,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/gd_GB b/localedata/locales/gd_GB
--- a/localedata/locales/gd_GB 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gd_GB 2018-07-17 17:55:49.000000000 +0000
@@ -47,6 +47,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/gu_IN b/localedata/locales/gu_IN
--- a/localedata/locales/gu_IN 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gu_IN 2018-07-17 17:55:49.000000000 +0000
@@ -62,6 +62,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/gv_GB b/localedata/locales/gv_GB
--- a/localedata/locales/gv_GB 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gv_GB 2018-07-17 17:55:49.000000000 +0000
@@ -57,6 +57,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/he_IL b/localedata/locales/he_IL
--- a/localedata/locales/he_IL 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/he_IL 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hi_IN b/localedata/locales/hi_IN
--- a/localedata/locales/hi_IN 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hi_IN 2018-07-17 17:55:49.000000000 +0000
@@ -61,6 +61,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hif_FJ b/localedata/locales/hif_FJ
--- a/localedata/locales/hif_FJ 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hif_FJ 2018-07-17 17:55:49.000000000 +0000
@@ -37,6 +37,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hr_HR b/localedata/locales/hr_HR
--- a/localedata/locales/hr_HR 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hr_HR 2018-07-17 17:55:49.000000000 +0000
@@ -153,6 +153,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ht_HT b/localedata/locales/ht_HT
--- a/localedata/locales/ht_HT 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/ht_HT 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hu_HU b/localedata/locales/hu_HU
--- a/localedata/locales/hu_HU 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hu_HU 2018-07-17 17:55:49.000000000 +0000
@@ -478,6 +478,7 @@
<U00FC> "<U0075><U0308>";"<U0075><U00A8>";"<U0075><U003A>"
<U0171> "<U0075><U030B>";"<U0075><U02DD>";"<U0075><U0022>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hy_AM b/localedata/locales/hy_AM
--- a/localedata/locales/hy_AM 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/hy_AM 2018-07-17 17:55:49.000000000 +0000
@@ -77,6 +77,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/id_ID b/localedata/locales/id_ID
--- a/localedata/locales/id_ID 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/id_ID 2018-07-17 17:55:49.000000000 +0000
@@ -55,6 +55,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/is_IS b/localedata/locales/is_IS
--- a/localedata/locales/is_IS 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/is_IS 2018-07-17 17:55:49.000000000 +0000
@@ -2161,6 +2161,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/it_IT b/localedata/locales/it_IT
--- a/localedata/locales/it_IT 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/it_IT 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ja_JP b/localedata/locales/ja_JP
--- a/localedata/locales/ja_JP 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ja_JP 2018-07-17 17:55:49.000000000 +0000
@@ -1682,6 +1682,7 @@
include "translit_combining";""
include "translit_cjk_variants";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/kk_KZ b/localedata/locales/kk_KZ
--- a/localedata/locales/kk_KZ 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kk_KZ 2018-07-17 17:55:50.000000000 +0000
@@ -158,6 +158,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/km_KH b/localedata/locales/km_KH
--- a/localedata/locales/km_KH 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/km_KH 2018-07-17 17:55:50.000000000 +0000
@@ -873,6 +873,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/kn_IN b/localedata/locales/kn_IN
--- a/localedata/locales/kn_IN 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kn_IN 2018-07-17 17:55:50.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ko_KR b/localedata/locales/ko_KR
--- a/localedata/locales/ko_KR 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ko_KR 2018-07-17 17:55:50.000000000 +0000
@@ -6099,6 +6099,7 @@
include "translit_combining";""
include "translit_hangul";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ks_IN b/localedata/locales/ks_IN
--- a/localedata/locales/ks_IN 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ks_IN 2018-07-17 17:55:50.000000000 +0000
@@ -46,6 +46,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/kw_GB b/localedata/locales/kw_GB
--- a/localedata/locales/kw_GB 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kw_GB 2018-07-17 17:55:50.000000000 +0000
@@ -58,6 +58,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lb_LU b/localedata/locales/lb_LU
--- a/localedata/locales/lb_LU 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lb_LU 2018-07-17 17:55:50.000000000 +0000
@@ -78,6 +78,7 @@
% LATIN SMALL LETTER E WITH CIRCUMFLEX
<U00EA> "<U0065><U005E>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lg_UG b/localedata/locales/lg_UG
--- a/localedata/locales/lg_UG 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lg_UG 2018-07-17 17:55:50.000000000 +0000
@@ -57,6 +57,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lij_IT b/localedata/locales/lij_IT
--- a/localedata/locales/lij_IT 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lij_IT 2018-07-17 17:55:50.000000000 +0000
@@ -47,6 +47,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ln_CD b/localedata/locales/ln_CD
--- a/localedata/locales/ln_CD 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ln_CD 2018-07-17 17:55:50.000000000 +0000
@@ -39,6 +39,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lo_LA b/localedata/locales/lo_LA
--- a/localedata/locales/lo_LA 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lo_LA 2018-07-17 17:55:50.000000000 +0000
@@ -51,6 +51,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lt_LT b/localedata/locales/lt_LT
--- a/localedata/locales/lt_LT 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lt_LT 2018-07-17 17:55:50.000000000 +0000
@@ -77,6 +77,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lv_LV b/localedata/locales/lv_LV
--- a/localedata/locales/lv_LV 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lv_LV 2018-07-17 17:55:50.000000000 +0000
@@ -2122,6 +2122,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mg_MG b/localedata/locales/mg_MG
--- a/localedata/locales/mg_MG 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mg_MG 2018-07-17 17:55:50.000000000 +0000
@@ -55,6 +55,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mhr_RU b/localedata/locales/mhr_RU
--- a/localedata/locales/mhr_RU 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mhr_RU 2018-07-17 17:55:50.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mk_MK b/localedata/locales/mk_MK
--- a/localedata/locales/mk_MK 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mk_MK 2018-07-17 17:55:50.000000000 +0000
@@ -49,6 +49,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ml_IN b/localedata/locales/ml_IN
--- a/localedata/locales/ml_IN 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ml_IN 2018-07-17 17:55:50.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
%
diff -uNr a/localedata/locales/ms_MY b/localedata/locales/ms_MY
--- a/localedata/locales/ms_MY 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ms_MY 2018-07-17 17:55:50.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mt_MT b/localedata/locales/mt_MT
--- a/localedata/locales/mt_MT 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mt_MT 2018-07-17 17:55:50.000000000 +0000
@@ -47,6 +47,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
@@ -53,6 +53,7 @@
% accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nb_NO b/localedata/locales/nb_NO
--- a/localedata/locales/nb_NO 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nb_NO 2018-07-17 17:55:50.000000000 +0000
@@ -154,6 +154,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ne_NP b/localedata/locales/ne_NP
--- a/localedata/locales/ne_NP 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ne_NP 2018-07-17 17:55:50.000000000 +0000
@@ -43,6 +43,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nhn_MX b/localedata/locales/nhn_MX
--- a/localedata/locales/nhn_MX 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nhn_MX 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/niu_NU b/localedata/locales/niu_NU
--- a/localedata/locales/niu_NU 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/niu_NU 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/niu_NZ b/localedata/locales/niu_NZ
--- a/localedata/locales/niu_NZ 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/niu_NZ 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nl_NL b/localedata/locales/nl_NL
--- a/localedata/locales/nl_NL 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nl_NL 2018-07-17 17:55:51.000000000 +0000
@@ -57,6 +57,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nr_ZA b/localedata/locales/nr_ZA
--- a/localedata/locales/nr_ZA 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/nr_ZA 2018-07-17 17:55:51.000000000 +0000
@@ -66,6 +66,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/oc_FR b/localedata/locales/oc_FR
--- a/localedata/locales/oc_FR 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/oc_FR 2018-07-17 17:55:51.000000000 +0000
@@ -62,6 +62,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/om_KE b/localedata/locales/om_KE
--- a/localedata/locales/om_KE 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/om_KE 2018-07-17 17:55:51.000000000 +0000
@@ -140,6 +140,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/or_IN b/localedata/locales/or_IN
--- a/localedata/locales/or_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/or_IN 2018-07-17 17:55:51.000000000 +0000
@@ -62,6 +62,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/os_RU b/localedata/locales/os_RU
--- a/localedata/locales/os_RU 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/os_RU 2018-07-17 17:55:51.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pa_IN b/localedata/locales/pa_IN
--- a/localedata/locales/pa_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pa_IN 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pa_PK b/localedata/locales/pa_PK
--- a/localedata/locales/pa_PK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pa_PK 2018-07-17 17:55:51.000000000 +0000
@@ -58,6 +58,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pl_PL b/localedata/locales/pl_PL
--- a/localedata/locales/pl_PL 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pl_PL 2018-07-17 17:55:51.000000000 +0000
@@ -142,6 +142,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pt_PT b/localedata/locales/pt_PT
--- a/localedata/locales/pt_PT 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pt_PT 2018-07-17 17:55:51.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/quz_PE b/localedata/locales/quz_PE
--- a/localedata/locales/quz_PE 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/quz_PE 2018-07-17 17:55:51.000000000 +0000
@@ -57,6 +57,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ro_RO b/localedata/locales/ro_RO
--- a/localedata/locales/ro_RO 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/ro_RO 2018-07-17 17:55:51.000000000 +0000
@@ -144,6 +144,7 @@
<U0162> "<U021A>";"<U0054>"
<U0163> "<U021B>";"<U0074>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ru_RU b/localedata/locales/ru_RU
--- a/localedata/locales/ru_RU 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/ru_RU 2018-07-17 17:55:51.000000000 +0000
@@ -74,6 +74,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/rw_RW b/localedata/locales/rw_RW
--- a/localedata/locales/rw_RW 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/rw_RW 2018-07-17 17:55:51.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sa_IN b/localedata/locales/sa_IN
--- a/localedata/locales/sa_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sa_IN 2018-07-17 17:55:51.000000000 +0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sd_IN b/localedata/locales/sd_IN
--- a/localedata/locales/sd_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sd_IN 2018-07-17 17:55:51.000000000 +0000
@@ -46,6 +46,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
+0000
+0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sd_PK b/localedata/locales/sd_PK
--- a/localedata/locales/sd_PK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sd_PK 2018-07-17 17:55:51.000000000 +0000
@@ -39,6 +39,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/se_NO b/localedata/locales/se_NO
--- a/localedata/locales/se_NO 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/se_NO 2018-07-17 17:55:51.000000000 +0000
@@ -205,6 +205,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sgs_LT b/localedata/locales/sgs_LT
--- a/localedata/locales/sgs_LT 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sgs_LT 2018-07-17 17:55:52.000000000 +0000
@@ -59,6 +59,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/si_LK b/localedata/locales/si_LK
--- a/localedata/locales/si_LK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/si_LK 2018-07-17 17:55:52.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sk_SK b/localedata/locales/sk_SK
--- a/localedata/locales/sk_SK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sk_SK 2018-07-17 17:55:52.000000000 +0000
@@ -68,6 +68,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sl_SI b/localedata/locales/sl_SI
--- a/localedata/locales/sl_SI 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sl_SI 2018-07-17 17:55:52.000000000 +0000
@@ -91,6 +91,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sm_WS b/localedata/locales/sm_WS
--- a/localedata/locales/sm_WS 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sm_WS 2018-07-17 17:55:52.000000000 +0000
@@ -37,6 +37,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/so_SO b/localedata/locales/so_SO
--- a/localedata/locales/so_SO 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/so_SO 2018-07-17 17:55:52.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sq_AL b/localedata/locales/sq_AL
--- a/localedata/locales/sq_AL 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sq_AL 2018-07-17 17:55:52.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ss_ZA b/localedata/locales/ss_ZA
--- a/localedata/locales/ss_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ss_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -68,6 +68,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/st_ZA b/localedata/locales/st_ZA
--- a/localedata/locales/st_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/st_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -64,6 +64,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sv_SE b/localedata/locales/sv_SE
--- a/localedata/locales/sv_SE 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sv_SE 2018-07-17 17:55:52.000000000 +0000
@@ -139,6 +139,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sw_KE b/localedata/locales/sw_KE
--- a/localedata/locales/sw_KE 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sw_KE 2018-07-17 17:55:52.000000000 +0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ta_IN b/localedata/locales/ta_IN
--- a/localedata/locales/ta_IN 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ta_IN 2018-07-17 17:55:52.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/te_IN b/localedata/locales/te_IN
--- a/localedata/locales/te_IN 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/te_IN 2018-07-17 17:55:52.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/th_TH b/localedata/locales/th_TH
--- a/localedata/locales/th_TH 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/th_TH 2018-07-17 17:55:52.000000000 +0000
@@ -58,6 +58,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ti_ET b/localedata/locales/ti_ET
--- a/localedata/locales/ti_ET 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ti_ET 2018-07-17 17:55:52.000000000 +0000
@@ -866,6 +866,7 @@
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/tn_ZA b/localedata/locales/tn_ZA
--- a/localedata/locales/tn_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/tn_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -69,6 +69,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/to_TO b/localedata/locales/to_TO
--- a/localedata/locales/to_TO 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/to_TO 2018-07-17 17:55:52.000000000 +0000
@@ -36,6 +36,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/tpi_PG b/localedata/locales/tpi_PG
--- a/localedata/locales/tpi_PG 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/tpi_PG 2018-07-17 17:55:52.000000000 +0000
@@ -37,6 +37,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/tr_TR b/localedata/locales/tr_TR
--- a/localedata/locales/tr_TR 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/tr_TR 2018-07-17 17:55:52.000000000 +0000
@@ -2430,6 +2430,7 @@
% TURKISH LIRA SIGN
<U20BA> "<U0054><U004C>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-07-17 17:55:52.000000000
+0000
@@ -0,0 +1,151 @@
+escape_char /
+comment_char %
+
+% Transliterations that converts cyrillic letters to ascii symbols
inspired by GOST 7.79-2000
+% https://sourceware.org/bugzilla/show_bug.cgi?id=2872
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=8590
+% Up to three characters are required to do a reversible transliteration.
+
+LC_CTYPE
+
+translit_start
+
+
+% CYRILLIC CAPITAL LETTER IO
+<U0401> "<U0059><U004F>";<U0059>
+% CYRILLIC CAPITAL LETTER A
+<U0410> <U0041>
+% CYRILLIC CAPITAL LETTER BE
+<U0411> <U0042>
+% CYRILLIC CAPITAL LETTER VE
+<U0412> <U0056>
+% CYRILLIC CAPITAL LETTER GHE
+<U0413> <U0047>
+% CYRILLIC CAPITAL LETTER DE
+<U0414> <U0044>
+% CYRILLIC CAPITAL LETTER IE
+<U0415> <U0045>
+% CYRILLIC CAPITAL LETTER ZHE
+<U0416> "<U005A><U0048>";<U005A>
+% CYRILLIC CAPITAL LETTER ZE
+<U0417> <U005A>
+% CYRILLIC CAPITAL LETTER I
+<U0418> <U0049>
+% CYRILLIC CAPITAL LETTER SHORT I
+<U0419> <U004A>
+% CYRILLIC CAPITAL LETTER KA
+<U041A> <U004B>
+% CYRILLIC CAPITAL LETTER EL
+<U041B> <U004C>
+% CYRILLIC CAPITAL LETTER EM
+<U041C> <U004D>
+% CYRILLIC CAPITAL LETTER EN
+<U041D> <U004E>
+% CYRILLIC CAPITAL LETTER O
+<U041E> <U004F>
+% CYRILLIC CAPITAL LETTER PE
+<U041F> <U0050>
+% CYRILLIC CAPITAL LETTER ER
+<U0420> <U0052>
+% CYRILLIC CAPITAL LETTER ES
+<U0421> <U0053>
+% CYRILLIC CAPITAL LETTER TE
+<U0422> <U0054>
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC CAPITAL LETTER EF
+<U0424> <U0046>
+% CYRILLIC CAPITAL LETTER HA
+<U0425> <U0058>
+% CYRILLIC CAPITAL LETTER TSE
+<U0426> "<U0043><U005A>";<U0043>
+% CYRILLIC CAPITAL LETTER CHE
+<U0427> "<U0043><U0048>";<U0043>
+% CYRILLIC CAPITAL LETTER SHA
+<U0428> "<U0053><U0048>";<U0053>
+% CYRILLIC CAPITAL LETTER SHCHA
+<U0429> "<U0053><U0048><U0048>";<U0053>
+% CYRILLIC CAPITAL LETTER HARD SIGN
+<U042A> "<U0060><U0060>";<U0060>
+% CYRILLIC CAPITAL LETTER YERU
+<U042B> "<U0059><U0027>";<U0059>
+% CYRILLIC CAPITAL LETTER SOFT SIGN
+<U042C> <U0060>
+% CYRILLIC CAPITAL LETTER E
+<U042D> "<U0045><U0060>";<U0045>
+% CYRILLIC CAPITAL LETTER YU
+<U042E> "<U0059><U0055>";<U0059>
+% CYRILLIC CAPITAL LETTER YA
+<U042F> "<U0059><U0041>";<U0059>
+% CYRILLIC SMALL LETTER A
+<U0430> <U0061>
+% CYRILLIC SMALL LETTER BE
+<U0431> <U0062>
+% CYRILLIC SMALL LETTER VE
+<U0432> <U0076>
+% CYRILLIC SMALL LETTER GHE
+<U0433> <U0067>
+% CYRILLIC SMALL LETTER DE
+<U0434> <U0064>
+% CYRILLIC SMALL LETTER IE
+<U0435> <U0065>
+% CYRILLIC SMALL LETTER ZHE
+<U0436> "<U007A><U0068>";<U007A>
+% CYRILLIC SMALL LETTER ZE
+<U0437> <U007A>
+% CYRILLIC SMALL LETTER I
+<U0438> <U0069>
+% CYRILLIC SMALL LETTER SHORT I
+<U0439> <U006A>
+% CYRILLIC SMALL LETTER KA
+<U043A> <U006B>
+% CYRILLIC SMALL LETTER EL
+<U043B> <U006C>
+% CYRILLIC SMALL LETTER EM
+<U043C> <U006D>
+% CYRILLIC SMALL LETTER EN
+<U043D> <U006E>
+% CYRILLIC SMALL LETTER O
+<U043E> <U006F>
+% CYRILLIC SMALL LETTER PE
+<U043F> <U0070>
+% CYRILLIC SMALL LETTER ER
+<U0440> <U0072>
+% CYRILLIC SMALL LETTER ES
+<U0441> <U0073>
+% CYRILLIC SMALL LETTER TE
+<U0442> <U0074>
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC SMALL LETTER EF
+<U0444> <U0066>
+% CYRILLIC SMALL LETTER HA
+<U0445> <U0078>
+% CYRILLIC SMALL LETTER TSE
+<U0446> "<U0063><U007A>";<U0063>
+% CYRILLIC SMALL LETTER CHE
+<U0447> "<U0063><U0068>";<U0063>
+% CYRILLIC SMALL LETTER SHA
+<U0448> "<U0073><U0068>";<U0073>
+% CYRILLIC SMALL LETTER SHCHA
+<U0449> "<U0073><U0068><U0068>";<U0073>
+% CYRILLIC SMALL LETTER HARD SIGN
+<U044A> "<U0060><U0060>";<U0060>
+% CYRILLIC SMALL LETTER YERU
+<U044B> "<U0079><U0027>";<U0079>
+% CYRILLIC SMALL LETTER SOFT SIGN
+<U044C> <U0060>
+% CYRILLIC SMALL LETTER E
+<U044D> "<U0065><U0060>";<U0065>
+% CYRILLIC SMALL LETTER YU
+<U044E> "<U0079><U0075>";<U0079>
+% CYRILLIC SMALL LETTER YA
+<U044F> "<U0079><U0061>";<U0079>
+% CYRILLIC SMALL LETTER IO
+<U0451> "<U0079><U006F>";<U0079>
+
+
+translit_end
+
+END LC_CTYPE
diff -uNr a/localedata/locales/ts_ZA b/localedata/locales/ts_ZA
--- a/localedata/locales/ts_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ts_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -64,6 +64,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/unm_US b/localedata/locales/unm_US
--- a/localedata/locales/unm_US 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/unm_US 2018-07-17 17:55:52.000000000 +0000
@@ -48,6 +48,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ur_IN b/localedata/locales/ur_IN
--- a/localedata/locales/ur_IN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ur_IN 2018-07-17 17:55:53.000000000 +0000
@@ -46,6 +46,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ur_PK b/localedata/locales/ur_PK
--- a/localedata/locales/ur_PK 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ur_PK 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ve_ZA b/localedata/locales/ve_ZA
--- a/localedata/locales/ve_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ve_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -67,6 +67,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/vi_VN b/localedata/locales/vi_VN
--- a/localedata/locales/vi_VN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/vi_VN 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
% dong sign -> d// -> dd
<U20AB> "<U0111>";"<U0064><U0064>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/wa_BE b/localedata/locales/wa_BE
--- a/localedata/locales/wa_BE 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/wa_BE 2018-07-17 17:55:53.000000000 +0000
@@ -69,6 +69,7 @@
<U00C5> "<U0041><U030A>";"<U0041>";"<U0041><U0055>"
<U00E5> "<U0061><U030A>";"<U0061>";"<U0061><U0075>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/wo_SN b/localedata/locales/wo_SN
--- a/localedata/locales/wo_SN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/wo_SN 2018-07-17 17:55:53.000000000 +0000
@@ -55,6 +55,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/xh_ZA b/localedata/locales/xh_ZA
--- a/localedata/locales/xh_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/xh_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -66,6 +66,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/yi_US b/localedata/locales/yi_US
--- a/localedata/locales/yi_US 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/yi_US 2018-07-17 17:55:53.000000000 +0000
@@ -73,6 +73,7 @@
<U05F0> "<U05D5><U05D5>";"<U0077><U0077>"
<U05F1> "<U05D5><U05D9>";"<U0077><U006A>"
<U05F2> "<U05D9><U05D9>";"<U006A><U006A>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/zh_CN b/localedata/locales/zh_CN
--- a/localedata/locales/zh_CN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/zh_CN 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
class "hanzi"; /
diff -uNr a/localedata/locales/zu_ZA b/localedata/locales/zu_ZA
--- a/localedata/locales/zu_ZA 2018-07-17 17:49:22.000000000 +0000
+++ b/localedata/locales/zu_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
Egor Kobylkin
2018-10-03 09:32:00 UTC
Permalink
Post by Keld Simonsen
Hi
Please note that translitteration of Cyrillic to latin is not universal.
There are different schemes for for example German, English and Danish, and
there is also an ISO standard for it.
Thanks for your feedback, Keld!

Could the locale maintainers that wouldn't like to include this patch
explicitly state so here?

That is:
- In the case that there is a different preferred cyrillic
transliteration table for any specific locale their maintainers may want
to point me to it so I can supply a separate table/patch.
- Or they could state explicitly that for some reason they would like to
exclude their locale from the patch for a default cyrillic
transliteration altogether.

--Egor
Post by Keld Simonsen
But do go forward with fixing this bug.
Best regards
Keld
Post by Egor Kobylkin
Ping.
Absent of feedback I am wondering if anything could be missing in this
patch from the maintainers standpoint. More than two months have passed
since the original submission.
If I can be of assistance, please do not hesitate to contact me,
Egor Kobylkin
Post by Egor Kobylkin
Dear locale maintainers,
fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"
https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]
add Cyrillic transliteration table translit_cyrillic file
https://sourceware.org/bugzilla/attachment.cgi?id=8591 [7]
to localedata/locales/ and include it in all your locales going forward.
Patch included inline below.
This is a re-submission for the consideration for 2.29 on a request from
Carlos O'Donell https://sourceware.org/ml/libc-alpha/2018-07/msg00506.html
From this patch I have excluded locales that already mention cyrillic or
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.
The glibc wiki explicitly lists this use case as the test example
LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt
currently it fails on Cyrillic texts in most locales including ru_RU [1]
LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC
CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.
- It produces a string of question marks and spaces.
CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.
The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.
While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration has only ASCII codes but still can be read by a native
speaker. Among other things it is useful for processing the Cyrillic
texts and filenames by programs or on systems that are not specifically
prepared to work with Cyrillic, don't have corresponding fonts installed
or can't handle UTF-8.
The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on GOST 7.79-2000 official source
(Federal Agency on Technical Regulating and Metrology Of Russian
Federation [2]). Technically an independent but identical source [3] was
used and prepared in a spreadsheet [6].
The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that have a
translit_start/end stance and generated a patch for them.
The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.
However it would not be the standard Russian Cyrillic transliteration as
described above.
I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
exclusion.
[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=8590
[7] translit_cyrillic https://sourceware.org/bugzilla/attachment.cgi?id=8591
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=8618
Best regards,
Egor Kobylkin
---
[BZ #2872]
* locales/translit_cyrillic: add Russian GOST 7.79-2000 transliteration
table from Cyrillic to Latin.
* locales/C: add include "translit_cyrillic";"" to LC_CTYPE translit
section.
* locales/aa_DJ: likewise
* locales/af_ZA: likewise
* locales/ak_GH: likewise
* locales/am_ET: likewise
* locales/ar_EG: likewise
* locales/be_BY: likewise
* locales/bem_ZM: likewise
* locales/ber_DZ: likewise
* locales/ber_MA: likewise
* locales/bg_BG: likewise
* locales/bi_VU: likewise
* locales/bn_BD: likewise
* locales/bo_CN: likewise
* locales/ca_ES: likewise
* locales/ce_RU: likewise
* locales/cs_CZ: likewise
* locales/cv_RU: likewise
* locales/cy_GB: likewise
* locales/da_DK: likewise
* locales/de_DE: likewise
* locales/dv_MV: likewise
* locales/dz_BT: likewise
* locales/el_GR: likewise
* locales/en_GB: likewise
* locales/en_NG: likewise
* locales/en_ZM: likewise
* locales/es_CU: likewise
* locales/es_ES: likewise
* locales/et_EE: likewise
* locales/fa_IR: likewise
* locales/ff_SN: likewise
* locales/fi_FI: likewise
* locales/fr_FR: likewise
* locales/ga_IE: likewise
* locales/gd_GB: likewise
* locales/gu_IN: likewise
* locales/gv_GB: likewise
* locales/he_IL: likewise
* locales/hi_IN: likewise
* locales/hif_FJ: likewise
* locales/hr_HR: likewise
* locales/ht_HT: likewise
* locales/hu_HU: likewise
* locales/hy_AM: likewise
* locales/id_ID: likewise
* locales/is_IS: likewise
* locales/it_IT: likewise
* locales/ja_JP: likewise
* locales/kk_KZ: likewise
* locales/km_KH: likewise
* locales/kn_IN: likewise
* locales/ko_KR: likewise
* locales/ks_IN: likewise
* locales/kw_GB: likewise
* locales/lb_LU: likewise
* locales/lg_UG: likewise
* locales/lij_IT: likewise
* locales/ln_CD: likewise
* locales/lo_LA: likewise
* locales/lt_LT: likewise
* locales/lv_LV: likewise
* locales/mg_MG: likewise
* locales/mhr_RU: likewise
* locales/mk_MK: likewise
* locales/ml_IN: likewise
* locales/ms_MY: likewise
* locales/mt_MT: likewise
* locales/nb_NO: likewise
* locales/ne_NP: likewise
* locales/nhn_MX: likewise
* locales/niu_NU: likewise
* locales/niu_NZ: likewise
* locales/nl_NL: likewise
* locales/nr_ZA: likewise
* locales/oc_FR: likewise
* locales/om_KE: likewise
* locales/or_IN: likewise
* locales/os_RU: likewise
* locales/pa_IN: likewise
* locales/pa_PK: likewise
* locales/pl_PL: likewise
* locales/pt_PT: likewise
* locales/quz_PE: likewise
* locales/ro_RO: likewise
* locales/ru_RU: likewise
* locales/rw_RW: likewise
* locales/sa_IN: likewise
* locales/sd_IN: likewise
* locales/sd_PK: likewise
* locales/se_NO: likewise
* locales/sgs_LT: likewise
* locales/si_LK: likewise
* locales/sk_SK: likewise
* locales/sl_SI: likewise
* locales/sm_WS: likewise
* locales/so_SO: likewise
* locales/sq_AL: likewise
* locales/ss_ZA: likewise
* locales/st_ZA: likewise
* locales/sv_SE: likewise
* locales/sw_KE: likewise
* locales/ta_IN: likewise
* locales/te_IN: likewise
* locales/th_TH: likewise
* locales/ti_ET: likewise
* locales/tn_ZA: likewise
* locales/to_TO: likewise
* locales/tpi_PG: likewise
* locales/tr_TR: likewise
* locales/ts_ZA: likewise
* locales/unm_US: likewise
* locales/ur_IN: likewise
* locales/ur_PK: likewise
* locales/ve_ZA: likewise
* locales/vi_VN: likewise
* locales/wa_BE: likewise
* locales/wo_SN: likewise
* locales/xh_ZA: likewise
* locales/yi_US: likewise
* locales/zh_CN: likewise
* locales/zu_ZA: likewise
diff -uNr a/localedata/locales/C b/localedata/locales/C
--- a/localedata/locales/C 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/C 2018-07-17 17:55:47.000000000 +0000
@@ -2292,6 +2292,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/aa_DJ b/localedata/locales/aa_DJ
--- a/localedata/locales/aa_DJ 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/aa_DJ 2018-07-17 17:55:47.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/af_ZA b/localedata/locales/af_ZA
--- a/localedata/locales/af_ZA 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/af_ZA 2018-07-17 17:55:47.000000000 +0000
@@ -72,6 +72,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ak_GH b/localedata/locales/ak_GH
--- a/localedata/locales/ak_GH 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/ak_GH 2018-07-17 17:55:47.000000000 +0000
@@ -56,6 +56,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
--- a/localedata/locales/am_ET 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/am_ET 2018-07-17 17:55:47.000000000 +0000
@@ -1396,6 +1396,7 @@
<U137A> <U0060><U0039><U0030>
<U137B> <U0060><U0031><U0030><U0030>
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/ar_EG b/localedata/locales/ar_EG
--- a/localedata/locales/ar_EG 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/ar_EG 2018-07-17 17:55:48.000000000 +0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/be_BY b/localedata/locales/be_BY
--- a/localedata/locales/be_BY 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/be_BY 2018-07-17 17:55:48.000000000 +0000
@@ -69,6 +69,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bem_ZM b/localedata/locales/bem_ZM
--- a/localedata/locales/bem_ZM 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bem_ZM 2018-07-17 17:55:48.000000000 +0000
@@ -42,6 +42,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ber_DZ b/localedata/locales/ber_DZ
--- a/localedata/locales/ber_DZ 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ber_DZ 2018-07-17 17:55:48.000000000 +0000
@@ -166,6 +166,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ber_MA b/localedata/locales/ber_MA
--- a/localedata/locales/ber_MA 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ber_MA 2018-07-17 17:55:48.000000000 +0000
@@ -86,6 +86,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bg_BG b/localedata/locales/bg_BG
--- a/localedata/locales/bg_BG 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bg_BG 2018-07-17 17:55:48.000000000 +0000
@@ -49,6 +49,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bi_VU b/localedata/locales/bi_VU
--- a/localedata/locales/bi_VU 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bi_VU 2018-07-17 17:55:48.000000000 +0000
@@ -39,6 +39,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bn_BD b/localedata/locales/bn_BD
--- a/localedata/locales/bn_BD 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bn_BD 2018-07-17 17:55:48.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bo_CN b/localedata/locales/bo_CN
--- a/localedata/locales/bo_CN 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bo_CN 2018-07-17 17:55:48.000000000 +0000
@@ -43,6 +43,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ca_ES b/localedata/locales/ca_ES
--- a/localedata/locales/ca_ES 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ca_ES 2018-07-17 17:55:48.000000000 +0000
@@ -72,6 +72,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ce_RU b/localedata/locales/ce_RU
--- a/localedata/locales/ce_RU 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ce_RU 2018-07-17 17:55:48.000000000 +0000
@@ -39,6 +39,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/cs_CZ b/localedata/locales/cs_CZ
--- a/localedata/locales/cs_CZ 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/cs_CZ 2018-07-17 17:55:48.000000000 +0000
@@ -2311,6 +2311,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/cv_RU b/localedata/locales/cv_RU
--- a/localedata/locales/cv_RU 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/cv_RU 2018-07-17 17:55:48.000000000 +0000
@@ -109,6 +109,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/cy_GB b/localedata/locales/cy_GB
--- a/localedata/locales/cy_GB 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/cy_GB 2018-07-17 17:55:48.000000000 +0000
@@ -69,6 +69,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/da_DK b/localedata/locales/da_DK
--- a/localedata/locales/da_DK 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/da_DK 2018-07-17 17:55:48.000000000 +0000
@@ -167,6 +167,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/de_DE b/localedata/locales/de_DE
--- a/localedata/locales/de_DE 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/de_DE 2018-07-17 17:55:48.000000000 +0000
@@ -78,6 +78,7 @@
% DOUBLE HIGH-REVERSED-9 QUOTATION MARK
<U201F> <U00AB>;<U0022>
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/dv_MV b/localedata/locales/dv_MV
--- a/localedata/locales/dv_MV 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/dv_MV 2018-07-17 17:55:48.000000000 +0000
@@ -52,6 +52,7 @@
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/dz_BT b/localedata/locales/dz_BT
--- a/localedata/locales/dz_BT 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/dz_BT 2018-07-17 17:55:48.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/el_GR b/localedata/locales/el_GR
--- a/localedata/locales/el_GR 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/el_GR 2018-07-17 17:55:48.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/en_GB b/localedata/locales/en_GB
--- a/localedata/locales/en_GB 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/en_GB 2018-07-17 17:55:48.000000000 +0000
@@ -55,6 +55,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/en_NG b/localedata/locales/en_NG
--- a/localedata/locales/en_NG 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/en_NG 2018-07-17 17:55:48.000000000 +0000
@@ -50,6 +50,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/en_ZM b/localedata/locales/en_ZM
--- a/localedata/locales/en_ZM 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/en_ZM 2018-07-17 17:55:48.000000000 +0000
@@ -42,6 +42,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/es_CU b/localedata/locales/es_CU
--- a/localedata/locales/es_CU 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/es_CU 2018-07-17 17:55:48.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/es_ES b/localedata/locales/es_ES
--- a/localedata/locales/es_ES 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/es_ES 2018-07-17 17:55:49.000000000 +0000
@@ -73,6 +73,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/et_EE b/localedata/locales/et_EE
--- a/localedata/locales/et_EE 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/et_EE 2018-07-17 17:55:49.000000000 +0000
@@ -109,6 +109,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/fa_IR b/localedata/locales/fa_IR
--- a/localedata/locales/fa_IR 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/fa_IR 2018-07-17 17:55:49.000000000 +0000
@@ -79,6 +79,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ff_SN b/localedata/locales/ff_SN
--- a/localedata/locales/ff_SN 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/ff_SN 2018-07-17 17:55:49.000000000 +0000
@@ -42,6 +42,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/fi_FI b/localedata/locales/fi_FI
--- a/localedata/locales/fi_FI 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/fi_FI 2018-07-17 17:55:49.000000000 +0000
@@ -137,6 +137,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/fr_FR b/localedata/locales/fr_FR
--- a/localedata/locales/fr_FR 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/fr_FR 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
% In France, accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ga_IE b/localedata/locales/ga_IE
--- a/localedata/locales/ga_IE 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/ga_IE 2018-07-17 17:55:49.000000000 +0000
@@ -54,6 +54,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/gd_GB b/localedata/locales/gd_GB
--- a/localedata/locales/gd_GB 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gd_GB 2018-07-17 17:55:49.000000000 +0000
@@ -47,6 +47,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/gu_IN b/localedata/locales/gu_IN
--- a/localedata/locales/gu_IN 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gu_IN 2018-07-17 17:55:49.000000000 +0000
@@ -62,6 +62,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/gv_GB b/localedata/locales/gv_GB
--- a/localedata/locales/gv_GB 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gv_GB 2018-07-17 17:55:49.000000000 +0000
@@ -57,6 +57,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/he_IL b/localedata/locales/he_IL
--- a/localedata/locales/he_IL 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/he_IL 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hi_IN b/localedata/locales/hi_IN
--- a/localedata/locales/hi_IN 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hi_IN 2018-07-17 17:55:49.000000000 +0000
@@ -61,6 +61,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hif_FJ b/localedata/locales/hif_FJ
--- a/localedata/locales/hif_FJ 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hif_FJ 2018-07-17 17:55:49.000000000 +0000
@@ -37,6 +37,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hr_HR b/localedata/locales/hr_HR
--- a/localedata/locales/hr_HR 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hr_HR 2018-07-17 17:55:49.000000000 +0000
@@ -153,6 +153,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ht_HT b/localedata/locales/ht_HT
--- a/localedata/locales/ht_HT 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/ht_HT 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hu_HU b/localedata/locales/hu_HU
--- a/localedata/locales/hu_HU 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hu_HU 2018-07-17 17:55:49.000000000 +0000
@@ -478,6 +478,7 @@
<U00FC> "<U0075><U0308>";"<U0075><U00A8>";"<U0075><U003A>"
<U0171> "<U0075><U030B>";"<U0075><U02DD>";"<U0075><U0022>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hy_AM b/localedata/locales/hy_AM
--- a/localedata/locales/hy_AM 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/hy_AM 2018-07-17 17:55:49.000000000 +0000
@@ -77,6 +77,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/id_ID b/localedata/locales/id_ID
--- a/localedata/locales/id_ID 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/id_ID 2018-07-17 17:55:49.000000000 +0000
@@ -55,6 +55,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/is_IS b/localedata/locales/is_IS
--- a/localedata/locales/is_IS 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/is_IS 2018-07-17 17:55:49.000000000 +0000
@@ -2161,6 +2161,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/it_IT b/localedata/locales/it_IT
--- a/localedata/locales/it_IT 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/it_IT 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ja_JP b/localedata/locales/ja_JP
--- a/localedata/locales/ja_JP 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ja_JP 2018-07-17 17:55:49.000000000 +0000
@@ -1682,6 +1682,7 @@
include "translit_combining";""
include "translit_cjk_variants";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/kk_KZ b/localedata/locales/kk_KZ
--- a/localedata/locales/kk_KZ 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kk_KZ 2018-07-17 17:55:50.000000000 +0000
@@ -158,6 +158,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/km_KH b/localedata/locales/km_KH
--- a/localedata/locales/km_KH 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/km_KH 2018-07-17 17:55:50.000000000 +0000
@@ -873,6 +873,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/kn_IN b/localedata/locales/kn_IN
--- a/localedata/locales/kn_IN 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kn_IN 2018-07-17 17:55:50.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ko_KR b/localedata/locales/ko_KR
--- a/localedata/locales/ko_KR 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ko_KR 2018-07-17 17:55:50.000000000 +0000
@@ -6099,6 +6099,7 @@
include "translit_combining";""
include "translit_hangul";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ks_IN b/localedata/locales/ks_IN
--- a/localedata/locales/ks_IN 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ks_IN 2018-07-17 17:55:50.000000000 +0000
@@ -46,6 +46,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/kw_GB b/localedata/locales/kw_GB
--- a/localedata/locales/kw_GB 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kw_GB 2018-07-17 17:55:50.000000000 +0000
@@ -58,6 +58,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lb_LU b/localedata/locales/lb_LU
--- a/localedata/locales/lb_LU 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lb_LU 2018-07-17 17:55:50.000000000 +0000
@@ -78,6 +78,7 @@
% LATIN SMALL LETTER E WITH CIRCUMFLEX
<U00EA> "<U0065><U005E>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lg_UG b/localedata/locales/lg_UG
--- a/localedata/locales/lg_UG 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lg_UG 2018-07-17 17:55:50.000000000 +0000
@@ -57,6 +57,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lij_IT b/localedata/locales/lij_IT
--- a/localedata/locales/lij_IT 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lij_IT 2018-07-17 17:55:50.000000000 +0000
@@ -47,6 +47,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ln_CD b/localedata/locales/ln_CD
--- a/localedata/locales/ln_CD 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ln_CD 2018-07-17 17:55:50.000000000 +0000
@@ -39,6 +39,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lo_LA b/localedata/locales/lo_LA
--- a/localedata/locales/lo_LA 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lo_LA 2018-07-17 17:55:50.000000000 +0000
@@ -51,6 +51,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lt_LT b/localedata/locales/lt_LT
--- a/localedata/locales/lt_LT 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lt_LT 2018-07-17 17:55:50.000000000 +0000
@@ -77,6 +77,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lv_LV b/localedata/locales/lv_LV
--- a/localedata/locales/lv_LV 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lv_LV 2018-07-17 17:55:50.000000000 +0000
@@ -2122,6 +2122,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mg_MG b/localedata/locales/mg_MG
--- a/localedata/locales/mg_MG 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mg_MG 2018-07-17 17:55:50.000000000 +0000
@@ -55,6 +55,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mhr_RU b/localedata/locales/mhr_RU
--- a/localedata/locales/mhr_RU 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mhr_RU 2018-07-17 17:55:50.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mk_MK b/localedata/locales/mk_MK
--- a/localedata/locales/mk_MK 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mk_MK 2018-07-17 17:55:50.000000000 +0000
@@ -49,6 +49,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ml_IN b/localedata/locales/ml_IN
--- a/localedata/locales/ml_IN 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ml_IN 2018-07-17 17:55:50.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
%
diff -uNr a/localedata/locales/ms_MY b/localedata/locales/ms_MY
--- a/localedata/locales/ms_MY 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ms_MY 2018-07-17 17:55:50.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mt_MT b/localedata/locales/mt_MT
--- a/localedata/locales/mt_MT 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mt_MT 2018-07-17 17:55:50.000000000 +0000
@@ -47,6 +47,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
@@ -53,6 +53,7 @@
% accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nb_NO b/localedata/locales/nb_NO
--- a/localedata/locales/nb_NO 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nb_NO 2018-07-17 17:55:50.000000000 +0000
@@ -154,6 +154,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ne_NP b/localedata/locales/ne_NP
--- a/localedata/locales/ne_NP 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ne_NP 2018-07-17 17:55:50.000000000 +0000
@@ -43,6 +43,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nhn_MX b/localedata/locales/nhn_MX
--- a/localedata/locales/nhn_MX 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nhn_MX 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/niu_NU b/localedata/locales/niu_NU
--- a/localedata/locales/niu_NU 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/niu_NU 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/niu_NZ b/localedata/locales/niu_NZ
--- a/localedata/locales/niu_NZ 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/niu_NZ 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nl_NL b/localedata/locales/nl_NL
--- a/localedata/locales/nl_NL 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nl_NL 2018-07-17 17:55:51.000000000 +0000
@@ -57,6 +57,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nr_ZA b/localedata/locales/nr_ZA
--- a/localedata/locales/nr_ZA 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/nr_ZA 2018-07-17 17:55:51.000000000 +0000
@@ -66,6 +66,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/oc_FR b/localedata/locales/oc_FR
--- a/localedata/locales/oc_FR 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/oc_FR 2018-07-17 17:55:51.000000000 +0000
@@ -62,6 +62,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/om_KE b/localedata/locales/om_KE
--- a/localedata/locales/om_KE 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/om_KE 2018-07-17 17:55:51.000000000 +0000
@@ -140,6 +140,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/or_IN b/localedata/locales/or_IN
--- a/localedata/locales/or_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/or_IN 2018-07-17 17:55:51.000000000 +0000
@@ -62,6 +62,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/os_RU b/localedata/locales/os_RU
--- a/localedata/locales/os_RU 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/os_RU 2018-07-17 17:55:51.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pa_IN b/localedata/locales/pa_IN
--- a/localedata/locales/pa_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pa_IN 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pa_PK b/localedata/locales/pa_PK
--- a/localedata/locales/pa_PK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pa_PK 2018-07-17 17:55:51.000000000 +0000
@@ -58,6 +58,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pl_PL b/localedata/locales/pl_PL
--- a/localedata/locales/pl_PL 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pl_PL 2018-07-17 17:55:51.000000000 +0000
@@ -142,6 +142,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pt_PT b/localedata/locales/pt_PT
--- a/localedata/locales/pt_PT 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pt_PT 2018-07-17 17:55:51.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/quz_PE b/localedata/locales/quz_PE
--- a/localedata/locales/quz_PE 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/quz_PE 2018-07-17 17:55:51.000000000 +0000
@@ -57,6 +57,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ro_RO b/localedata/locales/ro_RO
--- a/localedata/locales/ro_RO 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/ro_RO 2018-07-17 17:55:51.000000000 +0000
@@ -144,6 +144,7 @@
<U0162> "<U021A>";"<U0054>"
<U0163> "<U021B>";"<U0074>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ru_RU b/localedata/locales/ru_RU
--- a/localedata/locales/ru_RU 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/ru_RU 2018-07-17 17:55:51.000000000 +0000
@@ -74,6 +74,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/rw_RW b/localedata/locales/rw_RW
--- a/localedata/locales/rw_RW 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/rw_RW 2018-07-17 17:55:51.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sa_IN b/localedata/locales/sa_IN
--- a/localedata/locales/sa_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sa_IN 2018-07-17 17:55:51.000000000 +0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sd_IN b/localedata/locales/sd_IN
--- a/localedata/locales/sd_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sd_IN 2018-07-17 17:55:51.000000000 +0000
@@ -46,6 +46,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
+0000
+0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sd_PK b/localedata/locales/sd_PK
--- a/localedata/locales/sd_PK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sd_PK 2018-07-17 17:55:51.000000000 +0000
@@ -39,6 +39,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/se_NO b/localedata/locales/se_NO
--- a/localedata/locales/se_NO 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/se_NO 2018-07-17 17:55:51.000000000 +0000
@@ -205,6 +205,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sgs_LT b/localedata/locales/sgs_LT
--- a/localedata/locales/sgs_LT 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sgs_LT 2018-07-17 17:55:52.000000000 +0000
@@ -59,6 +59,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/si_LK b/localedata/locales/si_LK
--- a/localedata/locales/si_LK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/si_LK 2018-07-17 17:55:52.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sk_SK b/localedata/locales/sk_SK
--- a/localedata/locales/sk_SK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sk_SK 2018-07-17 17:55:52.000000000 +0000
@@ -68,6 +68,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sl_SI b/localedata/locales/sl_SI
--- a/localedata/locales/sl_SI 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sl_SI 2018-07-17 17:55:52.000000000 +0000
@@ -91,6 +91,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sm_WS b/localedata/locales/sm_WS
--- a/localedata/locales/sm_WS 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sm_WS 2018-07-17 17:55:52.000000000 +0000
@@ -37,6 +37,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/so_SO b/localedata/locales/so_SO
--- a/localedata/locales/so_SO 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/so_SO 2018-07-17 17:55:52.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sq_AL b/localedata/locales/sq_AL
--- a/localedata/locales/sq_AL 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sq_AL 2018-07-17 17:55:52.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ss_ZA b/localedata/locales/ss_ZA
--- a/localedata/locales/ss_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ss_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -68,6 +68,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/st_ZA b/localedata/locales/st_ZA
--- a/localedata/locales/st_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/st_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -64,6 +64,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sv_SE b/localedata/locales/sv_SE
--- a/localedata/locales/sv_SE 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sv_SE 2018-07-17 17:55:52.000000000 +0000
@@ -139,6 +139,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sw_KE b/localedata/locales/sw_KE
--- a/localedata/locales/sw_KE 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sw_KE 2018-07-17 17:55:52.000000000 +0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ta_IN b/localedata/locales/ta_IN
--- a/localedata/locales/ta_IN 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ta_IN 2018-07-17 17:55:52.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/te_IN b/localedata/locales/te_IN
--- a/localedata/locales/te_IN 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/te_IN 2018-07-17 17:55:52.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/th_TH b/localedata/locales/th_TH
--- a/localedata/locales/th_TH 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/th_TH 2018-07-17 17:55:52.000000000 +0000
@@ -58,6 +58,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ti_ET b/localedata/locales/ti_ET
--- a/localedata/locales/ti_ET 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ti_ET 2018-07-17 17:55:52.000000000 +0000
@@ -866,6 +866,7 @@
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/tn_ZA b/localedata/locales/tn_ZA
--- a/localedata/locales/tn_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/tn_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -69,6 +69,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/to_TO b/localedata/locales/to_TO
--- a/localedata/locales/to_TO 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/to_TO 2018-07-17 17:55:52.000000000 +0000
@@ -36,6 +36,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/tpi_PG b/localedata/locales/tpi_PG
--- a/localedata/locales/tpi_PG 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/tpi_PG 2018-07-17 17:55:52.000000000 +0000
@@ -37,6 +37,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/tr_TR b/localedata/locales/tr_TR
--- a/localedata/locales/tr_TR 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/tr_TR 2018-07-17 17:55:52.000000000 +0000
@@ -2430,6 +2430,7 @@
% TURKISH LIRA SIGN
<U20BA> "<U0054><U004C>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-07-17 17:55:52.000000000
+0000
@@ -0,0 +1,151 @@
+escape_char /
+comment_char %
+
+% Transliterations that converts cyrillic letters to ascii symbols
inspired by GOST 7.79-2000
+% https://sourceware.org/bugzilla/show_bug.cgi?id=2872
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=8590
+% Up to three characters are required to do a reversible transliteration.
+
+LC_CTYPE
+
+translit_start
+
+
+% CYRILLIC CAPITAL LETTER IO
+<U0401> "<U0059><U004F>";<U0059>
+% CYRILLIC CAPITAL LETTER A
+<U0410> <U0041>
+% CYRILLIC CAPITAL LETTER BE
+<U0411> <U0042>
+% CYRILLIC CAPITAL LETTER VE
+<U0412> <U0056>
+% CYRILLIC CAPITAL LETTER GHE
+<U0413> <U0047>
+% CYRILLIC CAPITAL LETTER DE
+<U0414> <U0044>
+% CYRILLIC CAPITAL LETTER IE
+<U0415> <U0045>
+% CYRILLIC CAPITAL LETTER ZHE
+<U0416> "<U005A><U0048>";<U005A>
+% CYRILLIC CAPITAL LETTER ZE
+<U0417> <U005A>
+% CYRILLIC CAPITAL LETTER I
+<U0418> <U0049>
+% CYRILLIC CAPITAL LETTER SHORT I
+<U0419> <U004A>
+% CYRILLIC CAPITAL LETTER KA
+<U041A> <U004B>
+% CYRILLIC CAPITAL LETTER EL
+<U041B> <U004C>
+% CYRILLIC CAPITAL LETTER EM
+<U041C> <U004D>
+% CYRILLIC CAPITAL LETTER EN
+<U041D> <U004E>
+% CYRILLIC CAPITAL LETTER O
+<U041E> <U004F>
+% CYRILLIC CAPITAL LETTER PE
+<U041F> <U0050>
+% CYRILLIC CAPITAL LETTER ER
+<U0420> <U0052>
+% CYRILLIC CAPITAL LETTER ES
+<U0421> <U0053>
+% CYRILLIC CAPITAL LETTER TE
+<U0422> <U0054>
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC CAPITAL LETTER EF
+<U0424> <U0046>
+% CYRILLIC CAPITAL LETTER HA
+<U0425> <U0058>
+% CYRILLIC CAPITAL LETTER TSE
+<U0426> "<U0043><U005A>";<U0043>
+% CYRILLIC CAPITAL LETTER CHE
+<U0427> "<U0043><U0048>";<U0043>
+% CYRILLIC CAPITAL LETTER SHA
+<U0428> "<U0053><U0048>";<U0053>
+% CYRILLIC CAPITAL LETTER SHCHA
+<U0429> "<U0053><U0048><U0048>";<U0053>
+% CYRILLIC CAPITAL LETTER HARD SIGN
+<U042A> "<U0060><U0060>";<U0060>
+% CYRILLIC CAPITAL LETTER YERU
+<U042B> "<U0059><U0027>";<U0059>
+% CYRILLIC CAPITAL LETTER SOFT SIGN
+<U042C> <U0060>
+% CYRILLIC CAPITAL LETTER E
+<U042D> "<U0045><U0060>";<U0045>
+% CYRILLIC CAPITAL LETTER YU
+<U042E> "<U0059><U0055>";<U0059>
+% CYRILLIC CAPITAL LETTER YA
+<U042F> "<U0059><U0041>";<U0059>
+% CYRILLIC SMALL LETTER A
+<U0430> <U0061>
+% CYRILLIC SMALL LETTER BE
+<U0431> <U0062>
+% CYRILLIC SMALL LETTER VE
+<U0432> <U0076>
+% CYRILLIC SMALL LETTER GHE
+<U0433> <U0067>
+% CYRILLIC SMALL LETTER DE
+<U0434> <U0064>
+% CYRILLIC SMALL LETTER IE
+<U0435> <U0065>
+% CYRILLIC SMALL LETTER ZHE
+<U0436> "<U007A><U0068>";<U007A>
+% CYRILLIC SMALL LETTER ZE
+<U0437> <U007A>
+% CYRILLIC SMALL LETTER I
+<U0438> <U0069>
+% CYRILLIC SMALL LETTER SHORT I
+<U0439> <U006A>
+% CYRILLIC SMALL LETTER KA
+<U043A> <U006B>
+% CYRILLIC SMALL LETTER EL
+<U043B> <U006C>
+% CYRILLIC SMALL LETTER EM
+<U043C> <U006D>
+% CYRILLIC SMALL LETTER EN
+<U043D> <U006E>
+% CYRILLIC SMALL LETTER O
+<U043E> <U006F>
+% CYRILLIC SMALL LETTER PE
+<U043F> <U0070>
+% CYRILLIC SMALL LETTER ER
+<U0440> <U0072>
+% CYRILLIC SMALL LETTER ES
+<U0441> <U0073>
+% CYRILLIC SMALL LETTER TE
+<U0442> <U0074>
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC SMALL LETTER EF
+<U0444> <U0066>
+% CYRILLIC SMALL LETTER HA
+<U0445> <U0078>
+% CYRILLIC SMALL LETTER TSE
+<U0446> "<U0063><U007A>";<U0063>
+% CYRILLIC SMALL LETTER CHE
+<U0447> "<U0063><U0068>";<U0063>
+% CYRILLIC SMALL LETTER SHA
+<U0448> "<U0073><U0068>";<U0073>
+% CYRILLIC SMALL LETTER SHCHA
+<U0449> "<U0073><U0068><U0068>";<U0073>
+% CYRILLIC SMALL LETTER HARD SIGN
+<U044A> "<U0060><U0060>";<U0060>
+% CYRILLIC SMALL LETTER YERU
+<U044B> "<U0079><U0027>";<U0079>
+% CYRILLIC SMALL LETTER SOFT SIGN
+<U044C> <U0060>
+% CYRILLIC SMALL LETTER E
+<U044D> "<U0065><U0060>";<U0065>
+% CYRILLIC SMALL LETTER YU
+<U044E> "<U0079><U0075>";<U0079>
+% CYRILLIC SMALL LETTER YA
+<U044F> "<U0079><U0061>";<U0079>
+% CYRILLIC SMALL LETTER IO
+<U0451> "<U0079><U006F>";<U0079>
+
+
+translit_end
+
+END LC_CTYPE
diff -uNr a/localedata/locales/ts_ZA b/localedata/locales/ts_ZA
--- a/localedata/locales/ts_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ts_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -64,6 +64,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/unm_US b/localedata/locales/unm_US
--- a/localedata/locales/unm_US 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/unm_US 2018-07-17 17:55:52.000000000 +0000
@@ -48,6 +48,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ur_IN b/localedata/locales/ur_IN
--- a/localedata/locales/ur_IN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ur_IN 2018-07-17 17:55:53.000000000 +0000
@@ -46,6 +46,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ur_PK b/localedata/locales/ur_PK
--- a/localedata/locales/ur_PK 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ur_PK 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ve_ZA b/localedata/locales/ve_ZA
--- a/localedata/locales/ve_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ve_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -67,6 +67,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/vi_VN b/localedata/locales/vi_VN
--- a/localedata/locales/vi_VN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/vi_VN 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
% dong sign -> d// -> dd
<U20AB> "<U0111>";"<U0064><U0064>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/wa_BE b/localedata/locales/wa_BE
--- a/localedata/locales/wa_BE 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/wa_BE 2018-07-17 17:55:53.000000000 +0000
@@ -69,6 +69,7 @@
<U00C5> "<U0041><U030A>";"<U0041>";"<U0041><U0055>"
<U00E5> "<U0061><U030A>";"<U0061>";"<U0061><U0075>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/wo_SN b/localedata/locales/wo_SN
--- a/localedata/locales/wo_SN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/wo_SN 2018-07-17 17:55:53.000000000 +0000
@@ -55,6 +55,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/xh_ZA b/localedata/locales/xh_ZA
--- a/localedata/locales/xh_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/xh_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -66,6 +66,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/yi_US b/localedata/locales/yi_US
--- a/localedata/locales/yi_US 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/yi_US 2018-07-17 17:55:53.000000000 +0000
@@ -73,6 +73,7 @@
<U05F0> "<U05D5><U05D5>";"<U0077><U0077>"
<U05F1> "<U05D5><U05D9>";"<U0077><U006A>"
<U05F2> "<U05D9><U05D9>";"<U006A><U006A>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/zh_CN b/localedata/locales/zh_CN
--- a/localedata/locales/zh_CN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/zh_CN 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
class "hanzi"; /
diff -uNr a/localedata/locales/zu_ZA b/localedata/locales/zu_ZA
--- a/localedata/locales/zu_ZA 2018-07-17 17:49:22.000000000 +0000
+++ b/localedata/locales/zu_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
Marko Myllynen
2018-10-05 08:43:46 UTC
Permalink
Hi Egor,

Thanks for your patience with this one.
Post by Egor Kobylkin
Post by Keld Simonsen
Please note that translitteration of Cyrillic to latin is not universal.
There are different schemes for for example German, English and Danish, and
there is also an ISO standard for it.
Thanks for your feedback, Keld!
Could the locale maintainers that wouldn't like to include this patch
explicitly state so here?
- In the case that there is a different preferred cyrillic
transliteration table for any specific locale their maintainers may want
to point me to it so I can supply a separate table/patch.
- Or they could state explicitly that for some reason they would like to
exclude their locale from the patch for a default cyrillic
transliteration altogether.
The Wikipedia article https://en.wikipedia.org/wiki/ISO_9 helps to
understand that ISO 9:1995 and GOST 7.79-2000 System A are identical so
perhaps you could mention both ISO 9 and the Wikipedia article in the
commit log. translit_cyrillic includes every transliteration defined in
ISO 9:1995 and GOST 7.79-2000, correct?

I think those locales which already have Cyrillic transliteration
defined it would be best to leave them as-is (as you've done) unless
there are some issues with them, there's probably a good reason why they
have been added in the first place.

For other locales, using ISO 9 instead of not doing transliteration at
all may not be entirely correct but I'd suppose it's better to provide
at least some sort of transliteration (even if not entirely correct)
than sequences of question marks. But as you say, locale maintainers may
know better the case for individual locales.

Wrt language-specific differences Keld mentioned, Finnish Wikipedia
article on transliteration gives an example, see the table on right at
https://fi.wikipedia.org/wiki/Siirtokirjoitus for Russian /
international / Finnish / Swedish / English / French / German / Polish /
phonetic transliteration of a Russian name. (The table also shows that
for correct transliteration ASCII letters are not enough for some
languages.)

Some of the differences and language-specific aspects are probably
impossible to take fully into account within the locale system we have
today. For example, in Finnish (the tables at
http://jkorpela.fi/iso9.html8 and
https://fi.wikipedia.org/wiki/Ven%C3%A4j%C3%A4n_translitterointi might
also be helpful):

1) transliteration of Russian is mostly as per ISO 9 but with national
differences defined in SFS 4900
2) transliteration of Russian and Ukrainian names have some slight
differences according to http://jkorpela.fi/iso9.html8
3) transliteration of a letter depends on its position within a word or
pronunciation of adjacent letters, for example U+0435 becomes U+0065 (e)
except when at the beginning of a word it becomes U+006A U+0065 (je)

Hopefully we'll hear comments from others as well. Once your patch is
merged, I'll try to come up with the needed locale-specific changes for
fi_FI, some differences referred to in 1) above are straightforward to
implement but for 2) and 3) some compromises probably need to be made,
unfortunately.

Thanks,
Post by Egor Kobylkin
Post by Keld Simonsen
Post by Egor Kobylkin
Ping.
Absent of feedback I am wondering if anything could be missing in this
patch from the maintainers standpoint. More than two months have passed
since the original submission.
If I can be of assistance, please do not hesitate to contact me,
Egor Kobylkin
Post by Egor Kobylkin
Dear locale maintainers,
fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"
https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]
add Cyrillic transliteration table translit_cyrillic file
https://sourceware.org/bugzilla/attachment.cgi?id=8591 [7]
to localedata/locales/ and include it in all your locales going forward.
Patch included inline below.
This is a re-submission for the consideration for 2.29 on a request from
Carlos O'Donell https://sourceware.org/ml/libc-alpha/2018-07/msg00506.html
From this patch I have excluded locales that already mention cyrillic or
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.
The glibc wiki explicitly lists this use case as the test example
LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt
currently it fails on Cyrillic texts in most locales including ru_RU [1]
LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC
CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.
- It produces a string of question marks and spaces.
CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.
The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.
While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration has only ASCII codes but still can be read by a native
speaker. Among other things it is useful for processing the Cyrillic
texts and filenames by programs or on systems that are not specifically
prepared to work with Cyrillic, don't have corresponding fonts installed
or can't handle UTF-8.
The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on GOST 7.79-2000 official source
(Federal Agency on Technical Regulating and Metrology Of Russian
Federation [2]). Technically an independent but identical source [3] was
used and prepared in a spreadsheet [6].
The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that have a
translit_start/end stance and generated a patch for them.
The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.
However it would not be the standard Russian Cyrillic transliteration as
described above.
I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
exclusion.
[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=8590
[7] translit_cyrillic https://sourceware.org/bugzilla/attachment.cgi?id=8591
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=8618
Best regards,
Egor Kobylkin
---
[BZ #2872]
* locales/translit_cyrillic: add Russian GOST 7.79-2000 transliteration
table from Cyrillic to Latin.
* locales/C: add include "translit_cyrillic";"" to LC_CTYPE translit
section.
* locales/aa_DJ: likewise
* locales/af_ZA: likewise
* locales/ak_GH: likewise
* locales/am_ET: likewise
* locales/ar_EG: likewise
* locales/be_BY: likewise
* locales/bem_ZM: likewise
* locales/ber_DZ: likewise
* locales/ber_MA: likewise
* locales/bg_BG: likewise
* locales/bi_VU: likewise
* locales/bn_BD: likewise
* locales/bo_CN: likewise
* locales/ca_ES: likewise
* locales/ce_RU: likewise
* locales/cs_CZ: likewise
* locales/cv_RU: likewise
* locales/cy_GB: likewise
* locales/da_DK: likewise
* locales/de_DE: likewise
* locales/dv_MV: likewise
* locales/dz_BT: likewise
* locales/el_GR: likewise
* locales/en_GB: likewise
* locales/en_NG: likewise
* locales/en_ZM: likewise
* locales/es_CU: likewise
* locales/es_ES: likewise
* locales/et_EE: likewise
* locales/fa_IR: likewise
* locales/ff_SN: likewise
* locales/fi_FI: likewise
* locales/fr_FR: likewise
* locales/ga_IE: likewise
* locales/gd_GB: likewise
* locales/gu_IN: likewise
* locales/gv_GB: likewise
* locales/he_IL: likewise
* locales/hi_IN: likewise
* locales/hif_FJ: likewise
* locales/hr_HR: likewise
* locales/ht_HT: likewise
* locales/hu_HU: likewise
* locales/hy_AM: likewise
* locales/id_ID: likewise
* locales/is_IS: likewise
* locales/it_IT: likewise
* locales/ja_JP: likewise
* locales/kk_KZ: likewise
* locales/km_KH: likewise
* locales/kn_IN: likewise
* locales/ko_KR: likewise
* locales/ks_IN: likewise
* locales/kw_GB: likewise
* locales/lb_LU: likewise
* locales/lg_UG: likewise
* locales/lij_IT: likewise
* locales/ln_CD: likewise
* locales/lo_LA: likewise
* locales/lt_LT: likewise
* locales/lv_LV: likewise
* locales/mg_MG: likewise
* locales/mhr_RU: likewise
* locales/mk_MK: likewise
* locales/ml_IN: likewise
* locales/ms_MY: likewise
* locales/mt_MT: likewise
* locales/nb_NO: likewise
* locales/ne_NP: likewise
* locales/nhn_MX: likewise
* locales/niu_NU: likewise
* locales/niu_NZ: likewise
* locales/nl_NL: likewise
* locales/nr_ZA: likewise
* locales/oc_FR: likewise
* locales/om_KE: likewise
* locales/or_IN: likewise
* locales/os_RU: likewise
* locales/pa_IN: likewise
* locales/pa_PK: likewise
* locales/pl_PL: likewise
* locales/pt_PT: likewise
* locales/quz_PE: likewise
* locales/ro_RO: likewise
* locales/ru_RU: likewise
* locales/rw_RW: likewise
* locales/sa_IN: likewise
* locales/sd_IN: likewise
* locales/sd_PK: likewise
* locales/se_NO: likewise
* locales/sgs_LT: likewise
* locales/si_LK: likewise
* locales/sk_SK: likewise
* locales/sl_SI: likewise
* locales/sm_WS: likewise
* locales/so_SO: likewise
* locales/sq_AL: likewise
* locales/ss_ZA: likewise
* locales/st_ZA: likewise
* locales/sv_SE: likewise
* locales/sw_KE: likewise
* locales/ta_IN: likewise
* locales/te_IN: likewise
* locales/th_TH: likewise
* locales/ti_ET: likewise
* locales/tn_ZA: likewise
* locales/to_TO: likewise
* locales/tpi_PG: likewise
* locales/tr_TR: likewise
* locales/ts_ZA: likewise
* locales/unm_US: likewise
* locales/ur_IN: likewise
* locales/ur_PK: likewise
* locales/ve_ZA: likewise
* locales/vi_VN: likewise
* locales/wa_BE: likewise
* locales/wo_SN: likewise
* locales/xh_ZA: likewise
* locales/yi_US: likewise
* locales/zh_CN: likewise
* locales/zu_ZA: likewise
diff -uNr a/localedata/locales/C b/localedata/locales/C
--- a/localedata/locales/C 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/C 2018-07-17 17:55:47.000000000 +0000
@@ -2292,6 +2292,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/aa_DJ b/localedata/locales/aa_DJ
--- a/localedata/locales/aa_DJ 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/aa_DJ 2018-07-17 17:55:47.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/af_ZA b/localedata/locales/af_ZA
--- a/localedata/locales/af_ZA 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/af_ZA 2018-07-17 17:55:47.000000000 +0000
@@ -72,6 +72,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ak_GH b/localedata/locales/ak_GH
--- a/localedata/locales/ak_GH 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/ak_GH 2018-07-17 17:55:47.000000000 +0000
@@ -56,6 +56,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
--- a/localedata/locales/am_ET 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/am_ET 2018-07-17 17:55:47.000000000 +0000
@@ -1396,6 +1396,7 @@
<U137A> <U0060><U0039><U0030>
<U137B> <U0060><U0031><U0030><U0030>
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/ar_EG b/localedata/locales/ar_EG
--- a/localedata/locales/ar_EG 2018-07-17 17:49:12.000000000 +0000
+++ b/localedata/locales/ar_EG 2018-07-17 17:55:48.000000000 +0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/be_BY b/localedata/locales/be_BY
--- a/localedata/locales/be_BY 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/be_BY 2018-07-17 17:55:48.000000000 +0000
@@ -69,6 +69,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bem_ZM b/localedata/locales/bem_ZM
--- a/localedata/locales/bem_ZM 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bem_ZM 2018-07-17 17:55:48.000000000 +0000
@@ -42,6 +42,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ber_DZ b/localedata/locales/ber_DZ
--- a/localedata/locales/ber_DZ 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ber_DZ 2018-07-17 17:55:48.000000000 +0000
@@ -166,6 +166,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ber_MA b/localedata/locales/ber_MA
--- a/localedata/locales/ber_MA 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ber_MA 2018-07-17 17:55:48.000000000 +0000
@@ -86,6 +86,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bg_BG b/localedata/locales/bg_BG
--- a/localedata/locales/bg_BG 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bg_BG 2018-07-17 17:55:48.000000000 +0000
@@ -49,6 +49,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bi_VU b/localedata/locales/bi_VU
--- a/localedata/locales/bi_VU 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bi_VU 2018-07-17 17:55:48.000000000 +0000
@@ -39,6 +39,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bn_BD b/localedata/locales/bn_BD
--- a/localedata/locales/bn_BD 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bn_BD 2018-07-17 17:55:48.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/bo_CN b/localedata/locales/bo_CN
--- a/localedata/locales/bo_CN 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/bo_CN 2018-07-17 17:55:48.000000000 +0000
@@ -43,6 +43,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ca_ES b/localedata/locales/ca_ES
--- a/localedata/locales/ca_ES 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ca_ES 2018-07-17 17:55:48.000000000 +0000
@@ -72,6 +72,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ce_RU b/localedata/locales/ce_RU
--- a/localedata/locales/ce_RU 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/ce_RU 2018-07-17 17:55:48.000000000 +0000
@@ -39,6 +39,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/cs_CZ b/localedata/locales/cs_CZ
--- a/localedata/locales/cs_CZ 2018-07-17 17:49:13.000000000 +0000
+++ b/localedata/locales/cs_CZ 2018-07-17 17:55:48.000000000 +0000
@@ -2311,6 +2311,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/cv_RU b/localedata/locales/cv_RU
--- a/localedata/locales/cv_RU 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/cv_RU 2018-07-17 17:55:48.000000000 +0000
@@ -109,6 +109,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/cy_GB b/localedata/locales/cy_GB
--- a/localedata/locales/cy_GB 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/cy_GB 2018-07-17 17:55:48.000000000 +0000
@@ -69,6 +69,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/da_DK b/localedata/locales/da_DK
--- a/localedata/locales/da_DK 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/da_DK 2018-07-17 17:55:48.000000000 +0000
@@ -167,6 +167,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/de_DE b/localedata/locales/de_DE
--- a/localedata/locales/de_DE 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/de_DE 2018-07-17 17:55:48.000000000 +0000
@@ -78,6 +78,7 @@
% DOUBLE HIGH-REVERSED-9 QUOTATION MARK
<U201F> <U00AB>;<U0022>
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/dv_MV b/localedata/locales/dv_MV
--- a/localedata/locales/dv_MV 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/dv_MV 2018-07-17 17:55:48.000000000 +0000
@@ -52,6 +52,7 @@
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/dz_BT b/localedata/locales/dz_BT
--- a/localedata/locales/dz_BT 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/dz_BT 2018-07-17 17:55:48.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/el_GR b/localedata/locales/el_GR
--- a/localedata/locales/el_GR 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/el_GR 2018-07-17 17:55:48.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/en_GB b/localedata/locales/en_GB
--- a/localedata/locales/en_GB 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/en_GB 2018-07-17 17:55:48.000000000 +0000
@@ -55,6 +55,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/en_NG b/localedata/locales/en_NG
--- a/localedata/locales/en_NG 2018-07-17 17:49:14.000000000 +0000
+++ b/localedata/locales/en_NG 2018-07-17 17:55:48.000000000 +0000
@@ -50,6 +50,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/en_ZM b/localedata/locales/en_ZM
--- a/localedata/locales/en_ZM 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/en_ZM 2018-07-17 17:55:48.000000000 +0000
@@ -42,6 +42,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/es_CU b/localedata/locales/es_CU
--- a/localedata/locales/es_CU 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/es_CU 2018-07-17 17:55:48.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/es_ES b/localedata/locales/es_ES
--- a/localedata/locales/es_ES 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/es_ES 2018-07-17 17:55:49.000000000 +0000
@@ -73,6 +73,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/et_EE b/localedata/locales/et_EE
--- a/localedata/locales/et_EE 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/et_EE 2018-07-17 17:55:49.000000000 +0000
@@ -109,6 +109,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/fa_IR b/localedata/locales/fa_IR
--- a/localedata/locales/fa_IR 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/fa_IR 2018-07-17 17:55:49.000000000 +0000
@@ -79,6 +79,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ff_SN b/localedata/locales/ff_SN
--- a/localedata/locales/ff_SN 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/ff_SN 2018-07-17 17:55:49.000000000 +0000
@@ -42,6 +42,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/fi_FI b/localedata/locales/fi_FI
--- a/localedata/locales/fi_FI 2018-07-17 17:49:15.000000000 +0000
+++ b/localedata/locales/fi_FI 2018-07-17 17:55:49.000000000 +0000
@@ -137,6 +137,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/fr_FR b/localedata/locales/fr_FR
--- a/localedata/locales/fr_FR 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/fr_FR 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
% In France, accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ga_IE b/localedata/locales/ga_IE
--- a/localedata/locales/ga_IE 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/ga_IE 2018-07-17 17:55:49.000000000 +0000
@@ -54,6 +54,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/gd_GB b/localedata/locales/gd_GB
--- a/localedata/locales/gd_GB 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gd_GB 2018-07-17 17:55:49.000000000 +0000
@@ -47,6 +47,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/gu_IN b/localedata/locales/gu_IN
--- a/localedata/locales/gu_IN 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gu_IN 2018-07-17 17:55:49.000000000 +0000
@@ -62,6 +62,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/gv_GB b/localedata/locales/gv_GB
--- a/localedata/locales/gv_GB 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/gv_GB 2018-07-17 17:55:49.000000000 +0000
@@ -57,6 +57,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/he_IL b/localedata/locales/he_IL
--- a/localedata/locales/he_IL 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/he_IL 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hi_IN b/localedata/locales/hi_IN
--- a/localedata/locales/hi_IN 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hi_IN 2018-07-17 17:55:49.000000000 +0000
@@ -61,6 +61,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hif_FJ b/localedata/locales/hif_FJ
--- a/localedata/locales/hif_FJ 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hif_FJ 2018-07-17 17:55:49.000000000 +0000
@@ -37,6 +37,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hr_HR b/localedata/locales/hr_HR
--- a/localedata/locales/hr_HR 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hr_HR 2018-07-17 17:55:49.000000000 +0000
@@ -153,6 +153,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ht_HT b/localedata/locales/ht_HT
--- a/localedata/locales/ht_HT 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/ht_HT 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hu_HU b/localedata/locales/hu_HU
--- a/localedata/locales/hu_HU 2018-07-17 17:49:16.000000000 +0000
+++ b/localedata/locales/hu_HU 2018-07-17 17:55:49.000000000 +0000
@@ -478,6 +478,7 @@
<U00FC> "<U0075><U0308>";"<U0075><U00A8>";"<U0075><U003A>"
<U0171> "<U0075><U030B>";"<U0075><U02DD>";"<U0075><U0022>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/hy_AM b/localedata/locales/hy_AM
--- a/localedata/locales/hy_AM 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/hy_AM 2018-07-17 17:55:49.000000000 +0000
@@ -77,6 +77,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/id_ID b/localedata/locales/id_ID
--- a/localedata/locales/id_ID 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/id_ID 2018-07-17 17:55:49.000000000 +0000
@@ -55,6 +55,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/is_IS b/localedata/locales/is_IS
--- a/localedata/locales/is_IS 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/is_IS 2018-07-17 17:55:49.000000000 +0000
@@ -2161,6 +2161,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/it_IT b/localedata/locales/it_IT
--- a/localedata/locales/it_IT 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/it_IT 2018-07-17 17:55:49.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ja_JP b/localedata/locales/ja_JP
--- a/localedata/locales/ja_JP 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ja_JP 2018-07-17 17:55:49.000000000 +0000
@@ -1682,6 +1682,7 @@
include "translit_combining";""
include "translit_cjk_variants";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/kk_KZ b/localedata/locales/kk_KZ
--- a/localedata/locales/kk_KZ 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kk_KZ 2018-07-17 17:55:50.000000000 +0000
@@ -158,6 +158,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/km_KH b/localedata/locales/km_KH
--- a/localedata/locales/km_KH 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/km_KH 2018-07-17 17:55:50.000000000 +0000
@@ -873,6 +873,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/kn_IN b/localedata/locales/kn_IN
--- a/localedata/locales/kn_IN 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kn_IN 2018-07-17 17:55:50.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ko_KR b/localedata/locales/ko_KR
--- a/localedata/locales/ko_KR 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ko_KR 2018-07-17 17:55:50.000000000 +0000
@@ -6099,6 +6099,7 @@
include "translit_combining";""
include "translit_hangul";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ks_IN b/localedata/locales/ks_IN
--- a/localedata/locales/ks_IN 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/ks_IN 2018-07-17 17:55:50.000000000 +0000
@@ -46,6 +46,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/kw_GB b/localedata/locales/kw_GB
--- a/localedata/locales/kw_GB 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/kw_GB 2018-07-17 17:55:50.000000000 +0000
@@ -58,6 +58,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lb_LU b/localedata/locales/lb_LU
--- a/localedata/locales/lb_LU 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lb_LU 2018-07-17 17:55:50.000000000 +0000
@@ -78,6 +78,7 @@
% LATIN SMALL LETTER E WITH CIRCUMFLEX
<U00EA> "<U0065><U005E>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lg_UG b/localedata/locales/lg_UG
--- a/localedata/locales/lg_UG 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lg_UG 2018-07-17 17:55:50.000000000 +0000
@@ -57,6 +57,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lij_IT b/localedata/locales/lij_IT
--- a/localedata/locales/lij_IT 2018-07-17 17:49:17.000000000 +0000
+++ b/localedata/locales/lij_IT 2018-07-17 17:55:50.000000000 +0000
@@ -47,6 +47,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ln_CD b/localedata/locales/ln_CD
--- a/localedata/locales/ln_CD 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ln_CD 2018-07-17 17:55:50.000000000 +0000
@@ -39,6 +39,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lo_LA b/localedata/locales/lo_LA
--- a/localedata/locales/lo_LA 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lo_LA 2018-07-17 17:55:50.000000000 +0000
@@ -51,6 +51,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lt_LT b/localedata/locales/lt_LT
--- a/localedata/locales/lt_LT 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lt_LT 2018-07-17 17:55:50.000000000 +0000
@@ -77,6 +77,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/lv_LV b/localedata/locales/lv_LV
--- a/localedata/locales/lv_LV 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/lv_LV 2018-07-17 17:55:50.000000000 +0000
@@ -2122,6 +2122,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mg_MG b/localedata/locales/mg_MG
--- a/localedata/locales/mg_MG 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mg_MG 2018-07-17 17:55:50.000000000 +0000
@@ -55,6 +55,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mhr_RU b/localedata/locales/mhr_RU
--- a/localedata/locales/mhr_RU 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mhr_RU 2018-07-17 17:55:50.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mk_MK b/localedata/locales/mk_MK
--- a/localedata/locales/mk_MK 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mk_MK 2018-07-17 17:55:50.000000000 +0000
@@ -49,6 +49,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ml_IN b/localedata/locales/ml_IN
--- a/localedata/locales/ml_IN 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ml_IN 2018-07-17 17:55:50.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
%
diff -uNr a/localedata/locales/ms_MY b/localedata/locales/ms_MY
--- a/localedata/locales/ms_MY 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ms_MY 2018-07-17 17:55:50.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/mt_MT b/localedata/locales/mt_MT
--- a/localedata/locales/mt_MT 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/mt_MT 2018-07-17 17:55:50.000000000 +0000
@@ -47,6 +47,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
@@ -53,6 +53,7 @@
% accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nb_NO b/localedata/locales/nb_NO
--- a/localedata/locales/nb_NO 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nb_NO 2018-07-17 17:55:50.000000000 +0000
@@ -154,6 +154,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ne_NP b/localedata/locales/ne_NP
--- a/localedata/locales/ne_NP 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/ne_NP 2018-07-17 17:55:50.000000000 +0000
@@ -43,6 +43,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nhn_MX b/localedata/locales/nhn_MX
--- a/localedata/locales/nhn_MX 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nhn_MX 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/niu_NU b/localedata/locales/niu_NU
--- a/localedata/locales/niu_NU 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/niu_NU 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/niu_NZ b/localedata/locales/niu_NZ
--- a/localedata/locales/niu_NZ 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/niu_NZ 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nl_NL b/localedata/locales/nl_NL
--- a/localedata/locales/nl_NL 2018-07-17 17:49:18.000000000 +0000
+++ b/localedata/locales/nl_NL 2018-07-17 17:55:51.000000000 +0000
@@ -57,6 +57,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/nr_ZA b/localedata/locales/nr_ZA
--- a/localedata/locales/nr_ZA 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/nr_ZA 2018-07-17 17:55:51.000000000 +0000
@@ -66,6 +66,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/oc_FR b/localedata/locales/oc_FR
--- a/localedata/locales/oc_FR 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/oc_FR 2018-07-17 17:55:51.000000000 +0000
@@ -62,6 +62,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/om_KE b/localedata/locales/om_KE
--- a/localedata/locales/om_KE 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/om_KE 2018-07-17 17:55:51.000000000 +0000
@@ -140,6 +140,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/or_IN b/localedata/locales/or_IN
--- a/localedata/locales/or_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/or_IN 2018-07-17 17:55:51.000000000 +0000
@@ -62,6 +62,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/os_RU b/localedata/locales/os_RU
--- a/localedata/locales/os_RU 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/os_RU 2018-07-17 17:55:51.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pa_IN b/localedata/locales/pa_IN
--- a/localedata/locales/pa_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pa_IN 2018-07-17 17:55:51.000000000 +0000
@@ -60,6 +60,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pa_PK b/localedata/locales/pa_PK
--- a/localedata/locales/pa_PK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pa_PK 2018-07-17 17:55:51.000000000 +0000
@@ -58,6 +58,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pl_PL b/localedata/locales/pl_PL
--- a/localedata/locales/pl_PL 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pl_PL 2018-07-17 17:55:51.000000000 +0000
@@ -142,6 +142,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/pt_PT b/localedata/locales/pt_PT
--- a/localedata/locales/pt_PT 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/pt_PT 2018-07-17 17:55:51.000000000 +0000
@@ -59,6 +59,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/quz_PE b/localedata/locales/quz_PE
--- a/localedata/locales/quz_PE 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/quz_PE 2018-07-17 17:55:51.000000000 +0000
@@ -57,6 +57,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ro_RO b/localedata/locales/ro_RO
--- a/localedata/locales/ro_RO 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/ro_RO 2018-07-17 17:55:51.000000000 +0000
@@ -144,6 +144,7 @@
<U0162> "<U021A>";"<U0054>"
<U0163> "<U021B>";"<U0074>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ru_RU b/localedata/locales/ru_RU
--- a/localedata/locales/ru_RU 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/ru_RU 2018-07-17 17:55:51.000000000 +0000
@@ -74,6 +74,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/rw_RW b/localedata/locales/rw_RW
--- a/localedata/locales/rw_RW 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/rw_RW 2018-07-17 17:55:51.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sa_IN b/localedata/locales/sa_IN
--- a/localedata/locales/sa_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sa_IN 2018-07-17 17:55:51.000000000 +0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sd_IN b/localedata/locales/sd_IN
--- a/localedata/locales/sd_IN 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sd_IN 2018-07-17 17:55:51.000000000 +0000
@@ -46,6 +46,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
+0000
+0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sd_PK b/localedata/locales/sd_PK
--- a/localedata/locales/sd_PK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sd_PK 2018-07-17 17:55:51.000000000 +0000
@@ -39,6 +39,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/se_NO b/localedata/locales/se_NO
--- a/localedata/locales/se_NO 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/se_NO 2018-07-17 17:55:51.000000000 +0000
@@ -205,6 +205,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sgs_LT b/localedata/locales/sgs_LT
--- a/localedata/locales/sgs_LT 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sgs_LT 2018-07-17 17:55:52.000000000 +0000
@@ -59,6 +59,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/si_LK b/localedata/locales/si_LK
--- a/localedata/locales/si_LK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/si_LK 2018-07-17 17:55:52.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sk_SK b/localedata/locales/sk_SK
--- a/localedata/locales/sk_SK 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sk_SK 2018-07-17 17:55:52.000000000 +0000
@@ -68,6 +68,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sl_SI b/localedata/locales/sl_SI
--- a/localedata/locales/sl_SI 2018-07-17 17:49:19.000000000 +0000
+++ b/localedata/locales/sl_SI 2018-07-17 17:55:52.000000000 +0000
@@ -91,6 +91,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sm_WS b/localedata/locales/sm_WS
--- a/localedata/locales/sm_WS 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sm_WS 2018-07-17 17:55:52.000000000 +0000
@@ -37,6 +37,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/so_SO b/localedata/locales/so_SO
--- a/localedata/locales/so_SO 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/so_SO 2018-07-17 17:55:52.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sq_AL b/localedata/locales/sq_AL
--- a/localedata/locales/sq_AL 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sq_AL 2018-07-17 17:55:52.000000000 +0000
@@ -45,6 +45,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ss_ZA b/localedata/locales/ss_ZA
--- a/localedata/locales/ss_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ss_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -68,6 +68,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/st_ZA b/localedata/locales/st_ZA
--- a/localedata/locales/st_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/st_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -64,6 +64,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sv_SE b/localedata/locales/sv_SE
--- a/localedata/locales/sv_SE 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sv_SE 2018-07-17 17:55:52.000000000 +0000
@@ -139,6 +139,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/sw_KE b/localedata/locales/sw_KE
--- a/localedata/locales/sw_KE 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/sw_KE 2018-07-17 17:55:52.000000000 +0000
@@ -44,6 +44,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ta_IN b/localedata/locales/ta_IN
--- a/localedata/locales/ta_IN 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ta_IN 2018-07-17 17:55:52.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/te_IN b/localedata/locales/te_IN
--- a/localedata/locales/te_IN 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/te_IN 2018-07-17 17:55:52.000000000 +0000
@@ -63,6 +63,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/th_TH b/localedata/locales/th_TH
--- a/localedata/locales/th_TH 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/th_TH 2018-07-17 17:55:52.000000000 +0000
@@ -58,6 +58,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ti_ET b/localedata/locales/ti_ET
--- a/localedata/locales/ti_ET 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/ti_ET 2018-07-17 17:55:52.000000000 +0000
@@ -866,6 +866,7 @@
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/tn_ZA b/localedata/locales/tn_ZA
--- a/localedata/locales/tn_ZA 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/tn_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -69,6 +69,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/to_TO b/localedata/locales/to_TO
--- a/localedata/locales/to_TO 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/to_TO 2018-07-17 17:55:52.000000000 +0000
@@ -36,6 +36,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/tpi_PG b/localedata/locales/tpi_PG
--- a/localedata/locales/tpi_PG 2018-07-17 17:49:20.000000000 +0000
+++ b/localedata/locales/tpi_PG 2018-07-17 17:55:52.000000000 +0000
@@ -37,6 +37,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/tr_TR b/localedata/locales/tr_TR
--- a/localedata/locales/tr_TR 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/tr_TR 2018-07-17 17:55:52.000000000 +0000
@@ -2430,6 +2430,7 @@
% TURKISH LIRA SIGN
<U20BA> "<U0054><U004C>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-07-17 17:55:52.000000000
+0000
@@ -0,0 +1,151 @@
+escape_char /
+comment_char %
+
+% Transliterations that converts cyrillic letters to ascii symbols
inspired by GOST 7.79-2000
+% https://sourceware.org/bugzilla/show_bug.cgi?id=2872
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=8590
+% Up to three characters are required to do a reversible transliteration.
+
+LC_CTYPE
+
+translit_start
+
+
+% CYRILLIC CAPITAL LETTER IO
+<U0401> "<U0059><U004F>";<U0059>
+% CYRILLIC CAPITAL LETTER A
+<U0410> <U0041>
+% CYRILLIC CAPITAL LETTER BE
+<U0411> <U0042>
+% CYRILLIC CAPITAL LETTER VE
+<U0412> <U0056>
+% CYRILLIC CAPITAL LETTER GHE
+<U0413> <U0047>
+% CYRILLIC CAPITAL LETTER DE
+<U0414> <U0044>
+% CYRILLIC CAPITAL LETTER IE
+<U0415> <U0045>
+% CYRILLIC CAPITAL LETTER ZHE
+<U0416> "<U005A><U0048>";<U005A>
+% CYRILLIC CAPITAL LETTER ZE
+<U0417> <U005A>
+% CYRILLIC CAPITAL LETTER I
+<U0418> <U0049>
+% CYRILLIC CAPITAL LETTER SHORT I
+<U0419> <U004A>
+% CYRILLIC CAPITAL LETTER KA
+<U041A> <U004B>
+% CYRILLIC CAPITAL LETTER EL
+<U041B> <U004C>
+% CYRILLIC CAPITAL LETTER EM
+<U041C> <U004D>
+% CYRILLIC CAPITAL LETTER EN
+<U041D> <U004E>
+% CYRILLIC CAPITAL LETTER O
+<U041E> <U004F>
+% CYRILLIC CAPITAL LETTER PE
+<U041F> <U0050>
+% CYRILLIC CAPITAL LETTER ER
+<U0420> <U0052>
+% CYRILLIC CAPITAL LETTER ES
+<U0421> <U0053>
+% CYRILLIC CAPITAL LETTER TE
+<U0422> <U0054>
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC CAPITAL LETTER EF
+<U0424> <U0046>
+% CYRILLIC CAPITAL LETTER HA
+<U0425> <U0058>
+% CYRILLIC CAPITAL LETTER TSE
+<U0426> "<U0043><U005A>";<U0043>
+% CYRILLIC CAPITAL LETTER CHE
+<U0427> "<U0043><U0048>";<U0043>
+% CYRILLIC CAPITAL LETTER SHA
+<U0428> "<U0053><U0048>";<U0053>
+% CYRILLIC CAPITAL LETTER SHCHA
+<U0429> "<U0053><U0048><U0048>";<U0053>
+% CYRILLIC CAPITAL LETTER HARD SIGN
+<U042A> "<U0060><U0060>";<U0060>
+% CYRILLIC CAPITAL LETTER YERU
+<U042B> "<U0059><U0027>";<U0059>
+% CYRILLIC CAPITAL LETTER SOFT SIGN
+<U042C> <U0060>
+% CYRILLIC CAPITAL LETTER E
+<U042D> "<U0045><U0060>";<U0045>
+% CYRILLIC CAPITAL LETTER YU
+<U042E> "<U0059><U0055>";<U0059>
+% CYRILLIC CAPITAL LETTER YA
+<U042F> "<U0059><U0041>";<U0059>
+% CYRILLIC SMALL LETTER A
+<U0430> <U0061>
+% CYRILLIC SMALL LETTER BE
+<U0431> <U0062>
+% CYRILLIC SMALL LETTER VE
+<U0432> <U0076>
+% CYRILLIC SMALL LETTER GHE
+<U0433> <U0067>
+% CYRILLIC SMALL LETTER DE
+<U0434> <U0064>
+% CYRILLIC SMALL LETTER IE
+<U0435> <U0065>
+% CYRILLIC SMALL LETTER ZHE
+<U0436> "<U007A><U0068>";<U007A>
+% CYRILLIC SMALL LETTER ZE
+<U0437> <U007A>
+% CYRILLIC SMALL LETTER I
+<U0438> <U0069>
+% CYRILLIC SMALL LETTER SHORT I
+<U0439> <U006A>
+% CYRILLIC SMALL LETTER KA
+<U043A> <U006B>
+% CYRILLIC SMALL LETTER EL
+<U043B> <U006C>
+% CYRILLIC SMALL LETTER EM
+<U043C> <U006D>
+% CYRILLIC SMALL LETTER EN
+<U043D> <U006E>
+% CYRILLIC SMALL LETTER O
+<U043E> <U006F>
+% CYRILLIC SMALL LETTER PE
+<U043F> <U0070>
+% CYRILLIC SMALL LETTER ER
+<U0440> <U0072>
+% CYRILLIC SMALL LETTER ES
+<U0441> <U0073>
+% CYRILLIC SMALL LETTER TE
+<U0442> <U0074>
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC SMALL LETTER EF
+<U0444> <U0066>
+% CYRILLIC SMALL LETTER HA
+<U0445> <U0078>
+% CYRILLIC SMALL LETTER TSE
+<U0446> "<U0063><U007A>";<U0063>
+% CYRILLIC SMALL LETTER CHE
+<U0447> "<U0063><U0068>";<U0063>
+% CYRILLIC SMALL LETTER SHA
+<U0448> "<U0073><U0068>";<U0073>
+% CYRILLIC SMALL LETTER SHCHA
+<U0449> "<U0073><U0068><U0068>";<U0073>
+% CYRILLIC SMALL LETTER HARD SIGN
+<U044A> "<U0060><U0060>";<U0060>
+% CYRILLIC SMALL LETTER YERU
+<U044B> "<U0079><U0027>";<U0079>
+% CYRILLIC SMALL LETTER SOFT SIGN
+<U044C> <U0060>
+% CYRILLIC SMALL LETTER E
+<U044D> "<U0065><U0060>";<U0065>
+% CYRILLIC SMALL LETTER YU
+<U044E> "<U0079><U0075>";<U0079>
+% CYRILLIC SMALL LETTER YA
+<U044F> "<U0079><U0061>";<U0079>
+% CYRILLIC SMALL LETTER IO
+<U0451> "<U0079><U006F>";<U0079>
+
+
+translit_end
+
+END LC_CTYPE
diff -uNr a/localedata/locales/ts_ZA b/localedata/locales/ts_ZA
--- a/localedata/locales/ts_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ts_ZA 2018-07-17 17:55:52.000000000 +0000
@@ -64,6 +64,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/unm_US b/localedata/locales/unm_US
--- a/localedata/locales/unm_US 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/unm_US 2018-07-17 17:55:52.000000000 +0000
@@ -48,6 +48,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ur_IN b/localedata/locales/ur_IN
--- a/localedata/locales/ur_IN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ur_IN 2018-07-17 17:55:53.000000000 +0000
@@ -46,6 +46,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ur_PK b/localedata/locales/ur_PK
--- a/localedata/locales/ur_PK 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ur_PK 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/ve_ZA b/localedata/locales/ve_ZA
--- a/localedata/locales/ve_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/ve_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -67,6 +67,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/vi_VN b/localedata/locales/vi_VN
--- a/localedata/locales/vi_VN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/vi_VN 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
% dong sign -> d// -> dd
<U20AB> "<U0111>";"<U0064><U0064>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/wa_BE b/localedata/locales/wa_BE
--- a/localedata/locales/wa_BE 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/wa_BE 2018-07-17 17:55:53.000000000 +0000
@@ -69,6 +69,7 @@
<U00C5> "<U0041><U030A>";"<U0041>";"<U0041><U0055>"
<U00E5> "<U0061><U030A>";"<U0061>";"<U0061><U0075>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/wo_SN b/localedata/locales/wo_SN
--- a/localedata/locales/wo_SN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/wo_SN 2018-07-17 17:55:53.000000000 +0000
@@ -55,6 +55,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/xh_ZA b/localedata/locales/xh_ZA
--- a/localedata/locales/xh_ZA 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/xh_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -66,6 +66,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/yi_US b/localedata/locales/yi_US
--- a/localedata/locales/yi_US 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/yi_US 2018-07-17 17:55:53.000000000 +0000
@@ -73,6 +73,7 @@
<U05F0> "<U05D5><U05D5>";"<U0077><U0077>"
<U05F1> "<U05D5><U05D9>";"<U0077><U006A>"
<U05F2> "<U05D9><U05D9>";"<U006A><U006A>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
diff -uNr a/localedata/locales/zh_CN b/localedata/locales/zh_CN
--- a/localedata/locales/zh_CN 2018-07-17 17:49:21.000000000 +0000
+++ b/localedata/locales/zh_CN 2018-07-17 17:55:53.000000000 +0000
@@ -58,6 +58,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
class "hanzi"; /
diff -uNr a/localedata/locales/zu_ZA b/localedata/locales/zu_ZA
--- a/localedata/locales/zu_ZA 2018-07-17 17:49:22.000000000 +0000
+++ b/localedata/locales/zu_ZA 2018-07-17 17:55:53.000000000 +0000
@@ -70,6 +70,7 @@
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
--
Marko Myllynen
Rafal Luzynski
2018-10-05 09:20:25 UTC
Permalink
Post by Egor Kobylkin
Post by Keld Simonsen
Hi
Please note that translitteration of Cyrillic to latin is not universal.
There are different schemes for for example German, English and Danish, and
there is also an ISO standard for it.
Thanks for your feedback, Keld!
Could the locale maintainers that wouldn't like to include this patch
explicitly state so here?
I think it is about me so I must reply. I am sorry about that and the sole
reason is my lack of time. I'm just a volunteer here, that means it's not
my regular job to work on locale data nor anything in glibc nor in any other
open source project. I do these things only in my free time which I don't
have much. Of course you will see my contributions here and there but they
are either trivial or take me months to complete. Your patches are on my
radar but I can't tell any ETA for them. Of course, there are other people
around here and they are all welcome to come and join.
Post by Egor Kobylkin
- In the case that there is a different preferred cyrillic
transliteration table for any specific locale their maintainers may want
to point me to it so I can supply a separate table/patch.
- Or they could state explicitly that for some reason they would like to
exclude their locale from the patch for a default cyrillic
transliteration altogether.
As Keld wrote, there are probably separate rules for every language so
I don't think you should treat your rules as universal and include them
in every locale. At first sight, it seems to me they work only for English
(as a destination locale). Also, although it is called "transliteration
from Cyrillic" it seems that it covers only Russian alphabet. What about
other languages which use Cyrillic alphabet but add their own diacritic
characters? Think about Belarusian, Ukrainian, Serbian, Chechen, Chuvash,
Mari, Ossetian, Yakut, Tatar, and more. What about languages which use
Cyrillic alphabet but transliterate their respective letters in a different
way than Russian? For example, Russian "Ъ" is (I think) usually skipped
in transliteration, I think you propose "``", but when transliterating from
Bulgarian they usually transliterate this as "ă".

Few remarks:

* I think you transliterate "щ" as "shh", wouldn't "shch" be better?
* You transliterate "ц" as "cz", wouldn't "ts" be better? By the way,
in Polish language "cz" is a correct transliteration of "ч".
* You transliterate "й" as "j", this is fine in many languages but wouldn't
"y" be better in English?
* In case of "е": how will you know if it is correct to transliterate it
to "e" or "ie" or "je" or "ye"?

These remarks are obviously incomplete, your patch deserves much more
attention to review.

Best regards,

Rafal
Egor Kobylkin
2018-10-05 10:36:29 UTC
Permalink
removed a png image attachment

Keld,Marko,Rafal, other locale maintainers,

this all is written with having in mind a minimal viable fix for this
bug asap. I want to avoid wasting maintainers time getting into
fundamental discussions here (although for perfectly good reasons).

I see three options:
1. those locale maintainers that are fine with using ISO
9:1995/GOST_7.79_System_B cyrillic transliteration table (Ru) include it
in their locales. https://sourceware.org/bugzilla/attachment.cgi?id=11289
2. those that that want to have a differing table can create their own
variety based on the spreadsheet I have prepared
https://sourceware.org/bugzilla/attachment.cgi?id=8590 and include it in
this patch.
3. those that want to omit a cyrillic transliteration altogether for now
state so and just carry over the bug #2872 from the year 2006.

Does this make sense to you?

Just to be super clear on this: the patch is a stopgap _ASCII_
transliteration table. ASCII being AMERICAN Standard Code for
Information Interchange, that is obviously orthogonal to any
transliteration rule of other countries. As such it is not explicitly
targeting transliteration standards of any country.

The fact that the patch is reflecting Russian variety of ISO
9:1995/GOST_7.79_System_B is because a) ISO 9:1995/GOST_7.79_System_B is
available and can be helpful to a majority of cyrillic users b) I have
access to it including via being proficient in Russian.

It is offered to all the respective locale maintainers as a stopgap
solution. Stopgap in the sense that it is better to have some
transliteration than not to have any at all and carry over the bug from
2006. That it may be a somewhat officially correct transliteration for
ru_RU is a bonus. In that sense I would dub the discussion on the
correctness for other languages "offtopic". Let me know if this is not OK.

You are all are correctly mentioning the deficiencies of this approach.
However, I couldn't find a better straightforward approach as of yet.
Happy to hear from you as on how this could be handled.

There is a danger of being caught in the web of language/country
differences. I propose just pruning the locales that are not comfortable
including this current table. We can address possible solutions in the
second wave of patching.

I am vary of getting into discussions on specific country variants just
because of the sheer complexity of this topic. It is probably better
addressed by respective maintainers of their locales. I do not see a
"one fits all" solution in this first wave possible.

I would like to have this "three options plan of action" vetted first
and then we could go to the specific detail. (Like, for instance, what
characters should be included in to the table, and in which
transliteration form.)

I am looking forward to your reply,
Egor Kobylkin

P.S. specifically as to how address languages other than Ru included in
GOST_7.79_System_B: we can take the first option left to right from that
table (Ru,By,Uk,Bg,Mk). Then it will technically work for all those
locales/languages but with errors where Ru supersedes their own variants.
Post by Rafal Luzynski
Post by Egor Kobylkin
Post by Keld Simonsen
Hi
Please note that translitteration of Cyrillic to latin is not universal.
There are different schemes for for example German, English and Danish, and
there is also an ISO standard for it.
Thanks for your feedback, Keld!
Could the locale maintainers that wouldn't like to include this patch
explicitly state so here?
I think it is about me so I must reply. I am sorry about that and the sole
reason is my lack of time. I'm just a volunteer here, that means it's not
my regular job to work on locale data nor anything in glibc nor in any other
open source project. I do these things only in my free time which I don't
have much. Of course you will see my contributions here and there but they
are either trivial or take me months to complete. Your patches are on my
radar but I can't tell any ETA for them. Of course, there are other people
around here and they are all welcome to come and join.
Post by Egor Kobylkin
- In the case that there is a different preferred cyrillic
transliteration table for any specific locale their maintainers may want
to point me to it so I can supply a separate table/patch.
- Or they could state explicitly that for some reason they would like to
exclude their locale from the patch for a default cyrillic
transliteration altogether.
As Keld wrote, there are probably separate rules for every language so
I don't think you should treat your rules as universal and include them
in every locale. At first sight, it seems to me they work only for English
(as a destination locale). Also, although it is called "transliteration
from Cyrillic" it seems that it covers only Russian alphabet. What about
other languages which use Cyrillic alphabet but add their own diacritic
characters? Think about Belarusian, Ukrainian, Serbian, Chechen, Chuvash,
Mari, Ossetian, Yakut, Tatar, and more. What about languages which use
Cyrillic alphabet but transliterate their respective letters in a different
way than Russian? For example, Russian "Ъ" is (I think) usually skipped
in transliteration, I think you propose "``", but when transliterating from
Bulgarian they usually transliterate this as "ă".
* I think you transliterate "щ" as "shh", wouldn't "shch" be better?
* You transliterate "ц" as "cz", wouldn't "ts" be better? By the way,
in Polish language "cz" is a correct transliteration of "ч".
* You transliterate "й" as "j", this is fine in many languages but wouldn't
"y" be better in English?
* In case of "е": how will you know if it is correct to transliterate it
to "e" or "ie" or "je" or "ye"?
These remarks are obviously incomplete, your patch deserves much more
attention to review.
Best regards,
Rafal
Rafal Luzynski
2018-10-08 22:04:55 UTC
Permalink
Post by Egor Kobylkin
[...]
1. those locale maintainers that are fine with using ISO
9:1995/GOST_7.79_System_B cyrillic transliteration table (Ru) include it
in their locales. https://sourceware.org/bugzilla/attachment.cgi?id=11289
2. those that that want to have a differing table can create their own
variety based on the spreadsheet I have prepared
https://sourceware.org/bugzilla/attachment.cgi?id=8590 and include it in
this patch.
3. those that want to omit a cyrillic transliteration altogether for now
state so and just carry over the bug #2872 from the year 2006.
Does this make sense to you?
The problem is that we don't have a separate maintainer for each locale,
we have only 2 maintainers for about 200 locales and we must represent
them all. Sometimes a locale may happen to be our own native locale or
of someone in this list, or it may be a locale which we accidentally can
speak as a foreign language, or we may have friends who can speak it.
Or it may be totally unknown and we still must somehow handle it.

I think that these transliteration rules should be included in multiple
locales on "opt-in" basis rather than "opt-out". I mean, we should not
include them in all locales unless someone explicitly provides a different
rules. Instead, I think we should add them (maybe with modification)
only to those locales where we have a good reason to think they will work.

Particularly, I think that those rules will not be helpful at all for
the languages which use neither Latin nor Cyrillic alphabet.
Post by Egor Kobylkin
[...]
The fact that the patch is reflecting Russian variety of ISO
9:1995/GOST_7.79_System_B is because a) ISO 9:1995/GOST_7.79_System_B is
available and can be helpful to a majority of cyrillic users b) I have
access to it including via being proficient in Russian.
I took a look at these standards and as first I doubted they may be
correct for English language now I understand they are created for
Russian users. Therefore I think it is pretty correct to include them
to Russian locale data. Will it be OK if we say that it is only for
Russian language? Will it be satisfying for you and/or your users?
Post by Egor Kobylkin
It is offered to all the respective locale maintainers as a stopgap
solution. Stopgap in the sense that it is better to have some
transliteration than not to have any at all and carry over the bug from
2006. That it may be a somewhat officially correct transliteration for
ru_RU is a bonus. In that sense I would dub the discussion on the
correctness for other languages "offtopic". Let me know if this is not OK.
If you refer to other languages than Russian which also use the Cyrillic
alphabet but need a different transliteration rules than Russian for
the same characters then it is OK for me now. I am afraid that the iconv
algorithm does not handle such case. Of course, we should add this missing
feature eventually but I do not volunteer to do it now.
Post by Egor Kobylkin
[...]
P.S. specifically as to how address languages other than Ru included in
GOST_7.79_System_B: we can take the first option left to right from that
table (Ru,By,Uk,Bg,Mk). Then it will technically work for all those
locales/languages but with errors where Ru supersedes their own variants.
Makes sense, as long as we cannot select the source language now.

But, while at this, is there anything that stops are from adding transliteration
rules for additional Cyrillic characters not used in Russian but used in
other languages?

Regards,

Rafal
Egor Kobylkin
2018-10-08 22:52:00 UTC
Permalink
Hi Rafal,
Post by Rafal Luzynski
But, while at this, is there anything that stops are from adding
transliteration rules for additional Cyrillic characters not used in
Russian but used in other languages?
Just to make sure we are not talking at cross purposes. Since your last
email on this topic on the suggestion from Marko I have already
implemented ISO 9 transliteration for all characters there are. This
should cover most if not all Slavic Cyrillic. You seem to have just
noticed and replied to this email of Marko as I write mine.

Pls also check the Spreadsheet version I have just uploaded
https://sourceware.org/bugzilla/attachment.cgi?id=11298

I am currently absorbing Marko's further suggestions and correction to
that one and will get back for more discussion once done there. I am
reading your suggestions and taking them to my heart, be sure of that.

Two professional translators independently indicated the difference
between transliteration and transcription to me. Transliteration is
normative (letter for letter) and transcription is phonetic - letter for
whatever combination of Latin letters in the target language that sounds
like it for a native speaker. While transliteration should be easy to
cover for all those languages via ISO 9, transcription is inherently
language specific. The problem is we are (mis)using the transcription as
transliteration to ASCII because ASCII set of characters does not allow
for proper transcription. Another problem is that to be really useful
the ASCII transliteration should work outside of source locale (i.e. not
only ru_RU but en_US, de_DE, en_DE, es_ES etc. or even just C locale).

In fact for myself I would be committed to do all work needed to cover
at least C, en_US, ru_RU, de_DE in that order. ru_RU as a "courtesy", I
am not really using it but hope more contributors for locales may come
because of that and fix my bugs :-).
Post by Rafal Luzynski
The problem is that we don't have a separate maintainer for each
locale, we have only 2 maintainers for about 200 locales and we must
represent them all.
It was not clear to me that glibc team can not fall back on the
individual locale maintainers to make the decision. But then it may make
the decision making even easier. If you guys have a list of requirements
(may be implicit until now) could you please shoot them my way? We can
also certainly just keep this thread up and have all issues ironed out.

Anyway hopefully with ISO 9 as a first column in the translit_cyrillic
we cover the issue of the completeness of transliteration now. What we
need to figure out is transcription/transliteration to ASCII - second
column.

Are we sharing the same view on this?

Speaking on decision making - maybe I can get an officially certified
court translator to answer our questions. Do you care to put a list
together of questions you would like answered to make a decision on the
table/inclusion into various locales?

Hope this helps,
Egor
Post by Rafal Luzynski
[...] I see three options: 1. those locale maintainers that are
fine with using ISO 9:1995/GOST_7.79_System_B cyrillic
transliteration table (Ru) include it in their locales.
https://sourceware.org/bugzilla/attachment.cgi?id=11289 2. those
that that want to have a differing table can create their own
variety based on the spreadsheet I have prepared
https://sourceware.org/bugzilla/attachment.cgi?id=8590 and include
it in this patch. 3. those that want to omit a cyrillic
transliteration altogether for now state so and just carry over the
bug #2872 from the year 2006.
Does this make sense to you?
The problem is that we don't have a separate maintainer for each
locale, we have only 2 maintainers for about 200 locales and we must
represent them all. Sometimes a locale may happen to be our own
native locale or of someone in this list, or it may be a locale which
we accidentally can speak as a foreign language, or we may have
friends who can speak it. Or it may be totally unknown and we still
must somehow handle it.
I think that these transliteration rules should be included in
multiple locales on "opt-in" basis rather than "opt-out". I mean, we
should not include them in all locales unless someone explicitly
provides a different rules. Instead, I think we should add them
(maybe with modification) only to those locales where we have a good
reason to think they will work.
Particularly, I think that those rules will not be helpful at all
for the languages which use neither Latin nor Cyrillic alphabet.
[...] The fact that the patch is reflecting Russian variety of ISO
9:1995/GOST_7.79_System_B is because a) ISO
9:1995/GOST_7.79_System_B is available and can be helpful to a
majority of cyrillic users b) I have access to it including via
being proficient in Russian.
I took a look at these standards and as first I doubted they may be
correct for English language now I understand they are created for
Russian users. Therefore I think it is pretty correct to include
them to Russian locale data. Will it be OK if we say that it is only
for Russian language? Will it be satisfying for you and/or your
users?
It is offered to all the respective locale maintainers as a
stopgap solution. Stopgap in the sense that it is better to have
some transliteration than not to have any at all and carry over the
bug from 2006. That it may be a somewhat officially correct
transliteration for ru_RU is a bonus. In that sense I would dub the
discussion on the correctness for other languages "offtopic". Let
me know if this is not OK.
If you refer to other languages than Russian which also use the
Cyrillic alphabet but need a different transliteration rules than
Russian for the same characters then it is OK for me now. I am
afraid that the iconv algorithm does not handle such case. Of
course, we should add this missing feature eventually but I do not
volunteer to do it now.
[...] P.S. specifically as to how address languages other than Ru
included in GOST_7.79_System_B: we can take the first option left
to right from that table (Ru,By,Uk,Bg,Mk). Then it will technically
work for all those locales/languages but with errors where Ru
supersedes their own variants.
Makes sense, as long as we cannot select the source language now.
But, while at this, is there anything that stops are from adding
transliteration rules for additional Cyrillic characters not used in
Russian but used in other languages?
Regards,
Rafal
Rafal Luzynski
2018-10-09 21:43:05 UTC
Permalink
Post by Egor Kobylkin
[...]
Just to make sure we are not talking at cross purposes. Since your last
email on this topic on the suggestion from Marko I have already
implemented ISO 9 transliteration for all characters there are. This
should cover most if not all Slavic Cyrillic. You seem to have just
noticed and replied to this email of Marko as I write mine.
That's great. I'm sorry about not noticing this before, as you can see
this only confirms that I'm unable to give a proper attention to your bug.
Post by Egor Kobylkin
Post by Rafal Luzynski
Are the duplicates here because some Cyrillic letters may have multiple
Latin transliterations depending on the context, for example Cyrillic IE
must be transliterated sometimes as "e", sometimes as "ie", sometimes
as "ye" or "je"? Can we provide rules for groups of characters instead?
No, the duplicates are just by design of my line generating logic. I
have fixed (removed) them. The varying transcription between
languages/locales can not be handled in one file at all as far as I
understood.
No, I did not mean here different languages but that some letters may need
to be transliterated in a different way depending on the context. For
example, a letter "е" might be transliterated as "e" or "ie" or "je"
depending on whether it appears after "ж" or after another consonant
or after a vowel or a soft or hard sign etc. All within Russian language.
(Sorry if I'm messing that, maybe what I wrote is wrong but may be correct
for another combination of letters.)

Regards,

Rafal
Zack Weinberg
2018-10-08 23:20:23 UTC
Permalink
On Mon, Oct 8, 2018 at 6:05 PM Rafal Luzynski
Post by Rafal Luzynski
The problem is that we don't have a separate maintainer for each locale,
we have only 2 maintainers for about 200 locales and we must represent
them all. Sometimes a locale may happen to be our own native locale or
of someone in this list, or it may be a locale which we accidentally can
speak as a foreign language, or we may have friends who can speak it.
Or it may be totally unknown and we still must somehow handle it.
I just want to mention that this is also why most of the non-locale
maintainers tend to stay out of threads about locales. We know we're
even less expert on these issues than you are, and I think as a
general rule you should be assuming that the community is OK with what
you're doing unless someone speaks up to object.

zw
Carlos O'Donell
2018-10-09 15:26:25 UTC
Permalink
Post by Zack Weinberg
On Mon, Oct 8, 2018 at 6:05 PM Rafal Luzynski
Post by Rafal Luzynski
The problem is that we don't have a separate maintainer for each locale,
we have only 2 maintainers for about 200 locales and we must represent
them all. Sometimes a locale may happen to be our own native locale or
of someone in this list, or it may be a locale which we accidentally can
speak as a foreign language, or we may have friends who can speak it.
Or it may be totally unknown and we still must somehow handle it.
I just want to mention that this is also why most of the non-locale
maintainers tend to stay out of threads about locales. We know we're
even less expert on these issues than you are, and I think as a
general rule you should be assuming that the community is OK with what
you're doing unless someone speaks up to object.
I agree with Zach here.

Rafal and Mike are localedata subsystem maintainers, and your best efforts
are the best we have right now in the community.

I also agree that a conservative position of is always a good place to start,
but it sounds like Egor has added enough coverage to perhaps make all of
these transliterations opt-in by default.

I don't have a good sense of this though, and so I defer to you as a the
subsystem maintainer to review and formulate a position. If you have any
specific questions, I can certainly help review.
--
Cheers,
Carlos.
Rafal Luzynski
2018-10-09 21:51:28 UTC
Permalink
Post by Carlos O'Donell
[...]
but it sounds like Egor has added enough coverage to perhaps make all of
these transliterations opt-in by default.
I think that it is correct if this transliteration is meant to be "Russian
language as if it used a Latin alphabet (even if it does not actually
except in some computer systems which do not support Cyrillic)"
but not if it is meant to be "Russian language to make sure it is comfortable
for reading by English speakers (assuming that everyone else should be fine
with English if their native language is not supported)".

Regards,

Rafal
Marko Myllynen
2018-10-09 16:10:26 UTC
Permalink
Hi,
Post by Rafal Luzynski
Particularly, I think that those rules will not be helpful at all for
the languages which use neither Latin nor Cyrillic alphabet.
This is certainly a very good point.
Post by Rafal Luzynski
If you refer to other languages than Russian which also use the Cyrillic
alphabet but need a different transliteration rules than Russian for
the same characters then it is OK for me now. I am afraid that the iconv
algorithm does not handle such case. Of course, we should add this missing
feature eventually but I do not volunteer to do it now.
Yes, this would be needed for correct transliteration of different
languages, and this might be quite a bit of work. There's also the case
of transliteration and character sets, consider the transliteration
examples from https://fi.wikipedia.org/wiki/Siirtokirjoitus:

Russian: Борис Николаевич Ельцин
Int'l: Boris Nikolaevič Elʹcin
Finnish: Boris Nikolajevitš Jeltsin
French: Boris Nikolaïevitch Ieltsine
Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]

For French you'll get the correct transliteration with iconv by using -t
ISO-8859-1//TRANSLIT, for Finnish with -t ISO-8859-15//TRANSLIT but it's
not so obvious how to get the above kind transliteration for ISO 9
international or especially for the phonetic case.

One thing that might be helpful here could be something like:

$ echo ж | LC_ALL=fi_FI.UTF-8 iconv -f UTF-8 -t UTF-8//TRANSLIT_FORCE
ž

That is, force transliteration of each character (if defined) even if
it's part of the target character set. AFAICS this is not currently
possible.
Post by Rafal Luzynski
But, while at this, is there anything that stops are from adding transliteration
rules for additional Cyrillic characters not used in Russian but used in
other languages?
This would probably make sense.

FWIW, for Finnish the diff for Russian to be applied in the locale on
top of translit_cyrillic (ISO 9) rules would be something like below, I
still need to check whether there are rules needed for other languages
than Russian that could be added (I hope to submit a proper patch
against fi_FI shortly after translit_cyrillic has landed):

<U0446> "<U0074><U0073>"
<U0447> "<U0074><U0161>";"<U0074><U0073><U0068>"
<U0448> "<U0161>";"<U0073><U0068>"
<U0449> "<U0161><U0074><U0161>";"<U0073><U0068><U0074><U0073><U0068>"
<U044A> ""
<U044C> ""
<U044D> "<U0065>"
<U044E> "<U006A><U0075>"
<U044F> "<U006A><U0061>"
<U0451> "<U006A><U006F>"

Thanks,
--
Marko Myllynen
Egor Kobylkin
2018-10-09 16:22:22 UTC
Permalink
In the hope to be helpful: what you describe below from
https://fi.wikipedia.org/wiki/Siirtokirjoitus is called _transcription_,
not transliteration.

Transliteration is what we have done with ISO 9 or GOST 7.79 System A
and it could be the same for all languages indeed.

The transcription can be phonetic or serve other purposes and depends on
the target language or use case. We have used the GOST 7.79 System B.

Egor
Post by Marko Myllynen
Hi,
Post by Rafal Luzynski
Particularly, I think that those rules will not be helpful at all for
the languages which use neither Latin nor Cyrillic alphabet.
This is certainly a very good point.
Post by Rafal Luzynski
If you refer to other languages than Russian which also use the Cyrillic
alphabet but need a different transliteration rules than Russian for
the same characters then it is OK for me now. I am afraid that the iconv
algorithm does not handle such case. Of course, we should add this missing
feature eventually but I do not volunteer to do it now.
Yes, this would be needed for correct transliteration of different
languages, and this might be quite a bit of work. There's also the case
of transliteration and character sets, consider the transliteration
Russian: Борис Николаевич Ельцин
Int'l: Boris Nikolaevič Elʹcin
Finnish: Boris Nikolajevitš Jeltsin
French: Boris Nikolaïevitch Ieltsine
Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]
For French you'll get the correct transliteration with iconv by using -t
ISO-8859-1//TRANSLIT, for Finnish with -t ISO-8859-15//TRANSLIT but it's
not so obvious how to get the above kind transliteration for ISO 9
international or especially for the phonetic case.
$ echo ж | LC_ALL=fi_FI.UTF-8 iconv -f UTF-8 -t UTF-8//TRANSLIT_FORCE
ž
That is, force transliteration of each character (if defined) even if
it's part of the target character set. AFAICS this is not currently
possible.
Post by Rafal Luzynski
But, while at this, is there anything that stops are from adding transliteration
rules for additional Cyrillic characters not used in Russian but used in
other languages?
This would probably make sense.
FWIW, for Finnish the diff for Russian to be applied in the locale on
top of translit_cyrillic (ISO 9) rules would be something like below, I
still need to check whether there are rules needed for other languages
than Russian that could be added (I hope to submit a proper patch
<U0446> "<U0074><U0073>"
<U0447> "<U0074><U0161>";"<U0074><U0073><U0068>"
<U0448> "<U0161>";"<U0073><U0068>"
<U0449> "<U0161><U0074><U0161>";"<U0073><U0068><U0074><U0073><U0068>"
<U044A> ""
<U044C> ""
<U044D> "<U0065>"
<U044E> "<U006A><U0075>"
<U044F> "<U006A><U0061>"
<U0451> "<U006A><U006F>"
Thanks,
Marko Myllynen
2018-10-09 16:49:06 UTC
Permalink
Hi,

To clarify, the page has a section explaining the differences between
transliteration and transcription and how the terminology is not
entirely unambiguous. It also explains that the national standard SFS
4900 overrides ISO 9, thus ISO 9 can't be used as-is in Finnish context.

Thanks,
Post by Egor Kobylkin
In the hope to be helpful: what you describe below from
https://fi.wikipedia.org/wiki/Siirtokirjoitus is called _transcription_,
not transliteration.
Transliteration is what we have done with ISO 9 or GOST 7.79 System A
and it could be the same for all languages indeed.
The transcription can be phonetic or serve other purposes and depends on
the target language or use case. We have used the GOST 7.79 System B.
Egor
Post by Marko Myllynen
Hi,
Post by Rafal Luzynski
Particularly, I think that those rules will not be helpful at all for
the languages which use neither Latin nor Cyrillic alphabet.
This is certainly a very good point.
Post by Rafal Luzynski
If you refer to other languages than Russian which also use the Cyrillic
alphabet but need a different transliteration rules than Russian for
the same characters then it is OK for me now. I am afraid that the iconv
algorithm does not handle such case. Of course, we should add this missing
feature eventually but I do not volunteer to do it now.
Yes, this would be needed for correct transliteration of different
languages, and this might be quite a bit of work. There's also the case
of transliteration and character sets, consider the transliteration
Russian: Борис Николаевич Ельцин
Int'l: Boris Nikolaevič Elʹcin
Finnish: Boris Nikolajevitš Jeltsin
French: Boris Nikolaïevitch Ieltsine
Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]
For French you'll get the correct transliteration with iconv by using -t
ISO-8859-1//TRANSLIT, for Finnish with -t ISO-8859-15//TRANSLIT but it's
not so obvious how to get the above kind transliteration for ISO 9
international or especially for the phonetic case.
$ echo ж | LC_ALL=fi_FI.UTF-8 iconv -f UTF-8 -t UTF-8//TRANSLIT_FORCE
ž
That is, force transliteration of each character (if defined) even if
it's part of the target character set. AFAICS this is not currently
possible.
Post by Rafal Luzynski
But, while at this, is there anything that stops are from adding transliteration
rules for additional Cyrillic characters not used in Russian but used in
other languages?
This would probably make sense.
FWIW, for Finnish the diff for Russian to be applied in the locale on
top of translit_cyrillic (ISO 9) rules would be something like below, I
still need to check whether there are rules needed for other languages
than Russian that could be added (I hope to submit a proper patch
<U0446> "<U0074><U0073>"
<U0447> "<U0074><U0161>";"<U0074><U0073><U0068>"
<U0448> "<U0161>";"<U0073><U0068>"
<U0449> "<U0161><U0074><U0161>";"<U0073><U0068><U0074><U0073><U0068>"
<U044A> ""
<U044C> ""
<U044D> "<U0065>"
<U044E> "<U006A><U0075>"
<U044F> "<U006A><U0061>"
<U0451> "<U006A><U006F>"
Thanks,
--
Marko Myllynen
Rafal Luzynski
2018-10-09 22:08:56 UTC
Permalink
Post by Marko Myllynen
Post by Rafal Luzynski
If you refer to other languages than Russian which also use the Cyrillic
alphabet but need a different transliteration rules than Russian for
the same characters then it is OK for me now. I am afraid that the iconv
algorithm does not handle such case. Of course, we should add this missing
feature eventually but I do not volunteer to do it now.
Yes, this would be needed for correct transliteration of different
languages, and this might be quite a bit of work. There's also the case
of transliteration and character sets, consider the transliteration
Russian: Борис Николаевич Ельцин
Int'l: Boris Nikolaevič Elʹcin
Finnish: Boris Nikolajevitš Jeltsin
French: Boris Nikolaïevitch Ieltsine
Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]
No, I did not mean the transcription using the rules of the destination
locale using Latin but that the rules of transliteration may be different
depending on the language of the source text. For example, consider
this Cyrillic string: "нъг" (I'm not telling that it is actually used
in any existing word but still must be handled). By our transliteration
rules it will be transliterated as "n``g". But this is fine for Russian;
if we knew that the source string is Ukrainian it would be transliterated
as "n``h"; if it was Bulgarian it would be transliterated as "năg".
Similarly, if you had to transliterate the Latin letters "sch" to Cyrillic
first you would have to ask what was be the source language.

Unfortunately, I think that distinction of the source language is impossible
at the moment so let's assume that we fall back to Russian if there is
any ambiguity.

Regards,

Rafal
Marko Myllynen
2018-10-10 11:21:46 UTC
Permalink
Hi,
Post by Rafal Luzynski
Post by Marko Myllynen
Post by Rafal Luzynski
If you refer to other languages than Russian which also use the Cyrillic
alphabet but need a different transliteration rules than Russian for
the same characters then it is OK for me now. I am afraid that the iconv
algorithm does not handle such case. Of course, we should add this missing
feature eventually but I do not volunteer to do it now.
Yes, this would be needed for correct transliteration of different
languages, and this might be quite a bit of work. There's also the case
of transliteration and character sets, consider the transliteration
Russian: Борис Николаевич Ельцин
Int'l: Boris Nikolaevič Elʹcin
Finnish: Boris Nikolajevitš Jeltsin
French: Boris Nikolaïevitch Ieltsine
Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]
No, I did not mean the transcription using the rules of the destination
locale using Latin but that the rules of transliteration may be different
depending on the language of the source text.
Yes, I mentioned this case in my earlier email:

https://sourceware.org/ml/libc-alpha/2018-10/msg00083.html
Post by Rafal Luzynski
this Cyrillic string: "нъг" (I'm not telling that it is actually used
in any existing word but still must be handled). By our transliteration
rules it will be transliterated as "n``g". But this is fine for Russian;
if we knew that the source string is Ukrainian it would be transliterated
as "n``h"; if it was Bulgarian it would be transliterated as "năg".
And according to SFS 4900, in fi_FI for this string we would see for
Russian ng, for Ukrainian nh, and for Bulgarian năg.
Post by Rafal Luzynski
Unfortunately, I think that distinction of the source language is impossible
at the moment so let's assume that we fall back to Russian if there is
any ambiguity.
Yeah, it's not optimal but probably the most decent compromise for now.

Thanks,
--
Marko Myllynen
Marko Myllynen
2018-10-11 10:10:00 UTC
Permalink
Hi,
Post by Marko Myllynen
$ echo ж | LC_ALL=fi_FI.UTF-8 iconv -f UTF-8 -t UTF-8//TRANSLIT_FORCE
ž
That is, force transliteration of each character (if defined) even if
it's part of the target character set. AFAICS this is not currently
possible.
FWIW, this is currently not possible with iconv(1) but uconv(1) supports
this with -x (AFAICS it's using ICU not glibc locale data):

https://en.wikipedia.org/wiki/uconv
https://linux.die.net/man/1/uconv
https://github.com/unicode-org/icu/tree/master/icu4c/source/extra/uconv

Cheers,
--
Marko Myllynen
Marko Myllynen
2018-10-05 11:54:10 UTC
Permalink
Hi,

Would it make sense to first use ISO 9:1995/GOST 7.79 System A if
possible and if not, then fall back to GOST 7.79 System B?

Implementation-wise current translit_* files have few examples where a
non-ASCII transliteration is tried first before an ASCII fallback. These
examples are from translit_neutral:

% NARROW NO-BREAK SPACE
<U202F> <U00A0>;<U0020>
% REVERSED TRIPLE PRIME
<U2037> "<U2035><U2035><U2035>";"<U0060><U0060><U0060>"

Thanks,
Post by Egor Kobylkin
Keld,Marko,Rafal, other locale maintainers,
this all is written with having in mind a minimal viable fix for this
bug asap. I want to avoid wasting maintainers time getting into
fundamental discussions here (although for perfectly good reasons).
1. those locale maintainers that are fine with using ISO
9:1995/GOST_7.79_System_B cyrillic transliteration table (Ru) include it
in their locales (see attached screenshot of the table).
2. those that that want to have a differing table can create their own
variety based on the spreadsheet I have prepared
https://sourceware.org/bugzilla/attachment.cgi?id=8590 and include it in
this patch.
3. those that want to omit a cyrillic transliteration altogether for now
state so and just carry over the bug #2872 from the year 2006.
Does this make sense to you?
Just to be super clear on this: the patch is a stopgap _ASCII_
transliteration table. ASCII being AMERICAN Standard Code for
Information Interchange, that is obviously orthogonal to any
transliteration rule of other countries. As such it is not explicitly
targeting transliteration standards of any country.
The fact that the patch is reflecting Russian variety of ISO
9:1995/GOST_7.79_System_B is because a) ISO 9:1995/GOST_7.79_System_B is
available and can be helpful to a majority of cyrillic users b) I have
access to it including via being proficient in Russian.
It is offered to all the respective locale maintainers as a stopgap
solution. Stopgap in the sense that it is better to have some
transliteration than not to have any at all and carry over the bug from
2006. That it may be a somewhat officially correct transliteration for
ru_RU is a bonus. In that sense I would dub the discussion on the
correctness for other languages "offtopic". Let me know if this is not OK.
You are all are correctly mentioning the deficiencies of this approach.
However, I couldn't find a better straightforward approach as of yet.
Happy to hear from you as on how this could be handled.
There is a danger of being caught in the web of language/country
differences. I propose just pruning the locales that are not comfortable
including this current table. We can address possible solutions in the
second wave of patching.
I am vary of getting into discussions on specific country variants just
because of the sheer complexity of this topic. It is probably better
addressed by respective maintainers of their locales. I do not see a
"one fits all" solution in this first wave possible.
I would like to have this "three options plan of action" vetted first
and then we could go to the specific detail. (Like, for instance, what
characters should be included in to the table, and in which
transliteration form.)
I am looking forward to your reply,
Egor Kobylkin
P.S. specifically as to how address languages other than Ru included in
GOST_7.79_System_B: we can take the first option left to right from that
table (Ru,By,Uk,Bg,Mk). Then it will technically work for all those
locales/languages but with errors where Ru supersedes their own variants.
Post by Rafal Luzynski
Post by Egor Kobylkin
Post by Keld Simonsen
Hi
Please note that translitteration of Cyrillic to latin is not universal.
There are different schemes for for example German, English and Danish, and
there is also an ISO standard for it.
Thanks for your feedback, Keld!
Could the locale maintainers that wouldn't like to include this patch
explicitly state so here?
I think it is about me so I must reply. I am sorry about that and the sole
reason is my lack of time. I'm just a volunteer here, that means it's not
my regular job to work on locale data nor anything in glibc nor in any other
open source project. I do these things only in my free time which I don't
have much. Of course you will see my contributions here and there but they
are either trivial or take me months to complete. Your patches are on my
radar but I can't tell any ETA for them. Of course, there are other people
around here and they are all welcome to come and join.
Post by Egor Kobylkin
- In the case that there is a different preferred cyrillic
transliteration table for any specific locale their maintainers may want
to point me to it so I can supply a separate table/patch.
- Or they could state explicitly that for some reason they would like to
exclude their locale from the patch for a default cyrillic
transliteration altogether.
As Keld wrote, there are probably separate rules for every language so
I don't think you should treat your rules as universal and include them
in every locale. At first sight, it seems to me they work only for English
(as a destination locale). Also, although it is called "transliteration
from Cyrillic" it seems that it covers only Russian alphabet. What about
other languages which use Cyrillic alphabet but add their own diacritic
characters? Think about Belarusian, Ukrainian, Serbian, Chechen, Chuvash,
Mari, Ossetian, Yakut, Tatar, and more. What about languages which use
Cyrillic alphabet but transliterate their respective letters in a different
way than Russian? For example, Russian "Ъ" is (I think) usually skipped
in transliteration, I think you propose "``", but when transliterating from
Bulgarian they usually transliterate this as "ă".
* I think you transliterate "щ" as "shh", wouldn't "shch" be better?
* You transliterate "ц" as "cz", wouldn't "ts" be better? By the way,
in Polish language "cz" is a correct transliteration of "ч".
* You transliterate "й" as "j", this is fine in many languages but wouldn't
"y" be better in English?
* In case of "е": how will you know if it is correct to transliterate it
to "e" or "ie" or "je" or "ye"?
These remarks are obviously incomplete, your patch deserves much more
attention to review.
Best regards,
Rafal
--
Marko Myllynen
Egor Kobylkin
2018-10-05 12:00:02 UTC
Permalink
Hi Marko,

I have chosen the System B because it is ASCII compartible. System A is
not ASCII compartible (diacritics in target).

https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A
"GOST 7.79 contains two transliteration tables.

System A
one Cyrillic character to one Latin character, some with diacritics
– identical to ISO 9:1995

System B
one Cyrillic character to one or many Latin characters without
diacritics
"
Hope this helps,
Egor
Post by Marko Myllynen
Hi,
Would it make sense to first use ISO 9:1995/GOST 7.79 System A if
possible and if not, then fall back to GOST 7.79 System B?
Implementation-wise current translit_* files have few examples where a
non-ASCII transliteration is tried first before an ASCII fallback. These
% NARROW NO-BREAK SPACE
<U202F> <U00A0>;<U0020>
% REVERSED TRIPLE PRIME
<U2037> "<U2035><U2035><U2035>";"<U0060><U0060><U0060>"
Thanks,
Post by Egor Kobylkin
Keld,Marko,Rafal, other locale maintainers,
this all is written with having in mind a minimal viable fix for this
bug asap. I want to avoid wasting maintainers time getting into
fundamental discussions here (although for perfectly good reasons).
1. those locale maintainers that are fine with using ISO
9:1995/GOST_7.79_System_B cyrillic transliteration table (Ru) include it
in their locales (see attached screenshot of the table).
2. those that that want to have a differing table can create their own
variety based on the spreadsheet I have prepared
https://sourceware.org/bugzilla/attachment.cgi?id=8590 and include it in
this patch.
3. those that want to omit a cyrillic transliteration altogether for now
state so and just carry over the bug #2872 from the year 2006.
Does this make sense to you?
Just to be super clear on this: the patch is a stopgap _ASCII_
transliteration table. ASCII being AMERICAN Standard Code for
Information Interchange, that is obviously orthogonal to any
transliteration rule of other countries. As such it is not explicitly
targeting transliteration standards of any country.
The fact that the patch is reflecting Russian variety of ISO
9:1995/GOST_7.79_System_B is because a) ISO 9:1995/GOST_7.79_System_B is
available and can be helpful to a majority of cyrillic users b) I have
access to it including via being proficient in Russian.
It is offered to all the respective locale maintainers as a stopgap
solution. Stopgap in the sense that it is better to have some
transliteration than not to have any at all and carry over the bug from
2006. That it may be a somewhat officially correct transliteration for
ru_RU is a bonus. In that sense I would dub the discussion on the
correctness for other languages "offtopic". Let me know if this is not OK.
You are all are correctly mentioning the deficiencies of this approach.
However, I couldn't find a better straightforward approach as of yet.
Happy to hear from you as on how this could be handled.
There is a danger of being caught in the web of language/country
differences. I propose just pruning the locales that are not comfortable
including this current table. We can address possible solutions in the
second wave of patching.
I am vary of getting into discussions on specific country variants just
because of the sheer complexity of this topic. It is probably better
addressed by respective maintainers of their locales. I do not see a
"one fits all" solution in this first wave possible.
I would like to have this "three options plan of action" vetted first
and then we could go to the specific detail. (Like, for instance, what
characters should be included in to the table, and in which
transliteration form.)
I am looking forward to your reply,
Egor Kobylkin
P.S. specifically as to how address languages other than Ru included in
GOST_7.79_System_B: we can take the first option left to right from that
table (Ru,By,Uk,Bg,Mk). Then it will technically work for all those
locales/languages but with errors where Ru supersedes their own variants.
Post by Rafal Luzynski
Post by Egor Kobylkin
Post by Keld Simonsen
Hi
Please note that translitteration of Cyrillic to latin is not universal.
There are different schemes for for example German, English and Danish, and
there is also an ISO standard for it.
Thanks for your feedback, Keld!
Could the locale maintainers that wouldn't like to include this patch
explicitly state so here?
I think it is about me so I must reply. I am sorry about that and the sole
reason is my lack of time. I'm just a volunteer here, that means it's not
my regular job to work on locale data nor anything in glibc nor in any other
open source project. I do these things only in my free time which I don't
have much. Of course you will see my contributions here and there but they
are either trivial or take me months to complete. Your patches are on my
radar but I can't tell any ETA for them. Of course, there are other people
around here and they are all welcome to come and join.
Post by Egor Kobylkin
- In the case that there is a different preferred cyrillic
transliteration table for any specific locale their maintainers may want
to point me to it so I can supply a separate table/patch.
- Or they could state explicitly that for some reason they would like to
exclude their locale from the patch for a default cyrillic
transliteration altogether.
As Keld wrote, there are probably separate rules for every language so
I don't think you should treat your rules as universal and include them
in every locale. At first sight, it seems to me they work only for English
(as a destination locale). Also, although it is called "transliteration
from Cyrillic" it seems that it covers only Russian alphabet. What about
other languages which use Cyrillic alphabet but add their own diacritic
characters? Think about Belarusian, Ukrainian, Serbian, Chechen, Chuvash,
Mari, Ossetian, Yakut, Tatar, and more. What about languages which use
Cyrillic alphabet but transliterate their respective letters in a different
way than Russian? For example, Russian "Ъ" is (I think) usually skipped
in transliteration, I think you propose "``", but when transliterating from
Bulgarian they usually transliterate this as "ă".
* I think you transliterate "щ" as "shh", wouldn't "shch" be better?
* You transliterate "ц" as "cz", wouldn't "ts" be better? By the way,
in Polish language "cz" is a correct transliteration of "ч".
* You transliterate "й" as "j", this is fine in many languages but wouldn't
"y" be better in English?
* In case of "е": how will you know if it is correct to transliterate it
to "e" or "ie" or "je" or "ye"?
These remarks are obviously incomplete, your patch deserves much more
attention to review.
Best regards,
Rafal
Marko Myllynen
2018-10-05 12:21:09 UTC
Permalink
Hi,

The scheme I proposed would also be ASCII compatible; consider this example:

% CYRILLIC CAPITAL LETTER SHA
<U0428> "<U0160>";"<U0053><U0068>"

"printf \\u0428\\n | iconv -f UTF-8 -t ISO-8859-15//TRANSLIT | iconv -f
ISO-8859-15 -t UTF-8" would produce Š as per System A and "printf
\\u0428\\n | iconv -f UTF-8 -t ASCII//TRANSLIT" would produce Sh as per
System B.

Thanks,
Post by Egor Kobylkin
Hi Marko,
I have chosen the System B because it is ASCII compartible. System A is
not ASCII compartible (diacritics in target).
https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A
"GOST 7.79 contains two transliteration tables.
System A
one Cyrillic character to one Latin character, some with diacritics
– identical to ISO 9:1995
System B
one Cyrillic character to one or many Latin characters without
diacritics
"
Hope this helps,
Egor
Post by Marko Myllynen
Hi,
Would it make sense to first use ISO 9:1995/GOST 7.79 System A if
possible and if not, then fall back to GOST 7.79 System B?
Implementation-wise current translit_* files have few examples where a
non-ASCII transliteration is tried first before an ASCII fallback. These
% NARROW NO-BREAK SPACE
<U202F> <U00A0>;<U0020>
% REVERSED TRIPLE PRIME
<U2037> "<U2035><U2035><U2035>";"<U0060><U0060><U0060>"
Thanks,
Post by Egor Kobylkin
Keld,Marko,Rafal, other locale maintainers,
this all is written with having in mind a minimal viable fix for this
bug asap. I want to avoid wasting maintainers time getting into
fundamental discussions here (although for perfectly good reasons).
1. those locale maintainers that are fine with using ISO
9:1995/GOST_7.79_System_B cyrillic transliteration table (Ru) include it
in their locales (see attached screenshot of the table).
2. those that that want to have a differing table can create their own
variety based on the spreadsheet I have prepared
https://sourceware.org/bugzilla/attachment.cgi?id=8590 and include it in
this patch.
3. those that want to omit a cyrillic transliteration altogether for now
state so and just carry over the bug #2872 from the year 2006.
Does this make sense to you?
Just to be super clear on this: the patch is a stopgap _ASCII_
transliteration table. ASCII being AMERICAN Standard Code for
Information Interchange, that is obviously orthogonal to any
transliteration rule of other countries. As such it is not explicitly
targeting transliteration standards of any country.
The fact that the patch is reflecting Russian variety of ISO
9:1995/GOST_7.79_System_B is because a) ISO 9:1995/GOST_7.79_System_B is
available and can be helpful to a majority of cyrillic users b) I have
access to it including via being proficient in Russian.
It is offered to all the respective locale maintainers as a stopgap
solution. Stopgap in the sense that it is better to have some
transliteration than not to have any at all and carry over the bug from
2006. That it may be a somewhat officially correct transliteration for
ru_RU is a bonus. In that sense I would dub the discussion on the
correctness for other languages "offtopic". Let me know if this is not OK.
You are all are correctly mentioning the deficiencies of this approach.
However, I couldn't find a better straightforward approach as of yet.
Happy to hear from you as on how this could be handled.
There is a danger of being caught in the web of language/country
differences. I propose just pruning the locales that are not comfortable
including this current table. We can address possible solutions in the
second wave of patching.
I am vary of getting into discussions on specific country variants just
because of the sheer complexity of this topic. It is probably better
addressed by respective maintainers of their locales. I do not see a
"one fits all" solution in this first wave possible.
I would like to have this "three options plan of action" vetted first
and then we could go to the specific detail. (Like, for instance, what
characters should be included in to the table, and in which
transliteration form.)
I am looking forward to your reply,
Egor Kobylkin
P.S. specifically as to how address languages other than Ru included in
GOST_7.79_System_B: we can take the first option left to right from that
table (Ru,By,Uk,Bg,Mk). Then it will technically work for all those
locales/languages but with errors where Ru supersedes their own variants.
Post by Rafal Luzynski
Post by Egor Kobylkin
Post by Keld Simonsen
Hi
Please note that translitteration of Cyrillic to latin is not universal.
There are different schemes for for example German, English and Danish, and
there is also an ISO standard for it.
Thanks for your feedback, Keld!
Could the locale maintainers that wouldn't like to include this patch
explicitly state so here?
I think it is about me so I must reply. I am sorry about that and the sole
reason is my lack of time. I'm just a volunteer here, that means it's not
my regular job to work on locale data nor anything in glibc nor in any other
open source project. I do these things only in my free time which I don't
have much. Of course you will see my contributions here and there but they
are either trivial or take me months to complete. Your patches are on my
radar but I can't tell any ETA for them. Of course, there are other people
around here and they are all welcome to come and join.
Post by Egor Kobylkin
- In the case that there is a different preferred cyrillic
transliteration table for any specific locale their maintainers may want
to point me to it so I can supply a separate table/patch.
- Or they could state explicitly that for some reason they would like to
exclude their locale from the patch for a default cyrillic
transliteration altogether.
As Keld wrote, there are probably separate rules for every language so
I don't think you should treat your rules as universal and include them
in every locale. At first sight, it seems to me they work only for English
(as a destination locale). Also, although it is called "transliteration
from Cyrillic" it seems that it covers only Russian alphabet. What about
other languages which use Cyrillic alphabet but add their own diacritic
characters? Think about Belarusian, Ukrainian, Serbian, Chechen, Chuvash,
Mari, Ossetian, Yakut, Tatar, and more. What about languages which use
Cyrillic alphabet but transliterate their respective letters in a different
way than Russian? For example, Russian "Ъ" is (I think) usually skipped
in transliteration, I think you propose "``", but when transliterating from
Bulgarian they usually transliterate this as "ă".
* I think you transliterate "щ" as "shh", wouldn't "shch" be better?
* You transliterate "ц" as "cz", wouldn't "ts" be better? By the way,
in Polish language "cz" is a correct transliteration of "ч".
* You transliterate "й" as "j", this is fine in many languages but wouldn't
"y" be better in English?
* In case of "е": how will you know if it is correct to transliterate it
to "e" or "ie" or "je" or "ye"?
These remarks are obviously incomplete, your patch deserves much more
attention to review.
Best regards,
Rafal
--
Marko Myllynen
Egor Kobylkin
2018-10-05 20:47:09 UTC
Permalink
After some kind help from Marko in the offline discussion
I realized the multi/single character approach I originally took was
against the of the iconv(1) logic anyway. So there is no harm in
dropping it and adopting Marko's suggestion instead. I will do so and
will resubmit the patch with ISO 9:1995/GOST 7.79 System A + fallback to
GOST 7.79 System B (for ASCII).

However this doesn't resolve the issue for ASCII part being different
for various locales. Again, I am offering the locale maintainers to let
me know if they want to 1) adopt the one I am supplying, 2) write their
own or 3) ignore the patch altogether. Your feedback is appreciated!
The first part (ISO-8859-15 or ASCII) defines the target encoding for
If the string //TRANSLIT is appended to to-encoding, characters
being converted are transliterated when needed and possible. This
means that when a character cannot be represented in the target
character set, it can be approximated through one or sev‐ eral
similar looking characters. Characters that are outside of the
target character set and cannot be transliterated are replaced
with a question mark (?) in the output.
So in the above examples, iconv(1) encounters the character U+0428
which is not part of either of the target encoding and since
//TRANSLIT is specified, iconv(1) tries transliteration according to
the rules defined above, in case of ASCII U+0160 is not part of the
target encoding so the next alternative is used.
Bests,
Egor Kobylkin
Hi,
% CYRILLIC CAPITAL LETTER SHA <U0428> "<U0160>";"<U0053><U0068>"
"printf \\u0428\\n | iconv -f UTF-8 -t ISO-8859-15//TRANSLIT | iconv
-f ISO-8859-15 -t UTF-8" would produce Š as per System A and "printf
\\u0428\\n | iconv -f UTF-8 -t ASCII//TRANSLIT" would produce Sh as
per System B.
Thanks,
Post by Egor Kobylkin
Hi Marko,
I have chosen the System B because it is ASCII compartible. System
A is not ASCII compartible (diacritics in target).
https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A
"GOST 7.79 contains two transliteration tables.
Post by Egor Kobylkin
System A one Cyrillic character to one Latin character, some with
diacritics – identical to ISO 9:1995
System B one Cyrillic character to one or many Latin characters
without diacritics " Hope this helps, Egor
Post by Marko Myllynen
Hi,
Would it make sense to first use ISO 9:1995/GOST 7.79 System A if
possible and if not, then fall back to GOST 7.79 System B?
Implementation-wise current translit_* files have few examples
where a non-ASCII transliteration is tried first before an ASCII
% NARROW NO-BREAK SPACE <U202F> <U00A0>;<U0020> % REVERSED
TRIPLE PRIME <U2037>
"<U2035><U2035><U2035>";"<U0060><U0060><U0060>"
Thanks,
Post by Egor Kobylkin
Keld,Marko,Rafal, other locale maintainers,
this all is written with having in mind a minimal viable fix
for this bug asap. I want to avoid wasting maintainers time
getting into fundamental discussions here (although for
perfectly good reasons).
I see three options: 1. those locale maintainers that are fine
with using ISO 9:1995/GOST_7.79_System_B cyrillic
transliteration table (Ru) include it in their locales (see
attached screenshot of the table). 2. those that that want to
have a differing table can create their own variety based on
the spreadsheet I have prepared
https://sourceware.org/bugzilla/attachment.cgi?id=8590 and
include it in this patch. 3. those that want to omit a
cyrillic transliteration altogether for now state so and just
carry over the bug #2872 from the year 2006.
Does this make sense to you?
Just to be super clear on this: the patch is a stopgap _ASCII_
transliteration table. ASCII being AMERICAN Standard Code for
Information Interchange, that is obviously orthogonal to any
transliteration rule of other countries. As such it is not
explicitly targeting transliteration standards of any country.
The fact that the patch is reflecting Russian variety of ISO
9:1995/GOST_7.79_System_B is because a) ISO
9:1995/GOST_7.79_System_B is available and can be helpful to a
majority of cyrillic users b) I have access to it including
via being proficient in Russian.
It is offered to all the respective locale maintainers as a
stopgap solution. Stopgap in the sense that it is better to
have some transliteration than not to have any at all and
carry over the bug from 2006. That it may be a somewhat
officially correct transliteration for ru_RU is a bonus. In
that sense I would dub the discussion on the correctness for
other languages "offtopic". Let me know if this is not OK.
You are all are correctly mentioning the deficiencies of this
approach. However, I couldn't find a better straightforward
approach as of yet. Happy to hear from you as on how this
could be handled.
There is a danger of being caught in the web of
language/country differences. I propose just pruning the
locales that are not comfortable including this current table.
We can address possible solutions in the second wave of
patching.
I am vary of getting into discussions on specific country
variants just because of the sheer complexity of this topic.
It is probably better addressed by respective maintainers of
their locales. I do not see a "one fits all" solution in this
first wave possible.
I would like to have this "three options plan of action"
vetted first and then we could go to the specific detail.
(Like, for instance, what characters should be included in to
the table, and in which transliteration form.)
I am looking forward to your reply, Egor Kobylkin
P.S. specifically as to how address languages other than Ru
included in GOST_7.79_System_B: we can take the first option
left to right from that table (Ru,By,Uk,Bg,Mk). Then it will
technically work for all those locales/languages but with
errors where Ru supersedes their own variants.
Post by Rafal Luzynski
Post by Egor Kobylkin
Post by Keld Simonsen
Hi
Please note that translitteration of Cyrillic to latin
is not universal. There are different schemes for for
example German, English and Danish, and there is also an
ISO standard for it.
Thanks for your feedback, Keld!
Could the locale maintainers that wouldn't like to include
this patch explicitly state so here?
I think it is about me so I must reply. I am sorry about
that and the sole reason is my lack of time. I'm just a
volunteer here, that means it's not my regular job to work
on locale data nor anything in glibc nor in any other open
source project. I do these things only in my free time
which I don't have much. Of course you will see my
contributions here and there but they are either trivial or
take me months to complete. Your patches are on my radar but
I can't tell any ETA for them. Of course, there are other
people around here and they are all welcome to come and
join.
Post by Egor Kobylkin
That is: - In the case that there is a different preferred
cyrillic transliteration table for any specific locale
their maintainers may want to point me to it so I can
supply a separate table/patch. - Or they could state
explicitly that for some reason they would like to exclude
their locale from the patch for a default cyrillic
transliteration altogether.
As Keld wrote, there are probably separate rules for every
language so I don't think you should treat your rules as
universal and include them in every locale. At first sight,
it seems to me they work only for English (as a destination
locale). Also, although it is called "transliteration from
Cyrillic" it seems that it covers only Russian alphabet. What
about other languages which use Cyrillic alphabet but add
their own diacritic characters? Think about Belarusian,
Ukrainian, Serbian, Chechen, Chuvash, Mari, Ossetian, Yakut,
Tatar, and more. What about languages which use Cyrillic
alphabet but transliterate their respective letters in a
different way than Russian? For example, Russian "Ъ" is (I
think) usually skipped in transliteration, I think you
propose "``", but when transliterating from Bulgarian they
usually transliterate this as "ă".
* I think you transliterate "щ" as "shh", wouldn't "shch" be
better? * You transliterate "ц" as "cz", wouldn't "ts" be
better? By the way, in Polish language "cz" is a correct
transliteration of "ч". * You transliterate "й" as "j", this
is fine in many languages but wouldn't "y" be better in
English? * In case of "е": how will you know if it is
correct to transliterate it to "e" or "ie" or "je" or "ye"?
These remarks are obviously incomplete, your patch deserves
much more attention to review.
Best regards,
Rafal
Marko Myllynen
2018-10-08 12:40:53 UTC
Permalink
Hi,

Thanks for the update. I have few mostly cosmetic comments below,
hopefully we'll hear from others whether they agree with this direction.

- Please add the standard glibc locale header (see the existing
translit_* files for reference)
- Consider wrapping the header lines at or around column 70-72
- Consider describing which characters, character ranges, or blocks are
supported (perhaps also describe why some of those are not included, see
e.g. https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
- Please remove trailing whitespaces and spaces after ;
- No duplicates:

% CYRILLIC SMALL LETTER IE
<U0435> <U0065>; <U0065>

should become:

% CYRILLIC SMALL LETTER IE
<U0435> <U0065>

- There are few issues with the definitions:

% CYRILLIC CAPITAL LETTER U
<U0423> <U0055>; <U0055>
% CYRILLIC UNDEFINED
<U0423><U0423> <U00DA>; "<U0055><U0060>"

% CYRILLIC SMALL LETTER U
<U0443> <U0075>; <U0075>
% CYRILLIC UNDEFINED
<U0443><U0443> <U00FA>; "<U0075><U0060>"

I wonder would it be possible to automate generation of this file so
that issues like the above could avoided? But perhaps that could be the
next step once this initial patch lands.

Thanks,
Post by Egor Kobylkin
After some kind help from Marko in the offline discussion
I realized the multi/single character approach I originally took was
against the of the iconv(1) logic anyway. So there is no harm in
dropping it and adopting Marko's suggestion instead. I will do so and
will resubmit the patch with ISO 9:1995/GOST 7.79 System A + fallback to
GOST 7.79 System B (for ASCII).
However this doesn't resolve the issue for ASCII part being different
for various locales. Again, I am offering the locale maintainers to let
me know if they want to 1) adopt the one I am supplying, 2) write their
own or 3) ignore the patch altogether. Your feedback is appreciated!
The first part (ISO-8859-15 or ASCII) defines the target encoding for
If the string //TRANSLIT is appended to to-encoding, characters
being converted are transliterated when needed and possible. This
means that when a character cannot be represented in the target
character set, it can be approximated through one or sev‐ eral
similar looking characters. Characters that are outside of the
target character set and cannot be transliterated are replaced
with a question mark (?) in the output.
So in the above examples, iconv(1) encounters the character U+0428
which is not part of either of the target encoding and since
//TRANSLIT is specified, iconv(1) tries transliteration according to
the rules defined above, in case of ASCII U+0160 is not part of the
target encoding so the next alternative is used.
Bests,
Egor Kobylkin
Hi,
% CYRILLIC CAPITAL LETTER SHA <U0428> "<U0160>";"<U0053><U0068>"
"printf \\u0428\\n | iconv -f UTF-8 -t ISO-8859-15//TRANSLIT | iconv
-f ISO-8859-15 -t UTF-8" would produce Š as per System A and "printf
\\u0428\\n | iconv -f UTF-8 -t ASCII//TRANSLIT" would produce Sh as
per System B.
Thanks,
Post by Egor Kobylkin
Hi Marko,
I have chosen the System B because it is ASCII compartible. System
A is not ASCII compartible (diacritics in target).
https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A
"GOST 7.79 contains two transliteration tables.
Post by Egor Kobylkin
System A one Cyrillic character to one Latin character, some with
diacritics – identical to ISO 9:1995
System B one Cyrillic character to one or many Latin characters
without diacritics " Hope this helps, Egor
Post by Marko Myllynen
Hi,
Would it make sense to first use ISO 9:1995/GOST 7.79 System A if
possible and if not, then fall back to GOST 7.79 System B?
Implementation-wise current translit_* files have few examples
where a non-ASCII transliteration is tried first before an ASCII
% NARROW NO-BREAK SPACE <U202F> <U00A0>;<U0020> % REVERSED
TRIPLE PRIME <U2037>
"<U2035><U2035><U2035>";"<U0060><U0060><U0060>"
Thanks,
Post by Egor Kobylkin
Keld,Marko,Rafal, other locale maintainers,
this all is written with having in mind a minimal viable fix
for this bug asap. I want to avoid wasting maintainers time
getting into fundamental discussions here (although for
perfectly good reasons).
I see three options: 1. those locale maintainers that are fine
with using ISO 9:1995/GOST_7.79_System_B cyrillic
transliteration table (Ru) include it in their locales (see
attached screenshot of the table). 2. those that that want to
have a differing table can create their own variety based on
the spreadsheet I have prepared
https://sourceware.org/bugzilla/attachment.cgi?id=8590 and
include it in this patch. 3. those that want to omit a
cyrillic transliteration altogether for now state so and just
carry over the bug #2872 from the year 2006.
Does this make sense to you?
Just to be super clear on this: the patch is a stopgap _ASCII_
transliteration table. ASCII being AMERICAN Standard Code for
Information Interchange, that is obviously orthogonal to any
transliteration rule of other countries. As such it is not
explicitly targeting transliteration standards of any country.
The fact that the patch is reflecting Russian variety of ISO
9:1995/GOST_7.79_System_B is because a) ISO
9:1995/GOST_7.79_System_B is available and can be helpful to a
majority of cyrillic users b) I have access to it including
via being proficient in Russian.
It is offered to all the respective locale maintainers as a
stopgap solution. Stopgap in the sense that it is better to
have some transliteration than not to have any at all and
carry over the bug from 2006. That it may be a somewhat
officially correct transliteration for ru_RU is a bonus. In
that sense I would dub the discussion on the correctness for
other languages "offtopic". Let me know if this is not OK.
You are all are correctly mentioning the deficiencies of this
approach. However, I couldn't find a better straightforward
approach as of yet. Happy to hear from you as on how this
could be handled.
There is a danger of being caught in the web of
language/country differences. I propose just pruning the
locales that are not comfortable including this current table.
We can address possible solutions in the second wave of
patching.
I am vary of getting into discussions on specific country
variants just because of the sheer complexity of this topic.
It is probably better addressed by respective maintainers of
their locales. I do not see a "one fits all" solution in this
first wave possible.
I would like to have this "three options plan of action"
vetted first and then we could go to the specific detail.
(Like, for instance, what characters should be included in to
the table, and in which transliteration form.)
I am looking forward to your reply, Egor Kobylkin
P.S. specifically as to how address languages other than Ru
included in GOST_7.79_System_B: we can take the first option
left to right from that table (Ru,By,Uk,Bg,Mk). Then it will
technically work for all those locales/languages but with
errors where Ru supersedes their own variants.
Post by Rafal Luzynski
Post by Egor Kobylkin
Post by Keld Simonsen
Hi
Please note that translitteration of Cyrillic to latin
is not universal. There are different schemes for for
example German, English and Danish, and there is also an
ISO standard for it.
Thanks for your feedback, Keld!
Could the locale maintainers that wouldn't like to include
this patch explicitly state so here?
I think it is about me so I must reply. I am sorry about
that and the sole reason is my lack of time. I'm just a
volunteer here, that means it's not my regular job to work
on locale data nor anything in glibc nor in any other open
source project. I do these things only in my free time
which I don't have much. Of course you will see my
contributions here and there but they are either trivial or
take me months to complete. Your patches are on my radar but
I can't tell any ETA for them. Of course, there are other
people around here and they are all welcome to come and
join.
Post by Egor Kobylkin
That is: - In the case that there is a different preferred
cyrillic transliteration table for any specific locale
their maintainers may want to point me to it so I can
supply a separate table/patch. - Or they could state
explicitly that for some reason they would like to exclude
their locale from the patch for a default cyrillic
transliteration altogether.
As Keld wrote, there are probably separate rules for every
language so I don't think you should treat your rules as
universal and include them in every locale. At first sight,
it seems to me they work only for English (as a destination
locale). Also, although it is called "transliteration from
Cyrillic" it seems that it covers only Russian alphabet. What
about other languages which use Cyrillic alphabet but add
their own diacritic characters? Think about Belarusian,
Ukrainian, Serbian, Chechen, Chuvash, Mari, Ossetian, Yakut,
Tatar, and more. What about languages which use Cyrillic
alphabet but transliterate their respective letters in a
different way than Russian? For example, Russian "Ъ" is (I
think) usually skipped in transliteration, I think you
propose "``", but when transliterating from Bulgarian they
usually transliterate this as "ă".
* I think you transliterate "щ" as "shh", wouldn't "shch" be
better? * You transliterate "ц" as "cz", wouldn't "ts" be
better? By the way, in Polish language "cz" is a correct
transliteration of "ч". * You transliterate "й" as "j", this
is fine in many languages but wouldn't "y" be better in
English? * In case of "е": how will you know if it is
correct to transliterate it to "e" or "ie" or "je" or "ye"?
These remarks are obviously incomplete, your patch deserves
much more attention to review.
Best regards,
Rafal
--
Marko Myllynen
Rafal Luzynski
2018-10-08 22:23:42 UTC
Permalink
Post by Marko Myllynen
Hi,
Thanks for the update. I have few mostly cosmetic comments below,
hopefully we'll hear from others whether they agree with this direction.
- Please add the standard glibc locale header (see the existing
translit_* files for reference)
- Consider wrapping the header lines at or around column 70-72
- Consider describing which characters, character ranges, or blocks are
supported (perhaps also describe why some of those are not included, see
e.g. https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
- Please remove trailing whitespaces and spaces after ;
Thanks for this, Marko. While at this, in the ChangeLog and in the commit
message these paths:

* locales/aa_DJ: likewise

1. Should be a relative path starting in the root directory of glibc source,
that is: "* localedata/locales/aa_DJ".
2. Should be "Likewise." (starting with an uppercase and ending with a dot).
Post by Marko Myllynen
% CYRILLIC SMALL LETTER IE
<U0435> <U0065>; <U0065>
% CYRILLIC SMALL LETTER IE
<U0435> <U0065>
% CYRILLIC CAPITAL LETTER U
<U0423> <U0055>; <U0055>
% CYRILLIC UNDEFINED
<U0423><U0423> <U00DA>; "<U0055><U0060>"
% CYRILLIC SMALL LETTER U
<U0443> <U0075>; <U0075>
% CYRILLIC UNDEFINED
<U0443><U0443> <U00FA>; "<U0075><U0060>"
Are the duplicates here because some Cyrillic letters may have multiple
Latin transliterations depending on the context, for example Cyrillic IE
must be transliterated sometimes as "e", sometimes as "ie", sometimes
as "ye" or "je"? Can we provide rules for groups of characters instead?
Post by Marko Myllynen
I wonder would it be possible to automate generation of this file so
that issues like the above could avoided? But perhaps that could be the
next step once this initial patch lands.
I agree with this.

Regards,

Rafal
Egor Kobylkin
2018-10-08 23:35:57 UTC
Permalink
Post by Rafal Luzynski
Post by Marko Myllynen
Hi,
Thanks for the update. I have few mostly cosmetic comments below,
hopefully we'll hear from others whether they agree with this direction.
Yeah, the earlier we have feedback the more productive we are. I'd be
happy to get much feedback on this as early as possible. So please
everybody concerned please chime in.
Post by Rafal Luzynski
Post by Marko Myllynen
% CYRILLIC SMALL LETTER IE
<U0435> <U0065>; <U0065>
% CYRILLIC SMALL LETTER IE
<U0435> <U0065>
% CYRILLIC CAPITAL LETTER U
<U0423> <U0055>; <U0055>
% CYRILLIC UNDEFINED
<U0423><U0423> <U00DA>; "<U0055><U0060>"
% CYRILLIC SMALL LETTER U
<U0443> <U0075>; <U0075>
% CYRILLIC UNDEFINED
<U0443><U0443> <U00FA>; "<U0075><U0060>"
Are the duplicates here because some Cyrillic letters may have multiple
Latin transliterations depending on the context, for example Cyrillic IE
must be transliterated sometimes as "e", sometimes as "ie", sometimes
as "ye" or "je"? Can we provide rules for groups of characters instead?
No, the duplicates are just by design of my line generating logic. I
have fixed (removed) them. The varying transcription between
languages/locales can not be handled in one file at all as far as I
understood.
Post by Rafal Luzynski
Post by Marko Myllynen
I wonder would it be possible to automate generation of this file so
that issues like the above could avoided? But perhaps that could be the
next step once this initial patch lands.
I am generating the content part of the translit_cyrillc from the
LibreOffice Spreadsheet. Not sure if you had time to view it by now?
https://sourceware.org/bugzilla/attachment.cgi?id=11299

Anyway I have just fixed the issues identified by Marko above in that
spreadsheet. I will do the changes for the below request and then upload
the new translit_cyrillic file to the bugzilla.
Post by Rafal Luzynski
Post by Marko Myllynen
- Please add the standard glibc locale header (see the existing
translit_* files for reference)
- Consider wrapping the header lines at or around column 70-72
- Consider describing which characters, character ranges, or blocks are
supported (perhaps also describe why some of those are not included, see
e.g. https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
- Please remove trailing whitespaces and spaces after ;
Thanks for this, Marko. While at this, in the ChangeLog and in the commit
* locales/aa_DJ: likewise
1. Should be a relative path starting in the root directory of glibc source,
that is: "* localedata/locales/aa_DJ".
2. Should be "Likewise." (starting with an uppercase and ending with a dot).
will do.

Bests,
Egor
Egor Kobylkin
2018-10-09 13:18:04 UTC
Permalink
Hi,

I have now implemented all the changes requested for translit_cyrillic
file but started hitting what seems like a bug:

- If the line <U0425> <U0048>;<U0058> is present in translt_cyrillic the
locale compilation fails i.e. grep CYRILLIC < $testfile |
LOCPATH=$workdir/compiled_locales/"$locale"/ LC_ALL="$locale".UTF-8
iconv -f UTF-8 -t ASCII//TRANSLIT is hanging frozen.

- If the line <U0425> <U0048>;<U0058> is absent from translit_cyrillic
everything works, just the transliteration of <U0425> fails as expected
(? is displayed)

- If translit_cyrillic contains <U0425> <U0048>;<U0058> as the _only_
line the transliteration of <U0425> works again (others as ?).

Would you have any idea into what direction should I look? The new
translit_cyrillic is attached.

(<U0425> is % CYRILLIC CAPITAL LETTER HA)

Best regards,
Egor
Post by Egor Kobylkin
Post by Rafal Luzynski
Post by Marko Myllynen
Hi,
Thanks for the update. I have few mostly cosmetic comments below,
hopefully we'll hear from others whether they agree with this direction.
Yeah, the earlier we have feedback the more productive we are. I'd be
happy to get much feedback on this as early as possible. So please
everybody concerned please chime in.
Post by Rafal Luzynski
Post by Marko Myllynen
% CYRILLIC SMALL LETTER IE
<U0435> <U0065>; <U0065>
% CYRILLIC SMALL LETTER IE
<U0435> <U0065>
% CYRILLIC CAPITAL LETTER U
<U0423> <U0055>; <U0055>
% CYRILLIC UNDEFINED
<U0423><U0423> <U00DA>; "<U0055><U0060>"
% CYRILLIC SMALL LETTER U
<U0443> <U0075>; <U0075>
% CYRILLIC UNDEFINED
<U0443><U0443> <U00FA>; "<U0075><U0060>"
Are the duplicates here because some Cyrillic letters may have multiple
Latin transliterations depending on the context, for example Cyrillic IE
must be transliterated sometimes as "e", sometimes as "ie", sometimes
as "ye" or "je"? Can we provide rules for groups of characters instead?
No, the duplicates are just by design of my line generating logic. I
have fixed (removed) them. The varying transcription between
languages/locales can not be handled in one file at all as far as I
understood.
Post by Rafal Luzynski
Post by Marko Myllynen
I wonder would it be possible to automate generation of this file so
that issues like the above could avoided? But perhaps that could be the
next step once this initial patch lands.
I am generating the content part of the translit_cyrillc from the
LibreOffice Spreadsheet. Not sure if you had time to view it by now?
https://sourceware.org/bugzilla/attachment.cgi?id=11299
Anyway I have just fixed the issues identified by Marko above in that
spreadsheet. I will do the changes for the below request and then upload
the new translit_cyrillic file to the bugzilla.
Post by Rafal Luzynski
Post by Marko Myllynen
- Please add the standard glibc locale header (see the existing
translit_* files for reference)
- Consider wrapping the header lines at or around column 70-72
- Consider describing which characters, character ranges, or blocks are
supported (perhaps also describe why some of those are not included, see
e.g. https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
- Please remove trailing whitespaces and spaces after ;
Thanks for this, Marko. While at this, in the ChangeLog and in the commit
* locales/aa_DJ: likewise
1. Should be a relative path starting in the root directory of glibc
source,
Post by Rafal Luzynski
that is: "* localedata/locales/aa_DJ".
2. Should be "Likewise." (starting with an uppercase and ending with a
dot).
will do.
Bests,
Egor
Egor Kobylkin
2018-10-09 18:34:08 UTC
Permalink
The culprits were the "" around the "<U0423><U0301>" (<U00DA>) and
"<U0443><U0301>" (<U00FA>).
It works now with
% CYRILLIC UNDEFINED
<U0423><U0301> <U00DA>;"<U0055><U0060>"
% CYRILLIC UNDEFINED
<U0443><U0301> <U00FA>;"<U0075><U0060>"

The <U0301> is "combining" and obviously it doesn't work if enclosed in
quotes with the letter codepoint. Please let me know if there is another
explanation.

I will now make those changes and generate the patch itself.
Egor
Post by Egor Kobylkin
Hi,
I have now implemented all the changes requested for translit_cyrillic
- If the line <U0425> <U0048>;<U0058> is present in translt_cyrillic the
locale compilation fails i.e. grep CYRILLIC < $testfile |
LOCPATH=$workdir/compiled_locales/"$locale"/ LC_ALL="$locale".UTF-8
iconv -f UTF-8 -t ASCII//TRANSLIT is hanging frozen.
- If the line <U0425> <U0048>;<U0058> is absent from translit_cyrillic
everything works, just the transliteration of <U0425> fails as expected
(? is displayed)
- If translit_cyrillic contains <U0425> <U0048>;<U0058> as the _only_
line the transliteration of <U0425> works again (others as ?).
Would you have any idea into what direction should I look? The new
translit_cyrillic is attached.
(<U0425> is % CYRILLIC CAPITAL LETTER HA)
Best regards,
Egor
Post by Egor Kobylkin
Post by Rafal Luzynski
Post by Marko Myllynen
Hi,
Thanks for the update. I have few mostly cosmetic comments below,
hopefully we'll hear from others whether they agree with this direction.
Yeah, the earlier we have feedback the more productive we are. I'd be
happy to get much feedback on this as early as possible. So please
everybody concerned please chime in.
Post by Rafal Luzynski
Post by Marko Myllynen
% CYRILLIC SMALL LETTER IE
<U0435> <U0065>; <U0065>
% CYRILLIC SMALL LETTER IE
<U0435> <U0065>
% CYRILLIC CAPITAL LETTER U
<U0423> <U0055>; <U0055>
% CYRILLIC UNDEFINED
<U0423><U0423> <U00DA>; "<U0055><U0060>"
% CYRILLIC SMALL LETTER U
<U0443> <U0075>; <U0075>
% CYRILLIC UNDEFINED
<U0443><U0443> <U00FA>; "<U0075><U0060>"
Are the duplicates here because some Cyrillic letters may have multiple
Latin transliterations depending on the context, for example Cyrillic IE
must be transliterated sometimes as "e", sometimes as "ie", sometimes
as "ye" or "je"? Can we provide rules for groups of characters instead?
No, the duplicates are just by design of my line generating logic. I
have fixed (removed) them. The varying transcription between
languages/locales can not be handled in one file at all as far as I
understood.
Post by Rafal Luzynski
Post by Marko Myllynen
I wonder would it be possible to automate generation of this file so
that issues like the above could avoided? But perhaps that could be the
next step once this initial patch lands.
I am generating the content part of the translit_cyrillc from the
LibreOffice Spreadsheet. Not sure if you had time to view it by now?
https://sourceware.org/bugzilla/attachment.cgi?id=11299
Anyway I have just fixed the issues identified by Marko above in that
spreadsheet. I will do the changes for the below request and then upload
the new translit_cyrillic file to the bugzilla.
Post by Rafal Luzynski
Post by Marko Myllynen
- Please add the standard glibc locale header (see the existing
translit_* files for reference)
- Consider wrapping the header lines at or around column 70-72
- Consider describing which characters, character ranges, or blocks are
supported (perhaps also describe why some of those are not included, see
e.g. https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
- Please remove trailing whitespaces and spaces after ;
Thanks for this, Marko. While at this, in the ChangeLog and in the commit
* locales/aa_DJ: likewise
1. Should be a relative path starting in the root directory of glibc
source,
Post by Rafal Luzynski
that is: "* localedata/locales/aa_DJ".
2. Should be "Likewise." (starting with an uppercase and ending with a
dot).
will do.
Bests,
Egor
Rafal Luzynski
2018-10-09 22:17:36 UTC
Permalink
Post by Egor Kobylkin
The culprits were the "" around the "<U0423><U0301>" (<U00DA>) and
"<U0443><U0301>" (<U00FA>).
It works now with
% CYRILLIC UNDEFINED
<U0423><U0301> <U00DA>;"<U0055><U0060>"
% CYRILLIC UNDEFINED
<U0443><U0301> <U00FA>;"<U0075><U0060>"
[...]
I wonder why you need Cyrillic U with acute, and why you comment it
as "undefined" at all. I know that any Cyrillic vowel may appear with
an acute accent but "the diacritic is used only in dictionaries, children's
books, resources for foreign-language learners (...)". [1] So maybe
all vowels with an acute accent should be handled (which I think is fine)
rather than just U.

Regards,

Rafal


[1] https://en.wikipedia.org/wiki/Russian_alphabet#Diacritics
Egor Kobylkin
2018-10-09 22:40:31 UTC
Permalink
Post by Rafal Luzynski
Post by Egor Kobylkin
The culprits were the "" around the "<U0423><U0301>" (<U00DA>) and
"<U0443><U0301>" (<U00FA>).
It works now with
% CYRILLIC UNDEFINED
<U0423><U0301> <U00DA>;"<U0055><U0060>"
% CYRILLIC UNDEFINED
<U0443><U0301> <U00FA>;"<U0075><U0060>"
[...]
I wonder why you need Cyrillic U with acute, and why you comment it
as "undefined" at all. I know that any Cyrillic vowel may appear with
an acute accent but "the diacritic is used only in dictionaries, children's
books, resources for foreign-language learners (...)". [1] So maybe
all vowels with an acute accent should be handled (which I think is fine)
rather than just U.
I have just taken the https://en.wikipedia.org/wiki/ISO_9 table and
implemented it on Marko's suggestion. Personally I have no opinion on
what letters should be included and under what name. These funny Us just
happened to be in the ISO9 table.

There is no codepoint and no name for <U0423><U0301> and <U0443><U0301>
in Unicode. That’s why its coming through that way from my worksheet as
it does a reverse lookup on the names based on the Unicode codepoints.

Manually we can change it to whatever you’d suggest in the
translit_cyrillic. I just don’t know the right name.

On my side I think I have all outstanding tasks complete for the patch
https://sourceware.org/bugzilla/attachment.cgi?id=11144. So please let
me know explicitly if you'd like anything changed there.

I was planning to rewrite just the commit message according to your
earlier feedback and resubmit sometime soon.

Bests,
Diego
Egor Kobylkin
2018-10-09 22:42:58 UTC
Permalink
Ups, sorry, wrong link to the patch
correct link https://sourceware.org/bugzilla/attachment.cgi?id=11303
Post by Egor Kobylkin
Post by Rafal Luzynski
Post by Egor Kobylkin
The culprits were the "" around the "<U0423><U0301>" (<U00DA>) and
"<U0443><U0301>" (<U00FA>).
It works now with
% CYRILLIC UNDEFINED
<U0423><U0301> <U00DA>;"<U0055><U0060>"
% CYRILLIC UNDEFINED
<U0443><U0301> <U00FA>;"<U0075><U0060>"
[...]
I wonder why you need Cyrillic U with acute, and why you comment it
as "undefined" at all. I know that any Cyrillic vowel may appear with
an acute accent but "the diacritic is used only in dictionaries, children's
books, resources for foreign-language learners (...)". [1] So maybe
all vowels with an acute accent should be handled (which I think is fine)
rather than just U.
I have just taken the https://en.wikipedia.org/wiki/ISO_9 table and
implemented it on Marko's suggestion. Personally I have no opinion on
what letters should be included and under what name. These funny Us just
happened to be in the ISO9 table.
There is no codepoint and no name for <U0423><U0301> and <U0443><U0301>
in Unicode. That’s why its coming through that way from my worksheet as
it does a reverse lookup on the names based on the Unicode codepoints.
Manually we can change it to whatever you’d suggest in the
translit_cyrillic. I just don’t know the right name.
On my side I think I have all outstanding tasks complete for the patch
https://sourceware.org/bugzilla/attachment.cgi?id=11144. So please let
me know explicitly if you'd like anything changed there.
correct link https://sourceware.org/bugzilla/attachment.cgi?id=11303
Post by Egor Kobylkin
I was planning to rewrite just the commit message according to your
earlier feedback and resubmit sometime soon.
Bests,
Egor
Marko Myllynen
2018-10-10 11:22:59 UTC
Permalink
Hi,
Post by Egor Kobylkin
Ups, sorry, wrong link to the patch
correct link https://sourceware.org/bugzilla/attachment.cgi?id=11303
Although I haven't checked every rule this in general looks very good
(but see below). Not sure do we want to add the few missing characters
mentioned at https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode,
e.g., one instantly notices that U+0400 is missing. (I wouldn't add at
least initially the more exotic characters, like the historic ones,
though.) Perhaps filing a bug or two for these cases for separate
consideration would be ok.
Post by Egor Kobylkin
Post by Egor Kobylkin
Post by Rafal Luzynski
Post by Egor Kobylkin
The culprits were the "" around the "<U0423><U0301>" (<U00DA>) and
"<U0443><U0301>" (<U00FA>).
It works now with
% CYRILLIC UNDEFINED
<U0423><U0301> <U00DA>;"<U0055><U0060>"
% CYRILLIC UNDEFINED
<U0443><U0301> <U00FA>;"<U0075><U0060>"
[...]
I wonder why you need Cyrillic U with acute, and why you comment it
as "undefined" at all. I know that any Cyrillic vowel may appear with
an acute accent but "the diacritic is used only in dictionaries, children's
books, resources for foreign-language learners (...)". [1] So maybe
all vowels with an acute accent should be handled (which I think is fine)
rather than just U.
I have just taken the https://en.wikipedia.org/wiki/ISO_9 table and
implemented it on Marko's suggestion. Personally I have no opinion on
what letters should be included and under what name. These funny Us just
happened to be in the ISO9 table.
There is no codepoint and no name for <U0423><U0301> and <U0443><U0301>
in Unicode. That’s why its coming through that way from my worksheet as
it does a reverse lookup on the names based on the Unicode codepoints.
Manually we can change it to whatever you’d suggest in the
translit_cyrillic. I just don’t know the right name.
I'm not sure this will work, no existing rule in translit_* files
contain two characters, I'd assume that the rule for U+0423 is applied
first and then the below rule is never used.

% CYRILLIC UNDEFINED
<U0423><U0301> <U00DA>;"<U0055><U0060>"

Perhaps this should be commented out or removed altogether if it's not
working as intended.

Thanks,
--
Marko Myllynen
Egor Kobylkin
2018-10-10 12:19:37 UTC
Permalink
Post by Marko Myllynen
Post by Egor Kobylkin
correct link https://sourceware.org/bugzilla/attachment.cgi?id=11303
Although I haven't checked every rule this in general looks very good
(but see below).
Not sure do we want to add the few missing characters
mentioned at https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode,
e.g., one instantly notices that U+0400 is missing. (I wouldn't add at
least initially the more exotic characters, like the historic ones,
though.) Perhaps filing a bug or two for these cases for separate
consideration would be ok.
The question here is what should serve as their transliteration and
transcription?
They are not covered by ISO9 neither by GOST 7.79. So maybe it would be
reasonable to assume there is no notable occurrence of those anywhere?

Anyway I am happy to include your specific suggestions for all and any
Unicode quartets in this form:
[Cyrillic Unicode
; ISO9 Latin Transliteration (System A) as Unicode
; Transcription (System B) as (mulitcharacter)ASCII
; name to put in %COMMENT
].
Post by Marko Myllynen
Post by Egor Kobylkin
Post by Egor Kobylkin
Post by Rafal Luzynski
Post by Egor Kobylkin
The culprits were the "" around the "<U0423><U0301>" (<U00DA>) and
"<U0443><U0301>" (<U00FA>).
It works now with
% CYRILLIC UNDEFINED
<U0423><U0301> <U00DA>;"<U0055><U0060>"
% CYRILLIC UNDEFINED
<U0443><U0301> <U00FA>;"<U0075><U0060>"
[...]
I wonder why you need Cyrillic U with acute, and why you comment it
as "undefined" at all. I know that any Cyrillic vowel may appear with
an acute accent but "the diacritic is used only in dictionaries, children's
books, resources for foreign-language learners (...)". [1] So maybe
all vowels with an acute accent should be handled (which I think is fine)
rather than just U.
I have just taken the https://en.wikipedia.org/wiki/ISO_9 table and
implemented it on Marko's suggestion. Personally I have no opinion on
what letters should be included and under what name. These funny Us just
happened to be in the ISO9 table.
There is no codepoint and no name for <U0423><U0301> and <U0443><U0301>
in Unicode. That’s why its coming through that way from my worksheet as
it does a reverse lookup on the names based on the Unicode codepoints.
Manually we can change it to whatever you’d suggest in the
translit_cyrillic. I just don’t know the right name.
I'm not sure this will work, no existing rule in translit_* files
contain two characters, I'd assume that the rule for U+0423 is applied
first and then the below rule is never used.
% CYRILLIC UNDEFINED
<U0423><U0301> <U00DA>;"<U0055><U0060>"
Perhaps this should be commented out or removed altogether if it's not
working as intended.
here is a result of my test on
https://sourceware.org/bugzilla/attachment.cgi?id=11304

U0423 0301-У́ -> U0423 0301-U
U0443 0301-у́ -> U0443 0301-u

So yes, they are not processed. I would drop them to not to have special
cases. But I am also fine with keeping them because all work is done
already.

Result:
CYRILLIC RUSSIAN S``esh` eshhyo e`tih myagkih francuzskih bulok, da
vypej zhe chayu. SA`ESH` ESHHYO E`TIH MYAGKIH FRANCUZSKIH BULOK? DA
VYPEJ ZHE CHAYU!
CYRILLIC COMPLETE U0401-YO U0402-DJ U0403-G` U0404-Ye U0405-Z` U0406-I
U0407-Yi U0408-J U0409-L` U040A-N` U040B-TSH U040C-K` U040E-U` U040F-Dh
U0410-A U0411-B U0412-V U0413-G U0414-D U0415-E U0416-ZH U0417-Z U0418-I
U0419-J U041A-K U041B-L U041C-M U041D-N U041E-O U041F-P U0420-R U0421-S
U0422-T U0423-U U0423 0301-U U0424-F U0425-H U0426-C U0427-CH U0428-SH
U0429-SHH U042A-`` U042B-Y U042C-` U042D-E` U042E-YU U042F-YA U0430-a
U0431-b U0432-v U0433-g U0434-d U0435-e U0436-zh U0437-z U0438-i U0439-j
U043A-k U043B-l U043C-m U043D-n U043E-o U043F-p U0440-r U0441-s U0442-t
U0443-u U0443 0301-u U0444-f U0445-h U0446-c U0447-ch U0448-sh U0449-shh
U044A-A` U044B-y U044C-` U044D-e` U044E-yu U044F-ya U0451-yo U0452-dj
U0453-g` U0454-ye U0455-z` U0456-i U0457-yi U0458-j U0459-l` U045A-n`
U045B-tsh U045C-k` U045E-u` U045F-dh U046A-O` U046B-o` U0472-Fh U0473-fh
U0474-Yh U0475-yh U048C-E` U048D-e` U0490-G` U0491-g` U0492-GH U0493-gh
U0494-GH U0495-gh U0496-ZH` U0497-zh` U049A-K` U049B-k` U049E-K`
U049F-k` U04A2-N` U04A3-n` U04A4-NG U04A5-ng U04A6-P` U04A7-p` U04A8-O`
U04A9-o` U04AA-C` U04AB-C` U04AC-T` U04AD-t` U04AE-U U04AF-u U04B2-H`
U04B3-h` U04B4-TCZ U04B5-tcz U04BA-SH` U04BB-SH` U04BC-CH` U04BD-ch`
U04BE-CH` U04BF-ch` U04C0-i U04C1-ZH` U04C2-zh` U04CB-CH` U04CC-ch`
U04D0-A` U04D1-a` U04D2-A` U04D3-a` U04D6-E` U04D7-e` U04D8-A` U04D9-a`
U04DC-ZH` U04DD-zh` U04DE-Z` U04DF-z` U04E0-Z` U04E1-z` U04E4-I`
U04E5-i` U04E6-O` U04E7-o` U04E8-O` U04E9-o` U04F0-U` U04F1-u` U04F2-U`
U04F3-u` U04F4-CH` U04F5-ch` U04F8-Y` U04F9-y` U2019-'

Source:
CYRILLIC RUSSIAN Съешь ещё этих мягких французских булок, да выпей же
чаю. СЪЕШЬ ЕЩЁ ЭТИХ МЯГКИХ ФРАНЦУЗСКИХ БУЛОК? ДА ВЫПЕЙ ЖЕ ЧАЮ!
CYRILLIC COMPLETE U0401-Ё U0402-Ђ U0403-Ѓ U0404-Є U0405-Ѕ U0406-І
U0407-Ї U0408-Ј U0409-Љ U040A-Њ U040B-Ћ U040C-Ќ U040E-Ў U040F-Џ U0410-А
U0411-Б U0412-В U0413-Г U0414-Д U0415-Е U0416-Ж U0417-З U0418-И U0419-Й
U041A-К U041B-Л U041C-М U041D-Н U041E-О U041F-П U0420-Р U0421-С U0422-Т
U0423-У U0423 0301-У́ U0424-Ф U0425-Х U0426-Ц U0427-Ч U0428-Ш U0429-Щ
U042A-ъ U042B-Ы U042C-ь U042D-Э U042E-Ю U042F-Я U0430-а U0431-б U0432-в
U0433-г U0434-д U0435-е U0436-ж U0437-з U0438-и U0439-й U043A-к U043B-л
U043C-м U043D-н U043E-о U043F-п U0440-р U0441-с U0442-т U0443-у U0443
0301-у́ U0444-ф U0445-х U0446-ц U0447-ч U0448-ш U0449-щ U044A-Ъ U044B-ы
U044C-Ь U044D-э U044E-ю U044F-я U0451-ё U0452-ђ U0453-ѓ U0454-є U0455-ѕ
U0456-і U0457-ї U0458-ј U0459-љ U045A-њ U045B-ћ U045C-ќ U045E-ў U045F-џ
U046A-Ѫ U046B-ѫ U0472-Ѳ U0473-ѳ U0474-Ѵ U0475-ѵ U048C-Ҍ U048D-ҍ U0490-Ґ
U0491-ґ U0492-Ғ U0493-ғ U0494-Ҕ U0495-ҕ U0496-Җ U0497-җ U049A-Қ U049B-қ
U049E-Ҟ U049F-ҟ U04A2-Ң U04A3-ң U04A4-Ҥ U04A5-ҥ U04A6-Ҧ U04A7-ҧ U04A8-Ҩ
U04A9-ҩ U04AA-Ҫ U04AB-ҫ U04AC-Ҭ U04AD-ҭ U04AE-Ү U04AF-ү U04B2-Ҳ U04B3-ҳ
U04B4-Ҵ U04B5-ҵ U04BA-Һ U04BB-һ U04BC-Ҽ U04BD-ҽ U04BE-Ҿ U04BF-ҿ U04C0-Ӏ
U04C1-Ӂ U04C2-ӂ U04CB-Ӌ U04CC-ӌ U04D0-Ӑ U04D1-ӑ U04D2-Ӓ U04D3-ӓ U04D6-Ӗ
U04D7-ӗ U04D8-Ә U04D9-ә U04DC-Ӝ U04DD-ӝ U04DE-Ӟ U04DF-ӟ U04E0-Ӡ U04E1-ӡ
U04E4-Ӥ U04E5-ӥ U04E6-Ӧ U04E7-ӧ U04E8-Ө U04E9-ө U04F0-Ӱ U04F1-ӱ U04F2-Ӳ
U04F3-ӳ U04F4-Ӵ U04F5-ӵ U04F8-Ӹ U04F9-ӹ U2019-’
Marko Myllynen
2018-10-10 12:34:26 UTC
Permalink
Hi,
Post by Egor Kobylkin
Post by Marko Myllynen
Post by Egor Kobylkin
correct link https://sourceware.org/bugzilla/attachment.cgi?id=11303
Although I haven't checked every rule this in general looks very good
(but see below).
Not sure do we want to add the few missing characters
mentioned at https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode,
e.g., one instantly notices that U+0400 is missing. (I wouldn't add at
least initially the more exotic characters, like the historic ones,
though.) Perhaps filing a bug or two for these cases for separate
consideration would be ok.
The question here is what should serve as their transliteration and
transcription?
Not sure, so filing a separate bug about this once your patch is merged
might be the most suitable action for now, I don't think we want to
postpone merging your work further due to these non-ISO 9 cases.
Post by Egor Kobylkin
Post by Marko Myllynen
I'm not sure this will work, no existing rule in translit_* files
contain two characters, I'd assume that the rule for U+0423 is applied
first and then the below rule is never used.
% CYRILLIC UNDEFINED
<U0423><U0301> <U00DA>;"<U0055><U0060>"
Perhaps this should be commented out or removed altogether if it's not
working as intended.
So yes, they are not processed. I would drop them to not to have special
cases. But I am also fine with keeping them because all work is done
already.
I'd probably drop them but I don't feel strongly about this either way.

Thanks for your efforts, I don't have any further comments, I'll leave
this now for Rafal and Mike to provide additional feedback and hopefully
merge soon.

Thanks,
--
Marko Myllynen
Egor Kobylkin
2018-10-10 22:29:08 UTC
Permalink
Dear locale maintainers,

fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

add Cyrillic transliteration table translit_cyrillic file

https://sourceware.org/bugzilla/attachment.cgi?id=8591 [7]

to localedata/locales/ and include it in all your locales going forward.

Patch included inline below.

From this patch I have excluded locales that already mention cyrillic or
have a transliteration table for it:
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
***@cyrillic

Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.

Current bug effect:

The glibc wiki explicitly lists this use case as the test example

https://sourceware.org/glibc/wiki/Locales#Testing_Locales :

LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt

currently it fails on Cyrillic texts in most locales including ru_RU [1]
[8] [9]:

LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

- It produces a string of question marks and spaces.

This is what it should produce and it does so after the patch applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


Root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.



COMMIT MESSAGE:
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.

Examples: iconv -f UTF-8 -t ASCII//TRANSLIT will produce ASCII
compatible transcription and iconv -f UTF-8 -t ISO-8859-15//TRANSLIT |
iconv -f ISO-8859-15 -t UTF-8 will produce Latin transliteration as per
ISO 9.1995.

While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration/transcription has only Latin/ASCII codes but still can
be read by a native speaker. Among other things it is useful for
processing the Cyrillic texts and filenames by programs or on systems
that are not specifically prepared to work with Cyrillic, don't have
corresponding fonts installed or can't handle UTF-8.

The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on ISO 9.1995 standard [10] and its
derivative GOST 7.79-2000 official source (Federal Agency on Technical
Regulating and Metrology Of Russian Federation [2]). Technically an
independent but mostly identical source [3] was used and prepared in a
spreadsheet [6].

The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that have a
translit_start/end stance and generated a patch for them.

The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.

I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
Lisivka <***@gmail.com>, Max Kutny <***@gmail.com> (uk_UA),
ДаМОлП КегаМ <***@gnome.org> (sr_YU, sr_CS) have confirmed the
exclusion.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11301
[7] translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11302
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=11304
[10] https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A

Best regards,
Egor Kobylkin

---
2018-10-11 Egor Kobylkin <***@kobylkin.com>

[BZ #2872]
* localedata/locales/translit_cyrillic: add ISO 9.1995, GOST 7.79
System A transliteration System B transcription table from Cyrillic to
Latin/ASCII.
* localedata/locales/C: add include "translit_cyrillic";"" to LC_CTYPE
translit section.
* localedata/locales/aa_DJ: Likewise.
* localedata/locales/af_ZA: Likewise.
* localedata/locales/ak_GH: Likewise.
* localedata/locales/am_ET: Likewise.
* localedata/locales/ar_EG: Likewise.
* localedata/locales/be_BY: Likewise.
* localedata/locales/bem_ZM: Likewise.
* localedata/locales/ber_DZ: Likewise.
* localedata/locales/ber_MA: Likewise.
* localedata/locales/bg_BG: Likewise.
* localedata/locales/bi_VU: Likewise.
* localedata/locales/bn_BD: Likewise.
* localedata/locales/bo_CN: Likewise.
* localedata/locales/ca_ES: Likewise.
* localedata/locales/ce_RU: Likewise.
* localedata/locales/cmn_TW: Likewise.
* localedata/locales/cs_CZ: Likewise.
* localedata/locales/cv_RU: Likewise.
* localedata/locales/cy_GB: Likewise.
* localedata/locales/da_DK: Likewise.
* localedata/locales/de_DE: Likewise.
* localedata/locales/dv_MV: Likewise.
* localedata/locales/dz_BT: Likewise.
* localedata/locales/el_GR: Likewise.
* localedata/locales/en_GB: Likewise.
* localedata/locales/en_NG: Likewise.
* localedata/locales/en_ZM: Likewise.
* localedata/locales/es_CU: Likewise.
* localedata/locales/es_ES: Likewise.
* localedata/locales/et_EE: Likewise.
* localedata/locales/fa_IR: Likewise.
* localedata/locales/ff_SN: Likewise.
* localedata/locales/fi_FI: Likewise.
* localedata/locales/fr_FR: Likewise.
* localedata/locales/ga_IE: Likewise.
* localedata/locales/gd_GB: Likewise.
* localedata/locales/gu_IN: Likewise.
* localedata/locales/gv_GB: Likewise.
* localedata/locales/he_IL: Likewise.
* localedata/locales/hi_IN: Likewise.
* localedata/locales/hif_FJ: Likewise.
* localedata/locales/hr_HR: Likewise.
* localedata/locales/ht_HT: Likewise.
* localedata/locales/hu_HU: Likewise.
* localedata/locales/hy_AM: Likewise.
* localedata/locales/id_ID: Likewise.
* localedata/locales/is_IS: Likewise.
* localedata/locales/it_IT: Likewise.
* localedata/locales/ja_JP: Likewise.
* localedata/locales/kab_DZ: Likewise.
* localedata/locales/kk_KZ: Likewise.
* localedata/locales/km_KH: Likewise.
* localedata/locales/kn_IN: Likewise.
* localedata/locales/ko_KR: Likewise.
* localedata/locales/ks_IN: Likewise.
* localedata/locales/kw_GB: Likewise.
* localedata/locales/lb_LU: Likewise.
* localedata/locales/lg_UG: Likewise.
* localedata/locales/lij_IT: Likewise.
* localedata/locales/ln_CD: Likewise.
* localedata/locales/lo_LA: Likewise.
* localedata/locales/lt_LT: Likewise.
* localedata/locales/lv_LV: Likewise.
* localedata/locales/mg_MG: Likewise.
* localedata/locales/mhr_RU: Likewise.
* localedata/locales/mk_MK: Likewise.
* localedata/locales/ml_IN: Likewise.
* localedata/locales/ms_MY: Likewise.
* localedata/locales/mt_MT: Likewise.
* localedata/locales/***@latin: Likewise.
* localedata/locales/nb_NO: Likewise.
* localedata/locales/ne_NP: Likewise.
* localedata/locales/nhn_MX: Likewise.
* localedata/locales/niu_NU: Likewise.
* localedata/locales/niu_NZ: Likewise.
* localedata/locales/nl_NL: Likewise.
* localedata/locales/nr_ZA: Likewise.
* localedata/locales/oc_FR: Likewise.
* localedata/locales/om_KE: Likewise.
* localedata/locales/or_IN: Likewise.
* localedata/locales/os_RU: Likewise.
* localedata/locales/pa_IN: Likewise.
* localedata/locales/pa_PK: Likewise.
* localedata/locales/pl_PL: Likewise.
* localedata/locales/pt_PT: Likewise.
* localedata/locales/quz_PE: Likewise.
* localedata/locales/ro_RO: Likewise.
* localedata/locales/ru_RU: Likewise.
* localedata/locales/rw_RW: Likewise.
* localedata/locales/sa_IN: Likewise.
* localedata/locales/sd_IN: Likewise.
* localedata/locales/***@devanagari: Likewise.
* localedata/locales/sd_PK: Likewise.
* localedata/locales/se_NO: Likewise.
* localedata/locales/sgs_LT: Likewise.
* localedata/locales/shn_MM: Likewise.
* localedata/locales/si_LK: Likewise.
* localedata/locales/sk_SK: Likewise.
* localedata/locales/sl_SI: Likewise.
* localedata/locales/sm_WS: Likewise.
* localedata/locales/so_SO: Likewise.
* localedata/locales/sq_AL: Likewise.
* localedata/locales/ss_ZA: Likewise.
* localedata/locales/st_ZA: Likewise.
* localedata/locales/sv_SE: Likewise.
* localedata/locales/sw_KE: Likewise.
* localedata/locales/ta_IN: Likewise.
* localedata/locales/te_IN: Likewise.
* localedata/locales/th_TH: Likewise.
* localedata/locales/ti_ET: Likewise.
* localedata/locales/tn_ZA: Likewise.
* localedata/locales/to_TO: Likewise.
* localedata/locales/tpi_PG: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/ts_ZA: Likewise.
* localedata/locales/unm_US: Likewise.
* localedata/locales/ur_IN: Likewise.
* localedata/locales/ur_PK: Likewise.
* localedata/locales/ve_ZA: Likewise.
* localedata/locales/vi_VN: Likewise.
* localedata/locales/wa_BE: Likewise.
* localedata/locales/wo_SN: Likewise.
* localedata/locales/xh_ZA: Likewise.
* localedata/locales/yi_US: Likewise.
* localedata/locales/yuw_PG: Likewise.
* localedata/locales/zh_CN: Likewise.
* localedata/locales/zu_ZA: Likewise.

diff -uNr a/localedata/locales/C b/localedata/locales/C
--- a/localedata/locales/C 2018-10-09 19:02:13.000000000 +0000
+++ b/localedata/locales/C 2018-10-09 19:02:45.000000000 +0000
@@ -2293,6 +2293,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/aa_DJ b/localedata/locales/aa_DJ
--- a/localedata/locales/aa_DJ 2018-10-09 19:02:12.000000000 +0000
+++ b/localedata/locales/aa_DJ 2018-10-09 19:02:45.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/af_ZA b/localedata/locales/af_ZA
--- a/localedata/locales/af_ZA 2018-10-09 19:02:12.000000000 +0000
+++ b/localedata/locales/af_ZA 2018-10-09 19:02:45.000000000 +0000
@@ -70,6 +70,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ak_GH b/localedata/locales/ak_GH
--- a/localedata/locales/ak_GH 2018-10-09 19:02:12.000000000 +0000
+++ b/localedata/locales/ak_GH 2018-10-09 19:02:45.000000000 +0000
@@ -54,6 +54,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
--- a/localedata/locales/am_ET 2018-10-09 19:02:12.000000000 +0000
+++ b/localedata/locales/am_ET 2018-10-09 19:02:45.000000000 +0000
@@ -1394,6 +1394,7 @@
<U137A> <U0060><U0039><U0030>
<U137B> <U0060><U0031><U0030><U0030>
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/ar_EG b/localedata/locales/ar_EG
--- a/localedata/locales/ar_EG 2018-10-09 19:02:12.000000000 +0000
+++ b/localedata/locales/ar_EG 2018-10-09 19:02:45.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/be_BY b/localedata/locales/be_BY
--- a/localedata/locales/be_BY 2018-10-09 19:02:13.000000000 +0000
+++ b/localedata/locales/be_BY 2018-10-09 19:02:45.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bem_ZM b/localedata/locales/bem_ZM
--- a/localedata/locales/bem_ZM 2018-10-09 19:02:13.000000000 +0000
+++ b/localedata/locales/bem_ZM 2018-10-09 19:02:45.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ber_DZ b/localedata/locales/ber_DZ
--- a/localedata/locales/ber_DZ 2018-10-09 19:02:13.000000000 +0000
+++ b/localedata/locales/ber_DZ 2018-10-09 19:02:45.000000000 +0000
@@ -165,6 +165,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ber_MA b/localedata/locales/ber_MA
--- a/localedata/locales/ber_MA 2018-10-09 19:02:13.000000000 +0000
+++ b/localedata/locales/ber_MA 2018-10-09 19:02:45.000000000 +0000
@@ -85,6 +85,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bg_BG b/localedata/locales/bg_BG
--- a/localedata/locales/bg_BG 2018-10-09 19:02:13.000000000 +0000
+++ b/localedata/locales/bg_BG 2018-10-09 19:02:45.000000000 +0000
@@ -49,6 +49,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bi_VU b/localedata/locales/bi_VU
--- a/localedata/locales/bi_VU 2018-10-09 19:02:13.000000000 +0000
+++ b/localedata/locales/bi_VU 2018-10-09 19:02:45.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bn_BD b/localedata/locales/bn_BD
--- a/localedata/locales/bn_BD 2018-10-09 19:02:13.000000000 +0000
+++ b/localedata/locales/bn_BD 2018-10-09 19:02:46.000000000 +0000
@@ -61,6 +61,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bo_CN b/localedata/locales/bo_CN
--- a/localedata/locales/bo_CN 2018-10-09 19:02:13.000000000 +0000
+++ b/localedata/locales/bo_CN 2018-10-09 19:02:46.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ca_ES b/localedata/locales/ca_ES
--- a/localedata/locales/ca_ES 2018-10-09 19:02:13.000000000 +0000
+++ b/localedata/locales/ca_ES 2018-10-09 19:02:46.000000000 +0000
@@ -71,6 +71,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ce_RU b/localedata/locales/ce_RU
--- a/localedata/locales/ce_RU 2018-10-09 19:02:13.000000000 +0000
+++ b/localedata/locales/ce_RU 2018-10-09 19:02:46.000000000 +0000
@@ -38,6 +38,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cmn_TW b/localedata/locales/cmn_TW
--- a/localedata/locales/cmn_TW 2018-10-09 19:02:13.000000000 +0000
+++ b/localedata/locales/cmn_TW 2018-10-09 19:02:46.000000000 +0000
@@ -49,6 +49,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

class "hanzi"; /
diff -uNr a/localedata/locales/cs_CZ b/localedata/locales/cs_CZ
--- a/localedata/locales/cs_CZ 2018-10-09 19:02:14.000000000 +0000
+++ b/localedata/locales/cs_CZ 2018-10-09 19:02:46.000000000 +0000
@@ -204,6 +204,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cv_RU b/localedata/locales/cv_RU
--- a/localedata/locales/cv_RU 2018-10-09 19:02:14.000000000 +0000
+++ b/localedata/locales/cv_RU 2018-10-09 19:02:46.000000000 +0000
@@ -108,6 +108,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cy_GB b/localedata/locales/cy_GB
--- a/localedata/locales/cy_GB 2018-10-09 19:02:14.000000000 +0000
+++ b/localedata/locales/cy_GB 2018-10-09 19:02:46.000000000 +0000
@@ -65,6 +65,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/da_DK b/localedata/locales/da_DK
--- a/localedata/locales/da_DK 2018-10-09 19:02:14.000000000 +0000
+++ b/localedata/locales/da_DK 2018-10-09 19:02:46.000000000 +0000
@@ -166,6 +166,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/de_DE b/localedata/locales/de_DE
--- a/localedata/locales/de_DE 2018-10-09 19:02:14.000000000 +0000
+++ b/localedata/locales/de_DE 2018-10-09 19:02:46.000000000 +0000
@@ -78,6 +78,7 @@
% DOUBLE HIGH-REVERSED-9 QUOTATION MARK
<U201F> <U00AB>;<U0022>

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/dv_MV b/localedata/locales/dv_MV
--- a/localedata/locales/dv_MV 2018-10-09 19:02:14.000000000 +0000
+++ b/localedata/locales/dv_MV 2018-10-09 19:02:46.000000000 +0000
@@ -51,6 +51,7 @@
include "translit_combining";""


+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/dz_BT b/localedata/locales/dz_BT
--- a/localedata/locales/dz_BT 2018-10-09 19:02:14.000000000 +0000
+++ b/localedata/locales/dz_BT 2018-10-09 19:02:46.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/el_GR b/localedata/locales/el_GR
--- a/localedata/locales/el_GR 2018-10-09 19:02:14.000000000 +0000
+++ b/localedata/locales/el_GR 2018-10-09 19:02:46.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_GB b/localedata/locales/en_GB
--- a/localedata/locales/en_GB 2018-10-09 19:02:14.000000000 +0000
+++ b/localedata/locales/en_GB 2018-10-09 19:02:46.000000000 +0000
@@ -54,6 +54,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_NG b/localedata/locales/en_NG
--- a/localedata/locales/en_NG 2018-10-09 19:02:14.000000000 +0000
+++ b/localedata/locales/en_NG 2018-10-09 19:02:46.000000000 +0000
@@ -49,6 +49,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_ZM b/localedata/locales/en_ZM
--- a/localedata/locales/en_ZM 2018-10-09 19:02:15.000000000 +0000
+++ b/localedata/locales/en_ZM 2018-10-09 19:02:46.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/es_CU b/localedata/locales/es_CU
--- a/localedata/locales/es_CU 2018-10-09 19:02:15.000000000 +0000
+++ b/localedata/locales/es_CU 2018-10-09 19:02:47.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/es_ES b/localedata/locales/es_ES
--- a/localedata/locales/es_ES 2018-10-09 19:02:15.000000000 +0000
+++ b/localedata/locales/es_ES 2018-10-09 19:02:47.000000000 +0000
@@ -72,6 +72,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/et_EE b/localedata/locales/et_EE
--- a/localedata/locales/et_EE 2018-10-09 19:02:15.000000000 +0000
+++ b/localedata/locales/et_EE 2018-10-09 19:02:47.000000000 +0000
@@ -112,6 +112,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fa_IR b/localedata/locales/fa_IR
--- a/localedata/locales/fa_IR 2018-10-09 19:02:15.000000000 +0000
+++ b/localedata/locales/fa_IR 2018-10-09 19:02:47.000000000 +0000
@@ -78,6 +78,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ff_SN b/localedata/locales/ff_SN
--- a/localedata/locales/ff_SN 2018-10-09 19:02:15.000000000 +0000
+++ b/localedata/locales/ff_SN 2018-10-09 19:02:47.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fi_FI b/localedata/locales/fi_FI
--- a/localedata/locales/fi_FI 2018-10-09 19:02:15.000000000 +0000
+++ b/localedata/locales/fi_FI 2018-10-09 19:02:47.000000000 +0000
@@ -136,6 +136,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fr_FR b/localedata/locales/fr_FR
--- a/localedata/locales/fr_FR 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/fr_FR 2018-10-09 19:02:47.000000000 +0000
@@ -58,6 +58,7 @@
% In France, accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/ga_IE b/localedata/locales/ga_IE
--- a/localedata/locales/ga_IE 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/ga_IE 2018-10-09 19:02:47.000000000 +0000
@@ -53,6 +53,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gd_GB b/localedata/locales/gd_GB
--- a/localedata/locales/gd_GB 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/gd_GB 2018-10-09 19:02:47.000000000 +0000
@@ -45,6 +45,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gu_IN b/localedata/locales/gu_IN
--- a/localedata/locales/gu_IN 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/gu_IN 2018-10-09 19:02:47.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gv_GB b/localedata/locales/gv_GB
--- a/localedata/locales/gv_GB 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/gv_GB 2018-10-09 19:02:47.000000000 +0000
@@ -56,6 +56,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/he_IL b/localedata/locales/he_IL
--- a/localedata/locales/he_IL 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/he_IL 2018-10-09 19:02:47.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hi_IN b/localedata/locales/hi_IN
--- a/localedata/locales/hi_IN 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/hi_IN 2018-10-09 19:02:47.000000000 +0000
@@ -61,6 +61,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hif_FJ b/localedata/locales/hif_FJ
--- a/localedata/locales/hif_FJ 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/hif_FJ 2018-10-09 19:02:47.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hr_HR b/localedata/locales/hr_HR
--- a/localedata/locales/hr_HR 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/hr_HR 2018-10-09 19:02:47.000000000 +0000
@@ -61,6 +61,7 @@
% transliterate <U0111> {đ} into d + j
<U0111> "<U0064><U006A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ht_HT b/localedata/locales/ht_HT
--- a/localedata/locales/ht_HT 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/ht_HT 2018-10-09 19:02:48.000000000 +0000
@@ -57,6 +57,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hu_HU b/localedata/locales/hu_HU
--- a/localedata/locales/hu_HU 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/hu_HU 2018-10-09 19:02:48.000000000 +0000
@@ -476,6 +476,7 @@
<U00FC> "<U0075><U0308>";"<U0075><U00A8>";"<U0075><U003A>"
<U0171> "<U0075><U030B>";"<U0075><U02DD>";"<U0075><U0022>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/hy_AM b/localedata/locales/hy_AM
--- a/localedata/locales/hy_AM 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/hy_AM 2018-10-09 19:02:48.000000000 +0000
@@ -75,6 +75,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/id_ID b/localedata/locales/id_ID
--- a/localedata/locales/id_ID 2018-10-09 19:02:16.000000000 +0000
+++ b/localedata/locales/id_ID 2018-10-09 19:02:48.000000000 +0000
@@ -54,6 +54,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/is_IS b/localedata/locales/is_IS
--- a/localedata/locales/is_IS 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/is_IS 2018-10-09 19:02:48.000000000 +0000
@@ -149,6 +149,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/it_IT b/localedata/locales/it_IT
--- a/localedata/locales/it_IT 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/it_IT 2018-10-09 19:02:48.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ja_JP b/localedata/locales/ja_JP
--- a/localedata/locales/ja_JP 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/ja_JP 2018-10-09 19:02:48.000000000 +0000
@@ -1681,6 +1681,7 @@
include "translit_combining";""
include "translit_cjk_variants";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/kab_DZ b/localedata/locales/kab_DZ
--- a/localedata/locales/kab_DZ 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/kab_DZ 2018-10-09 19:02:48.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kk_KZ b/localedata/locales/kk_KZ
--- a/localedata/locales/kk_KZ 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/kk_KZ 2018-10-09 19:02:48.000000000 +0000
@@ -157,6 +157,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/km_KH b/localedata/locales/km_KH
--- a/localedata/locales/km_KH 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/km_KH 2018-10-09 19:02:48.000000000 +0000
@@ -42,6 +42,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kn_IN b/localedata/locales/kn_IN
--- a/localedata/locales/kn_IN 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/kn_IN 2018-10-09 19:02:49.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ko_KR b/localedata/locales/ko_KR
--- a/localedata/locales/ko_KR 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/ko_KR 2018-10-09 19:02:49.000000000 +0000
@@ -6099,6 +6099,7 @@
include "translit_combining";""
include "translit_hangul";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/ks_IN b/localedata/locales/ks_IN
--- a/localedata/locales/ks_IN 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/ks_IN 2018-10-09 19:02:49.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kw_GB b/localedata/locales/kw_GB
--- a/localedata/locales/kw_GB 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/kw_GB 2018-10-09 19:02:49.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lb_LU b/localedata/locales/lb_LU
--- a/localedata/locales/lb_LU 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/lb_LU 2018-10-09 19:02:49.000000000 +0000
@@ -77,6 +77,7 @@
% LATIN SMALL LETTER E WITH CIRCUMFLEX
<U00EA> "e^"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/lg_UG b/localedata/locales/lg_UG
--- a/localedata/locales/lg_UG 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/lg_UG 2018-10-09 19:02:49.000000000 +0000
@@ -56,6 +56,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lij_IT b/localedata/locales/lij_IT
--- a/localedata/locales/lij_IT 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/lij_IT 2018-10-09 19:02:49.000000000 +0000
@@ -47,6 +47,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ln_CD b/localedata/locales/ln_CD
--- a/localedata/locales/ln_CD 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/ln_CD 2018-10-09 19:02:49.000000000 +0000
@@ -39,6 +39,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lo_LA b/localedata/locales/lo_LA
--- a/localedata/locales/lo_LA 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/lo_LA 2018-10-09 19:02:49.000000000 +0000
@@ -50,6 +50,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lt_LT b/localedata/locales/lt_LT
--- a/localedata/locales/lt_LT 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/lt_LT 2018-10-09 19:02:49.000000000 +0000
@@ -163,6 +163,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lv_LV b/localedata/locales/lv_LV
--- a/localedata/locales/lv_LV 2018-10-09 19:02:17.000000000 +0000
+++ b/localedata/locales/lv_LV 2018-10-09 19:02:50.000000000 +0000
@@ -110,6 +110,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mg_MG b/localedata/locales/mg_MG
--- a/localedata/locales/mg_MG 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/mg_MG 2018-10-09 19:02:50.000000000 +0000
@@ -54,6 +54,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/mhr_RU b/localedata/locales/mhr_RU
--- a/localedata/locales/mhr_RU 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/mhr_RU 2018-10-09 19:02:50.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mk_MK b/localedata/locales/mk_MK
--- a/localedata/locales/mk_MK 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/mk_MK 2018-10-09 19:02:50.000000000 +0000
@@ -48,6 +48,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ml_IN b/localedata/locales/ml_IN
--- a/localedata/locales/ml_IN 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/ml_IN 2018-10-09 19:02:50.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
%
diff -uNr a/localedata/locales/ms_MY b/localedata/locales/ms_MY
--- a/localedata/locales/ms_MY 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/ms_MY 2018-10-09 19:02:50.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mt_MT b/localedata/locales/mt_MT
--- a/localedata/locales/mt_MT 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/mt_MT 2018-10-09 19:02:50.000000000 +0000
@@ -47,6 +47,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/***@latin
b/localedata/locales/***@latin
--- a/localedata/locales/***@latin 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/***@latin 2018-10-09 19:02:50.000000000 +0000
@@ -52,6 +52,7 @@
% accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/nb_NO b/localedata/locales/nb_NO
--- a/localedata/locales/nb_NO 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/nb_NO 2018-10-09 19:02:50.000000000 +0000
@@ -154,6 +154,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ne_NP b/localedata/locales/ne_NP
--- a/localedata/locales/ne_NP 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/ne_NP 2018-10-09 19:02:50.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nhn_MX b/localedata/locales/nhn_MX
--- a/localedata/locales/nhn_MX 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/nhn_MX 2018-10-09 19:02:50.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/niu_NU b/localedata/locales/niu_NU
--- a/localedata/locales/niu_NU 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/niu_NU 2018-10-09 19:02:50.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/niu_NZ b/localedata/locales/niu_NZ
--- a/localedata/locales/niu_NZ 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/niu_NZ 2018-10-09 19:02:50.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nl_NL b/localedata/locales/nl_NL
--- a/localedata/locales/nl_NL 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/nl_NL 2018-10-09 19:02:50.000000000 +0000
@@ -56,6 +56,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nr_ZA b/localedata/locales/nr_ZA
--- a/localedata/locales/nr_ZA 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/nr_ZA 2018-10-09 19:02:50.000000000 +0000
@@ -64,6 +64,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/oc_FR b/localedata/locales/oc_FR
--- a/localedata/locales/oc_FR 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/oc_FR 2018-10-09 19:02:50.000000000 +0000
@@ -54,6 +54,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/om_KE b/localedata/locales/om_KE
--- a/localedata/locales/om_KE 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/om_KE 2018-10-09 19:02:50.000000000 +0000
@@ -138,6 +138,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/or_IN b/localedata/locales/or_IN
--- a/localedata/locales/or_IN 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/or_IN 2018-10-09 19:02:51.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/os_RU b/localedata/locales/os_RU
--- a/localedata/locales/os_RU 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/os_RU 2018-10-09 19:02:51.000000000 +0000
@@ -69,6 +69,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/pa_IN b/localedata/locales/pa_IN
--- a/localedata/locales/pa_IN 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/pa_IN 2018-10-09 19:02:51.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pa_PK b/localedata/locales/pa_PK
--- a/localedata/locales/pa_PK 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/pa_PK 2018-10-09 19:02:51.000000000 +0000
@@ -57,6 +57,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pl_PL b/localedata/locales/pl_PL
--- a/localedata/locales/pl_PL 2018-10-09 19:02:18.000000000 +0000
+++ b/localedata/locales/pl_PL 2018-10-09 19:02:51.000000000 +0000
@@ -116,6 +116,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pt_PT b/localedata/locales/pt_PT
--- a/localedata/locales/pt_PT 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/pt_PT 2018-10-09 19:02:51.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/quz_PE b/localedata/locales/quz_PE
--- a/localedata/locales/quz_PE 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/quz_PE 2018-10-09 19:02:51.000000000 +0000
@@ -55,6 +55,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ro_RO b/localedata/locales/ro_RO
--- a/localedata/locales/ro_RO 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/ro_RO 2018-10-09 19:02:51.000000000 +0000
@@ -143,6 +143,7 @@
<U0162> "<U021A>";"<U0054>"
<U0163> "<U021B>";"<U0074>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ru_RU b/localedata/locales/ru_RU
--- a/localedata/locales/ru_RU 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/ru_RU 2018-10-09 19:02:51.000000000 +0000
@@ -73,6 +73,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/rw_RW b/localedata/locales/rw_RW
--- a/localedata/locales/rw_RW 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/rw_RW 2018-10-09 19:02:51.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sa_IN b/localedata/locales/sa_IN
--- a/localedata/locales/sa_IN 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/sa_IN 2018-10-09 19:02:51.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sd_IN b/localedata/locales/sd_IN
--- a/localedata/locales/sd_IN 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/sd_IN 2018-10-09 19:02:51.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/***@devanagari
b/localedata/locales/***@devanagari
--- a/localedata/locales/***@devanagari 2018-10-09 19:02:19.000000000
+0000
+++ b/localedata/locales/***@devanagari 2018-10-09 19:02:51.000000000
+0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sd_PK b/localedata/locales/sd_PK
--- a/localedata/locales/sd_PK 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/sd_PK 2018-10-09 19:02:51.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/se_NO b/localedata/locales/se_NO
--- a/localedata/locales/se_NO 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/se_NO 2018-10-09 19:02:51.000000000 +0000
@@ -204,6 +204,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sgs_LT b/localedata/locales/sgs_LT
--- a/localedata/locales/sgs_LT 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/sgs_LT 2018-10-09 19:02:51.000000000 +0000
@@ -58,6 +58,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/shn_MM b/localedata/locales/shn_MM
--- a/localedata/locales/shn_MM 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/shn_MM 2018-10-09 19:02:51.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/si_LK b/localedata/locales/si_LK
--- a/localedata/locales/si_LK 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/si_LK 2018-10-09 19:02:51.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sk_SK b/localedata/locales/sk_SK
--- a/localedata/locales/sk_SK 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/sk_SK 2018-10-09 19:02:52.000000000 +0000
@@ -67,6 +67,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sl_SI b/localedata/locales/sl_SI
--- a/localedata/locales/sl_SI 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/sl_SI 2018-10-09 19:02:52.000000000 +0000
@@ -90,6 +90,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sm_WS b/localedata/locales/sm_WS
--- a/localedata/locales/sm_WS 2018-10-09 19:02:19.000000000 +0000
+++ b/localedata/locales/sm_WS 2018-10-09 19:02:52.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/so_SO b/localedata/locales/so_SO
--- a/localedata/locales/so_SO 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/so_SO 2018-10-09 19:02:52.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sq_AL b/localedata/locales/sq_AL
--- a/localedata/locales/sq_AL 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/sq_AL 2018-10-09 19:02:52.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ss_ZA b/localedata/locales/ss_ZA
--- a/localedata/locales/ss_ZA 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/ss_ZA 2018-10-09 19:02:52.000000000 +0000
@@ -66,6 +66,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/st_ZA b/localedata/locales/st_ZA
--- a/localedata/locales/st_ZA 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/st_ZA 2018-10-09 19:02:52.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sv_SE b/localedata/locales/sv_SE
--- a/localedata/locales/sv_SE 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/sv_SE 2018-10-09 19:02:52.000000000 +0000
@@ -138,6 +138,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sw_KE b/localedata/locales/sw_KE
--- a/localedata/locales/sw_KE 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/sw_KE 2018-10-09 19:02:52.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ta_IN b/localedata/locales/ta_IN
--- a/localedata/locales/ta_IN 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/ta_IN 2018-10-09 19:02:52.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/te_IN b/localedata/locales/te_IN
--- a/localedata/locales/te_IN 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/te_IN 2018-10-09 19:02:52.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/th_TH b/localedata/locales/th_TH
--- a/localedata/locales/th_TH 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/th_TH 2018-10-09 19:02:52.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ti_ET b/localedata/locales/ti_ET
--- a/localedata/locales/ti_ET 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/ti_ET 2018-10-09 19:02:52.000000000 +0000
@@ -864,6 +864,7 @@
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>

include "translit_combining";""
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/tn_ZA b/localedata/locales/tn_ZA
--- a/localedata/locales/tn_ZA 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/tn_ZA 2018-10-09 19:02:53.000000000 +0000
@@ -67,6 +67,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/to_TO b/localedata/locales/to_TO
--- a/localedata/locales/to_TO 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/to_TO 2018-10-09 19:02:53.000000000 +0000
@@ -36,6 +36,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/tpi_PG b/localedata/locales/tpi_PG
--- a/localedata/locales/tpi_PG 2018-10-09 19:02:20.000000000 +0000
+++ b/localedata/locales/tpi_PG 2018-10-09 19:02:53.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/tr_TR b/localedata/locales/tr_TR
--- a/localedata/locales/tr_TR 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/tr_TR 2018-10-09 19:02:53.000000000 +0000
@@ -2423,6 +2423,7 @@

% TURKISH LIRA SIGN
<U20BA> "<U0054><U004C>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-10-09 19:02:54.000000000
+0000
@@ -0,0 +1,383 @@
+escape_char /
+comment_char %
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file. The foregoing does not
+% affect the license of the GNU C Library as a whole. It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Transliterations of cyrillic letters to latin and/or ascii symbols.
+% Inspired by ISO 9.1995 / GOST 7.79-2000.
+% Covers Unicode Range https://www.unicode.org/charts/PDF/U0400.pdf
+% i.e [U4001-U4F9, U2019] but only the letters covered by ISO 9.1995
+% It implements the GOST_7.79 System A (Latin Script) as a first
+% option and System B Cyrillic (ASCII) as a second option. Check
+% https://en.wikipedia.org/wiki/ISO_9 for reference.
+% The System B is extended from GOST_7.79-Russian using open sources
+% of the transliteration mappings and the "h/`" diacritics logic.
+
+% Usage examples:
+% iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
+% | iconv -f ISO-8859-15 -t UTF-8 # System A
+% iconv -f UTF-8 -t ASCII//TRANSLIT # System B.
+
+% Contributions welcome for the rest of Cyrillic script in Unicode
+% https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode.
+% Bugfix for https://sourceware.org/bugzilla/show_bug.cgi?id=2872.
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=11301.
+
+LC_CTYPE
+
+translit_start
+
+% CYRILLIC CAPITAL LETTER IO
+<U0401> <U00CB>;"<U0059><U004F>"
+% CYRILLIC CAPITAL LETTER DJE
+<U0402> <U0110>;"<U0044><U004A>"
+% CYRILLIC CAPITAL LETTER GJE
+<U0403> <U01F4>;"<U0047><U0060>"
+% CYRILLIC CAPITAL LETTER UKRAINIAN IE
+<U0404> <U00CA>;"<U0059><U0065>"
+% CYRILLIC CAPITAL LETTER DZE
+<U0405> <U1E90>;"<U005A><U0060>"
+% CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
+<U0406> <U00CC>;<U0049>
+% CYRILLIC CAPITAL LETTER YI
+<U0407> <U00CF>;"<U0059><U0069>"
+% CYRILLIC CAPITAL LETTER JE
+<U0408> "<U004A><U030C>";<U004A>
+% CYRILLIC CAPITAL LETTER LJE
+<U0409> "<U004C><U0302>";"<U004C><U0060>"
+% CYRILLIC CAPITAL LETTER NJE
+<U040A> "<U004E><U0302>";"<U004E><U0060>"
+% CYRILLIC CAPITAL LETTER TSHE
+<U040B> <U0106>;"<U0054><U0053><U0048>"
+% CYRILLIC CAPITAL LETTER KJE
+<U040C> <U1E30>;"<U004B><U0060>"
+% CYRILLIC CAPITAL LETTER SHORT U
+<U040E> <U016C>;"<U0055><U0060>"
+% CYRILLIC CAPITAL LETTER DZHE
+<U040F> "<U0044><U0302>";"<U0044><U0068>"
+% CYRILLIC CAPITAL LETTER A
+<U0410> <U0041>
+% CYRILLIC CAPITAL LETTER BE
+<U0411> <U0042>
+% CYRILLIC CAPITAL LETTER VE
+<U0412> <U0056>
+% CYRILLIC CAPITAL LETTER GHE
+<U0413> <U0047>
+% CYRILLIC CAPITAL LETTER DE
+<U0414> <U0044>
+% CYRILLIC CAPITAL LETTER IE
+<U0415> <U0045>
+% CYRILLIC CAPITAL LETTER ZHE
+<U0416> <U017D>;"<U005A><U0048>"
+% CYRILLIC CAPITAL LETTER ZE
+<U0417> <U005A>
+% CYRILLIC CAPITAL LETTER I
+<U0418> <U0049>
+% CYRILLIC CAPITAL LETTER SHORT I
+<U0419> <U004A>
+% CYRILLIC CAPITAL LETTER KA
+<U041A> <U004B>
+% CYRILLIC CAPITAL LETTER EL
+<U041B> <U004C>
+% CYRILLIC CAPITAL LETTER EM
+<U041C> <U004D>
+% CYRILLIC CAPITAL LETTER EN
+<U041D> <U004E>
+% CYRILLIC CAPITAL LETTER O
+<U041E> <U004F>
+% CYRILLIC CAPITAL LETTER PE
+<U041F> <U0050>
+% CYRILLIC CAPITAL LETTER ER
+<U0420> <U0052>
+% CYRILLIC CAPITAL LETTER ES
+<U0421> <U0053>
+% CYRILLIC CAPITAL LETTER TE
+<U0422> <U0054>
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC UNDEFINED
+<U0423><U0301> <U00DA>;"<U0055><U0060>"
+% CYRILLIC CAPITAL LETTER EF
+<U0424> <U0046>
+% CYRILLIC CAPITAL LETTER HA
+<U0425> <U0048>;<U0058>
+% CYRILLIC CAPITAL LETTER TSE
+<U0426> <U0043>;"<U0043><U005A>"
+% CYRILLIC CAPITAL LETTER CHE
+<U0427> <U010C>;"<U0043><U0048>"
+% CYRILLIC CAPITAL LETTER SHA
+<U0428> <U0160>;"<U0053><U0048>"
+% CYRILLIC CAPITAL LETTER SHCHA
+<U0429> <U015C>;"<U0053><U0048><U0048>"
+% CYRILLIC CAPITAL LETTER HARD SIGN
+<U042A> <U02BA>;"<U0041><U0060>"
+% CYRILLIC CAPITAL LETTER YERU
+<U042B> <U0059>;"<U0059><U0060>"
+% CYRILLIC CAPITAL LETTER SOFT SIGN
+<U042C> <U02B9>;<U0060>
+% CYRILLIC CAPITAL LETTER E
+<U042D> <U00C8>;"<U0045><U0060>"
+% CYRILLIC CAPITAL LETTER YU
+<U042E> <U00DB>;"<U0059><U0055>"
+% CYRILLIC CAPITAL LETTER YA
+<U042F> <U00C2>;"<U0059><U0041>"
+% CYRILLIC SMALL LETTER A
+<U0430> <U0061>
+% CYRILLIC SMALL LETTER BE
+<U0431> <U0062>
+% CYRILLIC SMALL LETTER VE
+<U0432> <U0076>
+% CYRILLIC SMALL LETTER GHE
+<U0433> <U0067>
+% CYRILLIC SMALL LETTER DE
+<U0434> <U0064>
+% CYRILLIC SMALL LETTER IE
+<U0435> <U0065>
+% CYRILLIC SMALL LETTER ZHE
+<U0436> <U017E>;"<U007A><U0068>"
+% CYRILLIC SMALL LETTER ZE
+<U0437> <U007A>
+% CYRILLIC SMALL LETTER I
+<U0438> <U0069>
+% CYRILLIC SMALL LETTER SHORT I
+<U0439> <U006A>
+% CYRILLIC SMALL LETTER KA
+<U043A> <U006B>
+% CYRILLIC SMALL LETTER EL
+<U043B> <U006C>
+% CYRILLIC SMALL LETTER EM
+<U043C> <U006D>
+% CYRILLIC SMALL LETTER EN
+<U043D> <U006E>
+% CYRILLIC SMALL LETTER O
+<U043E> <U006F>
+% CYRILLIC SMALL LETTER PE
+<U043F> <U0070>
+% CYRILLIC SMALL LETTER ER
+<U0440> <U0072>
+% CYRILLIC SMALL LETTER ES
+<U0441> <U0073>
+% CYRILLIC SMALL LETTER TE
+<U0442> <U0074>
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC UNDEFINED
+<U0443><U0301> <U00FA>;"<U0075><U0060>"
+% CYRILLIC SMALL LETTER EF
+<U0444> <U0066>
+% CYRILLIC SMALL LETTER HA
+<U0445> <U0068>;<U0078>
+% CYRILLIC SMALL LETTER TSE
+<U0446> <U0063>;"<U0063><U007A>"
+% CYRILLIC SMALL LETTER CHE
+<U0447> <U010D>;"<U0063><U0068>"
+% CYRILLIC SMALL LETTER SHA
+<U0448> <U0161>;"<U0073><U0068>"
+% CYRILLIC SMALL LETTER SHCHA
+<U0449> <U015D>;"<U0073><U0068><U0068>"
+% CYRILLIC SMALL LETTER HARD SIGN
+<U044A> <U02BA>;"<U0060><U0060>"
+% CYRILLIC SMALL LETTER YERU
+<U044B> <U0079>;"<U0079><U0060>"
+% CYRILLIC SMALL LETTER SOFT SIGN
+<U044C> <U02B9>;<U0060>
+% CYRILLIC SMALL LETTER E
+<U044D> <U00E8>;"<U0065><U0060>"
+% CYRILLIC SMALL LETTER YU
+<U044E> <U00FB>;"<U0079><U0075>"
+% CYRILLIC SMALL LETTER YA
+<U044F> <U00E2>;"<U0079><U0061>"
+% CYRILLIC SMALL LETTER IO
+<U0451> <U00EB>;"<U0079><U006F>"
+% CYRILLIC SMALL LETTER DJE
+<U0452> <U0111>;"<U0064><U006A>"
+% CYRILLIC SMALL LETTER GJE
+<U0453> <U01F5>;"<U0067><U0060>"
+% CYRILLIC SMALL LETTER UKRAINIAN IE
+<U0454> <U00EA>;"<U0079><U0065>"
+% CYRILLIC SMALL LETTER DZE
+<U0455> <U1E91>;"<U007A><U0060>"
+% CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
+<U0456> <U00EC>;<U0069>
+% CYRILLIC SMALL LETTER YI
+<U0457> <U00EF>;"<U0079><U0069>"
+% CYRILLIC SMALL LETTER JE
+<U0458> <U01F0>;<U006A>
+% CYRILLIC SMALL LETTER LJE
+<U0459> "<U006C><U0302>";"<U006C><U0060>"
+% CYRILLIC SMALL LETTER NJE
+<U045A> "<U006E><U0302>";"<U006E><U0060>"
+% CYRILLIC SMALL LETTER TSHE
+<U045B> <U0107>;"<U0074><U0073><U0068>"
+% CYRILLIC SMALL LETTER KJE
+<U045C> <U1E31>;"<U006B><U0060>"
+% CYRILLIC SMALL LETTER SHORT U
+<U045E> <U016D>;"<U0075><U0060>"
+% CYRILLIC SMALL LETTER DZHE
+<U045F> "<U0064><U0302>";"<U0064><U0068>"
+% CYRILLIC CAPITAL LETTER BIG YUS
+<U046A> <U01CD>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER BIG YUS
+<U046B> <U01CE>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER FITA
+<U0472> "<U0046><U0300>";"<U0046><U0068>"
+% CYRILLIC SMALL LETTER FITA
+<U0473> "<U0066><U0300>";"<U0066><U0068>"
+% CYRILLIC CAPITAL LETTER IZHITSA
+<U0474> <U1EF2>;"<U0059><U0068>"
+% CYRILLIC SMALL LETTER IZHITSA
+<U0475> <U1EF3>;"<U0079><U0068>"
+% CYRILLIC CAPITAL LETTER SEMISOFT SIGN
+<U048C> <U011A>;"<U0045><U0060>"
+% CYRILLIC SMALL LETTER SEMISOFT SIGN
+<U048D> <U011B>;"<U0065><U0060>"
+% CYRILLIC CAPITAL LETTER GHE WITH UPTURN
+<U0490> "<U0047><U0300>";"<U0047><U0060>"
+% CYRILLIC SMALL LETTER GHE WITH UPTURN
+<U0491> "<U0067><U0300>";"<U0067><U0060>"
+% CYRILLIC CAPITAL LETTER GHE WITH STROKE
+<U0492> <U0120>;"<U0047><U0048>"
+% CYRILLIC SMALL LETTER GHE WITH STROKE
+<U0493> <U0121>;"<U0067><U0068>"
+% CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK
+<U0494> <U011E>;"<U0047><U0048>"
+% CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK
+<U0495> <U011F>;"<U0067><U0068>"
+% CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER
+<U0496> "<U017D><U0327>";"<U005A><U0048><U0060>"
+% CYRILLIC SMALL LETTER ZHE WITH DESCENDER
+<U0497> "<U017E><U0327>";"<U007A><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER KA WITH DESCENDER
+<U049A> <U0136>;"<U004B><U0060>"
+% CYRILLIC SMALL LETTER KA WITH DESCENDER
+<U049B> <U0137>;"<U006B><U0060>"
+% CYRILLIC CAPITAL LETTER KA WITH STROKE
+<U049E> "<U004B><U0304>";"<U004B><U0060>"
+% CYRILLIC SMALL LETTER KA WITH STROKE
+<U049F> "<U006B><U0304>";"<U006B><U0060>"
+% CYRILLIC CAPITAL LETTER EN WITH DESCENDER
+<U04A2> <U1E46>;"<U004E><U0060>"
+% CYRILLIC SMALL LETTER EN WITH DESCENDER
+<U04A3> <U1E47>;"<U006E><U0060>"
+% CYRILLIC CAPITAL LIGATURE EN GHE
+<U04A4> <U1E44>;"<U004E><U0047>"
+% CYRILLIC SMALL LIGATURE EN GHE
+<U04A5> <U1E45>;"<U006E><U0067>"
+% CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK
+<U04A6> <U1E54>;"<U0050><U0060>"
+% CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK
+<U04A7> <U1E55>;"<U0070><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN HA
+<U04A8> <U00D2>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN HA
+<U04A9> <U00F2>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER ES WITH DESCENDER
+<U04AA> <U00C7>;"<U0043><U0060>"
+% CYRILLIC SMALL LETTER ES WITH DESCENDER
+<U04AB> <U00E7>;"<U0043><U0060>"
+% CYRILLIC CAPITAL LETTER TE WITH DESCENDER
+<U04AC> <U0162>;"<U0054><U0060>"
+% CYRILLIC SMALL LETTER TE WITH DESCENDER
+<U04AD> <U0163>;"<U0074><U0060>"
+% CYRILLIC CAPITAL LETTER STRAIGHT U
+<U04AE> <U00D9>;<U0055>
+% CYRILLIC SMALL LETTER STRAIGHT U
+<U04AF> <U00F9>;<U0075>
+% CYRILLIC CAPITAL LETTER HA WITH DESCENDER
+<U04B2> <U1E28>;"<U0048><U0060>"
+% CYRILLIC SMALL LETTER HA WITH DESCENDER
+<U04B3> <U1E29>;"<U0068><U0060>"
+% CYRILLIC CAPITAL LIGATURE TE TSE
+<U04B4> "<U0043><U0304>";"<U0054><U0043><U005A>"
+% CYRILLIC SMALL LIGATURE TE TSE
+<U04B5> "<U0063><U0304>";"<U0074><U0063><U007A>"
+% CYRILLIC CAPITAL LETTER SHHA
+<U04BA> <U1E24>;"<U0053><U0048><U0060>"
+% CYRILLIC SMALL LETTER SHHA
+<U04BB> <U1E25>;"<U0053><U0048><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN CHE
+<U04BC> "<U0043><U0306>";"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN CHE
+<U04BD> "<U0063><U0306>";"<U0063><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER
+<U04BE> "<U00C7><U0306>";"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER
+<U04BF> "<U00E7><U0306>";"<U0063><U0068><U0060>"
+% CYRILLIC LETTER PALOCHKA
+<U04C0> <U2021>;<U0069>
+% CYRILLIC CAPITAL LETTER ZHE WITH BREVE
+<U04C1> "<U005A><U0306>";"<U005A><U0048><U0060>"
+% CYRILLIC SMALL LETTER ZHE WITH BREVE
+<U04C2> "<U007A><U0306>";"<U007A><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER KHAKASSIAN CHE
+<U04CB> <U00C7>;"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER KHAKASSIAN CHE
+<U04CC> <U00E7>;"<U0063><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER A WITH BREVE
+<U04D0> <U0102>;"<U0041><U0060>"
+% CYRILLIC SMALL LETTER A WITH BREVE
+<U04D1> <U0103>;"<U0061><U0060>"
+% CYRILLIC CAPITAL LETTER A WITH DIAERESIS
+<U04D2> <U00C4>;"<U0041><U0060>"
+% CYRILLIC SMALL LETTER A WITH DIAERESIS
+<U04D3> <U00E4>;"<U0061><U0060>"
+% CYRILLIC CAPITAL LETTER IE WITH BREVE
+<U04D6> <U0114>;"<U0045><U0060>"
+% CYRILLIC SMALL LETTER IE WITH BREVE
+<U04D7> <U0115>;"<U0065><U0060>"
+% CYRILLIC CAPITAL LETTER SCHWA
+<U04D8> "<U0041><U030B>";"<U0041><U0060>"
+% CYRILLIC SMALL LETTER SCHWA
+<U04D9> "<U0061><U030B>";"<U0061><U0060>"
+% CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS
+<U04DC> "<U005A><U0304>";"<U005A><U0048><U0060>"
+% CYRILLIC SMALL LETTER ZHE WITH DIAERESIS
+<U04DD> "<U007A><U0304>";"<U007A><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS
+<U04DE> "<U005A><U0308>";"<U005A><U0060>"
+% CYRILLIC SMALL LETTER ZE WITH DIAERESIS
+<U04DF> "<U007A><U0308>";"<U007A><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN DZE
+<U04E0> <U0179>;"<U005A><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN DZE
+<U04E1> <U017A>;"<U007A><U0060>"
+% CYRILLIC CAPITAL LETTER I WITH DIAERESIS
+<U04E4> <U00CE>;"<U0049><U0060>"
+% CYRILLIC SMALL LETTER I WITH DIAERESIS
+<U04E5> <U00EE>;"<U0069><U0060>"
+% CYRILLIC CAPITAL LETTER O WITH DIAERESIS
+<U04E6> <U00D6>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER O WITH DIAERESIS
+<U04E7> <U00F6>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER BARRED O
+<U04E8> <U00D4>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER BARRED O
+<U04E9> <U00F4>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER U WITH DIAERESIS
+<U04F0> <U00DC>;"<U0055><U0060>"
+% CYRILLIC SMALL LETTER U WITH DIAERESIS
+<U04F1> <U00FC>;"<U0075><U0060>"
+% CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE
+<U04F2> <U0170>;"<U0055><U0060>"
+% CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE
+<U04F3> <U0171>;"<U0075><U0060>"
+% CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS
+<U04F4> "<U0043><U0308>";"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER CHE WITH DIAERESIS
+<U04F5> "<U0063><U0308>";"<U0063><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS
+<U04F8> <U0178>;"<U0059><U0060>"
+% CYRILLIC SMALL LETTER YERU WITH DIAERESIS
+<U04F9> <U00FF>;"<U0079><U0060>"
+% RIGHT SINGLE QUOTATION MARK
+<U2019> <U2035>;<U0027>
+
+translit_end
+
+END LC_CTYPE
diff -uNr a/localedata/locales/ts_ZA b/localedata/locales/ts_ZA
--- a/localedata/locales/ts_ZA 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/ts_ZA 2018-10-09 19:02:53.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/unm_US b/localedata/locales/unm_US
--- a/localedata/locales/unm_US 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/unm_US 2018-10-09 19:02:53.000000000 +0000
@@ -48,6 +48,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ur_IN b/localedata/locales/ur_IN
--- a/localedata/locales/ur_IN 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/ur_IN 2018-10-09 19:02:53.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ur_PK b/localedata/locales/ur_PK
--- a/localedata/locales/ur_PK 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/ur_PK 2018-10-09 19:02:53.000000000 +0000
@@ -57,6 +57,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ve_ZA b/localedata/locales/ve_ZA
--- a/localedata/locales/ve_ZA 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/ve_ZA 2018-10-09 19:02:53.000000000 +0000
@@ -65,6 +65,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/vi_VN b/localedata/locales/vi_VN
--- a/localedata/locales/vi_VN 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/vi_VN 2018-10-09 19:02:53.000000000 +0000
@@ -57,6 +57,7 @@
% dong sign -> d// -> dd
<U20AB> "<U0111>";"<U0064><U0064>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/wa_BE b/localedata/locales/wa_BE
--- a/localedata/locales/wa_BE 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/wa_BE 2018-10-09 19:02:53.000000000 +0000
@@ -59,6 +59,7 @@
<U00C5> "A<U030A>";"A";"AU"
<U00E5> "a<U030A>";"a";"au"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/wo_SN b/localedata/locales/wo_SN
--- a/localedata/locales/wo_SN 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/wo_SN 2018-10-09 19:02:53.000000000 +0000
@@ -54,6 +54,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/xh_ZA b/localedata/locales/xh_ZA
--- a/localedata/locales/xh_ZA 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/xh_ZA 2018-10-09 19:02:54.000000000 +0000
@@ -64,6 +64,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/yi_US b/localedata/locales/yi_US
--- a/localedata/locales/yi_US 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/yi_US 2018-10-09 19:02:54.000000000 +0000
@@ -66,6 +66,7 @@
<U05F0> "<U05D5><U05D5>";"ww"
<U05F1> "<U05D5><U05D9>";"wj"
<U05F2> "<U05D9><U05D9>";"jj"
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/yuw_PG b/localedata/locales/yuw_PG
--- a/localedata/locales/yuw_PG 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/yuw_PG 2018-10-09 19:02:54.000000000 +0000
@@ -40,6 +40,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/zh_CN b/localedata/locales/zh_CN
--- a/localedata/locales/zh_CN 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/zh_CN 2018-10-09 19:02:54.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

class "hanzi"; /
diff -uNr a/localedata/locales/zu_ZA b/localedata/locales/zu_ZA
--- a/localedata/locales/zu_ZA 2018-10-09 19:02:21.000000000 +0000
+++ b/localedata/locales/zu_ZA 2018-10-09 19:02:54.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
Marko Myllynen
2018-10-11 09:59:54 UTC
Permalink
Hi,

Looks like there's one rule after all which might be debatable, I'll
just highlight it and let others to comment and decide what to do with it.
Post by Egor Kobylkin
+% RIGHT SINGLE QUOTATION MARK
+<U2019> <U2035>;<U0027>
translit_neutral (which is included by i18n) has:

% RIGHT SINGLE QUOTATION MARK
<U2019> <U0027> % not <U00B4> because it's often used as an apostrophe

In practice the end result might well be the same (since if U+2019 is
not available then probably U+2035 is neither and both rules produce
U+0027). However, given that translit_cyrillic would be included in
every locale, I'm not sure is this kind of minor discrepancy ok or not.

Thanks,
--
Marko Myllynen
Rafal Luzynski
2018-10-11 11:04:28 UTC
Permalink
Thank you, Egor. I am looking at your patch and although I have
not yet finished, here are some remarks:

First of all, I think that such a large patch should also include
the tests. Please see how automatic tests are performed in locale
data and write your own.
Post by Egor Kobylkin
[...]
From this patch I have excluded locales that already mention cyrillic or
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
[...]
I think that eventually we would like to include your translit_cyrillic
also in these locales because I assume that your rules should work good
for them as well, also should include more characters than the individual
language contributors took into account. Similarly to Mike's work on
collation: a common rules were created and all locales include them adding
their own language specific modifications.
Post by Egor Kobylkin
[...]
[...]
I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
I am not sure if we want Cyrillic text in the commit message. Shouldn't
it be, uhm, tranlisterated? :-)

"sr_CS" - I guess you meant "sr_RS".

"sr_YU" has been dropped, do we want to mention it?
Post by Egor Kobylkin
[...]
[BZ #2872]
* localedata/locales/translit_cyrillic: add ISO 9.1995, GOST 7.79
Please start "Add" with an uppercase. BTW, shouldn't it be "New file"
instead?
Post by Egor Kobylkin
System A transliteration System B transcription table from Cyrillic to
Latin/ASCII.
* localedata/locales/C: add include "translit_cyrillic";"" to LC_CTYPE
translit section.
Same, "Add" here.
Post by Egor Kobylkin
* localedata/locales/aa_DJ: Likewise.
Good (here and everywhere below).
Post by Egor Kobylkin
[...]
diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-10-09 19:02:54.000000000
+0000
@@ -0,0 +1,383 @@
+escape_char /
+comment_char %
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file. The foregoing does not
+% affect the license of the GNU C Library as a whole. It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Transliterations of cyrillic letters to latin and/or ascii symbols.
"cyrillic" -> "Cyrillic"; "latin" -> "Latin"; "ascii" -> "ASCII".
Post by Egor Kobylkin
+% Inspired by ISO 9.1995 / GOST 7.79-2000.
+% Covers Unicode Range https://www.unicode.org/charts/PDF/U0400.pdf
+% i.e [U4001-U4F9, U2019] but only the letters covered by ISO 9.1995
Typos:

"i.e" -> "i.e.," (somebody please fix me if I'm wrong here)
"U4001" - I guess you meant "U0401"
"U4F9" -> "U04F9". I think that "U4F9" is not definitely bad but
let's be consistent.

Also I can see some gaps in the range. Are you going to fill them
or maybe for now just mention that they exist?
Post by Egor Kobylkin
+% It implements the GOST_7.79 System A (Latin Script) as a first
+% option and System B Cyrillic (ASCII) as a second option. Check
+% https://en.wikipedia.org/wiki/ISO_9 for reference.
+% The System B is extended from GOST_7.79-Russian using open sources
+% of the transliteration mappings and the "h/`" diacritics logic.
What is "h/`" diacritics logic?
Post by Egor Kobylkin
+
+% iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
+% | iconv -f ISO-8859-15 -t UTF-8 # System A
+% iconv -f UTF-8 -t ASCII//TRANSLIT # System B.
+
+% Contributions welcome for the rest of Cyrillic script in Unicode
Sure, I'm not going to stop you from pushing these changes just because
there are missing characters. I will consider adding them later.
Post by Egor Kobylkin
+% https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode.
+% Bugfix for https://sourceware.org/bugzilla/show_bug.cgi?id=2872.
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=11301.
1. Is the file really generated with a script and not modified later?
If yes then maybe you should contribute the script instead? In that case,
you should also not post this file to libc-locale, maintainers and
developers should be able to regenerate it.
2. The link leads to a LibreOffice spreadsheet.
Post by Egor Kobylkin
+LC_CTYPE
+
+translit_start
+
<U0400> is missing here. Are you going to leave it for now?
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER IO
+<U0401> <U00CB>;"<U0059><U004F>"
[...]
+% CYRILLIC CAPITAL LETTER KJE
+<U040C> <U1E30>;"<U004B><U0060>"
<U040D> is missing here. Can we add it already?
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER SHORT U
+<U040E> <U016C>;"<U0055><U0060>"
[...]
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC UNDEFINED
+<U0423><U0301> <U00DA>;"<U0055><U0060>"
This still makes me wonder.

Does it work at all?
What if we remove this rule, won't it be transliterated as
<U0423> => "U", <U0301> - left unchanged, so "U" + <U0301>"
will eventually produce "Ú"?
Why is it called "UNDEFINED"?
Do we need similar rules for other characters?
Post by Egor Kobylkin
[...]
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC UNDEFINED
+<U0443><U0301> <U00FA>;"<U0075><U0060>"
Same here.
Post by Egor Kobylkin
[...]
+% CYRILLIC SMALL LETTER YA
+<U044F> <U00E2>;"<U0079><U0061>"
Again <U0450> missing (because it is lowercase variant of <U0400>).
Post by Egor Kobylkin
+% CYRILLIC SMALL LETTER IO
+<U0451> <U00EB>;"<U0079><U006F>"
[...]
+% CYRILLIC SMALL LETTER KJE
+<U045C> <U1E31>;"<U006B><U0060>"
<U045D> missing (same reason as <U040D>).
Post by Egor Kobylkin
+% CYRILLIC SMALL LETTER SHORT U
+<U045E> <U016D>;"<U0075><U0060>"
+% CYRILLIC SMALL LETTER DZHE
+<U045F> "<U0064><U0302>";"<U0064><U0068>"
More letters missing here. Is this because they are historic so we
don't want to include them now? Well, but "YUS" is also historic.
(Please, do not remove YUS for consistency).
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER BIG YUS
+<U046A> <U01CD>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER BIG YUS
+<U046B> <U01CE>;"<U006F><U0060>"
[...]
I will continue but, again, I don't give any ETA so other reviewers
are welcome here.

Regards,

Rafal
Marko Myllynen
2018-10-11 13:10:49 UTC
Permalink
Hi,
Post by Rafal Luzynski
First of all, I think that such a large patch should also include
the tests. Please see how automatic tests are performed in locale
data and write your own.
Also I can see some gaps in the range. Are you going to fill them
or maybe for now just mention that they exist?
<U040D> is missing here. Can we add it already?
Sure, I'm not going to stop you from pushing these changes just because
there are missing characters. I will consider adding them later.
<U0400> is missing here. Are you going to leave it for now?
See check https://sourceware.org/ml/libc-alpha/2018-10/msg00160.html.
Post by Rafal Luzynski
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC UNDEFINED
+<U0423><U0301> <U00DA>;"<U0055><U0060>"
This still makes me wonder.
Does it work at all?
No, see the above link.

More importantly, I realized that ICU uconv(1) I mentioned earlier
should make a great reference for this data; output of the currently
included transliteration rules should match uconv(1) output. If that is
not the case, the patch or uconv(1) might have an issue. If the outputs
match, then we should be able to safely assume the patch is ok.

It could also be considered to use uconv(1) output as reference how the
handle to currently missing characters.

(uconv(1) is part of the icu package on Fedora/CentOS/RHEL/openSUSE.)

Thanks,
--
Marko Myllynen
Volodymyr Lisivka
2018-10-11 13:50:46 UTC
Permalink
Post by Rafal Luzynski
Thank you, Egor. I am looking at your patch and although I have
First of all, I think that such a large patch should also include
the tests. Please see how automatic tests are performed in locale
data and write your own.
Post by Egor Kobylkin
[...]
From this patch I have excluded locales that already mention cyrillic or
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
[...]
I think that eventually we would like to include your translit_cyrillic
also in these locales because I assume that your rules should work good
for them as well, also should include more characters than the individual
language contributors took into account.
It's very good idea. Transliteration in Ukrainian locale predates this
work for about decade. It well tested. I also have automatic test
cases, which I can adapt to current standard. Let's drop Russian
transliteration rules and replace them with Ukrainian transliteration
rules. I assume that Ukrainian rules should work good for them as
well.

Ukrainian language is the oldest and most developed language in Slavic
family - last king of all Slavs named Madzhak/Muzhik (Brave), leader
of Volyniana union, was lived in Western Ukraine in Volyn` region.
After Madzhak capturing of Madzhak, kingdom was split into multiple
western parts and eastern part, where 9 Slavic tribes were united by
Rus` tribe, which abandoned their city, now known as Old Russa,
because of epidemic. IMHO, it's will be fair to use rules of the
oldest Slavic union.
Post by Rafal Luzynski
Similarly to Mike's work on
collation: a common rules were created and all locales include them adding
their own language specific modifications.
It's good idea too. In our own locale we prefer that words in our
language will be at top of a sorted list. Currently, in Ukrainian
locale it works as intended, but Russian locale has inverted order.
IMHO, Russian locale should use Ukrainian rules.

$ echo 'один два three four'| tr ' ' '\n' | LANG=uk_UA.utf8 sort
два
один
four
three
$ echo 'один два three four'| tr ' ' '\n' | LANG=ru_RU.utf8 sort
four
three
два
один
Post by Rafal Luzynski
Post by Egor Kobylkin
[...]
[...]
I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
I am not sure if we want Cyrillic text in the commit message. Shouldn't
it be, uhm, tranlisterated? :-)
"sr_CS" - I guess you meant "sr_RS".
"sr_YU" has been dropped, do we want to mention it?
Post by Egor Kobylkin
[...]
[BZ #2872]
* localedata/locales/translit_cyrillic: add ISO 9.1995, GOST 7.79
Please start "Add" with an uppercase. BTW, shouldn't it be "New file"
instead?
Post by Egor Kobylkin
System A transliteration System B transcription table from Cyrillic to
Latin/ASCII.
* localedata/locales/C: add include "translit_cyrillic";"" to LC_CTYPE
translit section.
Same, "Add" here.
Post by Egor Kobylkin
* localedata/locales/aa_DJ: Likewise.
Good (here and everywhere below).
Post by Egor Kobylkin
[...]
diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-10-09 19:02:54.000000000
+0000
@@ -0,0 +1,383 @@
+escape_char /
+comment_char %
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file. The foregoing does not
+% affect the license of the GNU C Library as a whole. It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Transliterations of cyrillic letters to latin and/or ascii symbols.
"cyrillic" -> "Cyrillic"; "latin" -> "Latin"; "ascii" -> "ASCII".
Post by Egor Kobylkin
+% Inspired by ISO 9.1995 / GOST 7.79-2000.
+% Covers Unicode Range https://www.unicode.org/charts/PDF/U0400.pdf
+% i.e [U4001-U4F9, U2019] but only the letters covered by ISO 9.1995
"i.e" -> "i.e.," (somebody please fix me if I'm wrong here)
"U4001" - I guess you meant "U0401"
"U4F9" -> "U04F9". I think that "U4F9" is not definitely bad but
let's be consistent.
Also I can see some gaps in the range. Are you going to fill them
or maybe for now just mention that they exist?
Post by Egor Kobylkin
+% It implements the GOST_7.79 System A (Latin Script) as a first
+% option and System B Cyrillic (ASCII) as a second option. Check
+% https://en.wikipedia.org/wiki/ISO_9 for reference.
+% The System B is extended from GOST_7.79-Russian using open sources
+% of the transliteration mappings and the "h/`" diacritics logic.
What is "h/`" diacritics logic?
Post by Egor Kobylkin
+
+% iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
+% | iconv -f ISO-8859-15 -t UTF-8 # System A
+% iconv -f UTF-8 -t ASCII//TRANSLIT # System B.
+
+% Contributions welcome for the rest of Cyrillic script in Unicode
Sure, I'm not going to stop you from pushing these changes just because
there are missing characters. I will consider adding them later.
Post by Egor Kobylkin
+% https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode.
+% Bugfix for https://sourceware.org/bugzilla/show_bug.cgi?id=2872.
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=11301.
1. Is the file really generated with a script and not modified later?
If yes then maybe you should contribute the script instead? In that case,
you should also not post this file to libc-locale, maintainers and
developers should be able to regenerate it.
2. The link leads to a LibreOffice spreadsheet.
Post by Egor Kobylkin
+LC_CTYPE
+
+translit_start
+
<U0400> is missing here. Are you going to leave it for now?
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER IO
+<U0401> <U00CB>;"<U0059><U004F>"
[...]
+% CYRILLIC CAPITAL LETTER KJE
+<U040C> <U1E30>;"<U004B><U0060>"
<U040D> is missing here. Can we add it already?
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER SHORT U
+<U040E> <U016C>;"<U0055><U0060>"
[...]
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC UNDEFINED
+<U0423><U0301> <U00DA>;"<U0055><U0060>"
This still makes me wonder.
Does it work at all?
What if we remove this rule, won't it be transliterated as
<U0423> => "U", <U0301> - left unchanged, so "U" + <U0301>"
will eventually produce "Ú"?
Why is it called "UNDEFINED"?
Do we need similar rules for other characters?
Post by Egor Kobylkin
[...]
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC UNDEFINED
+<U0443><U0301> <U00FA>;"<U0075><U0060>"
Same here.
Post by Egor Kobylkin
[...]
+% CYRILLIC SMALL LETTER YA
+<U044F> <U00E2>;"<U0079><U0061>"
Again <U0450> missing (because it is lowercase variant of <U0400>).
Post by Egor Kobylkin
+% CYRILLIC SMALL LETTER IO
+<U0451> <U00EB>;"<U0079><U006F>"
[...]
+% CYRILLIC SMALL LETTER KJE
+<U045C> <U1E31>;"<U006B><U0060>"
<U045D> missing (same reason as <U040D>).
Post by Egor Kobylkin
+% CYRILLIC SMALL LETTER SHORT U
+<U045E> <U016D>;"<U0075><U0060>"
+% CYRILLIC SMALL LETTER DZHE
+<U045F> "<U0064><U0302>";"<U0064><U0068>"
More letters missing here. Is this because they are historic so we
don't want to include them now? Well, but "YUS" is also historic.
(Please, do not remove YUS for consistency).
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER BIG YUS
+<U046A> <U01CD>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER BIG YUS
+<U046B> <U01CE>;"<U006F><U0060>"
[...]
I will continue but, again, I don't give any ETA so other reviewers
are welcome here.
Regards,
Rafal
Egor Kobylkin
2018-10-11 14:59:11 UTC
Permalink
Hi Rafal
Post by Rafal Luzynski
Thank you, Egor. I am looking at your patch and although I have
First of all, I think that such a large patch should also include
the tests. Please see how automatic tests are performed in locale
data and write your own.
Could you please point me to the existing automatic tests?
Locally I am using the test suggested in glibc locales wiki.
From my commit message:
"The glibc wiki explicitly lists this use case as the test example
https://sourceware.org/glibc/wiki/Locales#Testing_Locales :
LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt
"
I am visually checking whether any iconv run fails for all those locales
but you must refer to some automated unit test with a boolean outcome,
right?
Post by Rafal Luzynski
Post by Egor Kobylkin
[...]
From this patch I have excluded locales that already mention cyrillic or
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
[...]
I think that eventually we would like to include your translit_cyrillic
also in these locales because I assume that your rules should work good
for them as well, also should include more characters than the individual
language contributors took into account. Similarly to Mike's work on
collation: a common rules were created and all locales include them adding
their own language specific modifications.
This is fine with me. Should anybody supply translit_xxxxxxxxxxxx for
any of the mentioned locales we can include them as well. Wouldn't it be
easier to coordinate those as separate patches though?
Post by Rafal Luzynski
Post by Egor Kobylkin
[...]
[...]
I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
I am not sure if we want Cyrillic text in the commit message. Shouldn't
it be, uhm, tranlisterated? :-)
I do not see any Cyrillic text in the commit message.
the ?????? you see are the actual "?" symbols coming out of iconv now.
Post by Rafal Luzynski
"sr_CS" - I guess you meant "sr_RS".
"sr_YU" has been dropped, do we want to mention it?
The list of locales and the patch itself is generated from the actual
locales - I do not hand pick them, only exclude the ones in the
exclusion list above.
Post by Rafal Luzynski
Post by Egor Kobylkin
[...]
[BZ #2872]
* localedata/locales/translit_cyrillic: add ISO 9.1995, GOST 7.79
Please start "Add" with an uppercase. BTW, shouldn't it be "New file"
instead?
Post by Egor Kobylkin
System A transliteration System B transcription table from Cyrillic to
Latin/ASCII.
* localedata/locales/C: add include "translit_cyrillic";"" to LC_CTYPE
translit section.
Same, "Add" here.
Post by Egor Kobylkin
* localedata/locales/aa_DJ: Likewise.
Good (here and everywhere below).
Post by Egor Kobylkin
[...]
diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-10-09 19:02:54.000000000
+0000
@@ -0,0 +1,383 @@
+escape_char /
+comment_char %
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file. The foregoing does not
+% affect the license of the GNU C Library as a whole. It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Transliterations of cyrillic letters to latin and/or ascii symbols.
"cyrillic" -> "Cyrillic"; "latin" -> "Latin"; "ascii" -> "ASCII".
Post by Egor Kobylkin
+% Inspired by ISO 9.1995 / GOST 7.79-2000.
+% Covers Unicode Range https://www.unicode.org/charts/PDF/U0400.pdf
+% i.e [U4001-U4F9, U2019] but only the letters covered by ISO 9.1995
"i.e" -> "i.e.," (somebody please fix me if I'm wrong here)
"U4001" - I guess you meant "U0401"
"U4F9" -> "U04F9". I think that "U4F9" is not definitely bad but
let's be consistent.
These are all good catches. I will fix them and resubmit.
Post by Rafal Luzynski
Also I can see some gaps in the range. Are you going to fill them
or maybe for now just mention that they exist?
Post by Egor Kobylkin
Post by Marko Myllynen
Post by Egor Kobylkin
correct link https://sourceware.org/bugzilla/attachment.cgi?id=11303
Although I haven't checked every rule this in general looks very good
(but see below).
Not sure do we want to add the few missing characters
mentioned at https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode,
e.g., one instantly notices that U+0400 is missing. (I wouldn't add at
least initially the more exotic characters, like the historic ones,
though.) Perhaps filing a bug or two for these cases for separate
consideration would be ok.
The question here is what should serve as their transliteration and
transcription?
Not sure, so filing a separate bug about this once your patch is merged
might be the most suitable action for now, I don't think we want to
postpone merging your work further due to these non-ISO 9 cases.
Post by Egor Kobylkin
+% It implements the GOST_7.79 System A (Latin Script) as a first
+% option and System B Cyrillic (ASCII) as a second option. Check
+% https://en.wikipedia.org/wiki/ISO_9 for reference.
+% The System B is extended from GOST_7.79-Russian using open sources
+% of the transliteration mappings and the "h/`" diacritics logic.
What is "h/`" diacritics logic?
Basically some Linguist mentioned that they have chosen "h" and '`" to
represent the diacritics for the transcription (i.e. GOST 7.79 System
B). This way there is some resemblance to the watertight transliteration
as per ISO 9 (Sysetem A) but it is still all in ASCII. We have decided
to extend GOST 7.79 to the all ISO 9 characters and so I have extended
it following that Linguist logic.
Post by Rafal Luzynski
Post by Egor Kobylkin
+% https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode.
+% Bugfix for https://sourceware.org/bugzilla/show_bug.cgi?id=2872.
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=11301.
1. Is the file really generated with a script and not modified later?
If yes then maybe you should contribute the script instead? In that case,
you should also not post this file to libc-locale, maintainers and
developers should be able to regenerate it.
2. The link leads to a LibreOffice spreadsheet.
No, I do not have a script. The "generated" means it is a result of
formulas in that spreadsheet. People are welcome to write a script that
should be straightforward implementation of those rules in formulas.
Post by Rafal Luzynski
Post by Egor Kobylkin
+LC_CTYPE
+
+translit_start
+
<U0400> is missing here. Are you going to leave it for now?
Yes, it is to be left out, not in ISO 9. See the exchange with Marko above.
Post by Rafal Luzynski
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER IO
+<U0401> <U00CB>;"<U0059><U004F>"
[...]
+% CYRILLIC CAPITAL LETTER KJE
+<U040C> <U1E30>;"<U004B><U0060>"
<U040D> is missing here. Can we add it already?
Yes, it is to be left out, not in ISO 9. See the exchange with Marko above.
Post by Rafal Luzynski
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER SHORT U
+<U040E> <U016C>;"<U0055><U0060>"
[...]
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC UNDEFINED
+<U0423><U0301> <U00DA>;"<U0055><U0060>"
This still makes me wonder.
Does it work at all?
What if we remove this rule, won't it be transliterated as
<U0423> => "U", <U0301> - left unchanged, so "U" + <U0301>"
will eventually produce "Ú"?
Why is it called "UNDEFINED"?
...
Post by Rafal Luzynski
Post by Egor Kobylkin
Post by Marko Myllynen
I'm not sure this will work, no existing rule in translit_* files
contain two characters, I'd assume that the rule for U+0423 is applied
first and then the below rule is never used.
% CYRILLIC UNDEFINED
<U0423><U0301> <U00DA>;"<U0055><U0060>"
Perhaps this should be commented out or removed altogether if it's not
working as intended.
So yes, they are not processed. I would drop them to not to have special
cases. But I am also fine with keeping them because all work is done
already.
I'd probably drop them but I don't feel strongly about this either way.
Thanks for your efforts, I don't have any further comments, I'll leave
this now for Rafal and Mike to provide additional feedback and hopefully
merge soon.
Could you also please check the discussion with Marko on UNDEFINED and
other related topics? You were on To: or CC: for those emails.
The same for the other characters below.
Post by Rafal Luzynski
Do we need similar rules for other characters?
Post by Egor Kobylkin
[...]
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC UNDEFINED
+<U0443><U0301> <U00FA>;"<U0075><U0060>"
Same here.
Post by Egor Kobylkin
[...]
+% CYRILLIC SMALL LETTER YA
+<U044F> <U00E2>;"<U0079><U0061>"
Again <U0450> missing (because it is lowercase variant of <U0400>).
Post by Egor Kobylkin
+% CYRILLIC SMALL LETTER IO
+<U0451> <U00EB>;"<U0079><U006F>"
[...]
+% CYRILLIC SMALL LETTER KJE
+<U045C> <U1E31>;"<U006B><U0060>"
<U045D> missing (same reason as <U040D>).
Post by Egor Kobylkin
+% CYRILLIC SMALL LETTER SHORT U
+<U045E> <U016D>;"<U0075><U0060>"
+% CYRILLIC SMALL LETTER DZHE
+<U045F> "<U0064><U0302>";"<U0064><U0068>"
More letters missing here. Is this because they are historic so we
don't want to include them now? Well, but "YUS" is also historic.
(Please, do not remove YUS for consistency).
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER BIG YUS
+<U046A> <U01CD>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER BIG YUS
+<U046B> <U01CE>;"<U006F><U0060>"
[...]
I will continue but, again, I don't give any ETA so other reviewers
are welcome here.
Regards,
Rafal
Bests,
Egor
Egor Kobylkin
2018-10-11 21:30:48 UTC
Permalink
Post by Egor Kobylkin
Post by Rafal Luzynski
Post by Egor Kobylkin
[...]
I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
I am not sure if we want Cyrillic text in the commit message. Shouldn't
it be, uhm, tranlisterated? :-)
I do not see any Cyrillic text in the commit message.
the ?????? you see are the actual "?" symbols coming out of iconv now.
Post by Rafal Luzynski
"sr_CS" - I guess you meant "sr_RS".
"sr_YU" has been dropped, do we want to mention it?
The list of locales and the patch itself is generated from the actual
locales - I do not hand pick them, only exclude the ones in the
exclusion list above.
Ah, yes, that message above should read sr_RS. Will fix.
There is no sr_YU anymore indeed, so I will drop it. No changes to the
patch, just the commit message.

Bests,
Egor
Egor Kobylkin
2018-10-11 15:05:39 UTC
Permalink
Post by Rafal Luzynski
Thank you, Egor. I am looking at your patch and although I have
...
Post by Rafal Luzynski
Post by Egor Kobylkin
[...]
[BZ #2872]
* localedata/locales/translit_cyrillic: add ISO 9.1995, GOST 7.79
Please start "Add" with an uppercase. BTW, shouldn't it be "New file"
instead?
"New file or Add" - I don't know. You tell me.
Post by Rafal Luzynski
Post by Egor Kobylkin
System A transliteration System B transcription table from Cyrillic to
Latin/ASCII.
* localedata/locales/C: add include "translit_cyrillic";"" to LC_CTYPE
translit section.
Same, "Add" here.
Same, please advise.
Bests,
Egor
Egor Kobylkin
2018-10-11 15:44:17 UTC
Permalink
Dear locale maintainers,

fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

add Cyrillic transliteration table translit_cyrillic file

https://sourceware.org/bugzilla/attachment.cgi?id=11317 [7]

to localedata/locales/ and include it in all your locales going forward.

Patch included inline below.

From this patch I have excluded locales that already mention cyrillic or
have a transliteration table for it:
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
***@cyrillic

Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.

Current bug effect:

The glibc wiki explicitly lists this use case as the test example

https://sourceware.org/glibc/wiki/Locales#Testing_Locales :

LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt

currently it fails on Cyrillic texts in most locales including ru_RU [1]
[8] [9]:

LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

- It produces a string of question marks and spaces.

This is what it should produce and it does so after the patch applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


Root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.



COMMIT MESSAGE:
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.

Examples: iconv -f UTF-8 -t ASCII//TRANSLIT will produce ASCII
compatible transcription and iconv -f UTF-8 -t ISO-8859-15//TRANSLIT |
iconv -f ISO-8859-15 -t UTF-8 will produce Latin transliteration as per
ISO 9.1995.

While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration/transcription has only Latin/ASCII codes but still can
be read by a native speaker. Among other things it is useful for
processing the Cyrillic texts and filenames by programs or on systems
that are not specifically prepared to work with Cyrillic, don't have
corresponding fonts installed or can't handle UTF-8.

The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on ISO 9.1995 standard [10] and its
derivative GOST 7.79-2000 official source (Federal Agency on Technical
Regulating and Metrology Of Russian Federation [2]). Technically an
independent but mostly identical source [3] was used and prepared in a
spreadsheet [6].

The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that have a
translit_start/end stance and generated a patch for them.

The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.

I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
Lisivka <***@gmail.com>, Max Kutny <***@gmail.com> (uk_UA),
ДаМОлП КегаМ <***@gnome.org> (sr_YU, sr_CS) have confirmed the
exclusion.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11301
[7] translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11317
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=11304
[10] https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A

Best regards,
Egor Kobylkin

---
2018-10-11 Egor Kobylkin <***@kobylkin.com>

[BZ #2872]
* localedata/locales/translit_cyrillic: Add ISO 9.1995, GOST 7.79
System A transliteration System B transcription table from Cyrillic to
Latin/ASCII.
* localedata/locales/C: add include "translit_cyrillic";"" to LC_CTYPE
translit section.
* localedata/locales/aa_DJ: Likewise.
* localedata/locales/af_ZA: Likewise.
* localedata/locales/ak_GH: Likewise.
* localedata/locales/am_ET: Likewise.
* localedata/locales/ar_EG: Likewise.
* localedata/locales/be_BY: Likewise.
* localedata/locales/bem_ZM: Likewise.
* localedata/locales/ber_DZ: Likewise.
* localedata/locales/ber_MA: Likewise.
* localedata/locales/bg_BG: Likewise.
* localedata/locales/bi_VU: Likewise.
* localedata/locales/bn_BD: Likewise.
* localedata/locales/bo_CN: Likewise.
* localedata/locales/ca_ES: Likewise.
* localedata/locales/ce_RU: Likewise.
* localedata/locales/cmn_TW: Likewise.
* localedata/locales/cs_CZ: Likewise.
* localedata/locales/cv_RU: Likewise.
* localedata/locales/cy_GB: Likewise.
* localedata/locales/da_DK: Likewise.
* localedata/locales/de_DE: Likewise.
* localedata/locales/dv_MV: Likewise.
* localedata/locales/dz_BT: Likewise.
* localedata/locales/el_GR: Likewise.
* localedata/locales/en_GB: Likewise.
* localedata/locales/en_NG: Likewise.
* localedata/locales/en_ZM: Likewise.
* localedata/locales/es_CU: Likewise.
* localedata/locales/es_ES: Likewise.
* localedata/locales/et_EE: Likewise.
* localedata/locales/fa_IR: Likewise.
* localedata/locales/ff_SN: Likewise.
* localedata/locales/fi_FI: Likewise.
* localedata/locales/fr_FR: Likewise.
* localedata/locales/ga_IE: Likewise.
* localedata/locales/gd_GB: Likewise.
* localedata/locales/gu_IN: Likewise.
* localedata/locales/gv_GB: Likewise.
* localedata/locales/he_IL: Likewise.
* localedata/locales/hi_IN: Likewise.
* localedata/locales/hif_FJ: Likewise.
* localedata/locales/hr_HR: Likewise.
* localedata/locales/ht_HT: Likewise.
* localedata/locales/hu_HU: Likewise.
* localedata/locales/hy_AM: Likewise.
* localedata/locales/id_ID: Likewise.
* localedata/locales/is_IS: Likewise.
* localedata/locales/it_IT: Likewise.
* localedata/locales/ja_JP: Likewise.
* localedata/locales/kab_DZ: Likewise.
* localedata/locales/kk_KZ: Likewise.
* localedata/locales/km_KH: Likewise.
* localedata/locales/kn_IN: Likewise.
* localedata/locales/ko_KR: Likewise.
* localedata/locales/ks_IN: Likewise.
* localedata/locales/kw_GB: Likewise.
* localedata/locales/lb_LU: Likewise.
* localedata/locales/lg_UG: Likewise.
* localedata/locales/lij_IT: Likewise.
* localedata/locales/ln_CD: Likewise.
* localedata/locales/lo_LA: Likewise.
* localedata/locales/lt_LT: Likewise.
* localedata/locales/lv_LV: Likewise.
* localedata/locales/mg_MG: Likewise.
* localedata/locales/mhr_RU: Likewise.
* localedata/locales/mk_MK: Likewise.
* localedata/locales/ml_IN: Likewise.
* localedata/locales/ms_MY: Likewise.
* localedata/locales/mt_MT: Likewise.
* localedata/locales/***@latin: Likewise.
* localedata/locales/nb_NO: Likewise.
* localedata/locales/ne_NP: Likewise.
* localedata/locales/nhn_MX: Likewise.
* localedata/locales/niu_NU: Likewise.
* localedata/locales/niu_NZ: Likewise.
* localedata/locales/nl_NL: Likewise.
* localedata/locales/nr_ZA: Likewise.
* localedata/locales/oc_FR: Likewise.
* localedata/locales/om_KE: Likewise.
* localedata/locales/or_IN: Likewise.
* localedata/locales/os_RU: Likewise.
* localedata/locales/pa_IN: Likewise.
* localedata/locales/pa_PK: Likewise.
* localedata/locales/pl_PL: Likewise.
* localedata/locales/pt_PT: Likewise.
* localedata/locales/quz_PE: Likewise.
* localedata/locales/ro_RO: Likewise.
* localedata/locales/ru_RU: Likewise.
* localedata/locales/rw_RW: Likewise.
* localedata/locales/sa_IN: Likewise.
* localedata/locales/sd_IN: Likewise.
* localedata/locales/***@devanagari: Likewise.
* localedata/locales/sd_PK: Likewise.
* localedata/locales/se_NO: Likewise.
* localedata/locales/sgs_LT: Likewise.
* localedata/locales/shn_MM: Likewise.
* localedata/locales/si_LK: Likewise.
* localedata/locales/sk_SK: Likewise.
* localedata/locales/sl_SI: Likewise.
* localedata/locales/sm_WS: Likewise.
* localedata/locales/so_SO: Likewise.
* localedata/locales/sq_AL: Likewise.
* localedata/locales/ss_ZA: Likewise.
* localedata/locales/st_ZA: Likewise.
* localedata/locales/sv_SE: Likewise.
* localedata/locales/sw_KE: Likewise.
* localedata/locales/ta_IN: Likewise.
* localedata/locales/te_IN: Likewise.
* localedata/locales/th_TH: Likewise.
* localedata/locales/ti_ET: Likewise.
* localedata/locales/tn_ZA: Likewise.
* localedata/locales/to_TO: Likewise.
* localedata/locales/tpi_PG: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/ts_ZA: Likewise.
* localedata/locales/unm_US: Likewise.
* localedata/locales/ur_IN: Likewise.
* localedata/locales/ur_PK: Likewise.
* localedata/locales/ve_ZA: Likewise.
* localedata/locales/vi_VN: Likewise.
* localedata/locales/wa_BE: Likewise.
* localedata/locales/wo_SN: Likewise.
* localedata/locales/xh_ZA: Likewise.
* localedata/locales/yi_US: Likewise.
* localedata/locales/yuw_PG: Likewise.
* localedata/locales/zh_CN: Likewise.
* localedata/locales/zu_ZA: Likewise.

diff -uNr a/localedata/locales/C b/localedata/locales/C
--- a/localedata/locales/C 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/C 2018-10-11 15:10:43.000000000 +0000
@@ -2293,6 +2293,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/aa_DJ b/localedata/locales/aa_DJ
--- a/localedata/locales/aa_DJ 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/aa_DJ 2018-10-11 15:10:43.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/af_ZA b/localedata/locales/af_ZA
--- a/localedata/locales/af_ZA 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/af_ZA 2018-10-11 15:10:43.000000000 +0000
@@ -70,6 +70,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ak_GH b/localedata/locales/ak_GH
--- a/localedata/locales/ak_GH 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/ak_GH 2018-10-11 15:10:43.000000000 +0000
@@ -54,6 +54,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
--- a/localedata/locales/am_ET 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/am_ET 2018-10-11 15:10:43.000000000 +0000
@@ -1394,6 +1394,7 @@
<U137A> <U0060><U0039><U0030>
<U137B> <U0060><U0031><U0030><U0030>
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/ar_EG b/localedata/locales/ar_EG
--- a/localedata/locales/ar_EG 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/ar_EG 2018-10-11 15:10:43.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/be_BY b/localedata/locales/be_BY
--- a/localedata/locales/be_BY 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/be_BY 2018-10-11 15:10:43.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bem_ZM b/localedata/locales/bem_ZM
--- a/localedata/locales/bem_ZM 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bem_ZM 2018-10-11 15:10:43.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ber_DZ b/localedata/locales/ber_DZ
--- a/localedata/locales/ber_DZ 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/ber_DZ 2018-10-11 15:10:43.000000000 +0000
@@ -165,6 +165,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ber_MA b/localedata/locales/ber_MA
--- a/localedata/locales/ber_MA 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/ber_MA 2018-10-11 15:10:44.000000000 +0000
@@ -85,6 +85,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bg_BG b/localedata/locales/bg_BG
--- a/localedata/locales/bg_BG 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bg_BG 2018-10-11 15:10:44.000000000 +0000
@@ -49,6 +49,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bi_VU b/localedata/locales/bi_VU
--- a/localedata/locales/bi_VU 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bi_VU 2018-10-11 15:10:44.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bn_BD b/localedata/locales/bn_BD
--- a/localedata/locales/bn_BD 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bn_BD 2018-10-11 15:10:44.000000000 +0000
@@ -61,6 +61,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bo_CN b/localedata/locales/bo_CN
--- a/localedata/locales/bo_CN 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bo_CN 2018-10-11 15:10:44.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ca_ES b/localedata/locales/ca_ES
--- a/localedata/locales/ca_ES 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/ca_ES 2018-10-11 15:10:44.000000000 +0000
@@ -71,6 +71,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ce_RU b/localedata/locales/ce_RU
--- a/localedata/locales/ce_RU 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/ce_RU 2018-10-11 15:10:44.000000000 +0000
@@ -38,6 +38,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cmn_TW b/localedata/locales/cmn_TW
--- a/localedata/locales/cmn_TW 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/cmn_TW 2018-10-11 15:10:44.000000000 +0000
@@ -49,6 +49,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

class "hanzi"; /
diff -uNr a/localedata/locales/cs_CZ b/localedata/locales/cs_CZ
--- a/localedata/locales/cs_CZ 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/cs_CZ 2018-10-11 15:10:44.000000000 +0000
@@ -204,6 +204,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cv_RU b/localedata/locales/cv_RU
--- a/localedata/locales/cv_RU 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/cv_RU 2018-10-11 15:10:44.000000000 +0000
@@ -108,6 +108,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cy_GB b/localedata/locales/cy_GB
--- a/localedata/locales/cy_GB 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/cy_GB 2018-10-11 15:10:44.000000000 +0000
@@ -65,6 +65,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/da_DK b/localedata/locales/da_DK
--- a/localedata/locales/da_DK 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/da_DK 2018-10-11 15:10:44.000000000 +0000
@@ -166,6 +166,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/de_DE b/localedata/locales/de_DE
--- a/localedata/locales/de_DE 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/de_DE 2018-10-11 15:10:44.000000000 +0000
@@ -78,6 +78,7 @@
% DOUBLE HIGH-REVERSED-9 QUOTATION MARK
<U201F> <U00AB>;<U0022>

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/dv_MV b/localedata/locales/dv_MV
--- a/localedata/locales/dv_MV 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/dv_MV 2018-10-11 15:10:44.000000000 +0000
@@ -51,6 +51,7 @@
include "translit_combining";""


+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/dz_BT b/localedata/locales/dz_BT
--- a/localedata/locales/dz_BT 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/dz_BT 2018-10-11 15:10:44.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/el_GR b/localedata/locales/el_GR
--- a/localedata/locales/el_GR 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/el_GR 2018-10-11 15:10:44.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_GB b/localedata/locales/en_GB
--- a/localedata/locales/en_GB 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/en_GB 2018-10-11 15:10:44.000000000 +0000
@@ -54,6 +54,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_NG b/localedata/locales/en_NG
--- a/localedata/locales/en_NG 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/en_NG 2018-10-11 15:10:45.000000000 +0000
@@ -49,6 +49,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_ZM b/localedata/locales/en_ZM
--- a/localedata/locales/en_ZM 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/en_ZM 2018-10-11 15:10:45.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/es_CU b/localedata/locales/es_CU
--- a/localedata/locales/es_CU 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/es_CU 2018-10-11 15:10:45.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/es_ES b/localedata/locales/es_ES
--- a/localedata/locales/es_ES 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/es_ES 2018-10-11 15:10:45.000000000 +0000
@@ -72,6 +72,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/et_EE b/localedata/locales/et_EE
--- a/localedata/locales/et_EE 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/et_EE 2018-10-11 15:10:45.000000000 +0000
@@ -112,6 +112,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fa_IR b/localedata/locales/fa_IR
--- a/localedata/locales/fa_IR 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/fa_IR 2018-10-11 15:10:45.000000000 +0000
@@ -78,6 +78,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ff_SN b/localedata/locales/ff_SN
--- a/localedata/locales/ff_SN 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/ff_SN 2018-10-11 15:10:45.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fi_FI b/localedata/locales/fi_FI
--- a/localedata/locales/fi_FI 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/fi_FI 2018-10-11 15:10:45.000000000 +0000
@@ -136,6 +136,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fr_FR b/localedata/locales/fr_FR
--- a/localedata/locales/fr_FR 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/fr_FR 2018-10-11 15:10:45.000000000 +0000
@@ -58,6 +58,7 @@
% In France, accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/ga_IE b/localedata/locales/ga_IE
--- a/localedata/locales/ga_IE 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/ga_IE 2018-10-11 15:10:45.000000000 +0000
@@ -53,6 +53,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gd_GB b/localedata/locales/gd_GB
--- a/localedata/locales/gd_GB 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/gd_GB 2018-10-11 15:10:45.000000000 +0000
@@ -45,6 +45,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gu_IN b/localedata/locales/gu_IN
--- a/localedata/locales/gu_IN 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/gu_IN 2018-10-11 15:10:45.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gv_GB b/localedata/locales/gv_GB
--- a/localedata/locales/gv_GB 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/gv_GB 2018-10-11 15:10:45.000000000 +0000
@@ -56,6 +56,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/he_IL b/localedata/locales/he_IL
--- a/localedata/locales/he_IL 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/he_IL 2018-10-11 15:10:45.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hi_IN b/localedata/locales/hi_IN
--- a/localedata/locales/hi_IN 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/hi_IN 2018-10-11 15:10:45.000000000 +0000
@@ -61,6 +61,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hif_FJ b/localedata/locales/hif_FJ
--- a/localedata/locales/hif_FJ 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/hif_FJ 2018-10-11 15:10:45.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hr_HR b/localedata/locales/hr_HR
--- a/localedata/locales/hr_HR 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/hr_HR 2018-10-11 15:10:45.000000000 +0000
@@ -61,6 +61,7 @@
% transliterate <U0111> {đ} into d + j
<U0111> "<U0064><U006A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ht_HT b/localedata/locales/ht_HT
--- a/localedata/locales/ht_HT 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ht_HT 2018-10-11 15:10:45.000000000 +0000
@@ -57,6 +57,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hu_HU b/localedata/locales/hu_HU
--- a/localedata/locales/hu_HU 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/hu_HU 2018-10-11 15:10:46.000000000 +0000
@@ -476,6 +476,7 @@
<U00FC> "<U0075><U0308>";"<U0075><U00A8>";"<U0075><U003A>"
<U0171> "<U0075><U030B>";"<U0075><U02DD>";"<U0075><U0022>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/hy_AM b/localedata/locales/hy_AM
--- a/localedata/locales/hy_AM 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/hy_AM 2018-10-11 15:10:46.000000000 +0000
@@ -75,6 +75,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/id_ID b/localedata/locales/id_ID
--- a/localedata/locales/id_ID 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/id_ID 2018-10-11 15:10:46.000000000 +0000
@@ -54,6 +54,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/is_IS b/localedata/locales/is_IS
--- a/localedata/locales/is_IS 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/is_IS 2018-10-11 15:10:46.000000000 +0000
@@ -149,6 +149,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/it_IT b/localedata/locales/it_IT
--- a/localedata/locales/it_IT 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/it_IT 2018-10-11 15:10:46.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ja_JP b/localedata/locales/ja_JP
--- a/localedata/locales/ja_JP 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ja_JP 2018-10-11 15:10:46.000000000 +0000
@@ -1681,6 +1681,7 @@
include "translit_combining";""
include "translit_cjk_variants";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/kab_DZ b/localedata/locales/kab_DZ
--- a/localedata/locales/kab_DZ 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/kab_DZ 2018-10-11 15:10:46.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kk_KZ b/localedata/locales/kk_KZ
--- a/localedata/locales/kk_KZ 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/kk_KZ 2018-10-11 15:10:46.000000000 +0000
@@ -157,6 +157,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/km_KH b/localedata/locales/km_KH
--- a/localedata/locales/km_KH 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/km_KH 2018-10-11 15:10:46.000000000 +0000
@@ -42,6 +42,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kn_IN b/localedata/locales/kn_IN
--- a/localedata/locales/kn_IN 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/kn_IN 2018-10-11 15:10:46.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ko_KR b/localedata/locales/ko_KR
--- a/localedata/locales/ko_KR 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ko_KR 2018-10-11 15:10:47.000000000 +0000
@@ -6099,6 +6099,7 @@
include "translit_combining";""
include "translit_hangul";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/ks_IN b/localedata/locales/ks_IN
--- a/localedata/locales/ks_IN 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ks_IN 2018-10-11 15:10:47.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kw_GB b/localedata/locales/kw_GB
--- a/localedata/locales/kw_GB 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/kw_GB 2018-10-11 15:10:47.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lb_LU b/localedata/locales/lb_LU
--- a/localedata/locales/lb_LU 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/lb_LU 2018-10-11 15:10:47.000000000 +0000
@@ -77,6 +77,7 @@
% LATIN SMALL LETTER E WITH CIRCUMFLEX
<U00EA> "e^"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/lg_UG b/localedata/locales/lg_UG
--- a/localedata/locales/lg_UG 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/lg_UG 2018-10-11 15:10:47.000000000 +0000
@@ -56,6 +56,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lij_IT b/localedata/locales/lij_IT
--- a/localedata/locales/lij_IT 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/lij_IT 2018-10-11 15:10:47.000000000 +0000
@@ -47,6 +47,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ln_CD b/localedata/locales/ln_CD
--- a/localedata/locales/ln_CD 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ln_CD 2018-10-11 15:10:47.000000000 +0000
@@ -39,6 +39,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lo_LA b/localedata/locales/lo_LA
--- a/localedata/locales/lo_LA 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/lo_LA 2018-10-11 15:10:47.000000000 +0000
@@ -50,6 +50,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lt_LT b/localedata/locales/lt_LT
--- a/localedata/locales/lt_LT 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/lt_LT 2018-10-11 15:10:47.000000000 +0000
@@ -163,6 +163,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lv_LV b/localedata/locales/lv_LV
--- a/localedata/locales/lv_LV 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/lv_LV 2018-10-11 15:10:47.000000000 +0000
@@ -110,6 +110,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mg_MG b/localedata/locales/mg_MG
--- a/localedata/locales/mg_MG 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/mg_MG 2018-10-11 15:10:47.000000000 +0000
@@ -54,6 +54,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/mhr_RU b/localedata/locales/mhr_RU
--- a/localedata/locales/mhr_RU 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/mhr_RU 2018-10-11 15:10:47.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mk_MK b/localedata/locales/mk_MK
--- a/localedata/locales/mk_MK 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/mk_MK 2018-10-11 15:10:47.000000000 +0000
@@ -48,6 +48,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ml_IN b/localedata/locales/ml_IN
--- a/localedata/locales/ml_IN 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/ml_IN 2018-10-11 15:10:47.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
%
diff -uNr a/localedata/locales/ms_MY b/localedata/locales/ms_MY
--- a/localedata/locales/ms_MY 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/ms_MY 2018-10-11 15:10:48.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mt_MT b/localedata/locales/mt_MT
--- a/localedata/locales/mt_MT 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/mt_MT 2018-10-11 15:10:48.000000000 +0000
@@ -47,6 +47,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/***@latin
b/localedata/locales/***@latin
--- a/localedata/locales/***@latin 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/***@latin 2018-10-11 15:10:48.000000000 +0000
@@ -52,6 +52,7 @@
% accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/nb_NO b/localedata/locales/nb_NO
--- a/localedata/locales/nb_NO 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/nb_NO 2018-10-11 15:10:48.000000000 +0000
@@ -154,6 +154,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ne_NP b/localedata/locales/ne_NP
--- a/localedata/locales/ne_NP 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/ne_NP 2018-10-11 15:10:48.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nhn_MX b/localedata/locales/nhn_MX
--- a/localedata/locales/nhn_MX 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/nhn_MX 2018-10-11 15:10:48.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/niu_NU b/localedata/locales/niu_NU
--- a/localedata/locales/niu_NU 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/niu_NU 2018-10-11 15:10:48.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/niu_NZ b/localedata/locales/niu_NZ
--- a/localedata/locales/niu_NZ 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/niu_NZ 2018-10-11 15:10:48.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nl_NL b/localedata/locales/nl_NL
--- a/localedata/locales/nl_NL 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/nl_NL 2018-10-11 15:10:48.000000000 +0000
@@ -56,6 +56,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nr_ZA b/localedata/locales/nr_ZA
--- a/localedata/locales/nr_ZA 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/nr_ZA 2018-10-11 15:10:48.000000000 +0000
@@ -64,6 +64,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/oc_FR b/localedata/locales/oc_FR
--- a/localedata/locales/oc_FR 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/oc_FR 2018-10-11 15:10:48.000000000 +0000
@@ -54,6 +54,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/om_KE b/localedata/locales/om_KE
--- a/localedata/locales/om_KE 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/om_KE 2018-10-11 15:10:48.000000000 +0000
@@ -138,6 +138,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/or_IN b/localedata/locales/or_IN
--- a/localedata/locales/or_IN 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/or_IN 2018-10-11 15:10:48.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/os_RU b/localedata/locales/os_RU
--- a/localedata/locales/os_RU 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/os_RU 2018-10-11 15:10:48.000000000 +0000
@@ -69,6 +69,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/pa_IN b/localedata/locales/pa_IN
--- a/localedata/locales/pa_IN 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/pa_IN 2018-10-11 15:10:48.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pa_PK b/localedata/locales/pa_PK
--- a/localedata/locales/pa_PK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/pa_PK 2018-10-11 15:10:48.000000000 +0000
@@ -57,6 +57,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pl_PL b/localedata/locales/pl_PL
--- a/localedata/locales/pl_PL 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/pl_PL 2018-10-11 15:10:48.000000000 +0000
@@ -116,6 +116,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pt_PT b/localedata/locales/pt_PT
--- a/localedata/locales/pt_PT 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/pt_PT 2018-10-11 15:10:48.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/quz_PE b/localedata/locales/quz_PE
--- a/localedata/locales/quz_PE 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/quz_PE 2018-10-11 15:10:48.000000000 +0000
@@ -55,6 +55,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ro_RO b/localedata/locales/ro_RO
--- a/localedata/locales/ro_RO 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/ro_RO 2018-10-11 15:10:49.000000000 +0000
@@ -143,6 +143,7 @@
<U0162> "<U021A>";"<U0054>"
<U0163> "<U021B>";"<U0074>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ru_RU b/localedata/locales/ru_RU
--- a/localedata/locales/ru_RU 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/ru_RU 2018-10-11 15:10:49.000000000 +0000
@@ -73,6 +73,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/rw_RW b/localedata/locales/rw_RW
--- a/localedata/locales/rw_RW 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/rw_RW 2018-10-11 15:10:49.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sa_IN b/localedata/locales/sa_IN
--- a/localedata/locales/sa_IN 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sa_IN 2018-10-11 15:10:49.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sd_IN b/localedata/locales/sd_IN
--- a/localedata/locales/sd_IN 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sd_IN 2018-10-11 15:10:49.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/***@devanagari
b/localedata/locales/***@devanagari
--- a/localedata/locales/***@devanagari 2018-10-11 15:10:18.000000000
+0000
+++ b/localedata/locales/***@devanagari 2018-10-11 15:10:49.000000000
+0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sd_PK b/localedata/locales/sd_PK
--- a/localedata/locales/sd_PK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sd_PK 2018-10-11 15:10:49.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/se_NO b/localedata/locales/se_NO
--- a/localedata/locales/se_NO 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/se_NO 2018-10-11 15:10:49.000000000 +0000
@@ -204,6 +204,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sgs_LT b/localedata/locales/sgs_LT
--- a/localedata/locales/sgs_LT 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sgs_LT 2018-10-11 15:10:49.000000000 +0000
@@ -58,6 +58,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/shn_MM b/localedata/locales/shn_MM
--- a/localedata/locales/shn_MM 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/shn_MM 2018-10-11 15:10:49.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/si_LK b/localedata/locales/si_LK
--- a/localedata/locales/si_LK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/si_LK 2018-10-11 15:10:49.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sk_SK b/localedata/locales/sk_SK
--- a/localedata/locales/sk_SK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sk_SK 2018-10-11 15:10:49.000000000 +0000
@@ -67,6 +67,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sl_SI b/localedata/locales/sl_SI
--- a/localedata/locales/sl_SI 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sl_SI 2018-10-11 15:10:49.000000000 +0000
@@ -90,6 +90,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sm_WS b/localedata/locales/sm_WS
--- a/localedata/locales/sm_WS 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sm_WS 2018-10-11 15:10:49.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/so_SO b/localedata/locales/so_SO
--- a/localedata/locales/so_SO 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/so_SO 2018-10-11 15:10:49.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sq_AL b/localedata/locales/sq_AL
--- a/localedata/locales/sq_AL 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sq_AL 2018-10-11 15:10:49.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ss_ZA b/localedata/locales/ss_ZA
--- a/localedata/locales/ss_ZA 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/ss_ZA 2018-10-11 15:10:49.000000000 +0000
@@ -66,6 +66,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/st_ZA b/localedata/locales/st_ZA
--- a/localedata/locales/st_ZA 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/st_ZA 2018-10-11 15:10:50.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sv_SE b/localedata/locales/sv_SE
--- a/localedata/locales/sv_SE 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/sv_SE 2018-10-11 15:10:50.000000000 +0000
@@ -138,6 +138,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sw_KE b/localedata/locales/sw_KE
--- a/localedata/locales/sw_KE 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/sw_KE 2018-10-11 15:10:50.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ta_IN b/localedata/locales/ta_IN
--- a/localedata/locales/ta_IN 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/ta_IN 2018-10-11 15:10:50.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/te_IN b/localedata/locales/te_IN
--- a/localedata/locales/te_IN 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/te_IN 2018-10-11 15:10:50.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/th_TH b/localedata/locales/th_TH
--- a/localedata/locales/th_TH 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/th_TH 2018-10-11 15:10:50.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ti_ET b/localedata/locales/ti_ET
--- a/localedata/locales/ti_ET 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/ti_ET 2018-10-11 15:10:50.000000000 +0000
@@ -864,6 +864,7 @@
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>

include "translit_combining";""
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/tn_ZA b/localedata/locales/tn_ZA
--- a/localedata/locales/tn_ZA 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/tn_ZA 2018-10-11 15:10:50.000000000 +0000
@@ -67,6 +67,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/to_TO b/localedata/locales/to_TO
--- a/localedata/locales/to_TO 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/to_TO 2018-10-11 15:10:50.000000000 +0000
@@ -36,6 +36,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/tpi_PG b/localedata/locales/tpi_PG
--- a/localedata/locales/tpi_PG 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/tpi_PG 2018-10-11 15:10:50.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/tr_TR b/localedata/locales/tr_TR
--- a/localedata/locales/tr_TR 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/tr_TR 2018-10-11 15:10:50.000000000 +0000
@@ -2423,6 +2423,7 @@

% TURKISH LIRA SIGN
<U20BA> "<U0054><U004C>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-10-11 15:10:52.000000000
+0000
@@ -0,0 +1,383 @@
+escape_char /
+comment_char %
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file. The foregoing does not
+% affect the license of the GNU C Library as a whole. It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Transliterations of Cyrillic letters to Latin and/or ASCII symbols.
+% Inspired by ISO 9.1995 / GOST 7.79-2000.
+% Covers Unicode Range https://www.unicode.org/charts/PDF/U0400.pdf
+% i.e. [U0401-U04F9, U2019] but only the letters covered by ISO 9.1995
+% It implements the GOST_7.79 System A (Latin Script) as a first
+% option and System B Cyrillic (ASCII) as a second option. Check
+% https://en.wikipedia.org/wiki/ISO_9 for reference.
+% The System B is extended from GOST_7.79-Russian using open sources
+% of the transliteration mappings and the "h/`" diacritics logic.
+
+% Usage examples:
+% iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
+% | iconv -f ISO-8859-15 -t UTF-8 # System A
+% iconv -f UTF-8 -t ASCII//TRANSLIT # System B.
+
+% Contributions welcome for the rest of Cyrillic script in Unicode
+% https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode.
+% Bugfix for https://sourceware.org/bugzilla/show_bug.cgi?id=2872.
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=11301.
+
+LC_CTYPE
+
+translit_start
+
+% CYRILLIC CAPITAL LETTER IO
+<U0401> <U00CB>;"<U0059><U004F>"
+% CYRILLIC CAPITAL LETTER DJE
+<U0402> <U0110>;"<U0044><U004A>"
+% CYRILLIC CAPITAL LETTER GJE
+<U0403> <U01F4>;"<U0047><U0060>"
+% CYRILLIC CAPITAL LETTER UKRAINIAN IE
+<U0404> <U00CA>;"<U0059><U0065>"
+% CYRILLIC CAPITAL LETTER DZE
+<U0405> <U1E90>;"<U005A><U0060>"
+% CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
+<U0406> <U00CC>;<U0049>
+% CYRILLIC CAPITAL LETTER YI
+<U0407> <U00CF>;"<U0059><U0069>"
+% CYRILLIC CAPITAL LETTER JE
+<U0408> "<U004A><U030C>";<U004A>
+% CYRILLIC CAPITAL LETTER LJE
+<U0409> "<U004C><U0302>";"<U004C><U0060>"
+% CYRILLIC CAPITAL LETTER NJE
+<U040A> "<U004E><U0302>";"<U004E><U0060>"
+% CYRILLIC CAPITAL LETTER TSHE
+<U040B> <U0106>;"<U0054><U0053><U0048>"
+% CYRILLIC CAPITAL LETTER KJE
+<U040C> <U1E30>;"<U004B><U0060>"
+% CYRILLIC CAPITAL LETTER SHORT U
+<U040E> <U016C>;"<U0055><U0060>"
+% CYRILLIC CAPITAL LETTER DZHE
+<U040F> "<U0044><U0302>";"<U0044><U0068>"
+% CYRILLIC CAPITAL LETTER A
+<U0410> <U0041>
+% CYRILLIC CAPITAL LETTER BE
+<U0411> <U0042>
+% CYRILLIC CAPITAL LETTER VE
+<U0412> <U0056>
+% CYRILLIC CAPITAL LETTER GHE
+<U0413> <U0047>
+% CYRILLIC CAPITAL LETTER DE
+<U0414> <U0044>
+% CYRILLIC CAPITAL LETTER IE
+<U0415> <U0045>
+% CYRILLIC CAPITAL LETTER ZHE
+<U0416> <U017D>;"<U005A><U0048>"
+% CYRILLIC CAPITAL LETTER ZE
+<U0417> <U005A>
+% CYRILLIC CAPITAL LETTER I
+<U0418> <U0049>
+% CYRILLIC CAPITAL LETTER SHORT I
+<U0419> <U004A>
+% CYRILLIC CAPITAL LETTER KA
+<U041A> <U004B>
+% CYRILLIC CAPITAL LETTER EL
+<U041B> <U004C>
+% CYRILLIC CAPITAL LETTER EM
+<U041C> <U004D>
+% CYRILLIC CAPITAL LETTER EN
+<U041D> <U004E>
+% CYRILLIC CAPITAL LETTER O
+<U041E> <U004F>
+% CYRILLIC CAPITAL LETTER PE
+<U041F> <U0050>
+% CYRILLIC CAPITAL LETTER ER
+<U0420> <U0052>
+% CYRILLIC CAPITAL LETTER ES
+<U0421> <U0053>
+% CYRILLIC CAPITAL LETTER TE
+<U0422> <U0054>
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC UNDEFINED
+<U0423><U0301> <U00DA>;"<U0055><U0060>"
+% CYRILLIC CAPITAL LETTER EF
+<U0424> <U0046>
+% CYRILLIC CAPITAL LETTER HA
+<U0425> <U0048>;<U0058>
+% CYRILLIC CAPITAL LETTER TSE
+<U0426> <U0043>;"<U0043><U005A>"
+% CYRILLIC CAPITAL LETTER CHE
+<U0427> <U010C>;"<U0043><U0048>"
+% CYRILLIC CAPITAL LETTER SHA
+<U0428> <U0160>;"<U0053><U0048>"
+% CYRILLIC CAPITAL LETTER SHCHA
+<U0429> <U015C>;"<U0053><U0048><U0048>"
+% CYRILLIC CAPITAL LETTER HARD SIGN
+<U042A> <U02BA>;"<U0041><U0060>"
+% CYRILLIC CAPITAL LETTER YERU
+<U042B> <U0059>;"<U0059><U0060>"
+% CYRILLIC CAPITAL LETTER SOFT SIGN
+<U042C> <U02B9>;<U0060>
+% CYRILLIC CAPITAL LETTER E
+<U042D> <U00C8>;"<U0045><U0060>"
+% CYRILLIC CAPITAL LETTER YU
+<U042E> <U00DB>;"<U0059><U0055>"
+% CYRILLIC CAPITAL LETTER YA
+<U042F> <U00C2>;"<U0059><U0041>"
+% CYRILLIC SMALL LETTER A
+<U0430> <U0061>
+% CYRILLIC SMALL LETTER BE
+<U0431> <U0062>
+% CYRILLIC SMALL LETTER VE
+<U0432> <U0076>
+% CYRILLIC SMALL LETTER GHE
+<U0433> <U0067>
+% CYRILLIC SMALL LETTER DE
+<U0434> <U0064>
+% CYRILLIC SMALL LETTER IE
+<U0435> <U0065>
+% CYRILLIC SMALL LETTER ZHE
+<U0436> <U017E>;"<U007A><U0068>"
+% CYRILLIC SMALL LETTER ZE
+<U0437> <U007A>
+% CYRILLIC SMALL LETTER I
+<U0438> <U0069>
+% CYRILLIC SMALL LETTER SHORT I
+<U0439> <U006A>
+% CYRILLIC SMALL LETTER KA
+<U043A> <U006B>
+% CYRILLIC SMALL LETTER EL
+<U043B> <U006C>
+% CYRILLIC SMALL LETTER EM
+<U043C> <U006D>
+% CYRILLIC SMALL LETTER EN
+<U043D> <U006E>
+% CYRILLIC SMALL LETTER O
+<U043E> <U006F>
+% CYRILLIC SMALL LETTER PE
+<U043F> <U0070>
+% CYRILLIC SMALL LETTER ER
+<U0440> <U0072>
+% CYRILLIC SMALL LETTER ES
+<U0441> <U0073>
+% CYRILLIC SMALL LETTER TE
+<U0442> <U0074>
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC UNDEFINED
+<U0443><U0301> <U00FA>;"<U0075><U0060>"
+% CYRILLIC SMALL LETTER EF
+<U0444> <U0066>
+% CYRILLIC SMALL LETTER HA
+<U0445> <U0068>;<U0078>
+% CYRILLIC SMALL LETTER TSE
+<U0446> <U0063>;"<U0063><U007A>"
+% CYRILLIC SMALL LETTER CHE
+<U0447> <U010D>;"<U0063><U0068>"
+% CYRILLIC SMALL LETTER SHA
+<U0448> <U0161>;"<U0073><U0068>"
+% CYRILLIC SMALL LETTER SHCHA
+<U0449> <U015D>;"<U0073><U0068><U0068>"
+% CYRILLIC SMALL LETTER HARD SIGN
+<U044A> <U02BA>;"<U0060><U0060>"
+% CYRILLIC SMALL LETTER YERU
+<U044B> <U0079>;"<U0079><U0060>"
+% CYRILLIC SMALL LETTER SOFT SIGN
+<U044C> <U02B9>;<U0060>
+% CYRILLIC SMALL LETTER E
+<U044D> <U00E8>;"<U0065><U0060>"
+% CYRILLIC SMALL LETTER YU
+<U044E> <U00FB>;"<U0079><U0075>"
+% CYRILLIC SMALL LETTER YA
+<U044F> <U00E2>;"<U0079><U0061>"
+% CYRILLIC SMALL LETTER IO
+<U0451> <U00EB>;"<U0079><U006F>"
+% CYRILLIC SMALL LETTER DJE
+<U0452> <U0111>;"<U0064><U006A>"
+% CYRILLIC SMALL LETTER GJE
+<U0453> <U01F5>;"<U0067><U0060>"
+% CYRILLIC SMALL LETTER UKRAINIAN IE
+<U0454> <U00EA>;"<U0079><U0065>"
+% CYRILLIC SMALL LETTER DZE
+<U0455> <U1E91>;"<U007A><U0060>"
+% CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
+<U0456> <U00EC>;<U0069>
+% CYRILLIC SMALL LETTER YI
+<U0457> <U00EF>;"<U0079><U0069>"
+% CYRILLIC SMALL LETTER JE
+<U0458> <U01F0>;<U006A>
+% CYRILLIC SMALL LETTER LJE
+<U0459> "<U006C><U0302>";"<U006C><U0060>"
+% CYRILLIC SMALL LETTER NJE
+<U045A> "<U006E><U0302>";"<U006E><U0060>"
+% CYRILLIC SMALL LETTER TSHE
+<U045B> <U0107>;"<U0074><U0073><U0068>"
+% CYRILLIC SMALL LETTER KJE
+<U045C> <U1E31>;"<U006B><U0060>"
+% CYRILLIC SMALL LETTER SHORT U
+<U045E> <U016D>;"<U0075><U0060>"
+% CYRILLIC SMALL LETTER DZHE
+<U045F> "<U0064><U0302>";"<U0064><U0068>"
+% CYRILLIC CAPITAL LETTER BIG YUS
+<U046A> <U01CD>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER BIG YUS
+<U046B> <U01CE>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER FITA
+<U0472> "<U0046><U0300>";"<U0046><U0068>"
+% CYRILLIC SMALL LETTER FITA
+<U0473> "<U0066><U0300>";"<U0066><U0068>"
+% CYRILLIC CAPITAL LETTER IZHITSA
+<U0474> <U1EF2>;"<U0059><U0068>"
+% CYRILLIC SMALL LETTER IZHITSA
+<U0475> <U1EF3>;"<U0079><U0068>"
+% CYRILLIC CAPITAL LETTER SEMISOFT SIGN
+<U048C> <U011A>;"<U0045><U0060>"
+% CYRILLIC SMALL LETTER SEMISOFT SIGN
+<U048D> <U011B>;"<U0065><U0060>"
+% CYRILLIC CAPITAL LETTER GHE WITH UPTURN
+<U0490> "<U0047><U0300>";"<U0047><U0060>"
+% CYRILLIC SMALL LETTER GHE WITH UPTURN
+<U0491> "<U0067><U0300>";"<U0067><U0060>"
+% CYRILLIC CAPITAL LETTER GHE WITH STROKE
+<U0492> <U0120>;"<U0047><U0048>"
+% CYRILLIC SMALL LETTER GHE WITH STROKE
+<U0493> <U0121>;"<U0067><U0068>"
+% CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK
+<U0494> <U011E>;"<U0047><U0048>"
+% CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK
+<U0495> <U011F>;"<U0067><U0068>"
+% CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER
+<U0496> "<U017D><U0327>";"<U005A><U0048><U0060>"
+% CYRILLIC SMALL LETTER ZHE WITH DESCENDER
+<U0497> "<U017E><U0327>";"<U007A><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER KA WITH DESCENDER
+<U049A> <U0136>;"<U004B><U0060>"
+% CYRILLIC SMALL LETTER KA WITH DESCENDER
+<U049B> <U0137>;"<U006B><U0060>"
+% CYRILLIC CAPITAL LETTER KA WITH STROKE
+<U049E> "<U004B><U0304>";"<U004B><U0060>"
+% CYRILLIC SMALL LETTER KA WITH STROKE
+<U049F> "<U006B><U0304>";"<U006B><U0060>"
+% CYRILLIC CAPITAL LETTER EN WITH DESCENDER
+<U04A2> <U1E46>;"<U004E><U0060>"
+% CYRILLIC SMALL LETTER EN WITH DESCENDER
+<U04A3> <U1E47>;"<U006E><U0060>"
+% CYRILLIC CAPITAL LIGATURE EN GHE
+<U04A4> <U1E44>;"<U004E><U0047>"
+% CYRILLIC SMALL LIGATURE EN GHE
+<U04A5> <U1E45>;"<U006E><U0067>"
+% CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK
+<U04A6> <U1E54>;"<U0050><U0060>"
+% CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK
+<U04A7> <U1E55>;"<U0070><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN HA
+<U04A8> <U00D2>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN HA
+<U04A9> <U00F2>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER ES WITH DESCENDER
+<U04AA> <U00C7>;"<U0043><U0060>"
+% CYRILLIC SMALL LETTER ES WITH DESCENDER
+<U04AB> <U00E7>;"<U0043><U0060>"
+% CYRILLIC CAPITAL LETTER TE WITH DESCENDER
+<U04AC> <U0162>;"<U0054><U0060>"
+% CYRILLIC SMALL LETTER TE WITH DESCENDER
+<U04AD> <U0163>;"<U0074><U0060>"
+% CYRILLIC CAPITAL LETTER STRAIGHT U
+<U04AE> <U00D9>;<U0055>
+% CYRILLIC SMALL LETTER STRAIGHT U
+<U04AF> <U00F9>;<U0075>
+% CYRILLIC CAPITAL LETTER HA WITH DESCENDER
+<U04B2> <U1E28>;"<U0048><U0060>"
+% CYRILLIC SMALL LETTER HA WITH DESCENDER
+<U04B3> <U1E29>;"<U0068><U0060>"
+% CYRILLIC CAPITAL LIGATURE TE TSE
+<U04B4> "<U0043><U0304>";"<U0054><U0043><U005A>"
+% CYRILLIC SMALL LIGATURE TE TSE
+<U04B5> "<U0063><U0304>";"<U0074><U0063><U007A>"
+% CYRILLIC CAPITAL LETTER SHHA
+<U04BA> <U1E24>;"<U0053><U0048><U0060>"
+% CYRILLIC SMALL LETTER SHHA
+<U04BB> <U1E25>;"<U0053><U0048><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN CHE
+<U04BC> "<U0043><U0306>";"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN CHE
+<U04BD> "<U0063><U0306>";"<U0063><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER
+<U04BE> "<U00C7><U0306>";"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER
+<U04BF> "<U00E7><U0306>";"<U0063><U0068><U0060>"
+% CYRILLIC LETTER PALOCHKA
+<U04C0> <U2021>;<U0069>
+% CYRILLIC CAPITAL LETTER ZHE WITH BREVE
+<U04C1> "<U005A><U0306>";"<U005A><U0048><U0060>"
+% CYRILLIC SMALL LETTER ZHE WITH BREVE
+<U04C2> "<U007A><U0306>";"<U007A><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER KHAKASSIAN CHE
+<U04CB> <U00C7>;"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER KHAKASSIAN CHE
+<U04CC> <U00E7>;"<U0063><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER A WITH BREVE
+<U04D0> <U0102>;"<U0041><U0060>"
+% CYRILLIC SMALL LETTER A WITH BREVE
+<U04D1> <U0103>;"<U0061><U0060>"
+% CYRILLIC CAPITAL LETTER A WITH DIAERESIS
+<U04D2> <U00C4>;"<U0041><U0060>"
+% CYRILLIC SMALL LETTER A WITH DIAERESIS
+<U04D3> <U00E4>;"<U0061><U0060>"
+% CYRILLIC CAPITAL LETTER IE WITH BREVE
+<U04D6> <U0114>;"<U0045><U0060>"
+% CYRILLIC SMALL LETTER IE WITH BREVE
+<U04D7> <U0115>;"<U0065><U0060>"
+% CYRILLIC CAPITAL LETTER SCHWA
+<U04D8> "<U0041><U030B>";"<U0041><U0060>"
+% CYRILLIC SMALL LETTER SCHWA
+<U04D9> "<U0061><U030B>";"<U0061><U0060>"
+% CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS
+<U04DC> "<U005A><U0304>";"<U005A><U0048><U0060>"
+% CYRILLIC SMALL LETTER ZHE WITH DIAERESIS
+<U04DD> "<U007A><U0304>";"<U007A><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS
+<U04DE> "<U005A><U0308>";"<U005A><U0060>"
+% CYRILLIC SMALL LETTER ZE WITH DIAERESIS
+<U04DF> "<U007A><U0308>";"<U007A><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN DZE
+<U04E0> <U0179>;"<U005A><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN DZE
+<U04E1> <U017A>;"<U007A><U0060>"
+% CYRILLIC CAPITAL LETTER I WITH DIAERESIS
+<U04E4> <U00CE>;"<U0049><U0060>"
+% CYRILLIC SMALL LETTER I WITH DIAERESIS
+<U04E5> <U00EE>;"<U0069><U0060>"
+% CYRILLIC CAPITAL LETTER O WITH DIAERESIS
+<U04E6> <U00D6>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER O WITH DIAERESIS
+<U04E7> <U00F6>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER BARRED O
+<U04E8> <U00D4>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER BARRED O
+<U04E9> <U00F4>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER U WITH DIAERESIS
+<U04F0> <U00DC>;"<U0055><U0060>"
+% CYRILLIC SMALL LETTER U WITH DIAERESIS
+<U04F1> <U00FC>;"<U0075><U0060>"
+% CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE
+<U04F2> <U0170>;"<U0055><U0060>"
+% CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE
+<U04F3> <U0171>;"<U0075><U0060>"
+% CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS
+<U04F4> "<U0043><U0308>";"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER CHE WITH DIAERESIS
+<U04F5> "<U0063><U0308>";"<U0063><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS
+<U04F8> <U0178>;"<U0059><U0060>"
+% CYRILLIC SMALL LETTER YERU WITH DIAERESIS
+<U04F9> <U00FF>;"<U0079><U0060>"
+% RIGHT SINGLE QUOTATION MARK
+<U2019> <U2035>;<U0027>
+
+translit_end
+
+END LC_CTYPE
diff -uNr a/localedata/locales/ts_ZA b/localedata/locales/ts_ZA
--- a/localedata/locales/ts_ZA 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/ts_ZA 2018-10-11 15:10:50.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/unm_US b/localedata/locales/unm_US
--- a/localedata/locales/unm_US 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/unm_US 2018-10-11 15:10:51.000000000 +0000
@@ -48,6 +48,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ur_IN b/localedata/locales/ur_IN
--- a/localedata/locales/ur_IN 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/ur_IN 2018-10-11 15:10:51.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ur_PK b/localedata/locales/ur_PK
--- a/localedata/locales/ur_PK 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/ur_PK 2018-10-11 15:10:51.000000000 +0000
@@ -57,6 +57,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ve_ZA b/localedata/locales/ve_ZA
--- a/localedata/locales/ve_ZA 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/ve_ZA 2018-10-11 15:10:51.000000000 +0000
@@ -65,6 +65,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/vi_VN b/localedata/locales/vi_VN
--- a/localedata/locales/vi_VN 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/vi_VN 2018-10-11 15:10:51.000000000 +0000
@@ -57,6 +57,7 @@
% dong sign -> d// -> dd
<U20AB> "<U0111>";"<U0064><U0064>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/wa_BE b/localedata/locales/wa_BE
--- a/localedata/locales/wa_BE 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/wa_BE 2018-10-11 15:10:51.000000000 +0000
@@ -59,6 +59,7 @@
<U00C5> "A<U030A>";"A";"AU"
<U00E5> "a<U030A>";"a";"au"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/wo_SN b/localedata/locales/wo_SN
--- a/localedata/locales/wo_SN 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/wo_SN 2018-10-11 15:10:51.000000000 +0000
@@ -54,6 +54,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/xh_ZA b/localedata/locales/xh_ZA
--- a/localedata/locales/xh_ZA 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/xh_ZA 2018-10-11 15:10:51.000000000 +0000
@@ -64,6 +64,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/yi_US b/localedata/locales/yi_US
--- a/localedata/locales/yi_US 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/yi_US 2018-10-11 15:10:51.000000000 +0000
@@ -66,6 +66,7 @@
<U05F0> "<U05D5><U05D5>";"ww"
<U05F1> "<U05D5><U05D9>";"wj"
<U05F2> "<U05D9><U05D9>";"jj"
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/yuw_PG b/localedata/locales/yuw_PG
--- a/localedata/locales/yuw_PG 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/yuw_PG 2018-10-11 15:10:51.000000000 +0000
@@ -40,6 +40,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/zh_CN b/localedata/locales/zh_CN
--- a/localedata/locales/zh_CN 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/zh_CN 2018-10-11 15:10:51.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

class "hanzi"; /
diff -uNr a/localedata/locales/zu_ZA b/localedata/locales/zu_ZA
--- a/localedata/locales/zu_ZA 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/zu_ZA 2018-10-11 15:10:51.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
Egor Kobylkin
2018-10-11 21:33:00 UTC
Permalink
Dear locale maintainers,

fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

add Cyrillic transliteration table translit_cyrillic file

https://sourceware.org/bugzilla/attachment.cgi?id=11317 [7]

to localedata/locales/ and include it in all your locales going forward.

Patch included inline below.

From this patch I have excluded locales that already mention cyrillic or
have a transliteration table for it:
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
***@cyrillic

Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.

Current bug effect:

The glibc wiki explicitly lists this use case as the test example

https://sourceware.org/glibc/wiki/Locales#Testing_Locales :

LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt

currently it fails on Cyrillic texts in most locales including ru_RU [1]
[8] [9]:

LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

- It produces a string of question marks and spaces.

This is what it should produce and it does so after the patch applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


Root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.



COMMIT MESSAGE:
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.

Examples: iconv -f UTF-8 -t ASCII//TRANSLIT will produce ASCII
compatible transcription and iconv -f UTF-8 -t ISO-8859-15//TRANSLIT |
iconv -f ISO-8859-15 -t UTF-8 will produce Latin transliteration as per
ISO 9.1995.

While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration/transcription has only Latin/ASCII codes but still can
be read by a native speaker. Among other things it is useful for
processing the Cyrillic texts and filenames by programs or on systems
that are not specifically prepared to work with Cyrillic, don't have
corresponding fonts installed or can't handle UTF-8.

The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on ISO 9.1995 standard [10] and its
derivative GOST 7.79-2000 official source (Federal Agency on Technical
Regulating and Metrology Of Russian Federation [2]). Technically an
independent but mostly identical source [3] was used and prepared in a
spreadsheet [6].

The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that have a
translit_start/end stance and generated a patch for them.

The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.

I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
Lisivka <***@gmail.com>, Max Kutny <***@gmail.com> (uk_UA),
ДаМОлП КегаМ <***@gnome.org> (sr_RS) have confirmed the
exclusion.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11301
[7] translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11317
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=11304
[10] https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A

Best regards,
Egor Kobylkin

---
2018-10-11 Egor Kobylkin <***@kobylkin.com>

[BZ #2872]
* localedata/locales/translit_cyrillic: Add ISO 9.1995, GOST 7.79
System A transliteration System B transcription table from Cyrillic to
Latin/ASCII.
* localedata/locales/C: add include "translit_cyrillic";"" to LC_CTYPE
translit section.
* localedata/locales/aa_DJ: Likewise.
* localedata/locales/af_ZA: Likewise.
* localedata/locales/ak_GH: Likewise.
* localedata/locales/am_ET: Likewise.
* localedata/locales/ar_EG: Likewise.
* localedata/locales/be_BY: Likewise.
* localedata/locales/bem_ZM: Likewise.
* localedata/locales/ber_DZ: Likewise.
* localedata/locales/ber_MA: Likewise.
* localedata/locales/bg_BG: Likewise.
* localedata/locales/bi_VU: Likewise.
* localedata/locales/bn_BD: Likewise.
* localedata/locales/bo_CN: Likewise.
* localedata/locales/ca_ES: Likewise.
* localedata/locales/ce_RU: Likewise.
* localedata/locales/cmn_TW: Likewise.
* localedata/locales/cs_CZ: Likewise.
* localedata/locales/cv_RU: Likewise.
* localedata/locales/cy_GB: Likewise.
* localedata/locales/da_DK: Likewise.
* localedata/locales/de_DE: Likewise.
* localedata/locales/dv_MV: Likewise.
* localedata/locales/dz_BT: Likewise.
* localedata/locales/el_GR: Likewise.
* localedata/locales/en_GB: Likewise.
* localedata/locales/en_NG: Likewise.
* localedata/locales/en_ZM: Likewise.
* localedata/locales/es_CU: Likewise.
* localedata/locales/es_ES: Likewise.
* localedata/locales/et_EE: Likewise.
* localedata/locales/fa_IR: Likewise.
* localedata/locales/ff_SN: Likewise.
* localedata/locales/fi_FI: Likewise.
* localedata/locales/fr_FR: Likewise.
* localedata/locales/ga_IE: Likewise.
* localedata/locales/gd_GB: Likewise.
* localedata/locales/gu_IN: Likewise.
* localedata/locales/gv_GB: Likewise.
* localedata/locales/he_IL: Likewise.
* localedata/locales/hi_IN: Likewise.
* localedata/locales/hif_FJ: Likewise.
* localedata/locales/hr_HR: Likewise.
* localedata/locales/ht_HT: Likewise.
* localedata/locales/hu_HU: Likewise.
* localedata/locales/hy_AM: Likewise.
* localedata/locales/id_ID: Likewise.
* localedata/locales/is_IS: Likewise.
* localedata/locales/it_IT: Likewise.
* localedata/locales/ja_JP: Likewise.
* localedata/locales/kab_DZ: Likewise.
* localedata/locales/kk_KZ: Likewise.
* localedata/locales/km_KH: Likewise.
* localedata/locales/kn_IN: Likewise.
* localedata/locales/ko_KR: Likewise.
* localedata/locales/ks_IN: Likewise.
* localedata/locales/kw_GB: Likewise.
* localedata/locales/lb_LU: Likewise.
* localedata/locales/lg_UG: Likewise.
* localedata/locales/lij_IT: Likewise.
* localedata/locales/ln_CD: Likewise.
* localedata/locales/lo_LA: Likewise.
* localedata/locales/lt_LT: Likewise.
* localedata/locales/lv_LV: Likewise.
* localedata/locales/mg_MG: Likewise.
* localedata/locales/mhr_RU: Likewise.
* localedata/locales/mk_MK: Likewise.
* localedata/locales/ml_IN: Likewise.
* localedata/locales/ms_MY: Likewise.
* localedata/locales/mt_MT: Likewise.
* localedata/locales/***@latin: Likewise.
* localedata/locales/nb_NO: Likewise.
* localedata/locales/ne_NP: Likewise.
* localedata/locales/nhn_MX: Likewise.
* localedata/locales/niu_NU: Likewise.
* localedata/locales/niu_NZ: Likewise.
* localedata/locales/nl_NL: Likewise.
* localedata/locales/nr_ZA: Likewise.
* localedata/locales/oc_FR: Likewise.
* localedata/locales/om_KE: Likewise.
* localedata/locales/or_IN: Likewise.
* localedata/locales/os_RU: Likewise.
* localedata/locales/pa_IN: Likewise.
* localedata/locales/pa_PK: Likewise.
* localedata/locales/pl_PL: Likewise.
* localedata/locales/pt_PT: Likewise.
* localedata/locales/quz_PE: Likewise.
* localedata/locales/ro_RO: Likewise.
* localedata/locales/ru_RU: Likewise.
* localedata/locales/rw_RW: Likewise.
* localedata/locales/sa_IN: Likewise.
* localedata/locales/sd_IN: Likewise.
* localedata/locales/***@devanagari: Likewise.
* localedata/locales/sd_PK: Likewise.
* localedata/locales/se_NO: Likewise.
* localedata/locales/sgs_LT: Likewise.
* localedata/locales/shn_MM: Likewise.
* localedata/locales/si_LK: Likewise.
* localedata/locales/sk_SK: Likewise.
* localedata/locales/sl_SI: Likewise.
* localedata/locales/sm_WS: Likewise.
* localedata/locales/so_SO: Likewise.
* localedata/locales/sq_AL: Likewise.
* localedata/locales/ss_ZA: Likewise.
* localedata/locales/st_ZA: Likewise.
* localedata/locales/sv_SE: Likewise.
* localedata/locales/sw_KE: Likewise.
* localedata/locales/ta_IN: Likewise.
* localedata/locales/te_IN: Likewise.
* localedata/locales/th_TH: Likewise.
* localedata/locales/ti_ET: Likewise.
* localedata/locales/tn_ZA: Likewise.
* localedata/locales/to_TO: Likewise.
* localedata/locales/tpi_PG: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/ts_ZA: Likewise.
* localedata/locales/unm_US: Likewise.
* localedata/locales/ur_IN: Likewise.
* localedata/locales/ur_PK: Likewise.
* localedata/locales/ve_ZA: Likewise.
* localedata/locales/vi_VN: Likewise.
* localedata/locales/wa_BE: Likewise.
* localedata/locales/wo_SN: Likewise.
* localedata/locales/xh_ZA: Likewise.
* localedata/locales/yi_US: Likewise.
* localedata/locales/yuw_PG: Likewise.
* localedata/locales/zh_CN: Likewise.
* localedata/locales/zu_ZA: Likewise.

diff -uNr a/localedata/locales/C b/localedata/locales/C
--- a/localedata/locales/C 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/C 2018-10-11 15:10:43.000000000 +0000
@@ -2293,6 +2293,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/aa_DJ b/localedata/locales/aa_DJ
--- a/localedata/locales/aa_DJ 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/aa_DJ 2018-10-11 15:10:43.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/af_ZA b/localedata/locales/af_ZA
--- a/localedata/locales/af_ZA 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/af_ZA 2018-10-11 15:10:43.000000000 +0000
@@ -70,6 +70,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ak_GH b/localedata/locales/ak_GH
--- a/localedata/locales/ak_GH 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/ak_GH 2018-10-11 15:10:43.000000000 +0000
@@ -54,6 +54,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
--- a/localedata/locales/am_ET 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/am_ET 2018-10-11 15:10:43.000000000 +0000
@@ -1394,6 +1394,7 @@
<U137A> <U0060><U0039><U0030>
<U137B> <U0060><U0031><U0030><U0030>
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/ar_EG b/localedata/locales/ar_EG
--- a/localedata/locales/ar_EG 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/ar_EG 2018-10-11 15:10:43.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/be_BY b/localedata/locales/be_BY
--- a/localedata/locales/be_BY 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/be_BY 2018-10-11 15:10:43.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bem_ZM b/localedata/locales/bem_ZM
--- a/localedata/locales/bem_ZM 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bem_ZM 2018-10-11 15:10:43.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ber_DZ b/localedata/locales/ber_DZ
--- a/localedata/locales/ber_DZ 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/ber_DZ 2018-10-11 15:10:43.000000000 +0000
@@ -165,6 +165,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ber_MA b/localedata/locales/ber_MA
--- a/localedata/locales/ber_MA 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/ber_MA 2018-10-11 15:10:44.000000000 +0000
@@ -85,6 +85,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bg_BG b/localedata/locales/bg_BG
--- a/localedata/locales/bg_BG 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bg_BG 2018-10-11 15:10:44.000000000 +0000
@@ -49,6 +49,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bi_VU b/localedata/locales/bi_VU
--- a/localedata/locales/bi_VU 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bi_VU 2018-10-11 15:10:44.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bn_BD b/localedata/locales/bn_BD
--- a/localedata/locales/bn_BD 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bn_BD 2018-10-11 15:10:44.000000000 +0000
@@ -61,6 +61,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bo_CN b/localedata/locales/bo_CN
--- a/localedata/locales/bo_CN 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bo_CN 2018-10-11 15:10:44.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ca_ES b/localedata/locales/ca_ES
--- a/localedata/locales/ca_ES 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/ca_ES 2018-10-11 15:10:44.000000000 +0000
@@ -71,6 +71,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ce_RU b/localedata/locales/ce_RU
--- a/localedata/locales/ce_RU 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/ce_RU 2018-10-11 15:10:44.000000000 +0000
@@ -38,6 +38,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cmn_TW b/localedata/locales/cmn_TW
--- a/localedata/locales/cmn_TW 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/cmn_TW 2018-10-11 15:10:44.000000000 +0000
@@ -49,6 +49,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

class "hanzi"; /
diff -uNr a/localedata/locales/cs_CZ b/localedata/locales/cs_CZ
--- a/localedata/locales/cs_CZ 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/cs_CZ 2018-10-11 15:10:44.000000000 +0000
@@ -204,6 +204,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cv_RU b/localedata/locales/cv_RU
--- a/localedata/locales/cv_RU 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/cv_RU 2018-10-11 15:10:44.000000000 +0000
@@ -108,6 +108,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cy_GB b/localedata/locales/cy_GB
--- a/localedata/locales/cy_GB 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/cy_GB 2018-10-11 15:10:44.000000000 +0000
@@ -65,6 +65,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/da_DK b/localedata/locales/da_DK
--- a/localedata/locales/da_DK 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/da_DK 2018-10-11 15:10:44.000000000 +0000
@@ -166,6 +166,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/de_DE b/localedata/locales/de_DE
--- a/localedata/locales/de_DE 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/de_DE 2018-10-11 15:10:44.000000000 +0000
@@ -78,6 +78,7 @@
% DOUBLE HIGH-REVERSED-9 QUOTATION MARK
<U201F> <U00AB>;<U0022>

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/dv_MV b/localedata/locales/dv_MV
--- a/localedata/locales/dv_MV 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/dv_MV 2018-10-11 15:10:44.000000000 +0000
@@ -51,6 +51,7 @@
include "translit_combining";""


+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/dz_BT b/localedata/locales/dz_BT
--- a/localedata/locales/dz_BT 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/dz_BT 2018-10-11 15:10:44.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/el_GR b/localedata/locales/el_GR
--- a/localedata/locales/el_GR 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/el_GR 2018-10-11 15:10:44.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_GB b/localedata/locales/en_GB
--- a/localedata/locales/en_GB 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/en_GB 2018-10-11 15:10:44.000000000 +0000
@@ -54,6 +54,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_NG b/localedata/locales/en_NG
--- a/localedata/locales/en_NG 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/en_NG 2018-10-11 15:10:45.000000000 +0000
@@ -49,6 +49,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_ZM b/localedata/locales/en_ZM
--- a/localedata/locales/en_ZM 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/en_ZM 2018-10-11 15:10:45.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/es_CU b/localedata/locales/es_CU
--- a/localedata/locales/es_CU 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/es_CU 2018-10-11 15:10:45.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/es_ES b/localedata/locales/es_ES
--- a/localedata/locales/es_ES 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/es_ES 2018-10-11 15:10:45.000000000 +0000
@@ -72,6 +72,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/et_EE b/localedata/locales/et_EE
--- a/localedata/locales/et_EE 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/et_EE 2018-10-11 15:10:45.000000000 +0000
@@ -112,6 +112,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fa_IR b/localedata/locales/fa_IR
--- a/localedata/locales/fa_IR 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/fa_IR 2018-10-11 15:10:45.000000000 +0000
@@ -78,6 +78,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ff_SN b/localedata/locales/ff_SN
--- a/localedata/locales/ff_SN 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/ff_SN 2018-10-11 15:10:45.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fi_FI b/localedata/locales/fi_FI
--- a/localedata/locales/fi_FI 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/fi_FI 2018-10-11 15:10:45.000000000 +0000
@@ -136,6 +136,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fr_FR b/localedata/locales/fr_FR
--- a/localedata/locales/fr_FR 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/fr_FR 2018-10-11 15:10:45.000000000 +0000
@@ -58,6 +58,7 @@
% In France, accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/ga_IE b/localedata/locales/ga_IE
--- a/localedata/locales/ga_IE 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/ga_IE 2018-10-11 15:10:45.000000000 +0000
@@ -53,6 +53,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gd_GB b/localedata/locales/gd_GB
--- a/localedata/locales/gd_GB 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/gd_GB 2018-10-11 15:10:45.000000000 +0000
@@ -45,6 +45,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gu_IN b/localedata/locales/gu_IN
--- a/localedata/locales/gu_IN 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/gu_IN 2018-10-11 15:10:45.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gv_GB b/localedata/locales/gv_GB
--- a/localedata/locales/gv_GB 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/gv_GB 2018-10-11 15:10:45.000000000 +0000
@@ -56,6 +56,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/he_IL b/localedata/locales/he_IL
--- a/localedata/locales/he_IL 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/he_IL 2018-10-11 15:10:45.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hi_IN b/localedata/locales/hi_IN
--- a/localedata/locales/hi_IN 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/hi_IN 2018-10-11 15:10:45.000000000 +0000
@@ -61,6 +61,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hif_FJ b/localedata/locales/hif_FJ
--- a/localedata/locales/hif_FJ 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/hif_FJ 2018-10-11 15:10:45.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hr_HR b/localedata/locales/hr_HR
--- a/localedata/locales/hr_HR 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/hr_HR 2018-10-11 15:10:45.000000000 +0000
@@ -61,6 +61,7 @@
% transliterate <U0111> {đ} into d + j
<U0111> "<U0064><U006A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ht_HT b/localedata/locales/ht_HT
--- a/localedata/locales/ht_HT 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ht_HT 2018-10-11 15:10:45.000000000 +0000
@@ -57,6 +57,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hu_HU b/localedata/locales/hu_HU
--- a/localedata/locales/hu_HU 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/hu_HU 2018-10-11 15:10:46.000000000 +0000
@@ -476,6 +476,7 @@
<U00FC> "<U0075><U0308>";"<U0075><U00A8>";"<U0075><U003A>"
<U0171> "<U0075><U030B>";"<U0075><U02DD>";"<U0075><U0022>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/hy_AM b/localedata/locales/hy_AM
--- a/localedata/locales/hy_AM 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/hy_AM 2018-10-11 15:10:46.000000000 +0000
@@ -75,6 +75,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/id_ID b/localedata/locales/id_ID
--- a/localedata/locales/id_ID 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/id_ID 2018-10-11 15:10:46.000000000 +0000
@@ -54,6 +54,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/is_IS b/localedata/locales/is_IS
--- a/localedata/locales/is_IS 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/is_IS 2018-10-11 15:10:46.000000000 +0000
@@ -149,6 +149,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/it_IT b/localedata/locales/it_IT
--- a/localedata/locales/it_IT 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/it_IT 2018-10-11 15:10:46.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ja_JP b/localedata/locales/ja_JP
--- a/localedata/locales/ja_JP 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ja_JP 2018-10-11 15:10:46.000000000 +0000
@@ -1681,6 +1681,7 @@
include "translit_combining";""
include "translit_cjk_variants";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/kab_DZ b/localedata/locales/kab_DZ
--- a/localedata/locales/kab_DZ 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/kab_DZ 2018-10-11 15:10:46.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kk_KZ b/localedata/locales/kk_KZ
--- a/localedata/locales/kk_KZ 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/kk_KZ 2018-10-11 15:10:46.000000000 +0000
@@ -157,6 +157,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/km_KH b/localedata/locales/km_KH
--- a/localedata/locales/km_KH 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/km_KH 2018-10-11 15:10:46.000000000 +0000
@@ -42,6 +42,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kn_IN b/localedata/locales/kn_IN
--- a/localedata/locales/kn_IN 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/kn_IN 2018-10-11 15:10:46.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ko_KR b/localedata/locales/ko_KR
--- a/localedata/locales/ko_KR 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ko_KR 2018-10-11 15:10:47.000000000 +0000
@@ -6099,6 +6099,7 @@
include "translit_combining";""
include "translit_hangul";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/ks_IN b/localedata/locales/ks_IN
--- a/localedata/locales/ks_IN 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ks_IN 2018-10-11 15:10:47.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kw_GB b/localedata/locales/kw_GB
--- a/localedata/locales/kw_GB 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/kw_GB 2018-10-11 15:10:47.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lb_LU b/localedata/locales/lb_LU
--- a/localedata/locales/lb_LU 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/lb_LU 2018-10-11 15:10:47.000000000 +0000
@@ -77,6 +77,7 @@
% LATIN SMALL LETTER E WITH CIRCUMFLEX
<U00EA> "e^"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/lg_UG b/localedata/locales/lg_UG
--- a/localedata/locales/lg_UG 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/lg_UG 2018-10-11 15:10:47.000000000 +0000
@@ -56,6 +56,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lij_IT b/localedata/locales/lij_IT
--- a/localedata/locales/lij_IT 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/lij_IT 2018-10-11 15:10:47.000000000 +0000
@@ -47,6 +47,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ln_CD b/localedata/locales/ln_CD
--- a/localedata/locales/ln_CD 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ln_CD 2018-10-11 15:10:47.000000000 +0000
@@ -39,6 +39,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lo_LA b/localedata/locales/lo_LA
--- a/localedata/locales/lo_LA 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/lo_LA 2018-10-11 15:10:47.000000000 +0000
@@ -50,6 +50,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lt_LT b/localedata/locales/lt_LT
--- a/localedata/locales/lt_LT 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/lt_LT 2018-10-11 15:10:47.000000000 +0000
@@ -163,6 +163,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lv_LV b/localedata/locales/lv_LV
--- a/localedata/locales/lv_LV 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/lv_LV 2018-10-11 15:10:47.000000000 +0000
@@ -110,6 +110,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mg_MG b/localedata/locales/mg_MG
--- a/localedata/locales/mg_MG 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/mg_MG 2018-10-11 15:10:47.000000000 +0000
@@ -54,6 +54,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/mhr_RU b/localedata/locales/mhr_RU
--- a/localedata/locales/mhr_RU 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/mhr_RU 2018-10-11 15:10:47.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mk_MK b/localedata/locales/mk_MK
--- a/localedata/locales/mk_MK 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/mk_MK 2018-10-11 15:10:47.000000000 +0000
@@ -48,6 +48,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ml_IN b/localedata/locales/ml_IN
--- a/localedata/locales/ml_IN 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/ml_IN 2018-10-11 15:10:47.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
%
diff -uNr a/localedata/locales/ms_MY b/localedata/locales/ms_MY
--- a/localedata/locales/ms_MY 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/ms_MY 2018-10-11 15:10:48.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mt_MT b/localedata/locales/mt_MT
--- a/localedata/locales/mt_MT 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/mt_MT 2018-10-11 15:10:48.000000000 +0000
@@ -47,6 +47,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/***@latin
b/localedata/locales/***@latin
--- a/localedata/locales/***@latin 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/***@latin 2018-10-11 15:10:48.000000000 +0000
@@ -52,6 +52,7 @@
% accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/nb_NO b/localedata/locales/nb_NO
--- a/localedata/locales/nb_NO 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/nb_NO 2018-10-11 15:10:48.000000000 +0000
@@ -154,6 +154,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ne_NP b/localedata/locales/ne_NP
--- a/localedata/locales/ne_NP 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/ne_NP 2018-10-11 15:10:48.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nhn_MX b/localedata/locales/nhn_MX
--- a/localedata/locales/nhn_MX 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/nhn_MX 2018-10-11 15:10:48.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/niu_NU b/localedata/locales/niu_NU
--- a/localedata/locales/niu_NU 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/niu_NU 2018-10-11 15:10:48.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/niu_NZ b/localedata/locales/niu_NZ
--- a/localedata/locales/niu_NZ 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/niu_NZ 2018-10-11 15:10:48.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nl_NL b/localedata/locales/nl_NL
--- a/localedata/locales/nl_NL 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/nl_NL 2018-10-11 15:10:48.000000000 +0000
@@ -56,6 +56,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nr_ZA b/localedata/locales/nr_ZA
--- a/localedata/locales/nr_ZA 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/nr_ZA 2018-10-11 15:10:48.000000000 +0000
@@ -64,6 +64,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/oc_FR b/localedata/locales/oc_FR
--- a/localedata/locales/oc_FR 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/oc_FR 2018-10-11 15:10:48.000000000 +0000
@@ -54,6 +54,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/om_KE b/localedata/locales/om_KE
--- a/localedata/locales/om_KE 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/om_KE 2018-10-11 15:10:48.000000000 +0000
@@ -138,6 +138,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/or_IN b/localedata/locales/or_IN
--- a/localedata/locales/or_IN 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/or_IN 2018-10-11 15:10:48.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/os_RU b/localedata/locales/os_RU
--- a/localedata/locales/os_RU 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/os_RU 2018-10-11 15:10:48.000000000 +0000
@@ -69,6 +69,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/pa_IN b/localedata/locales/pa_IN
--- a/localedata/locales/pa_IN 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/pa_IN 2018-10-11 15:10:48.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pa_PK b/localedata/locales/pa_PK
--- a/localedata/locales/pa_PK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/pa_PK 2018-10-11 15:10:48.000000000 +0000
@@ -57,6 +57,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pl_PL b/localedata/locales/pl_PL
--- a/localedata/locales/pl_PL 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/pl_PL 2018-10-11 15:10:48.000000000 +0000
@@ -116,6 +116,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pt_PT b/localedata/locales/pt_PT
--- a/localedata/locales/pt_PT 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/pt_PT 2018-10-11 15:10:48.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/quz_PE b/localedata/locales/quz_PE
--- a/localedata/locales/quz_PE 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/quz_PE 2018-10-11 15:10:48.000000000 +0000
@@ -55,6 +55,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ro_RO b/localedata/locales/ro_RO
--- a/localedata/locales/ro_RO 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/ro_RO 2018-10-11 15:10:49.000000000 +0000
@@ -143,6 +143,7 @@
<U0162> "<U021A>";"<U0054>"
<U0163> "<U021B>";"<U0074>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ru_RU b/localedata/locales/ru_RU
--- a/localedata/locales/ru_RU 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/ru_RU 2018-10-11 15:10:49.000000000 +0000
@@ -73,6 +73,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/rw_RW b/localedata/locales/rw_RW
--- a/localedata/locales/rw_RW 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/rw_RW 2018-10-11 15:10:49.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sa_IN b/localedata/locales/sa_IN
--- a/localedata/locales/sa_IN 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sa_IN 2018-10-11 15:10:49.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sd_IN b/localedata/locales/sd_IN
--- a/localedata/locales/sd_IN 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sd_IN 2018-10-11 15:10:49.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/***@devanagari
b/localedata/locales/***@devanagari
--- a/localedata/locales/***@devanagari 2018-10-11 15:10:18.000000000
+0000
+++ b/localedata/locales/***@devanagari 2018-10-11 15:10:49.000000000
+0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sd_PK b/localedata/locales/sd_PK
--- a/localedata/locales/sd_PK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sd_PK 2018-10-11 15:10:49.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/se_NO b/localedata/locales/se_NO
--- a/localedata/locales/se_NO 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/se_NO 2018-10-11 15:10:49.000000000 +0000
@@ -204,6 +204,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sgs_LT b/localedata/locales/sgs_LT
--- a/localedata/locales/sgs_LT 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sgs_LT 2018-10-11 15:10:49.000000000 +0000
@@ -58,6 +58,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/shn_MM b/localedata/locales/shn_MM
--- a/localedata/locales/shn_MM 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/shn_MM 2018-10-11 15:10:49.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/si_LK b/localedata/locales/si_LK
--- a/localedata/locales/si_LK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/si_LK 2018-10-11 15:10:49.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sk_SK b/localedata/locales/sk_SK
--- a/localedata/locales/sk_SK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sk_SK 2018-10-11 15:10:49.000000000 +0000
@@ -67,6 +67,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sl_SI b/localedata/locales/sl_SI
--- a/localedata/locales/sl_SI 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sl_SI 2018-10-11 15:10:49.000000000 +0000
@@ -90,6 +90,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sm_WS b/localedata/locales/sm_WS
--- a/localedata/locales/sm_WS 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sm_WS 2018-10-11 15:10:49.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/so_SO b/localedata/locales/so_SO
--- a/localedata/locales/so_SO 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/so_SO 2018-10-11 15:10:49.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sq_AL b/localedata/locales/sq_AL
--- a/localedata/locales/sq_AL 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sq_AL 2018-10-11 15:10:49.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ss_ZA b/localedata/locales/ss_ZA
--- a/localedata/locales/ss_ZA 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/ss_ZA 2018-10-11 15:10:49.000000000 +0000
@@ -66,6 +66,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/st_ZA b/localedata/locales/st_ZA
--- a/localedata/locales/st_ZA 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/st_ZA 2018-10-11 15:10:50.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sv_SE b/localedata/locales/sv_SE
--- a/localedata/locales/sv_SE 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/sv_SE 2018-10-11 15:10:50.000000000 +0000
@@ -138,6 +138,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sw_KE b/localedata/locales/sw_KE
--- a/localedata/locales/sw_KE 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/sw_KE 2018-10-11 15:10:50.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ta_IN b/localedata/locales/ta_IN
--- a/localedata/locales/ta_IN 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/ta_IN 2018-10-11 15:10:50.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/te_IN b/localedata/locales/te_IN
--- a/localedata/locales/te_IN 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/te_IN 2018-10-11 15:10:50.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/th_TH b/localedata/locales/th_TH
--- a/localedata/locales/th_TH 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/th_TH 2018-10-11 15:10:50.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ti_ET b/localedata/locales/ti_ET
--- a/localedata/locales/ti_ET 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/ti_ET 2018-10-11 15:10:50.000000000 +0000
@@ -864,6 +864,7 @@
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>

include "translit_combining";""
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/tn_ZA b/localedata/locales/tn_ZA
--- a/localedata/locales/tn_ZA 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/tn_ZA 2018-10-11 15:10:50.000000000 +0000
@@ -67,6 +67,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/to_TO b/localedata/locales/to_TO
--- a/localedata/locales/to_TO 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/to_TO 2018-10-11 15:10:50.000000000 +0000
@@ -36,6 +36,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/tpi_PG b/localedata/locales/tpi_PG
--- a/localedata/locales/tpi_PG 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/tpi_PG 2018-10-11 15:10:50.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/tr_TR b/localedata/locales/tr_TR
--- a/localedata/locales/tr_TR 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/tr_TR 2018-10-11 15:10:50.000000000 +0000
@@ -2423,6 +2423,7 @@

% TURKISH LIRA SIGN
<U20BA> "<U0054><U004C>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-10-11 15:10:52.000000000
+0000
@@ -0,0 +1,383 @@
+escape_char /
+comment_char %
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file. The foregoing does not
+% affect the license of the GNU C Library as a whole. It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Transliterations of Cyrillic letters to Latin and/or ASCII symbols.
+% Inspired by ISO 9.1995 / GOST 7.79-2000.
+% Covers Unicode Range https://www.unicode.org/charts/PDF/U0400.pdf
+% i.e. [U0401-U04F9, U2019] but only the letters covered by ISO 9.1995
+% It implements the GOST_7.79 System A (Latin Script) as a first
+% option and System B Cyrillic (ASCII) as a second option. Check
+% https://en.wikipedia.org/wiki/ISO_9 for reference.
+% The System B is extended from GOST_7.79-Russian using open sources
+% of the transliteration mappings and the "h/`" diacritics logic.
+
+% Usage examples:
+% iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
+% | iconv -f ISO-8859-15 -t UTF-8 # System A
+% iconv -f UTF-8 -t ASCII//TRANSLIT # System B.
+
+% Contributions welcome for the rest of Cyrillic script in Unicode
+% https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode.
+% Bugfix for https://sourceware.org/bugzilla/show_bug.cgi?id=2872.
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=11301.
+
+LC_CTYPE
+
+translit_start
+
+% CYRILLIC CAPITAL LETTER IO
+<U0401> <U00CB>;"<U0059><U004F>"
+% CYRILLIC CAPITAL LETTER DJE
+<U0402> <U0110>;"<U0044><U004A>"
+% CYRILLIC CAPITAL LETTER GJE
+<U0403> <U01F4>;"<U0047><U0060>"
+% CYRILLIC CAPITAL LETTER UKRAINIAN IE
+<U0404> <U00CA>;"<U0059><U0065>"
+% CYRILLIC CAPITAL LETTER DZE
+<U0405> <U1E90>;"<U005A><U0060>"
+% CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
+<U0406> <U00CC>;<U0049>
+% CYRILLIC CAPITAL LETTER YI
+<U0407> <U00CF>;"<U0059><U0069>"
+% CYRILLIC CAPITAL LETTER JE
+<U0408> "<U004A><U030C>";<U004A>
+% CYRILLIC CAPITAL LETTER LJE
+<U0409> "<U004C><U0302>";"<U004C><U0060>"
+% CYRILLIC CAPITAL LETTER NJE
+<U040A> "<U004E><U0302>";"<U004E><U0060>"
+% CYRILLIC CAPITAL LETTER TSHE
+<U040B> <U0106>;"<U0054><U0053><U0048>"
+% CYRILLIC CAPITAL LETTER KJE
+<U040C> <U1E30>;"<U004B><U0060>"
+% CYRILLIC CAPITAL LETTER SHORT U
+<U040E> <U016C>;"<U0055><U0060>"
+% CYRILLIC CAPITAL LETTER DZHE
+<U040F> "<U0044><U0302>";"<U0044><U0068>"
+% CYRILLIC CAPITAL LETTER A
+<U0410> <U0041>
+% CYRILLIC CAPITAL LETTER BE
+<U0411> <U0042>
+% CYRILLIC CAPITAL LETTER VE
+<U0412> <U0056>
+% CYRILLIC CAPITAL LETTER GHE
+<U0413> <U0047>
+% CYRILLIC CAPITAL LETTER DE
+<U0414> <U0044>
+% CYRILLIC CAPITAL LETTER IE
+<U0415> <U0045>
+% CYRILLIC CAPITAL LETTER ZHE
+<U0416> <U017D>;"<U005A><U0048>"
+% CYRILLIC CAPITAL LETTER ZE
+<U0417> <U005A>
+% CYRILLIC CAPITAL LETTER I
+<U0418> <U0049>
+% CYRILLIC CAPITAL LETTER SHORT I
+<U0419> <U004A>
+% CYRILLIC CAPITAL LETTER KA
+<U041A> <U004B>
+% CYRILLIC CAPITAL LETTER EL
+<U041B> <U004C>
+% CYRILLIC CAPITAL LETTER EM
+<U041C> <U004D>
+% CYRILLIC CAPITAL LETTER EN
+<U041D> <U004E>
+% CYRILLIC CAPITAL LETTER O
+<U041E> <U004F>
+% CYRILLIC CAPITAL LETTER PE
+<U041F> <U0050>
+% CYRILLIC CAPITAL LETTER ER
+<U0420> <U0052>
+% CYRILLIC CAPITAL LETTER ES
+<U0421> <U0053>
+% CYRILLIC CAPITAL LETTER TE
+<U0422> <U0054>
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC UNDEFINED
+<U0423><U0301> <U00DA>;"<U0055><U0060>"
+% CYRILLIC CAPITAL LETTER EF
+<U0424> <U0046>
+% CYRILLIC CAPITAL LETTER HA
+<U0425> <U0048>;<U0058>
+% CYRILLIC CAPITAL LETTER TSE
+<U0426> <U0043>;"<U0043><U005A>"
+% CYRILLIC CAPITAL LETTER CHE
+<U0427> <U010C>;"<U0043><U0048>"
+% CYRILLIC CAPITAL LETTER SHA
+<U0428> <U0160>;"<U0053><U0048>"
+% CYRILLIC CAPITAL LETTER SHCHA
+<U0429> <U015C>;"<U0053><U0048><U0048>"
+% CYRILLIC CAPITAL LETTER HARD SIGN
+<U042A> <U02BA>;"<U0041><U0060>"
+% CYRILLIC CAPITAL LETTER YERU
+<U042B> <U0059>;"<U0059><U0060>"
+% CYRILLIC CAPITAL LETTER SOFT SIGN
+<U042C> <U02B9>;<U0060>
+% CYRILLIC CAPITAL LETTER E
+<U042D> <U00C8>;"<U0045><U0060>"
+% CYRILLIC CAPITAL LETTER YU
+<U042E> <U00DB>;"<U0059><U0055>"
+% CYRILLIC CAPITAL LETTER YA
+<U042F> <U00C2>;"<U0059><U0041>"
+% CYRILLIC SMALL LETTER A
+<U0430> <U0061>
+% CYRILLIC SMALL LETTER BE
+<U0431> <U0062>
+% CYRILLIC SMALL LETTER VE
+<U0432> <U0076>
+% CYRILLIC SMALL LETTER GHE
+<U0433> <U0067>
+% CYRILLIC SMALL LETTER DE
+<U0434> <U0064>
+% CYRILLIC SMALL LETTER IE
+<U0435> <U0065>
+% CYRILLIC SMALL LETTER ZHE
+<U0436> <U017E>;"<U007A><U0068>"
+% CYRILLIC SMALL LETTER ZE
+<U0437> <U007A>
+% CYRILLIC SMALL LETTER I
+<U0438> <U0069>
+% CYRILLIC SMALL LETTER SHORT I
+<U0439> <U006A>
+% CYRILLIC SMALL LETTER KA
+<U043A> <U006B>
+% CYRILLIC SMALL LETTER EL
+<U043B> <U006C>
+% CYRILLIC SMALL LETTER EM
+<U043C> <U006D>
+% CYRILLIC SMALL LETTER EN
+<U043D> <U006E>
+% CYRILLIC SMALL LETTER O
+<U043E> <U006F>
+% CYRILLIC SMALL LETTER PE
+<U043F> <U0070>
+% CYRILLIC SMALL LETTER ER
+<U0440> <U0072>
+% CYRILLIC SMALL LETTER ES
+<U0441> <U0073>
+% CYRILLIC SMALL LETTER TE
+<U0442> <U0074>
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC UNDEFINED
+<U0443><U0301> <U00FA>;"<U0075><U0060>"
+% CYRILLIC SMALL LETTER EF
+<U0444> <U0066>
+% CYRILLIC SMALL LETTER HA
+<U0445> <U0068>;<U0078>
+% CYRILLIC SMALL LETTER TSE
+<U0446> <U0063>;"<U0063><U007A>"
+% CYRILLIC SMALL LETTER CHE
+<U0447> <U010D>;"<U0063><U0068>"
+% CYRILLIC SMALL LETTER SHA
+<U0448> <U0161>;"<U0073><U0068>"
+% CYRILLIC SMALL LETTER SHCHA
+<U0449> <U015D>;"<U0073><U0068><U0068>"
+% CYRILLIC SMALL LETTER HARD SIGN
+<U044A> <U02BA>;"<U0060><U0060>"
+% CYRILLIC SMALL LETTER YERU
+<U044B> <U0079>;"<U0079><U0060>"
+% CYRILLIC SMALL LETTER SOFT SIGN
+<U044C> <U02B9>;<U0060>
+% CYRILLIC SMALL LETTER E
+<U044D> <U00E8>;"<U0065><U0060>"
+% CYRILLIC SMALL LETTER YU
+<U044E> <U00FB>;"<U0079><U0075>"
+% CYRILLIC SMALL LETTER YA
+<U044F> <U00E2>;"<U0079><U0061>"
+% CYRILLIC SMALL LETTER IO
+<U0451> <U00EB>;"<U0079><U006F>"
+% CYRILLIC SMALL LETTER DJE
+<U0452> <U0111>;"<U0064><U006A>"
+% CYRILLIC SMALL LETTER GJE
+<U0453> <U01F5>;"<U0067><U0060>"
+% CYRILLIC SMALL LETTER UKRAINIAN IE
+<U0454> <U00EA>;"<U0079><U0065>"
+% CYRILLIC SMALL LETTER DZE
+<U0455> <U1E91>;"<U007A><U0060>"
+% CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
+<U0456> <U00EC>;<U0069>
+% CYRILLIC SMALL LETTER YI
+<U0457> <U00EF>;"<U0079><U0069>"
+% CYRILLIC SMALL LETTER JE
+<U0458> <U01F0>;<U006A>
+% CYRILLIC SMALL LETTER LJE
+<U0459> "<U006C><U0302>";"<U006C><U0060>"
+% CYRILLIC SMALL LETTER NJE
+<U045A> "<U006E><U0302>";"<U006E><U0060>"
+% CYRILLIC SMALL LETTER TSHE
+<U045B> <U0107>;"<U0074><U0073><U0068>"
+% CYRILLIC SMALL LETTER KJE
+<U045C> <U1E31>;"<U006B><U0060>"
+% CYRILLIC SMALL LETTER SHORT U
+<U045E> <U016D>;"<U0075><U0060>"
+% CYRILLIC SMALL LETTER DZHE
+<U045F> "<U0064><U0302>";"<U0064><U0068>"
+% CYRILLIC CAPITAL LETTER BIG YUS
+<U046A> <U01CD>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER BIG YUS
+<U046B> <U01CE>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER FITA
+<U0472> "<U0046><U0300>";"<U0046><U0068>"
+% CYRILLIC SMALL LETTER FITA
+<U0473> "<U0066><U0300>";"<U0066><U0068>"
+% CYRILLIC CAPITAL LETTER IZHITSA
+<U0474> <U1EF2>;"<U0059><U0068>"
+% CYRILLIC SMALL LETTER IZHITSA
+<U0475> <U1EF3>;"<U0079><U0068>"
+% CYRILLIC CAPITAL LETTER SEMISOFT SIGN
+<U048C> <U011A>;"<U0045><U0060>"
+% CYRILLIC SMALL LETTER SEMISOFT SIGN
+<U048D> <U011B>;"<U0065><U0060>"
+% CYRILLIC CAPITAL LETTER GHE WITH UPTURN
+<U0490> "<U0047><U0300>";"<U0047><U0060>"
+% CYRILLIC SMALL LETTER GHE WITH UPTURN
+<U0491> "<U0067><U0300>";"<U0067><U0060>"
+% CYRILLIC CAPITAL LETTER GHE WITH STROKE
+<U0492> <U0120>;"<U0047><U0048>"
+% CYRILLIC SMALL LETTER GHE WITH STROKE
+<U0493> <U0121>;"<U0067><U0068>"
+% CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK
+<U0494> <U011E>;"<U0047><U0048>"
+% CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK
+<U0495> <U011F>;"<U0067><U0068>"
+% CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER
+<U0496> "<U017D><U0327>";"<U005A><U0048><U0060>"
+% CYRILLIC SMALL LETTER ZHE WITH DESCENDER
+<U0497> "<U017E><U0327>";"<U007A><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER KA WITH DESCENDER
+<U049A> <U0136>;"<U004B><U0060>"
+% CYRILLIC SMALL LETTER KA WITH DESCENDER
+<U049B> <U0137>;"<U006B><U0060>"
+% CYRILLIC CAPITAL LETTER KA WITH STROKE
+<U049E> "<U004B><U0304>";"<U004B><U0060>"
+% CYRILLIC SMALL LETTER KA WITH STROKE
+<U049F> "<U006B><U0304>";"<U006B><U0060>"
+% CYRILLIC CAPITAL LETTER EN WITH DESCENDER
+<U04A2> <U1E46>;"<U004E><U0060>"
+% CYRILLIC SMALL LETTER EN WITH DESCENDER
+<U04A3> <U1E47>;"<U006E><U0060>"
+% CYRILLIC CAPITAL LIGATURE EN GHE
+<U04A4> <U1E44>;"<U004E><U0047>"
+% CYRILLIC SMALL LIGATURE EN GHE
+<U04A5> <U1E45>;"<U006E><U0067>"
+% CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK
+<U04A6> <U1E54>;"<U0050><U0060>"
+% CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK
+<U04A7> <U1E55>;"<U0070><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN HA
+<U04A8> <U00D2>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN HA
+<U04A9> <U00F2>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER ES WITH DESCENDER
+<U04AA> <U00C7>;"<U0043><U0060>"
+% CYRILLIC SMALL LETTER ES WITH DESCENDER
+<U04AB> <U00E7>;"<U0043><U0060>"
+% CYRILLIC CAPITAL LETTER TE WITH DESCENDER
+<U04AC> <U0162>;"<U0054><U0060>"
+% CYRILLIC SMALL LETTER TE WITH DESCENDER
+<U04AD> <U0163>;"<U0074><U0060>"
+% CYRILLIC CAPITAL LETTER STRAIGHT U
+<U04AE> <U00D9>;<U0055>
+% CYRILLIC SMALL LETTER STRAIGHT U
+<U04AF> <U00F9>;<U0075>
+% CYRILLIC CAPITAL LETTER HA WITH DESCENDER
+<U04B2> <U1E28>;"<U0048><U0060>"
+% CYRILLIC SMALL LETTER HA WITH DESCENDER
+<U04B3> <U1E29>;"<U0068><U0060>"
+% CYRILLIC CAPITAL LIGATURE TE TSE
+<U04B4> "<U0043><U0304>";"<U0054><U0043><U005A>"
+% CYRILLIC SMALL LIGATURE TE TSE
+<U04B5> "<U0063><U0304>";"<U0074><U0063><U007A>"
+% CYRILLIC CAPITAL LETTER SHHA
+<U04BA> <U1E24>;"<U0053><U0048><U0060>"
+% CYRILLIC SMALL LETTER SHHA
+<U04BB> <U1E25>;"<U0053><U0048><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN CHE
+<U04BC> "<U0043><U0306>";"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN CHE
+<U04BD> "<U0063><U0306>";"<U0063><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER
+<U04BE> "<U00C7><U0306>";"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER
+<U04BF> "<U00E7><U0306>";"<U0063><U0068><U0060>"
+% CYRILLIC LETTER PALOCHKA
+<U04C0> <U2021>;<U0069>
+% CYRILLIC CAPITAL LETTER ZHE WITH BREVE
+<U04C1> "<U005A><U0306>";"<U005A><U0048><U0060>"
+% CYRILLIC SMALL LETTER ZHE WITH BREVE
+<U04C2> "<U007A><U0306>";"<U007A><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER KHAKASSIAN CHE
+<U04CB> <U00C7>;"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER KHAKASSIAN CHE
+<U04CC> <U00E7>;"<U0063><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER A WITH BREVE
+<U04D0> <U0102>;"<U0041><U0060>"
+% CYRILLIC SMALL LETTER A WITH BREVE
+<U04D1> <U0103>;"<U0061><U0060>"
+% CYRILLIC CAPITAL LETTER A WITH DIAERESIS
+<U04D2> <U00C4>;"<U0041><U0060>"
+% CYRILLIC SMALL LETTER A WITH DIAERESIS
+<U04D3> <U00E4>;"<U0061><U0060>"
+% CYRILLIC CAPITAL LETTER IE WITH BREVE
+<U04D6> <U0114>;"<U0045><U0060>"
+% CYRILLIC SMALL LETTER IE WITH BREVE
+<U04D7> <U0115>;"<U0065><U0060>"
+% CYRILLIC CAPITAL LETTER SCHWA
+<U04D8> "<U0041><U030B>";"<U0041><U0060>"
+% CYRILLIC SMALL LETTER SCHWA
+<U04D9> "<U0061><U030B>";"<U0061><U0060>"
+% CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS
+<U04DC> "<U005A><U0304>";"<U005A><U0048><U0060>"
+% CYRILLIC SMALL LETTER ZHE WITH DIAERESIS
+<U04DD> "<U007A><U0304>";"<U007A><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS
+<U04DE> "<U005A><U0308>";"<U005A><U0060>"
+% CYRILLIC SMALL LETTER ZE WITH DIAERESIS
+<U04DF> "<U007A><U0308>";"<U007A><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN DZE
+<U04E0> <U0179>;"<U005A><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN DZE
+<U04E1> <U017A>;"<U007A><U0060>"
+% CYRILLIC CAPITAL LETTER I WITH DIAERESIS
+<U04E4> <U00CE>;"<U0049><U0060>"
+% CYRILLIC SMALL LETTER I WITH DIAERESIS
+<U04E5> <U00EE>;"<U0069><U0060>"
+% CYRILLIC CAPITAL LETTER O WITH DIAERESIS
+<U04E6> <U00D6>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER O WITH DIAERESIS
+<U04E7> <U00F6>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER BARRED O
+<U04E8> <U00D4>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER BARRED O
+<U04E9> <U00F4>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER U WITH DIAERESIS
+<U04F0> <U00DC>;"<U0055><U0060>"
+% CYRILLIC SMALL LETTER U WITH DIAERESIS
+<U04F1> <U00FC>;"<U0075><U0060>"
+% CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE
+<U04F2> <U0170>;"<U0055><U0060>"
+% CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE
+<U04F3> <U0171>;"<U0075><U0060>"
+% CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS
+<U04F4> "<U0043><U0308>";"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER CHE WITH DIAERESIS
+<U04F5> "<U0063><U0308>";"<U0063><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS
+<U04F8> <U0178>;"<U0059><U0060>"
+% CYRILLIC SMALL LETTER YERU WITH DIAERESIS
+<U04F9> <U00FF>;"<U0079><U0060>"
+% RIGHT SINGLE QUOTATION MARK
+<U2019> <U2035>;<U0027>
+
+translit_end
+
+END LC_CTYPE
diff -uNr a/localedata/locales/ts_ZA b/localedata/locales/ts_ZA
--- a/localedata/locales/ts_ZA 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/ts_ZA 2018-10-11 15:10:50.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/unm_US b/localedata/locales/unm_US
--- a/localedata/locales/unm_US 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/unm_US 2018-10-11 15:10:51.000000000 +0000
@@ -48,6 +48,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ur_IN b/localedata/locales/ur_IN
--- a/localedata/locales/ur_IN 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/ur_IN 2018-10-11 15:10:51.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ur_PK b/localedata/locales/ur_PK
--- a/localedata/locales/ur_PK 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/ur_PK 2018-10-11 15:10:51.000000000 +0000
@@ -57,6 +57,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ve_ZA b/localedata/locales/ve_ZA
--- a/localedata/locales/ve_ZA 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/ve_ZA 2018-10-11 15:10:51.000000000 +0000
@@ -65,6 +65,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/vi_VN b/localedata/locales/vi_VN
--- a/localedata/locales/vi_VN 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/vi_VN 2018-10-11 15:10:51.000000000 +0000
@@ -57,6 +57,7 @@
% dong sign -> d// -> dd
<U20AB> "<U0111>";"<U0064><U0064>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/wa_BE b/localedata/locales/wa_BE
--- a/localedata/locales/wa_BE 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/wa_BE 2018-10-11 15:10:51.000000000 +0000
@@ -59,6 +59,7 @@
<U00C5> "A<U030A>";"A";"AU"
<U00E5> "a<U030A>";"a";"au"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/wo_SN b/localedata/locales/wo_SN
--- a/localedata/locales/wo_SN 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/wo_SN 2018-10-11 15:10:51.000000000 +0000
@@ -54,6 +54,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/xh_ZA b/localedata/locales/xh_ZA
--- a/localedata/locales/xh_ZA 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/xh_ZA 2018-10-11 15:10:51.000000000 +0000
@@ -64,6 +64,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/yi_US b/localedata/locales/yi_US
--- a/localedata/locales/yi_US 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/yi_US 2018-10-11 15:10:51.000000000 +0000
@@ -66,6 +66,7 @@
<U05F0> "<U05D5><U05D5>";"ww"
<U05F1> "<U05D5><U05D9>";"wj"
<U05F2> "<U05D9><U05D9>";"jj"
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/yuw_PG b/localedata/locales/yuw_PG
--- a/localedata/locales/yuw_PG 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/yuw_PG 2018-10-11 15:10:51.000000000 +0000
@@ -40,6 +40,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/zh_CN b/localedata/locales/zh_CN
--- a/localedata/locales/zh_CN 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/zh_CN 2018-10-11 15:10:51.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

class "hanzi"; /
diff -uNr a/localedata/locales/zu_ZA b/localedata/locales/zu_ZA
--- a/localedata/locales/zu_ZA 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/zu_ZA 2018-10-11 15:10:51.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
Egor Kobylkin
2018-10-12 14:05:59 UTC
Permalink
Dear locale maintainers,

fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

add the Cyrillic transliteration table translit_cyrillic file

https://sourceware.org/bugzilla/attachment.cgi?id=11317 [7]

to localedata/locales/ and include it in all your locales going forward.

The patch included inline below.

From this patch I have excluded locales that already mention cyrillic or
have a transliteration table for it:
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
***@cyrillic

Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.

Current bug effect:

The glibc wiki explicitly lists this use case as the test example

https://sourceware.org/glibc/wiki/Locales#Testing_Locales :

LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt

currently it fails on Cyrillic texts in most locales including ru_RU [1]
[8] [9]:

LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

- It produces a string of question marks and spaces.

This is what it should produce and it does so after the patch applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


The root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.



COMMIT MESSAGE:
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.

Examples: iconv -f UTF-8 -t ASCII//TRANSLIT will produce ASCII
compatible transcription and iconv -f UTF-8 -t ISO-8859-15//TRANSLIT |
iconv -f ISO-8859-15 -t UTF-8 will produce Latin transliteration as per
ISO 9.1995.

While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration/transcription has only Latin/ASCII codes but still can
be read by a native speaker. Among other things it is useful for
processing the Cyrillic texts and filenames by programs or on systems
that are not specifically prepared to work with Cyrillic, don't have
corresponding fonts installed or can't handle UTF-8.

The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on ISO 9.1995 standard [10] and its
derivative GOST 7.79-2000 official source (Federal Agency on Technical
Regulating and Metrology Of Russian Federation [2]). Technically an
independent but mostly identical source [3] was used and prepared in a
spreadsheet [6].

The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that have a
translit_start/end stance and generated a patch for them.

The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.

I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
Lisivka <***@gmail.com>, Max Kutny <***@gmail.com> (uk_UA),
ДаМОлП КегаМ <***@gnome.org> (sr_RS) have confirmed the
exclusion.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11301
[7] translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11317
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=11304
[10] https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A

Best regards,
Egor Kobylkin

---
2018-10-11 Egor Kobylkin <***@kobylkin.com>

[BZ #2872]
* localedata/locales/translit_cyrillic: Add ISO 9.1995, GOST 7.79
System A transliteration System B transcription table from Cyrillic to
Latin/ASCII.
* localedata/locales/C: Add include "translit_cyrillic";"" to LC_CTYPE
translit section.
* localedata/locales/aa_DJ: Likewise.
* localedata/locales/af_ZA: Likewise.
* localedata/locales/ak_GH: Likewise.
* localedata/locales/am_ET: Likewise.
* localedata/locales/ar_EG: Likewise.
* localedata/locales/be_BY: Likewise.
* localedata/locales/bem_ZM: Likewise.
* localedata/locales/ber_DZ: Likewise.
* localedata/locales/ber_MA: Likewise.
* localedata/locales/bg_BG: Likewise.
* localedata/locales/bi_VU: Likewise.
* localedata/locales/bn_BD: Likewise.
* localedata/locales/bo_CN: Likewise.
* localedata/locales/ca_ES: Likewise.
* localedata/locales/ce_RU: Likewise.
* localedata/locales/cmn_TW: Likewise.
* localedata/locales/cs_CZ: Likewise.
* localedata/locales/cv_RU: Likewise.
* localedata/locales/cy_GB: Likewise.
* localedata/locales/da_DK: Likewise.
* localedata/locales/de_DE: Likewise.
* localedata/locales/dv_MV: Likewise.
* localedata/locales/dz_BT: Likewise.
* localedata/locales/el_GR: Likewise.
* localedata/locales/en_GB: Likewise.
* localedata/locales/en_NG: Likewise.
* localedata/locales/en_ZM: Likewise.
* localedata/locales/es_CU: Likewise.
* localedata/locales/es_ES: Likewise.
* localedata/locales/et_EE: Likewise.
* localedata/locales/fa_IR: Likewise.
* localedata/locales/ff_SN: Likewise.
* localedata/locales/fi_FI: Likewise.
* localedata/locales/fr_FR: Likewise.
* localedata/locales/ga_IE: Likewise.
* localedata/locales/gd_GB: Likewise.
* localedata/locales/gu_IN: Likewise.
* localedata/locales/gv_GB: Likewise.
* localedata/locales/he_IL: Likewise.
* localedata/locales/hi_IN: Likewise.
* localedata/locales/hif_FJ: Likewise.
* localedata/locales/hr_HR: Likewise.
* localedata/locales/ht_HT: Likewise.
* localedata/locales/hu_HU: Likewise.
* localedata/locales/hy_AM: Likewise.
* localedata/locales/id_ID: Likewise.
* localedata/locales/is_IS: Likewise.
* localedata/locales/it_IT: Likewise.
* localedata/locales/ja_JP: Likewise.
* localedata/locales/kab_DZ: Likewise.
* localedata/locales/kk_KZ: Likewise.
* localedata/locales/km_KH: Likewise.
* localedata/locales/kn_IN: Likewise.
* localedata/locales/ko_KR: Likewise.
* localedata/locales/ks_IN: Likewise.
* localedata/locales/kw_GB: Likewise.
* localedata/locales/lb_LU: Likewise.
* localedata/locales/lg_UG: Likewise.
* localedata/locales/lij_IT: Likewise.
* localedata/locales/ln_CD: Likewise.
* localedata/locales/lo_LA: Likewise.
* localedata/locales/lt_LT: Likewise.
* localedata/locales/lv_LV: Likewise.
* localedata/locales/mg_MG: Likewise.
* localedata/locales/mhr_RU: Likewise.
* localedata/locales/mk_MK: Likewise.
* localedata/locales/ml_IN: Likewise.
* localedata/locales/ms_MY: Likewise.
* localedata/locales/mt_MT: Likewise.
* localedata/locales/***@latin: Likewise.
* localedata/locales/nb_NO: Likewise.
* localedata/locales/ne_NP: Likewise.
* localedata/locales/nhn_MX: Likewise.
* localedata/locales/niu_NU: Likewise.
* localedata/locales/niu_NZ: Likewise.
* localedata/locales/nl_NL: Likewise.
* localedata/locales/nr_ZA: Likewise.
* localedata/locales/oc_FR: Likewise.
* localedata/locales/om_KE: Likewise.
* localedata/locales/or_IN: Likewise.
* localedata/locales/os_RU: Likewise.
* localedata/locales/pa_IN: Likewise.
* localedata/locales/pa_PK: Likewise.
* localedata/locales/pl_PL: Likewise.
* localedata/locales/pt_PT: Likewise.
* localedata/locales/quz_PE: Likewise.
* localedata/locales/ro_RO: Likewise.
* localedata/locales/ru_RU: Likewise.
* localedata/locales/rw_RW: Likewise.
* localedata/locales/sa_IN: Likewise.
* localedata/locales/sd_IN: Likewise.
* localedata/locales/***@devanagari: Likewise.
* localedata/locales/sd_PK: Likewise.
* localedata/locales/se_NO: Likewise.
* localedata/locales/sgs_LT: Likewise.
* localedata/locales/shn_MM: Likewise.
* localedata/locales/si_LK: Likewise.
* localedata/locales/sk_SK: Likewise.
* localedata/locales/sl_SI: Likewise.
* localedata/locales/sm_WS: Likewise.
* localedata/locales/so_SO: Likewise.
* localedata/locales/sq_AL: Likewise.
* localedata/locales/ss_ZA: Likewise.
* localedata/locales/st_ZA: Likewise.
* localedata/locales/sv_SE: Likewise.
* localedata/locales/sw_KE: Likewise.
* localedata/locales/ta_IN: Likewise.
* localedata/locales/te_IN: Likewise.
* localedata/locales/th_TH: Likewise.
* localedata/locales/ti_ET: Likewise.
* localedata/locales/tn_ZA: Likewise.
* localedata/locales/to_TO: Likewise.
* localedata/locales/tpi_PG: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/ts_ZA: Likewise.
* localedata/locales/unm_US: Likewise.
* localedata/locales/ur_IN: Likewise.
* localedata/locales/ur_PK: Likewise.
* localedata/locales/ve_ZA: Likewise.
* localedata/locales/vi_VN: Likewise.
* localedata/locales/wa_BE: Likewise.
* localedata/locales/wo_SN: Likewise.
* localedata/locales/xh_ZA: Likewise.
* localedata/locales/yi_US: Likewise.
* localedata/locales/yuw_PG: Likewise.
* localedata/locales/zh_CN: Likewise.
* localedata/locales/zu_ZA: Likewise.

diff -uNr a/localedata/locales/C b/localedata/locales/C
--- a/localedata/locales/C 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/C 2018-10-11 15:10:43.000000000 +0000
@@ -2293,6 +2293,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/aa_DJ b/localedata/locales/aa_DJ
--- a/localedata/locales/aa_DJ 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/aa_DJ 2018-10-11 15:10:43.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/af_ZA b/localedata/locales/af_ZA
--- a/localedata/locales/af_ZA 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/af_ZA 2018-10-11 15:10:43.000000000 +0000
@@ -70,6 +70,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ak_GH b/localedata/locales/ak_GH
--- a/localedata/locales/ak_GH 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/ak_GH 2018-10-11 15:10:43.000000000 +0000
@@ -54,6 +54,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
--- a/localedata/locales/am_ET 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/am_ET 2018-10-11 15:10:43.000000000 +0000
@@ -1394,6 +1394,7 @@
<U137A> <U0060><U0039><U0030>
<U137B> <U0060><U0031><U0030><U0030>
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/ar_EG b/localedata/locales/ar_EG
--- a/localedata/locales/ar_EG 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/ar_EG 2018-10-11 15:10:43.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/be_BY b/localedata/locales/be_BY
--- a/localedata/locales/be_BY 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/be_BY 2018-10-11 15:10:43.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bem_ZM b/localedata/locales/bem_ZM
--- a/localedata/locales/bem_ZM 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bem_ZM 2018-10-11 15:10:43.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ber_DZ b/localedata/locales/ber_DZ
--- a/localedata/locales/ber_DZ 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/ber_DZ 2018-10-11 15:10:43.000000000 +0000
@@ -165,6 +165,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ber_MA b/localedata/locales/ber_MA
--- a/localedata/locales/ber_MA 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/ber_MA 2018-10-11 15:10:44.000000000 +0000
@@ -85,6 +85,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bg_BG b/localedata/locales/bg_BG
--- a/localedata/locales/bg_BG 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bg_BG 2018-10-11 15:10:44.000000000 +0000
@@ -49,6 +49,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bi_VU b/localedata/locales/bi_VU
--- a/localedata/locales/bi_VU 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bi_VU 2018-10-11 15:10:44.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bn_BD b/localedata/locales/bn_BD
--- a/localedata/locales/bn_BD 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bn_BD 2018-10-11 15:10:44.000000000 +0000
@@ -61,6 +61,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/bo_CN b/localedata/locales/bo_CN
--- a/localedata/locales/bo_CN 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/bo_CN 2018-10-11 15:10:44.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ca_ES b/localedata/locales/ca_ES
--- a/localedata/locales/ca_ES 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/ca_ES 2018-10-11 15:10:44.000000000 +0000
@@ -71,6 +71,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ce_RU b/localedata/locales/ce_RU
--- a/localedata/locales/ce_RU 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/ce_RU 2018-10-11 15:10:44.000000000 +0000
@@ -38,6 +38,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cmn_TW b/localedata/locales/cmn_TW
--- a/localedata/locales/cmn_TW 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/cmn_TW 2018-10-11 15:10:44.000000000 +0000
@@ -49,6 +49,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

class "hanzi"; /
diff -uNr a/localedata/locales/cs_CZ b/localedata/locales/cs_CZ
--- a/localedata/locales/cs_CZ 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/cs_CZ 2018-10-11 15:10:44.000000000 +0000
@@ -204,6 +204,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cv_RU b/localedata/locales/cv_RU
--- a/localedata/locales/cv_RU 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/cv_RU 2018-10-11 15:10:44.000000000 +0000
@@ -108,6 +108,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/cy_GB b/localedata/locales/cy_GB
--- a/localedata/locales/cy_GB 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/cy_GB 2018-10-11 15:10:44.000000000 +0000
@@ -65,6 +65,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/da_DK b/localedata/locales/da_DK
--- a/localedata/locales/da_DK 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/da_DK 2018-10-11 15:10:44.000000000 +0000
@@ -166,6 +166,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/de_DE b/localedata/locales/de_DE
--- a/localedata/locales/de_DE 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/de_DE 2018-10-11 15:10:44.000000000 +0000
@@ -78,6 +78,7 @@
% DOUBLE HIGH-REVERSED-9 QUOTATION MARK
<U201F> <U00AB>;<U0022>

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/dv_MV b/localedata/locales/dv_MV
--- a/localedata/locales/dv_MV 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/dv_MV 2018-10-11 15:10:44.000000000 +0000
@@ -51,6 +51,7 @@
include "translit_combining";""


+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/dz_BT b/localedata/locales/dz_BT
--- a/localedata/locales/dz_BT 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/dz_BT 2018-10-11 15:10:44.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/el_GR b/localedata/locales/el_GR
--- a/localedata/locales/el_GR 2018-10-11 15:10:13.000000000 +0000
+++ b/localedata/locales/el_GR 2018-10-11 15:10:44.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_GB b/localedata/locales/en_GB
--- a/localedata/locales/en_GB 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/en_GB 2018-10-11 15:10:44.000000000 +0000
@@ -54,6 +54,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_NG b/localedata/locales/en_NG
--- a/localedata/locales/en_NG 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/en_NG 2018-10-11 15:10:45.000000000 +0000
@@ -49,6 +49,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/en_ZM b/localedata/locales/en_ZM
--- a/localedata/locales/en_ZM 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/en_ZM 2018-10-11 15:10:45.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/es_CU b/localedata/locales/es_CU
--- a/localedata/locales/es_CU 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/es_CU 2018-10-11 15:10:45.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/es_ES b/localedata/locales/es_ES
--- a/localedata/locales/es_ES 2018-10-11 15:10:14.000000000 +0000
+++ b/localedata/locales/es_ES 2018-10-11 15:10:45.000000000 +0000
@@ -72,6 +72,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/et_EE b/localedata/locales/et_EE
--- a/localedata/locales/et_EE 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/et_EE 2018-10-11 15:10:45.000000000 +0000
@@ -112,6 +112,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fa_IR b/localedata/locales/fa_IR
--- a/localedata/locales/fa_IR 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/fa_IR 2018-10-11 15:10:45.000000000 +0000
@@ -78,6 +78,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ff_SN b/localedata/locales/ff_SN
--- a/localedata/locales/ff_SN 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/ff_SN 2018-10-11 15:10:45.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fi_FI b/localedata/locales/fi_FI
--- a/localedata/locales/fi_FI 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/fi_FI 2018-10-11 15:10:45.000000000 +0000
@@ -136,6 +136,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/fr_FR b/localedata/locales/fr_FR
--- a/localedata/locales/fr_FR 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/fr_FR 2018-10-11 15:10:45.000000000 +0000
@@ -58,6 +58,7 @@
% In France, accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/ga_IE b/localedata/locales/ga_IE
--- a/localedata/locales/ga_IE 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/ga_IE 2018-10-11 15:10:45.000000000 +0000
@@ -53,6 +53,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gd_GB b/localedata/locales/gd_GB
--- a/localedata/locales/gd_GB 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/gd_GB 2018-10-11 15:10:45.000000000 +0000
@@ -45,6 +45,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gu_IN b/localedata/locales/gu_IN
--- a/localedata/locales/gu_IN 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/gu_IN 2018-10-11 15:10:45.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/gv_GB b/localedata/locales/gv_GB
--- a/localedata/locales/gv_GB 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/gv_GB 2018-10-11 15:10:45.000000000 +0000
@@ -56,6 +56,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/he_IL b/localedata/locales/he_IL
--- a/localedata/locales/he_IL 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/he_IL 2018-10-11 15:10:45.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hi_IN b/localedata/locales/hi_IN
--- a/localedata/locales/hi_IN 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/hi_IN 2018-10-11 15:10:45.000000000 +0000
@@ -61,6 +61,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hif_FJ b/localedata/locales/hif_FJ
--- a/localedata/locales/hif_FJ 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/hif_FJ 2018-10-11 15:10:45.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hr_HR b/localedata/locales/hr_HR
--- a/localedata/locales/hr_HR 2018-10-11 15:10:15.000000000 +0000
+++ b/localedata/locales/hr_HR 2018-10-11 15:10:45.000000000 +0000
@@ -61,6 +61,7 @@
% transliterate <U0111> {đ} into d + j
<U0111> "<U0064><U006A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ht_HT b/localedata/locales/ht_HT
--- a/localedata/locales/ht_HT 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ht_HT 2018-10-11 15:10:45.000000000 +0000
@@ -57,6 +57,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/hu_HU b/localedata/locales/hu_HU
--- a/localedata/locales/hu_HU 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/hu_HU 2018-10-11 15:10:46.000000000 +0000
@@ -476,6 +476,7 @@
<U00FC> "<U0075><U0308>";"<U0075><U00A8>";"<U0075><U003A>"
<U0171> "<U0075><U030B>";"<U0075><U02DD>";"<U0075><U0022>"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/hy_AM b/localedata/locales/hy_AM
--- a/localedata/locales/hy_AM 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/hy_AM 2018-10-11 15:10:46.000000000 +0000
@@ -75,6 +75,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/id_ID b/localedata/locales/id_ID
--- a/localedata/locales/id_ID 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/id_ID 2018-10-11 15:10:46.000000000 +0000
@@ -54,6 +54,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/is_IS b/localedata/locales/is_IS
--- a/localedata/locales/is_IS 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/is_IS 2018-10-11 15:10:46.000000000 +0000
@@ -149,6 +149,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/it_IT b/localedata/locales/it_IT
--- a/localedata/locales/it_IT 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/it_IT 2018-10-11 15:10:46.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ja_JP b/localedata/locales/ja_JP
--- a/localedata/locales/ja_JP 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ja_JP 2018-10-11 15:10:46.000000000 +0000
@@ -1681,6 +1681,7 @@
include "translit_combining";""
include "translit_cjk_variants";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/kab_DZ b/localedata/locales/kab_DZ
--- a/localedata/locales/kab_DZ 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/kab_DZ 2018-10-11 15:10:46.000000000 +0000
@@ -41,6 +41,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kk_KZ b/localedata/locales/kk_KZ
--- a/localedata/locales/kk_KZ 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/kk_KZ 2018-10-11 15:10:46.000000000 +0000
@@ -157,6 +157,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/km_KH b/localedata/locales/km_KH
--- a/localedata/locales/km_KH 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/km_KH 2018-10-11 15:10:46.000000000 +0000
@@ -42,6 +42,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kn_IN b/localedata/locales/kn_IN
--- a/localedata/locales/kn_IN 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/kn_IN 2018-10-11 15:10:46.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ko_KR b/localedata/locales/ko_KR
--- a/localedata/locales/ko_KR 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ko_KR 2018-10-11 15:10:47.000000000 +0000
@@ -6099,6 +6099,7 @@
include "translit_combining";""
include "translit_hangul";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/ks_IN b/localedata/locales/ks_IN
--- a/localedata/locales/ks_IN 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ks_IN 2018-10-11 15:10:47.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/kw_GB b/localedata/locales/kw_GB
--- a/localedata/locales/kw_GB 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/kw_GB 2018-10-11 15:10:47.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lb_LU b/localedata/locales/lb_LU
--- a/localedata/locales/lb_LU 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/lb_LU 2018-10-11 15:10:47.000000000 +0000
@@ -77,6 +77,7 @@
% LATIN SMALL LETTER E WITH CIRCUMFLEX
<U00EA> "e^"

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/lg_UG b/localedata/locales/lg_UG
--- a/localedata/locales/lg_UG 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/lg_UG 2018-10-11 15:10:47.000000000 +0000
@@ -56,6 +56,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lij_IT b/localedata/locales/lij_IT
--- a/localedata/locales/lij_IT 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/lij_IT 2018-10-11 15:10:47.000000000 +0000
@@ -47,6 +47,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ln_CD b/localedata/locales/ln_CD
--- a/localedata/locales/ln_CD 2018-10-11 15:10:16.000000000 +0000
+++ b/localedata/locales/ln_CD 2018-10-11 15:10:47.000000000 +0000
@@ -39,6 +39,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lo_LA b/localedata/locales/lo_LA
--- a/localedata/locales/lo_LA 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/lo_LA 2018-10-11 15:10:47.000000000 +0000
@@ -50,6 +50,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lt_LT b/localedata/locales/lt_LT
--- a/localedata/locales/lt_LT 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/lt_LT 2018-10-11 15:10:47.000000000 +0000
@@ -163,6 +163,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/lv_LV b/localedata/locales/lv_LV
--- a/localedata/locales/lv_LV 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/lv_LV 2018-10-11 15:10:47.000000000 +0000
@@ -110,6 +110,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mg_MG b/localedata/locales/mg_MG
--- a/localedata/locales/mg_MG 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/mg_MG 2018-10-11 15:10:47.000000000 +0000
@@ -54,6 +54,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/mhr_RU b/localedata/locales/mhr_RU
--- a/localedata/locales/mhr_RU 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/mhr_RU 2018-10-11 15:10:47.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mk_MK b/localedata/locales/mk_MK
--- a/localedata/locales/mk_MK 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/mk_MK 2018-10-11 15:10:47.000000000 +0000
@@ -48,6 +48,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ml_IN b/localedata/locales/ml_IN
--- a/localedata/locales/ml_IN 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/ml_IN 2018-10-11 15:10:47.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
%
diff -uNr a/localedata/locales/ms_MY b/localedata/locales/ms_MY
--- a/localedata/locales/ms_MY 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/ms_MY 2018-10-11 15:10:48.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/mt_MT b/localedata/locales/mt_MT
--- a/localedata/locales/mt_MT 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/mt_MT 2018-10-11 15:10:48.000000000 +0000
@@ -47,6 +47,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/***@latin
b/localedata/locales/***@latin
--- a/localedata/locales/***@latin 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/***@latin 2018-10-11 15:10:48.000000000 +0000
@@ -52,6 +52,7 @@
% accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/nb_NO b/localedata/locales/nb_NO
--- a/localedata/locales/nb_NO 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/nb_NO 2018-10-11 15:10:48.000000000 +0000
@@ -154,6 +154,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ne_NP b/localedata/locales/ne_NP
--- a/localedata/locales/ne_NP 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/ne_NP 2018-10-11 15:10:48.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nhn_MX b/localedata/locales/nhn_MX
--- a/localedata/locales/nhn_MX 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/nhn_MX 2018-10-11 15:10:48.000000000 +0000
@@ -59,6 +59,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/niu_NU b/localedata/locales/niu_NU
--- a/localedata/locales/niu_NU 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/niu_NU 2018-10-11 15:10:48.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/niu_NZ b/localedata/locales/niu_NZ
--- a/localedata/locales/niu_NZ 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/niu_NZ 2018-10-11 15:10:48.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nl_NL b/localedata/locales/nl_NL
--- a/localedata/locales/nl_NL 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/nl_NL 2018-10-11 15:10:48.000000000 +0000
@@ -56,6 +56,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/nr_ZA b/localedata/locales/nr_ZA
--- a/localedata/locales/nr_ZA 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/nr_ZA 2018-10-11 15:10:48.000000000 +0000
@@ -64,6 +64,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/oc_FR b/localedata/locales/oc_FR
--- a/localedata/locales/oc_FR 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/oc_FR 2018-10-11 15:10:48.000000000 +0000
@@ -54,6 +54,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/om_KE b/localedata/locales/om_KE
--- a/localedata/locales/om_KE 2018-10-11 15:10:17.000000000 +0000
+++ b/localedata/locales/om_KE 2018-10-11 15:10:48.000000000 +0000
@@ -138,6 +138,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/or_IN b/localedata/locales/or_IN
--- a/localedata/locales/or_IN 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/or_IN 2018-10-11 15:10:48.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/os_RU b/localedata/locales/os_RU
--- a/localedata/locales/os_RU 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/os_RU 2018-10-11 15:10:48.000000000 +0000
@@ -69,6 +69,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/pa_IN b/localedata/locales/pa_IN
--- a/localedata/locales/pa_IN 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/pa_IN 2018-10-11 15:10:48.000000000 +0000
@@ -60,6 +60,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pa_PK b/localedata/locales/pa_PK
--- a/localedata/locales/pa_PK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/pa_PK 2018-10-11 15:10:48.000000000 +0000
@@ -57,6 +57,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pl_PL b/localedata/locales/pl_PL
--- a/localedata/locales/pl_PL 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/pl_PL 2018-10-11 15:10:48.000000000 +0000
@@ -116,6 +116,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/pt_PT b/localedata/locales/pt_PT
--- a/localedata/locales/pt_PT 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/pt_PT 2018-10-11 15:10:48.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/quz_PE b/localedata/locales/quz_PE
--- a/localedata/locales/quz_PE 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/quz_PE 2018-10-11 15:10:48.000000000 +0000
@@ -55,6 +55,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ro_RO b/localedata/locales/ro_RO
--- a/localedata/locales/ro_RO 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/ro_RO 2018-10-11 15:10:49.000000000 +0000
@@ -143,6 +143,7 @@
<U0162> "<U021A>";"<U0054>"
<U0163> "<U021B>";"<U0074>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ru_RU b/localedata/locales/ru_RU
--- a/localedata/locales/ru_RU 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/ru_RU 2018-10-11 15:10:49.000000000 +0000
@@ -73,6 +73,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/rw_RW b/localedata/locales/rw_RW
--- a/localedata/locales/rw_RW 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/rw_RW 2018-10-11 15:10:49.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sa_IN b/localedata/locales/sa_IN
--- a/localedata/locales/sa_IN 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sa_IN 2018-10-11 15:10:49.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sd_IN b/localedata/locales/sd_IN
--- a/localedata/locales/sd_IN 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sd_IN 2018-10-11 15:10:49.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/***@devanagari
b/localedata/locales/***@devanagari
--- a/localedata/locales/***@devanagari 2018-10-11 15:10:18.000000000
+0000
+++ b/localedata/locales/***@devanagari 2018-10-11 15:10:49.000000000
+0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sd_PK b/localedata/locales/sd_PK
--- a/localedata/locales/sd_PK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sd_PK 2018-10-11 15:10:49.000000000 +0000
@@ -39,6 +39,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/se_NO b/localedata/locales/se_NO
--- a/localedata/locales/se_NO 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/se_NO 2018-10-11 15:10:49.000000000 +0000
@@ -204,6 +204,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sgs_LT b/localedata/locales/sgs_LT
--- a/localedata/locales/sgs_LT 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sgs_LT 2018-10-11 15:10:49.000000000 +0000
@@ -58,6 +58,7 @@
copy "i18n"
translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/shn_MM b/localedata/locales/shn_MM
--- a/localedata/locales/shn_MM 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/shn_MM 2018-10-11 15:10:49.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/si_LK b/localedata/locales/si_LK
--- a/localedata/locales/si_LK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/si_LK 2018-10-11 15:10:49.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sk_SK b/localedata/locales/sk_SK
--- a/localedata/locales/sk_SK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sk_SK 2018-10-11 15:10:49.000000000 +0000
@@ -67,6 +67,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sl_SI b/localedata/locales/sl_SI
--- a/localedata/locales/sl_SI 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sl_SI 2018-10-11 15:10:49.000000000 +0000
@@ -90,6 +90,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sm_WS b/localedata/locales/sm_WS
--- a/localedata/locales/sm_WS 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sm_WS 2018-10-11 15:10:49.000000000 +0000
@@ -37,6 +37,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/so_SO b/localedata/locales/so_SO
--- a/localedata/locales/so_SO 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/so_SO 2018-10-11 15:10:49.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sq_AL b/localedata/locales/sq_AL
--- a/localedata/locales/sq_AL 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sq_AL 2018-10-11 15:10:49.000000000 +0000
@@ -45,6 +45,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ss_ZA b/localedata/locales/ss_ZA
--- a/localedata/locales/ss_ZA 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/ss_ZA 2018-10-11 15:10:49.000000000 +0000
@@ -66,6 +66,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/st_ZA b/localedata/locales/st_ZA
--- a/localedata/locales/st_ZA 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/st_ZA 2018-10-11 15:10:50.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sv_SE b/localedata/locales/sv_SE
--- a/localedata/locales/sv_SE 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/sv_SE 2018-10-11 15:10:50.000000000 +0000
@@ -138,6 +138,7 @@
% LATIN SMALL LETTER O WITH STROKE -> "oe"
<U00F8> "<U006F><U0338>";"<U006F><U0065>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/sw_KE b/localedata/locales/sw_KE
--- a/localedata/locales/sw_KE 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/sw_KE 2018-10-11 15:10:50.000000000 +0000
@@ -43,6 +43,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ta_IN b/localedata/locales/ta_IN
--- a/localedata/locales/ta_IN 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/ta_IN 2018-10-11 15:10:50.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/te_IN b/localedata/locales/te_IN
--- a/localedata/locales/te_IN 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/te_IN 2018-10-11 15:10:50.000000000 +0000
@@ -63,6 +63,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/th_TH b/localedata/locales/th_TH
--- a/localedata/locales/th_TH 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/th_TH 2018-10-11 15:10:50.000000000 +0000
@@ -57,6 +57,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ti_ET b/localedata/locales/ti_ET
--- a/localedata/locales/ti_ET 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/ti_ET 2018-10-11 15:10:50.000000000 +0000
@@ -864,6 +864,7 @@
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>

include "translit_combining";""
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
diff -uNr a/localedata/locales/tn_ZA b/localedata/locales/tn_ZA
--- a/localedata/locales/tn_ZA 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/tn_ZA 2018-10-11 15:10:50.000000000 +0000
@@ -67,6 +67,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/to_TO b/localedata/locales/to_TO
--- a/localedata/locales/to_TO 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/to_TO 2018-10-11 15:10:50.000000000 +0000
@@ -36,6 +36,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/tpi_PG b/localedata/locales/tpi_PG
--- a/localedata/locales/tpi_PG 2018-10-11 15:10:19.000000000 +0000
+++ b/localedata/locales/tpi_PG 2018-10-11 15:10:50.000000000 +0000
@@ -44,6 +44,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/tr_TR b/localedata/locales/tr_TR
--- a/localedata/locales/tr_TR 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/tr_TR 2018-10-11 15:10:50.000000000 +0000
@@ -2423,6 +2423,7 @@

% TURKISH LIRA SIGN
<U20BA> "<U0054><U004C>"
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-10-11 15:10:52.000000000
+0000
@@ -0,0 +1,383 @@
+escape_char /
+comment_char %
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file. The foregoing does not
+% affect the license of the GNU C Library as a whole. It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Transliterations of Cyrillic letters to Latin and/or ASCII symbols.
+% Inspired by ISO 9.1995 / GOST 7.79-2000.
+% Covers Unicode Range https://www.unicode.org/charts/PDF/U0400.pdf
+% i.e. [U0401-U04F9, U2019] but only the letters covered by ISO 9.1995
+% It implements the GOST_7.79 System A (Latin Script) as a first
+% option and System B Cyrillic (ASCII) as a second option. Check
+% https://en.wikipedia.org/wiki/ISO_9 for reference.
+% The System B is extended from GOST_7.79-Russian using open sources
+% of the transliteration mappings and the "h/`" diacritics logic.
+
+% Usage examples:
+% iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
+% | iconv -f ISO-8859-15 -t UTF-8 # System A
+% iconv -f UTF-8 -t ASCII//TRANSLIT # System B.
+
+% Contributions welcome for the rest of Cyrillic script in Unicode
+% https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode.
+% Bugfix for https://sourceware.org/bugzilla/show_bug.cgi?id=2872.
+% Generated from UnicodeData.txt with
+% https://sourceware.org/bugzilla/attachment.cgi?id=11301.
+
+LC_CTYPE
+
+translit_start
+
+% CYRILLIC CAPITAL LETTER IO
+<U0401> <U00CB>;"<U0059><U004F>"
+% CYRILLIC CAPITAL LETTER DJE
+<U0402> <U0110>;"<U0044><U004A>"
+% CYRILLIC CAPITAL LETTER GJE
+<U0403> <U01F4>;"<U0047><U0060>"
+% CYRILLIC CAPITAL LETTER UKRAINIAN IE
+<U0404> <U00CA>;"<U0059><U0065>"
+% CYRILLIC CAPITAL LETTER DZE
+<U0405> <U1E90>;"<U005A><U0060>"
+% CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
+<U0406> <U00CC>;<U0049>
+% CYRILLIC CAPITAL LETTER YI
+<U0407> <U00CF>;"<U0059><U0069>"
+% CYRILLIC CAPITAL LETTER JE
+<U0408> "<U004A><U030C>";<U004A>
+% CYRILLIC CAPITAL LETTER LJE
+<U0409> "<U004C><U0302>";"<U004C><U0060>"
+% CYRILLIC CAPITAL LETTER NJE
+<U040A> "<U004E><U0302>";"<U004E><U0060>"
+% CYRILLIC CAPITAL LETTER TSHE
+<U040B> <U0106>;"<U0054><U0053><U0048>"
+% CYRILLIC CAPITAL LETTER KJE
+<U040C> <U1E30>;"<U004B><U0060>"
+% CYRILLIC CAPITAL LETTER SHORT U
+<U040E> <U016C>;"<U0055><U0060>"
+% CYRILLIC CAPITAL LETTER DZHE
+<U040F> "<U0044><U0302>";"<U0044><U0068>"
+% CYRILLIC CAPITAL LETTER A
+<U0410> <U0041>
+% CYRILLIC CAPITAL LETTER BE
+<U0411> <U0042>
+% CYRILLIC CAPITAL LETTER VE
+<U0412> <U0056>
+% CYRILLIC CAPITAL LETTER GHE
+<U0413> <U0047>
+% CYRILLIC CAPITAL LETTER DE
+<U0414> <U0044>
+% CYRILLIC CAPITAL LETTER IE
+<U0415> <U0045>
+% CYRILLIC CAPITAL LETTER ZHE
+<U0416> <U017D>;"<U005A><U0048>"
+% CYRILLIC CAPITAL LETTER ZE
+<U0417> <U005A>
+% CYRILLIC CAPITAL LETTER I
+<U0418> <U0049>
+% CYRILLIC CAPITAL LETTER SHORT I
+<U0419> <U004A>
+% CYRILLIC CAPITAL LETTER KA
+<U041A> <U004B>
+% CYRILLIC CAPITAL LETTER EL
+<U041B> <U004C>
+% CYRILLIC CAPITAL LETTER EM
+<U041C> <U004D>
+% CYRILLIC CAPITAL LETTER EN
+<U041D> <U004E>
+% CYRILLIC CAPITAL LETTER O
+<U041E> <U004F>
+% CYRILLIC CAPITAL LETTER PE
+<U041F> <U0050>
+% CYRILLIC CAPITAL LETTER ER
+<U0420> <U0052>
+% CYRILLIC CAPITAL LETTER ES
+<U0421> <U0053>
+% CYRILLIC CAPITAL LETTER TE
+<U0422> <U0054>
+% CYRILLIC CAPITAL LETTER U
+<U0423> <U0055>
+% CYRILLIC UNDEFINED
+<U0423><U0301> <U00DA>;"<U0055><U0060>"
+% CYRILLIC CAPITAL LETTER EF
+<U0424> <U0046>
+% CYRILLIC CAPITAL LETTER HA
+<U0425> <U0048>;<U0058>
+% CYRILLIC CAPITAL LETTER TSE
+<U0426> <U0043>;"<U0043><U005A>"
+% CYRILLIC CAPITAL LETTER CHE
+<U0427> <U010C>;"<U0043><U0048>"
+% CYRILLIC CAPITAL LETTER SHA
+<U0428> <U0160>;"<U0053><U0048>"
+% CYRILLIC CAPITAL LETTER SHCHA
+<U0429> <U015C>;"<U0053><U0048><U0048>"
+% CYRILLIC CAPITAL LETTER HARD SIGN
+<U042A> <U02BA>;"<U0041><U0060>"
+% CYRILLIC CAPITAL LETTER YERU
+<U042B> <U0059>;"<U0059><U0060>"
+% CYRILLIC CAPITAL LETTER SOFT SIGN
+<U042C> <U02B9>;<U0060>
+% CYRILLIC CAPITAL LETTER E
+<U042D> <U00C8>;"<U0045><U0060>"
+% CYRILLIC CAPITAL LETTER YU
+<U042E> <U00DB>;"<U0059><U0055>"
+% CYRILLIC CAPITAL LETTER YA
+<U042F> <U00C2>;"<U0059><U0041>"
+% CYRILLIC SMALL LETTER A
+<U0430> <U0061>
+% CYRILLIC SMALL LETTER BE
+<U0431> <U0062>
+% CYRILLIC SMALL LETTER VE
+<U0432> <U0076>
+% CYRILLIC SMALL LETTER GHE
+<U0433> <U0067>
+% CYRILLIC SMALL LETTER DE
+<U0434> <U0064>
+% CYRILLIC SMALL LETTER IE
+<U0435> <U0065>
+% CYRILLIC SMALL LETTER ZHE
+<U0436> <U017E>;"<U007A><U0068>"
+% CYRILLIC SMALL LETTER ZE
+<U0437> <U007A>
+% CYRILLIC SMALL LETTER I
+<U0438> <U0069>
+% CYRILLIC SMALL LETTER SHORT I
+<U0439> <U006A>
+% CYRILLIC SMALL LETTER KA
+<U043A> <U006B>
+% CYRILLIC SMALL LETTER EL
+<U043B> <U006C>
+% CYRILLIC SMALL LETTER EM
+<U043C> <U006D>
+% CYRILLIC SMALL LETTER EN
+<U043D> <U006E>
+% CYRILLIC SMALL LETTER O
+<U043E> <U006F>
+% CYRILLIC SMALL LETTER PE
+<U043F> <U0070>
+% CYRILLIC SMALL LETTER ER
+<U0440> <U0072>
+% CYRILLIC SMALL LETTER ES
+<U0441> <U0073>
+% CYRILLIC SMALL LETTER TE
+<U0442> <U0074>
+% CYRILLIC SMALL LETTER U
+<U0443> <U0075>
+% CYRILLIC UNDEFINED
+<U0443><U0301> <U00FA>;"<U0075><U0060>"
+% CYRILLIC SMALL LETTER EF
+<U0444> <U0066>
+% CYRILLIC SMALL LETTER HA
+<U0445> <U0068>;<U0078>
+% CYRILLIC SMALL LETTER TSE
+<U0446> <U0063>;"<U0063><U007A>"
+% CYRILLIC SMALL LETTER CHE
+<U0447> <U010D>;"<U0063><U0068>"
+% CYRILLIC SMALL LETTER SHA
+<U0448> <U0161>;"<U0073><U0068>"
+% CYRILLIC SMALL LETTER SHCHA
+<U0449> <U015D>;"<U0073><U0068><U0068>"
+% CYRILLIC SMALL LETTER HARD SIGN
+<U044A> <U02BA>;"<U0060><U0060>"
+% CYRILLIC SMALL LETTER YERU
+<U044B> <U0079>;"<U0079><U0060>"
+% CYRILLIC SMALL LETTER SOFT SIGN
+<U044C> <U02B9>;<U0060>
+% CYRILLIC SMALL LETTER E
+<U044D> <U00E8>;"<U0065><U0060>"
+% CYRILLIC SMALL LETTER YU
+<U044E> <U00FB>;"<U0079><U0075>"
+% CYRILLIC SMALL LETTER YA
+<U044F> <U00E2>;"<U0079><U0061>"
+% CYRILLIC SMALL LETTER IO
+<U0451> <U00EB>;"<U0079><U006F>"
+% CYRILLIC SMALL LETTER DJE
+<U0452> <U0111>;"<U0064><U006A>"
+% CYRILLIC SMALL LETTER GJE
+<U0453> <U01F5>;"<U0067><U0060>"
+% CYRILLIC SMALL LETTER UKRAINIAN IE
+<U0454> <U00EA>;"<U0079><U0065>"
+% CYRILLIC SMALL LETTER DZE
+<U0455> <U1E91>;"<U007A><U0060>"
+% CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
+<U0456> <U00EC>;<U0069>
+% CYRILLIC SMALL LETTER YI
+<U0457> <U00EF>;"<U0079><U0069>"
+% CYRILLIC SMALL LETTER JE
+<U0458> <U01F0>;<U006A>
+% CYRILLIC SMALL LETTER LJE
+<U0459> "<U006C><U0302>";"<U006C><U0060>"
+% CYRILLIC SMALL LETTER NJE
+<U045A> "<U006E><U0302>";"<U006E><U0060>"
+% CYRILLIC SMALL LETTER TSHE
+<U045B> <U0107>;"<U0074><U0073><U0068>"
+% CYRILLIC SMALL LETTER KJE
+<U045C> <U1E31>;"<U006B><U0060>"
+% CYRILLIC SMALL LETTER SHORT U
+<U045E> <U016D>;"<U0075><U0060>"
+% CYRILLIC SMALL LETTER DZHE
+<U045F> "<U0064><U0302>";"<U0064><U0068>"
+% CYRILLIC CAPITAL LETTER BIG YUS
+<U046A> <U01CD>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER BIG YUS
+<U046B> <U01CE>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER FITA
+<U0472> "<U0046><U0300>";"<U0046><U0068>"
+% CYRILLIC SMALL LETTER FITA
+<U0473> "<U0066><U0300>";"<U0066><U0068>"
+% CYRILLIC CAPITAL LETTER IZHITSA
+<U0474> <U1EF2>;"<U0059><U0068>"
+% CYRILLIC SMALL LETTER IZHITSA
+<U0475> <U1EF3>;"<U0079><U0068>"
+% CYRILLIC CAPITAL LETTER SEMISOFT SIGN
+<U048C> <U011A>;"<U0045><U0060>"
+% CYRILLIC SMALL LETTER SEMISOFT SIGN
+<U048D> <U011B>;"<U0065><U0060>"
+% CYRILLIC CAPITAL LETTER GHE WITH UPTURN
+<U0490> "<U0047><U0300>";"<U0047><U0060>"
+% CYRILLIC SMALL LETTER GHE WITH UPTURN
+<U0491> "<U0067><U0300>";"<U0067><U0060>"
+% CYRILLIC CAPITAL LETTER GHE WITH STROKE
+<U0492> <U0120>;"<U0047><U0048>"
+% CYRILLIC SMALL LETTER GHE WITH STROKE
+<U0493> <U0121>;"<U0067><U0068>"
+% CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK
+<U0494> <U011E>;"<U0047><U0048>"
+% CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK
+<U0495> <U011F>;"<U0067><U0068>"
+% CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER
+<U0496> "<U017D><U0327>";"<U005A><U0048><U0060>"
+% CYRILLIC SMALL LETTER ZHE WITH DESCENDER
+<U0497> "<U017E><U0327>";"<U007A><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER KA WITH DESCENDER
+<U049A> <U0136>;"<U004B><U0060>"
+% CYRILLIC SMALL LETTER KA WITH DESCENDER
+<U049B> <U0137>;"<U006B><U0060>"
+% CYRILLIC CAPITAL LETTER KA WITH STROKE
+<U049E> "<U004B><U0304>";"<U004B><U0060>"
+% CYRILLIC SMALL LETTER KA WITH STROKE
+<U049F> "<U006B><U0304>";"<U006B><U0060>"
+% CYRILLIC CAPITAL LETTER EN WITH DESCENDER
+<U04A2> <U1E46>;"<U004E><U0060>"
+% CYRILLIC SMALL LETTER EN WITH DESCENDER
+<U04A3> <U1E47>;"<U006E><U0060>"
+% CYRILLIC CAPITAL LIGATURE EN GHE
+<U04A4> <U1E44>;"<U004E><U0047>"
+% CYRILLIC SMALL LIGATURE EN GHE
+<U04A5> <U1E45>;"<U006E><U0067>"
+% CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK
+<U04A6> <U1E54>;"<U0050><U0060>"
+% CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK
+<U04A7> <U1E55>;"<U0070><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN HA
+<U04A8> <U00D2>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN HA
+<U04A9> <U00F2>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER ES WITH DESCENDER
+<U04AA> <U00C7>;"<U0043><U0060>"
+% CYRILLIC SMALL LETTER ES WITH DESCENDER
+<U04AB> <U00E7>;"<U0043><U0060>"
+% CYRILLIC CAPITAL LETTER TE WITH DESCENDER
+<U04AC> <U0162>;"<U0054><U0060>"
+% CYRILLIC SMALL LETTER TE WITH DESCENDER
+<U04AD> <U0163>;"<U0074><U0060>"
+% CYRILLIC CAPITAL LETTER STRAIGHT U
+<U04AE> <U00D9>;<U0055>
+% CYRILLIC SMALL LETTER STRAIGHT U
+<U04AF> <U00F9>;<U0075>
+% CYRILLIC CAPITAL LETTER HA WITH DESCENDER
+<U04B2> <U1E28>;"<U0048><U0060>"
+% CYRILLIC SMALL LETTER HA WITH DESCENDER
+<U04B3> <U1E29>;"<U0068><U0060>"
+% CYRILLIC CAPITAL LIGATURE TE TSE
+<U04B4> "<U0043><U0304>";"<U0054><U0043><U005A>"
+% CYRILLIC SMALL LIGATURE TE TSE
+<U04B5> "<U0063><U0304>";"<U0074><U0063><U007A>"
+% CYRILLIC CAPITAL LETTER SHHA
+<U04BA> <U1E24>;"<U0053><U0048><U0060>"
+% CYRILLIC SMALL LETTER SHHA
+<U04BB> <U1E25>;"<U0053><U0048><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN CHE
+<U04BC> "<U0043><U0306>";"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN CHE
+<U04BD> "<U0063><U0306>";"<U0063><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER
+<U04BE> "<U00C7><U0306>";"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER
+<U04BF> "<U00E7><U0306>";"<U0063><U0068><U0060>"
+% CYRILLIC LETTER PALOCHKA
+<U04C0> <U2021>;<U0069>
+% CYRILLIC CAPITAL LETTER ZHE WITH BREVE
+<U04C1> "<U005A><U0306>";"<U005A><U0048><U0060>"
+% CYRILLIC SMALL LETTER ZHE WITH BREVE
+<U04C2> "<U007A><U0306>";"<U007A><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER KHAKASSIAN CHE
+<U04CB> <U00C7>;"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER KHAKASSIAN CHE
+<U04CC> <U00E7>;"<U0063><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER A WITH BREVE
+<U04D0> <U0102>;"<U0041><U0060>"
+% CYRILLIC SMALL LETTER A WITH BREVE
+<U04D1> <U0103>;"<U0061><U0060>"
+% CYRILLIC CAPITAL LETTER A WITH DIAERESIS
+<U04D2> <U00C4>;"<U0041><U0060>"
+% CYRILLIC SMALL LETTER A WITH DIAERESIS
+<U04D3> <U00E4>;"<U0061><U0060>"
+% CYRILLIC CAPITAL LETTER IE WITH BREVE
+<U04D6> <U0114>;"<U0045><U0060>"
+% CYRILLIC SMALL LETTER IE WITH BREVE
+<U04D7> <U0115>;"<U0065><U0060>"
+% CYRILLIC CAPITAL LETTER SCHWA
+<U04D8> "<U0041><U030B>";"<U0041><U0060>"
+% CYRILLIC SMALL LETTER SCHWA
+<U04D9> "<U0061><U030B>";"<U0061><U0060>"
+% CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS
+<U04DC> "<U005A><U0304>";"<U005A><U0048><U0060>"
+% CYRILLIC SMALL LETTER ZHE WITH DIAERESIS
+<U04DD> "<U007A><U0304>";"<U007A><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS
+<U04DE> "<U005A><U0308>";"<U005A><U0060>"
+% CYRILLIC SMALL LETTER ZE WITH DIAERESIS
+<U04DF> "<U007A><U0308>";"<U007A><U0060>"
+% CYRILLIC CAPITAL LETTER ABKHASIAN DZE
+<U04E0> <U0179>;"<U005A><U0060>"
+% CYRILLIC SMALL LETTER ABKHASIAN DZE
+<U04E1> <U017A>;"<U007A><U0060>"
+% CYRILLIC CAPITAL LETTER I WITH DIAERESIS
+<U04E4> <U00CE>;"<U0049><U0060>"
+% CYRILLIC SMALL LETTER I WITH DIAERESIS
+<U04E5> <U00EE>;"<U0069><U0060>"
+% CYRILLIC CAPITAL LETTER O WITH DIAERESIS
+<U04E6> <U00D6>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER O WITH DIAERESIS
+<U04E7> <U00F6>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER BARRED O
+<U04E8> <U00D4>;"<U004F><U0060>"
+% CYRILLIC SMALL LETTER BARRED O
+<U04E9> <U00F4>;"<U006F><U0060>"
+% CYRILLIC CAPITAL LETTER U WITH DIAERESIS
+<U04F0> <U00DC>;"<U0055><U0060>"
+% CYRILLIC SMALL LETTER U WITH DIAERESIS
+<U04F1> <U00FC>;"<U0075><U0060>"
+% CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE
+<U04F2> <U0170>;"<U0055><U0060>"
+% CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE
+<U04F3> <U0171>;"<U0075><U0060>"
+% CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS
+<U04F4> "<U0043><U0308>";"<U0043><U0048><U0060>"
+% CYRILLIC SMALL LETTER CHE WITH DIAERESIS
+<U04F5> "<U0063><U0308>";"<U0063><U0068><U0060>"
+% CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS
+<U04F8> <U0178>;"<U0059><U0060>"
+% CYRILLIC SMALL LETTER YERU WITH DIAERESIS
+<U04F9> <U00FF>;"<U0079><U0060>"
+% RIGHT SINGLE QUOTATION MARK
+<U2019> <U2035>;<U0027>
+
+translit_end
+
+END LC_CTYPE
diff -uNr a/localedata/locales/ts_ZA b/localedata/locales/ts_ZA
--- a/localedata/locales/ts_ZA 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/ts_ZA 2018-10-11 15:10:50.000000000 +0000
@@ -62,6 +62,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/unm_US b/localedata/locales/unm_US
--- a/localedata/locales/unm_US 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/unm_US 2018-10-11 15:10:51.000000000 +0000
@@ -48,6 +48,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ur_IN b/localedata/locales/ur_IN
--- a/localedata/locales/ur_IN 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/ur_IN 2018-10-11 15:10:51.000000000 +0000
@@ -46,6 +46,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ur_PK b/localedata/locales/ur_PK
--- a/localedata/locales/ur_PK 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/ur_PK 2018-10-11 15:10:51.000000000 +0000
@@ -57,6 +57,7 @@
% Farsi yeh -> yeh
<U06CC> "<U064A>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/ve_ZA b/localedata/locales/ve_ZA
--- a/localedata/locales/ve_ZA 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/ve_ZA 2018-10-11 15:10:51.000000000 +0000
@@ -65,6 +65,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/vi_VN b/localedata/locales/vi_VN
--- a/localedata/locales/vi_VN 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/vi_VN 2018-10-11 15:10:51.000000000 +0000
@@ -57,6 +57,7 @@
% dong sign -> d// -> dd
<U20AB> "<U0111>";"<U0064><U0064>"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/wa_BE b/localedata/locales/wa_BE
--- a/localedata/locales/wa_BE 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/wa_BE 2018-10-11 15:10:51.000000000 +0000
@@ -59,6 +59,7 @@
<U00C5> "A<U030A>";"A";"AU"
<U00E5> "a<U030A>";"a";"au"

+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/wo_SN b/localedata/locales/wo_SN
--- a/localedata/locales/wo_SN 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/wo_SN 2018-10-11 15:10:51.000000000 +0000
@@ -54,6 +54,7 @@
% Accents are simply omitted if they cannot be represented.
include "translit_combining";""

+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/xh_ZA b/localedata/locales/xh_ZA
--- a/localedata/locales/xh_ZA 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/xh_ZA 2018-10-11 15:10:51.000000000 +0000
@@ -64,6 +64,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE

diff -uNr a/localedata/locales/yi_US b/localedata/locales/yi_US
--- a/localedata/locales/yi_US 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/yi_US 2018-10-11 15:10:51.000000000 +0000
@@ -66,6 +66,7 @@
<U05F0> "<U05D5><U05D5>";"ww"
<U05F1> "<U05D5><U05D9>";"wj"
<U05F2> "<U05D9><U05D9>";"jj"
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/yuw_PG b/localedata/locales/yuw_PG
--- a/localedata/locales/yuw_PG 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/yuw_PG 2018-10-11 15:10:51.000000000 +0000
@@ -40,6 +40,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

END LC_CTYPE
diff -uNr a/localedata/locales/zh_CN b/localedata/locales/zh_CN
--- a/localedata/locales/zh_CN 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/zh_CN 2018-10-11 15:10:51.000000000 +0000
@@ -58,6 +58,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end

class "hanzi"; /
diff -uNr a/localedata/locales/zu_ZA b/localedata/locales/zu_ZA
--- a/localedata/locales/zu_ZA 2018-10-11 15:10:20.000000000 +0000
+++ b/localedata/locales/zu_ZA 2018-10-11 15:10:51.000000000 +0000
@@ -68,6 +68,7 @@

translit_start
include "translit_combining";""
+include "translit_cyrillic";""
translit_end
END LC_CTYPE
Rafal Luzynski
2018-10-13 00:59:17 UTC
Permalink
Egor,

Thank you for the update. I took a closer look at your patch so this
time my review is more complete than before although not yet fully complete.

As far as I understand, ISO-9 and its GOST variants are meant to be
universal rather than Russian-specific. Therefore it is correct to place
them in the external file, like translit_cyrillic, and then include this
file in other locales adding locale specific modifications, if required.
For example, if there are any Russian-specific rules not included in this
file, they should go to ru_RU.

The text of the ISO-9 standard is not available in public, have we got
anything better than an article in Wikipedia?

Regarding the format of your commit message, I hesitate to say anything
more because there are more experienced maintainers around here. Please
take a look at the Contribution Checklist. [1]

While at this, what is your legal relationship with GLIBC project? Have
you signed the FSF Copyright Assignment? It is not necessary for the locale
data but it might be necessary if you are going to contribute the testing code.

Regarding the tests, I think there is no complete transliteration test
suite at the moment. Probably the only test is localedata/bug-iconv-trans.c.
You can also see the collation tests placed in the same directory, they
use those multiple *.UTF-8.in files.

You can skip the tests for now.

Technical issue: Please either attach your patch to the email message or
paste it inline, not both. The patch as it is now is not applicable.
I had to edit it manually to apply.
Post by Egor Kobylkin
[...]
From this patch I have excluded locales that already mention cyrillic or
az_AZ
iso14651_t1_common
ky_KG
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
I confirm that these locales are excluded and there are no other missing
locales.
Post by Egor Kobylkin
[...]
diff -uNr a/localedata/locales/C b/localedata/locales/C
--- a/localedata/locales/C 2018-10-11 15:10:12.000000000 +0000
+++ b/localedata/locales/C 2018-10-11 15:10:43.000000000 +0000
There is no such file. Where have you got the source code from? Are you
sure this is glibc? :-)
Post by Egor Kobylkin
[...]
diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
--- a/localedata/locales/am_ET 2018-10-11 15:10:11.000000000 +0000
+++ b/localedata/locales/am_ET 2018-10-11 15:10:43.000000000 +0000
@@ -1394,6 +1394,7 @@
<U137A> <U0060><U0039><U0030>
<U137B> <U0060><U0031><U0030><U0030>
<U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
+include "translit_cyrillic";""
translit_end
%
END LC_CTYPE
Shouldn't “include "translit_cyrillic";""” be placed before the custom rules,
together with other includes? The same in more files, I will not mention
them all.
Post by Egor Kobylkin
[...]
+0000
+0000
Those 3 lines have been broken by the email agent, the patch is not applicable.
Post by Egor Kobylkin
[...]
diff -uNr a/localedata/locales/sd_PK b/localedata/locales/sd_PK
--- a/localedata/locales/sd_PK 2018-10-11 15:10:18.000000000 +0000
+++ b/localedata/locales/sd_PK 2018-10-11 15:10:49.000000000 +0000
There is no such file in glibc.
Post by Egor Kobylkin
[...]
diff -uNr a/localedata/locales/translit_cyrillic
b/localedata/locales/translit_cyrillic
--- a/localedata/locales/translit_cyrillic 1970-01-01 00:00:00.000000000
+0000
+++ b/localedata/locales/translit_cyrillic 2018-10-11 15:10:52.000000000
+0000
Again 3 lines broken, the patch is not applicable.
Post by Egor Kobylkin
[...]
+% Contributions welcome for the rest of Cyrillic script in Unicode
+% https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode.
I am still tempted to add more Cyrillic characters but I understand
that it must be clearly separated which transliteration rules come from
ISO-9 and which are our own invention. But that's not for now.
Post by Egor Kobylkin
[...]
+translit_start
+
+% CYRILLIC CAPITAL LETTER IO
+<U0401> <U00CB>;"<U0059><U004F>"
This says that for ASCII (GOST 7.79 System B) you would like to transliterate
"Ё" as "YO" but the table in Wikipedia says "Yo". I understand that one or
another may be correct depending on the context but we should be consistent
and also better let's stick with the standard.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER DJE
+<U0402> <U0110>;"<U0044><U004A>"
This says "DJ" but System B does not mention it. Where does it come from?
Also, I think it should be "Dj" rather than "DJ".
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER GJE
+<U0403> <U01F4>;"<U0047><U0060>"
Correct, according to both systems.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER UKRAINIAN IE
+<U0404> <U00CA>;"<U0059><U0065>"
"Ye" - correct.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER DZE
+<U0405> <U1E90>;"<U005A><U0060>"
Correct.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
+<U0406> <U00CC>;<U0049>
Correct. The table mentions an alternative transliteration "I`" but
says that it is "only before vowels for Old Russian and Old Bulgarian".
I think we can skip this other variant.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER YI
+<U0407> <U00CF>;"<U0059><U0069>"
"Yi" - correct.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER JE
+<U0408> "<U004A><U030C>";<U004A>
Correct.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER LJE
+<U0409> "<U004C><U0302>";"<U004C><U0060>"
Correct, according to the standard. If Serbian language requires "Lj"
then overrides should go to sr_RS file.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER NJE
+<U040A> "<U004E><U0302>";"<U004E><U0060>"
Correct, the same comment.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER TSHE
+<U040B> <U0106>;"<U0054><U0053><U0048>"
Where does "TSH" come from? It is not mentioned by the System B table.
Also I am afraid this is not correct.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER KJE
+<U040C> <U1E30>;"<U004B><U0060>"
Correct.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER SHORT U
+<U040E> <U016C>;"<U0055><U0060>"
"U`" - correct.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER DZHE
+<U040F> "<U0044><U0302>";"<U0044><U0068>"
"Dh" - correct.
Post by Egor Kobylkin
[...]
+% CYRILLIC CAPITAL LETTER ZHE
+<U0416> <U017D>;"<U005A><U0048>"
"ZH" - shouldn't be "Zh"?
Post by Egor Kobylkin
[...]
+% CYRILLIC UNDEFINED
+<U0423><U0301> <U00DA>;"<U0055><U0060>"
1. I think it should be named "CYRILLIC CAPITAL LETTER U WITH ACUTE".
2. OK, the System A table mentions this letter but System B does not.
Somehow we should handle it. I think that "U`" is the best we can
do for now.
3. It must be tested whether this actually works.
Post by Egor Kobylkin
[...]
+% CYRILLIC CAPITAL LETTER HA
+<U0425> <U0048>;<U0058>
I don't think that "H" is unavailable in any encoding therefore it will
always be transliterated as "H" and never as "X". We can't help it and
I don't think it is bad.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER TSE
+<U0426> <U0043>;"<U0043><U005A>"
1. "CZ" - maybe should be "Cz"?
2. Are we able to implement the rule: "c before i, e, y, j"?
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER CHE
+<U0427> <U010C>;"<U0043><U0048>"
"CH" -> "Ch"?
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER SHA
+<U0428> <U0160>;"<U0053><U0048>"
"SH" -> "Sh"?
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER SHCHA
+<U0429> <U015C>;"<U0053><U0048><U0048>"
"SHH" -> "Shh"?
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER HARD SIGN
+<U042A> <U02BA>;"<U0041><U0060>"
"A`" is only for Bulgarian and should go to bg_BG. How should
we transliterate an upper case hard sign to plain ASCII? I think
that just "``", same as lower case.
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER YERU
+<U042B> <U0059>;"<U0059><U0060>"
Again, as "Y" is always available it will never be transliterated
as "Y`".
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER SOFT SIGN
+<U042C> <U02B9>;<U0060>
OK, I like it to be transliterated to plain ASCII as "`".
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER E
+<U042D> <U00C8>;"<U0045><U0060>"
OK
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER YU
+<U042E> <U00DB>;"<U0059><U0055>"
"YU" -> "Yu"?
Post by Egor Kobylkin
+% CYRILLIC CAPITAL LETTER YA
+<U042F> <U00C2>;"<U0059><U0041>"
"YA" -> "Ya"?
Post by Egor Kobylkin
[...]
I am sorry, this is of course incomplete but that's enough for tonight.

Regards,

Rafal


[1] https://sourceware.org/glibc/wiki/Contribution%20checklist
Egor Kobylkin
2018-10-13 16:58:17 UTC
Permalink
Hi Rafal,

Thanks for the thorough checking, it really helps.
Post by Rafal Luzynski
Technical issue: Please either attach your patch to the email
message or paste it inline, not both. The patch as it is now is not
applicable. I had to edit it manually to apply.
diff -uNr a/localedata/locales/C b/localedata/locales/C ---
a/localedata/locales/C 2018-10-11 15:10:12.000000000 +0000 +++
b/localedata/locales/C 2018-10-11 15:10:43.000000000 +0000
There is no such file. Where have you got the source code from?
Are you sure this is glibc? :-)
I was running my patch process against the Ubuntu 18.04 version of
localedata/locales. Now I have checked out the GitHub glibc source v2.28
and done the same. Please find the new patch attached. I am not
submitting it as a patch request because we have not yet addressed the
rest of your comments below. But at least this should be working as a
patch for you. Please let me know if there is any problem there still.
Post by Rafal Luzynski
[...] From this patch I have excluded locales that already mention
cyrillic or have a transliteration table for it: az_AZ
iso14651_t1_common ky_KG mn_MN sr_RS tg_TJ tk_TM tt_RU uk_UA uz_UZ
I confirm that these locales are excluded and there are no other
missing locales.
Because of the surprisingly different list of locales between Ubuntu and
glibc there is now a different list of excluded ones as well.

mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
***@cyrillic
uk_UA

az_AZ, ky_KG are now included because they don't have cyrillic translit
in glibc. iso14651_t1_common is still implicitly excluded, because it
doesn't have 'translit_end' string.

Somehow az_AZ and tr_TR from glibc fail to transliterate Cyrillic even
after the patch applied (az_AZ is explicitly including tr_TR). I do not
see a reason, maybe you could check?
Post by Rafal Luzynski
Regarding the tests, I think there is no complete transliteration
test suite at the moment. Probably the only test is
localedata/bug-iconv-trans.c. You can also see the collation tests
placed in the same directory, they use those multiple *.UTF-8.in
files.
You can skip the tests for now.
In the copy of localedata/bug-iconv-trans.c lines 10-11 we could just
change the list of the symbols we are now transliterating

const char str[] = "ÄÀÖöÜÌß";
const char expected[] = "AEaeOEoeUEuess";

like this

const char str[] =
"ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУУ́ЀХЊЧКЩъЫьЭЮЯабвгЎежзОйклЌМПпрстуу́фхцчшщЪыЬэюяёђѓєѕіїјљњћќўџѪѫѲѳюѵҌҍ
ҐґҒғҔҕҖҗҚқҞҟҢңҀҥҊҧҚҩҪҫҬҭҮүҲҳҎҵҺһҌҜҟҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӀӥӊӧӚөӰӱӲӳӎӵӞӹ’"
const char expected[] =
"YODJG`YEZ`IYIJL`N`TSHK`U`DHABVGDEZHZIJKLMNOPRSTUU`FXCZCHSHSHHA`Y``E`YUYAabvgdezhzijklmnoprstuu`fxczchsh
shh``y``e`yuyayodjg`yez`iyijl`n`tshk`u`dhO`o`FHfhYHyhE`e`G`g`GHghGHghZH`zh`K`k`K`k`N`n`NGngP`p`O`o`C`C`
T`t`UuH`h`TCZtczSH`SH`CH`ch`CH`ch`iZH`zh`CH`ch`A`a`A`a`E`e`A`a`ZH`zh`Z`z`Z`z`I`i`O`o`O`o`U`u`U`u`CH`ch`
Y`y`'";

First I though they could just be added but not all locales
transliterate Umlauts so just extending the current test won't do as it
will fail for those locales.
Post by Rafal Luzynski
[...] diff -uNr a/localedata/locales/am_ET
b/localedata/locales/am_ET --- a/localedata/locales/am_ET
2018-10-11 15:10:11.000000000 +0000 +++ b/localedata/locales/am_ET
<U0060><U0039><U0030> <U137B> <U0060><U0031><U0030><U0030> <U137C>
<U0060><U0031><U0030><U0030><U0030><U0030> +include
"translit_cyrillic";"" translit_end % END LC_CTYPE
Shouldn't “include "translit_cyrillic";""” be placed before the
custom rules, together with other includes? The same in more files,
I will not mention them all.
If I recall correctly it is because of the
"translit_end
END LC_CTYPE"
part at the end of the translit_cyrillic. This way it works for any
locale, regardless whether it has translit itself or not. And being at
the end it does not supersede any previous transliteration that may be
there for a reason.

As with some other comments, I am not super familiar with the formats of
glibc files. So if you have a definitive suggestion - pls. formulate it
as an imperative, not a question.
Post by Rafal Luzynski
[...] +translit_start + +% CYRILLIC CAPITAL LETTER IO +<U0401>
<U00CB>;"<U0059><U004F>"
This says that for ASCII (GOST 7.79 System B) you would like to
transliterate "Ё" as "YO" but the table in Wikipedia says "Yo". I
understand that one or another may be correct depending on the
context but we should be consistent and also better let's stick with
the standard.
The choice for YO, SH, YA, ZH etc. is to avoid naming collisions for
example for "Сх" and "К" that would both transliterate to Sh:
With SH:"СхеЌа"->"Shema" but "КеЌа"->"SHema"
With Sh:"СхеЌа"->"Shema" and "КеЌа"->"Shema". Collision!
This is important e.g. for renaming files, grouping as in using uniq etc.
Post by Rafal Luzynski
+% CYRILLIC CAPITAL LETTER DJE +<U0402> <U0110>;"<U0044><U004A>"
This says "DJ" but System B does not mention it. Where does it come
from? Also, I think it should be "Dj" rather than "DJ".
I took the first two letters from its name.
Post by Rafal Luzynski
[...] +% CYRILLIC UNDEFINED +<U0423><U0301>
<U00DA>;"<U0055><U0060>"
1. I think it should be named "CYRILLIC CAPITAL LETTER U WITH ACUTE".
2. OK, the System A table mentions this letter but System B does not.
Somehow we should handle it. I think that "U`" is the best we can do
for now. 3. It must be tested whether this actually works.
1. Let's do it just before you are ready to commit the patch, because it
breaks formulas in my worksheet and I will have to do it manually?
3. I have tested and it doesn't work/gets ignored. But if you were to
handle COMBINING it would work, wouldn't it?
Post by Rafal Luzynski
[...] +% CYRILLIC CAPITAL LETTER HA +<U0425> <U0048>;<U0058>
I don't think that "H" is unavailable in any encoding therefore it
will always be transliterated as "H" and never as "X". We can't
help it and I don't think it is bad.
But we can keep this for when/if there is a way to explicitly request
transcription instead of transliteration.
Post by Rafal Luzynski
+% CYRILLIC CAPITAL LETTER TSE +<U0426> <U0043>;"<U0043><U005A>"
1. "CZ" - maybe should be "Cz"?> 2. Are we able to implement the
rule: "c before i, e, y, j"?
1. see for CYRILLIC CAPITAL LETTER IO
2. not sure what you are talking about in 2. but I believe it's not
possible as per Marko's email.
Post by Rafal Luzynski
+% CYRILLIC CAPITAL LETTER HARD SIGN +<U042A>
<U02BA>;"<U0041><U0060>"
"A`" is only for Bulgarian and should go to bg_BG. How should we
transliterate an upper case hard sign to plain ASCII? I think that
just "``", same as lower case.
This is to avoid collision. Besides AFAIK e.g. in Russian there is no
capital hard sign because there are no words starting with it.
Post by Rafal Luzynski
+% CYRILLIC CAPITAL LETTER YERU +<U042B> <U0059>;"<U0059><U0060>"
Again, as "Y" is always available it will never be transliterated as
"Y`".
But we can keep this for when/if there is a way to explicitly request
transcription instead of transliteration.


Bests,
Diego
Marko Myllynen
2018-10-15 11:04:52 UTC
Permalink
Hi,
Post by Egor Kobylkin
Post by Rafal Luzynski
Regarding the tests, I think there is no complete transliteration
test suite at the moment. Probably the only test is
localedata/bug-iconv-trans.c. You can also see the collation tests
placed in the same directory, they use those multiple *.UTF-8.in
files.
You can skip the tests for now.
First I though they could just be added but not all locales
transliterate Umlauts so just extending the current test won't do as it
will fail for those locales.
I still think a one-time check against uconv(1) (part of Unicode's ICU
project) for discrepancies.
Post by Egor Kobylkin
Post by Rafal Luzynski
[...] diff -uNr a/localedata/locales/am_ET
b/localedata/locales/am_ET --- a/localedata/locales/am_ET
2018-10-11 15:10:11.000000000 +0000 +++ b/localedata/locales/am_ET
<U0060><U0039><U0030> <U137B> <U0060><U0031><U0030><U0030> <U137C>
<U0060><U0031><U0030><U0030><U0030><U0030> +include
"translit_cyrillic";"" translit_end % END LC_CTYPE
Shouldn't “include "translit_cyrillic";""” be placed before the
custom rules, together with other includes? The same in more files,
I will not mention them all.
If I recall correctly it is because of the
"translit_end
END LC_CTYPE"
part at the end of the translit_cyrillic. This way it works for any
locale, regardless whether it has translit itself or not. And being at
the end it does not supersede any previous transliteration that may be
there for a reason.
I suspect one problem would be that the latter rule wins, so if there
are some locale-specific rules than possible translit_* inclusions would
override them if not included before the locale-specific rules.

Cheers,
--
Marko Myllynen
Egor Kobylkin
2018-10-15 11:54:53 UTC
Permalink
Post by Marko Myllynen
Hi,
Post by Egor Kobylkin
Post by Rafal Luzynski
Regarding the tests, I think there is no complete transliteration
test suite at the moment. Probably the only test is
localedata/bug-iconv-trans.c. You can also see the collation tests
placed in the same directory, they use those multiple *.UTF-8.in
files.
You can skip the tests for now.
First I though they could just be added but not all locales
transliterate Umlauts so just extending the current test won't do as it
will fail for those locales.
I still think a one-time check against uconv(1) (part of Unicode's ICU
project) for discrepancies.
Just an addition. I have changes a few constants to see whether
localedata/bug-iconv-trans.c could be made to test cyrillic. Attached is
the bug-iconv-trans-cyr.c that goes through in this form. I had to save
it as UTF-8 instead of ISO-8859-15 for localedata/bug-iconv-trans.c.
Post by Marko Myllynen
Post by Egor Kobylkin
Post by Rafal Luzynski
[...] diff -uNr a/localedata/locales/am_ET
b/localedata/locales/am_ET --- a/localedata/locales/am_ET
2018-10-11 15:10:11.000000000 +0000 +++ b/localedata/locales/am_ET
<U0060><U0039><U0030> <U137B> <U0060><U0031><U0030><U0030> <U137C>
<U0060><U0031><U0030><U0030><U0030><U0030> +include
"translit_cyrillic";"" translit_end % END LC_CTYPE
Shouldn't “include "translit_cyrillic";""” be placed before the
custom rules, together with other includes? The same in more files,
I will not mention them all.
If I recall correctly it is because of the
"translit_end
END LC_CTYPE"
part at the end of the translit_cyrillic. This way it works for any
locale, regardless whether it has translit itself or not. And being at
the end it does not supersede any previous transliteration that may be
there for a reason.
I suspect one problem would be that the latter rule wins, so if there
are some locale-specific rules than possible translit_* inclusions would
override them if not included before the locale-specific rules.
What is the best way forward here? Can somebody make an explicit
suggestion on how to change the current approach if needed?

Bests,
Egor
Rafal Luzynski
2018-10-23 23:08:33 UTC
Permalink
Hi Egor,

Thank you for your updates and again I'm sorry for my delayed response.
A general remark about this: if you are in a hurry and you need the
corrected transliteration rules for yourself or for your users then
you don't have to wait for the patch to be reviewed and accepted here.
You can make your own locale and use it, you don't need to rebuild glibc,
you don't even need root privileges to do it. The locale data subsystem
is designed to allow users create and use their own locales.

I have seen and tested locally your newer patch [1] but I will reply
in this thread because I think it is easier to reply in context.

I would like to summarize the differences between v5 [2] and v6 to make
sure that I noticed them all and that you have not introduced any changes
inadvertently. (Yes, that means I have skipped another patch which you
sent between those two.)

* Locales removed from the patch: C and sd_PK.
* Added locales: az_AZ and ky_KG.
* You consequently transliterate single uppercase Cyrillic letters
to sequences of all uppercase Latin letters in all languages (whenever
a Cyrillic letter is transliterated to more than one Latin letter),
for example "Ї" is now transliterated as "YI" rather than "Yi".

Again I must say that I experienced lots of technical difficulties to apply
the patch and I had to rework it manually because it is not applicable as
Post by Egor Kobylkin
Hi Rafal,
Thanks for the thorough checking, it really helps.
Post by Rafal Luzynski
Technical issue: Please either attach your patch to the email
message or paste it inline, not both. The patch as it is now is not
applicable. I had to edit it manually to apply.
diff -uNr a/localedata/locales/C b/localedata/locales/C ---
a/localedata/locales/C 2018-10-11 15:10:12.000000000 +0000 +++
b/localedata/locales/C 2018-10-11 15:10:43.000000000 +0000
There is no such file. Where have you got the source code from?
Are you sure this is glibc? :-)
I was running my patch process against the Ubuntu 18.04 version of
localedata/locales. Now I have checked out the GitHub glibc source v2.28
and done the same. [...]
Remarks:

* Please use the repository at https://sourceware.org/git/?p=glibc.git
rather than a copy at GitHub.
* Please use the master branch rather than 2.28.
* Commit your work locally.
* Use "git format-patch" (e.g., "git format-patch HEAD^..HEAD") to generate
the patch, then you can email it to this list.
* You can email it inline or, if your email client breaks the lines and
inserts
other unnecessary characters, send as an attachment.
* Use "git pull --rebase" to keep your work up to date.
* Read the Contribution Checklist [3] for more details.
Post by Egor Kobylkin
Post by Rafal Luzynski
[...] From this patch I have excluded locales that already mention
cyrillic or have a transliteration table for it: az_AZ
iso14651_t1_common ky_KG mn_MN sr_RS tg_TJ tk_TM tt_RU uk_UA uz_UZ
I confirm that these locales are excluded and there are no other
missing locales.
Because of the surprisingly different list of locales between Ubuntu and
glibc there is now a different list of excluded ones as well.
mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
uk_UA
az_AZ, ky_KG are now included
As far as I can see, there are no other differences between those two
patches.
Post by Egor Kobylkin
because they don't have cyrillic translit
in glibc. iso14651_t1_common is still implicitly excluded, because it
doesn't have 'translit_end' string.
Somehow az_AZ and tr_TR from glibc fail to transliterate Cyrillic even
after the patch applied (az_AZ is explicitly including tr_TR). I do not
see a reason, maybe you could check?
I noticed that az_AZ does not build at all, localedef program reports
a "circular dependency" (if I recall correctly). I think that since az_AZ
contains “copy "tr_TR"” and tr_TR already contains (in your patch)
“include "translit_cyrillic";""” you should just remove
“include "translit_cyrillic";""” from az_AZ which effectively means that
there are no changes in az_AZ. Optionally, you can add a comment to az_AZ
to explain why it does not contain “include "translit_cyrillic";""” and to
make sure that if anyone removes “copy "tr_TR"” ever in the future, the
“include "translit_cyrillic";""” will be added at the same time. I have
verified that removing that line makes the locale data build without an
error but I have not yet verified that they work as expected.
Post by Egor Kobylkin
Post by Rafal Luzynski
Regarding the tests, I think there is no complete transliteration
test suite at the moment. Probably the only test is
localedata/bug-iconv-trans.c. You can also see the collation tests
placed in the same directory, they use those multiple *.UTF-8.in
files.
You can skip the tests for now.
In the copy of localedata/bug-iconv-trans.c lines 10-11 we could just
change the list of the symbols we are now transliterating
const char str[] = "ÄäÖöÜüß";
const char expected[] = "AEaeOEoeUEuess";
like this
const char str[] =
"ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУУ́ФХЦЧШЩъЫьЭЮЯабвгдежзийклмнопрстуу́фхцчшщЪыЬэюяёђѓєѕіїјљњћќўџѪѫѲѳѴѵҌҍ
ҐґҒғҔҕҖҗҚқҞҟҢңҤҥҦҧҨҩҪҫҬҭҮүҲҳҴҵҺһҼҽҾҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӤӥӦӧӨөӰӱӲӳӴӵӸӹ’"
const char expected[] =
"YODJG`YEZ`IYIJL`N`TSHK`U`DHABVGDEZHZIJKLMNOPRSTUU`FXCZCHSHSHHA`Y``E`YUYAabvgdezhzijklmnoprstuu`fxczchsh
shh``y``e`yuyayodjg`yez`iyijl`n`tshk`u`dhO`o`FHfhYHyhE`e`G`g`GHghGHghZH`zh`K`k`K`k`N`n`NGngP`p`O`o`C`C`
T`t`UuH`h`TCZtczSH`SH`CH`ch`CH`ch`iZH`zh`CH`ch`A`a`A`a`E`e`A`a`ZH`zh`Z`z`Z`z`I`i`O`o`O`o`U`u`U`u`CH`ch`
Y`y`'";
First I though they could just be added but not all locales
transliterate Umlauts so just extending the current test won't do as it
will fail for those locales.
I noticed that you pasted a patch in a Bugzilla comment. [4] If I understand
correctly you suggest to rework the existing test case to test Cyrillic
transliteration instead of German. Please don't do it: the existing test
cases may be extended but must not be removed. I think we should rework
this
test case to handle multiple locales and multiple transliteration pairs;
optionally we can add a new case instead. Currently I lean into reworking
the existing test case.
Post by Egor Kobylkin
Post by Rafal Luzynski
[...] diff -uNr a/localedata/locales/am_ET
b/localedata/locales/am_ET --- a/localedata/locales/am_ET
2018-10-11 15:10:11.000000000 +0000 +++ b/localedata/locales/am_ET
<U0060><U0039><U0030> <U137B> <U0060><U0031><U0030><U0030> <U137C>
<U0060><U0031><U0030><U0030><U0030><U0030> +include
"translit_cyrillic";"" translit_end % END LC_CTYPE
Shouldn't “include "translit_cyrillic";""” be placed before the
custom rules, together with other includes? The same in more files,
I will not mention them all.
If I recall correctly it is because of the
"translit_end
END LC_CTYPE"
part at the end of the translit_cyrillic. This way it works for any
locale, regardless whether it has translit itself or not. And being at
the end it does not supersede any previous transliteration that may be
there for a reason.
As with some other comments, I am not super familiar with the formats of
glibc files. So if you have a definitive suggestion - pls. formulate it
as an imperative, not a question.
I feel like a newcomer here so it was meant to be a question to other
more experienced maintainers but probably it's time to change this attitude.
So, also taking into account what Marko wrote, [5] please put the include
directive after all other include directives, or after the "translit_start"
directive if there are no other includes, rather than putting it just before
"translit_end". Even if putting it at the dnd works sometimes or even
always.
Same as you put #include's near top of the file when writing a C program
even
if sometimes you may put it anywhere and it will work. If you use a script
to insert your include directives then please rework it, if you insert them
manually then just move them manually.
Post by Egor Kobylkin
Post by Rafal Luzynski
[...] +translit_start + +% CYRILLIC CAPITAL LETTER IO +<U0401>
<U00CB>;"<U0059><U004F>"
This says that for ASCII (GOST 7.79 System B) you would like to
transliterate "Ё" as "YO" but the table in Wikipedia says "Yo". I
understand that one or another may be correct depending on the
context but we should be consistent and also better let's stick with
the standard.
The choice for YO, SH, YA, ZH etc. is to avoid naming collisions for
With SH:"Схема"->"Shema" but "Шема"->"SHema"
With Sh:"Схема"->"Shema" and "Шема"->"Shema". Collision!
This is important e.g. for renaming files, grouping as in using uniq etc.
I understand this idea. Is this part of any existing standard? I can't
see it regulated by GOST 7.79.

I'd rather not include the transliteration rules which seems reasonable to
us (the developers) but are not known and therefore not acceptable by the
outer world.
Post by Egor Kobylkin
Post by Rafal Luzynski
+% CYRILLIC CAPITAL LETTER DJE +<U0402> <U0110>;"<U0044><U004A>"
This says "DJ" but System B does not mention it. Where does it come
from? Also, I think it should be "Dj" rather than "DJ".
I took the first two letters from its name.
As I said previously, I would like to add more Cyrillic letters even if
they are not regulated by any standard. But let's separate them and make
it clear that these rules are based on GOST 7.79 and those are our own
invention (or come from other standard etc.) I think that all these
rules may even be in the same file but in different parts of it.
Post by Egor Kobylkin
Post by Rafal Luzynski
[...] +% CYRILLIC UNDEFINED +<U0423><U0301>
<U00DA>;"<U0055><U0060>"
1. I think it should be named "CYRILLIC CAPITAL LETTER U WITH ACUTE".
2. OK, the System A table mentions this letter but System B does not.
Somehow we should handle it. I think that "U`" is the best we can do
for now. 3. It must be tested whether this actually works.
1. Let's do it just before you are ready to commit the patch, because it
breaks formulas in my worksheet and I will have to do it manually?
3. I have tested and it doesn't work/gets ignored. But if you were to
handle COMBINING it would work, wouldn't it?
My guess is that since translit_combining just removes all those combining
diacritic characters and translit_combining is usually included before
translit_cyrillic then <U0301> is removed even before <U0423> is taken
into account. Also my another guess is that it might work good if you
just removed this rule: <U0423> would be translated to "U" and <U0301>
would remain unchanged and eventually those two characters would produce
"Ú". But, again, that's just a guess, I have not tested.
Post by Egor Kobylkin
Post by Rafal Luzynski
[...] +% CYRILLIC CAPITAL LETTER HA +<U0425> <U0048>;<U0058>
I don't think that "H" is unavailable in any encoding therefore it
will always be transliterated as "H" and never as "X". We can't
help it and I don't think it is bad.
But we can keep this for when/if there is a way to explicitly request
transcription instead of transliteration.
Note that either it will make the test cases fail or we will have to
prepare the test cases deliberately skip the translation of <U0425>
into "X" because "H" will be always working. We can't force iconv
to choose the second transliteration rule if the first one works.

That means we will have a problem to construct the test cases.
Post by Egor Kobylkin
Post by Rafal Luzynski
+% CYRILLIC CAPITAL LETTER TSE +<U0426> <U0043>;"<U0043><U005A>"
1. "CZ" - maybe should be "Cz"?> 2. Are we able to implement the
rule: "c before i, e, y, j"?
1. see for CYRILLIC CAPITAL LETTER IO
2. not sure what you are talking about in 2. but I believe it's not
possible as per Marko's email.
Hm... I can't find a good example now. Maybe I was mislead by the rules
of Cyrillic transliteration which I learned at school and which are not
necessarily universal and not necessarily useful for English readers.
Post by Egor Kobylkin
Post by Rafal Luzynski
+% CYRILLIC CAPITAL LETTER HARD SIGN +<U042A>
<U02BA>;"<U0041><U0060>"
"A`" is only for Bulgarian and should go to bg_BG. How should we
transliterate an upper case hard sign to plain ASCII? I think that
just "``", same as lower case.
This is to avoid collision.
What collision?
Post by Egor Kobylkin
Besides AFAIK e.g. in Russian there is no
capital hard sign because there are no words starting with it.
True but it can be used in ALL UPPERCASE text. Therefore we need a clear
and correct transliteration rule for it.
Post by Egor Kobylkin
Post by Rafal Luzynski
+% CYRILLIC CAPITAL LETTER YERU +<U042B> <U0059>;"<U0059><U0060>"
Again, as "Y" is always available it will never be transliterated as
"Y`".
But we can keep this for when/if there is a way to explicitly request
transcription instead of transliteration.
Again, it will be difficult or impossible to construct a correct test case
and we must be aware of this.

Regards,

Rafal


[1] https://sourceware.org/ml/libc-alpha/2018-10/msg00300.html
[2] https://sourceware.org/ml/libc-alpha/2018-10/msg00213.html
[3] https://sourceware.org/glibc/wiki/Contribution%20checklist
[4] https://sourceware.org/bugzilla/show_bug.cgi?id=2872#c47
[5] https://sourceware.org/ml/libc-alpha/2018-10/msg00232.html
Egor Kobylkin
2018-10-17 14:16:32 UTC
Permalink
Dear locale maintainers,

fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

add the Cyrillic transliteration table translit_cyrillic file

https://sourceware.org/bugzilla/attachment.cgi?id=11340 [7]

to localedata/locales/ and include it in all your locales going forward.

The patch included inline below.

From this patch I have excluded locales that already mention cyrillic or
have a transliteration table for it:

mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
***@cyrillic
uk_UA

Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.

Current bug effect:

The glibc wiki explicitly lists this use case as the test example

https://sourceware.org/glibc/wiki/Locales#Testing_Locales :

LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt

currently it fails on Cyrillic texts in most locales including ru_RU [1]
[8] [9]:

LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

- It produces a string of question marks and spaces.

This is what it should produce and it does so after the patch applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


The root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.



COMMIT MESSAGE:
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.

Examples: iconv -f UTF-8 -t ASCII//TRANSLIT will produce ASCII
compatible transcription and iconv -f UTF-8 -t ISO-8859-15//TRANSLIT |
iconv -f ISO-8859-15 -t UTF-8 will produce Latin transliteration as per
ISO 9.1995.

While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration/transcription has only Latin/ASCII codes but still can
be read by a native speaker. Among other things it is useful for
processing the Cyrillic texts and filenames by programs or on systems
that are not specifically prepared to work with Cyrillic, don't have
corresponding fonts installed or can't handle UTF-8.

The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on ISO 9.1995 standard [10] and its
derivative GOST 7.79-2000 official source (Federal Agency on Technical
Regulating and Metrology Of Russian Federation [2]). Technically an
independent but mostly identical source [3] was used and prepared in a
spreadsheet [6].

The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that have a
translit_start/end stance and generated a patch for them.

The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.

I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
Lisivka <***@gmail.com>, Max Kutny <***@gmail.com> (uk_UA),
ДаМОлП КегаМ <***@gnome.org> (sr_RS) have confirmed the
exclusion.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11301
[7] translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11340
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=11304
[10] https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A

Best regards,
Egor Kobylkin

---
2018-10-17 Egor Kobylkin <***@kobylkin.com>

[BZ #2872]
* localedata/locales/translit_cyrillic: Add ISO 9.1995, GOST 7.79
System A transliteration System B transcription table from Cyrillic to
Latin/ASCII.
* localedata/locales/aa_DJ: Add 'include "translit_cyrillic";""' to
LC_CTYPE translit section.
* localedata/locales/af_ZA: Likewise.
* localedata/locales/ak_GH: Likewise.
* localedata/locales/am_ET: Likewise.
* localedata/locales/ar_EG: Likewise.
* localedata/locales/az_AZ: Likewise.
* localedata/locales/be_BY: Likewise.
* localedata/locales/bem_ZM: Likewise.
* localedata/locales/ber_DZ: Likewise.
* localedata/locales/ber_MA: Likewise.
* localedata/locales/bg_BG: Likewise.
* localedata/locales/bi_VU: Likewise.
* localedata/locales/bn_BD: Likewise.
* localedata/locales/bo_CN: Likewise.
* localedata/locales/ca_ES: Likewise.
* localedata/locales/ce_RU: Likewise.
* localedata/locales/cmn_TW: Likewise.
* localedata/locales/cs_CZ: Likewise.
* localedata/locales/cv_RU: Likewise.
* localedata/locales/cy_GB: Likewise.
* localedata/locales/da_DK: Likewise.
* localedata/locales/de_DE: Likewise.
* localedata/locales/dv_MV: Likewise.
* localedata/locales/dz_BT: Likewise.
* localedata/locales/el_GR: Likewise.
* localedata/locales/en_GB: Likewise.
* localedata/locales/en_NG: Likewise.
* localedata/locales/en_ZM: Likewise.
* localedata/locales/es_CU: Likewise.
* localedata/locales/es_ES: Likewise.
* localedata/locales/et_EE: Likewise.
* localedata/locales/fa_IR: Likewise.
* localedata/locales/ff_SN: Likewise.
* localedata/locales/fi_FI: Likewise.
* localedata/locales/fr_FR: Likewise.
* localedata/locales/ga_IE: Likewise.
* localedata/locales/gd_GB: Likewise.
* localedata/locales/gu_IN: Likewise.
* localedata/locales/gv_GB: Likewise.
* localedata/locales/he_IL: Likewise.
* localedata/locales/hi_IN: Likewise.
* localedata/locales/hif_FJ: Likewise.
* localedata/locales/hr_HR: Likewise.
* localedata/locales/ht_HT: Likewise.
* localedata/locales/hu_HU: Likewise.
* localedata/locales/hy_AM: Likewise.
* localedata/locales/id_ID: Likewise.
* localedata/locales/is_IS: Likewise.
* localedata/locales/it_IT: Likewise.
* localedata/locales/ja_JP: Likewise.
* localedata/locales/kab_DZ: Likewise.
* localedata/locales/kk_KZ: Likewise.
* localedata/locales/km_KH: Likewise.
* localedata/locales/kn_IN: Likewise.
* localedata/locales/ko_KR: Likewise.
* localedata/locales/ks_IN: Likewise.
* localedata/locales/kw_GB: Likewise.
* localedata/locales/ky_KG: Likewise.
* localedata/locales/lb_LU: Likewise.
* localedata/locales/lg_UG: Likewise.
* localedata/locales/lij_IT: Likewise.
* localedata/locales/ln_CD: Likewise.
* localedata/locales/lo_LA: Likewise.
* localedata/locales/lt_LT: Likewise.
* localedata/locales/lv_LV: Likewise.
* localedata/locales/mg_MG: Likewise.
* localedata/locales/mhr_RU: Likewise.
* localedata/locales/mk_MK: Likewise.
* localedata/locales/ml_IN: Likewise.
* localedata/locales/ms_MY: Likewise.
* localedata/locales/mt_MT: Likewise.
* localedata/locales/***@latin: Likewise.
* localedata/locales/nb_NO: Likewise.
* localedata/locales/ne_NP: Likewise.
* localedata/locales/nhn_MX: Likewise.
* localedata/locales/niu_NU: Likewise.
* localedata/locales/niu_NZ: Likewise.
* localedata/locales/nl_NL: Likewise.
* localedata/locales/nr_ZA: Likewise.
* localedata/locales/oc_FR: Likewise.
* localedata/locales/om_KE: Likewise.
* localedata/locales/or_IN: Likewise.
* localedata/locales/os_RU: Likewise.
* localedata/locales/pa_IN: Likewise.
* localedata/locales/pa_PK: Likewise.
* localedata/locales/pl_PL: Likewise.
* localedata/locales/pt_PT: Likewise.
* localedata/locales/quz_PE: Likewise.
* localedata/locales/ro_RO: Likewise.
* localedata/locales/ru_RU: Likewise.
* localedata/locales/rw_RW: Likewise.
* localedata/locales/sa_IN: Likewise.
* localedata/locales/sd_IN: Likewise.
* localedata/locales/***@devanagari: Likewise.
* localedata/locales/se_NO: Likewise.
* localedata/locales/sgs_LT: Likewise.
* localedata/locales/shn_MM: Likewise.
* localedata/locales/si_LK: Likewise.
* localedata/locales/sk_SK: Likewise.
* localedata/locales/sl_SI: Likewise.
* localedata/locales/sm_WS: Likewise.
* localedata/locales/so_SO: Likewise.
* localedata/locales/sq_AL: Likewise.
* localedata/locales/ss_ZA: Likewise.
* localedata/locales/st_ZA: Likewise.
* localedata/locales/sv_SE: Likewise.
* localedata/locales/sw_KE: Likewise.
* localedata/locales/ta_IN: Likewise.
* localedata/locales/te_IN: Likewise.
* localedata/locales/th_TH: Likewise.
* localedata/locales/ti_ET: Likewise.
* localedata/locales/tn_ZA: Likewise.
* localedata/locales/to_TO: Likewise.
* localedata/locales/tpi_PG: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/ts_ZA: Likewise.
* localedata/locales/unm_US: Likewise.
* localedata/locales/ur_IN: Likewise.
* localedata/locales/ur_PK: Likewise.
* localedata/locales/ve_ZA: Likewise.
* localedata/locales/vi_VN: Likewise.
* localedata/locales/wa_BE: Likewise.
* localedata/locales/wo_SN: Likewise.
* localedata/locales/xh_ZA: Likewise.
* localedata/locales/yi_US: Likewise.
* localedata/locales/yuw_PG: Likewise.
* localedata/locales/zh_CN: Likewise.
* localedata/locales/zu_ZA: Likewise.
Egor Kobylkin
2018-11-01 22:51:55 UTC
Permalink
Changelog v7:
* Generated against git://sourceware.org/git/glibc.git master with git
format-patch.
* The 'include "translit_cyrillic";""' now immediately follows last
'include "translit_XXX";""' string (was inserted just before
translit_end previously.)
* Only the locales already having 'include .*translit.*;""' are patched
(see the list for manual exclusions below, full list of included locales
at the end of the email in the commit section.)
* Excluded az_AZ completely to avoid circular reference from tr_TR via
“copy "tr_TR"”.


Changelog v6:
* Locales removed from the patch: C and sd_PK.
* Added locales: az_AZ and ky_KG.
* Consistently transliterate single uppercase Cyrillic letters
to sequences of all uppercase Latin letters in all languages (whenever
a Cyrillic letter is transliterated to more than one Latin letter),
for example "Ї" is now transliterated as "YI" rather than "Yi".

Dear locale maintainers,

fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

add the Cyrillic transliteration table translit_cyrillic file

https://sourceware.org/bugzilla/attachment.cgi?id=11340 [7]

to localedata/locales/ and include it in all your locales going forward.

The patch included inline below.

From this patch I have excluded locales that already mention cyrillic or
have a transliteration table for it:

mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
***@cyrillic
uk_UA

Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.

Current bug effect:

The glibc wiki explicitly lists this use case as the test example

https://sourceware.org/glibc/wiki/Locales#Testing_Locales :

LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt

currently it fails on Cyrillic texts in most locales including ru_RU [1]
[8] [9]:

LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

- It produces a string of question marks and spaces.

This is what it should produce and it does so after the patch applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


The root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.



COMMIT MESSAGE:
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.

Examples: iconv -f UTF-8 -t ASCII//TRANSLIT will produce ASCII
compatible transcription and iconv -f UTF-8 -t ISO-8859-15//TRANSLIT |
iconv -f ISO-8859-15 -t UTF-8 will produce Latin transliteration as per
ISO 9.1995.

While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration/transcription has only Latin/ASCII codes but still can
be read by a native speaker. Among other things it is useful for
processing the Cyrillic texts and filenames by programs or on systems
that are not specifically prepared to work with Cyrillic, don't have
corresponding fonts installed or can't handle UTF-8.

The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on ISO 9.1995 standard [10] and its
derivative GOST 7.79-2000 official source (Federal Agency on Technical
Regulating and Metrology Of Russian Federation [2]). Technically an
independent but mostly identical source [3] was used and prepared in a
spreadsheet [6].

The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that already have
'include .*translit.*;""' string and generated a patch for them.

The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.

I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
Lisivka <***@gmail.com>, Max Kutny <***@gmail.com> (uk_UA),
ДаМОлП КегаМ <***@gnome.org> (sr_RS) have confirmed the
exclusion.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11301
[7] translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11340
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=11304
[10] https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A

Best regards,
Egor Kobylkin

---
2018-10-17 Egor Kobylkin <***@kobylkin.com>

[BZ #2872]
* localedata/locales/translit_cyrillic: Add ISO 9.1995, GOST 7.79
System A transliteration System B transcription table from Cyrillic to
Latin/ASCII.
* localedata/locales/aa_DJ: Add 'include "translit_cyrillic";""' to
LC_CTYPE translit section.
* localedata/locales/af_ZA: Likewise.
* localedata/locales/ak_GH: Likewise.
* localedata/locales/am_ET: Likewise.
* localedata/locales/ar_EG: Likewise.
* localedata/locales/be_BY: Likewise.
* localedata/locales/bem_ZM: Likewise.
* localedata/locales/ber_DZ: Likewise.
* localedata/locales/ber_MA: Likewise.
* localedata/locales/bg_BG: Likewise.
* localedata/locales/bi_VU: Likewise.
* localedata/locales/bn_BD: Likewise.
* localedata/locales/bo_CN: Likewise.
* localedata/locales/ca_ES: Likewise.
* localedata/locales/ce_RU: Likewise.
* localedata/locales/cmn_TW: Likewise.
* localedata/locales/cs_CZ: Likewise.
* localedata/locales/cv_RU: Likewise.
* localedata/locales/cy_GB: Likewise.
* localedata/locales/da_DK: Likewise.
* localedata/locales/de_DE: Likewise.
* localedata/locales/dv_MV: Likewise.
* localedata/locales/dz_BT: Likewise.
* localedata/locales/el_GR: Likewise.
* localedata/locales/en_GB: Likewise.
* localedata/locales/en_NG: Likewise.
* localedata/locales/en_ZM: Likewise.
* localedata/locales/es_CU: Likewise.
* localedata/locales/es_ES: Likewise.
* localedata/locales/et_EE: Likewise.
* localedata/locales/fa_IR: Likewise.
* localedata/locales/ff_SN: Likewise.
* localedata/locales/fi_FI: Likewise.
* localedata/locales/fr_FR: Likewise.
* localedata/locales/ga_IE: Likewise.
* localedata/locales/gd_GB: Likewise.
* localedata/locales/gu_IN: Likewise.
* localedata/locales/gv_GB: Likewise.
* localedata/locales/he_IL: Likewise.
* localedata/locales/hi_IN: Likewise.
* localedata/locales/hif_FJ: Likewise.
* localedata/locales/hr_HR: Likewise.
* localedata/locales/ht_HT: Likewise.
* localedata/locales/hu_HU: Likewise.
* localedata/locales/hy_AM: Likewise.
* localedata/locales/id_ID: Likewise.
* localedata/locales/is_IS: Likewise.
* localedata/locales/it_IT: Likewise.
* localedata/locales/ja_JP: Likewise.
* localedata/locales/kab_DZ: Likewise.
* localedata/locales/kk_KZ: Likewise.
* localedata/locales/km_KH: Likewise.
* localedata/locales/kn_IN: Likewise.
* localedata/locales/ko_KR: Likewise.
* localedata/locales/ks_IN: Likewise.
* localedata/locales/kw_GB: Likewise.
* localedata/locales/ky_KG: Likewise.
* localedata/locales/lb_LU: Likewise.
* localedata/locales/lg_UG: Likewise.
* localedata/locales/lij_IT: Likewise.
* localedata/locales/ln_CD: Likewise.
* localedata/locales/lo_LA: Likewise.
* localedata/locales/lt_LT: Likewise.
* localedata/locales/lv_LV: Likewise.
* localedata/locales/mg_MG: Likewise.
* localedata/locales/mhr_RU: Likewise.
* localedata/locales/mk_MK: Likewise.
* localedata/locales/ml_IN: Likewise.
* localedata/locales/ms_MY: Likewise.
* localedata/locales/mt_MT: Likewise.
* localedata/locales/***@latin: Likewise.
* localedata/locales/nb_NO: Likewise.
* localedata/locales/ne_NP: Likewise.
* localedata/locales/nhn_MX: Likewise.
* localedata/locales/niu_NU: Likewise.
* localedata/locales/niu_NZ: Likewise.
* localedata/locales/nl_NL: Likewise.
* localedata/locales/nr_ZA: Likewise.
* localedata/locales/oc_FR: Likewise.
* localedata/locales/om_KE: Likewise.
* localedata/locales/or_IN: Likewise.
* localedata/locales/os_RU: Likewise.
* localedata/locales/pa_IN: Likewise.
* localedata/locales/pa_PK: Likewise.
* localedata/locales/pl_PL: Likewise.
* localedata/locales/pt_PT: Likewise.
* localedata/locales/quz_PE: Likewise.
* localedata/locales/ro_RO: Likewise.
* localedata/locales/ru_RU: Likewise.
* localedata/locales/rw_RW: Likewise.
* localedata/locales/sa_IN: Likewise.
* localedata/locales/sd_IN: Likewise.
* localedata/locales/***@devanagari: Likewise.
* localedata/locales/se_NO: Likewise.
* localedata/locales/sgs_LT: Likewise.
* localedata/locales/shn_MM: Likewise.
* localedata/locales/si_LK: Likewise.
* localedata/locales/sk_SK: Likewise.
* localedata/locales/sl_SI: Likewise.
* localedata/locales/sm_WS: Likewise.
* localedata/locales/so_SO: Likewise.
* localedata/locales/sq_AL: Likewise.
* localedata/locales/ss_ZA: Likewise.
* localedata/locales/st_ZA: Likewise.
* localedata/locales/sv_SE: Likewise.
* localedata/locales/sw_KE: Likewise.
* localedata/locales/ta_IN: Likewise.
* localedata/locales/te_IN: Likewise.
* localedata/locales/th_TH: Likewise.
* localedata/locales/ti_ET: Likewise.
* localedata/locales/tn_ZA: Likewise.
* localedata/locales/to_TO: Likewise.
* localedata/locales/tpi_PG: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/ts_ZA: Likewise.
* localedata/locales/unm_US: Likewise.
* localedata/locales/ur_IN: Likewise.
* localedata/locales/ur_PK: Likewise.
* localedata/locales/ve_ZA: Likewise.
* localedata/locales/vi_VN: Likewise.
* localedata/locales/wa_BE: Likewise.
* localedata/locales/wo_SN: Likewise.
* localedata/locales/xh_ZA: Likewise.
* localedata/locales/yi_US: Likewise.
* localedata/locales/yuw_PG: Likewise.
* localedata/locales/zh_CN: Likewise.
* localedata/locales/zu_ZA: Likewise.
Egor Kobylkin
2018-11-02 00:00:26 UTC
Permalink
Changelog v8:
* Re-added missing translit_cyrillic in patch v7 (due to missing "git
add" in the script).

Changelog v7:
* Generated against git://sourceware.org/git/glibc.git master with git
format-patch.
* The 'include "translit_cyrillic";""' now immediately follows last
'include "translit_XXX";""' string (was inserted just before
translit_end previously.)
* Only the locales already having 'include .*translit.*;""' are patched
(see the list for manual exclusions below, full list of included locales
at the end of the email in the commit section.)
* Excluded az_AZ completely to avoid circular reference from tr_TR via
“copy "tr_TR"”.


Changelog v6:
* Locales removed from the patch: C and sd_PK.
* Added locales: az_AZ and ky_KG.
* Consistently transliterate single uppercase Cyrillic letters
to sequences of all uppercase Latin letters in all languages (whenever
a Cyrillic letter is transliterated to more than one Latin letter),
for example "Ї" is now transliterated as "YI" rather than "Yi".

Dear locale maintainers,

fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

add the Cyrillic transliteration table translit_cyrillic file

https://sourceware.org/bugzilla/attachment.cgi?id=11340 [7]

to localedata/locales/ and include it in all your locales going forward.

The patch included inline below.

From this patch I have excluded locales that already mention cyrillic or
have a transliteration table for it:

mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
***@cyrillic
uk_UA

Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.

Current bug effect:

The glibc wiki explicitly lists this use case as the test example

https://sourceware.org/glibc/wiki/Locales#Testing_Locales :

LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt

currently it fails on Cyrillic texts in most locales including ru_RU [1]
[8] [9]:

LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

- It produces a string of question marks and spaces.

This is what it should produce and it does so after the patch applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


The root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.



COMMIT MESSAGE:
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.

Examples: iconv -f UTF-8 -t ASCII//TRANSLIT will produce ASCII
compatible transcription and iconv -f UTF-8 -t ISO-8859-15//TRANSLIT |
iconv -f ISO-8859-15 -t UTF-8 will produce Latin transliteration as per
ISO 9.1995.

While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration/transcription has only Latin/ASCII codes but still can
be read by a native speaker. Among other things it is useful for
processing the Cyrillic texts and filenames by programs or on systems
that are not specifically prepared to work with Cyrillic, don't have
corresponding fonts installed or can't handle UTF-8.

The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on ISO 9.1995 standard [10] and its
derivative GOST 7.79-2000 official source (Federal Agency on Technical
Regulating and Metrology Of Russian Federation [2]). Technically an
independent but mostly identical source [3] was used and prepared in a
spreadsheet [6].

The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that already have
'include .*translit.*;""' string and generated a patch for them.

The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.

I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
Lisivka <***@gmail.com>, Max Kutny <***@gmail.com> (uk_UA),
ДаМОлП КегаМ <***@gnome.org> (sr_RS) have confirmed the
exclusion.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11301
[7] translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11340
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=11304
[10] https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A

Best regards,
Egor Kobylkin

---
2018-11-02 Egor Kobylkin <***@kobylkin.com>

[BZ #2872]
* localedata/locales/translit_cyrillic: Add ISO 9.1995, GOST 7.79
System A transliteration System B transcription table from Cyrillic to
Latin/ASCII.
* localedata/locales/aa_DJ: Add 'include "translit_cyrillic";""' to
LC_CTYPE translit section.
* localedata/locales/af_ZA: Likewise.
* localedata/locales/ak_GH: Likewise.
* localedata/locales/am_ET: Likewise.
* localedata/locales/ar_EG: Likewise.
* localedata/locales/be_BY: Likewise.
* localedata/locales/bem_ZM: Likewise.
* localedata/locales/ber_DZ: Likewise.
* localedata/locales/ber_MA: Likewise.
* localedata/locales/bg_BG: Likewise.
* localedata/locales/bi_VU: Likewise.
* localedata/locales/bn_BD: Likewise.
* localedata/locales/bo_CN: Likewise.
* localedata/locales/ca_ES: Likewise.
* localedata/locales/ce_RU: Likewise.
* localedata/locales/cmn_TW: Likewise.
* localedata/locales/cs_CZ: Likewise.
* localedata/locales/cv_RU: Likewise.
* localedata/locales/cy_GB: Likewise.
* localedata/locales/da_DK: Likewise.
* localedata/locales/de_DE: Likewise.
* localedata/locales/dv_MV: Likewise.
* localedata/locales/dz_BT: Likewise.
* localedata/locales/el_GR: Likewise.
* localedata/locales/en_GB: Likewise.
* localedata/locales/en_NG: Likewise.
* localedata/locales/en_ZM: Likewise.
* localedata/locales/es_CU: Likewise.
* localedata/locales/es_ES: Likewise.
* localedata/locales/et_EE: Likewise.
* localedata/locales/fa_IR: Likewise.
* localedata/locales/ff_SN: Likewise.
* localedata/locales/fi_FI: Likewise.
* localedata/locales/fr_FR: Likewise.
* localedata/locales/ga_IE: Likewise.
* localedata/locales/gd_GB: Likewise.
* localedata/locales/gu_IN: Likewise.
* localedata/locales/gv_GB: Likewise.
* localedata/locales/he_IL: Likewise.
* localedata/locales/hi_IN: Likewise.
* localedata/locales/hif_FJ: Likewise.
* localedata/locales/hr_HR: Likewise.
* localedata/locales/ht_HT: Likewise.
* localedata/locales/hu_HU: Likewise.
* localedata/locales/hy_AM: Likewise.
* localedata/locales/id_ID: Likewise.
* localedata/locales/is_IS: Likewise.
* localedata/locales/it_IT: Likewise.
* localedata/locales/ja_JP: Likewise.
* localedata/locales/kab_DZ: Likewise.
* localedata/locales/kk_KZ: Likewise.
* localedata/locales/km_KH: Likewise.
* localedata/locales/kn_IN: Likewise.
* localedata/locales/ko_KR: Likewise.
* localedata/locales/ks_IN: Likewise.
* localedata/locales/kw_GB: Likewise.
* localedata/locales/ky_KG: Likewise.
* localedata/locales/lb_LU: Likewise.
* localedata/locales/lg_UG: Likewise.
* localedata/locales/lij_IT: Likewise.
* localedata/locales/ln_CD: Likewise.
* localedata/locales/lo_LA: Likewise.
* localedata/locales/lt_LT: Likewise.
* localedata/locales/lv_LV: Likewise.
* localedata/locales/mg_MG: Likewise.
* localedata/locales/mhr_RU: Likewise.
* localedata/locales/mk_MK: Likewise.
* localedata/locales/ml_IN: Likewise.
* localedata/locales/ms_MY: Likewise.
* localedata/locales/mt_MT: Likewise.
* localedata/locales/***@latin: Likewise.
* localedata/locales/nb_NO: Likewise.
* localedata/locales/ne_NP: Likewise.
* localedata/locales/nhn_MX: Likewise.
* localedata/locales/niu_NU: Likewise.
* localedata/locales/niu_NZ: Likewise.
* localedata/locales/nl_NL: Likewise.
* localedata/locales/nr_ZA: Likewise.
* localedata/locales/oc_FR: Likewise.
* localedata/locales/om_KE: Likewise.
* localedata/locales/or_IN: Likewise.
* localedata/locales/os_RU: Likewise.
* localedata/locales/pa_IN: Likewise.
* localedata/locales/pa_PK: Likewise.
* localedata/locales/pl_PL: Likewise.
* localedata/locales/pt_PT: Likewise.
* localedata/locales/quz_PE: Likewise.
* localedata/locales/ro_RO: Likewise.
* localedata/locales/ru_RU: Likewise.
* localedata/locales/rw_RW: Likewise.
* localedata/locales/sa_IN: Likewise.
* localedata/locales/sd_IN: Likewise.
* localedata/locales/***@devanagari: Likewise.
* localedata/locales/se_NO: Likewise.
* localedata/locales/sgs_LT: Likewise.
* localedata/locales/shn_MM: Likewise.
* localedata/locales/si_LK: Likewise.
* localedata/locales/sk_SK: Likewise.
* localedata/locales/sl_SI: Likewise.
* localedata/locales/sm_WS: Likewise.
* localedata/locales/so_SO: Likewise.
* localedata/locales/sq_AL: Likewise.
* localedata/locales/ss_ZA: Likewise.
* localedata/locales/st_ZA: Likewise.
* localedata/locales/sv_SE: Likewise.
* localedata/locales/sw_KE: Likewise.
* localedata/locales/ta_IN: Likewise.
* localedata/locales/te_IN: Likewise.
* localedata/locales/th_TH: Likewise.
* localedata/locales/ti_ET: Likewise.
* localedata/locales/tn_ZA: Likewise.
* localedata/locales/to_TO: Likewise.
* localedata/locales/tpi_PG: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/ts_ZA: Likewise.
* localedata/locales/unm_US: Likewise.
* localedata/locales/ur_IN: Likewise.
* localedata/locales/ur_PK: Likewise.
* localedata/locales/ve_ZA: Likewise.
* localedata/locales/vi_VN: Likewise.
* localedata/locales/wa_BE: Likewise.
* localedata/locales/wo_SN: Likewise.
* localedata/locales/xh_ZA: Likewise.
* localedata/locales/yi_US: Likewise.
* localedata/locales/yuw_PG: Likewise.
* localedata/locales/zh_CN: Likewise.
* localedata/locales/zu_ZA: Likewise.
Rafal Luzynski
2018-11-02 22:22:08 UTC
Permalink
Hi Egor,

I have applied your patch locally and I am going to start reviewing it.
I can tell you already that it applies correctly but git reports these
warnings:

Applying: v8 Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
.git/rebase-apply/patch:1520: trailing whitespace.
% i.e. [U0401-U04F9, U2019] but only the letters covered by ISO 9.1995
.git/rebase-apply/patch:1521: trailing whitespace.
% It implements the GOST_7.79 System A (Latin Script) as a first
.git/rebase-apply/patch:1523: trailing whitespace.
% https://en.wikipedia.org/wiki/ISO_9 for reference.
.git/rebase-apply/patch:1524: trailing whitespace.
% The System B is extended from GOST_7.79-Russian using open sources
.git/rebase-apply/patch:1535: trailing whitespace.
% Generated from UnicodeData.txt with a spreadsheet referenced
warning: 5 lines add whitespace errors.

Also the commit message is missing from your patch because probably it is
missing from your local repository. Please re-add it and please remember
that it must contain a summary like this:

[BZ #2872]
* localedata/locales/translit_cyrillic: Add ISO 9.1995, GOST 7.79
System A transliteration System B transcription table from Cyrillic
to Latin/ASCII.
* localedata/locales/aa_DJ: Add 'include "translit_cyrillic";""' to
LC_CTYPE translit section.
* localedata/locales/af_ZA: Likewise.

Hm... as I look at this now I think it should rather be:

[BZ #2872]
* localedata/locales/translit_cyrillic: New file.
* localedata/locales/aa_DJ (LC_CTYPE): Add
“'include "translit_cyrillic";""”
* localedata/locales/af_ZA (LC_CTYPE): Likewise.

... and so on. Optionally you can use:

* localedata/locales/translit_cyrillic: New file. Supports
ISO 9.1995, GOST 7.79 System A transliteration System B
transcription table from Cyrillic to Latin/ASCII.

I will appreciate more hints about how to write the ChangeLog entry
correctly
from more experienced maintainers.
Post by Egor Kobylkin
[...]
* The 'include "translit_cyrillic";""' now immediately follows last
'include "translit_XXX";""' string (was inserted just before
translit_end previously.)
I confirm that this is the only relevant difference between v6 and v8.
Post by Egor Kobylkin
* Only the locales already having 'include .*translit.*;""' are patched
(see the list for manual exclusions below, full list of included locales
at the end of the email in the commit section.)
Has this list changed, that is, has any locale been added or removed?
Post by Egor Kobylkin
* Excluded az_AZ completely to avoid circular reference from tr_TR via
“copy "tr_TR"”.
True, this is another difference and I hope this is correct (I have not
yet tested).
Post by Egor Kobylkin
* Locales removed from the patch: C and sd_PK.
* Added locales: az_AZ and ky_KG.
Correct.
Post by Egor Kobylkin
* Consistently transliterate single uppercase Cyrillic letters
to sequences of all uppercase Latin letters in all languages (whenever
a Cyrillic letter is transliterated to more than one Latin letter),
for example "Ї" is now transliterated as "YI" rather than "Yi".
I think you have not yet explained whether this is required by any existing
standard (please provide links) or whether this is your genuine idea to
distinguish between the cases like "Ш" transliterated to "Sh" and "Сх"
also transliterated to "Sh".

Again, I have not yet started reviewing and testing, this is just a feedback
after applying the patch locally.

Regards,

Rafal
Egor Kobylkin
2018-11-02 23:27:10 UTC
Permalink
Moving everybody from To: and CC: on BCC. It seems at this stage it is
Rafal and me. It is still going to libc-alpha and libc-locales. If you
are interested to be put back on CC - please let me know.
Post by Rafal Luzynski
* Consistently transliterate single uppercase Cyrillic letters to
sequences of all uppercase Latin letters in all languages
(whenever a Cyrillic letter is transliterated to more than one
Latin letter), for example "Ї" is now transliterated as "YI" rather
than "Yi".
I think you have not yet explained whether this is required by any
existing standard (please provide links) or whether this is your
genuine idea to distinguish between the cases like "Ш" transliterated > to "Sh" and
"Сх" also transliterated to "Sh".

I remember seeing this form of the capitalization it in actual
transliterated texts long time ago but can't find a formal description
as of now. Just don't want to claim this to be my original idea.
Post by Rafal Luzynski
The choice for YO, SH, YA, ZH etc. is to avoid naming collisions for
With SH:"Схема"->"Shema" but "Шема"->"SHema"
With Sh:"Схема"->"Shema" and "Шема"->"Shema". Collision!
This is important e.g. for renaming files, grouping as in using uniq >> etc.
As for the users - I am a user and I have demonstrated the use cases
where the collisions due to "one symbol capitalization" would cause
irreversible damage to data. For a library like glibc this seems like a
relevant issue to consider.

The "two symbol capitalization" on the other hand would prevent
collision and can be easily corrected in the userspace if needed
with something like

foo="SHema"
foo="${foo:0:1}$(tr '[:upper:]' '[:lower:]' <<<${foo:1})"
echo "$foo"
Shema

It looks like everyone really using transliteration for something
sensitive already have done it the userspace since at least 2006 when
this bug was first logged. So we won't brake the official use cases
where the capitalization should be done in a certain way. But we will
prevent new bugs due to collision if we use "two symbol capitalization"
indeed.

Happy to hear arguments to the contrary.

Bests,
Egor Kobylkin
Egor Kobylkin
2018-11-14 21:25:42 UTC
Permalink
Changelog v9:
* Fixed formatting (trailing spaces etc.)
* Put commit summary in the patch file, now it is generated completely
by git format-patch

Changelog v8:
* Re-added missing translit_cyrillic in patch v7 (due to missing "git
add" in the script).

Changelog v7:
* Generated against git://sourceware.org/git/glibc.git master with git
format-patch.
* The 'include "translit_cyrillic";""' now immediately follows last
'include "translit_XXX";""' string (was inserted just before
translit_end previously.)
* Only the locales already having 'include .*translit.*;""' are patched
(see the list for manual exclusions below, full list of included locales
at the end of the email in the commit section.)
* Excluded az_AZ completely to avoid circular reference from tr_TR via
“copy "tr_TR"”.

Changelog v6:
* Locales removed from the patch: C and sd_PK.
* Added locales: az_AZ and ky_KG.
* Consistently transliterate single uppercase Cyrillic letters
to sequences of all uppercase Latin letters in all languages (whenever
a Cyrillic letter is transliterated to more than one Latin letter),
for example "Ї" is now transliterated as "YI" rather than "Yi".

Dear locale maintainers,

fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

add the Cyrillic transliteration table translit_cyrillic file

https://sourceware.org/bugzilla/attachment.cgi?id=11340 [7]

to localedata/locales/ and include it in all your locales going forward.

The patch included inline below.

From this patch I have excluded locales that already mention cyrillic or
have a transliteration table for it:

mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
***@cyrillic
uk_UA

Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.

Current bug effect:

The glibc wiki explicitly lists this use case as the test example

https://sourceware.org/glibc/wiki/Locales#Testing_Locales :

LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt

currently it fails on Cyrillic texts in most locales including ru_RU [1]
[8] [9]:

LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

- It produces a string of question marks and spaces.

This is what it should produce and it does so after the patch applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


The root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.



COMMIT MESSAGE:
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.

Examples: iconv -f UTF-8 -t ASCII//TRANSLIT will produce ASCII
compatible transcription and iconv -f UTF-8 -t ISO-8859-15//TRANSLIT |
iconv -f ISO-8859-15 -t UTF-8 will produce Latin transliteration as per
ISO 9.1995.

While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration/transcription has only Latin/ASCII codes but still can
be read by a native speaker. Among other things it is useful for
processing the Cyrillic texts and filenames by programs or on systems
that are not specifically prepared to work with Cyrillic, don't have
corresponding fonts installed or can't handle UTF-8.

The transliteration table itself is attached as a file translit_cyrillic
[7]. Its content (mapping) is based on ISO 9.1995 standard [10] and its
derivative GOST 7.79-2000 official source (Federal Agency on Technical
Regulating and Metrology Of Russian Federation [2]). Technically an
independent but mostly identical source [3] was used and prepared in a
spreadsheet [6].

The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically I have searched for all locales that already have
'include .*translit.*;""' string and generated a patch for them.

The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.

I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
Lisivka <***@gmail.com>, Max Kutny <***@gmail.com> (uk_UA),
ДаМОлП КегаМ <***@gnome.org> (sr_RS) have confirmed the
exclusion.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11301
[7] translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?id=11340
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=11304
[10] https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A

Best regards,
Egor Kobylkin
Rafal Luzynski
2018-11-16 22:17:27 UTC
Permalink
Thank you for working on this, Egor.

Before I start reviewing I would like to summarize the things which
I think are blocking for this patch.

1. I think we need tests for transliteration. Currently there is only
one test program which is similar to what we need,
localedata/bug-iconv-trans.c. It is old and it is not quite clear
what bug it is trying to test. Therefore I think we need a new
framework to test transliteration. Is it a good idea to base the
test on the iconv(1) command line utility which is part of glibc?

2. I made few tests in the command line and it seems to me that the
transliteration from "З" to "Z" (+ lowercase as well) in uk_UA does
not work and has not been working for some time already because
I've checked some older systems as well and the result is always
the same. I think that the reason is that uk_UA defines multiple
transliteration rules for "З" depending on what is the letter following
it. It does not seem to work. AFAIK the reason is that the syntax of
transliteration rules says that a single non-Latin character may map
one or more Latin strings, each consisting of one or more characters.
There cannot be a rule transliterating multiple source characters into
one or multiple destination characters. Is it a bug in transliteration
implementation? Or maybe in the specification, including POSIX standard?
The definition of transliteration says that it is one-to-one mapping
of graphemes while a grapheme may be one or multiple characters.
It does not have to be always mapping one-to-one character. Should we
fix this bug first, make uk_UA transliteration work, and only then
add a generic Cyrillic transliteration? Egor's patch already contains
transliteration of "У" + combining acute accent to "Ú" which most
probably
will not work.

I still think that in the longer term all existing custom transliterations
of Cyrillic alphabets should be ported to a modification of your patch.

Egor, while at this I was thinking about your idea to transliterate letters
like "Ш" (uppercase) to "SH" (always uppercase) in order to distinguish
between "Шема" (-> "SHema") and "Схема" (-> "Shema" or "Sxema"). Also
you include a rule to transliterate "Х" to "H" or "X" depending on which
destination characters are available, which I told you already that will
not work because both "H" and "X" are always available and therefore only
the first rule will always be used. I still don't like the idea to
put two uppercase letters in a beginning of a word in titlecase only to
indicate that there was originally a single letter. What if we:

* drop the rule of transliterating "Х" to "H" and transliterate always to
"X",
* transliterate uppercase "Ш" to "Sh" (so it will work fine for titlecase
words)?

As a result the Latin letter "h" will only appear as part of a digraph and
never as a transliteration of "Х" and therefore will never cause a conflict.
Examples:

* "Шема" -> "Shema",
* "Схема" -> "Sxema".

Will this solve the problem?

Regards,

Rafal
Egor Kobylkin
2018-11-17 18:34:45 UTC
Permalink
Hi Rafal,
thanks for putting it into a clear issue statement on SH/Sh problem. I'm
totally with you on this being a good thing to discuss. It is orthogonal
to the tests so let me focus on SH/Sh and System A/B problematic here.

Looks like we have three issues:
1. lack of explicit control which transformation to use (System A or
System B) via //TRANSLIT
2. possibility of collision for System B if used CAP/low transcription
for capital letters
3. Cyrillic 'Х'/'х' (ha) never transcribes to 'H'/'h' as it should per
System B because it's equivalent 'X'/'x' from System A is always present
and takes precedence.

As a solution shouldn't we only keep System B in a new file
transcribe_cyrillic and put it in place as the explicit ASCII
transcription for targeted locales (as opposed to transliteration)?

We would keep System A as translit_cyrillic but won't include it into
this patch. Once you have resolved an issue of having two conflicting
rule-sets but only one key //TRANSLIT you could add the System A back.

The SH/Sh can be decided on either way - seems like an easy change any way.
Post by Rafal Luzynski
Egor, while at this I was thinking about your idea to transliterate
letters like "Ш" (uppercase) to "SH" (always uppercase) in order to
distinguish between "Шема" (-> "SHema") and "Схема" (-> "Shema" or
"Sxema").
to clarify, this SH/Sh collision issue relates only to iconv -f UTF-8 -t
ASCII//TRANSLIT (i.e. System B transcription).
But it's not only SH/Sh, there are following combinations used to
transcribe capital letters:

YO, DJ, YE, TSH, DH, ZH, CZ, CH, SH, SHH, YU, YA, FH, YH, GH, NG, TCZ

Arguably any of them (if not in that CAP/CAP form) could collide with
their CAP/low equivalent from a different word. (there may be language
grammar rules that in fact prevent some but we don't know for sure)

With transcription we are basically striping information from the data,
mapping it into a smaller character set. The idea to keep them in
CAP/CAP is to try to preserve as much information as possible.
Post by Rafal Luzynski
Also you include a rule to transliterate "Х" to "H" or "X" depending
on which destination characters are available, which I told you
already that will not work because both "H" and "X" are always
available and therefore only the first rule will always be used.
Just to have this here for reference, the idea was to have both rules in
one file so

iconv -f UTF-8 -t ASCII//TRANSLIT
will produce ASCII compatible _transcription_ (System B)

iconv -f UTF-8 -t ISO-8859-15//TRANSLIT |
iconv -f ISO-8859-15 -t UTF-8
will produce Latin _transliteration_ as per ISO 9.1995. (System A)

So in fact we have two rules for each letter in the same file (System A
and System B), where System A takes precedence.

I have a question then: isn't this more like a hack than a right thing
to do?

Shouldn't we have two explicit rules for transcription and
transliteration not dependent on a destination character set?
Post by Rafal Luzynski
I still don't like the idea to
put two uppercase letters in a beginning of a word in titlecase only
* drop the rule of transliterating "Х" to "H" and transliterate
always to "X",
This would contradict ISO 9.1995. (System A).
System A was added on Marko's request (so setting him on TO:) I am
neutral on keeping it or dropping it, just to be clear.
Post by Rafal Luzynski
* transliterate uppercase "Ш" to "Sh" (so it will work fine for
titlecase words)?
As a result the Latin letter "h" will only appear as part of a
digraph and never as a transliteration of "Х" and therefore will
* "Шема" -> "Shema", * "Схема" -> "Sxema".
Will this solve the problem?
This particular rule with h/x would make sense it's own.
But again - it would contradict the standards.
On the other hand, for my personal needs I care less about standards but
about current functionality and data loss because of missing
transcription altogether due to the BZ #2872.

Bests,
Egor
Marko Myllynen
2018-11-19 07:13:55 UTC
Permalink
Hi,
Post by Egor Kobylkin
1. lack of explicit control which transformation to use (System A or
System B) via //TRANSLIT
2. possibility of collision for System B if used CAP/low transcription
for capital letters
3. Cyrillic 'Х'/'х' (ha) never transcribes to 'H'/'h' as it should per
System B because it's equivalent 'X'/'x' from System A is always present
and takes precedence.
As a solution shouldn't we only keep System B in a new file
transcribe_cyrillic and put it in place as the explicit ASCII
transcription for targeted locales (as opposed to transliteration)?
We would keep System A as translit_cyrillic but won't include it into
this patch. Once you have resolved an issue of having two conflicting
rule-sets but only one key //TRANSLIT you could add the System A back.
The SH/Sh can be decided on either way - seems like an easy change any way.
I have a question then: isn't this more like a hack than a right thing
to do?
Shouldn't we have two explicit rules for transcription and
transliteration not dependent on a destination character set?
This would contradict ISO 9.1995. (System A).
System A was added on Marko's request (so setting him on TO:) I am
neutral on keeping it or dropping it, just to be clear.
This particular rule with h/x would make sense it's own.
But again - it would contradict the standards.
On the other hand, for my personal needs I care less about standards but
about current functionality and data loss because of missing
transcription altogether due to the BZ #2872.
Given the amount of questions above I think the way forward is to try
follow the relevant standards as closely as possible and also check what
the other implementations (i.e., uconv(1)) do. For example, checking the
case earlier mentioned case may or may not give some hints:

$ echo Шема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Šema
$ echo Схема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Shema
$ uconv -V
uconv v2.1 ICU 50.1.2

Thanks,
--
Marko Myllynen
Egor Kobylkin
2018-11-19 09:21:55 UTC
Permalink
Post by Marko Myllynen
Hi,
Post by Egor Kobylkin
Shouldn't we have two explicit rules for transcription and
transliteration not dependent on a destination character set?
This would contradict ISO 9.1995. (System A).
System A was added on Marko's request (so setting him on TO:) I am
neutral on keeping it or dropping it, just to be clear.
This particular rule with h/x would make sense it's own.
But again - it would contradict the standards.
On the other hand, for my personal needs I care less about standards but
about current functionality and data loss because of missing
transcription altogether due to the BZ #2872.
Given the amount of questions above I think the way forward is to try
follow the relevant standards as closely as possible and also check what
the other implementations (i.e., uconv(1)) do. For example, checking the
$ echo Шема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Šema
$ echo Схема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Shema
$ uconv -V
uconv v2.1 ICU 50.1.2
Marko,

Your example only covers _tansliteration_ to Latin Diacritics
iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
| iconv -f ISO-8859-15 -t UTF-8

while BZ #2872 is about _transcription_ to ASCII
iconv -f UTF-8 -t ASCII//TRANSLIT

The glibc wiki explicitly lists this use case (ASCII) as the test
example https://sourceware.org/glibc/wiki/Locales#Testing_Locales

So again, you are asking to have ISO 9.1995. System A but the bug is
about ISO 9.1995. System B (GOST 7.79-2000)


Bests,
Egor
Marko Myllynen
2018-11-19 19:35:58 UTC
Permalink
Hi,
Post by Egor Kobylkin
Your example only covers _tansliteration_ to Latin Diacritics
iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
| iconv -f ISO-8859-15 -t UTF-8
while BZ #2872 is about _transcription_ to ASCII
iconv -f UTF-8 -t ASCII//TRANSLIT
AFAICS v9 (unlike v10) supported both of the above cases.
Post by Egor Kobylkin
The glibc wiki explicitly lists this use case (ASCII) as the test
example https://sourceware.org/glibc/wiki/Locales#Testing_Locales
I wrote that section and I certainly wasn't considering Cyrillic aspects
at that time (IIRC it was written even before Mike did the major update
for transliteration rules at the end of 2015). The context back then was
mostly about handling Latin letters like Å, Ä, Ö, Ø, etc.
Post by Egor Kobylkin
So again, you are asking to have ISO 9.1995. System A but the bug is
about ISO 9.1995. System B (GOST 7.79-2000)
We certainly can decide here what's the best course of action, we do not
have to slavishly follow some old bug report when deciding the direction
for the implementation. But I think I've made my position clear by now
so I'm not going to repeat it anymore.

In any case once your patch lands I'm going to submit a follow-up patch
for fi_FI to make it compliant with the applicable national standard
(SFS 4900) which defines how to do Cyrillic transliteration /
transcription in the context Finnish.

Thanks,
--
Marko Myllynen
Rafal Luzynski
2018-12-01 22:07:19 UTC
Permalink
Post by Marko Myllynen
[...]
Given the amount of questions above I think the way forward is to try
follow the relevant standards as closely as possible and also check what
the other implementations (i.e., uconv(1)) do. For example, checking the
$ echo Шема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Šema
$ echo Схема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Shema
$ uconv -V
uconv v2.1 ICU 50.1.2
I've played a little with uconv and unfortunately it does not look good
to me.

It does not have any fallback transliteration to plain ASCII. When it says
that 'Ш' is transliterated to 'Š' then it always uses 'Š' and if the target
charset does not have this character then crashes:

$ echo Шема | uconv -f UTF-8 -t ASCII -x cyrillic-latin
Conversion from Unicode to codepage failed at output byte position 0.
Unicode: 0160 Error: Invalid character found
$ echo Шема | uconv -f UTF-8 -t ISO-8859-1 -x cyrillic-latin
Conversion from Unicode to codepage failed at output byte position 0.
Unicode: 0160 Error: Invalid character found
$ echo Шема | uconv -f UTF-8 -t ISO-8859-2 -x cyrillic-latin
�ema
$ echo Шема | uconv -f UTF-8 -t ISO-8859-2 -x cyrillic-latin | uconv -f
ISO-8859-2 -t UTF-8
Šema

It seems to follow ISO 9 (GOST 7.79) System A. However, the transliteration
of the hard sign is rather strange:

$ echo нъе | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
nʺe

The above was correct but:

$ echo НЪЕ | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Nʺ̱E
$ echo Ъ | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
ʺ̱
$ echo Ъ | uconv -f UTF-8 -t UTF-16 -x cyrillic-latin| hexdump -x
0000000 feff 02ba 0331 000a
0000008

So this generates:
02BA MODIFIER LETTER DOUBLE PRIME
0331 COMBINING MACRON BELOW

There is are more transliteration methods, for example Russian-Latin/BGN:

$ echo Шема | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
Shema
$ echo Схема | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
Skhema

Converting 'х' to 'kh' seems to be common in English transliteration but
it does not follow any ISO standard.

$ echo ХА ха | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
KHA kha

This means that the choice whether a digraph in the output should be
all uppercase or maybe upper+lower is context based, something which we
probably cannot implement. But definitely a good thing.

Two more tests:

$ echo Ещё | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
Yeshchë
$ echo Ещё | uconv -f UTF-8 -t ASCII -x Russian-Latin/BGN
Conversion from Unicode to codepage failed at output byte position 6.
Unicode: 00eb Error: Invalid character found

So the output is not plain ASCII.

$ echo е же ле не | uconv -f UTF-8 -t ASCII -x Russian-Latin/BGN
ye zhe le ne

Again this means that transliteration of 'е' is context based:
it is 'ye' in the beginning of a word and 'e' otherwise.

The version which I've tested:

$ uconv -V
uconv v2.1 ICU 60.2

It seems that uconv will not be a good hint about transliterating
to plain ASCII.

Also, the difference between uconv and iconv is that we can provide
multiple transliterations for any source character but we can't group
them into standards so we can't tell iconv to use this or another
system. It will just choose the best fitting the current output
character set and the only thing we can choose is the locale.

This makes me think: should we add a locale like ***@SystemA or
***@SystemB?

Regards,

Rafal
Egor Kobylkin
2018-12-01 22:53:17 UTC
Permalink
Post by Rafal Luzynski
Also, the difference between uconv and iconv is that we can provide
multiple transliterations for any source character but we can't group
them into standards so we can't tell iconv to use this or another
system. It will just choose the best fitting the current output
character set and the only thing we can choose is the locale.
Wouldn't it require to create 3 versions of every locale that would
include the translit_cyrillic file then? I.e. en_US + ***@SystemA,
***@SystemB etc.?

This in turn will make two of them optional (as cyrillic fonts are at
the moment). The highest value is in having the default locale being
able to transliterate, isn't it? So putting the transliteration to
optional locales kind of defeats the purpose.

An example from my experience as a user - a networked device or host
would often have the en_US as the default (only?) locale with no viable
way to change it or install cyrillic fonts. Anyway, this is the most
dire situation where the ASCII transliteration certainly helps most.
Having ***@SystemA or ***@SystemB theoretically available but not
compiled by the distributor wouldn't help here, would it?

So the only useful scenario here would be to ship your locales with the
transliteration already included by default in en_US. This way the
distributor won't have to get active to include transliteration as
***@SystemA or ***@SystemB.

From my (however limited) point of view it is better to have the System
B in first, then see if some code need to be changed to accommodate
System A/System B problematic. Again, System B is _transcription_ to
ASCII and System A _transliteration_ to Latin with different use cases.

It's insightful to see your comparison of the uconv vs. iconv!
Similar to your checks this is what I was using to see whether any
locale fails the transliteration for any cyrillic letter:

echo
"ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУУ́ЀХЊЧКЩЪЫЬЭЮЯабвгЎежзОйклЌМПпрстуу́фхцчшщъыьэюяёђѓєѕіїјљњћќўџѪѫѲѳюѵҌҍ
ҐґҒғҔҕҖҗҚқҞҟҢңҀҥҊҧҚҩҪҫҬҭҮүҲҳҎҵҺһҌҜҟҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӀӥӊӧӚөӰӱӲӳӎӵӞӹ’"|
LOCPATH=$workdir/compiled_locales/"$locale"/ LC_ALL="$locale".UTF-8
iconv -f UTF-8 -t ASCII//TRANSLIT

should give (can be asserted with bash string comparison):

AaOoUussYODJG`YeZ`IYiJL`N`TSHK`U`DhABVGDEZHZIJKLMNOPRSTUUFHCCHSHSHHA`Y`E`YUYAabvgdezhzijklmnoprstuufhcchshshh``y`e`yuyayodjg`yez`iyijl`n`tshk`u`dhO`o`FhfhYhyhE`e`
G`g`GHghGHghZH`zh`K`k`K`k`N`n`NGngP`p`O`o`C`C`T`t`UuH`h`TCZtczSH`SH`CH`ch`CH`ch`iZH`zh`CH`ch`A`a`A`a`E`e`A`a`ZH`zh`Z`z`Z`z`I`i`O`o`O`o`U`u`U`u`CH`ch`Y`y`'

And I am attaching another file that has the Unicode Codepoints next to
the letters for easier identification of failures. (like "U0401-Ё
U0402-Ђ U0403-Ѓ etc.) Hope it will be helpful in creating the tests.

Best regards,
Egor Kobylkin
Egor Kobylkin
2018-12-03 22:19:03 UTC
Permalink
Rafal,

Just to touch base on this, what is the best way forward? Did you get
any input/feedback on your questions below? Are you expecting input from
anyone but myself?

On the blocking issue #2: I really don’t see the connection to the uk_UA
locale that has its transliteration table inline and is explicitly
excluded from my patch. It may be revealing another issue you have with
glibc but wouldn’t that be better addressed in a new bug?
Again, in the v10 of my patch I have removed multicharacter source
graphemes, so that issue is moot there.

If you’d like to overhaul the glibc translit system wouldn’t it be
better to commit the simple text file with the Cyrillic
translit(transcription) table first, fix the bug from the year 2006 and
then proceed from there all due diligence?

The same with having both System A and System B. Initially I went along
with the suggestion to include the system A but it is clear now that it
doesn’t make fixing [BZ #2872] more straightforward. So I’d also propose
to set it aside for the moment and use the v10 without the system A.
That is the whole reason I have submitted it, to be superclear on that.

Now you saw that uconv is transcribing «ХА» as KHA (cap/cap/cap) that
should mitigate your concern about that issue too (somewhat, anyway).
Making it context based would also be about adding new code, see above.

Let me know if there’s anything I can help with getting more progress
with the decision

Bests,
Egor
Post by Rafal Luzynski
2. I made few tests in the command line and it seems to me that the
transliteration from "З" to "Z" (+ lowercase as well) in uk_UA does
not work and has not been working for some time already because I've
checked some older systems as well and the result is always the same.
I think that the reason is that uk_UA defines multiple
transliteration rules for "З" depending on what is the letter
following it. It does not seem to work. AFAIK the reason is that
the syntax of transliteration rules says that a single non-Latin
character may map one or more Latin strings, each consisting of one
or more characters. There cannot be a rule transliterating multiple
source characters into one or multiple destination characters. Is it
a bug in transliteration implementation? Or maybe in the
specification, including POSIX standard?
The definition of transliteration says that it is one-to-one mapping
of graphemes while a grapheme may be one or multiple characters. It
does not have to be always mapping one-to-one character. Should we
fix this bug first, make uk_UA transliteration work, and only then
add a generic Cyrillic transliteration? Egor's patch already
contains transliteration of "У" + combining acute accent to "Ú" which
most probably will not work.
I still think that in the longer term all existing custom
transliterations of Cyrillic alphabets should be ported to a
modification of your patch.
Post by Marko Myllynen
[...]
Given the amount of questions above I think the way forward is to try
follow the relevant standards as closely as possible and also check what
the other implementations (i.e., uconv(1)) do. For example, checking the
$ echo Шема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Šema
$ echo Схема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Shema
$ uconv -V
uconv v2.1 ICU 50.1.2
I've played a little with uconv and unfortunately it does not look good
to me.
It does not have any fallback transliteration to plain ASCII. When it says
that 'Ш' is transliterated to 'Š' then it always uses 'Š' and if the target
$ echo Шема | uconv -f UTF-8 -t ASCII -x cyrillic-latin
Conversion from Unicode to codepage failed at output byte position 0.
Unicode: 0160 Error: Invalid character found
$ echo Шема | uconv -f UTF-8 -t ISO-8859-1 -x cyrillic-latin
Conversion from Unicode to codepage failed at output byte position 0.
Unicode: 0160 Error: Invalid character found
$ echo Шема | uconv -f UTF-8 -t ISO-8859-2 -x cyrillic-latin
�ema
$ echo Шема | uconv -f UTF-8 -t ISO-8859-2 -x cyrillic-latin | uconv -f
ISO-8859-2 -t UTF-8
Šema
It seems to follow ISO 9 (GOST 7.79) System A. However, the transliteration
$ echo нъе | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
nʺe
$ echo НЪЕ | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Nʺ̱E
$ echo Ъ | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
ʺ̱
$ echo Ъ | uconv -f UTF-8 -t UTF-16 -x cyrillic-latin| hexdump -x
0000000 feff 02ba 0331 000a
0000008
02BA MODIFIER LETTER DOUBLE PRIME
0331 COMBINING MACRON BELOW
$ echo Шема | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
Shema
$ echo Схема | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
Skhema
Converting 'х' to 'kh' seems to be common in English transliteration but
it does not follow any ISO standard.
$ echo ХА ха | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
KHA kha
This means that the choice whether a digraph in the output should be
all uppercase or maybe upper+lower is context based, something which we
probably cannot implement. But definitely a good thing.
$ echo Ещё | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
Yeshchë
$ echo Ещё | uconv -f UTF-8 -t ASCII -x Russian-Latin/BGN
Conversion from Unicode to codepage failed at output byte position 6.
Unicode: 00eb Error: Invalid character found
So the output is not plain ASCII.
$ echo е же ле не | uconv -f UTF-8 -t ASCII -x Russian-Latin/BGN
ye zhe le ne
it is 'ye' in the beginning of a word and 'e' otherwise.
$ uconv -V
uconv v2.1 ICU 60.2
It seems that uconv will not be a good hint about transliterating
to plain ASCII.
Also, the difference between uconv and iconv is that we can provide
multiple transliterations for any source character but we can't group
them into standards so we can't tell iconv to use this or another
system. It will just choose the best fitting the current output
character set and the only thing we can choose is the locale.
Regards,
Rafal
Rafal Luzynski
2018-12-08 01:15:40 UTC
Permalink
Post by Egor Kobylkin
[...]
1. lack of explicit control which transformation to use (System A or
System B) via //TRANSLIT
2. possibility of collision for System B if used CAP/low transcription
for capital letters
3. Cyrillic 'Х'/'х' (ha) never transcribes to 'H'/'h' as it should per
System B because it's equivalent 'X'/'x' from System A is always present
and takes precedence.
True.
Post by Egor Kobylkin
As a solution shouldn't we only keep System B in a new file
transcribe_cyrillic and put it in place as the explicit ASCII
transcription for targeted locales (as opposed to transliteration)?
We would keep System A as translit_cyrillic but won't include it into
this patch. Once you have resolved an issue of having two conflicting
rule-sets but only one key //TRANSLIT you could add the System A back.
Sounds like a good idea to provide those two files:

* translit_cyrillic_system_a,
* translit_cyrillic_system_b,

(or any other pair of names) and let the individual locales choose whether
they want to include System A or System B. For optimization, system_b
file could include system_a and modify it.
Post by Egor Kobylkin
The SH/Sh can be decided on either way - seems like an easy change any way.
I'm in favor of "Sh" because it will work fine for titlecased words
(where only the first letter is uppercase) but I'm aware it would be
a problem for uppercased words. Unfortunately, I think we are unable
to satisfy both cases.
Post by Egor Kobylkin
Post by Rafal Luzynski
Egor, while at this I was thinking about your idea to transliterate
letters like "Ш" (uppercase) to "SH" (always uppercase) in order to
distinguish between "Шема" (-> "SHema") and "Схема" (-> "Shema" or
"Sxema").
to clarify, this SH/Sh collision issue relates only to iconv -f UTF-8 -t
ASCII//TRANSLIT (i.e. System B transcription).
True.
Post by Egor Kobylkin
But it's not only SH/Sh, there are following combinations used to
YO, DJ, YE, TSH, DH, ZH, CZ, CH, SH, SHH, YU, YA, FH, YH, GH, NG, TCZ
Absolutely true. I skip the whole list only for the brevity: if we
find a solution for one letter the same solution will work fine for
all others.
Post by Egor Kobylkin
[...]
With transcription we are basically striping information from the data,
mapping it into a smaller character set. The idea to keep them in
CAP/CAP is to try to preserve as much information as possible.
I'm only afraid that things like "TWo CApitals" or "CamelCase" are
common among us computer geeks while they do not look great when
working with natural language and when displaying them to regular users
and even non-computer people.
Post by Egor Kobylkin
[...]
So in fact we have two rules for each letter in the same file (System A
and System B), where System A takes precedence.
I have a question then: isn't this more like a hack than a right thing
to do?
Shouldn't we have two explicit rules for transcription and
transliteration not dependent on a destination character set?
It's impossible with the current API of iconv. Maybe it would be
possible ever in future but that's a greater amount of work than what
we are doing here now. Again, for now different set of rules = different
locale.

I have another question: is it really a job of transliteration to preserve
all original information, to ensure no collisions and have the ability to
restore the original text? I'm afraid that as long as plain ASCII is the
destination charset whatever system we provide it will always be possible
to provide a malicious combination of the Cyrillic characters proving that
the system generates collisions.
Post by Egor Kobylkin
Post by Rafal Luzynski
I still don't like the idea to
put two uppercase letters in a beginning of a word in titlecase only
* drop the rule of transliterating "Х" to "H" and transliterate
always to "X",
This would contradict ISO 9.1995. (System A).
Yes, it would. I'm trying to find solution here since I think we have
proved that we can't implement a system which will handle System A,
System B, and ensure no collisions at the same time. At least one
requirement must be dropped (at least partially).
Post by Egor Kobylkin
System A was added on Marko's request (so setting him on TO:) I am
neutral on keeping it or dropping it, just to be clear.
I think I didn't see this Marko's request but I'm in favor of keeping
System A, too.

Marko, it would be good to hear your opinion about System A vs. System B
again.
Post by Egor Kobylkin
[...]
On the other hand, for my personal needs I care less about standards but
about current functionality and data loss because of missing
transcription altogether due to the BZ #2872.
I read this that you are open to a solution which is inspired by some
standards but does not implement them fully due to our technical
limitations.
Post by Egor Kobylkin
[...]
Marko,
Your example only covers _tansliteration_ to Latin Diacritics
[...]
while BZ #2872 is about _transcription_ to ASCII
[...]
So again, you are asking to have ISO 9.1995. System A but the bug is
about ISO 9.1995. System B (GOST 7.79-2000)
It's hard to say what the original bug reporter meant but I think that the
problem is that there is no transliteration from Cyrillic to any variant of
Latin, except in few locales. If System A was implemented but System B was
not then at least some characters would be handled correctly. Currently no
Cyrillic characters are handled.
Post by Egor Kobylkin
[...]
In any case once your patch lands I'm going to submit a follow-up patch
for fi_FI to make it compliant with the applicable national standard
(SFS 4900) which defines how to do Cyrillic transliteration /
transcription in the context Finnish.
I totally agree. As far as I can see, SFS 4900 is more similar to
System A (ISO 9) rather than System B, that is, it transliterates to Latin
characters with diacritics rather than plain ASCII. Marko, what is your
opinion about possible implementation of SFS 4900 in these cases:

* When the destination charset does not contain required Latin diacritic
characters (e.g., it is plain ASCII)?
* When the output is ambiguous, that means, when two different Cyrillic
strings produce the same Latin (or ASCII) output?

At the moment I am not curious about SFS 4900 but we are facing the same
problems now with ISO 9 and GOST 7.79.
Post by Egor Kobylkin
[...]
$ echo ХА ха | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
KHA kha
This means that the choice whether a digraph in the output should be
all uppercase or maybe upper+lower is context based, something which we
probably cannot implement. But definitely a good thing.
I forgot to include this test which is really interesting:

$ echo ХА Ха ха | uconv -f UTF-8 -t UTF-8 -x Russian-Latin/BGN
KHA Kha kha

which again confirms that the choice of all uppercase or just the first
letter uppercased is context based, a thing which we can't implement now.
Post by Egor Kobylkin
Post by Rafal Luzynski
[...]
Wouldn't it require to create 3 versions of every locale that would
OK, please read this as another brainstorming idea and let's just
forget it.
Post by Egor Kobylkin
[...]
An example from my experience as a user - a networked device or host
would often have the en_US as the default (only?) locale with no viable
way to change it or install cyrillic fonts. Anyway, this is the most
dire situation where the ASCII transliteration certainly helps most.
compiled by the distributor wouldn't help here, would it?
So the only useful scenario here would be to ship your locales with the
transliteration already included by default in en_US. This way the
distributor won't have to get active to include transliteration as
Having the idea of "@SystemA" and "@SystemB" dropped I don't think
implementing any solution in glibc would be helpful for your use case.
Two reasons:

1. I believe that sooner or later someone will develop a transliteration
system for en_US which will follow English transliteration of Russian
instead of any standard we are discussing here. That means, it would
transliterate 'Х' as 'Kh' rather than 'H' or 'X'.
2. Currently there is a trend not to install even en_US locales and leave
only C which is hardcoded into glibc binaries. OTOH, I wouldn't mind
if ISO 9 was hardcoded into C as well.
3. That's beyond Russian language but transliteration according to Serbian
or Bulgarian or Ukrainian or Kazakh rules still requires installing their
proper locales. I think that requiring ru_RU to be installed could be
reasonable especially if we end up with ru_RU somehow differing from
the default "translit_cyrillic".

BTW you don't need Cyrillic fonts to be installed on your server in order
to process the Cyrillic text correctly unless your server renders the text.
Post by Egor Kobylkin
Rafal,
Just to touch base on this, what is the best way forward? Did you get
any input/feedback on your questions below? Are you expecting input from
anyone but myself?
Yes, I expected some input from more experienced maintainers about whether
and how to write the tests but I'd rather start another thread about it
because this one is too long already.
Post by Egor Kobylkin
On the blocking issue #2: I really don’t see the connection to the uk_UA
locale that has its transliteration table inline and is explicitly
excluded from my patch. It may be revealing another issue you have with
glibc but wouldn’t that be better addressed in a new bug?
OK, I was not precise enough (I'm sorry about it) so I'd like to explain
here:

1. In the long term goal I would like to convert those excluded locales
to use your translit_cyrillic as well.
2. In order to ensure that change is not destructive for them I will need
automatic tests to prove that their transliteration rules work the
same good before the change and after the change.
3. It does not matter that converting those other locales is in a distant
future because we need the same tests for Russian language now.
4. Even although I have not started writing any tests I can see they
will be failing for uk_UA. The reason is that glibc transliteration
rules can handle transliterating single characters into single
characters,
single characters into multiple characters but not multiple characters
into multiple (or even single) characters.
5. We can ignore uk_UA but we will face the same case in ru_RU where
you had a case of 'У́ ' ('У' + 'COMBINING ACUTE ACCENT').
6. So the question was: how (and whether) to write the tests if we
already know they would be failing? Skip them? Resolve the other
issue first? Mark them as XFAIL?

In the meantime, you have removed the controversial conversion rule
Post by Egor Kobylkin
Again, in the v10 of my patch I have removed multicharacter source
graphemes, so that issue is moot there.
so we can move to the next step.
Post by Egor Kobylkin
If you’d like to overhaul the glibc translit system wouldn’t it be
better to commit the simple text file with the Cyrillic
translit(transcription) table first, fix the bug from the year 2006 and
then proceed from there all due diligence?
I agree and we are now one step forward.
Post by Egor Kobylkin
The same with having both System A and System B. Initially I went along
with the suggestion to include the system A but it is clear now that it
doesn’t make fixing [BZ #2872] more straightforward. So I’d also propose
to set it aside for the moment and use the v10 without the system A.
That is the whole reason I have submitted it, to be superclear on that.
OK, I think that now I understand your reason to drop System A better.
But still I'd like to rethink implementing System A somehow and drop
(or rather: implement only partially) System B.
Post by Egor Kobylkin
Now you saw that uconv is transcribing «ХА» as KHA (cap/cap/cap) that
should mitigate your concern about that issue too (somewhat, anyway).
Making it context based would also be about adding new code, see above.
It would also require the changes in the syntax of the source code
of locale data and possibly breaking the POSIX compatibility which
I think would be unacceptable.
Post by Egor Kobylkin
Let me know if there’s anything I can help with getting more progress
with the decision
I'm afraid you can't help more. I'd like to hear some feedback from other
people. Due to some minor obstacles we can't resolve this issue being only
two here.

Regards,

Rafal
Marko Myllynen
2018-12-10 21:20:33 UTC
Permalink
Hi,
Post by Rafal Luzynski
Post by Egor Kobylkin
The SH/Sh can be decided on either way - seems like an easy change any way.
I'm in favor of "Sh" because it will work fine for titlecased words
(where only the first letter is uppercase) but I'm aware it would be
a problem for uppercased words. Unfortunately, I think we are unable
to satisfy both cases.
I think I'm in favor of "Sh" as well, although not perfect I'd assume
it's probably going to be correct in more cases than SH.
Post by Rafal Luzynski
Post by Egor Kobylkin
System A was added on Marko's request (so setting him on TO:) I am
neutral on keeping it or dropping it, just to be clear.
I think I didn't see this Marko's request but I'm in favor of keeping
System A, too.
Marko, it would be good to hear your opinion about System A vs. System B
again.
I think System A is a better option as it should be the same as ISO 9
and perhaps also produces results in some cases which are more expected
than with System B (if the Wikipedia ISO 9 article is to be believed).

Wrt BZ #2872 I think it's good to keep it in mind but IMHO we can also
deviate from it if needed, however with System A + ASCII fallback
definitions the RFE should be satisfied as well?
Post by Rafal Luzynski
Post by Egor Kobylkin
[...]
In any case once your patch lands I'm going to submit a follow-up patch
for fi_FI to make it compliant with the applicable national standard
(SFS 4900) which defines how to do Cyrillic transliteration /
transcription in the context Finnish.
I totally agree. As far as I can see, SFS 4900 is more similar to
System A (ISO 9) rather than System B, that is, it transliterates to Latin
characters with diacritics rather than plain ASCII. Marko, what is your
* When the destination charset does not contain required Latin diacritic
characters (e.g., it is plain ASCII)?
This would be according to http://jkorpela.fi/iso9.html8 so for example
instead of ž -> zh and instead of štš -> shtsh.
Post by Rafal Luzynski
* When the output is ambiguous, that means, when two different Cyrillic
strings produce the same Latin (or ASCII) output?
This is a good point and one I haven't considered but I'm not sure is
there anything we can do about this (at least without major locale
system internals work)? Do you have any rough idea how frequently this
could happen or is this more a theoretical issue? (Sorry if I've missed
earlier comments about this, it's been a long thread.)
Post by Rafal Luzynski
Post by Egor Kobylkin
The same with having both System A and System B. Initially I went along
with the suggestion to include the system A but it is clear now that it
doesn’t make fixing [BZ #2872] more straightforward. So I’d also propose
to set it aside for the moment and use the v10 without the system A.
That is the whole reason I have submitted it, to be superclear on that.
OK, I think that now I understand your reason to drop System A better.
But still I'd like to rethink implementing System A somehow and drop
(or rather: implement only partially) System B.
Yes, I also think System A AKA ISO 9 would be a better choice but I'll
leave the final decision for you two (and others who might weigh in).

Thanks,
--
Marko Myllynen
Egor Kobylkin
2018-11-19 11:10:50 UTC
Permalink
Changelog v10:
* Removed ISO 9.1995 GOST 7.79-2000 System A (transliteration to Latin
with diacritics) as conflicting with System B within glibc mechanics and
not solving BZ #2872
* Edited below email, commit message, comment in translit_cyrillic to
reflect System A removal
* Removed <U0423><U0301> and <U0443><U0301> (Cyrillic U with acute,
using composition) as composing is not covered by current glibc
conversion mechanics

Changelog v9:
* Fixed formatting (trailing spaces etc.)
* Put commit summary in the patch file, now it is generated completely
by git format-patch

Changelog v8:
* Re-added missing translit_cyrillic in patch v7 (due to missing "git
add" in the script).

Changelog v7:
* Generated against git://sourceware.org/git/glibc.git master with git
format-patch.
* The 'include "translit_cyrillic";""' now immediately follows last
'include "translit_XXX";""' string (was inserted just before
translit_end previously.)
* Only the locales already having 'include .*translit.*;""' are patched
(see the list for manual exclusions below, full list of included locales
at the end of the email in the commit section.)
* Excluded az_AZ completely to avoid circular reference from tr_TR via
“copy "tr_TR"”.

Changelog v6:
* Locales removed from the patch: C and sd_PK.
* Added locales: az_AZ and ky_KG.
* Consistently transliterate single uppercase Cyrillic letters
to sequences of all uppercase Latin letters in all languages (whenever
a Cyrillic letter is transliterated to more than one Latin letter),
for example "Ї" is now transliterated as "YI" rather than "Yi".

Dear locale maintainers,

fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

add the Cyrillic transliteration table translit_cyrillic file [7]
to localedata/locales/ and include it in all your locales going forward.

The patch is attached.

From this patch I have excluded locales that already mention cyrillic or
have a transliteration table for it:

mn_MN
sr_RS
tg_TJ
tk_TM
tt_RU
uk_UA
uz_UZ
***@cyrillic
uk_UA

Their maintainers are requested to make an explicit decision on how and
whether at all to include this patch.

Current bug effect:

The glibc wiki explicitly lists this use case as the test example

https://sourceware.org/glibc/wiki/Locales#Testing_Locales :

LC_ALL=$LOCALE.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt

currently it fails on Cyrillic texts in most locales including ru_RU [1]
[8] [9]:

LC_ALL=ru_RU.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT <
translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

- It produces a string of question marks and spaces.

This is what it should produce and it does so after the patch applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


The root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here. Furthermore it has to be referenced/included into the
active locale at the compilation time to be used by iconv.



COMMIT MESSAGE:
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.

Example: iconv -f UTF-8 -t ASCII//TRANSLIT will produce ASCII
compatible transcription.

While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration/transcription has only Latin/ASCII codes but still can
be read by a native speaker. Among other things it is useful for
processing the Cyrillic texts and filenames by programs or on systems
that are not specifically prepared to work with Cyrillic, don't have
corresponding fonts installed or can't handle UTF-8.

The transliteration table itself is in the file translit_cyrillic [7].
Its content (mapping) is based on ISO 9.1995 standard [10] and its
derivative GOST 7.79-2000 System B official source (Federal Agency on
Technical Regulating and Metrology Of Russian Federation [2]).
Technically an independent but mostly identical source [3] was used and
prepared in a spreadsheet [6].

The transliteration of Cyrillic to ASCII according to GOST 7.79-2000
System B represents what is actually called transcription (preserving
phonemes), while System A is the transliteration (preserving graphemes).
There is no meaningful way to preserve graphemes converting Cyrillic to
ASCII and thus the System B is chosen. [11]

The documentation suggests that the transliteration tables inclusion is
done by adding *include "translit_cyrillic";""* string into LC_CTYPE
translit_start section
http://man7.org/linux/man-pages/man5/locale.5.html [5]
Practically all locales that already have 'include .*translit.*;""'
string were identified and included into this patch.

The Cyrillic transliteration of e.g. Russian text may have already
worked to some extent for mn_MN, sr_RS, tk_TM, uz_UZ, uk_UA locales that
have their transliteration tables included inline.

I am excluding these locales from this proposed patch. I have written
directly to locale maintainer emails listed in the files. Volodymyr
Lisivka <***@gmail.com>, Max Kutny <***@gmail.com> (uk_UA),
ДаМОлП КегаМ <***@gnome.org> (sr_RS) have confirmed the
exclusion.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?bugid=2872&action=viewall&hide_obsolete=1
[7] translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?bugid=2872&action=viewall&hide_obsolete=1
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=11304
[10] https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A
[11]
https://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=gslmka8xq3

Best regards,
Egor Kobylkin
Rafal Luzynski
2018-12-07 23:35:56 UTC
Permalink
Post by Egor Kobylkin
* Removed ISO 9.1995 GOST 7.79-2000 System A (transliteration to Latin
with diacritics) as conflicting with System B within glibc mechanics and
not solving BZ #2872
I'm in favor of implementing System A and dropping System B instead.
If I understand correctly, System A is actually ISO 9, therefore it is
international, universal, and neutral, while System B is a GOST standard
and therefore used only in Russia (also adopted in several other countries
as well).

It's true that we can't handle both System A and System B. What we
would like to have is:


System A
/============> OUTPUT: Latin with diacritics
INPUT < System B
\============> OUTPUT: Plain ASCII (fallback)

That means: use one system but if the output can't handle it then switch
to another system.

But what we can actually have is either:

System A Fallback
INPUT ============> OUTPUT: Latin with diacritics ============> Plain
ASCII

or:

System B
INPUT ============> OUTPUT: Plain ASCII

That means, we can only provide a fallback for individual characters,
we can't provide a fallback algorithm (that is, we can't switch to
transliterating 'Х' as 'X' instead of 'H' just because we can't
transliterate
'Ш' as 'Š' and switch to 'SH' instead).

Wouldn't it be better to implement ISO 9 (System A) instead and provide
a fallback ASCII transliteration which could be similar but not identical
to System B? Is it necessary to provide plain ASCII transliteration
conforming to System B even if that means that we would have not to
implement System A? If yes, would it be correct to provide System B
for ru_RU (and maybe few more locales) but include System A in all other
locales (except few which we exclude already)?
Post by Egor Kobylkin
* Edited below email, commit message, comment in translit_cyrillic to
reflect System A removal
* Removed <U0423><U0301> and <U0443><U0301> (Cyrillic U with acute,
using composition) as composing is not covered by current glibc
conversion mechanics
OK, thank you, I like this change.
Post by Egor Kobylkin
[...]
The transliteration of Cyrillic to ASCII according to GOST 7.79-2000
System B represents what is actually called transcription (preserving
phonemes), while System A is the transliteration (preserving graphemes).
There is no meaningful way to preserve graphemes converting Cyrillic to
ASCII and thus the System B is chosen. [11]
I'm not sure it should be actually called transcription. IIUC,
transcription
reflects pronunciation, something we can't easily implement in glibc.
As long as we convert letters to letters (or group of letters to group
of letters) without taking pronunciation into account it should be
called transliteration. OTOH, I agree that it is rather uncommon in
Russian language to find an example where pronunciation is not perfectly
reflected in spelling.
Post by Egor Kobylkin
+% Generated from UnicodeData.txt with a spreadsheet referenced
+% in that bugs doclet
The previous versions of your patch had "in that bug's doclet" here
which I think is correct.

I like the version 9 of your patch more so I'm going to write a more
thorough review of it.

Regards,

Rafal
Egor Kobylkin
2018-12-08 21:51:47 UTC
Permalink
Rafal, Dmitry, Marko, Mike
Post by Rafal Luzynski
Changelog v10: * Removed ISO 9.1995 GOST 7.79-2000 System A
(transliteration to Latin with diacritics) as conflicting with
System B within glibc mechanics and not solving BZ #2872
I'm in favor of implementing System A and dropping System B instead.
The BZ #2872 bug name is explicitly "Transliteration Cyrillic -> ASCII
fails". The ISO 9 System A does not map to ASCII so it is not a solution
to BZ #2872 at all.

I was scratching my head as to how can we avoid the explosion of the
scope for this patch. And then it appeared to me that it was wrong to
target all the present locales for the ASCII translit. This seems to be
the root cause for this prolonged A vs. B discussions. The proper target
for my table is actually the C locale translit file
(locale/C-translit.h.in). I will submit a proper patch shortly.

If anyone wants to keep working on the implementation of the Latin
Diacritics transliteration of the Cyrillic letters (System A) you are
welcome to use the tables I have submitted before (v9). That would be a
new feature for glibc as per my understanding. Let's just make super
clear the distinction of the System A (Latin with Diacritics, non-ASCII)
to the ASCII translit as mentioned in BZ #2872 (System B).

My focus is super sharp on helping with Cyrillic -> ASCII translit
availability for a default installation with glibc.

Hope this helps,
Egor
Egor Kobylkin
2018-12-08 22:28:17 UTC
Permalink
Changelog v11:
* Re-targeted the patch against locale/C-translit.h.in as the proper
file for the ASCII translit table.
* Correspondingly the patch now only contains the additional
Cyrillic-ASCII strings in the format of locale/C-translit.h.in table.
The 'include "translit_cyrillic";""' directives are not necessary in the
locale files and they are now all left intact.
* Also the file translit_cyrillic is not longer needed and is omitted.
* Edited below email, commit message.

Changelog v10:
* Removed ISO 9.1995 GOST 7.79-2000 System A (transliteration to Latin
with diacritics) as conflicting with System B within glibc mechanics and
not solving BZ #2872
* Edited below email, commit message, comment in translit_cyrillic to
reflect System A removal
* Removed <U0423><U0301> and <U0443><U0301> (Cyrillic U with acute,
using composition) as composing is not covered by current glibc
conversion mechanics

Changelog v9:
* Fixed formatting (trailing spaces etc.)
* Put commit summary in the patch file, now it is generated completely
by git format-patch

Changelog v8:
* Re-added missing translit_cyrillic in patch v7 (due to missing "git
add" in the script).

Changelog v7:
* Generated against git://sourceware.org/git/glibc.git master with git
format-patch.
* The 'include "translit_cyrillic";""' now immediately follows last
'include "translit_XXX";""' string (was inserted just before
translit_end previously.)
* Only the locales already having 'include .*translit.*;""' are patched
(see the list for manual exclusions below, full list of included locales
at the end of the email in the commit section.)
* Excluded az_AZ completely to avoid circular reference from tr_TR via
“copy "tr_TR"”.

Changelog v6:
* Locales removed from the patch: C and sd_PK.
* Added locales: az_AZ and ky_KG.
* Consistently transliterate single uppercase Cyrillic letters
to sequences of all uppercase Latin letters in all languages (whenever
a Cyrillic letter is transliterated to more than one Latin letter),
for example "Ї" is now transliterated as "YI" rather than "Yi".

Dear locale maintainers,

fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"

https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]

add the Cyrillic transliteration rows to locale/C-translit.h.in.

The patch is attached.


Current bug effect:

The glibc wiki explicitly lists this use case as the test example and
currently it fails on Cyrillic texts [1] [8] [9]:

iconv -f UTF-8 -t ASCII//TRANSLIT < translit-test-input.txt |grep CYRILLIC

CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.

- it produces a string of question marks and spaces.

This is what it should produce and it does so after the patch applied:

CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
chayu.


The root problem and the fix:

The root problem is the missing transliteration table that I am
supplying here.


COMMIT MESSAGE:
This translit_cyrillic table enables conversion (e.g. with iconv) from a
UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.

Example: iconv -f UTF-8 -t ASCII//TRANSLIT will produce ASCII
compatible transcription.

While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
a transliteration/transcription has only Latin/ASCII codes but still can
be read by a native speaker. Among other things it is useful for
processing the Cyrillic texts and filenames by programs or on systems
that are not specifically prepared to work with Cyrillic, don't have
corresponding fonts installed or can't handle UTF-8.

The patch content (mapping) is based on ISO 9.1995 standard [10] and its
derivative GOST 7.79-2000 System B official source (Federal Agency on
Technical Regulating and Metrology Of Russian Federation [2]).
Technically an independent but mostly identical source [3] was used and
prepared in a spreadsheet [6].

The transliteration of Cyrillic to ASCII according to GOST 7.79-2000
System B represents what is actually called transcription (preserving
phonemes), while System A is the transliteration (preserving graphemes).
There is no meaningful way to preserve graphemes converting Cyrillic to
ASCII and thus the System B is chosen [11]. To be super clear the System
A has nothing to do with this bug regardless it being a transliteration.

Those interested in implementing System A for transliteration of
Cyrillic to Latin with Diacritic as a new feature are welcome to use the
spreadsheet in [6] as a starting point.

Links:

[1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
[2] GOST 7.79-2000 official source
http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
available in low quality gif format)
[3] http://transliteration.ru/gost-7-79-2000/ and
http://www.yfermer.ru/specifications/285821.html
[4] Wikipedia article on Cyrillic transliteration with Latin alphabet
https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
[5] http://man7.org/linux/man-pages/man5/locale.5.html
[6] Spreadsheet for generating translit_cyrillic
https://sourceware.org/bugzilla/attachment.cgi?bugid=2872&action=viewall&hide_obsolete=1
[8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
[9] translit-test-input.txt
https://sourceware.org/bugzilla/attachment.cgi?id=11304
[10] https://en.wikipedia.org/wiki/ISO_9#GOST_7.79_System_B
[11]
https://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=gslmka8xq3

Best regards,
Egor Kobylkin
Loading...