Discussion:
bug#20499: [PROPOSED PATCH] C-x 8 shorthands for curved quotes, Euro, etc.
Paul Eggert
2015-05-04 01:13:10 UTC
Permalink
Although C-x 8 lets you insert arbitrary Unicode characters, it's
awkward to use this to insert commonly used symbols such as curved
quotes, the Euro symbol, etc. This patch adds simpler sequences for
ISO 8859-15 characters (which includes the Euro), plus characters that
are commonly found in English text and in basic math. For example,
assuming the Alt key works on your keyboard and iso-transl is loaded,
one can now type "A-[" instead of "A-RET LEFT SIN TAB RET" to get the
character "‘" (U+2018 LEFT SINGLE QUOTATION MARK).
* doc/emacs/mule.texi (Unibyte Mode), etc/NEWS: Latin-9 and a few
other printing characters now work too.
* lisp/international/iso-transl.el (iso-transl-char-map):
Also support ISO 8859-15 characters (e.g., "€"), plus the characters
"–—‘’“”†‡•′″←→↔−≈≠≤≥" which are commonly used in English text
or basic math.
This patch is a followup to Bug#20385; although it is a separate issue
and does not fix Bug#20385, it could make fixing Bug#20385 easier.
---
doc/emacs/mule.texi | 4 ++--
etc/NEWS | 2 ++
lisp/international/iso-transl.el | 33 ++++++++++++++++++++++++++++++++-
3 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/doc/emacs/mule.texi b/doc/emacs/mule.texi
index de381df..03e70da 100644
--- a/doc/emacs/mule.texi
+++ b/doc/emacs/mule.texi
@@ -1660,8 +1660,8 @@ characters present directly on the keyboard or using @key{Compose} or
@cindex compose character
@cindex dead character
@item
-For Latin-1 only, you can use the key @kbd{C-x 8} as a ``compose
-character'' prefix for entry of non-@acronym{ASCII} Latin-1 printing
+You can use the key @kbd{C-x 8} as a ``compose character'' prefix for
+entry of non-@acronym{ASCII} Latin-1, Latin-9, and a few other printing
characters. @kbd{C-x 8} is good for insertion (in the minibuffer as
well as other buffers), for searching, and in any other context where
a key sequence is allowed.
diff --git a/etc/NEWS b/etc/NEWS
index 7497652..3313c56 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -213,6 +213,8 @@ successive char insertions.

** Unicode names entered via C-x 8 RET now use substring completion by default.

+** C-x 8 now has shorthands for Latin-9 and a few other commonly used chars.
+
** New minor mode global-eldoc-mode is enabled by default.

** Emacs now supports "bracketed paste mode" when running on a terminal
diff --git a/lisp/international/iso-transl.el b/lisp/international/iso-transl.el
index 73bcae0..ac91c1e 100644
--- a/lisp/international/iso-transl.el
+++ b/lisp/international/iso-transl.el
@@ -1,4 +1,4 @@
-;;; iso-transl.el --- keyboard input definitions for ISO 8859-1 -*- coding: utf-8 -*-
+;;; iso-transl.el --- keyboard input for ISO characters -*- coding: utf-8 -*-

;; Copyright (C) 1987, 1993-1999, 2001-2015 Free Software Foundation,
;; Inc.
@@ -36,6 +36,10 @@
;; to make all of the Alt keys autoload, and it is not clear
;; that the dead accent keys SHOULD autoload this package.

+;; This package supports all characters defined by ISO 8859-1 and ISO 8859-15,
+;; along with a few other ISO 10646 characters commonly used in English
+;; and computing text.
+
;;; Code:

;;; Provide some binding for startup:
@@ -192,6 +196,33 @@
("~o" . [?õ])
("~t" . [?þ])
("~~" . [?¬])
+ ("OE" . [?Œ])
+ ("Oe" . [?œ])
+ ("vS" . [?Š])
+ ("vs" . [?š])
+ ("\"Y" . [?Ÿ])
+ ("vZ" . [?Ž])
+ ("vz" . [?ž])
+ ("_n" . [?–])
+ ("_m" . [?—])
+ ("[" . [?‘])
+ ("]" . [?’])
+ ("{" . [?“])
+ ("}" . [?”])
+ ("1+" . [?†])
+ ("2+" . [?‡])
+ ("**" . [?•])
+ ("*'" . [?′])
+ ("*\"" . [?″])
+ ("*E" . [?€])
+ ("a<" . [?←])
+ ("a>" . [?→])
+ ("a=" . [?↔])
+ ("_-" . [?−])
+ ("~=" . [?≈])
+ ("/=" . [?≠])
+ ("_<" . [?≤])
+ ("_>" . [?≥])
("' " . "'")
("` " . "`")
("\" " . "\"")
--
2.1.0
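For readers who want to try the idea right away, here is a minimal sketch of
the same effect at the user level.  It assumes only that iso-transl.el is
loaded and that the internal keymap is still called iso-transl-ctl-x-8-map;
it is not part of the patch itself.

(require 'iso-transl)
;; Make C-x 8 [ ] { } insert curved quotes, roughly what the patch's new
;; iso-transl-char-map entries provide.
(define-key iso-transl-ctl-x-8-map "[" [?‘])
(define-key iso-transl-ctl-x-8-map "]" [?’])
(define-key iso-transl-ctl-x-8-map "{" [?“])
(define-key iso-transl-ctl-x-8-map "}" [?”])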
Eli Zaretskii
2015-05-04 14:22:17 UTC
Permalink
Date: Sun, 3 May 2015 18:13:10 -0700
Although C-x 8 lets you insert arbitrary Unicode characters, it's
awkward to use this to insert commonly used symbols such as curved
quotes, the Euro symbol, etc. This patch adds simpler sequences for
ISO 8859-15 characters (which includes the Euro), plus characters that
are commonly found in English text and in basic math. For example,
assuming the Alt key works on your keyboard and iso-transl is loaded,
one can now type "A-[" instead of "A-RET LEFT SIN TAB RET" to get the
character "‘" (U+2018 LEFT SINGLE QUOTATION MARK).
* doc/emacs/mule.texi (Unibyte Mode), etc/NEWS: Latin-9 and a few
other printing characters now work too.
Also support ISO 8859-15 characters (e.g., "€"), plus the characters
"–—‘’“”†‡•′″←→↔−≈≠≤≥" which are commonly used in English text
or basic math.
Shouldn't we prefer input methods instead? We already have a plethora
of Latin-N-something input methods (including latin-9-prefix), so why
not add more characters there, instead of using iso-transl?

I think input methods generally get less in your way.
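For comparison, extending an existing input method is one quail-defrule call
per character.  A rough sketch against latin-9-prefix; the key sequences used
below are invented for illustration and are not rules latin-9-prefix defines
today:

(require 'quail)
;; Load the package first (activating it once is the simplest way), then
;; add example rules for the curved single quotes.
(activate-input-method "latin-9-prefix")
(quail-defrule "\"[" ?‘ "latin-9-prefix")
(quail-defrule "\"]" ?’ "latin-9-prefix")
(deactivate-input-method)   ; we only needed the package loaded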
Ivan Shmakov
2015-05-04 15:20:56 UTC
Permalink
severity 20499 wishlist
merge 16082 20499
thanks
Post by Eli Zaretskii
Post by Paul Eggert
From: Paul Eggert Date: Sun, 3 May 2015 18:13:10 -0700
Although C-x 8 lets you insert arbitrary Unicode characters, it's
awkward to use this to insert commonly used symbols such as curved
quotes, the Euro symbol, etc. This patch adds simpler sequences for
ISO 8859-15 characters (which includes the Euro), plus characters
that are commonly found in English text and in basic math. For
example, assuming the Alt key works on your keyboard and iso-transl
is loaded, one can now type "A-[" instead of "A-RET LEFT SIN TAB
RET" to get the character "‘" (U+2018 LEFT SINGLE QUOTATION MARK).
First of all, isn’t this essentially the same suggestion as the
one of bug#16082? (FWIW, I’ve requested the reports to be
merged; feel free to unmerge if I’ve missed something.)

[…]
Post by Eli Zaretskii
Shouldn't we prefer input methods instead? We already have a
plethora of Latin-N-something input methods (including
latin-9-prefix), so why not add more characters there, instead of
using iso-transl?
I think input methods generally get less in your way.
I tend to agree with that, but is there currently an easy way to
switch between /two/ input methods? For one thing, I currently
use “no” input method for typing English /and/
russian-typewriter to type Russian.

With the proper Unicode quotes being available via some other
input method, how would I configure Emacs to switch between
/that/ input method and russian-typewriter?

The other side of the issue is that the dashes, arrows,
mathematical symbols, and the likes of them are cross-lingual,
and making them available via input methods will involve
duplication of many of the individual quail-define-rules entries
all around leim/quail/*.el. (If done the straightforward way;
AIUI, anyway.)
--
FSF associate member #7257 http://am-1.org/~ivan/ … 3013 B6A0 230E 334A
Eli Zaretskii
2015-05-04 15:41:52 UTC
Permalink
Date: Mon, 04 May 2015 15:20:56 +0000
Post by Eli Zaretskii
Shouldn't we prefer input methods instead? We already have a
plethora of Latin-N-something input methods (including
latin-9-prefix), so why not add more characters there, instead of
using iso-transl?
I think input methods generally get less in your way.
I tend to agree with that, but is there currently an easy way to
switch between /two/ input methods?
I simply use "C-u C-\". Granted, if every 2nd character you type is
U+2018, switching input methods is gonna hurt. But that's not what
happens normally, at least not to me, and you save those Alt-[
etc. for more useful tasks.
Ivan Shmakov
2015-05-04 16:12:28 UTC
Permalink
Post by Eli Zaretskii
Post by Ivan Shmakov
From: Ivan Shmakov Date: Mon, 04 May 2015 15:20:56 +0000
[…]
Post by Eli Zaretskii
Post by Ivan Shmakov
I tend to agree with that, but is there currently an easy way to
switch between /two/ input methods?
I simply use "C-u C-\".
Given that I edit texts which may be deemed bilingual (Russian
prose interspersed with source code or command line examples)
not just occasionally, /and/ need C-s, C-r at that, – no,
I don’t think it’d work all that well for me.
Post by Eli Zaretskii
Granted, if every 2nd character you type is U+2018, switching input
methods is gonna hurt.
It’s not that bad, but still; consider, e. g.:

«Ты пророк», вскричал я, «вещий! Птица ты иль дух зловещий,
Этим Небом, что над нами — Богом скрытым навсегда —
Заклинаю, умоляя, мне сказать, — в пределах Рая
Мне откроется ль святая, что средь ангелов всегда,
Та, которую Ленорой в небесах зовут всегда?»
Каркнул Ворон: «Никогда».

Nine such characters per 43 words.
Post by Eli Zaretskii
But that's not what happens normally, at least not to me, and you
save those Alt-[ etc. for more useful tasks.
My ‘Alt’ is ‘Meta’ most of the time, so it’s rather C-x 8 [,
C-x 8 ], etc. for me, and reserving that for typography isn’t
really a big deal.

--
FSF associate member #7257 htt
Eli Zaretskii
2015-05-04 16:31:34 UTC
Permalink
Date: Mon, 04 May 2015 16:12:28 +0000
Post by Eli Zaretskii
Post by Ivan Shmakov
From: Ivan Shmakov Date: Mon, 04 May 2015 15:20:56 +0000
[…]
Post by Eli Zaretskii
Post by Ivan Shmakov
I tend to agree with that, but is there currently an easy way to
switch between /two/ input methods?
I simply use "C-u C-\".
Given that I edit texts which may be deemed bilingual (Russian
prose interspersed with source code or command line examples)
not just occasionally, /and/ need C-s, C-r at that, – no,
I don’t think it’d work all that well for me.
Don't you have a dual-language keyboard on your system that can switch
languages without Emacs being involved? Input methods are for
characters not directly supported by your keyboard; most systems have
at least 2, sometimes 3 different languages switchable by a hot key.

IOW, I won't expect you to need an input method to type Cyrillic
characters.
Post by Eli Zaretskii
Granted, if every 2nd character you type is U+2018, switching input
methods is gonna hurt.
«Ты пророк», вскричал я, «вещий! Птица ты иль дух зловещий,
Этим Небом, что над нами — Богом скрытым навсегда —
Заклинаю, умоляя, мне сказать, — в пределах Рая
Мне откроется ль святая, что средь ангелов всегда,
Та, которую Ленорой в небесах зовут всегда?»
Каркнул Ворон: «Никогда».
Nine such characters per 43 words.
Those aren't the quotes Paul was talking about. Those are Cyrillic-style
quotes frequently used in Cyrillic languages, and I'd expect them to
be directly available from your keyboard.

Paul's use case is with the original of this poem.
Post by Eli Zaretskii
But that's not what happens normally, at least not to me, and you
save those Alt-[ etc. for more useful tasks.
My ‘Alt’ is ‘Meta’ most of the time, so it’s rather C-x 8 [,
C-x 8 ], etc. for me, and reserving that for typography isn’t
really a big deal.
That's exactly the issue: most keyboards will have Alt taken already,
and typing "C-x 8 [" is a PITA, IMO. By contrast, 'C-\ "' is easy.

But if there are people who'd like to go iso-transl way, who am I to
object?
Ivan Shmakov
2015-05-04 18:12:27 UTC
Permalink
Post by Eli Zaretskii
Post by Eli Zaretskii
Post by Ivan Shmakov
From: Ivan Shmakov Date: Mon, 04 May 2015 16:12:28 +0000
From: Ivan Shmakov Date: Mon, 04 May 2015 15:20:56 +0000
[…]
Post by Eli Zaretskii
Post by Eli Zaretskii
Post by Ivan Shmakov
I tend to agree with that, but is there currently an easy way to
switch between /two/ input methods?
I simply use "C-u C-\".
Given that I edit texts which may be deemed bilingual (Russian prose
interspersed with source code or command line examples) not just
occasionally, /and/ need C-s, C-r at that, – no, I don’t think it’d
work all that well for me.
Don't you have a dual-language keyboard on your system that can
switch languages without Emacs being involved? Input methods are for
characters not directly supported by your keyboard; most systems have
at least 2, sometimes 3 different languages switchable by a hot key.
IOW, I won't expect you to need an input method to type Cyrillic
characters.
With tty frames, it /does/ make sense to use an input method.

Besides, C-u C-\ tends to be easier to use than the system’s
facility when I need to use some layout not otherwise typical to
my work. (Although I /do/ use setxkbmap(1) when it becomes
really necessary.)

[…]
Post by Eli Zaretskii
«Ты пророк», вскричал я, «вещий! Птица ты иль дух зловещий,
Этим Небом, что над нами — Богом скрытым навсегда —
Заклинаю, умоляя, мне сказать, — в пределах Рая
Мне откроется ль святая, что средь ангелов всегда,
Та, которую Ленорой в небесах зовут всегда?»
Каркнул Ворон: «Никогда».
Nine such characters per 43 words.
Those aren't quotes Paul was talking about. Those are Cyrillic-style
quotes frequently used in Cyrillic languages, and I'd expect them to
be directly available from your keyboard.
Paul's use case is with the original of this poem.
There’re no such quotation marks on the Cyrillic keyboard
layouts I’m aware of. It really is no different to the English
case — the only quotation mark you get “for free” is the good
old ‘"’. (And given that the Russian alphabet is 33 characters
– versus 26 for English – with the physical keyboard layout
being the same 104 keys, it’s actually a tad worse, with even
the comma typically bound to a shifted – Shift-. – key.)

These aren’t exactly “Cyrillic”, either, as both German and
French use exactly the same quotation marks.

Then, there’re the en and em dash characters, even though they
may not be (easily) discernible with a fixed-width font.

[…]
Post by Eli Zaretskii
My ‘Alt’ is ‘Meta’ most of the time, so it’s rather C-x 8 [,
C-x 8 ], etc. for me, and reserving that for typography isn’t really
a big deal.
That's exactly the issue: most keyboards will have Alt taken already,
and typing "C-x 8 [" is a PITA, IMO.
FWIW, I’ve been using C-x 8 <, > for years now.
Post by Eli Zaretskii
By contrast, 'C-\ "' is easy.
How do I define an input method so that ‘"’ is mapped to either
“ or ” depending on the context?
Post by Eli Zaretskii
But if there are people who'd like to go iso-transl way, who am I to
object?
I’m unsure how much the current list should be expanded, but
I see no reason /not/ to support, say, C-x 8 1 / 8 for ⅛ when we
already support C-x 8 1 / 2, 4 for ½, ¼.

--
FSF associate member #7257 http:
Eli Zaretskii
2015-05-04 18:29:35 UTC
Permalink
Date: Mon, 04 May 2015 18:12:27 +0000
How do I define an input method so that ‘"’ is mapped to either
“ or ” depending on the context?
See texinfo.el for some ideas.
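The core of the idea is a command that looks at the character before point;
a rough sketch, not copied from texinfo.el (the command name is made up):

(defun my-insert-context-quote ()
  "Insert “ after whitespace or an opening bracket, otherwise insert ”."
  (interactive)
  (insert (if (memq (char-before) '(nil ?\s ?\t ?\n ?\( ?\[))
              ?“
            ?”)))

;; Example binding in the current buffer:
;; (local-set-key "\"" #'my-insert-context-quote)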
Stefan Monnier
2015-05-04 22:00:35 UTC
Permalink
Post by Ivan Shmakov
First of all, isn’t this essentially the same suggestion as the
one of bug#16082? (FWIW, I’ve requested the reports to be
merged; feel free to unmerge if I’ve missed something.)
Indeed. I'm not opposed to adding such things. I do wish C-x 8 was
changed to make use of the quail code somehow.

Also, I think it would be good to construct this table
semi-automatically, along the lines of what I've done for latin-ltx.el.
Post by Ivan Shmakov
I tend to agree with that, but is there currently an easy way to
switch between /two/ input methods? For one thing, I currently
use “no” input method for typing English /and/
russian-typewriter to type Russian.
Indeed. IIUC it would be trivial to let C-\ cycle between
a user-selected set of default input methods. Patch welcome.
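A rough sketch of what such cycling could look like in user code; every name
below is invented for illustration, and this is not the patch being asked for:

(require 'cl-lib)

(defvar my-input-method-ring '(nil "russian-typewriter" "latin-9-prefix")
  "Input methods to cycle through; nil means no input method.")

(defun my-cycle-input-method ()
  "Activate the next input method from `my-input-method-ring'."
  (interactive)
  (let* ((pos (or (cl-position current-input-method my-input-method-ring
                               :test #'equal)
                  0))
         (next (nth (mod (1+ pos) (length my-input-method-ring))
                    my-input-method-ring)))
    (if next
        (set-input-method next)
      (deactivate-input-method))
    (message "Input method: %s" (or next "none"))))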

I also wish it were possible to activate several input methods
at the same time. I don't (know how to) use state-based methods, but
for input methods like French or TeX, it isn't that hard to come up with
ways to create new input methods by combining or shifting (e.g. add
a prefix key, or drop a prefix) existing ones.
Post by Ivan Shmakov
The other side of the issue is that the dashes, arrows,
mathematical symbols, and the likes of them are cross-lingual,
and making them available via input methods will involve
duplication of many of the individual quail-define-rules entries
all around leim/quail/*.el. (If done the straightforward way;
AIUI, anyway.)
Indeed. Which is why I think it makes sense to try and develop ways to
create "partial input methods" and then combine them.


Stefan
Paul Eggert
2015-05-04 16:11:49 UTC
Permalink
Post by Eli Zaretskii
Shouldn't we prefer input methods instead?
Typically yes, but for common characters it's better to have a standard
way to input them in any context. The exact set of such characters is
of course debatable (and you could easily talk me out of the
more-obscure characters proposed), but quotes, dashes, and the Euro are
pretty basic to ordinary English text.

Also, Emacs has no English input method, which means Emacs users
currently have trouble writing good English text outside the ASCII
character set. I suppose we could add such a method, but that would
require more user training than the proposed approach. Anyway, Emacs is
natively English and support for basic English text should be available
everywhere.
Richard Stallman
2015-05-04 16:15:07 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

How about also adding s, t, S, T with cedilla, dotless i, and I with dot.
Also c and C with a hacek.

C-x 8 C-h is a good way of seeing what all the options are.
It may be worth documenting.

It would be nice to have C-u C-x = show the specific C-x 8 sequence
for a character, if there is one.
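Since the C-x 8 shorthands all come from iso-transl-char-map, a reverse
lookup over that alist would be enough for a first cut; a sketch (the
function name is invented):

(require 'iso-transl)

(defun my-c-x-8-suffix-for (char)
  "Return the C-x 8 suffix that inserts CHAR, or nil if there is none."
  (car (rassoc (vector char) iso-transl-char-map)))

;; (my-c-x-8-suffix-for ?½)  ; => "1/2" with the current table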

By the way, it would be good to have a file that consists of all of
unicode in numeric order. That would provide an easy way to pick some
unicode character (whose code you don't remember) and copy it into
some text.

¬
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Eli Zaretskii
2015-05-04 16:34:14 UTC
Permalink
Date: Mon, 04 May 2015 12:15:07 -0400
By the way, it would be good to have a file that consists of all of
unicode in numeric order.
Would admin/unidata/UnicodeData.txt do?
Ivan Shmakov
2015-05-04 16:48:39 UTC
Permalink
Post by Eli Zaretskii
Post by Richard Stallman
Date: Mon, 04 May 2015 12:15:07 -0400 From: Richard Stallman
By the way, it would be good to have a file that consists of all of
unicode in numeric order. That would provide an easy way to pick
some unicode character (whose code you don't remember) and copying
it into some text.
Would admin/unidata/UnicodeData.txt do?
I guess given the “copying” part, the request is more along the
lines of, say:

(let ((i #x100))
  (while (< i #x180)
    (when (zerop (mod i #x20))
      (unless (eq ?\n (preceding-char))
        (insert ?\n))
      (insert (format "%06x" i) ?\s))
    (insert ?\s i)
    (setq i (+ 1 i))))
000100 Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ
000120 Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į İ ı IJ ij Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ
000140 ŀ Ł ł Ń ń Ņ ņ Ň ň ʼn Ŋ ŋ Ō ō Ŏ ŏ Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş
000160 Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ

I doubt we really need a file for that, though; rather, some
kind of a “Unicode browser” facility. (Not entirely unlike
list-colors-display, but with a dynamic list.)

--
FSF associate member #7257 http://am-1.org/~ivan/
Eli Zaretskii
2015-05-04 17:03:36 UTC
Permalink
Date: Mon, 04 May 2015 16:48:39 +0000
Post by Eli Zaretskii
Post by Richard Stallman
Date: Mon, 04 May 2015 12:15:07 -0400 From: Richard Stallman
By the way, it would be good to have a file that consists of all of
unicode in numeric order. That would provide an easy way to pick
some unicode character (whose code you don't remember) and copying
it into some text.
Would admin/unidata/UnicodeData.txt do?
I guess given the “copying” part, the request is more along the
We distribute that file with Emacs, so "copying" is irrelevant, I
think.
(let ((i #x100))
  (while (< i #x180)
    (when (zerop (mod i #x20))
      (unless (eq ?\n (preceding-char))
        (insert ?\n))
      (insert (format "%06x" i) ?\s))
    (insert ?\s i)
    (setq i (+ 1 i))))
000100 Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ
000120 Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į İ ı IJ ij Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ
000140 ŀ Ł ł Ń ń Ņ ņ Ň ň ʼn Ŋ ŋ Ō ō Ŏ ŏ Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş
000160 Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ
Did you try to make this longer than 4 lines in a well-covered part of
the BMP? Most Unicode codepoints on most end-user machines will
display as glyphless boxes, and that's _after_ Emacs searches like
hell after each character system-wide. IOW, such a feature would be
an annoyance, IMO.

By contrast UnicodeData.txt is a pure-ASCII file, and includes
everything except the glyphs themselves.
Ivan Shmakov
2015-05-04 17:40:06 UTC
Permalink
Post by Eli Zaretskii
Post by Ivan Shmakov
Post by Eli Zaretskii
Post by Richard Stallman
From: Ivan Shmakov Date: Mon, 04 May 2015 16:48:39 +0000
Date: Mon, 04 May 2015 12:15:07 -0400 From: Richard Stallman
By the way, it would be good to have a file that consists of all
of unicode in numeric order. That would provide an easy way to
pick some unicode character (whose code you don't remember) and
copying it into some text.
Would admin/unidata/UnicodeData.txt do?
I guess given the “copying” part, the request is more along the
We distribute that file with Emacs, so "copying" is irrelevant,
I think.
You cannot /copy/ a random Unicode character from
UnicodeData.txt – precisely because there’re /no/ non-ASCII
characters in that file in the first place.

Arguably, you cannot pick one, either, if you only know how it
/looks/ – not how it’s named. (As in: named in English.)

Otherwise, I tend to keep a copy of [1] at hand, sure.

[1] http://unicode.org/Public/UNIDATA/NamesList.txt

[…]
Post by Eli Zaretskii
Did you try to make this longer than 4 lines in a well-covered part
of the BMP? Most of Unicode codepoints on most end-user machines
will display as glyphless boxes, and that's _after_ Emacs searches
like hell after each character system-wide. IOW, such a feature
would be an annoyance, IMO.
On a tty frame, it surely wouldn’t. But I’ve got your point.

One more reason to use a dynamic list, BTW. Even more so if
there’s a way to check whether the glyph is available (or,
rather, was available when Emacs last checked) from Lisp.

[…]
--
FSF associate member #7257 http://am-1.org/~ivan/ … 3013 B6A0 230E 334A
Stefan Monnier
2015-05-04 22:00:03 UTC
Permalink
Post by Ivan Shmakov
Arguably, you cannot pick one, either, if you only know how it
/looks/ – not how it’s named. (As in: named in English.)
BTW, the completion in C-x 8 RET will not only show you the character
name but will also (try to) display the actual character as an
annotation in the *Completions* buffer.


Stefan
Richard Stallman
2015-05-05 14:38:53 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Ivan Shmakov
Post by Eli Zaretskii
Would admin/unidata/UnicodeData.txt do?
It doesn't do the job, because it doesn't contain the characters
themselves.
Post by Ivan Shmakov
000100 Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ
That's what I have in mind. Perhaps we should have a command that
generates it.

However, in addition to these lines of characters, it should have
other lines with the names of the scripts and the languages they
belong to, so you can search for those.

If you type RET on a character, it should visit
admin/unidata/UnicodeData.txt and move to the corresponding line.
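The jump itself is simple, because each UnicodeData.txt line starts with the
character's uppercase hex codepoint; a sketch, with a placeholder path and an
invented function name:

(defun my-goto-unicodedata-line (&optional char)
  "Visit UnicodeData.txt and move to the line describing CHAR.
CHAR defaults to the character at point."
  (interactive)
  (let ((char (or char (char-after))))
    (find-file "admin/unidata/UnicodeData.txt") ; adjust to your source tree
    (goto-char (point-min))
    (re-search-forward (format "^%04X;" char))
    (beginning-of-line)))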

Likewise, admin/unidata/UnicodeData.txt could have a special major
mode, so that typing RET on the line describing some character
switches to the all-of-unicode buffer and goes to the right character
in it.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Ted Zlatanov
2015-05-05 14:49:36 UTC
Permalink
On Tue, 05 May 2015 10:38:53 -0400 Richard Stallman <***@gnu.org> wrote:

RS> If you type RET on a character, it should visit
RS> admin/unidata/UnicodeData.txt and move to the corresponding line.

Could something like eldoc be used instead to show the information and
all the shortcuts to that character without switching buffers?

Ted
Eli Zaretskii
2015-05-05 15:32:39 UTC
Permalink
Date: Tue, 05 May 2015 10:49:36 -0400
Could something like eldoc be used instead to show the information and
the all the shortcuts to that character without switching buffers?
Sounds like a natural extension of "C-x =".

(And no, I don't think that showing that info without an explicit user
command is a good idea in this case. Eldoc has a very different use
case in mind.)
Ivan Shmakov
2015-05-05 16:05:37 UTC
Permalink
Post by Eli Zaretskii
Post by Ted Zlatanov
Date: Tue, 05 May 2015 10:49:36 -0400
Could something like eldoc be used instead to show the information
and the all the shortcuts to that character without switching
buffers?
Sounds like a natural extension of "C-x =".
Agreed.
Post by Eli Zaretskii
(And no, I don't think that showing that info without an explicit
user command is a good idea in this case. Eldoc has a very different
use case in mind.)
I’m not fond of Eldoc, but I presume that after an explicit user
command such as M-x unicode-data-mode it could be fine.

I’d also prefer for that same mode to support NamesList.txt.
--
FSF associate member #7257 http://am-1.org/~ivan/ … 3013 B6A0 230E 334A
Eli Zaretskii
2015-05-05 15:31:09 UTC
Permalink
Date: Tue, 05 May 2015 10:38:53 -0400
Post by Eli Zaretskii
Would admin/unidata/UnicodeData.txt do?
It doesn't do the job, becuase it doesn't contain the characters
themselves.
You mean, the glyphs? (It does show the codepoint, so you can easily
display the character via "C-x 8 RET".)

As for showing the glyphs, visiting a file with large number of
characters runs a high risk of being an annoyance due to the
corresponding fonts being unavailable on the system. E.g., "C-h H",
which only shows a small part of those, takes 4 sec on my system with
an optimized build, and about 6 in a non-optimized build.

So if we provide such a command, IMO we should prompt for a block of
codepoints, and display only that block.
If you type RET on a character, it should visit
admin/unidata/UnicodeData.txt and move to the corresponding line.
I'm not sure showing UnicodeData.txt in its raw form will be useful.
Most people won't know how to interpret the attributes encoded there;
about the only understandable parts are the codepoint and the name.
And we already show this in human-readable form in "C-u C-x =", so we
could simply reuse the same code here.
Ivan Shmakov
2015-05-05 16:20:50 UTC
Permalink
Post by Eli Zaretskii
Post by Richard Stallman
Date: Tue, 05 May 2015 10:38:53 -0400 From: Richard Stallman
[…]
Post by Eli Zaretskii
As for showing the glyphs, visiting a file with large number of
characters runs a high risk of being an annoyance due to the
corresponding fonts being unavailable on the system. E. g., "C-h H",
which only shows a small part of those, takes 4 sec on my system with
an optimized build, and about 6 in a non-optimized build.
So if we provide such a command, IMO we should prompt for a block of
codepoints, and display only that block.
No objection on my part, but I’d rather provide the “buttons” to
move to the previous and next blocks in that same buffer.

OTOH, what would it take to improve the display time in such a
case? Unless I be mistaken, other (as in: mainstream; think of,
say, Firefox) software generally /does/ handle that case
reasonably well.
Post by Eli Zaretskii
Post by Richard Stallman
If you type RET on a character, it should visit
admin/unidata/UnicodeData.txt and move to the corresponding line.
I'm not sure showing UnicodeData.txt in its raw form will be useful.
Most people won't know how to interpret the attributes encoded there,
about the only understandable parts are the codepoint and the name.
What about NamesList.txt?
Post by Eli Zaretskii
And we already show this in human-readable form in "C-u C-x =", so we
could simply reuse the same code here.
The problem with C-u C-x = is that it describes a single
character at a time, while it may be beneficial to see some
“related” (in either name or number) characters as well.
--
FSF associate member #7257 http://am-1.org/~ivan/ … 3013 B6A0 230E 334A
Eli Zaretskii
2015-05-05 16:42:33 UTC
Permalink
Date: Tue, 05 May 2015 16:20:50 +0000
Post by Eli Zaretskii
So if we provide such a command, IMO we should prompt for a block of
codepoints, and display only that block.
No objection on my part, but I’d rather provide the “buttons” to
move to the previous and next blocks in that same buffer.
That could be okay, too, but it cannot replace going directly to a
block. Imagine going all the way to, say, the Aegean Numbers
block by clicking Next, Next, Next, ...
OTOH, what would it take to improve the display time in such a
case?
How can you improve it when fonts don't exist on the target machine?
Unless I be mistaken, other (as in: mainstream; think of,
say, Firefox) software generally /does/ handle that case
reasonably well.
I don't know anything about that, except that Emacs uses the same
libraries for accessing fonts. Unfortunately, we don't have on board
an active enough maintainer who is knowledgeable about font handling
(both in general and in Emacs). Feel free to fill the niche.
Post by Eli Zaretskii
Post by Richard Stallman
If you type RET on a character, it should visit
admin/unidata/UnicodeData.txt and move to the corresponding line.
I'm not sure showing UnicodeData.txt in its raw form will be useful.
Most people won't know how to interpret the attributes encoded there,
about the only understandable parts are the codepoint and the name.
What about NamesList.txt?
What do you mean? NamesList.txt contains different information, and
once again at least part of it will not be easily understood, or even
useful to most people, I think.
Post by Eli Zaretskii
And we already show this in human-readable form in "C-u C-x =", so we
could simply reuse the same code here.
The problem with C-u C-x = is that it describes a single
character a time, while it may be beneficial to see some
“related” (in either name or number) characters as well.
Well, loops are available... But I very much doubt you'll be able to
display enough useful information in a single line that way.
Richard Stallman
2015-05-06 13:09:09 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Ivan Shmakov
Post by Eli Zaretskii
I'm not sure showing UnicodeData.txt in its raw form will be useful.
Most people won't know how to interpret the attributes encoded there,
about the only understandable parts are the codepoint and the name.
Even if the user understands only those two, the feature is useful
nonetheless.

Some slightly different feature might be better. I am not addressing those
details.
Post by Ivan Shmakov
What about NamesList.txt?
I don't see a file named NamesList.txt there.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Eli Zaretskii
2015-05-06 15:33:48 UTC
Permalink
Date: Wed, 06 May 2015 09:09:09 -0400
Post by Ivan Shmakov
Post by Eli Zaretskii
I'm not sure showing UnicodeData.txt in its raw form will be useful.
Most people won't know how to interpret the attributes encoded there,
about the only understandable parts are the codepoint and the name.
Even if the user understands only those two, the feature is useful
nonetheless.
Then perhaps we should show only the parts that are easily
understandable.
Post by Ivan Shmakov
What about NamesList.txt?
I don't see a file named NamesList.txt there.
It's part of the Unicode Standard, you can find it here:

http://unicode.org/Public/UNIDATA/NamesList.txt
Drew Adams
2015-05-09 00:03:53 UTC
Permalink
The discussion has gone in a few directions beyond `C-x 8 shorthands'.

I understand that Richard would like a help buffer that groups
multiple glyphs together in blocks or in categories of various kinds.

I don't have that to offer, but maybe this would help in a different
way: library `apu.el' provides apropos help for Unicode chars.

Command `apropos-unicode' shows you the Unicode chars that match
an apropos pattern you specify: a regexp or a space-separated list
of words. The chars whose names match are shown in a help buffer,
along with the names and code points (decimal and hex).

You can keep several such buffers open, for use with different
subsets of chars you are interested in.

In the help buffer, you can use these keys to act on the char
described on the current line:

* `RET' or `mouse-2' - see info about it (`C-u C-x =' output).
* `i' - google for more information about it.
* `^' - insert it at point in the buffer where you invoked
`apropos-unicode'.
* `c' - define a command to insert it that has the same name.
E.g. `greek-small-letter-phi'. (You need library
`ucs-cmds.el' for this.)
* `k' - globally bind a key to insert it.
* `l' - locally bind a key to insert it.
* `M-w' - copy it to the `kill-ring'.
* `M-y' - copy it to the secondary selection.

The library is here: http://www.emacswiki.org/emacs/download/apu.el.

TODO maybe:

* Pop-up a glyph enlargement (e.g., by mouseover or key).
* Be able to match code points too in the pattern.
* Be able to choose chars of a given syntax class or other group.
* Add a header line and use it to sort by different columns.
* Add an option of patterns to exclude from matches, to exclude
things like `TAG' and `VARIATION SELECTOR'.
* Be able to easily match a base char. You can do this OK now
using a regexp such as ` \(BASE-CHAR \|$\)', but maybe there
is a better way.

Is there a good way to exclude chars whose glyphs are essentially
(apparently) whitespace, e.g., `MUSICAL SYMBOL END TIE'?

Is there a way to exclude chars that cannot be shown in the current
font? (Asked previously.)
Eli Zaretskii
2015-05-09 08:22:09 UTC
Permalink
Date: Fri, 8 May 2015 17:03:53 -0700 (PDT)
I understand that Richard would like a help buffer that groups
multiple glyphs together in blocks or in categories of various kinds.
I don't have that to offer, but maybe this would help in a different
way: library `apu.el' provides apropos help for Unicode chars.
Command `apropos-unicode' shows you the Unicode chars that match
an apropos pattern you specify: a regexp or a space-separated list
of words. The chars whose names match are shown in a help buffer,
along with the names and code points (decimal and hex).
I hope I've succeeded in explaining in my previous messages that just
matching the name against a regexp is not enough: you will most of the
time get a lot of candidates. IOW, it's not focused enough, and the
reason is that the name of a character doesn't tell enough about the
character to be able to filter them only based on their names.

What we need is selection of candidates based on the character
attributes, and their language/script/block. This could, of course,
use the completion/apropos infrastructure, but the completion
predicates must be smarter, and we should have a suitable UI for the
user to specify her partial knowledge of the characters she is after.

If you or someone else wants to work on this, I can provide advice as
to how to use Unicode character properties for such filtering.
* Add an option of patterns to exclude from matches, to exclude
things like `TAG' and `VARIATION SELECTOR'.
The UI cannot be in these technical terms, because the user will most
probably fail to understand what that means for the search results.
E.g., it's quite probable that someone who wants emoji characters
_will_ want the VARIATION SELECTOR included, but how many users will
understand that excluding it will not allow them to specify emoji
style of certain characters?
* Be able to easily match a base char. You can do this OK now
using a regexp such as ` \(BASE-CHAR \|$\)', but maybe there
is a better way.
I suggested the Custom-style interface using widgets.
Is there a good way to exclude chars whose glyphs are essentially
(apparently) whitespace, e.g., `MUSICAL SYMBOL END TIE'?
I'm not sure "mostly whitespace" is a good specification for those. I
suppose someone who wants musical symbols will want this one as well.
Is there a way to exclude chars that cannot be shown in the current
font? (Asked previously.)
Answered previously.
Richard Stallman
2015-05-06 13:09:26 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Eli Zaretskii
Post by Richard Stallman
Post by Eli Zaretskii
Would admin/unidata/UnicodeData.txt do?
It doesn't do the job, becuase it doesn't contain the characters
themselves.
You mean, the glyphs?
Yes, exactly.

(It does show the codepoint, so you can easily
Post by Eli Zaretskii
display the character via "C-x 8 RET".)
You mean, one character at a time?

I want to be able to scan quickly through the buffer looking at
lots of characters to find the one I want. If I have to type
a command for _each character_, just to see it, that is useless
for the purpose.

C-x 8 RET is even worse than that, because it requires
_copying_ the name of the character. To actually see the character
point is on requires
M-f C-f C-SPC C-s ; C-b M-w C-a C-x 8 RET C-y SPC

I could make that a keyboard macro and repeat it many times
to get all these codes into the buffer. It would take a long time.
Furthermore, it would show only one character per line,
so few characters would appear on the screen at any time.
To look at them all would require lots of scrolling.

To do this job well requires output like that of the short Lisp
program someone sent, showing only characters and NOT the names,
with many characters per line.

The buffer should be divided into stanzas, each one labeled with the
name of its script or portion thereof.
Post by Eli Zaretskii
As for showing the glyphs, visiting a file with large number of
characters runs a high risk of being an annoyance due to the
corresponding fonts being unavailable on the system.
We could set up a way to test whether a code point can be
displayed, and skip scripts that can't be displayed.

So if we provide such a command, IMO we should prompt for a block of
codepoints, and display only that block.

It is inconvenient to expect users to know the codepoint values.
Suppose I want to see Greek letters -- I have no idea what codepoints
those are, and I should not need to know them in order to specify
"Greek letters".

To specify a script by name as an argument would be ok,
but not very convenient. Here's a simpler and more convenient interface:

The header line for each script could have a [hide] or [show] button
to select visibility of that script. Initially they could all be
hidden, and the user would expose those that she is interested in.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Eli Zaretskii
2015-05-06 16:27:44 UTC
Permalink
Date: Wed, 06 May 2015 09:09:26 -0400
Post by Eli Zaretskii
Post by Richard Stallman
Post by Eli Zaretskii
Would admin/unidata/UnicodeData.txt do?
It doesn't do the job, becuase it doesn't contain the characters
themselves.
You mean, the glyphs?
Yes, exactly.
(It does show the codepoint, so you can easily
Post by Eli Zaretskii
display the character via "C-x 8 RET".)
You mean, one character at a time?
I want to be able to scan quickly through the buffer looking at
lots of characters to find the one I want. If I have to type
a command for _each character_, just to see it, that is useless
for the purpose.
Maybe I don't understand the use case you have in mind. I thought the
use case was that you already know the character's name, at least
approximately, and want to look up its code, to type it faster.
C-x 8 RET is even worse than that, because it requires
_copying_ the name of the character. To actually see the character
point is on requires
M-f C-f C-SPC C-s ; C-b M-w C-a C-x 8 RET C-y SPC
"C-x 8 RET" accepts the codepoint in hex, so if you are already
looking at the line that defines the character, all you need is to
type a 4-, sometimes 5-hex-digit number.

And if you want to type the name, "C-x 8 RET" provides completion, so
no need for such a complicated dance for copying the name.
I could make that a keyboard macro and repeat it many times
to get all these codes into the buffer. It would take a long time.
Furthermore, it would show only one character per line,
so few characters would appear on the screen at any time.
To look at them all would require lots of scrolling.
I don't really see how looking for a character with your eyes could be
a convenient feature, except in corner cases with a small number of
simple-looking characters. Even for Latin characters, there
are many similar shapes, like Ả and Ă or Ő and Ố, and they are spread
all over the Unicode range. How would you go about finding your
character, if all you have is some vague idea of its shape (which,
btw, could look quite different with different fonts)? Sounds like a
very inefficient way to me.

I think we must assume the user has some idea about the character:
either its approximate name, or at least the block or script to which
it belongs. Then we could display some reasonably manageable subset
of characters. We could further help by asking about the base
character (the above examples have either A or O as their base
character), because if the user knows that, with some scripts the
number of potential candidates will go down drastically. But even
when the base character is known, the number of candidates is not
negligible: e.g., there are 46 characters in the Unicode database that
are somehow related to A.
The buffer shoulod be divided into stanzas, each one labeled with the
name of its script or portion thereof.
Not sure what you mean by "script" here. Emacs currently knows about
almost 100 scripts defined by Unicode, so even displaying a couple of
lines for each one will make a large buffer. Isn't it better to allow
the user to specify one, with completion?
Post by Eli Zaretskii
As for showing the glyphs, visiting a file with large number of
characters runs a high risk of being an annoyance due to the
corresponding fonts being unavailable on the system.
We could set up a way to test whether a code point can be
displayed, and skip scripts that can't be displayed.
Alas, we don't know which cannot be displayed until we've tried and
failed.
So if we provide such a command, IMO we should prompt for a block of
codepoints, and display only that block.
It is inconvenient to expect users to know the codepoint values.
Unicode blocks have names, so providing completion for them would do
the job, I think. The entire Unicode codespace is divided into about
200 blocks, so if the user knows, or can guess the one she needs, that
will probably limit the search for the character to some reasonable
quantity.

Moreover, some scripts share the same blocks, and vice versa. So
being able to specify just scripts or just blocks is not enough; we
need both.

I think we need all these methods, possibly more, because you may not
necessarily know or guess easily where to look. For example, there
are certain characters that appear as mathematical symbols in addition
to their "normal" places, so unless the user already knows in which
block to look, they will find the "base character" method very useful,
and without it could very well miss their character.
Suppose I want to see Greek letters -- I have no idea what codepoints
those are, and I should not need to know them in order to specify
"Greek letters".
You'd only need to know "Greek", and all the Greek blocks will be
displayed. If you happen to know more, like "Greek Extended", it will
further limit the number of characters to view. And, of course, there
are complications: you might think it's a Greek character, but it
could really be a math symbol or a Cyrillic character instead.
The header line for each script could have a [hide] or [show] button
to select visibility of that script. Initially they could all be
hidden, and the user would expose those that she is interested in.
A 100-button buffer is not very convenient, especially when you have
only an approximate idea about the script you are after (e.g., is that
funny shape part of "Miscellaneous Technical" block or "Geometric
Shapes"?)
Richard Stallman
2015-05-07 22:22:25 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Eli Zaretskii
Maybe I don't understand the use case you have in mind. I thought the
use case was that you already know the character's name, at least
approximately, and want to look up its code, to type is faster.
I know what the character looks like. It is NOT easy to guess
what the name would be. There are many possibilities.
Post by Eli Zaretskii
Post by Richard Stallman
C-x 8 RET is even worse than that, because it requires
_copying_ the name of the character. To actually see the character
point is on requires
M-f C-f C-SPC C-s ; C-b M-w C-a C-x 8 RET C-y SPC
"C-x 8 RET" accepts the codepoint in hex, so if you are already
looking at the line that defines the character, all you need is to
type a 4-, sometimes 5-hex-digit number.
And if you want to type the name, "C-x 8 RET" provides completion, so
no need for such a complicated dance for copying the name.
Are you kidding? Just to see 32 characters' glyphs
I'd have to type 128 input characters.

The feature I want would show 32 glyphs on each line,
and many lines would fit on the screen at once.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Eli Zaretskii
2015-05-08 05:48:24 UTC
Permalink
Date: Thu, 07 May 2015 18:22:25 -0400
Post by Eli Zaretskii
Maybe I don't understand the use case you have in mind. I thought the
use case was that you already know the character's name, at least
approximately, and want to look up its code, to type is faster.
I know what the character looks like. It is NOT easy to guess
what the name would be. There are many possibilities.
If that's the use case (I don't think you described it before), then
we indeed need a convenient facility to browse character glyphs. But
that facility should allow specifying additional information, such as
the script name, or block name, or the base character, otherwise you
are likely to give up due to the sheer number of characters to view.
Post by Eli Zaretskii
Post by Richard Stallman
C-x 8 RET is even worse than that, because it requires
_copying_ the name of the character. To actually see the character
point is on requires
M-f C-f C-SPC C-s ; C-b M-w C-a C-x 8 RET C-y SPC
"C-x 8 RET" accepts the codepoint in hex, so if you are already
looking at the line that defines the character, all you need is to
type a 4-, sometimes 5-hex-digit number.
And if you want to type the name, "C-x 8 RET" provides completion, so
no need for such a complicated dance for copying the name.
Are you kidding? Just to see 32 characters' glyphs
I'd have to type 128 input characters.
No, you need to type much less. A codepoint, if you know it, is at
most 5 characters, and for name completion, typing something like

C-x 8 RET greek <TAB> <TAB>

(all in all 10 characters) will have the completions buffer pop up.
Each completion candidate has the character glyph displayed right next
to it, so you could use that for finding the one you are looking for.
Richard Stallman
2015-05-08 18:46:58 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Eli Zaretskii
Post by Richard Stallman
Are you kidding? Just to see 32 characters' glyphs
I'd have to type 128 input characters.
No, you need to type much less. A codepoint, if you know it, is at
most 5 characters,
I miscalculated. C-x 8 RET codepoint RET is 8 characters (or 9).
Thus, to see 32 characters' glyphs that way, I'd need to type
between 256 and 288 input characters.
Post by Eli Zaretskii
and for name completion, typing something like
C-x 8 RET greek <TAB> <TAB>
That is a lot less input than the other method, and is sort of usable,
but inconvenient. I tried it in that very case.

It includes Coptic characters as well as Greek; I don't know why.
It also includes many punctuation characters, and letters with diacritics,
that are in a different part of Unicode, and are not normal Greek letters.

If I could see the glyphs of the area of Unicode which alpha is in, I could
easily see the character I want.

And when I want to enter some non-ASCII punctuator, if I could see
the glyphs of that part of Unicode, it would be easy.
I don't want to have to remember their official names.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Eli Zaretskii
2015-05-09 07:44:30 UTC
Permalink
Date: Fri, 08 May 2015 14:46:58 -0400
Post by Eli Zaretskii
Post by Richard Stallman
Are you kidding? Just to see 32 characters' glyphs
I'd have to type 128 input characters.
No, you need to type much less. A codepoint, if you know it, is at
most 5 characters,
I miscalculated. C-x 8 RET codepoint RET is 8 characters (or 9).
Thus, to see 32 characters' glyphs that way, I'd need to type
between 256 and 288 input characters.
If you are not looking for a single specific character by its
codepoint, then typing the codepoint makes no sense.
Post by Eli Zaretskii
and for name completion, typing something like
C-x 8 RET greek <TAB> <TAB>
That is a lot less input than the other method, and is sort of usable,
but inconvenient. I tried it in that very case.
It includes Coptic characters as well as Greek; I don't know why.
I don't know either. If I type TAB after just "greek", then I see no
Coptic characters in completion candidates. What did you type before
asking for completion?
It also includes many punctuation characters, and letters with
diacritics, that are in a different part of Unicode, and are not
normal Greek letters.
This is simple Emacs completion at work: it brings you every character
whose name begins with "GREEK".

In any case, when I complete on "greek", I see only punctuation and
diacriticals from the same block as alpha, so I don't think we show
irrelevant punctuation. We do show some ancient characters from other
Greek blocks than the one where alpha lives, but they are not
punctuation.

As for letters with diacriticals, how would Emacs know that you don't
need those? I think the use case where the user looks for characters
with diacriticals is much more plausible than when she looks for some
simple character like alpha. But if we think that looking for
characters "with diacriticals" or "without diacriticals" is an
important use case, we could provide that as well, based on the
'decomposition' property of the characters.
If I could see the glyphs of the area of Unicode which alpha is in, I could
easily see the character I want.
If you only want letters, you can give a more accurate spec to
completion: "C-x 8 RET greek*letter <TAB> <TAB>". (The asterisk is a
wildcard character.) That still produces quite a long list, but no
symbols, punctuation, or lone diacriticals.

Alternatively, you'd need to know the Unicode block in which those
characters live, or find it by completing on block names. (This
block's name is "Greek and Coptic".)
And when I want to enter some non-ASCII punctuator, if I could see
the glyphs of that part of Unicode, it would be easy.
I don't want to have to remember their official names.
Only a small part of (language- and script-agnostic) punctuation
characters have their own block. The language-specific punctuation is
in the same block as their main characters.

We could have a feature which would display punctuation characters,
either specific to a language/script or not. Such a feature would
need to use [:punct:] regexp (we'd need to extend [:punct:] to use
Unicode character properties). Similarly, using [:alpha:] would bring
only letters.
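A sketch of that kind of filtering, with the caveat already noted that
[:punct:] is not yet based on Unicode properties; the function name is
invented:

(require 'cl-lib)

(defun my-chars-matching-class (lo hi class-regexp)
  "Return characters in the codepoint range [LO, HI] matching CLASS-REGEXP."
  (cl-loop for c from lo to hi
           when (string-match-p class-regexp (string c))
           collect c))

;; Letters from the Greek and Coptic block:
;; (my-chars-matching-class #x0370 #x03FF "[[:alpha:]]")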

I hope you now agree that the use case of searching for a character
with only some vague idea about its appearance and/or name needs some
pretty sophisticated (and overlapping) capabilities for allowing the
user to specify what she knows, before showing the possible
candidates. I'm not really sure what would be a good UI for such
specifications; perhaps something using the widget library a-la
Customize, where you can check or uncheck certain options and specify
values for non-boolean fields.
Richard Stallman
2015-05-09 14:17:15 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Eli Zaretskii
Post by Richard Stallman
That is a lot less input than the other method, and is sort of usable,
but inconvenient. I tried it in that very case.
It includes Coptic characters as well as Greek; I don't know why.
I don't know either. If I type TAB after just "greek", then I see no
Coptic characters in completion candidates. What did you type before
asking for completion?
I typed C-x 8 RET greek TAB TAB.

All the NAMES that appear start with "Greek", but when I inserted
GREEK CAPITAL LETTER HORI and examined it with C-u C-x =,
it said

name: COPTIC CAPITAL LETTER HORI
old-name: GREEK CAPITAL LETTER HORI

I didn't notice the old-name field the previous time. I suppose that
explains why it was included in that completion table. Anyway that
completion list is over 440 lines long, and not very useful.
Post by Eli Zaretskii
Post by Richard Stallman
It also includes many punctuation characters, and letters with
diacritics, that are in a different part of Unicode, and are not
normal Greek letters.
This is simple Emacs completion at work: it brings you every character
whose name begins with "GREEK".
Do you think I don't know that?

_Why_ it does what it does is not the issue. The only pertinent point
is that that it isn't a convenient way to do what I want to do.
Post by Eli Zaretskii
As for letters with diacriticals, how would Emacs know that you don't
need those?
That question is spurious. Remember, I don't want to enter a
character name at all. I want to see all the glyphs.

Someone else suggested that C-x 8 RET might be a convenient alternate
method. I am explaining why it isn't.

If I had the feature I want, I would see the segment including the
usual Greek letters, and the far more numerous diacriticalized ones
would not be there (because they come later in Unicode).
Post by Eli Zaretskii
If you only want letters, you can give a more accurate spec to
completion: "C-x 8 RET greek*letter <TAB> <TAB>". (The asterisk is a
wildcard character.) That still produces quite a long list,
Indeed, it is still inconvenient.
Post by Eli Zaretskii
I hope you now agree that the use case of searching for a character
with only some vague idea about its appearance and/or name needs some
pretty sophisticated (and overlapping) capabilities for allowing the
user to specify what she knows, before showing the possible
candidates.
We seem to be totally miscommunicating. I DON'T WANT to search for
them by name. I never asked for that.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Eli Zaretskii
2015-05-09 14:36:39 UTC
Permalink
Date: Sat, 09 May 2015 10:17:15 -0400
I typed C-x 8 RET greek TAB TAB.
All the NAMES that appear start with "Greek", but when I inserted
GREEK CAPITAL LETTER HORI and examined it with C-u C-x =,
it said
name: COPTIC CAPITAL LETTER HORI
old-name: GREEK CAPITAL LETTER HORI
I didn't notice the old-name field the previous time. I suppose that
explains why it was included in that completion table.
Yes. Greek and Coptic characters share the same Unicode block.
Post by Eli Zaretskii
Post by Richard Stallman
It also includes many punctuation characters, and letters with
diacritics, that are in a different part of Unicode, and are not
normal Greek letters.
This is simple Emacs completion at work: it brings you every character
whose name begins with "GREEK".
Do you think I don't know that?
Do you think I don't know you know?

You asked me some questions that you should be sure I knew also, and
yet I didn't react like that.

I find your attitude in this thread unnecessarily offensive.
Post by Eli Zaretskii
I hope you now agree that the use case of searching for a character
with only some vague idea about its appearance and/or name needs some
pretty sophisticated (and overlapping) capabilities for allowing the
user to specify what she knows, before showing the possible
candidates.
We seem to be totally miscommunicating. I DON'T WANT to search for
them by name. I never asked for that.
Where did I mention search by name? I didn't, because I really
don't think it's convenient enough. It's what we have now, but it is
not what I think should be the method of looking up an unknown
character.

But your idea of showing dozens or hundreds of characters isn't
workable, either.

Like I wrote elsewhere, we need a way for the user to specify what she
knows, and then show the characters that match the spec. The
specification could include one or more of the following:

. Script name
. Language name
. Unicode block name
. Character class (alphabetical, numerical, punctuation, etc.)
. Base character
. With/without diacriticals
Richard Stallman
2015-05-08 18:46:56 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Eli Zaretskii
If that's the use case (I don't think you described it before), then
we indeed need a convenient facility to browse character glyphs. But
that facility should allow specifying additional information, such as
the script name, or block name, or the base character; otherwise you
are likely to give up due to the sheer number of characters to view.
I agree that those additional features would make it better.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Drew Adams
2015-05-08 20:18:40 UTC
Permalink
Post by Eli Zaretskii
Post by Richard Stallman
We could set up a way to test whether a code point can be
displayed, and skip scripts that can't be displayed.
Alas, we don't know which cannot be displayed until we've tried and
failed.
Where is this try-and-fail done? Is it only in C code, or is
there some Lisp function (predicate) that you can call to tell
you whether a given char can be displayed in a given (e.g. the
current) font.

Even if such a predicate would need to try displaying, to find
out whether it is possible, this could be useful.

It would be good if we could, for example, optionally show only
chars that the current font can display.
Eli Zaretskii
2015-05-09 07:59:36 UTC
Permalink
Date: Fri, 8 May 2015 13:18:40 -0700 (PDT)
Post by Eli Zaretskii
Post by Richard Stallman
We could set up a way to test whether a code point can be
displayed, and skip scripts that can't be displayed.
Alas, we don't know which cannot be displayed until we've tried and
failed.
Where is this try-and-fail done? Is it only in C code, or is
there some Lisp function (predicate) that you can call to tell
you whether a given char can be displayed in a given (e.g. the
current) font.
These two are not alternatives; they can (and do) live together.

The search for a suitable font is mostly in C, but we do have a
capability to test from Lisp whether a given character can be
displayed: 'char-displayable-p'. If you are interested in a specific
font, you can use 'font-get-glyphs' for similar info.
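
For instance, a filter along these lines (a minimal sketch; the function
name is made up, only char-displayable-p itself is an existing facility)
would keep just the characters the current session can show:

(defun my-displayable-subset (chars)
  "Return the members of CHARS (a list of characters) that can be displayed.
Relies on `char-displayable-p', so on a text terminal this reflects the
terminal coding system rather than the available fonts."
  (delq nil (mapcar (lambda (ch)
                      (and (char-displayable-p ch) ch))
                    chars)))

;; e.g. (my-displayable-subset '(?· ?µ ?⨌)) should drop ?⨌ when no font covers it.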
Paul Eggert
2015-05-04 18:40:25 UTC
Permalink
Post by Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
How about also adding s, t, S, T with cedilla, dotless i, and I with dot.
Also c and C with a hacek.
Sure, I can look into that. Also the slashed L and l, perhaps, so that
we can spell names like Łukasiewicz. If we want to be more ambitious,
we could support the Latin letters in any ISO 8859 variant, which would
include the following additions (this includes all the letters you
mentioned):

ă Ă ą Ą ā Ā ḃ Ḃ ć Ć ĉ Ĉ č Č ċ Ċ ď Ď ḋ Ḋ đ Đ ě Ě ė Ė ę Ę ē Ē ḟ Ḟ ğ Ğ ĝ Ĝ
ġ Ġ ģ Ģ ĥ Ĥ ħ Ħ ĩ Ĩ į Į ī Ī ı İ ĵ Ĵ ķ Ķ ĺ Ĺ ľ Ľ ł Ł ļ Ļ ṁ Ṁ ń Ń ň Ň ņ Ņ
ŋ Ŋ ő Ő ō Ō ṗ Ṗ ĸ ŕ Ŕ ř Ř ŗ Ŗ ś Ś ŝ Ŝ ṡ Ṡ ş Ş ť Ť ṫ Ṫ ŧ Ŧ ţ Ţ ŭ Ŭ ů Ů ű
Ű ũ Ũ ų Ų ū Ū ẃ Ẃ ẁ Ẁ ŵ Ŵ ẅ Ẅ ỳ Ỳ ŷ Ŷ ź Ź ż Ż

It may be difficult to fit all these into the existing C-x 8 space, though.
Post by Richard Stallman
C-x 8 C-h is a good way of seeing what all the options are.
It may be worth documenting.
It is documented in the manual now.
Post by Richard Stallman
It would be nice to have C-u C-x = show the specific C-x 8 sequence
for a character, if there is one.
Yes, that'd be nice to add.
Post by Richard Stallman
it would be good to have a file that consists of all of
Unicode in numeric order. That would provide an easy way to pick some
Unicode character (whose code you don't remember) and copy it into
some text.
Although Eli mentioned that we already have such a file, it isn't
installed. Perhaps we could install it in the etc directory (next to
AUTHORS, CONTRIBUTE, etc.) and then have 'C-h u' visit it.
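
Something along these lines could also generate such a listing on the fly,
as a stopgap (a toy sketch over a small range; the command name and the
range are arbitrary):

(defun my-unicode-slice ()
  "Show a slice of Unicode in codepoint order, one character per line."
  (interactive)
  (with-output-to-temp-buffer "*unicode slice*"
    (dotimes (i #x600)                  ; just U+0020..U+061F, for the sketch
      (let* ((ch (+ #x20 i))
             (name (get-char-code-property ch 'name)))
        (when name                      ; skip unassigned codepoints
          (princ (format "U+%04X\t%c\t%s\n" ch ch name)))))))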
Paul Eggert
2015-05-05 06:03:19 UTC
Permalink
Post by Richard Stallman
How about also adding s, t, S, T with cedilla, dotless i, and I with dot.
Also c and C with a hacek.
Sure, I can look into that. Also the slashed L and l, perhaps, so that we can
spell names like Łukasiewicz.
Attached is a revised patch that adds support for the abovementioned characters,
plus other Latin characters that might be encountered by people mentioning
foreign names. It makes room by rejiggering three of the less-commonly used
entries in the C-x 8 table.
Ivan Shmakov
2015-05-06 22:20:54 UTC
Permalink
Post by Paul Eggert
Post by Paul Eggert
Post by Richard Stallman
How about also adding s, t, S, T with cedilla, dotless i, and I
with dot. Also c and C with a hacek.
Sure, I can look into that. Also the slashed L and l, perhaps, so
that we can spell names like Łukasiewicz.
Attached is a revised patch that adds support for the abovementioned
characters, plus other Latin characters that might be encountered by
people mentioning foreign names. It makes room by rejiggering three
of the less-commonly used entries in the C-x 8 table.
--------------090904020002020306060104
Content-Type: text/x-patch;
name="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"
This MIME part sure wants ‘; charset=UTF-8’. Otherwise, Gnus
does no decoding, and Emacs shows the contents with the likes of
\304\260.
Post by Paul Eggert
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment;
filename="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"
From aafde36c45bd0341b07707409873fb93cbbb33f1 Mon Sep 17 00:00:00 2001
Date: Mon, 4 May 2015 22:41:20 -0700
Subject: [PATCH] C-x 8 shorthands for curved quotes, Euro, etc.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
I presume that /this/ was intended to be the MIME part /header/,
yet it ended up being in the part /body./
Post by Paul Eggert
+ withdrawn still works character
+ C-x 8 . C-x 8 . SPC · U+00B7 MIDDLE DOT
+ C-x 8 = C-x 8 = SPC ¯ U+00AF SPACING MACRON
+ C-x 8 u C-x 8 m µ U+00B5 MICRO SIGN
I believe that both C-x 8 . and C-x 8 u are too convenient to be
dropped without more discussion. For one thing, · seems more
“common” a character than İ. Other than that, C-x 8 . . feels
easier to type than C-x 8 SPC.
Post by Paul Eggert
-;;; iso-transl.el --- keyboard input definitions for ISO 8859-1 -*- coding: utf-8 -*-
+;;; iso-transl.el --- keyboard input for ISO characters -*- coding: utf-8 -*-
I guess we may safely state “ISO 10646” here.
Post by Paul Eggert
+;; This package supports all characters defined by ISO 8859-1,
+;; along with many other Latin characters and a few other characters
+;; commonly used in English and basic math.

 And may also mention it here.
Post by Paul Eggert
("-" . [?­])
- ("*." . [?·])
The removal above doesn’t seem to be strictly necessary. The
same for the *= and *u ones.
Post by Paul Eggert
("~~" . [?¬])
+ ("=A" . [?Ā])
+ ("=a" . [?ā])
+ ("uA" . [?Ă])
+ ("ua" . [?ă])
+ ("gA" . [?Ą])

 Also, did you consider generating this list automatically,
based on the codepoint properties already known to Emacs?
Something along the lines of the function MIMEd, which readily
produces a list of entries for the following 133 characters.
(Three spaces added for symmetry purposes.)

À Á Â Ã Ä È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ù Ú Û Ü Ý
à á â ã ä è é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý
ÿ Ā ā Ć ć Ĉ ĉ Č č Ď ď Ē ē Ě ě Ĝ ĝ Ĥ ĥ Ĩ ĩ Ī ī Ĵ ĵ Ĺ ĺ
Ľ ľ Ń ń Ň ň Ō ō Ŕ ŕ Ř ř Ś ś Ŝ ŝ Š š Ť ť Ũ ũ Ū ū Ŵ ŵ Ŷ ŷ
Ÿ Ź ź Ž ž Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǧ ǧ Ǩ ǩ ǰ Ǵ ǵ Ǹ ǹ Ș ș Ț ț
Ȟ ȟ Ȳ ȳ
--
FSF associate member #7257 http://am-1.org/~ivan/ 
 3013 B6A0 230E 334A
Eli Zaretskii
2015-05-07 04:05:29 UTC
Permalink
Date: Wed, 06 May 2015 22:20:54 +0000
Post by Paul Eggert
-;;; iso-transl.el --- keyboard input definitions for ISO 8859-1 -*- coding: utf-8 -*-
+;;; iso-transl.el --- keyboard input for ISO characters -*- coding: utf-8 -*-
I guess we may safely state “ISO 10646” here.
Actually, we should drop the "ISO" part completely. Characters don't
belong to any encoding; they are entities that exist independently of
any encoding.
Ivan Shmakov
2015-05-07 07:14:34 UTC
Permalink
Post by Eli Zaretskii
Post by Paul Eggert
From: Ivan Shmakov Date: Wed, 06 May 2015 22:20:54 +0000
-;;; iso-transl.el --- keyboard input definitions for ISO 8859-1 -*- coding: utf-8 -*-
+;;; iso-transl.el --- keyboard input for ISO characters -*- coding: utf-8 -*-
I guess we may safely state “ISO 10646” here.
Actually, we should drop the "ISO" part completely. Characters don't
belong to any encoding; they are entities that exist independently
of any encoding.
ISO 10646 is also a /repertoire/ of characters; so unless
'iso-transl is going to get support for characters outside this
particular set, the above will still be justified. Albeit
mildly redundant, I guess.
--
FSF associate member #7257 np. Computer Eyes — Ayreon … 3013 B6A0 230E 334A
Eli Zaretskii
2015-05-07 14:33:26 UTC
Permalink
Date: Thu, 07 May 2015 07:14:34 +0000
Post by Eli Zaretskii
Actually, we should drop the "ISO" part completely. Characters don't
belong to any encoding; they are entities that exist independently
of any encoding.
ISO 10646 is also a /repertoire/ of characters; so unless
'iso-transl is going to get support for characters outside this
particular set, the above will still be justified. Albeit
mildly redundant, I guess.
We are splitting hairs. But as long as we do, I see no reason to
promise or assume that iso-transl will always support only Unicode
codepoints; e.g., "C-x 8 RET" already supports more.

So I'd rather we dropped that reference entirely.
Paul Eggert
2015-05-07 07:53:32 UTC
Permalink
Post by Ivan Shmakov
I believe that both C-x 8 . and C-x 8 u are too convenient to be
dropped without more discussion. For one thing, · seems more
“common” a character than İ.
In Turkish and Azerbaijani the reverse is true. And since RMS requested dotted
I and dotless i, my assumption was that Turkish is of some importance. Dotted
sequences are the natural ways to type these characters as well as other dotted
letters ĊċĖėĠġĿŀŻż in the proposal (used variously in Lithuanian, Maltese, and
Polish), so there is a pretty strong case to usurp "C-x 8 .".

The case for usurping "C-x 8 u" is even stronger, since it's equivalent to the
equally-short "C-x 8 m", some easily-typed symbol is needed to denote breve, and
"u" looks more like breve than any other ASCII character does.
Post by Ivan Shmakov
Other than that, C-x 8 . . feels
easier to type than C-x 8 SPC.
Good point, and I've done this in the attached patch.
Post by Ivan Shmakov
Post by Paul Eggert
-;;; iso-transl.el --- keyboard input definitions for ISO 8859-1 -*- coding: utf-8 -*-
+;;; iso-transl.el --- keyboard input for ISO characters -*- coding: utf-8 -*-
I guess we may safely state “ISO 10646” here.
Thanks, done in the attached patch.
Post by Ivan Shmakov
Post by Paul Eggert
+;; This package supports all characters defined by ISO 8859-1,
+;; along with many other Latin characters and a few other characters
+;; commonly used in English and basic math.

 And may also mention it here.
Thanks, also done.
Post by Ivan Shmakov
Post by Paul Eggert
("-" . [?­])
- ("*." . [?·])
The removal above doesn’t seem to be strictly necessary. The
same for the *= and *u ones.
Thanks, fixed in the attached patch.
Post by Ivan Shmakov

 Also, did you consider generating this list automatically,
based on the codepoint properties already known to Emacs?
Something along the lines of the function MIMEd, which readily
produces a list of entries for the following 133 characters.
(Three spaces added for symmetry purposes.)
À Á Â Ã Ä È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ù Ú Û Ü Ý
à á â ã À Ú é ê ë ì í î ï ñ ò ó Î õ ö ù ú û Ì Ü
ÿ Ā ā Ć ć Ĉ ĉ Č č Ď ď Ē ē Ě ě Ĝ ĝ Ä€ Ä¥ Äš Ä© Ī Ä« ÄŽ ĵ Ĺ ĺ
Äœ ÄŸ Ń ń Ň ň Ō ō Ŕ ŕ Ř ř Ś ś Ŝ ŝ Å  Å¡ Å€ Å¥ Åš Å© Ū Å« ÅŽ ŵ Ŷ Å·
Åž Ź ź Åœ ÅŸ Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ ÇŠ ǧ Çš Ç© Ç° ÇŽ ǵ Çž ǹ Ș ș Ț ț
Ȟ ȟ Ȳ ȳ
Sorry, I don't really follow the code that you attached. Although I suppose it
comes from a decomposition table, I don't know what the table was designed for,
and it's not clear to me how it's relevant. Anyway, most of those letters are
either in iso-transl.el now, or are in the previously proposed patch. Here are
the exceptional (i.e., missing even in the previously proposed patch) letters:
Post by Ivan Shmakov
Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǹ ǹ
These are for toned Pinyin but this list is incomplete. If we wanted to cover
toned Pinyin, we'd also need Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ. Coming up with two-character
abbreviations for all these might be tricky. Most Pinyin usage omits the tones.
Post by Ivan Shmakov
Ǧ ǧ Ǩ ǩ
These are Skolt Sami but this list is also incomplete; we'd also need Ʒ Ǥ ǥ Ǯ ǯ
ʒ at least.
Post by Ivan Shmakov
ǰ
What language uses this? I couldn't find one.
Post by Ivan Shmakov
Ǵ ǵ
Good catch. These are used for transliteration from Serbian and Macedonian. We
should also include Ḱ ḱ as they are also needed. Included in the attached patch.
Post by Ivan Shmakov
Ȟ ȟ
Used in Finnish Kalo, which is quite obscure.
Post by Ivan Shmakov
Ȳ ȳ
Used in Livonian, but for that we'd also need a whole bunch of other letters,
including Ǟ ǟ Ḑ ḑ Ȫ ȫ Ȭ ȭ Ȯ ȯ Ȱ and I've probably omitted some. Plus, modern
Livonian doesn't seem to be using Ȳ ȳ any more....

Anyway, part of what's going on here is that the proposed list doesn't cover
every Latin character in the ISO 10646 repertoire (that'd be a large set), but
instead is limited to what appear to be reasonably common letters. Admittedly
this is not universal but one must cut things off somewhere, and it would be odd
to add only partial coverage for toned Pinyin, Livonian, etc.
Post by Ivan Shmakov
Post by Paul Eggert
--------------090904020002020306060104
Content-Type: text/x-patch;
name="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"
This MIME part sure wants ‘; charset=UTF-8’. Otherwise, Gnus
does no decoding, and Emacs shows the contents with the likes of
\304\260.
Hmm, it works for me. I use Thunderbird to read the top level message, and it
spins off an Emacs to display the attachment with no problem. The web-site
archive at <http://bugs.gnu.org/20499#60> also works for me with Firefox.

It's common for people to send the output of "git send-email" as attachments; if
this doesn't work with Gnus I suppose a Gnus user (i.e. not me :-) should file a
bug report. I looked around the net and found other Gnus users with similar
problems and some code that worked for them; please see
<http://bewatermyfriend.org/p/2011/00a/> and/or
<http://blog.printf.net/articles/tag/emacs/>. But this stuff appeared to be
several years old, which leads me to hope that maybe recent-enough Gnus
versions already do the right thing.
Ivan Shmakov
2015-05-07 10:00:38 UTC
Permalink
[…]
Post by Paul Eggert
… Also, did you consider generating this list automatically, based
on the codepoint properties already known to Emacs? Something along
the lines of the function MIMEd, which readily produces a list of
entries for the following 133 characters. (Three spaces added for
symmetry purposes.)
À Á Â Ã Ä È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ù Ú Û Ü Ý
à á â ã ä è é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý
ÿ Ā ā Ć ć Ĉ ĉ Č č Ď ď Ē ē Ě ě Ĝ ĝ Ĥ ĥ Ĩ ĩ Ī ī Ĵ ĵ Ĺ ĺ
Ľ ľ Ń ń Ň ň Ō ō Ŕ ŕ Ř ř Ś ś Ŝ ŝ Š š Ť ť Ũ ũ Ū ū Ŵ ŵ Ŷ ŷ
Ÿ Ź ź Ž ž Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǧ ǧ Ǩ ǩ ǰ Ǵ ǵ Ǹ ǹ Ș ș Ț ț
Ȟ ȟ Ȳ ȳ
Sorry, I don't really follow the code that you attached.
Which part, specifically?

It just iterates over the range given (or U+00A8 through U+02AF
by default) and maps “LATIN + COMBINING” decompositions to
'iso-transl entries. For example, it maps the (?g #x327)
decomposition (U+0327 being COMBINING CEDILLA) for U+0123 into
an (",g" . ģ) entry.

Or, rather, it /should/, for my code has an obvious typo:

(`(,c #x30c) (string ?v c))
(`(,c #x326) (string 59 c))
- (`(,c #x326) (string ?, c)))))
+ (`(,c #x327) (string ?, c)))))

Other possible additions (assuming we’ll agree on C-x 8 u,
C-x 8 .) are:

(`(,c #x304) (string ?= c))
+ (`(,c #x306) (string ?u c))
+ (`(,c #x307) (string ?. c))
(`(,c #x308) (string 34 c))
+ (`(,c #x30b) (string ?2 c))
(`(,c #x30c) (string ?v c))
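
Spelled out, the generator might look roughly like this (a self-contained
sketch with invented names; this is not the attached code, and the prefix
choices are only illustrative):

(defun my-iso-transl-entries (from to)
  "Map LATIN letter + COMBINING mark decompositions in FROM..TO to entries.
Each entry has the (\"KEY\" . [CHAR]) shape used by `iso-transl-char-map'."
  (let (entries)
    (dotimes (i (1+ (- to from)))
      (let* ((ch (+ from i))
             (decomp (get-char-code-property ch 'decomposition)))
        ;; base character + a single combining mark => two-key abbreviation
        (pcase decomp
          (`(,base #x300) (push (cons (string ?` base) (vector ch)) entries)) ; grave
          (`(,base #x301) (push (cons (string ?' base) (vector ch)) entries)) ; acute
          (`(,base #x30c) (push (cons (string ?v base) (vector ch)) entries)) ; caron
          (`(,base #x327) (push (cons (string ?, base) (vector ch)) entries))))) ; cedilla
    (nreverse entries)))

;; (my-iso-transl-entries #xC0 #x17F)
;; => (("`A" . [?À]) ("'A" . [?Á]) ... (",g" . [?ģ]) ...)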
Post by Paul Eggert
Although I suppose it comes from a decomposition table, I don't know
what the table was designed for, and it's not clear to me how it's
relevant.
I hope someone more knowledgeable could comment on this. Still,
this (ab)use of the data seems to work well in practice.
Post by Paul Eggert
Anyway, most of those letters are either in iso-transl.el now,
The point is to /remove/ them from 'iso-transl, as these entries
duplicate, in a way, a part of the decomposition table already
present in Emacs.

[…]
Post by Paul Eggert
Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǹ ǹ
These are for toned Pinyin but this list is incomplete. If we wanted
to cover toned Pinyin, we'd also need Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ. Coming up
with two-character abbreviations for all these might be tricky.
But are we actually limited to two-character abbreviations only?
Why not allow for, say, C-x 8 " ' u?

[…]
Post by Paul Eggert
ǰ
What language uses this? I couldn't find one.
To quote NamesList.txt:

01F0 LATIN SMALL LETTER J WITH CARON
* IPA and many languages
Post by Paul Eggert
Ǵ ǵ
Good catch. These are used for transliteration from Serbian and
Macedonian. We should also include Ḱ ḱ as they are also needed.
Included in the attached patch.
The code I’ve suggested could be used to scan the U+1Exx range
just as well, thus resulting in the following set.

Ḑ ḑ Ḡ ḡ Ḧ ḧ Ḩ ḩ Ḱ ḱ Ḿ ḿ Ṕ ṕ Ṽ ṽ Ẁ ẁ Ẃ ẃ Ẅ ẅ Ẍ ẍ Ẑ ẑ ẗ Ẽ ẽ Ỳ ỳ Ỹ ỹ

[…]
Post by Paul Eggert
Anyway, part of what's going on here is that the proposed list
doesn't cover every Latin character in the ISO 10646 repertoire
(that'd be a large set), but instead is limited to what appear to be
reasonably common letters. Admittedly this is not universal but
one must cut things off somewhere, and it would be odd to add only
partial coverage for toned Pinyin, Livonian, etc.
When it comes to the LATIN … LETTER WITH … letters, my proposal
for such a cut off would be to satisfy /both/ of the following
criteria:

• only cover specific Unicode ranges; such as, for instance,
U+00A8 through U+02AF, U+1E00 … U+1EFF, perhaps U+2C60 … U+2C7F;

• only cover the letters which can be represented with a
sufficiently general C-x 8 ⟨diacritic⟩+ ⟨ASCII-latin⟩ pattern.

Other characters deemed common may be added to the list.
Post by Paul Eggert
Post by Paul Eggert
--------------090904020002020306060104
Content-Type: text/x-patch;
name="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"
This MIME part sure wants ‘; charset=UTF-8’. Otherwise, Gnus does
no decoding, and Emacs shows the contents with the likes of
\304\260.
Hmm, it works for me. I use Thunderbird to read the top level
message, and it spins off an Emacs to display the attachment with no
problem.
I can “spin off” cat(1) to read the offending MIME part, too:
Emacs will feed it raw-text, and interpret the result as UTF-8
(the default.)

It still does /not/ comply with the MIME specification.
Consider section 4.1.2 of RFC 2046:

RFC> […] The default character set, which must be assumed in the
RFC> absence of a charset parameter, is US-ASCII.

RFC 6657 updates this as follows:

RFC> Each subtype of the "text" media type that uses the "charset"
RFC> parameter can define its own default value for the "charset"
RFC> parameter, including the absence of any default.

However, given that ‘text/x-patch’ is not a /registered/ MIME
type, I believe the above does not apply.
Post by Paul Eggert
The web-site archive at <http://bugs.gnu.org/20499#60> also works for
me with Firefox.
It's common for people to send the output of "git send-email" as
attachments;
If Thunderbird /knows/ the encoding (“character set”) of the
contents of the MIME part, it /should/ specify it in the MIME
part header. If the said contents are strictly 7-bit, it /could/
omit that (given that it’s more than likely to be US-ASCII.)
Otherwise, I guess Thunderbird should either ask the user for
the encoding /or/ send the part as application/octet-stream.

[…]
--
FSF associate member #7257 np. Satellite one — Purple Motion B6A0 230E 334A
Eli Zaretskii
2015-05-07 14:44:25 UTC
Permalink
Date: Thu, 07 May 2015 10:00:38 +0000
Post by Paul Eggert
Although I suppose it comes from a decomposition table, I don't know
what the table was designed for, and it's not clear to me how it's
relevant.
I hope someone more knowledgeable could comment on this.
I'm not sure I'm your man, or what needs to be commented on, but I
will try nonetheless ;-)

The 'decomposition property of a character (as every other property
accessed by get-char-code-property) comes directly from Unicode
database. In this case, you will see that some characters in
UnicodeData.txt have this part non-empty:

1E99;LATIN SMALL LETTER Y WITH RING ABOVE;Ll;0;L;0079 030A;;;;N;;;;;
                                                 ^^^^^^^^^
This gives the so-called "canonical decomposition" of the character;
in this case, we are told that U+1E99's decomposition is a sequence of
U+0079 (lower-case y) followed by U+030A (combining ring above).

Some characters have "compatibility decompositions" instead, like
this:

1E9A;LATIN SMALL LETTER A WITH RIGHT HALF RING;Ll;0;L;<compat> 0061 02BE;;;;N;;;;;
                                                      ^^^^^^^^^^^^^^^^^^
which is useful for collation-driven sorting and for loose comparisons
à la string-collate-lessp.

For more details about this, see http://unicode.org/reports/tr44/, the
Unicode Technical Report that describes the Unicode Character
Database.
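
From Lisp the same data is visible through get-char-code-property; roughly
(the return values below are what I'd expect, not checked against every
Emacs version):

(get-char-code-property #x1E99 'decomposition)
;; => (121 778), i.e. (?y #x30A): the canonical decomposition

(get-char-code-property #x1E9A 'decomposition)
;; => (compat 97 702), i.e. (compat ?a #x2BE): the tag becomes a leading symbol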
Stefan Monnier
2015-05-07 17:03:33 UTC
Permalink
Post by Paul Eggert
… Also, did you consider generating this list automatically,
based on the codepoint properties already known to Emacs?
[...]
Post by Paul Eggert
Sorry, I don't really follow the code that you attached. Although I suppose
it comes from a decomposition table, I don't know what the table was
designed for, and it's not clear to me how it's relevant. Anyway, most of
I'm not sure exactly what he wanted to say, but it sounds to me like
it's going in the same direction as my earlier request to replace the
hard-coded table by code that auto-generates the cases.
There is already similar code in latin-ltx.el (written by yours truly).


Stefan
Paul Eggert
2015-05-11 00:51:50 UTC
Permalink
Post by Stefan Monnier
I'm not sure exactly what he wanted to say, but it sounds to me like
it's going in the same direction as my earlier request to replace the
hard-coded table by code that auto-generates the cases.
There is already similar code in latin-ltx.el (written by yours truly).
OK, thanks, in that case this will need some thinking, since the code in
latin-ltx.el suffers from the same problems I mentioned in
<http://bugs.gnu.org/20499#105>: from a user's point of view the supported
characters are a haphazard list. E.g., it adds some chars for Pinyin tones but
not others. Partly the problem is that it adds "easy" Latin letters like ȳ even
though nobody uses them, but not "hard" ones like ǚ even though they're actually
used on occasion.

Fixing this will take some thinking, because we'll need to devise ways to type
the "hard" Latin letters. I suppose latin-ltx and iso-transl should use similar
approaches here.

In the meantime, though, there is a need to type non-Latin punctuation like
dashes and quotation marks. That part of the patch seems relatively independent
of the Latin-letter issue, so I installed the attached. I hope to look into the
Latin-letter issue later.
Stefan Monnier
2015-05-11 02:25:41 UTC
Permalink
Post by Paul Eggert
Fixing this will take some thinking, because we'll need to devise ways to
type the "hard" Latin letters.
Indeed.
Post by Paul Eggert
I suppose latin-ltx and iso-transl should use similar approaches here.
Of course, in my ideal world, iso-transl and latin-ltx should not just
use similar approaches, but C-x 8 should basically work like a kind of
"enable TeX input method just for this char, and pre-insert \".


Stefan
Paul Eggert
2015-05-11 01:28:17 UTC
Permalink
Post by Eli Zaretskii
your idea of showing dozens or hundreds of characters isn't
workable, either.
It sounds workable to me, as I've used similar interfaces elsewhere, and they
work reasonably well. They're not as good as an input method if you're an
expert in the method, but they're much better than nothing when you're a
non-expert and don't have the time to learn an input method but just want to
enter a few unusual characters.

For example, if I visit English Wikipedia page for Emacs:

http://en.wikipedia.org/wiki/Emacs

and push the "Edit" button, I'll get to this page:

http://en.wikipedia.org/w/index.php?title=Emacs&action=edit

which gives me a list of buttons for inserting any of "– — ° ′ ″ ≈ ≠ ≤ ≥ ± − × ÷
← → · §", which I can just push directly to insert the corresponding character.
Or I can push the "Latin" button and then insert any of:

A a Á á À à Â â Ä ä Ǎ ǎ Ă ă Ā ā Ã ã Å å Ą ą Æ æ Ǣ ǣ B b C c Ć ć Ċ ċ Ĉ ĉ Č č
Ç ç D d Ď ď Đ đ Ḍ ḍ Ð ð E e É é È è Ė ė Ê ê Ë ë Ě ě Ĕ ĕ Ē ē Ẽ ẽ Ę ę Ẹ ẹ Ɛ ɛ
Ǝ ǝ Ə ə F f G g Ġ ġ Ĝ ĝ Ğ ğ Ģ ģ H h Ĥ ĥ Ħ ħ Ḥ ḥ I i İ ı Í í Ì ì Î î Ï ï
Ǐ ǐ Ĭ ĭ Ī ī Ĩ ĩ Į į Ị ị J j Ĵ ĵ K k Ķ ķ L l Ĺ ĺ Ŀ ŀ Ľ ľ Ļ ļ Ł ł Ḷ ḷ Ḹ ḹ
M m Ṃ ṃ N n Ń ń Ň ň Ñ ñ Ņ ņ Ṇ ṇ Ŋ ŋ O o Ó ó Ò ò Ô ô Ö ö Ǒ ǒ Ŏ ŏ Ō ō Õ õ Ǫ
ǫ Ọ ọ Ő ő Ø ø Œ œ Ɔ ɔ P p Q q R r Ŕ ŕ Ř ř Ŗ ŗ Ṛ ṛ Ṝ ṝ S s Ś ś Ŝ ŝ Š š
Ş ş Ș ș Ṣ ṣ ß T t Ť ť Ţ ţ Ț ț Ṭ ṭ Þ þ U u Ú ú Ù ù Û û Ü ü Ǔ ǔ Ŭ ŭ Ū ū Ũ ũ Ů
ů Ų ų Ụ ụ Ű ű Ǘ ǘ Ǜ ǜ Ǚ ǚ Ǖ ǖ V v W w Ŵ ŵ X x Y y Ý ý Ŷ ŷ Ÿ ÿ Ỹ ỹ Ȳ ȳ
Z z Ź ź Ż ż Ž ž ß Ð ð Þ þ Ŋ ŋ Ə ə

This is all easy to do even if I don't remember the editing interface, and
unlike Emacs's C-x 8 it handles Pinyin tones, dotless i, etc., etc. This seems
to be the sort of thing that RMS is asking for, and I don't see why it wouldn't
work for Emacs.
Eli Zaretskii
2015-05-11 14:54:58 UTC
Permalink
Date: Sun, 10 May 2015 18:28:17 -0700
your idea of showing dozens or hundreds of characters isn't
workable, either.
It sounds workable to me, as I've used similar interfaces elsewhere, and they work reasonably well. They're not as good as an input method if you're an expert in the method, but they're much better than nothing when you're a non-expert and don't have the time to learn an input method but just want to enter a few unusual characters.
At least the last part of this thread was about _finding_ the
character, if you have only partial information about it. My comment
above was about that use case, and that use case only. You seem to be
talking about a different use case: when the user already knows quite
well which character she wants.
http://en.wikipedia.org/wiki/Emacs
http://en.wikipedia.org/w/index.php?title=Emacs&action=edit
which gives me a list of buttons for inserting any of "– — ° ′ ″ ≈ ≠ ≤ ≥ ± − × ÷ ← → · §", which I can just push directly to insert the corresponding character.
This is the case where you know a very small subset of characters from
which to choose. But even here, how do you know whether you need '–',
'—', or '−'? Or maybe you want '⸺' or even '⸻' instead (they are not
shown in the list offered by Wikipedia)? Likewise, there are many
more quote characters than the above offers.

In general, punctuation characters fill 2 full blocks of codepoints,
so finding the one you need is more than just selecting from fewer
than 20 characters that someone decided are all you'll need.
A a Á á À à Â â Ä ä Ǎ ǎ Ă ă Ā ā Ã ã Å å Ą ą Æ æ Ǣ ǣ B b C c Ć ć Ċ ċ Ĉ ĉ Č č
Ç ç D d Ď ď Đ đ Ḍ ḍ Ð ð E e É é È è Ė ė Ê ê Ë ë Ě ě Ĕ ĕ Ē ē Ẽ ẽ Ę ę Ẹ ẹ Ɛ ɛ
Ǝ ǝ Ə ə F f G g Ġ ġ Ĝ ĝ Ğ ğ Ģ ģ H h Ĥ ĥ Ħ ħ Ḥ ḥ I i İ ı Í í Ì ì Î î Ï ï
Ǐ ǐ Ĭ ĭ Ī ī Ĩ ĩ Į į Ị ị J j Ĵ ĵ K k Ķ ķ L l Ĺ ĺ Ŀ ŀ Ľ ľ Ļ ļ Ł ł Ḷ ḷ Ḹ ḹ
M m Ṃ ṃ N n Ń ń Ň ň Ñ ñ Ņ ņ Ṇ ṇ Ŋ ŋ O o Ó ó Ò ò Ô ô Ö ö Ǒ ǒ Ŏ ŏ Ō ō Õ õ Ǫ
ǫ Ọ ọ Ő ő Ø ø Œ œ Ɔ ɔ P p Q q R r Ŕ ŕ Ř ř Ŗ ŗ Ṛ ṛ Ṝ ṝ S s Ś ś Ŝ ŝ Š š
Ş ş Ș ș Ṣ ṣ ß T t Ť ť Ţ ţ Ț ț Ṭ ṭ Þ þ U u Ú ú Ù ù Û û Ü ü Ǔ ǔ Ŭ ŭ Ū ū Ũ ũ Ů
ů Ų ų Ụ ụ Ű ű Ǘ ǘ Ǜ ǜ Ǚ ǚ Ǖ ǖ V v W w Ŵ ŵ X x Y y Ý ý Ŷ ŷ Ÿ ÿ Ỹ ỹ Ȳ ȳ
Z z Ź ź Ż ż Ž ž ß Ð ð Þ þ Ŋ ŋ Ə ə
Again, this is a different use case: you already need to know your
character is one of the "Latin" characters. And they cheat: what you
see is a subset of the characters that someone decided are all you
need. (For example, "Math and logic" has ∫, ∬, and ∭, but
not ⨌; "Latin" lacks the entire Latin Extended-B, -C, -D, and Latin
Extended Additional blocks; etc.)

IOW, the above selection is highly filtered using some unspecified
rules, and therefore it at best emulates a use case where the user has
a pretty good knowledge about what she wants to find. And still, you
need to select out of about 300 characters.

How's that workable, except in very simple use cases?
This is all easy to do even if I don't remember the editing interface, and unlike Emacs's C-x 8 it handles Pinyin tones, dotless i, etc., etc. This seems to be the sort of thing that RMS is asking for, and I don't see why it wouldn't work for Emacs.
It would work for Emacs. The question is, would it be convenient for
users?

We should be able to do better than the example you show, i.e. allow
the user to define what she knows about the character she is looking
for, and then present the characters matching that description. (I
presented earlier the provisional list of attributes I think will be
useful as part of such a description.) We definitely shouldn't assume
we know better than the user which characters she might or might not
want the way Wikipedia does. And we should allow the users to
leverage more accurate information, if they have it. For example, if
you know that the character you are looking for is some form of a
Latin 'a', then we could present only those (there are 36 of them in
the current UCD).
Stefan Monnier
2015-05-11 15:52:40 UTC
Permalink
Post by Eli Zaretskii
IOW, the above selection is highly filtered using some unspecified
rules, and therefore it at best emulates a use case where the user has
a pretty good knowledge about what she wants to find. And still, you
need to select out of about 300 characters.
How's that workable, except in very simple use cases?
It's workable in the following way:
- first time around, you'll have to scan all those chars, which will
take a little while.
- second time around you'll also have to scan them, but it will take
a bit less time.
- ...
- Nth time around, you'll either know more or less where the char is so
you don't need to scan all those chars any more, or you'll have
learned some other way to insert the char.

That's what I do every once in a while using the symbols.dvi document,
looking for how to enter some funny-looking math symbols in LaTeX.
I generally have no clue whatsoever how the symbol might be called when
I do such searches.

And I agree that further refinement (such as restricting the display to
those glyphs that have an "e" in them, which would include all the
weirdly accented forms of "e" and probably the upper case forms as
well) would be a nice addition.

E.g. it would be great to be able to say "it's a char that has a > in its
glyph" and then be presented with things like ≥, right angle brackets,
right arrows, ...


Stefan
Eli Zaretskii
2015-05-11 16:16:14 UTC
Permalink
Date: Mon, 11 May 2015 11:52:40 -0400
Post by Eli Zaretskii
IOW, the above selection is highly filtered using some unspecified
rules, and therefore it at best emulates a use case where the user has
a pretty good knowledge about what she wants to find. And still, you
need to select out of about 300 characters.
How's that workable, except in very simple use cases?
- first time around, you'll have to scan all those chars, which will
take a little while.
- second time around you'll also have to scan them, but it will take
a bit less time.
- ...
- Nth time around, you'll either know more or less where the char is so
you don't need to scan all those chars any more, or you'll have
learned some other way to insert the char.
That's what I do every once in a while using the symbols.dvi document,
looking for how to enter some funny-looking math symbols in LaTeX.
I admire your patience. When I need to do this, I generally give up
in despair very quickly. And unless I need the same character over
and over again, my Nth time looks very similar to my first.
And I agree that further refinement (such as restricting the display to
those glyphs that have an "e" in them, which would include all the
weirdly accented forms of "e" and probably the upper case forms as
well) would be a nice addition.
I can try writing a back-end (that thing that takes a list of criteria
and returns a list of codepoints or ranges to display) for this, if
someone will then add a UI for the user to specify the constraints and
for display of the results.
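
For what it's worth, a stripped-down sketch of such a back-end (the names
here are invented; only char-script-table and get-char-code-property are
existing Emacs facilities) might look like:

(defun my-char-candidates (&optional script base)
  "Toy criteria back-end: return BMP characters matching SCRIPT and BASE.
SCRIPT is a script symbol such as `latin' (nil means any script).
BASE, if non-nil, is a character the candidate must decompose to."
  (let (matches)
    (dotimes (ch #x10000)               ; BMP only, to keep the sketch cheap
      (when (and (or (null script)
                     (eq (aref char-script-table ch) script))
                 (or (null base)
                     (eq (car (get-char-code-property ch 'decomposition))
                         base)))
        (push ch matches)))
    (nreverse matches)))

;; "some form of a Latin i":
;; (my-char-candidates 'latin ?i) => (105 236 237 238 239 ...)

A real back-end would take more criteria (block, class, diacriticals) and
cover the non-BMP planes, but the shape -- criteria in, codepoint list
out -- is the same.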
E.g. it would be great to be able to say "it's a char that has a > in its
glyph" and then be presented with things like ≥, right angle brackets,
right arrows, ...
Yep.
Paul Eggert
2015-05-11 18:48:36 UTC
Permalink
IOW, the above selection is highly filtered using some unspecified rules
Sure, and I expect that what Wikipedia has done is see which characters
get used the most, give a trivial UI for the most-commonly used dozen or
so non-ASCII characters, a simple UI for the most-commonly used
few-hundred non-ASCII characters, and a more-complex UI for the rest.
It's a reasonable design approach.
For example, if you know that the character you are looking for is
some form of a Latin 'a', then we could present only those (there are
36 of them in the current UCD).
That all sounds good, for users who know that there's a way to get that
list of "A"-like characters. It would be good also to cater to people
who are less expert, and who only know something simple like "type the
Alt-FOO key if you want to type weird characters". Perhaps a top-level
menu that gives a dozen or so of the most-common characters and also
says "type an "A" to get the "A"-like letters", and "press this button
to get Greek", etc.
Eli Zaretskii
2015-05-11 19:10:49 UTC
Permalink
Date: Mon, 11 May 2015 11:48:36 -0700
IOW, the above selection is highly filtered using some unspecified rules
Sure, and I expect that what Wikipedia has done is see which characters
get used the most, give a trivial UI for the most-commonly used dozen or
so non-ASCII characters, a simple UI for the most-commonly used
few-hundred non-ASCII characters, and a more-complex UI for the rest.
It's a reasonable design approach.
But it's not Emacsy, not to my palate. Emacs never arbitrarily limits
the user without offering some ways to lift the limits.
For example, if you know that the character you are looking for is
some form of a Latin 'a', then we could present only those (there are
36 of them in the current UCD).
That all sounds good, for users who know that there's a way to get that
list of "A"-like characters.
The way I envision it, the UI to specify the characters you are
looking for will have a widget named "Looks like ..." or "Base
character", and users who are looking for 'a' with some diacriticals
will type "a" there.
Perhaps a top-level menu that gives a dozen or so of the most-common
characters
I think "most-common characters" can only be reasonably offered once
the user supplied a language or script. Most-common Latin characters
are different from most-common Cyrillic characters or Greek or Hebrew
or Math symbols.
and also says "type an "A" to get the "A"-like letters", and "press
this button to get Greek", etc.
I don't think a single button will do. At least it should be possible
to press both "Greek" and "with/without diacriticals", and possibly
also other constraints, like with/without punctuation.

IOW, we need to let users specify several constraints, and display
whatever matches them. If they only specify the script, like "Latin",
they will see the list similar to what you presented, perhaps in
several parts with a "more" button.
Richard Stallman
2015-05-12 08:56:20 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Eli Zaretskii
At least the last part of this thread was about _finding_ the
character, if you have only partial information about it. My comment
above was about that use case, and that use case only. You seem to be
talking about a different use case: when the user already knows quite
well which character she wants.
This seems like a misunderstanding about the word "find".

In general I know what the character looks like.
I expect I would spot it immediately if I saw it.
For instance, it wouldn't be hard to recognize the dotless i
in a list of lowercase non-ASCII letters. Especially if it is
in some sort of order.

I'm afraid you've been looking for a solution to some problem
that I wasn't talking about.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Eli Zaretskii
2015-05-12 16:13:37 UTC
Permalink
Date: Tue, 12 May 2015 04:56:20 -0400
Post by Eli Zaretskii
At least the last part of this thread was about _finding_ the
character, if you have only partial information about it. My comment
above was about that use case, and that use case only. You seem to be
talking about a different use case: when the user already knows quite
well which character she wants.
This seems like a misunderstanding about the word "find".
I don't think so.
In general I know what the character looks like.
I expect I would spot it immediately if I saw it.
For instance, it wouldn't be hard to recognize the dotless i
in a list of lowercase non-ASCII letters.
I presume that when you say "non-ASCII" you really mean "non-ASCII
Latin", since the number of lowercase non-ASCII characters is rather
large (about 1400, if I'm not mistaken).

There are 581 characters in the Unicode database that are lowercase
non-ASCII Latin letters. While it's possible to go through this long
list looking for the one character you are after, it's hardly
convenient or efficient, IMO.

So I think IWBNI Emacs could help the user by showing less than this
amount. For example, if you know it's some form of i, IWBNI Emacs
allowed you to say that, and be presented only with characters which
match that description (there are only 29 of them).
Especially if it is in some sort of order.
The order in which to present the characters is also not trivial. The
easiest one is the order of codepoints, but I presume it would be
better to group characters by their base character, i.e. all forms of
i together.

Richard Stallman
2015-05-11 18:27:36 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Indeed, that is what I'd like.
--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org www.gnu.org
Skype: No way! See stallman.org/skype.html.
Paul Eggert
2015-05-11 01:55:34 UTC
Permalink
Post by Ivan Shmakov
It just iterates over the range given (or U+00A8 through U+02AF
by default) and maps “LATIN + COMBINING” decompositions to
'iso-transl entries.
Thanks for the explanation.
Post by Ivan Shmakov
But are we actually limited to two-character abbreviations only?
Why not allow for, say, C-x 8 " ' u?
We can do that, but only if the combining prefixes are distinct from the letters
themselves. My previous proposal didn't do that, e.g., it used "u" for breve,
which would make things like "C-x 8 , u E" ambiguous (is that u with a cedilla
followed by plain E, or E with a cedilla and breve?). So I guess more thought
is needed.
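
The conflict is easy to see if one tries to define both keys in a single
keymap (hypothetical bindings, with ų and Ḝ only as stand-ins):

(let ((map (make-sparse-keymap)))
  ;; pretend ",u" already inserts some "u + diacritic" character
  (define-key map ",u" [?ų])
  ;; now ",u" is a complete key, not a prefix, so this signals an error
  ;; along the lines of: Key sequence , u E starts with non-prefix key , u
  (define-key map ",uE" [?Ḝ]))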
Post by Ivan Shmakov
However, given that ‘text/x-patch’ is not a /registered/ MIME
type, I believe the above does not apply.
Once one starts using x-* types anything goes, is my impression.
Post by Ivan Shmakov
If Thunderbird /knows/ the encoding (“character set”) of the
contents of the MIME part,
It doesn't, which is why Thunderbird doesn't say.

Regardless of what one's opinion of what the standard says or should say, it's
pretty clear that these sorts of attachments are often sent and generally work;
if they don't work with Gnus then that's probably a Gnus bug report worth
filing. The Gnus manual says one should report a bug with "M-x gnus-bug". I
tried that, but it complained "Gnus has been shut down", so I gave up. Since
you're a Gnus user, I hope you can take on the task of filing a bug report.