Discussion:
Bug#545258: heirloom-mailx: fails to set the charset to UTF-8 in From
Martin-Éric Racine
2009-09-06 01:40:05 UTC
Package: heirloom-mailx
Version: 12.4-1.1+b1
Severity: normal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

When piping text into heirloom-mailx, it fails to specify the charset used with
the From: line if the name inherited from /etc/passwd is in UTF-8.

cat file | mail -s "some subject" ***@domain.ltd

q-funk:x:1000:1000:Martin-Éric Racine,,,:/home/q-funk:/bin/bash

Setting "set sendcharsets=utf-8" only seems to affect the MIME declaration that
appears in the message headers:

User-Agent: Heirloom mailx 12.4 7/29/08
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

However, it does not solve the above issue:

Message-Id: <***@iki.fi>
From: Martin-?ric Racine <q-***@iki.fi>

Yet, we see that:

$ file /etc/passwd
/etc/passwd: UTF-8 Unicode text

Is there something I'm missing?

- -- System Information:
Debian Release: squeeze/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: i386 (i686)

Kernel: Linux 2.6.30-020630-generic (SMP w/1 CPU core)
Locale: LANG=fi_FI.UTF-8, LC_CTYPE=fi_FI.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages heirloom-mailx depends on:
ii base-files 5.0.0 Debian base system miscellaneous f
ii libc6 2.9-25 GNU C Library: Shared libraries
ii libgssapi-krb5-2 1.7dfsg~beta3-1 MIT Kerberos runtime libraries - k
ii libssl0.9.8 0.9.8k-4 SSL shared libraries

heirloom-mailx recommends no packages.

Versions of packages heirloom-mailx suggests:
ii nullmailer [mail-transport-ag 1:1.04-1.1 simple relay-only mail transport a

- -- no debconf information

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkqjD5EACgkQeXr56x4Muc3csgCfTAG3dc6f83Va4obf7AxgCKvv
zicAn2ZsQes2yjLcRJipIsz3Hkhfywv0
=N5oH
-----END PGP SIGNATURE-----
--
To UNSUBSCRIBE, email to debian-bugs-dist-***@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact ***@lists.debian.org
Hilko Bengen
2009-09-08 10:40:08 UTC
Post by Martin-Éric Racine
When piping text into heirloom-mailx, it fails to specify the charset used with
the From: line if the name inherited from /etc/passwd is in UTF-8.
q-funk:x:1000:1000:Martin-Éric Racine,,,:/home/q-funk:/bin/bash
As far as I know, using non-ASCII characters in the GECOS field of
/etc/passwd is not specified at all. So far, I haven't found anything in
Debian's main policy file, passwd(5), the adduser(8) and useradd(8)
manpages, or the documentation of base-passwd. (If you have found more
than I, let me know.)
From an application's standpoint, I'd tend to assume the GECOS field
either to be a comma-separated string of ASCII characters or a
comma-separated string of byte values.

Yes, I can see how useless both options are in your case. :-)

Basing mailx' interpretation of the GECOS field on the sendcharset
variable, as you suggested is probably not a good idea.
As a workaround, please try setting your real name to a pre-encoded
string in the .mailrc.
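
One way to produce such a pre-encoded string is sketched below in Python. It assumes that "pre-encoded" means an RFC 2047 encoded-word, which is what a receiving MUA expects for non-ASCII display names. The helper name encode_from and the address user@example.org are placeholders introduced for illustration, and the sketch does not show which .mailrc setting the result should be assigned to.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr


def encode_from(real_name: str, address: str, charset: str = "utf-8") -> str:
    """Return a From: header value whose display name is RFC 2047 encoded.

    formataddr() passes pure-ASCII names through (quoting them if needed)
    and wraps anything else in an encoded-word that declares `charset`.
    """
    return formataddr((real_name, address), charset=charset)


# Placeholder address; the name is the one from the bug report.
header_value = encode_from("Martin-Éric Racine", "user@example.org")

# Round trip: a receiving MUA decodes the encoded-word back to the name.
name_part, addr_part = parseaddr(header_value)
decoded_name = str(make_header(decode_header(name_part)))

print(header_value)  # pure ASCII, safe to paste into a config file
```

The encoded value is pure ASCII, so it survives regardless of which charset the MUA assumes for its own configuration file.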

Cheers,
-Hilko
Martin-Éric Racine
2009-09-08 11:10:37 UTC
Post by Hilko Bengen
Post by Martin-Éric Racine
When piping text into heirloom-mailx, it fails to specify the charset used with
the From: line if the name inherited from /etc/passwd is in UTF-8.
q-funk:x:1000:1000:Martin-Éric Racine,,,:/home/q-funk:/bin/bash
As far as I know, using non-ASCII characters in the GECOS field of
/etc/passwd is not specified at all. So far, I haven't found anything in
Debian's main policy file, passwd(5), the adduser(8) and useradd(8)
manpages, or the documentation of base-passwd. (If you have found more
than I, let me know.)
While it is not specified, it has become a de-facto standard in Debian
and its derivatives to use UTF-8 for everything, including the real
name that appears in the GECOS field of /etc/passwd.
Post by Hilko Bengen
From an application's standpoint, I'd tend to assume the GECOS field
either to be a comma-separated string of ASCII characters or a
comma-separated string of byte values.
We cannot assume that anymore now that Debian uses UTF-8 for everything.
Post by Hilko Bengen
Basing mailx' interpretation of the GECOS field on the sendcharset
variable, as you suggested is probably not a good idea.
Why not?
Post by Hilko Bengen
As a workaround, please try setting your real name to a pre-encoded
string in the .mailrc.
Do you really expect all users on a given system to start doing that,
just because their name includes non-ascii characters? Please remember
that both Debian and Ubuntu nowadays allow non-ascii GECOS content
under the presumption that it will be in UTF-8.

Martin-Éric
Hilko Bengen
2009-09-08 16:00:15 UTC
Post by Martin-Éric Racine
Post by Hilko Bengen
As far as I know, using non-ASCII characters in the GECOS field of
/etc/passwd is not specified at all. So far, I haven't found anything in
Debian's main policy file, passwd(5), the adduser(8) and useradd(8)
manpages, or the documentation of base-passwd. (If you have found more
than I, let me know.)
While it is not specified, it has become a de-facto standard in Debian
and its derivatives to use UTF-8 for everything, including the real
name that appears in the GECOS field of /etc/passwd.
I had also thought about UTF-8 becoming the standard encoding in many
places in Debian, be it de-iure or de-facto. But I am not going to
assume that this extends to /etc/passwd.

And how should non-ASCII characters in other kinds of user databases be
treated, such as NIS or LDAP?
Post by Martin-Éric Racine
Post by Hilko Bengen
From an application's standpoint, I'd tend to assume the GECOS field
either to be a comma-separated string of ASCII characters or a
comma-separated string of byte values.
We cannot assume that anymore now that Debian uses UTF-8 for everything.
If you can point me to a text passage in the policy (or any relevant
discussion on the mailing lists), I will be happy to reconsider my
opinion.
Post by Martin-Éric Racine
Post by Hilko Bengen
Basing mailx' interpretation of the GECOS field on the sendcharset
variable, as you suggested is probably not a good idea.
Why not?
(it's sendcharsets, sorry for the typo)

sendcharsets is about the target charset.
Post by Martin-Éric Racine
Post by Hilko Bengen
As a workaround, please try setting your real name to a pre-encoded
string in the .mailrc.
Do you really expect all users on a given system to start doing that,
just because their name includes non-ascii characters?
Not at all. I just thought that this workaround might be helpful for you
until the larger issues get sorted out. Feel free to ignore my
suggestion. :-)
Post by Martin-Éric Racine
Please remember that both Debian and Ubuntu nowadays allow non-ascii
GECOS content under the presumption that it will be in UTF-8.
They have always allowed non-ascii content in the GECOS field, but I see
no such presumption.
From the sources I have seen, existing tools for manipulating
/etc/passwd will happily accept *any* byte sequence from the terminal.
If an administrator has still set his console to iso-8859-1, that's what
is used, without conversion.

-Hilko
Martin-Éric Racine
2009-09-08 16:50:09 UTC
Post by Hilko Bengen
Post by Martin-Éric Racine
Post by Hilko Bengen
As far as I know, using non-ASCII characters in the GECOS field of
/etc/passwd is not specified at all. So far, I haven't found anything in
Debian's main policy file, passwd(5), the adduser(8) and useradd(8)
manpages, or the documentation of base-passwd. (If you have found more
than I, let me know.)
While it is not specified, it has become a de-facto standard in Debian
and its derivatives to use UTF-8 for everything, including the real
name that appears in the GECOS field of /etc/passwd.
I had also thought about UTF-8 becoming the standard encoding in many
places in Debian, be it de-iure or de-facto. But I am not going to
assume that this extends to /etc/passwd.
And how should non-ASCII characters in other kinds of user databases be
treated, such as NIS or LDAP?
That's of course slightly more complicated. However, as far as
/etc/passwd is concerned, testing the content with 'file' would be a
rather easy way to determine whether to use UTF-8 or something else.
Anyhow, the current approach in the global config to try iso-8859-1
and then utf-8 is broken, because it only works for non-EURO western
languages. The only correct assumption to make is utf-8 and if that
fails, then parse 'env' for whatever deprecated locale the user
currently has.
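
The "assume UTF-8 first, then fall back" idea above can be sketched in Python as follows. This is only an illustration of the heuristic being proposed, not how mailx behaves; the function name decode_gecos_name and its fallback_charset parameter are hypothetical. Note that the check is a heuristic: bytes that fail strict UTF-8 decoding are certainly not UTF-8, but a legacy-encoded string can occasionally pass by accident.

```python
def decode_gecos_name(raw_gecos: bytes, fallback_charset: str) -> str:
    """Decode the full-name part of a GECOS field given as raw bytes.

    Strategy: try strict UTF-8 first; if the bytes are not valid UTF-8,
    decode them with the caller-supplied legacy charset instead.
    """
    # The real name is the first comma-separated GECOS sub-field.
    full_name = raw_gecos.split(b",", 1)[0]
    try:
        return full_name.decode("utf-8")  # strict: raises on invalid bytes
    except UnicodeDecodeError:
        return full_name.decode(fallback_charset)


# The same GECOS entry as written from a UTF-8 and an ISO-8859-1 terminal.
utf8_field = "Martin-Éric Racine,,,".encode("utf-8")
latin1_field = "Martin-Éric Racine,,,".encode("iso-8859-1")

name_from_utf8 = decode_gecos_name(utf8_field, "iso-8859-1")
name_from_latin1 = decode_gecos_name(latin1_field, "iso-8859-1")
```

Both calls recover the same name here because the ISO-8859-1 byte 0xC9 followed by an ASCII letter is not a valid UTF-8 sequence, so the fallback path is taken for the second field.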
Post by Hilko Bengen
Post by Martin-Éric Racine
Post by Hilko Bengen
From an application's standpoint, I'd tend to assume the GECOS field
either to be a comma-separated string of ASCII characters or a
comma-separated string of byte values.
We cannot assume that anymore now that Debian uses UTF-8 for everything.
If you can point me to a text passage in the policy (or any relevant
discussion on the mailing lists), I will be happy to reconsider my
opinion.
Every default installation of Debian or Ubuntu writes GECOS content in
UTF-8 based on the fullname that is given when creating the account,
because a Debian or Ubuntu system nowadays defaults to UTF-8 locales
and uses that to produce the GECOS info.
Post by Hilko Bengen
Post by Martin-Éric Racine
Post by Hilko Bengen
Basing mailx' interpretation of the GECOS field on the sendcharset
variable, as you suggested is probably not a good idea.
Why not?
(it's sendcharsets, sorry for the typo)
sendcharsets is about the target charset.
OK, what should we parse then to make a correct guess?
Post by Hilko Bengen
Post by Martin-Éric Racine
Post by Hilko Bengen
As a workaround, please try setting your real name to a pre-encoded
string in the .mailrc.
Do you really expect all users on a given system to start doing that,
just because their name includes non-ascii characters?
Not at all. I just thought that this workaround might be helpful for you
until the larger issues get sorted out. Feel free to ignore my
suggestion. :-)
Well, I'd indeed hope we can sort this out. Besides, I have a dozen
different environment variables that already set my real name.
Can't we parse any of those? Oh, but that would probably fail too,
because it doesn't tell the encoding either, right?
Post by Hilko Bengen
Post by Martin-Éric Racine
Please remember that both Debian and Ubuntu nowadays allow non-ascii
GECOS content under the presumption that it will be in UTF-8.
They have always allowed non-ascii content in the GECOS field, but I see
no such presumption.
From the sources I have seen, existing tools for manipulating
/etc/passwd will happily accept *any* byte sequence from the terminal.
If an administrator has still set his console to iso-8859-1, that's what
is used, without conversion.
It indeed accepts anything and, these days, with the locales using
UTF-8 variants by default, it really does get anything. :-)

Martin-Éric
Martin-Éric Racine
2019-12-01 18:40:01 UTC
Post by Paride Legovini
Post by Martin-Éric Racine
Package: heirloom-mailx
Version: 12.4-1.1+b1
Severity: normal
When piping text into heirloom-mailx, it fails to specify the charset used with
the From: line if the name inherited from /etc/passwd is in UTF-8.
q-funk:x:1000:1000:Martin-Éric Racine,,,:/home/q-funk:/bin/bash
Hello Martin-Éric,
This bug is now >10 years old, so I somewhat hope it got fixed as the
general adoption of UTF-8 increased in the last years. I tried to
reproduce the issue you described with the latest version of s-nail in
unstable (14.9.15-1), but I'm not sure I'm doing it right. The best
thing would be if you could try to reproduce with a more recent version
of s-nail and report back. Do you think you could give it a shot?
All my hosts currently run the traditional bsd-mailx.

Martin-Éric
Steffen Nurpmeso
2019-12-02 21:50:02 UTC
Martin-Éric Racine wrote in <CAPZXPQeWka2gtdYjnKUpAgC4behjG3hx3+5ns0K9H\
oNYD-***@mail.gmail.com>:
|On Sun, 1 Dec 2019 at 18:44, Paride Legovini (***@ninthfloor.org) wrote:
|> On Sun, 06 Sep 2009 <q-***@iki.fi> wrote:
|>> Package: heirloom-mailx
|>> Version: 12.4-1.1+b1
|>> Severity: normal
|>>
|>> When piping text into heirloom-mailx, it fails to specify the charset \
|>> used with
|>> the From: line if the name inherited from /etc/passwd is in UTF-8.
|>>
|>> cat file | mail -s "some subject" ***@domain.ltd
|>>
|>> q-funk:x:1000:1000:Martin-Éric Racine,,,:/home/q-funk:/bin/bash
|>
|> Hello Martin-Éric,
|>
|> This bug is now >10 years old, so I somewhat hope it got fixed as the
|> general adoption of UTF-8 increased in the last years. I tried to
|> reproduce the issue you described with the latest version of s-nail in
|> unstable (14.9.15-1), but I'm not sure I'm doing it right. The best
|> thing would be if you could try to reproduce with a more recent version
|> of s-nail and report back. Do you think you could give it a shot?
|
|All my hosts currently run the traditional bsd-mailx.

Hilko Bengen is possibly sometimes a bit difficult to get on with. If
I read the bug's thread correctly, he referred to sendcharsets= as the
target character set, and this is correct. What we need to know
to get your (former) problem right is the input / source character
set.

So as long as your locale (man 7 locale) states the correct
character set (for example by doing "export LC_ALL=fr_FR.utf8" in the
shell before starting this MUA), or you set the
ttycharset variable directly (it is normally derived from the
user's locale), this should just work. (Provided that conversion to
sendcharsets= is possible, if it is necessary at all.)
This is true for all programs which adhere to locale(7).
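
The split between source and target character set can be modelled roughly as below, in Python. This is a sketch under assumptions, not the MUA's actual code: the function names are hypothetical, the source charset is taken from the process locale as a stand-in for ttycharset, and the candidate list is assumed to be tried in order as a model of sendcharsets.

```python
import locale


def locale_charset() -> str:
    """Return the charset announced by the user's locale (see locale(7))."""
    try:
        locale.setlocale(locale.LC_ALL, "")  # adopt LANG / LC_* from the environment
    except locale.Error:
        pass  # unsupported locale name: keep the default "C" locale
    return locale.getpreferredencoding(False)


def convert_for_sending(raw: bytes, source_charset: str, send_charsets: list):
    """Decode `raw` from the source charset, then pick the first target
    charset from `send_charsets` that can represent the text."""
    text = raw.decode(source_charset)
    for charset in send_charsets:
        try:
            return charset, text.encode(charset)
        except UnicodeEncodeError:
            continue  # this candidate cannot represent the text; try the next
    raise ValueError("no send charset can represent the text")


# Name stored as UTF-8 bytes, candidates tried from narrowest to widest.
candidates = ["us-ascii", "iso-8859-1", "utf-8"]
chosen, data = convert_for_sending(
    "Martin-Éric Racine".encode("utf-8"), "utf-8", candidates
)
chosen_wide, _ = convert_for_sending("Łukasz".encode("utf-8"), "utf-8", candidates)
```

If the source charset is wrong or unknown, the first decode step is where things go astray, which is why the target list alone cannot fix the From: line.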

(Note this MUA now supports a mime-force-sendout variable which
can be used to enforce sending out data, even if character set
conversion fails. For unattended usage on systems which may
generate logs in whatever encoding.)

--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)
