Test Latin-1 in Google Groups (no yY, now for real)
Ruud Harmsen
2018-11-02 19:06:28 UTC
Testing accented letters, encoded in Latin-1, ISO8859-1, Windows
1252/CP1252

Accent grave: àèìòù
Accent aigu: áéíóú
Accent circonflex: âêôîû
Diaeresis: äëïöü
Tilde: ãõ
Cedilla: ç

Uppercase:
Accent grave: ÀÈÌÒÙ
Accent aigu: ÁÉÍÓÚ
Accent circonflex: ÂÊÔÎÛ
Diaeresis: ÄËÏÖÜ
Tilde: ÃÕ
Cedilla: Ç



àèìòùáéíóúâêôîûäëïöüãõç
--
Ruud Harmsen, http://rudhar.com
Luuk
2018-11-02 19:33:30 UTC
Post by Ruud Harmsen
Testing accented letters, encoded in Latin-1, ISO8859-1, Windows
1252/CP1252
Accent grave: àèìòù
Accent aigu: áéíóú
Accent circonflex: âêôîû
Diaeresis: äëïöü
Tilde: ãõ
Cedilla: ç
Accent grave: ÀÈÌÒÙ
Accent aigu: ÁÉÍÓÚ
Accent circonflex: ÂÊÔÎÛ
Diaeresis: ÄËÏÖÜ
Tilde: ÃÕ
Cedilla: Ç
àèìòùáéíóúâêôîûäëïöüãõç
Why not post in UTF-8?
Ruud Harmsen
2018-11-02 20:24:01 UTC
Post by Luuk
Post by Ruud Harmsen
Testing accented letters, encoded in Latin-1, ISO8859-1, Windows
1252/CP1252
Accent grave: àèìòù
Accent aigu: áéíóú
Accent circonflex: âêôîû
Diaeresis: äëïöü
Tilde: ãõ
Cedilla: ç
Accent grave: ÀÈÌÒÙ
Accent aigu: ÁÉÍÓÚ
Accent circonflex: ÂÊÔÎÛ
Diaeresis: ÄËÏÖÜ
Tilde: ÃÕ
Cedilla: Ç
àèìòùáéíóúâêôîûäëïöüãõç
Why not post in UTF-8?
Agent 1.93 can't.
--
Ruud Harmsen, http://rudhar.com
Christian Weisgerber
2018-11-02 23:04:16 UTC
Post by Ruud Harmsen
Testing accented letters, encoded in Latin-1, ISO8859-1, Windows
1252/CP1252
Accent grave: àèìòù
... and Google Groups transcodes it to Cyrillic again. This is bizarre.

It seems Google Groups doesn't see or trust the charset parameter
in the Content-Type header and instead tries to "guess" the character
set by some heuristic. Your test messages run afoul of that because
nobody writes something like àèìòù in normal text. Google Groups
concludes that the text is encoded in Windows 1251 instead, and for
the web interface performs a conversion from Windows 1251 to UTF-8.

(I checked KOI8-R, ISO 8859-5, and Windows 1251. It's definitely
treated as the latter.)
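
The misreading can be reproduced offline. The sketch below is an illustration only, using Python's standard codecs; it says nothing about how Google Groups is actually implemented. It encodes the test string as ISO-8859-1 and then decodes the same five bytes under each of the three Cyrillic charsets mentioned above.

```python
# Illustration only: the same five bytes read under different charsets.
text = "àèìòù"
raw = text.encode("iso-8859-1")        # bytes e0 e8 ec f2 f9

as_cp1251 = raw.decode("cp1251")       # Windows 1251
as_koi8r = raw.decode("koi8_r")        # KOI8-R
as_8859_5 = raw.decode("iso8859_5")    # ISO 8859-5

print("bytes:       ", raw.hex())
print("Windows 1251:", as_cp1251)
print("KOI8-R:      ", as_koi8r)
print("ISO 8859-5:  ", as_8859_5)
```

The three readings differ from one another, so comparing them with what the web interface displays is enough to tell which charset was assumed.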

There was a time when some people would post articles encoded in an
8-bit character set without a Content-Type header, and this was
tolerated to various degrees in different hierarchies (IIRC, relcom.*
officially defaulted to KOI8-R), so applying such a heuristic to
unmarked articles makes some degree of sense. Applying it to those
with a Content-Type header is nonsensical, though.
--
Christian "naddy" Weisgerber ***@mips.inka.de
António Marques
2018-11-03 00:41:16 UTC
Let’s try.
áéíóú
Peter T. Daniels
2018-11-03 02:14:43 UTC
Post by António Marques
Let’s try.
áéíóú
accents aigus
Peter T. Daniels
2018-11-03 02:14:09 UTC
Post by Christian Weisgerber
Post by Ruud Harmsen
Testing accented letters, encoded in Latin-1, ISO8859-1, Windows
1252/CP1252
Accent grave: àèìòù
... and Google Groups transcodes it to Cyrillic again. This is bizarre.
I see accents graves there, but Cyrillic letters in Ruud's message that
you quoted, so whatever Ruud does wrong, you've corrected. That might
give you enough data to figure it out.
Post by Christian Weisgerber
It seems Google Groups doesn't see or trust the charset parameter
in the Content-Type header and instead tries to "guess" the character
set by some heuristic. Your test messages run afoul of that because
nobody writes something like àèìòù in normal text. Google Groups
concludes that the text is encoded in Windows 1251 instead, and for
the web interface performs a conversion from Windows 1251 to UTF-8.
(I checked KOI8-R, ISO 8859-5, and Windows 1251. It's definitely
treated as the latter.)
There was a time when some people would post articles encoded in an
8-bit character set without a Content-Type header, and this was
tolerated to various degrees in different hierarchies (IIRC, relcom.*
officially defaulted to KOI8-R), so applying such a heuristic to
unmarked articles makes some degree of sense. Applying it to those
with a Content-Type header is nonsensical, though.
Ruud Harmsen
2018-11-03 06:03:26 UTC
Fri, 2 Nov 2018 23:04:16 -0000 (UTC): Christian Weisgerber
Post by Christian Weisgerber
Post by Ruud Harmsen
Testing accented letters, encoded in Latin-1, ISO8859-1, Windows
1252/CP1252
Accent grave: àèìòù
... and Google Groups transcodes it to Cyrillic again. This is bizarre.
It seems Google Groups doesn't see or trust the charset parameter
in the Content-Type header and instead tries to "guess" the character
set by some heuristic. Your test messages run afoul of that because
nobody writes something like àèìòù in normal text. Google Groups
concludes that the text is encoded in Windows 1251 instead, and for
the web interface performs a conversion from Windows 1251 to UTF-8.
(I checked KOI8-R, ISO 8859-5, and Windows 1251. It's definitely
treated as the latter.)
Yes. Probably ISO-8859-1 is now no longer an official encoding for
Usenet (it has been replaced by Windows-1252 for the web too; the
Validator warns against it).
Ruud Harmsen
2018-11-03 06:17:19 UTC
Post by Ruud Harmsen
Fri, 2 Nov 2018 23:04:16 -0000 (UTC): Christian Weisgerber
Post by Christian Weisgerber
Post by Ruud Harmsen
Testing accented letters, encoded in Latin-1, ISO8859-1, Windows
1252/CP1252
Accent grave: אטלעש
... and Google Groups transcodes it to Cyrillic again. This is bizarre.
It seems Google Groups doesn't see or trust the charset parameter
in the Content-Type header and instead tries to "guess" the character
set by some heuristic. Your test messages run afoul of that because
nobody writes something like אטלעש in normal text. Google Groups
concludes that the text is encoded in Windows 1251 instead, and for
the web interface performs a conversion from Windows 1251 to UTF-8.
(I checked KOI8-R, ISO 8859-5, and Windows 1251. It's definitely
treated as the latter.)
Yes. Probably ISO-8859-1 is now no longer an official encoding for
Usenet (it has been replaced by Windows-1252 for the web too; the
Validator warns against it).
And now Google displays it as Hebrew, according to http://czyborra.com/charsets/iso8859.html#ISO-8859-8 . Weird.
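
The Hebrew rendering is consistent with the same bytes being read as ISO-8859-8. A minimal sketch, assuming nothing about Google's internals and using only Python's standard codecs:

```python
import unicodedata

# The ISO-8859-1 bytes for the test string, decoded as ISO-8859-8 (Hebrew).
raw = "àèìòù".encode("iso-8859-1")
as_hebrew = raw.decode("iso8859_8")

# List each resulting character with its code point and Unicode name.
for ch in as_hebrew:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The names printed are the five Hebrew letters in logical order; a right-to-left display shows them reversed on screen.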
Peter T. Daniels
2018-11-03 12:26:25 UTC
Post by Ruud Harmsen
Fri, 2 Nov 2018 23:04:16 -0000 (UTC): Christian Weisgerber
Post by Christian Weisgerber
Post by Ruud Harmsen
Testing accented letters, encoded in Latin-1, ISO8859-1, Windows
1252/CP1252
Accent grave: אטלעש
Those have now turned into Hebrew: right to left: aleph tet lamed `ayin shin

Ruud's system is really sick.
Post by Ruud Harmsen
Post by Christian Weisgerber
... and Google Groups transcodes it to Cyrillic again. This is bizarre.
It seems Google Groups doesn't see or trust the charset parameter
in the Content-Type header and instead tries to "guess" the character
set by some heuristic. Your test messages run afoul of that because
nobody writes something like אטלעש in normal text. Google Groups
concludes that the text is encoded in Windows 1251 instead, and for
the web interface performs a conversion from Windows 1251 to UTF-8.
(I checked KOI8-R, ISO 8859-5, and Windows 1251. It's definitely
treated as the latter.)
Yes. Probably ISO-8859-1 is now no longer an official encoding for
Usenet (it has been replaced by Windows-1252 for the web too; the
Validator warns against it).
Ruud Harmsen
2018-11-03 23:14:13 UTC
Sat, 3 Nov 2018 05:26:25 -0700 (PDT): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
Fri, 2 Nov 2018 23:04:16 -0000 (UTC): Christian Weisgerber
Post by Ruud Harmsen
Testing accented letters, encoded in Latin-1, ISO8859-1, Windows
1252/CP1252
Accent grave: ?????
Those have now turned into Hebrew: right to left: aleph tet lamed `ayin shin
Ruud's system is really sick.
HAVE YOU STILL NOT SEEN THAT THIS SICKNESS IS IN GOOGLE GROUPS AND NOT
IN ANY OF MY PROGRAMS???

It's right before your eyes, I gave you the links to look at it, and
still you continue to falsely accuse me.

Why do you do that? Stupidity or malice? Whose razor should I use?
Arnaud Fournet
2018-11-04 11:50:16 UTC
Post by Ruud Harmsen
Sat, 3 Nov 2018 05:26:25 -0700 (PDT): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
Fri, 2 Nov 2018 23:04:16 -0000 (UTC): Christian Weisgerber
Post by Ruud Harmsen
Testing accented letters, encoded in Latin-1, ISO8859-1, Windows
1252/CP1252
Accent grave: ?????
Those have now turned into Hebrew: right to left: aleph tet lamed `ayin shin
Ruud's system is really sick.
HAVE YOU STILL NOT SEEN THAT THIS SICKNESS IS IN GOOGLE GROUPS AND NOT
IN ANY OF MY PROGRAMS???
It's right before your eyes, I gave you the links to look at it, and
still you continue to falsely accuse me.
Why do you do that? Stupidity or malice? Whose razor should I use?
ah, maybe, you Ruud are finally realizing that PTD is an asshole.
Peter T. Daniels
2018-11-04 14:39:24 UTC
Post by Ruud Harmsen
Sat, 3 Nov 2018 05:26:25 -0700 (PDT): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
Fri, 2 Nov 2018 23:04:16 -0000 (UTC): Christian Weisgerber
Post by Ruud Harmsen
Testing accented letters, encoded in Latin-1, ISO8859-1, Windows
1252/CP1252
Accent grave: ?????
Those have now turned into Hebrew: right to left: aleph tet lamed `ayin shin
Ruud's system is really sick.
HAVE YOU STILL NOT SEEN THAT THIS SICKNESS IS IN GOOGLE GROUPS AND NOT
IN ANY OF MY PROGRAMS???
It's right before your eyes, I gave you the links to look at it, and
still you continue to falsely accuse me.
See reply in other thread. Your fault is unique. If it were in GG, it
would happen at least occasionally in other people's messages.
Post by Ruud Harmsen
Why do you do that? Stupidity or malice? Whose razor should I use?
occam is in AUE.
Ruud Harmsen
2018-11-04 15:08:03 UTC
Sun, 4 Nov 2018 06:39:24 -0800 (PST): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
It's right before your eyes, I gave you the links to look at it, and
still you continue to falsely accuse me.
See reply in other thread. Your fault is unique.
That does not prove that I caused it.
Post by Peter T. Daniels
If it were in GG, it
would happen at least occasionally in other people's messages.
It is an objectively provable fact that Google Groups received valid
ISO-8859-1 characters, marked as such by a header, and nevertheless
turns them into Russian or Hebrew, which it isn't.

So the error is with Google Groups. Clear and simple.
--
Ruud Harmsen, http://rudhar.com
Ruud Harmsen
2018-11-04 15:10:37 UTC
Sun, 4 Nov 2018 06:39:24 -0800 (PST): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
HAVE YOU STILL NOT SEEN THAT THIS SICKNESS IS IN GOOGLE GROUPS AND NOT
IN ANY OF MY PROGRAMS???
It's right before your eyes, I gave you the links to look at it, and
still you continue to falsely accuse me.
See reply in other thread. Your fault is unique. If it were in GG, it
would happen at least occasionally in other people's messages.
Post by Ruud Harmsen
Why do you do that? Stupidity or malice? Whose razor should I use?
occam is in AUE.
Hanlon's is here.

Problem is, I don't know you as stupid. And not as malicious either.
So what's left?
--
Ruud Harmsen, http://rudhar.com
Peter T. Daniels
2018-11-04 17:04:10 UTC
Post by Ruud Harmsen
Sun, 4 Nov 2018 06:39:24 -0800 (PST): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
HAVE YOU STILL NOT SEEN THAT THIS SICKNESS IS IN GOOGLE GROUPS AND NOT
IN ANY OF MY PROGRAMS???
It's right before your eyes, I gave you the links to look at it, and
still you continue to falsely accuse me.
See reply in other thread. Your fault is unique. If it were in GG, it
would happen at least occasionally in other people's messages.
Post by Ruud Harmsen
Why do you do that? Stupidity or malice? Whose razor should I use?
occam is in AUE.
Hanlon's is here.
Problem is, I don't know you as stupid. And not as malicious either.
So what's left?
You brag about using antiquated equipment, and you refuse to believe it
might be faulty.
Ruud Harmsen
2018-11-04 18:23:26 UTC
Sun, 4 Nov 2018 09:04:10 -0800 (PST): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
Hanlon's is here.
Problem is, I don't know you as stupid. And not as malicious either.
So what's left?
You brag about using antiquated equipment, and you refuse to believe it
might be faulty.
Wrong. I am quite willing to believe it might be faulty, but the
investigation has not revealed a fault. That same investigation has
unambiguously shown that GG makes the mistakes.
--
Ruud Harmsen, http://rudhar.com
Peter T. Daniels
2018-11-04 20:50:17 UTC
Post by Ruud Harmsen
Sun, 4 Nov 2018 09:04:10 -0800 (PST): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
Problem is, I don't know you as stupid. And not as malicious either.
So what's left?
You brag about using antiquated equipment, and you refuse to believe it
might be faulty.
Wrong. I am quite willing to believe it might be faulty, but the
investigation has not revealed a fault. That same investigation has
unambiguously shown that GG makes the mistakes.
Whom have you persuaded of that conclusion?

What would someone have to do to duplicate the Ruud Effect?
Christian Weisgerber
2018-11-04 21:48:55 UTC
Post by Peter T. Daniels
Post by Ruud Harmsen
Wrong. I am quite willing to believe it might be faulty, but the
investigation has not revealed a fault. That same investigation has
unambiguously shown that GG makes the mistakes.
Whom have you persuaded of that conclusion?
It was me who laid out the facts and presented that conclusion, so
really you should be having this argument with me. But I will not
participate ad infinitum in a thread where I explain the facts and
you ignore them.
--
Christian "naddy" Weisgerber ***@mips.inka.de
Peter T. Daniels
2018-11-05 04:05:04 UTC
Post by Christian Weisgerber
Post by Peter T. Daniels
Post by Ruud Harmsen
Wrong. I am quite willing to believe it might be faulty, but the
investigation has not revealed a fault. That same investigation has
unambiguously shown that GG makes the mistakes.
Whom have you persuaded of that conclusion?
It was me who laid out the facts and presented that conclusion, so
really you should be having this argument with me. But I will not
participate ad infinitum in a thread where I explain the facts and
you ignore them.
Do you have an answer to the question, "How could someone replicate the
effect?" That would be doing science.
Ruud Harmsen
2018-11-05 07:25:04 UTC
Sun, 4 Nov 2018 21:48:55 -0000 (UTC): Christian Weisgerber
Post by Christian Weisgerber
Post by Peter T. Daniels
Post by Ruud Harmsen
Wrong. I am quite willing to believe it might be faulty, but the
investigation has not revealed a fault. That same investigation has
unambiguously shown that GG makes the mistakes.
Whom have you persuaded of that conclusion?
It was me who laid out the facts and presented that conclusion,
It was. But if I remember well, that was when my test message still
contained an uppercase Y with diaeresis, which is valid in Windows
code page 1252 but not in ISO-8859-1. Later on, we had the phenomenon
also when I posted strictly only valid ISO-8859-1.
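
The difference between the two charsets for that letter can be checked with Python's standard codecs. This is a small sketch; the variable names are illustrative only.

```python
# Uppercase Y with diaeresis exists in Windows-1252 but not in ISO-8859-1;
# the lowercase letter exists in both.
upper = "\u0178"   # LATIN CAPITAL LETTER Y WITH DIAERESIS
lower = "\u00ff"   # LATIN SMALL LETTER Y WITH DIAERESIS

cp1252_upper = upper.encode("cp1252")   # a single byte in the 0x80-0x9f range

try:
    upper.encode("iso-8859-1")
    upper_fits_latin1 = True
except UnicodeEncodeError:
    upper_fits_latin1 = False

latin1_lower = lower.encode("iso-8859-1")

print(cp1252_upper, upper_fits_latin1, latin1_lower)
```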
Post by Christian Weisgerber
so really you should be having this argument with me. But I will not
participate ad infinitum in a thread where I explain the facts and
you ignore them.
--
Ruud Harmsen, http://rudhar.com
Ruud Harmsen
2018-11-05 07:21:41 UTC
Sun, 4 Nov 2018 12:50:17 -0800 (PST): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
Wrong. I am quite willing to believe it might be faulty, but the
investigation has not revealed a fault. That same investigation has
unambiguously shown that GG makes the mistakes.
Whom have you persuaded of that conclusion?
Anyone who is not blind, and who can see the difference between the
raw message as received (as displayed, on request, by Google Groups),
and how Google Groups routinely displays the message. You didn't
really look at those, did you? Should I look up the links (which I
already posted)?
Post by Peter T. Daniels
What would someone have to do to duplicate the Ruud Effect?
If you had followed the discussion, you'd know: post a message with a
string of accented letters that doesn't normally occur in normal
running text, strictly using only valid and defined ISO-8859-1
characters, labelling that message with a header that says
"ISO-8859-1" (not all software can do that last one; some switch to
UTF-8).

But because the GG bug is heuristic, erratic, and therefore
unpredictable, that doesn't guarantee that the effect will occur.
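
For anyone who wants to see what such a message looks like on the wire, here is a minimal sketch that assembles one by hand. The addresses, newsgroup name and subject are hypothetical placeholders, and the code only builds the bytes; it does not post anything.

```python
# Build the raw bytes of an article whose body is ISO-8859-1 and whose
# header says so. All header values below are made-up placeholders.
body = "Accent grave: àèìòù\r\n"
headers = (
    "From: Example Poster <poster@example.invalid>\r\n"
    "Newsgroups: example.test\r\n"
    "Subject: Latin-1 test\r\n"
    "MIME-Version: 1.0\r\n"
    "Content-Type: text/plain; charset=ISO-8859-1\r\n"
    "Content-Transfer-Encoding: 8bit\r\n"
    "\r\n"
)
article = headers.encode("ascii") + body.encode("iso-8859-1")

# Every byte above 0x7f comes from the accented letters in the body.
high = [b for b in article if b >= 0x80]
print([hex(b) for b in high])
```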
--
Ruud Harmsen, http://rudhar.com
Peter T. Daniels
2018-11-05 12:16:17 UTC
Post by Ruud Harmsen
Sun, 4 Nov 2018 12:50:17 -0800 (PST): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
Wrong. I am quite willing to believe it might be faulty, but the
investigation has not revealed a fault. That same investigation has
unambiguously shown that GG makes the mistakes.
Whom have you persuaded of that conclusion?
Anyone who is not blind, and who can see the difference between the
raw message as received (as displayed, on request, by Google Groups),
and how Google Groups routinely displays the message. You didn't
really look at those, did you? Should I look up the links (which I
already posted)?
"Routinely" is simply false. Never before have accented letters turned
into Cyrillic or Hebrew.
Post by Ruud Harmsen
Post by Peter T. Daniels
What would someone have to do to duplicate the Ruud Effect?
If you had followed the discussion, you'd know: post a message with a
string of accented letters that doesn't normally occur in normal
running text, strictly using only valid and defined ISO-8859-1
characters, labelling that message with a header that says
"ISO-8859-1" (not all software can do that last one; some switch to
UTF-8).
Only someone accustomed to programming can do all that.
Post by Ruud Harmsen
But because the GG bug is heuristic, erratic, and therefore
unpredictable, that doesn't guarantee that the effect will occur.
"Erratic" doesn't sound like something computers can do.
Ruud Harmsen
2018-11-05 12:32:26 UTC
Mon, 5 Nov 2018 04:16:17 -0800 (PST): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
Sun, 4 Nov 2018 12:50:17 -0800 (PST): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
Wrong. I am quite willing to believe it might be faulty, but the
investigation has not revealed a fault. That same investigation has
unambiguously shown that GG makes the mistakes.
Whom have you persuaded of that conclusion?
Anyone who is not blind, and who can see the difference between the
raw message as received (as displayed, on request, by Google Groups),
and how Google Groups routinely displays the message. You didn't
really look at those, did you? Should I look up the links (which I
already posted)?
"Routinely" is simply false. Never before have accented letters turned
into Cyrillic or Hebrew.
OK, wrong choice of word. What I meant is the normal display which GG
normally presents to users, as opposed to the display that is offered
when users explicitly ask for the message as Google in fact received
it from the Usenet network.

I'll look up the links and present them again in a next message.
Post by Peter T. Daniels
Post by Ruud Harmsen
Post by Peter T. Daniels
What would someone have to do to duplicate the Ruud Effect?
If you had followed the discussion, you'd know: post a message with a
string of accented letters that doesn't normally occur in normal
running text, strictly using only valid and defined ISO-8859-1
characters, labelling that message with a header that says
"ISO-8859-1" (not all software can do that last one; some switch to
UTF-8).
Only someone accustomed to programming can do all that.
No, anyone with a Usenet client program can do that. Perhaps even
users of GG itself can do it, unless GG always sends as UTF-8.
Post by Peter T. Daniels
Post by Ruud Harmsen
But because the GG bug is heuristic, erratic, and therefore
unpredictable, that doesn't guarantee that the effect will occur.
"Erratic" doesn't sound like something computers can do.
Windows does it all the time. GG in this case too. Demonstrably.
https://www.collinsdictionary.com/dictionary/english/erratic
"Something that is erratic does not follow a regular pattern, but
happens at unexpected times or moves along in an irregular way."

Precisely the right word. Spot on.
--
Ruud Harmsen, http://rudhar.com
Ruud Harmsen
2018-11-05 12:47:43 UTC
Post by Ruud Harmsen
Mon, 5 Nov 2018 04:16:17 -0800 (PST): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
Sun, 4 Nov 2018 12:50:17 -0800 (PST): "Peter T. Daniels"
Post by Peter T. Daniels
Post by Ruud Harmsen
Wrong. I am quite willing to believe it might be faulty, but the
investigation has not revealed a fault. That same investigation has
unambiguously shown that GG makes the mistakes.
Whom have you persuaded of that conclusion?
Anyone who is not blind, and who can see the difference between the
raw message as received (as displayed, on request, by Google Groups),
and how Google Groups routinely displays the message. You didn't
really look at those, did you? Should I look up the links (which I
already posted)?
"Routinely" is simply false. Never before have accented letters turned
into Cyrillic or Hebrew.
OK, wrong choice of word. What I meant is the normal display which GG
normally presents to users, as opposed to the display that is offered
when users explicitly ask for the message as Google in fact received
it from the Usenet network.
I'll look up the links and present them again in a next message.
This is what the antiquated program I use, Agent 1.93, actually sent
on my behalf, so this is also what Google Groups (as it itself shows
and thereby admits) has received:
https://groups.google.com/forum/#!original/sci.lang/ltdKYlcevwY/MuthmjmbAwAJ

What you see is what you get: valid ISO-8859-1, nothing outside that
scope, ISO-8859-1 as announced in the header, and as displayed. Skip
to just under "Testing accented letters, encoded in Latin-1,
ISO8859-1, Windows" [...] to see it.

And this is what Google Groups shows in the normal, regular, everyday,
day-to-day, common display for unsuspecting users:
https://groups.google.com/d/msg/sci.lang/ltdKYlcevwY/MuthmjmbAwAJ

Cyrillic. And that's wrong. A bug, by GG, not by Agent, not by me.
QED.

And this is the proof for the Hebrew case:
https://groups.google.com/d/msg/sci.lang/ltdKYlcevwY/gUQ26QC_AwAJ
vs.
https://groups.google.com/forum/#!original/sci.lang/ltdKYlcevwY/gUQ26QC_AwAJ

Proof of the erratic behaviour:
This
https://groups.google.com/forum/#!original/sci.lang/ltdKYlcevwY/y36C3Ei_AwAJ
is NOT mangled into Russian or Hebrew, but remains broken Portuguese:
https://groups.google.com/d/msg/sci.lang/ltdKYlcevwY/y36C3Ei_AwAJ
even though the situation is the same.

So GG is trying to be smart and human-like, by guessing what the
sender meant, but it fails. Much better to do what the header says.
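
Doing what the header says takes only a few lines. The following is a sketch using Python's standard `email` package on a hand-made sample message; it is not a description of any real server's code.

```python
import email

# A minimal sample message: the header declares ISO-8859-1 and the body
# contains the five test bytes.
raw = (
    b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"Accent grave: \xe0\xe8\xec\xf2\xf9\r\n"
)

msg = email.message_from_bytes(raw)
declared = msg.get_content_charset()     # charset parameter, lowercased
payload = msg.get_payload(decode=True)   # body as raw bytes

# Trust the declared charset; fall back to ASCII only if none is given.
text = payload.decode(declared or "ascii", errors="replace")
print(declared, text.strip())
```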
--
Ruud Harmsen, http://rudhar.com
Ruud Harmsen
2018-11-03 06:05:32 UTC
Fri, 2 Nov 2018 23:04:16 -0000 (UTC): Christian Weisgerber
Post by Christian Weisgerber
Your test messages run afoul of that because
nobody writes something like àèìòù in normal text. Google Groups
concludes that the text is encoded in Windows 1251 instead, and for
the web interface performs a conversion from Windows 1251 to UTF-8.
So perhaps if I send running text con charácteres occasional, sim,
não, ocasiões, ¿they mîght survive Google's scrutiny?
Ruud Harmsen
2018-11-03 06:29:11 UTC
Post by Ruud Harmsen
Fri, 2 Nov 2018 23:04:16 -0000 (UTC): Christian Weisgerber
Post by Christian Weisgerber
Your test messages run afoul of that because
nobody writes something like àèìòù in normal text. Google Groups
concludes that the text is encoded in Windows 1251 instead, and for
the web interface performs a conversion from Windows 1251 to UTF-8.
So perhaps if I send running text con charácteres occasional, sim,
não, ocasiões, ¿they mîght survive Google's scrutiny?
They do. So as Christian Weisgerber suggested, Google Groups is
definitely doing heuristics.
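
Why a heuristic would treat the two messages differently can be illustrated with a toy rule. The rule below is entirely hypothetical and is not Google's algorithm, which is unknown. It relies on the fact that in Cyrillic text nearly every letter is a high byte, whereas in Western European text accented letters sit isolated among ASCII letters. The function names and the threshold of three are assumptions made for this example.

```python
def longest_high_run(data: bytes) -> int:
    """Length of the longest run of consecutive bytes >= 0x80."""
    longest = current = 0
    for b in data:
        current = current + 1 if b >= 0x80 else 0
        longest = max(longest, current)
    return longest


def toy_guess(data: bytes, declared: str = "iso-8859-1") -> str:
    # Hypothetical rule: a run of three or more high bytes "looks Cyrillic".
    return "cp1251" if longest_high_run(data) >= 3 else declared


test_string = "Accent grave: àèìòù".encode("iso-8859-1")
running_text = (
    "con charácteres occasional, sim, não, ocasiões, ¿they mîght"
).encode("iso-8859-1")

print(toy_guess(test_string))    # overrides the declared charset
print(toy_guess(running_text))   # keeps the declared charset
```

A rule like this misfires on exactly the kind of test string posted at the start of the thread and leaves ordinary running text alone, which matches the behaviour observed.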

https://tools.ietf.org/html/rfc8187
==
Appendix A. Changes from RFC 5987
[...]
o The requirement to support the "ISO-8859-1" encoding was
removed.
==
Christian Weisgerber
2018-11-03 12:41:40 UTC
Post by Ruud Harmsen
https://tools.ietf.org/html/rfc8187
==
Appendix A. Changes from RFC 5987
[...]
o The requirement to support the "ISO-8859-1" encoding was
removed.
==
I have no idea what you think a specification for "Indicating
Character Encoding and Language for HTTP Header Field Parameters"
has to do with the topic at hand.
--
Christian "naddy" Weisgerber ***@mips.inka.de
António Marques
2018-11-03 00:43:18 UTC
First test didn’t work, it was converted.
áéíóú
António Marques
2018-11-03 00:45:07 UTC
So was the second one. Heck.
á a
António Marques
2018-11-03 00:46:20 UTC
Post by António Marques
So was the second one. Heck.
á a
Yet this one wasn’t!
ú
António Marques
2018-11-03 00:47:30 UTC
Post by António Marques
Post by António Marques
So was the second one. Heck.
á a
Yet this one wasn’t!
ú u
áéíóú u
António Marques
2018-11-03 00:49:02 UTC
And what of
ú u
António Marques
2018-11-03 00:49:08 UTC
And what of
ú u
António Marques
2018-11-03 00:52:07 UTC
Post by António Marques
And what of
ú u
It seems I’m unable to send áéíóú in Latin-1 with this client.
Peter T. Daniels
2018-11-03 02:16:08 UTC
Post by António Marques
Post by António Marques
And what of
ú u
It seems I’m unable to send áéíóú in Latin-1 with this client.
accents aigus

so your machine is broken too?
Ruud Harmsen
2018-11-03 06:33:58 UTC
Fri, 2 Nov 2018 19:16:08 -0700 (PDT): "Peter T. Daniels"
Post by Peter T. Daniels
Post by António Marques
And what of
ú u
It seems I’m unable to send áéíóú in Latin-1 with this client.
accents aigus
so your machine is broken too?
No, his Usenet program decides to send as UTF-8 in this case.
Peter T. Daniels
2018-11-03 02:15:06 UTC
Post by António Marques
First test didn’t work, it was converted.
áéíóú
accents aigus